SharePoint is a core foundational element in Microsoft 365. It is primarily used for the storage of digital objects (including pages) in document libraries and rows and columns of data in lists. It is ubiquitous and almost impossible to remove from a Microsoft 365 licence because it ‘powers’ so many different things.
While letting anyone easily create a SharePoint site seems a good idea in some ways, from a recordkeeping point of view this starts to look like network file shares all over again.
Microsoft’s response to the default ‘free for all’ ability to create SharePoint sites is to use the so-called ‘records management’ functionality (via the more expensive E5 licence) to auto-classify content and auto-apply retention labels. The problem is that those (more expensive) options provide limited functionality, including inadequate metadata details to make decisions on disposal, and similarly inadequate metadata (for records subject to disposition review labels only) as ‘proof of disposition’.
So, records managers are more often than not left with a network file share-like sprawl of uncontrolled content.
Unfortunately, the ability to create a new SharePoint site is fairly easy, almost as easy as creating a folder on a … network file share.
The following is a list of the main ways a person can create a SharePoint site. Have I missed any?
This option also allows the administrator to provision new SharePoint sites.
2. Via the SharePoint Admin portal (+ Create)
This option allows the creation of three main types of sites: modern team sites (Team site), communication sites, and non-Microsoft 365 Group-linked sites (Other options).
3. By creating a Microsoft 365 Group
Microsoft 365 Groups are created in the Microsoft 365 Admin portal, in the Groups section, Add a group > Microsoft 365. This is also where Security Groups and Distribution Lists (both collectively known as ‘AD Groups’) are created.
Every new Microsoft 365 Group creates both a SharePoint site and an Exchange mailbox that is visible in the Outlook application (under ‘Groups’) of everyone who is an Owner or a Member of the Group.
The new Group creation process allows the Group email address to be created (it really should be the same as the Group name), the Group to be made public or private, and a new Team to be created.
Because the Microsoft 365 Group name becomes the SharePoint site (URL) name, it is a good idea to consider naming conventions.
4. By an end-user creating a new Team in MS Teams
Unless the creation of Microsoft 365 Groups is restricted, an end-user can create a new SharePoint site (possibly without realising it) by creating a new Team in MS Teams. There is nothing in the creation process to indicate that (a) they will create a SharePoint site or a Microsoft 365 Group, or (b) they will be the Owner of the Team, Group and SharePoint site – and therefore have responsibility for managing the Team/Group membership.
Every new Team creates a Microsoft 365 Group which always has a SharePoint site and an Exchange Online mailbox that is not visible in Outlook.
5. By creating a Private Channel in MS Teams
If the option is not disabled in the MS Teams admin portal under Teams > Teams Policies, end users will be able to create a private channel in a Team. Every private channel creates a new SharePoint site with a name that is an extension of the ‘parent’ Team site name.
For example, if the parent site name is ‘Finance’ and the private channel is named ‘Invoice chat’, the new SharePoint site will be ‘Finance-Invoicechat’. This new site is not connected with the ‘parent’ site and is not visible in the list of Active Sites from the SharePoint admin portal (and so the SharePoint Admin won’t know it exists). It is only visible in the list of Sites under the Resources section of the Microsoft 365 Admin portal.
A private channel does not create a new Microsoft 365 Group. A ‘compliance copy’ of the chats in the private channel is stored in the Exchange Online mailboxes of individual participants in the chat.
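The naming example above can be sketched in a few lines of Python. This is only an illustration based on the single ‘Finance’ / ‘Invoice chat’ example; the function name is hypothetical, and how Microsoft 365 actually handles special characters, long names or name clashes is not covered here.

```python
def private_channel_site_name(parent_site: str, channel_name: str) -> str:
    """Join the parent site name and the channel name (spaces removed) with a hyphen.

    Assumption: this mirrors only the one example in the post, not the full
    naming rules used by Microsoft 365.
    """
    compact_channel = "".join(channel_name.split())
    return f"{parent_site}-{compact_channel}"


print(private_channel_site_name("Finance", "Invoice chat"))  # Finance-Invoicechat
```

A helper like this may be useful when reconciling the Sites list in the Microsoft 365 Admin portal against the Teams that are known to allow private channels.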
6. By the Teams Admin creating a new Team
The MS Teams admin area includes the ability for the Teams admin to go to Manage Teams, click +Add and create a new Team.
As with the end-user creation process, a new Team creates a Microsoft 365 Group that has an Exchange mailbox and a SharePoint site.
7. From the end-user SharePoint portal (+ Create site)
This process creates a Microsoft 365 Group that has a SharePoint site and an Exchange mailbox. It also creates a new Team with the same name.
It is recommended that the ability for end-users to create new sites this way is disabled, at least initially. This is done from the SharePoint admin portal under Settings > Site Creation.
8. From OneDrive for Business as a ‘shared library’
This option is relatively new. When the end-user opens their OneDrive for Business, they will see ‘Create shared library’ directly under a list of sites they have access to under a heading ‘Shared libraries’ (they are actually SharePoint sites; when you click on the site name, it (confusingly) displays the document libraries as … folders).
9. When a new Plan is created in Planner
If end-users open the Planner app, they will see ‘New Plan’ on the top left. This opens a dialogue to create a New Plan or add one to an existing Microsoft 365 Group. The process of creating a new Plan creates a new Microsoft 365 Group with a SharePoint site.
10. When a new Yammer community is created
End users with access to Yammer can click on ‘Create a Community’ from Yammer.
To quote from the Microsoft 365 documentation ‘Join and create a community in Yammer‘: ‘When a new Office 365 connected Yammer community is created, it gets a new SharePoint site, SharePoint document library, OneNote notebook, plan in Microsoft Planner, and shows up in the Global Address Book.’
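The options named above can be summarised as a small data structure, which makes it easier to see which routes are open to end-users and which create a site with no Microsoft 365 Group behind it. This is a sketch only: the class and field names are my own, and a value of None means the post does not say (or that it depends on the type of site chosen).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CreationPath:
    name: str
    end_user: bool                 # True if an end-user (not only an admin) can use it
    creates_group: Optional[bool]  # None = depends on site type, or not stated above


PATHS: List[CreationPath] = [
    CreationPath("SharePoint Admin portal (+ Create)", False, None),
    CreationPath("New Microsoft 365 Group (Microsoft 365 Admin portal)", False, True),
    CreationPath("New Team created by an end-user", True, True),
    CreationPath("Private channel in Teams", True, False),
    CreationPath("New Team created by the Teams Admin", False, True),
    CreationPath("End-user SharePoint portal (+ Create site)", True, True),
    CreationPath("OneDrive 'Create shared library'", True, None),
    CreationPath("New Plan in Planner", True, True),
    CreationPath("New Yammer community", True, None),
]


def end_user_paths() -> List[str]:
    """Names of the routes an end-user can take unless they are disabled."""
    return [p.name for p in PATHS if p.end_user]


def sites_without_group() -> List[str]:
    """Names of the routes stated above to create a site but no new Group."""
    return [p.name for p in PATHS if p.creates_group is False]


print(len(end_user_paths()), "end-user routes")
print(sites_without_group())
```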
Why have Microsoft allowed this?
It’s a smarter way to manage access.
Some years back, Microsoft moved away from the idea of having Security Groups that give access to individual IT resources, to having individual Microsoft 365 Groups that provide access to multiple IT resources, in this case resources across Microsoft 365. One Microsoft 365 Group controls access to a SharePoint site, an Exchange mailbox, a Team, a Plan, and a Yammer Community. Security Groups don’t have that sort of functionality.
The trade off is that you get all of these options with a Microsoft 365 Group, whether you like it or not.
But, some of the decisions don’t seem to make sense.
Why allow end-users to create a private channel in Teams when they can simply use the 1:1 chat area?
Why allow the creation of a so-called ‘Shared Library’ from OneDrive, limited to and controlled by the person who created it, when a SharePoint site provides that functionality?
Why does an end-user need an Exchange mailbox (for the Microsoft 365 Group) when they create a new site from the ‘Create site’ option in SharePoint?
And why does a new Plan create a SharePoint site? For what purpose?
Perhaps there is a reason for it. It’s just not clear.
Microsoft 365 includes a range of connectors, in three categories, that can be used to support the management of records created by other applications. The three categories are:
Search connectors, that find content created by and/or stored in a range of internal and external applications, including social media.
Archive connectors, that import and archive content created by third-party applications.
API connectors, that support business processes such as capturing email attachments.
This post explains how these connectors can assist with the management of records.
The recordkeeping dilemma
Finding, capturing and managing records across an ever increasing volume of digital content and content types has been one of the biggest challenges for recordkeeping since the early 2000s.
The primary method of managing digital records for most of the past 20 years has been to require digital records (mostly emails and other digital content created on file shares) to be saved to or stored in an electronic document and records management system (EDRMS). The EDRMS was established as ‘the’ recordkeeping system for the organisation.
EDRM systems were also used to manage paper records which, over the past 20 years, have mostly contained the printed version of born-digital records that remain stored in the systems where they were created or captured.
There were two fundamental flaws in the EDRMS model. The first was an expectation that end-users would be willing to save digital records to the EDRMS. The second was that the original digital record remained in place where it was created or captured, usually ignored but often the source of rich pickings for eDiscovery.
The introduction of web-based email and document storage systems, smart phones, social media and personal messaging applications from around 2005 (in addition to already existing text messaging/SMS messages) further challenged the concept of a centralised recordkeeping system; in many cases, the only option to save these records was to print and scan, screenshot and save the image, or save to PDF, none of which were particularly effective in capturing the full set of records.
The hasty introduction from early 2020 of ‘work from home’ applications such as Zoom and Microsoft Teams has been a further blow to these methods.
In place records management
To the chagrin of records managers around the world, Microsoft never made it easy to save an email from Outlook to another system. Emails stubbornly remained stored in Exchange mailboxes with no sign of integration with file shares.
And for good reason – they have a different purpose and architecture to support that purpose. It would be similar to asking when it would be possible to create and send an email in Word.
The introduction of Office 365 (later Microsoft 365) from the mid 2010s changed the paradigm from a centralised model – where records were all copied to a central location and the originals left where they were created or captured, to a de-centralised or ‘in place’ model – where records are mostly left where they were created or captured.
The decentralised model does not exclude the ability to store copies of some records (e.g., emails) in other applications (e.g., SharePoint document libraries), but these are exceptions to the general rule.
It also does not exclude the ability to import or migrate content from third-party applications where necessary for recordkeeping purposes.
Microsoft 365 connectors
Microsoft 365 includes a wide range of options to connect with both internal and external systems. Many of these connectors simplify business processes and support integration models.
Connectors may also be used to support recordkeeping requirements, in three broad categories.
Archive connectors allow organisations to import and archive data from third-party systems such as social media, instant messaging and document collaboration* platforms. Most of this data will be stored in Exchange mailboxes, where it can be subject to retention policies, eDiscovery and legal holds.
(*This option is still limited via connectors, but also see below under Search).
The social media and instant messaging data that can be archived in this way currently includes Facebook (business pages), LinkedIn company page data, Twitter, Webex Teams, Webpages, WhatsApp, Workplace from Facebook, Zoom Meetings. For the full listing, and a detailed description of what is required to connect each service, see this Microsoft description ‘Archive third-party data‘.
An important thing to keep in mind is that the data will be archived to an Exchange mailbox; this will require an account to be created for the purpose. Any data archived to the mailbox will contribute to the overall storage quotas.
Search connectors (also known as Microsoft Graph connectors) index third-party data that then appears in Microsoft search results, including via Bing (the ‘Work’ tab), from http://www.office.com, and via SharePoint Online.
Most ECM/EDRM systems are listed, which means that organisations that continue to use those systems can allow end-users to find content from a single search point, only surfacing content that users are permitted to see.
The following is an example of what a Bing search looks like in the ‘Work’ tab (when enabled).
Note: as at 17 November 2020, Microsoft’s page ‘Overview of Microsoft Graph connectors‘ (which includes a very helpful architecture diagram) states that these are ‘currently in preview status available for tenants in Targeted release.’
There are two main types of search connector:
Microsoft built: Azure Data Lake Storage Gen2, Azure DevOps, Azure SQL, Enterprise websites, MediaWiki, Microsoft SQL, and ServiceNow.
Partner built. Includes the following on-premise and online document management/ECM/EDRM connectors – Alfresco, Alfresco Content Services, Box, Confluence, Documentum, Facebook Workplace, File Share (on prem), File System (on prem), Google Drive, IBM Connections, Lotus Notes, iManage, MicroFocus Content Manager (HPE Records Manager, HP TRIM), Objective, OneDrive, Open Text, Oracle, SharePoint (on prem), Slack, Twitter, Xerox DocuShare, Yammer
A consideration when deploying search connectors is the quality of the data that will be surfaced via searches. Duplicate content is likely to be a problem in identifying the single – or most recent – source of truth of any particular digital record, especially when the organisation has required records to be copied from one system (mailbox/file share) to another (EDRMS).
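One way to get a feel for the scale of the duplication problem before turning on search connectors is to compare content hashes across systems. The following is a minimal sketch, not a feature of Microsoft Graph connectors: the system names, paths and content are made up, and it only catches byte-for-byte identical copies (a document re-saved as PDF, for example, would not match).

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Item = Tuple[str, str, bytes]  # (system, path, content)


def find_duplicates(items: Iterable[Item]) -> Dict[str, List[Tuple[str, str]]]:
    """Group items by SHA-256 digest and keep only digests seen more than once."""
    by_digest: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for system, path, content in items:
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append((system, path))
    return {d: locs for d, locs in by_digest.items() if len(locs) > 1}


# Hypothetical sample data: the same document left in a mailbox and a file share.
sample = [
    ("Mailbox", "Inbox/contract.docx", b"final contract"),
    ("File share", "S:/Legal/contract.docx", b"final contract"),
    ("EDRMS", "REC-001", b"meeting notes"),
]

duplicates = find_duplicates(sample)
for locations in duplicates.values():
    print("Same content found in:", locations)
```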
API connectors provide a way for Microsoft 365 to access and use content, including in third-party applications. To quote from the Microsoft ‘Connectors‘ web page:
‘A connector is a proxy or a wrapper around an API that allows the underlying service to talk to Microsoft Power Automate, Microsoft Power Apps, and Azure Logic Apps. It provides a way for users to connect their accounts and leverage a set of pre-built actions and triggers to build their apps and workflows.’
Actions. These are changes initiated by an end-user.
Triggers. There are two types of triggers: Polling and Push. Triggers may notify the app when a specific event occurs, resulting in an action. See the above web page for more details.
API connectors can support records management requirements in different ways (such as triggering an action when a specific event occurs) but they should not be confused with archiving or search connectors.
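The difference between polling and push triggers can be shown with a small, self-contained sketch. It does not use Power Automate or any real connector; the class names and the sample file names are hypothetical, and the ‘action’ here is simply recording the name of a captured attachment.

```python
from typing import Callable, List


class PollingTrigger:
    """Checks a source on request and reports anything added since the last check."""

    def __init__(self, source: List[str]) -> None:
        self._source = source
        self._seen = 0  # how many items had been seen at the last poll

    def poll(self) -> List[str]:
        new_items = self._source[self._seen:]
        self._seen = len(self._source)
        return new_items


class PushSource:
    """Calls every subscribed handler as soon as an item arrives."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self._handlers.append(handler)

    def publish(self, item: str) -> None:
        for handler in self._handlers:
            handler(item)


# Polling: the workflow asks "anything new?" each time it runs.
inbox = ["invoice-001.pdf"]
trigger = PollingTrigger(inbox)
first = trigger.poll()
inbox.append("invoice-002.pdf")
second = trigger.poll()
third = trigger.poll()  # nothing new since the previous poll

# Push: the source notifies the workflow, which runs its action straight away.
captured: List[str] = []
source = PushSource()
source.subscribe(captured.append)
source.publish("invoice-003.pdf")

print(first, second, third, captured)
```

In practice the trade-off is that polling is simpler but introduces a delay between the event and the action, while push reacts immediately but needs the source system to support notifications.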
The connectors available in Microsoft 365 support the model of keeping records in place where they were first created or captured. They enable the ability to archive data from third-party cloud applications, search for data in those (and on-premise) applications, and trigger actions based on events.
The use of connectors should be part of an overall strategic plan for managing records across the organisation. This may include a business decision to continue using an ECM/EDRMS in addition to the content created and captured in Microsoft 365. Ideally, however, the content in the ECM/EDRMS should not be a copy of what already exists in Microsoft 365.
Ever since emails first appeared as a way to communicate more than 30 years ago they have been a problem for records management, for two main reasons.
Emails (and attachments) are created and captured in a separate (email) system, and are stored in mailboxes that are inaccessible to records managers (a bit like ‘personal’ drives).
The only way to manage them in the context of other records was/is to print and file or copy them to a separate recordkeeping system, leaving the originals in place.
Thirty-plus years of email has left a trail of mostly inaccessible digital debris. An unknown volume of records remains locked away in ‘personal’ and archived mailboxes. Often, the only way to find these records is via legal eDiscovery, but even that can be limited in terms of how far back you can go.
The report noted (from page 58) three common approaches to the preservation of legacy emails:
Migration (to MBOX, EML or even XML)
In a follow-up, the Australian IDM magazine published an article in March 2020 by one of the CLIR report authors (Chris Prom). The article, titled ‘The Future of Past Email is PDF‘, suggested that PDF may be (or become) a more suitable long-term solution for preservation of legacy emails.
Preservation is one thing, what about access
There is little point in preserving important records if they cannot be accessed. The two must go together. In fact, preservation without the ability to access a record is not a lot different from destruction through negligence.
Assuming emails can be migrated to a long-term and accessible format, what then?
No-one (except possibly well-funded archival institutions) is seriously likely to attempt to move or copy individual legacy emails to pre-defined and pre-existing containers or aggregations of other records. This would be like printing individual emails and storing them in the same paper file or box that other records on the same subject are stored in.
Access to legacy emails in a digitally accessible, metadata-rich format like PDF provides a range of potential opportunities to ‘harvest’ and make use of the content, including through machine learning and artificial intelligence.
These options have been available for close to twenty years in the eDiscovery world, but to support specific legal requirements.
Search, discovery and retention/disposal tools available in the Microsoft 365 Compliance portal, along with the underlying Graph and AI tools (including SharePoint Syntex) provide the potential to manage legacy content, including emails.
The starting point is migrating all those old legacy emails to an accessible format.
When people chat in Microsoft Teams (MS Teams), a ‘compliance’ copy of the chat is saved to either personal or (Microsoft 365) Group mailboxes. This copy is subject to retention policies, and can be found and exported via Content Search.
But what happens if there is no Exchange Online mailbox? It seems the chats become inaccessible which could be an issue from a recordkeeping and compliance point of view.
This post explains what happens, and why it may not be a good idea (from a compliance and recordkeeping point of view) to disable the Exchange Online mailbox option as part of licence provisioning.
Licences and Exchange Online mailboxes
When an end-user is allocated a licence for Microsoft 365, a decision (sometimes incorporated into a script) is made about which of the purchased licences – and apps in those licences – will be assigned to that person.
E1, E3 and E5 licences include ‘Exchange Online’ as an option under ‘Apps’. This option is checked by default (along with many of the other options), but it can be disabled (as shown below).
If the checkbox option is disabled as part of the licence assigning process (not after), the end-user won’t have an Exchange mailbox and so won’t see the Outlook option when they log on to the office.com portal. (Note – If they have an on-premise mailbox, that will continue to exist, nothing changes).
Having an Exchange Online mailbox is important if end-users are using MS Teams, because the ‘compliance’ copy of 1:1 chat messages in MS Teams is stored in a hidden folder (/Conversation History/Team Chat) in the Exchange Online mailbox of every participant in the chat. If the mailbox doesn’t exist, no copy is made, so the chat is not accessible and may be deleted.
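Where licences are assigned by script, it may be worth checking the script’s output for exactly this situation before it is run. The sketch below builds a request body in the shape that, as far as I understand it, the Microsoft Graph ‘assignLicense’ action expects (addLicenses, each with a skuId and a list of disabledPlans, plus removeLicenses), and flags any assignment that disables a given service plan. The GUIDs are placeholders, not real identifiers; the real SKU and Exchange Online plan IDs would need to be looked up in the tenant. Nothing is sent anywhere.

```python
from typing import Any, Dict, List

# Placeholder identifiers only: look up the real values in your own tenant.
LICENCE_SKU_ID = "00000000-0000-0000-0000-000000000001"
EXCHANGE_ONLINE_PLAN_ID = "00000000-0000-0000-0000-000000000002"


def build_assignment(sku_id: str, disabled_plans: List[str]) -> Dict[str, Any]:
    """Build a licence assignment body with the given service plans switched off."""
    return {
        "addLicenses": [{"skuId": sku_id, "disabledPlans": list(disabled_plans)}],
        "removeLicenses": [],
    }


def disables_plan(payload: Dict[str, Any], plan_id: str) -> bool:
    """Return True if any licence in the payload disables the given service plan."""
    return any(
        plan_id in licence.get("disabledPlans", [])
        for licence in payload.get("addLicenses", [])
    )


risky = build_assignment(LICENCE_SKU_ID, [EXCHANGE_ONLINE_PLAN_ID])
safe = build_assignment(LICENCE_SKU_ID, [])

for label, payload in (("risky", risky), ("safe", safe)):
    if disables_plan(payload, EXCHANGE_ONLINE_PLAN_ID):
        print(label, "- warning: no Exchange Online mailbox, so no compliance copy of Teams chats")
    else:
        print(label, "- Exchange Online stays enabled")
```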
If end-users chat with other end-users who don’t have an Exchange mailbox as shown in the example below, the same thing happens – no compliance copy is kept. The chat remains inaccessible (unless the Global Admins take over the account).
The exchange above, between Roger Bond and Charles, includes some specific key words. As we will see below, these chats cannot be found via a Content Search.
(On a related note, if the ability to create private channels is enabled and they create a private channel and chat there, the chats are also not saved, because the compliance copy of private channel chats is stored in the mailboxes of the individual participants.)
Searching for chats when no mailbox exists
As we can see above, the word ‘mosquito’ was contained in the chat messages between Roger and Charles.
Content Searches are carried out via the Compliance portal and are more or less the same as eDiscovery searches in that they are created as cases.
From the Content Search option, a new search is created by clicking on ‘+New Search’, as shown below. The word ‘mosquito’ has been added as a keyword.
We then need to determine where the search will look. In this case the search will look through all the options shown below, including all mailboxes and Teams messages.
When the search was run, the results area showed the words ‘No results found’.
Clicking on ‘Status details’ in the search results, the following information is displayed – ‘0 items’ found. The ‘5 unindexed items’ is unrelated to this search and simply indicates that there are 5 unindexed items.
Double-checking the results
To confirm the results were accurate, another search was conducted where the end-user originally did not have a mailbox, and then was assigned one.
If the end-user didn’t have a mailbox but the other recipient/s of the message did, the Content Search found one copy of the chat message in the mailbox of the other participants. Only one item was found.
When the Exchange Online option was enabled for the end-user who previously did not have a mailbox (so they were now assigned a mailbox), a copy of the chat was found in the mailbox of both participants, as shown in the details below (‘2 items’).
Summary and implications
If end users chat in the 1:1 area of MS Teams and don’t have an Exchange Online mailbox, no compliance copy of the chat will be saved, and so it will not be found via Content Search.
If any of the participants in the 1:1 chat have an Exchange Online mailbox, the chat will appear in the mailboxes of those participants.
If all participants in the 1:1 chat have an Exchange Online mailbox, the chat will be found in the mailbox of all participants.
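The three outcomes above boil down to a simple rule: a compliance copy exists only in the mailboxes of participants who have one. A minimal sketch of that rule follows; the function name is hypothetical, and the participants are the two people from the test chat described earlier.

```python
from typing import Dict, List


def mailboxes_holding_copy(participants: Dict[str, bool]) -> List[str]:
    """Return the participants whose mailbox would hold a compliance copy.

    `participants` maps each person's name to whether they have an
    Exchange Online mailbox.
    """
    return sorted(name for name, has_mailbox in participants.items() if has_mailbox)


neither = mailboxes_holding_copy({"Roger": False, "Charles": False})
one = mailboxes_holding_copy({"Roger": False, "Charles": True})
both = mailboxes_holding_copy({"Roger": True, "Charles": True})

# The number of mailboxes matches the Content Search results reported above:
# nothing found, one item, then two items.
print(len(neither), len(one), len(both))
```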
Further to the above:
If end users can delete chats (via Teams policies) and don’t have a mailbox, no copy of the chat will exist.
If end-users with a mailbox can delete Teams chats, but a retention policy has been applied to the chats, the chats will be retained as per the retention policy (in a hidden folder).
And finally, if you allow private channels, end-users can create private channels in the Organisation Team. The chats in these private channels are usually stored in the personal mailboxes of participants (not the Group mailbox) – so these chats will also be inaccessible and cannot be found via Content Search.
The implications for the above are that, if you need to ensure that personal chat messages can be accessed (from Content Search), then the participants in the chat must have an Exchange Online mailbox.
Further, if you allow deletion of chats but need to be able to recover them for compliance purposes, a retention policy should be applied to Teams 1:1 chat.
In his April 2007 article titled ‘Useful Void: The Art of Forgetting in the Age of Ubiquitous Computing’ (Harvard University RWP07-022), Viktor Mayer-Schönberger noted that the default human behaviour for millennia was to forget. Only information that needed to be kept would be retained. He noted that the digital world had changed the default to remembering, and that the concept of forgetting needed to be re-introduced through the active deletion of digital content that does not need to be retained.
The harsh reality is that there is now so much digital information in the world, including digital content created and captured by individual organisations, that active deletion of content that does not need to be retained, seems an almost impossible task.
This post explores issues with the traditional model of records retention in the digital world, and why newer options such as the records retention capability of Microsoft 365 are a more effective way to manage the retention and disposal of records, and all other digital content.
The traditional retention model
The traditional model of managing the retention and disposal/disposition of records was based on the ability to apply a retention policy to a group or aggregation of information identified as records. For the most part, those paper records were the only copy that existed (with some allowance for working and carbon copies).
The model worked reasonably well for paper records, but started to falter when paper records became the printed versions of born-digital records, and where the original digital versions remained where they were created or captured – on network file shares, in email systems, and on backups. Although, technically, the official record was on a file, a digital version was likely to remain on network file shares or in an email mailbox after the paper version was destroyed at the end of the retention period, and remain overlooked.
How many of us have had to wade through the content of old network file shares to examine the content, determine its value, and perhaps see if it can even still be accessed? Or do the same with old backup tapes?
The volume of unmanaged digital content, not subject to any retention policy, only continued to increase. This situation continued to worsen when electronic document and records management (EDRM) systems were introduced from the late 1990s. End-users had to copy records to the EDRMS, thereby creating yet another digital copy, in addition to the born-digital originals stored in mailboxes or file shares.
Even if the record in the EDRMS was destroyed, there was a good chance the original ‘uncontrolled’ version of the digital record – along with an unknown volume of digital records that probably should have been consigned to the EDRMS but weren’t – remained in email mailboxes, on file shares, or on a backup tape somewhere.
eDiscovery was born.
The emergence of new forms of digital records, including instant messages, social media, and smart-phone based chat and other apps from the early 2000s only added to the volume of digital content, much of which was stored in third-party cloud-based and mobile-device accessible applications completely out of the reach and ability of the organisation trying to manage records.
Modern retention management
A modern approach to retention management should be based on the following principles:
Information, not just records, should only be kept for as long as it is required.
It is no longer possible to accurately and/or consistently identify and capture all records in a single recordkeeping system.
Duplication of digital content can be reduced by creating and capturing records in place, promoting ‘working out loud’, co-authoring and sharing (no more attachments and private copies).
None of the above points excludes the ability to manage certain types of records at a more granular level where this is required. But these records, or the location in which they are created or captured, should not be regarded as the only form of record.
Ideally, these records should be created (or captured) directly in the system where they are to be managed – not copied to it.
Change management is necessary
Some of these new ways of working are likely to come up against deeply ingrained behaviours, many of which go back several decades and have contributed to a reluctance to ‘forget’ and destroy old digital content, including:
hiding/hoarding content in personal drives (and personal cloud-based systems or on USB drives);
communicating by email, the content of which is inaccessible to anyone else;
attaching documents to emails;
printing and filing born-digital content; and
sometimes, scanning/digitising the printed copies of born-digital records and saving them back to a digital system.
What about destruction?
Records managers in organisations moving away from the authorised destruction of digital content identified as records, to the destruction of all digital content (including identified records) need to consider what is required to achieve this outcome, and the implications for existing process and practices (including those described above).
Some activities will remain unchanged. For example, the need to review certain types of records before they are destroyed (aka ‘disposition review’), to seek approval for that destruction, and to keep a record of what was destroyed.
Some activities are new and can replace other existing actions and activities. For example, the application of retention policies to mailboxes can remove the requirement to backup those mailboxes.
Some activities or outcomes may be challenging. For example, the automatic destruction without review of digital content that is not the subject of more granular retention requirements, such as emails in mailboxes or documents in personal working drives. This content will simply disappear after the retention period expires.
How Microsoft 365 can support modern retention management
Microsoft recognised some time ago that it was becoming increasingly difficult to manage the volumes and types of digital content that were being created every day by organisations.
Existing and newly released functionality in the Compliance portal of Microsoft 365 includes the ability to create and apply both label-based retention policies to specific types of records (including automatically, based on machine learning capabilities) and broader ‘workload’ specific (e.g., mailboxes, SharePoint sites, OneDrive accounts, MS Teams chats) retention policies. This capability helps organisations to focus retention requirements on the records that need to be retained, while destroying digital content that is no longer relevant and can be forgotten.
Instead of directing end-users to identify records and copy them from one system to another (thereby creating two versions), Microsoft 365 allows end-users to create and capture records in place, providing a single source of truth that can be shared (rather than attached), be the subject of co-authoring, and protected from unauthorised changes (and even downloads).
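The interplay between broad workload policies and more granular labels can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the retention periods and label names are invented, the period is counted from the creation date, and where both a workload policy and a label apply the longer period is used. It is not a description of how Microsoft 365 resolves overlapping policies.

```python
from datetime import date
from typing import Optional

# Hypothetical retention periods in years, for illustration only.
WORKLOAD_RETENTION_YEARS = {"mailbox": 2, "onedrive": 2, "teams_chat": 1, "sharepoint": 7}
LABEL_RETENTION_YEARS = {"Contract": 7, "Financial record": 7}


def add_years(start: date, years: int) -> date:
    """Add whole years to a date, moving 29 February to 28 February if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # 29 February in a target year that is not a leap year
        return start.replace(year=start.year + years, day=28)


def disposal_date(created: date, workload: str, label: Optional[str] = None) -> date:
    """Return the date after which an item could be disposed of.

    Assumption: when a label applies as well as the workload policy,
    the longer of the two periods is used.
    """
    periods = [WORKLOAD_RETENTION_YEARS[workload]]
    if label is not None:
        periods.append(LABEL_RETENTION_YEARS[label])
    return add_years(created, max(periods))


print(disposal_date(date(2020, 10, 1), "mailbox"))              # unlabelled email
print(disposal_date(date(2020, 10, 1), "mailbox", "Contract"))  # labelled as a contract
```

The point of the sketch is that unlabelled content still gets a disposal date from the workload policy, so nothing is left outside retention management altogether.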
Limitations with Microsoft 365
It is important to keep in mind that there are some limitations with the current (October 2020) retention capability in Microsoft 365.
Retention and disposal is based on individual digital objects, not aggregations. There are limited ways to group individual records by the original aggregations in which they may have been stored (e.g., document libraries in SharePoint).
Only the (minimal) details of records that were subject to a disposition review are recorded in the ‘disposed items’ listing, and this is only kept for a year (but can be exported). No record is kept of any other destroyed record, except in audit logs (for a limited period).
The metadata details of records subject to a disposition review that were destroyed is minimal – the document type and name, date destroyed, destroyed by whom.
When records are destroyed from SharePoint document libraries or lists, the library or list remains with no record kept of what was previously stored there. It is not possible to leave a ‘stub’ for a destroyed record.
The primary outcome from introducing modern ways to manage retention will be that all digital content, not just content that has been identified as records or copied to a recordkeeping system, will be subject to some form of retention and disposal management.
In other words, a change from exception-based retention (where all the other digital content is overlooked), to a more holistic method of retention with both granular controls on certain types of records where this is required, and broader retention capability allowing us to forget the content that is no longer relevant – the ‘redundant, outdated and trivial’ (ROT) content often scattered across network file shares.
At the 2020 Microsoft Ignite conference, Jeff Teper presented a diagram titled ‘Microsoft 365’. The diagram showed only four icons: Teams, Outlook, Office and Edge.
The implication of this diagram was that, for most end-users, Teams is now (or will become) their primary portal into Microsoft 365. As stated by Jeff Teper, SharePoint is a foundation platform, the out of sight content engine. Edge’s ability to serve up search results from Microsoft 365 further reduces the need to go to SharePoint.
So, what are the implications for managing records?
SharePoint as a recordkeeping system
For a long time, records have been created, captured and stored in recordkeeping systems.
In the paper world, the recordkeeping system consisted of paper records stored in files and boxes and detailed in registers. With the introduction of computers in the 1980s, registers were transferred to databases, making it a bit easier to find records. In the late 1990s, recordkeeping databases were linked with (separate) file stores and became electronic document and records management (EDRM) systems that continued to manage paper records (the so-called ‘hybrid’ systems).
For almost a decade (since SharePoint 2010 was introduced), SharePoint has contended with file shares and EDRM systems as an alternative recordkeeping system, providing almost all the same core functionality.
The ability to create a record in a single location, then share and co-author it from that location, has completely removed the requirement to copy a record to a separate recordkeeping system.
And then came Teams
Someone at Microsoft had incredible foresight to see the potential for a new user interface that would replace products like Lync and Skype for chat and conferencing, and would also provide access to files stored in SharePoint.
SharePoint has been a core part of the Microsoft productivity offerings for a very long time and people have built careers around developing functionality on the SharePoint platform to appeal to end-users, the intranet being the most common case in point, with customised team sites close behind.
The arrival of Microsoft 365 Groups and then Teams in 2017 was perhaps not widely noticed. One could argue that by the beginning of 2020, it was still largely unnoticed.
And then came a pandemic and working from home. Teams – which may have been largely ignored or overlooked until then – was already ready to take its place next to Outlook, Office and Edge as a primary end-user interface.
New Teams were created, sometimes with abandon (and were sometimes just as quickly abandoned).
Both 1:1 (or 1:many) chats and channel chats took off. Files were created and shared via OneDrive for Business (‘Files’ in the 1:1 chat area), or via the back-end SharePoint sites (‘Files’ in the channel chat area).
There was (and maybe still is) a belief that files were being saved to Teams but not SharePoint. ‘We are storing everything in Teams’ was not an uncommon expression, sometimes followed by ‘but we’re not using SharePoint or OneDrive’.
The year 2020 saw a huge increase in the volume of records stored in SharePoint sites linked with Teams, as well as a completely new set of records – chats (‘compliance’ copies of which are stored in Exchange mailboxes).
The diagram below provides an overview of the relationship between Teams, Microsoft 365 Groups, Exchange mailboxes, SharePoint and OneDrive for Business.
What about SharePoint?
As the diagram above shows, SharePoint has not disappeared. Many organisations will continue to use, and ask end-users to access, SharePoint sites directly to store and manage records.
But accessing SharePoint from SharePoint may become less necessary over time. At Ignite 2020, the ability to pin a ‘home site’ (such as an intranet) to Teams was demonstrated. Even the intranet may end up in Teams.
As Jeff Teper said, SharePoint is a foundation platform, one that does not get in the way of collaboration and productivity but powers it.
Implications for records managers
Records managers, who were likely already on a steep learning curve regarding SharePoint, need to continue to improve their knowledge of the SharePoint platform. On a positive note, SharePoint Online is a much easier application to learn and manage, compared with its earlier on-premises predecessors.
In organisations that have been using SharePoint for a while and/or have allowed the free creation of Teams in MS Teams, there will be some requirement for retrospective analysis, review, and cleaning up.
In all organisations, there will be a requirement to establish some form of governance and oversight of records (files and chats) that have been created, including for the purpose of retention and disposal/disposition.
Where MS Teams has been implemented with little thought given to naming conventions, SharePoint site provisioning, or access controls, records managers should be given access to, and review, the list of all SharePoint sites that have been created, including from MS Teams. This will provide an initial idea of the volume of content and activity on each site, and what action needs to be taken on things like inactive Teams.
Ideally, records managers should be added to the Site Collection Administrators (SCA) group of every SharePoint site, including MS Teams-based sites. This action will give records managers access to the content on every site and help them advise on the management of records in those sites (including Team-based sites).
The best way to do this is to add records managers to a Security Group and then add that Group to the SCA group of every site. This access could be deferred for sites that contain very sensitive information, although typically records managers would have access to all records, including if they had an EDRMS. And, access is always recorded in audit logs or the local site ‘viewers’ (where enabled) and ‘last modified by’ information.
Access to the chat content of Teams (including 1:1 chats) will not normally be required; some understanding of the content could be inferred from the name of the Team or the SharePoint content. If necessary, Global Admins or a Compliance Admin can run a Content Search across Teams to find chat content, and/or export that content by an individual person or subject.
Records managers will also need to advise on the appropriate retention policy or policies that need to be created and then applied to:
The chat content in 1:1 chats.
The chat content in the various Teams.
SharePoint sites linked with Teams.
OneDrive for Business accounts. An additional consideration is how long the content of inactive ODfB accounts should be retained via the ‘Storage’ policy (default is 30 days then permanent deletion).
SharePoint sites not linked with MS Teams. This includes whole sites as well as library-based retention policies.
Office 365 Groups (mailbox/SharePoint site). If linked with a Team, a second retention policy is required for the Team chat content retention (second dot point above). For example, one policy ‘GroupABC’ and a second policy ‘GroupABCTeamChat’.
As many of the above retention policies replace the need for backups, records managers need to discuss the options with their IT colleagues.
Forward looking implications
Ideally, there should be some form of governance around the creation of new Teams in MS Teams. These governance arrangements might include:
The necessary access for records managers. For example, Site Collection Administrator on every site, and/or a customised Compliance Admin role to create and access retention policies.
Controls around the creation of new Teams, including naming conventions. If creation is not controlled, what processes will ensure that records are properly managed?
Retention implications. For example, can the new site and/or the channel chat content be covered by another retention policy – e.g., ‘All Teams with assessed low-level working content should be kept for 5 years’.
Simple best practice guidance for all new users, including on how to share and co-author.
Retention policies for all Microsoft 365 content, not just SharePoint.
Reviews of the content of OneDrive for Business accounts of departed end-users, especially for people in senior or decision making positions. It is relatively common practice for end-users to delete (and download) this content before they leave their jobs.
Monitoring and oversight of content, including access to reporting dashboards.
So, is Microsoft 365 just Teams, Outlook and Office (in Edge)?
For many, if not most, information-based end-users, MS Teams is likely to become the primary interface to Microsoft 365 collaboration team spaces including SharePoint and OneDrive. Just like Outlook, Teams will probably be left open all day.
In theory, the volume of low-value emails, and emails with attachments, should reduce over time.
The developing role of records managers
In this new world, the role of records managers will change from being the curators of records copied to and stored in a separate ‘records and document management’ system, to being records compliance analysts or perhaps, corporate knowledge and information managers and content analysts.
They will learn what the Graph can do, and help to guide AI tools including machine learning and machine teaching, Project Cortex and SharePoint Syntex. They will be responsible for monitoring content across the Microsoft 365 platform, creating and applying retention policies and managing the outcome of those policies, working more interactively with the Graph, and with a range of data.
In organisations that have a requirement to transfer records to archival institutions, the new knowledge and information managers will have a key role in ensuring that this data is suitable for transfer.
They might even have oversight of old paper records gathering dust until they can be destroyed.
Two types of retention policy can be created in Microsoft 365:
Label-based retention policies, where the label is used to define the retention and retention outcomes. Labels must be published in a retention policy, a process that includes determining where the labels will be applied and appear (‘explicit’) to end users.
Non-label-based retention policies, where the policy includes the retention details and the outcomes. As part of the policy creation, these policies are then applied to specific Microsoft 365 workloads where they are mostly invisible to end-users (except in Exchange mailboxes). In SharePoint and OneDrive for Business, these policies create a Preservation Hold library that is only visible to Site Collection Admins and above.
It is possible to apply both a label-based retention policy and a non-label retention policy to the same SharePoint site. In theory, this would allow for (a) everything on the site to be covered by an overarching retention policy and (b) specific libraries or lists to be covered by a label-based policy.
In practice, it gets a little complicated, as described in this post.
Creating the two labels
For the purpose of this post, I will apply the two types of policy to a SharePoint site (‘FinanceAP’) that contains specific types of financial information that needs to be kept for 7 years, but I want to allow other content on the site to be destroyed after 5 years.
Retention labels are created in the Information Governance section of the Compliance admin portal in Microsoft 365. I created a label titled ‘Financial records’ with a retention period of 7 years. I then published that label to a retention policy named ‘Financial Records – 7 years’ and applied it only to the FinanceAP site.
More than one label can be published in the same policy, making this a useful option if your SharePoint architecture maps to your file plan or Business Classification Scheme (BCS) and your records retention classes are based on either. It also allows you to create and add the same retention class for types of records that occur in multiple functions where the classes have the same retention – for example, ‘Meetings – 7 years’ or ‘Policy – 10 years’.
Once the policy has been published to a site or sites, the option (in Library Settings) to ‘Apply label to items in this list or library’ can be used to choose which label will apply to the content in the library, as shown below.
If the column ‘Retention label’ is checked, the retention label name appears in that column.
Non-label retention policy
Non-label retention policies are also created in the Information Governance section of the Compliance admin portal which also (a little confusingly) lists all the label-based policies as well.
The process of creating these policies includes the retention (e.g., 5 years) and retention outcome (delete) definitions, as well as the location where the policy will be applied.
For the purpose of this post I created a retention policy named ‘Financial Working Records – 5 years’ and applied it to the same site (only) as the label-based policy.
I should expect now to find a Preservation Hold library (via Site Contents as a SharePoint admin) when something is deleted.
At this point, I have two retention policies, (a) one label-based and applied to the site, and (b) one that applies to the whole site.
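As a rough illustration of the intended outcome only, the sketch below works out the earliest disposal date for an item on the FinanceAP site using the two periods from this post (7 years for labelled items, 5 years for everything else). It does not model how Microsoft 365 actually resolves overlapping policies. The function names are made up for this example, and two assumptions are built in: retention is counted from the creation date, and the longer period applies where both cover an item.

```python
from datetime import date
from typing import Optional

# Periods taken from the example in this post.
LABEL_YEARS = {"Financial records": 7}  # label-based policy
SITE_POLICY_YEARS = 5                   # non-label policy covering the whole site


def add_years(start: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def earliest_disposal(created: date, label: Optional[str] = None) -> date:
    """Assumption: the longer of the label period and the site period applies."""
    years = max(LABEL_YEARS.get(label, 0), SITE_POLICY_YEARS)
    return add_years(created, years)


if __name__ == "__main__":
    created = date(2020, 10, 1)
    print("Labelled item:", earliest_disposal(created, "Financial records"))
    print("Unlabelled item:", earliest_disposal(created))
```

In a real tenant the start of the retention period (created, last modified, or labelled) is part of the policy or label settings, so the creation date used here is only a stand-in.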
What happens now?
In the document library where the label-based policy has been selected, I can see that the retention label (Financial Records) has been applied to items in this library.
This means that the document cannot be deleted unless the retention label is first removed (by an end-user with edit rights, or an admin). However, as we will see below, another policy is working behind the scenes.
In a document library where no label-based policy has been applied, I can see that no label appears in the Retention label column. From an end-user point of view, it appears that the record can be deleted – or can it?
As this site is the subject of an ‘implicit’ or invisible retention policy that has been applied to the entire site, any attempt to delete anything will be captured by the back-end Preservation Hold library seen below via Site Contents (visible to Admins only).
Interestingly, any attempt to delete a document from a library where a label-based retention policy has been applied, which is ‘denied’ in the actual library, is recorded in the Preservation Hold library, although the document remains in the original library.
If anyone with access to the Preservation Hold library tries to delete that item there, they will receive this message:
The only way to remove this item is to remove the policy.
(Note – the image above is a small ticket dated 1956 from my grandmother’s visit to Denmark. I used this because the word ‘Kontrolbillet’ seemed appropriate for this post.)
In response to several queries about this following my previous post about whether it is possible to manage records as data, it seemed apparent that the data-based nature of contemporary modern digital content formats, especially Office documents, is not well known.
This post provides details of the data structure content of a typical Word document, to help explain why such records could be seen (and managed) as self-contained data sets.
Just to be clear, the idea of managing records as data does not remove the need or business requirement to store and manage records in ‘local’ aggregations or context – a SharePoint document library or a mailbox for example (less so a network file share because of the limited metadata, but still possible). These aggregations will generally map to business activities, can have specific metadata requirements and can be used to control access to and retention of records as long as they need to be managed.
Managing records as data is a more holistic data analytics concept that allows organisations to better understand and analyse records amidst the volume of all other digital content. It should help to ensure that all records on a given subject or context are managed appropriately through time, and that, wherever possible, only one copy exists.
A document in a SharePoint Online library
For this example, a document is stored in a SharePoint Online document library called ‘Client Agreements’. The library has a set of metadata columns that must be added to every record. The library uses document sets but it could equally use metadata or folders, the important point is that metadata is added to the library.
The metadata added to the library can be anything, including terms from a business classification scheme. The metadata can be mandatory or optional, and can be set as default options – for example, you may want every document in a library to automatically have the same function and activity terms.
In the screenshot below, we can see the document library with two document sets (a type of folder). The library has four added metadata options: Client Name, Client Reference, ClientRef, and Date of Birth (not visible in the screenshot but we’ll see it later).
The metadata properties
Here are the metadata columns for the library. As we will see below in the actual data, a metadata column name with a space between words has the space replaced by additional characters (‘_x0020_’).
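To make the encoded column names easier to read when they turn up in the XML later, a small helper can turn them back into display text. This is a minimal sketch that assumes the escape has the form ‘_x’ followed by four hexadecimal digits and an underscore (so a space becomes ‘_x0020_’); the function name is made up for this example.

```python
import re

# Matches escapes such as _x0020_ (four hex digits between "_x" and "_").
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_internal_name(name: str) -> str:
    """Replace each _xNNNN_ escape with the character it stands for."""
    return _ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), name)


if __name__ == "__main__":
    for column in ("Client_x0020_Name", "ClientRef", "Date_x0020_of_x0020_Birth"):
        print(column, "->", decode_internal_name(column))
```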
When I open the Harpin ‘folder’, I can see the metadata columns next to a document. In this case they were added to the document automatically as the documents inherit the same metadata properties as the document set. This is set via the Document Set settings – ‘Shared Columns’:
Alternatively, the metadata can be added to each new individual document when the document is added.
If the Harpin document is selected as shown below …
… the information panel on the far right shows the metadata properties for the document (and also the activity – when the document was modified and by whom, and who viewed it):
As this particular document is a Word template added to a content type in the library, an end user can select it when they create a new document in the library as shown in the screenshot below. Alternatively, the ‘Client Folder’ option allows them to create a new document set folder with all the metadata that relates to the client; this data is then inherited by any document created in the library:
If the document is opened, you can click on File – Info and see the metadata properties already added to the document in the library. These properties remain with the document even if it is downloaded and/or attached to an email. If Document IDs have been enabled, that metadata value is also added to the document properties, meaning we can see that it came from a SharePoint library (and which one):
Because it is used as a template, the Word document can make use of the metadata added to the record in the body of the document, in addition to the metadata forming part of the properties for the document.
The XML properties
Let’s now look at the XML of the document.
Download the document to an accessible location. Using the Command Prompt (CMD), rename the document so that its extension is .zip (File Explorer only allows this when file name extensions are displayed). In File Explorer, the original file will now have the extension .zip. In the list below, the other file with a similar name is a copy, but the size is identical.
Now, unzip the zip file (right click, Extract All).
Here is the top level output, which is standard for all Word documents.
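The same top-level listing can be produced without renaming the file, because a zip library will read the package whatever its extension. The sketch below is only an illustration: it builds a tiny stand-in package in memory (part names only, placeholder content) rather than using the actual document, and the function names are made up for this example.

```python
import io
import zipfile
from typing import List


def build_sample_package() -> bytes:
    """Build a tiny stand-in for a .docx (part names only, placeholder content)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for part in (
            "[Content_Types].xml",
            "_rels/.rels",
            "customXml/item1.xml",
            "docProps/core.xml",
            "word/document.xml",
        ):
            package.writestr(part, "<placeholder/>")
    return buffer.getvalue()


def list_package_parts(package_bytes: bytes) -> List[str]:
    """Return every part name stored in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return sorted(package.namelist())


def top_level_entries(parts: List[str]) -> List[str]:
    """Collapse part names to the folders and files seen at the top level."""
    return sorted({part.split("/", 1)[0] for part in parts})


if __name__ == "__main__":
    print(top_level_entries(list_package_parts(build_sample_package())))
```

For a real document, the bytes could be read with open(path, "rb").read() and passed to list_package_parts in the same way.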
Open the ‘customXml’ folder and you will see a set of XML files:
Open item1.xml, and you will see the custom properties which, as you can see, includes both the Document ID as well as the original path SharePoint site/library location. Just to be clear the Document ID ends in ‘119’, which is the actual document; the original document set folder’s ID ends in 118 (scroll up to check):
As can be seen, the document that was downloaded has the unique Document ID embedded in the metadata. Note that this ID will change if the document is uploaded to a different library.
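The custom properties described above can also be read programmatically. The sketch below collects every leaf element and its text from a customXml item. The sample XML is invented for illustration: the namespace, the element names (including ‘DocumentId’) and the values are placeholders, not the content of the real item1.xml.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Invented sample; a real customXml item will use different names and namespaces.
SAMPLE_ITEM = """<p:properties xmlns:p="urn:example:properties">
  <documentManagement>
    <Client_x0020_Name xmlns="urn:example:library">Harpin</Client_x0020_Name>
    <DocumentId xmlns="urn:example:library">EXAMPLE-119</DocumentId>
  </documentManagement>
</p:properties>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def leaf_values(xml_text: str) -> Dict[str, str]:
    """Map each childless element that has text to that text."""
    values = {}
    for element in ET.fromstring(xml_text).iter():
        text = (element.text or "").strip()
        if len(element) == 0 and text:
            values[local_name(element.tag)] = text
    return values


if __name__ == "__main__":
    print(leaf_values(SAMPLE_ITEM))
```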
In the ‘docProps’ folder we find three sets of XML files:
In the ‘core.xml’ file we see the Dublin Core (DC) metadata that you see in the document Properties above. You can add all the Dublin Core metadata to the library; these columns are built in to every library, which means that every document can have all that metadata.
The actual content (the body) of the document is found in the ‘word’ folder of the XML files. Here is the content of that ‘word’ folder:
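As an illustration of how accessible the Dublin Core metadata is, the sketch below reads a handful of core properties with the Python standard library. The sample XML is a cut-down, invented example (the values are placeholders); the namespaces are the ones normally used for core properties in Office Open XML packages.

```python
import xml.etree.ElementTree as ET
from typing import Dict

NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Invented sample values, shaped like a docProps core properties part.
SAMPLE_CORE = """<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Client agreement</dc:title>
  <dc:creator>Example Author</dc:creator>
  <cp:lastModifiedBy>Example Editor</cp:lastModifiedBy>
  <dcterms:created xsi:type="dcterms:W3CDTF">2020-10-01T00:00:00Z</dcterms:created>
</cp:coreProperties>"""

WANTED = {
    "title": "dc:title",
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(core_xml: str) -> Dict[str, str]:
    """Return the properties that are present; missing ones are simply skipped."""
    root = ET.fromstring(core_xml)
    found = {}
    for key, path in WANTED.items():
        node = root.find(path, NAMESPACES)
        if node is not None and node.text:
            found[key] = node.text
    return found


if __name__ == "__main__":
    print(read_core_properties(SAMPLE_CORE))
```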
In the screenshot below you can see some of the ‘document.xml’ content including the metadata that has been added in the body of the document (separately from the properties of the document).
All this metadata is accessible and is used by the Microsoft Graph.
Excel files are interesting because, in a sense, they contain data within data. Here is some data in a spreadsheet:
This data is – strangely – stored in two different XML files. The text (including the column headings) is stored here: \xl\sharedStrings.xml:
The values are stored according to each worksheet. For example: \xl\worksheets\sheet1.xml (first two rows only)
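The split between the two files can be illustrated with a short sketch that puts the text and the values back together. The XML below is a simplified, invented example of the two parts (real files carry more attributes and cell types); text cells are marked with t="s" and hold an index into the shared strings list, while numeric cells hold the value directly.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"

# Simplified, invented samples of sharedStrings.xml and sheet1.xml.
SHARED_STRINGS = (
    '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<si><t>Client</t></si><si><t>Amount</t></si><si><t>Harpin</t></si>"
    "</sst>"
)
SHEET = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheetData>"
    '<row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1" t="s"><v>1</v></c></row>'
    '<row r="2"><c r="A2" t="s"><v>2</v></c><c r="B2"><v>1500</v></c></row>'
    "</sheetData>"
    "</worksheet>"
)


def read_shared_strings(xml_text: str) -> List[str]:
    """Return the shared strings in index order."""
    root = ET.fromstring(xml_text)
    return [
        "".join(t.text or "" for t in item.iter(MAIN + "t"))
        for item in root.findall(MAIN + "si")
    ]


def read_cells(sheet_xml: str, shared: List[str]) -> Dict[str, Union[str, float]]:
    """Map cell references (A1, B2, ...) to resolved values."""
    cells = {}
    for cell in ET.fromstring(sheet_xml).iter(MAIN + "c"):
        value = cell.find(MAIN + "v")
        if value is None or value.text is None:
            continue
        cell_type = cell.get("t")
        if cell_type == "s":            # index into the shared strings list
            cells[cell.get("r")] = shared[int(value.text)]
        elif cell_type in (None, "n"):  # plain number
            cells[cell.get("r")] = float(value.text)
        else:                           # other types are left as raw text here
            cells[cell.get("r")] = value.text
    return cells


if __name__ == "__main__":
    print(read_cells(SHEET, read_shared_strings(SHARED_STRINGS)))
```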
A note about emails
Emails do not have the same XML-based structure as Office documents and generally cannot have additional metadata added (except as tags).
Emails in Outlook (sent or received) become ‘.msg’ files if saved to another location from Outlook.
The ‘.msg’ format is based on CFB_3, or compound file binary format, a format that was also used by earlier versions of Microsoft Office documents. It is ‘a general-purpose file format that provides a file-system-like structure within a file for the storage of arbitrary, application-specific streams of data’. (Source: Microsoft web page on Compound File Binary File Format).
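One practical consequence is that the two container formats can be told apart from the first few bytes of a file. The sketch below checks for the compound file signature and the zip signature; the function name and the returned descriptions are made up for this example, and a match only identifies the container, not the application that created the file.

```python
# First bytes of a compound file binary container (e.g. .msg, older .doc/.xls).
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# First bytes of a zip archive, the container used by .docx, .xlsx and .pptx.
ZIP_SIGNATURE = b"PK\x03\x04"


def sniff_container(header: bytes) -> str:
    """Classify a file by the first bytes read from it."""
    if header.startswith(CFB_SIGNATURE):
        return "compound file binary"
    if header.startswith(ZIP_SIGNATURE):
        return "zip package"
    return "unknown"


if __name__ == "__main__":
    print(sniff_container(CFB_SIGNATURE + b"\x00" * 8))
    print(sniff_container(ZIP_SIGNATURE + b"\x14\x00"))
```

For a file on disk, the header could be obtained with open(path, "rb").read(8).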
Copies of Microsoft Teams chat messages are also stored in a hidden folder in Exchange mailboxes, as instant messages. They cannot be accessed directly but should be considered as a type of archive copy – the originals are stored in a separate database.
If emails are saved to a SharePoint document library, they can be described with additional metadata while stored in the library, but this metadata does not become part of the core metadata of the email or remain with it if it is downloaded, as it does with other Office documents.
In any case, whether they remain in Exchange/Outlook mailboxes, are copied and stored in SharePoint or other Microsoft-based locations, the metadata content in them is accessible via searches.
Active Directory completes the relationships
Every digital record has an author and is likely to have contributors (modified by). Every email is sent and received by someone. All of the internal names linked with digital content are recorded in an organisation’s Active Directory. Employees are also likely to be added to Security Groups (sometimes known as AD Groups) that provide a way to control access to IT resources.
The relationship between document-based content (documents, emails), and between people in AD Security Groups, provides the ability to establish relationships between content, people and business activities.
A final word
Importantly, managing records data does NOT remove or exclude the business need or requirement to aggregate (e.g., in document libraries, mailboxes), manage through time, and then destroy or transfer records according to business requirements. Instead, it enhances this capability by ensuring that all records about a given subject or context can be identified and that, as much as possible, only one copy of the record exists.
(Note – while drafting this post I became aware of an MA Dissertation on the subject of ‘Artificial Intelligence and Record-keeping’ being developed by Mohamed Ben Tahayekt at University College London. I have not had access to this material but I believe some of the concepts may be similar to those outlined in this post.)
Digital records have long been thought of (and described) as being ‘unstructured’.
The reality, however, is that almost every contemporary text-based digital record is made up of a defined, structured and mostly open or accessible package of data that is based on standards. For example:
Microsoft Word, PowerPoint and Excel documents are all based on an XML structure (indicated by the ‘x’ on the end of the file extension) described in ISO/IEC 29500 and ECMA 376.
Google Docs exist only in an online format (described in this Google site); to access them offline they must be converted to one of the following formats: ISO/IEC 29500, ODT (ISO/IEC 26300), PDF or HTML.
Emails are now mostly based on the Internet Message Format (IMF), standardized by RFC 5322.
PDFs are based on the open standard ISO 32000.
All of these standards support interoperability between systems (and devices). (See my post about Metadata Payloads for more information on this subject).
An exception to the above are binary objects, including digital photos and images and where these are embedded in text-based documents. But even so, most binary objects are stored with a range of metadata to describe them.
Given that text-based digital records are already full of readable and accessible structured data (and binary objects come with a range of descriptive metadata), is it possible to manage digital records as self-contained data objects?
Records and context
Digital content (records and non-records) will always be captured, saved to or stored somewhere:
In email mailboxes. Emails of course may include attachments that duplicate records stored elsewhere in the system, or are not stored anywhere else – e.g., received from outside the organisation.
In a drive/folder structure in a network file share location, including ‘personal’ drives.
In a library/folder in online file storage and collaboration platforms, including ‘personal’ online storage locations.
In corporate enterprise ‘social’ platforms such as the intranet.
In corporate messaging and chat applications.
Some of the above may have well-defined ‘filing’ or storage structures (including folders) that are used to store or ‘file’ records. Some of these may include the ability to classify and categorise records, and add additional metadata.
In an organisational setting, all of this digital content will be created, sent/received, or modified by someone listed in Active Directory (AD), a system that generally links employees through their organisational structure. Additionally, employees are likely to belong to several AD Groups that further define relationships between them.
These relationships are important as they help us to understand the context for records.
Isolating records from other content
But one of the challenges for any organisation is knowing what is a record and what isn’t. Perhaps that isn’t as important as it sounds, if all the digital content is considered a potential record.
Organisations create or receive and store a lot of digital content, and a lot of this content has traditionally been kept (on backup tapes) for a long time to support disaster recovery and investigation purposes.
Only a percentage of this content is likely to fit the standard definition of a record – ‘evidence of business activities’.
And some digital content may not obviously be a record until it is connected with or related to other content or activities. For example, a simple email that says ‘Yes’ or ‘OK’ may be the record of agreement to something that doesn’t form part of any other obvious records until it is identified as being a record.
Not uncommonly in traditional electronic recordkeeping systems, there could be no guarantee that everything copied there was a copy of every record that existed on a given subject. Additionally, a record stored in a recordkeeping system may be of relevance in other contexts.
The key to what a record might be is the word ‘evidence’; this is exactly what lawyers look for when they conduct eDiscovery activities.
Rather than assume all records can be accurately found and managed amidst the volume of all digital content, it may be more efficient and accurate to assume all digital content is a record and then apply rules and tools to manage that content, with the aim of identifying records and their potential context based on the data contained in the individual digital objects and their relationships with both other records and people.
In other words find records amongst the entire content, rather than seeking to isolate only those digital objects that are identified as records and copy them to another location – while leaving the originals and potentially other related records in place. Managing records this way avoids the problem of email threads or chats that continue after the copy has been made, or a new copy of a Word document appearing.
How can we achieve this outcome?
There are three potential ways to manage records as data.
The first is to understand, even in general terms, that digital content is not unstructured, and to learn more about how it is structured. Some simple examples:
Every email (and instant message) has a sender, recipient, date sent and date received. It usually (but not always) has a subject. The text-based body of the email provides an additional form of accessible data. A quick look at email headers reveals a huge amount about the email.
Every document (and web page) has an author, created date, modified date and last modified by, and a name. They also have a large amount of other data, a lot of which is visible in the Properties section.
Photographs are stored as binary objects but have a range of EXIF metadata that includes the creation date, information about the camera settings, and may also include the name of the person who created it, as well as a GPS location.
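Taking the email example above, the sketch below shows how much structured data can be read from even a one-word message using the Python standard library. The message, addresses and identifiers are invented for illustration.

```python
from email import message_from_string
from email.policy import default
from typing import Dict

# Invented message: a one-word reply that could still be evidence of agreement.
RAW_MESSAGE = (
    "From: Alex Example <alex@example.org>\n"
    "To: Sam Example <sam@example.org>\n"
    "Subject: Re: Contract approval\n"
    "Date: Thu, 01 Oct 2020 09:30:00 +1000\n"
    "Message-ID: <abc123@example.org>\n"
    "In-Reply-To: <abc122@example.org>\n"
    "\n"
    "Yes\n"
)


def header_summary(raw: str) -> Dict[str, str]:
    """Pull the main structured fields out of a raw message."""
    message = message_from_string(raw, policy=default)
    sent = message["Date"]
    return {
        "from": str(message.get("From", "")),
        "to": str(message.get("To", "")),
        "subject": str(message.get("Subject", "")),
        "sent": sent.datetime.isoformat() if sent is not None else "",
        "message_id": str(message.get("Message-ID", "")),
        "in_reply_to": str(message.get("In-Reply-To", "")),
        "body": message.get_content().strip(),
    }


if __name__ == "__main__":
    for field, value in header_summary(RAW_MESSAGE).items():
        print(f"{field}: {value}")
```

The ‘In-Reply-To’ value is what allows a short reply like this one to be related back to the message it answers.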
The second is to understand that digital content may include added data or metadata. This added data may relate to or derive from the location where the record is stored, or may be added by end-users as part of their work. It may include a unique identifier and information about the aggregation where it is stored, as well as recordkeeping classification terms. Additionally, it may include both process metadata (modified by, and when) and security or access control metadata. Depending on where it is stored, this additional metadata may be embedded with the document properties (the metadata payload).
The third is to have access to (ideally) all digital content across the organisation, and the necessary tools (or access to people with them who can provide usable output) to search and retrieve, relate, and manage all digital content on any given subject or context through to disposal. A very simple example of this is to run a PowerBI report across the network file shares.
And lastly, while there will always be some form of ‘local’ aggregation for specific records where all the records are stored in the one place (mailbox, document library, folder), the only way to establish an aggregation of all digital records on a given subject or context using data only is through the use of advanced searches and/or eDiscovery tools and/or data reporting or visualisations and/or artificial intelligence to find, link and relate content.
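The file share report mentioned under the third point could start from a simple inventory of what is stored where. The sketch below walks a folder tree and records the path, extension, size and modified date of each file, which is the kind of table a reporting tool can then summarise. It is a minimal illustration with made-up function names, not a description of how any particular reporting product gathers its data.

```python
import os
from datetime import datetime, timezone
from typing import Dict, List


def inventory(root: str) -> List[Dict[str, object]]:
    """List every file under `root` with basic file-system metadata."""
    rows = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            rows.append({
                "path": os.path.relpath(path, root),
                "extension": os.path.splitext(name)[1].lower(),
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(
                    info.st_mtime, tz=timezone.utc
                ).isoformat(),
            })
    return sorted(rows, key=lambda row: row["path"])


def count_by_extension(rows: List[Dict[str, object]]) -> Dict[str, int]:
    """Count files per extension, a first step towards spotting ROT content."""
    counts: Dict[str, int] = {}
    for row in rows:
        extension = str(row["extension"])
        counts[extension] = counts.get(extension, 0) + 1
    return counts


if __name__ == "__main__":
    rows = inventory(".")
    print(len(rows), "files found")
    print(count_by_extension(rows))
```

As noted earlier, a file share offers far less metadata than a SharePoint library, so an inventory like this shows volume and age rather than business context.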
Linking and relating content
The diagram below from Microsoft about its Graph technology, provides a simple example of how content can be linked and related through its data.
There are now many data analytics and data visualisation tools that help to understand digital content. These tools are just one part of the picture.
Data analytics tools (such as ‘Constellation’, a joint project between the Australian Signals Directorate and the Australian CSIRO) are a starting point to understand digital content – including digital content from line of business systems.
These tools might be used to identify content or people related to a given subject, through chat messages, emails, documents etc, including content that is already linked through its own context – the mailbox or a SharePoint library. From that information it would be possible to build a picture – types and volume of content, people, and the relationships between them.
A global search should be able to retrieve, and if necessary export, the content, keeping in mind always that the nature of digital content means it may continue to be modified or new content may be added at any time.
As searches improve, narrower sets of content allow more granular analysis and visualisation, allowing the identification of sub-sets of records within broader sets. For example, of the potentially large group of ‘everything about COVID’, just the narrower set ‘Vaccines’.
All of this could be achieved through the data that makes up the digital content. And many data-driven organisations are likely to be doing just this, using a range of business intelligence tools to understand the information available to them, in both line of business systems and other content.
Can we manage records as data
Perhaps ‘manage’ is not the right word, or at least not in the sense of expecting digital records to be managed as exceptions to the rest of the digital content.
But there is a lot more we can do to make this outcome possible. We can leave the records where they are stored or captured, we can apply local structure to those records, or security controls. We can keep records of changes that are made. We can apply retention rules that prevent the destruction of any record, or potential record, before it can be legally destroyed.
Instead of ‘managing’ records as exceptions, we can leave the data where it was created or stored, and use a range of tools to help us understand and manage it.
This will allow us to manage records as data and finally achieve the ‘semantic office’ I wrote about in 2010.
The international standard for records management, ISO 15489-1:2016 (‘Information and documentation – Records management – Part 1: Concepts and Principles’), defines records as ‘information created, received, and maintained as evidence and as an asset by an organization or person, in pursuit of legal obligations or in the transaction of business’.
Among other things, the standard notes that records systems may exist in a variety of forms, not necessarily as or in a single or dedicated application. It also underlines the importance of appraisal; that is, the recurrent analysis of business context, business activity, processes and risk for the purpose of determining what records to make and keep and how to manage them over time – especially given the complexity of contemporary recordkeeping.
In terms of risks, the standard states that risk management is required to develop strategies for managing records, and that the management of records is a risk management strategy in itself.
Unlike traditional electronic document and records management (EDRM) systems that are used to store copies of records created and stored in other applications (‘exception management’), the Microsoft 365 environment is a single system in which records are a sub-set of the entire content (‘exception identification’).
This post discusses how records can be collated, grouped and aggregated in Microsoft 365 to meet requirements for managing records. It emphasises the point made in the international standard that the risk to records should be understood and minimised.
Records and context
Records are usually created or captured in some form of context – for example a business activity or project. This in turn provides the basis for collating, grouping or aggregating those records according to that context – commonly, a ‘subject’ or ‘topic’.
Records may be a subset of a broader subject (or series). They may be relevant or relate to more than one context or subject.
Digital records that may have no obvious context when they are first created or captured (for example a casual email about an ‘unusual virus outbreak’ in November 2019) may form part of a specific context only when their value is recognised (‘global pandemic’).
Grouping digital records
Grouping records in the digital world has up until now usually involved copying a digital record, created or captured in one system (such as email or a network file share), to a digital ‘file’ in another system such as an electronic document and records management (EDRM) system. The digital ‘file’ in those systems is a virtual representation; the records are actually stored in a file share, linked by metadata in the form of a file number.
The grouping of digital records as exceptions had (and continues to have) several flaws:
It assumed that all types of digital records could be stored in a digital ‘file’ from where they could be faithfully and reliably rendered (and not just stored as zipped versions of exported content from the originating system).
It relied on the willingness of end-users (often after training) and/or a technical third-party system, to copy a record to the system. This ‘exception management’ meant that some records were not copied to the EDRMS.
It was a ‘point in time’ capture. The original digital record remained in the system where it was created or captured, and might also be attached to emails and from there saved to multiple other locations.
There was no way of knowing if all the records in the file were all the records relating to the subject.
Where are the records created or captured in Microsoft 365
Most business records in Microsoft 365 will be created or captured in Outlook/Exchange mailboxes, SharePoint site libraries or MS Teams (which stores chat in Exchange mailboxes and documents in SharePoint or OneDrive). (For the purpose of this post, OneDrive is seen as a personal working space that should not be used to store business records.)
Regardless of whether they are created or captured in Exchange or SharePoint (including via Teams), all of the content – records and non-records – created or captured in Microsoft 365 is stored in the Azure substrate. This effectively means that records in Microsoft 365 are a sub-set of all the other content stored in the Azure substrate.
Consequently, the management of records in Microsoft 365 involves exception identification. That is, identifying records and ensuring they are managed appropriately as much as possible where they are captured or created – and placing other controls over all the other content as necessary.
Everything created and stored in Microsoft 365 – including all the very rich metadata associated with every digital record – is subject to the Graph. The Graph identifies relationships and ‘signals’ not only between digital content but between people (agents) and business activities.
The Graph powers Delve and Discovery and the soon-to-be-released Project Cortex, presenting end-users with information (that they have access to) in a way that can sometimes be unsettling for people used to working in relative privacy. See below for further discussion about Project Cortex.
Additionally, as all the content in Microsoft 365 is stored in the Azure back-end, most of it can be searched and (where necessary) exported through the Content Search option in the Compliance portal, a capability that supports eDiscovery. This capability means that even when records are not ‘manually’ identified as records, there is a better chance they will be found.
How are records aggregated in Microsoft 365
There are three main ways that records are, or can be, aggregated in Microsoft 365: Exchange mailboxes, SharePoint site libraries, and Microsoft 365 Groups that have a mailbox and a SharePoint site and can be linked to (or created from) a Team in MS Teams.
Exchange aggregates email records by:
Personal mailboxes, accessible only to the ‘owner’ (end-user).
Shared mailboxes, accessible to those who have access.
Microsoft 365 Group mailboxes, accessible to the members of the Group (including anyone added to the Group).
Although a mailbox is a form of aggregation, there is no way to relate or link emails stored there with other related records stored in SharePoint unless they are copied to a SharePoint document library, as can be seen in the example below. This is recommended if an organisation wants to keep emails together with other records.
Emails copied to a SharePoint document library are a ‘point in time’ copy; there may be additional replies to the email, forming a thread that isn’t captured.
The alternatives to copying emails to SharePoint are:
Leave all emails in mailboxes and use Content Search to find and export them to SharePoint as a PST.
Create a Microsoft 365 Group with an associated mailbox and SharePoint site, so that the records are retained in the context of the Group.
In any case, all mailboxes should be subject to a minimum retention period to ensure that any email that might be a record is preserved for that period. Certain mailboxes (for example, senior or key staff members) may be kept for longer periods and then exported for permanent storage.
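To make the idea of a minimum retention period per mailbox concrete, here is a minimal sketch. The mailbox types and the number of years are hypothetical values chosen for illustration only; actual periods come from an organisation's own retention schedule, and in Microsoft 365 they would be enforced through retention policies rather than code like this.

```python
from datetime import date

# Hypothetical minimum retention periods, in years, by mailbox type.
RETENTION_YEARS = {"personal": 7, "shared": 7, "group": 7, "key_staff": 15}

def add_years(d, years):
    """Add whole years to a date, moving 29 Feb to 28 Feb where needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def earliest_disposal(mailbox_type, received):
    """Earliest date an email could be deleted under the assumed periods."""
    return add_years(received, RETENTION_YEARS[mailbox_type])

def can_delete(mailbox_type, received, today):
    """True only once the minimum retention period has fully elapsed."""
    return today >= earliest_disposal(mailbox_type, received)
```

The point of the sketch is that the period attaches to the type of mailbox, not to a decision about each email, so anything that might be a record is protected for at least that long.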
SharePoint document libraries are logical aggregations for the storage of records, including emails copied from Exchange mailboxes.
Ideally, individual libraries that are used for the storage of records should map to a business activity and/or records retention class; this mapping should be reflected in the library name.
NOTE: Individual document libraries should not be used to store records relating to multiple subjects or mapping to more than one retention class or policy.
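One way to check the rule in the note above is to look for libraries whose content maps to more than one retention class. The sketch below assumes a simple inventory of (site, library, retention class) rows, for example from a report exported to a spreadsheet; the site names, library names and class codes are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: (site, library, retention class of a stored record).
RECORDS = [
    ("Finance", "Invoices 2020", "FIN-07"),
    ("Finance", "Invoices 2020", "FIN-07"),
    ("Finance", "General", "FIN-07"),
    ("Finance", "General", "HR-02"),
]

def mixed_libraries(records):
    """Return the libraries that hold records from more than one retention class."""
    classes = defaultdict(set)
    for site, library, retention_class in records:
        classes[(site, library)].add(retention_class)
    return {key: sorted(found) for key, found in classes.items() if len(found) > 1}
```

With the sample rows, only the ‘General’ library in the Finance site is flagged, which is the kind of catch-all library the note warns against.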
Document libraries may be assigned as much metadata as required, and content stored in them can be defined through the use of metadata and/or content types.
Microsoft 365 Groups (including Teams in MS Teams)
Microsoft 365 Groups provide a way to group and manage records, including MS Teams channel chats, in the context of the Group.
Every Group includes a mailbox (visible in Outlook) and a SharePoint site, and can be linked to a new Team in MS Teams. Teams channel chats are stored in a hidden folder in the Group mailbox. Any documents and records are stored in the ‘Files’ tab of the channel, which surfaces the default ‘Documents’ library in the connected SharePoint site.
If the creation of Teams is allowed from the MS Teams application, every new Team creates a Microsoft 365 Group (with the same name) and a SharePoint site (with the same name). However, the mailbox (with the hidden folder for channel chats) is not visible from Outlook.
(The exception here is private channels. If these are allowed: (a) the chat content is stored in the Exchange mailbox of each participant, and (b) a new SharePoint site is created for the ‘Files’.)
The relationship between the content created by the Group is most obviously visible from the ‘Activity’ web part of the SharePoint site of the Group as can be seen in the screenshot below. This shows (right to left), an original incoming email from Outlook in the Group’s mailbox, the copy saved to the SharePoint document library, and the Word document reply. The specific context of the record (= the ‘file’) – ‘Correspondence 2020’ – is defined by the document library.
What about records in 1:1 Teams chat
As with OneDrive, Teams 1:1 chat should not be used to create or capture records, but may be used as a ‘working’ space.
However, ‘should’ and ‘reality’ can be different things. There are two ways to address this:
Explicitly, through communication to end-users. Make it clear that Teams 1:1 chat and OneDrive are NOT to be used to create or capture records. Applying short-term retention policies to this content may assist with reducing (or increasing) this risk.
Implicitly, through monitoring and retention policies. Apply longer-term retention policies to the content and use Content Search/eDiscovery to look for content that may be records. Additionally, review the content of the OneDrive of departed staff and ensure that any records are kept.
Implications for managing records
The implications for collating, grouping and aggregating records in Microsoft 365 are as follows.
SharePoint document libraries will continue to be the primary aggregation for managing corporate records, including emails copied from Outlook.
Organisations should establish an architecture model for SharePoint sites that are used to manage records. The model may include a mix of the following: (a) sites mapped to business functions with libraries mapped to business activities and retention classes, (b) entire sites used to create and capture records relating to a single activity, where the entire site is mapped to a retention class, and (c) MS Groups (and Teams) with an associated SharePoint site, where the Group (mailbox/SharePoint site) is subject to a single retention class (and the Team channel chat also).
More effort, in terms of site/library set up, metadata, access controls, retention and end-of-retention process, is likely to be required for the management of high-level, high-risk and permanent records.
Personal mailboxes in Exchange will continue to exist as a form of aggregation, and consideration should be given to having different retention policies for different ‘types’ of mailbox, to ensure that any email that could be a record is not deleted too quickly.
Addendum – Other options that collate, group and aggregate content in Microsoft 365
As noted earlier, all of the content created or captured in Microsoft 365 is stored in the backend Azure substrate. Consequently, it is possible to search across all or part of that content to find related information and, where required, export it to a different location.
The global Content Search is accessed from the Compliance portal and access requires elevated privileges – Global Admin or Compliance Admin.
Searches are created as cases and are based on keywords, conditions (such as ‘Sender’ for emails), and locations – all or specific. When a new content search is created or run, the Global Admins are alerted, providing a form of oversight in addition to audit logs.
While content searches find content that is related to the search parameters, and legal holds can then be applied to that content, they do not create any form of aggregation in a recordkeeping sense.
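The keywords-plus-conditions pattern described above can be sketched as a small query builder. Content Search accepts Keyword Query Language (KQL), and ‘from’ and ‘sent’ are email properties I understand it supports, but treat the exact syntax here as an assumption to check against Microsoft's documentation. The function name and parameters are hypothetical, and locations are chosen separately when the search is set up rather than in the query text.

```python
from datetime import date

def build_query(keywords, sender=None, sent_from=None, sent_to=None):
    """Compose a KQL-style query string from keywords and optional email conditions."""
    parts = []
    if keywords:
        # Quote each keyword or phrase and accept any one of them.
        quoted = [f'"{k}"' for k in keywords]
        parts.append("(" + " OR ".join(quoted) + ")")
    if sender:
        parts.append(f"from:{sender}")
    if sent_from:
        parts.append(f"sent>={sent_from.isoformat()}")
    if sent_to:
        parts.append(f"sent<={sent_to.isoformat()}")
    # Every condition must hold, so the parts are joined with AND.
    return " AND ".join(parts)

query = build_query(
    ["vaccine", "immunisation"],
    sender="ceo@example.com",
    sent_from=date(2020, 1, 1),
)
```

The resulting string could be pasted into the keywords box of a search. It finds content; as noted above, it does not by itself create an aggregation.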
The Graph, Delve, Discovery
Microsoft describe the Graph as being ‘the gateway to data and intelligence in Microsoft 365 [that can be used via the Microsoft Graph API] to access the tremendous amount of data in Microsoft 365, Windows 10, and Enterprise Mobility + Security’ and ‘… build apps that support scenarios spanning across productivity, collaboration, education, people and workplace intelligence, and much more’. (Source: ‘Overview of Microsoft Graph’)
The Graph is commonly represented in diagrams similar to the one below.
Most end-users will encounter the Graph through either Delve or the Discover option in both the office.com portal and their OneDrive for Business accounts.
It is not uncommon for end-users to express surprise at the content (that they have access to) that is presented. Commonly this will show documents that a colleague is working on, or connections between people. Disabling Delve does not fix permissions; if a person has access to a document that appears in Delve, they will be able to search for it and find it that way.
Over time, the Graph can also provide other information based on the relationships or ‘signals’ it finds between all the different content in Microsoft 365.
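For those who want to look at this data directly rather than through Delve, the Microsoft Graph API mentioned above is a REST interface. The sketch below only builds the request addresses for the parts of a Microsoft 365 Group (members, mailbox conversations, SharePoint site, default library); it makes no calls. The function name and the group ID are hypothetical, and a real request also needs an access token with suitable permissions, which is not shown.

```python
GRAPH = "https://graph.microsoft.com/v1.0"

def group_resources(group_id):
    """Map each part of a Microsoft 365 Group to its Graph resource address."""
    base = f"{GRAPH}/groups/{group_id}"
    return {
        "members": f"{base}/members",
        "mailbox_conversations": f"{base}/conversations",
        "sharepoint_site": f"{base}/sites/root",
        "default_library": f"{base}/drive/root/children",
    }

# "00000000-aaaa-bbbb-cccc-000000000000" is a placeholder, not a real Group ID.
urls = group_resources("00000000-aaaa-bbbb-cccc-000000000000")
```

Seen this way, the Group is the common key that relates the mailbox, the site and the people, which is the same relationship the ‘Activity’ web part surfaces for end-users.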
While the Graph can present groups of records that have some relationship to the end-user, it does not aggregate those records or maintain a single consistent view. However, the Graph powers the new Project Cortex that does do something similar.
Project Cortex was announced by Microsoft in November 2019. To quote the announcement, Project Cortex:
Uses advanced AI to deliver insights and expertise in the apps you use every day, to harness collective knowledge and to empower people and teams to learn, upskill and innovate faster.
Uses AI to reason over content across teams and systems, recognizing content types, extracting important information, and automatically organizing content into shared topics like projects, products, processes and customers. Cortex then creates a knowledge network based on relationships among topics, content, and people.
From a recordkeeping aggregation point of view, a core functionality of Project Cortex is its ability to create ‘topic cards’ based on the rich metadata that makes up all the content in Microsoft 365. Again to quote the announcement:
Project Cortex securely collects content that is created and shared every day in Microsoft 365—including files, conversations, recorded meetings and video—and it categorizes the content based on its type, and tags it with extracted metadata.
AI then applies advanced topic mining logic—whether its content contained in Microsoft 365 or connected from external systems—to identify topics and relate content to those topics.
Topics can reflect any knowledge that’s important, including customers, products, projects, policies and procedures. Technically, AI is creating knowledge entities, a new object class, in the Microsoft Graph. The relationships between those topics—those knowledge entities—and the experiences that connect this knowledge with people creates your knowledge network.
Topic cards – or ‘knowledge entities’ – are a form of AI-generated aggregation.
However, topic cards will only present information that an end-user has access to and so the nirvana of presenting emails or Teams 1:1 chats in these cards as a form of aggregation for recordkeeping purposes is not likely to be realised through Project Cortex.