The international standard for records management, ISO 15489-1:2016 (‘Information and documentation – Records management – Part 1: Concepts and Principles’), defines records as ‘information created, received, and maintained as evidence and as an asset by an organization or person, in pursuit of legal obligations or in the transaction of business’.
Among other things, the standard notes that records systems may exist in a variety of forms, not necessarily as or in a single or dedicated application. It also underlines the importance of appraisal; that is, the recurrent analysis of business context, business activity, processes and risk for the purpose of determining what records to make and keep and how to manage them over time – especially given the complexity of contemporary recordkeeping.
In terms of risk, the standard states that risk management is required to develop strategies for managing records, and that the management of records is a risk management strategy in itself.
Unlike traditional electronic document and records management (EDRM) systems that are used to store copies of records created and stored in other applications (‘exception management’), the Microsoft 365 environment is a single system in which records are a sub-set of the entire content (‘exception identification’).
This post discusses how records can be collated, grouped and aggregated in Microsoft 365 to meet requirements for managing records. It emphasises the point made in the international standard that the risk to records should be understood and minimised.
Records and context
Records are usually created or captured in some form of context – for example a business activity or project. This in turn provides the basis for collating, grouping or aggregating those records according to that context – commonly, a ‘subject’ or ‘topic’.
Records may be a subset of a broader subject (or series). They may be relevant or relate to more than one context or subject.
Digital records that may have no obvious context when they are first created or captured (for example a casual email about an ‘unusual virus outbreak’ in November 2019) may form part of a specific context only when their value is recognised (‘global pandemic’).
Grouping digital records
Grouping records in the digital world has up until now usually involved copying a digital record, created or captured in one system (such as email or a network file share), to a digital ‘file’ in another system such as an electronic document and records management (EDRM) system. The digital ‘file’ in those systems is a virtual representation; the records are actually stored in a file share, linked by metadata in the form of a file number.
The grouping of digital records as exceptions had (and continues to have) several flaws:
- It assumed that all types of digital records could be stored in a digital ‘file’ from where they could be faithfully and reliably rendered (and not just stored as zipped versions of exported content from the originating system).
- It relied on the willingness of end-users (often after training) and/or a technical third-party system, to copy a record to the system. This ‘exception management’ meant that some records were not copied to the EDRMS.
- It was a ‘point in time’ capture. The original digital record remained in the system where it was created or captured, and might also be attached to emails and from there saved to multiple other locations.
- There was no way of knowing if all the records in the file were all the records relating to the subject.
Where are the records created or captured in Microsoft 365
Most business records in Microsoft 365 will be created or captured in Outlook/Exchange mailboxes, SharePoint site libraries or MS Teams (which stores chat in Exchange mailboxes and documents in SharePoint or OneDrive). (For the purpose of this post, OneDrive is seen as a personal working space that should not be used to store business records.)
Regardless of whether they are created or captured in Exchange or SharePoint (including via Teams), all of the content – records and non records – created or captured in Microsoft 365 is stored in the Azure substrate. This effectively means that records in Microsoft 365 are a sub-set of all the other content stored in the Azure substrate.
Consequently, the management of records in Microsoft 365 involves exception identification. That is, identifying records and ensuring they are managed appropriately as much as possible where they are captured or created – and placing other controls over all the other content as necessary.
Everything created and stored in Microsoft 365 – including all the very rich metadata associated with every digital record – is subject to the Graph. The Graph identifies relationships and ‘signals’ not only between digital content but between people (agents) and business activities.
The Graph powers Delve and Discovery and the soon-to-be-released Project Cortex. These present end-users with information they already have access to, which can sometimes be unsettling for people used to working in relative privacy. See below for further discussion about Project Cortex.
Additionally, as all the content in Microsoft 365 is stored in the Azure back-end, most of it can be searched and (where necessary) exported through the Content Search option in the Compliance portal, a capability that supports eDiscovery. This capability means that even when records are not ‘manually’ identified as records, there is a better chance they will be found.
How are records aggregated in Microsoft 365
There are three main ways that records are, or can be, aggregated in Microsoft 365: Exchange mailboxes, SharePoint site libraries, and Microsoft 365 Groups that have a mailbox and a SharePoint site and can be linked to (or created from) a Team in MS Teams.
Exchange aggregates email records by:
- Personal mailboxes, accessible only to the ‘owner’ (end-user).
- Shared mailboxes, accessible to those who have access.
- Microsoft 365 Group mailboxes, accessible to the members of the Group (including anyone added to the Group).
Although a mailbox is a form of aggregation, there is no way to relate or link emails stored there with other related records stored in SharePoint unless they are copied to a SharePoint document library, as can be seen in the example below. This is recommended if an organisation wants to keep emails together with other records.
Emails copied to a SharePoint document library are a ‘point in time’ copy; there may be additional replies to the email, forming a thread that isn’t captured.
The alternatives to copying emails to SharePoint are:
- Leaving all emails in mailboxes and using Content Search to find and export them to SharePoint as a PST.
- Creating a Microsoft 365 Group with an associated mailbox and SharePoint site, so that the records are retained in the context of the Group.
In any case, all mailboxes should be subject to a minimum retention period to ensure that any email that might be a record is preserved for that period. Certain mailboxes (for example, senior or key staff members) may be kept for longer periods and then exported for permanent storage.
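The idea of a minimum retention period, with longer periods for certain mailboxes, can be sketched as a simple lookup. This is an illustration only: the mailbox type names, the retention periods and the function name are assumptions made for the example, not values taken from the standard or from Microsoft 365. Real retention policies are configured in the Compliance portal, not in code.

```python
from datetime import date

# Hypothetical retention periods in years, per mailbox type.
# These values are assumptions for illustration only.
RETENTION_YEARS = {
    "personal": 7,
    "shared": 7,
    "group": 7,
    "key_staff": 15,
}
MINIMUM_YEARS = 7  # assumed organisation-wide minimum


def earliest_disposal_date(mailbox_type: str, received: date) -> date:
    """Return the earliest date an email could be disposed of.

    Unknown mailbox types fall back to the minimum period, and no
    mailbox type can be given less than the minimum.
    """
    years = max(RETENTION_YEARS.get(mailbox_type, MINIMUM_YEARS), MINIMUM_YEARS)
    try:
        return received.replace(year=received.year + years)
    except ValueError:
        # 29 February with a non-leap target year: use 28 February.
        return received.replace(year=received.year + years, day=28)
```

The point of the sketch is that the minimum acts as a floor: an email in any mailbox, including one of an unrecognised type, is kept for at least that period.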
SharePoint document libraries are logical aggregations for the storage of records, including emails copied from Exchange mailboxes.
Ideally, individual libraries that are used for the storage of records should map to a business activity and/or records retention class; this mapping should be reflected in the library name.
NOTE: Individual document libraries should not be used to store records relating to multiple subjects or mapping to more than one retention class or policy.
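The rule that a library should map to one business activity and one retention class, with the mapping reflected in its name, can be expressed as a small check. This is a sketch under stated assumptions: the `Library` structure, its field names and the `check_library` function are hypothetical and do not correspond to any SharePoint API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Library:
    """Hypothetical description of a document library used for records."""
    name: str
    business_activity: str
    retention_classes: Tuple[str, ...]


def check_library(library: Library) -> List[str]:
    """Return a list of problems; an empty list means the library conforms."""
    problems = []
    if len(library.retention_classes) != 1:
        problems.append(
            f"'{library.name}' maps to {len(library.retention_classes)} "
            "retention classes; expected exactly one"
        )
    if library.business_activity.lower() not in library.name.lower():
        problems.append(
            f"'{library.name}' does not reflect the business activity "
            f"'{library.business_activity}' in its name"
        )
    return problems
```

For example, a library named ‘Correspondence 2020’ mapped to the activity ‘Correspondence’ and a single retention class returns no problems, while a library named ‘General’ mapped to two retention classes returns two.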
Document libraries may be assigned as much metadata as required, and content stored in them can be defined through the use of metadata and/or content types.
Almost every type of digital file, with a (newly announced) 100GB single file limit, can be stored in SharePoint Online, as noted in the article ‘Types of files that cannot be added to a list or library’ (restrictions only apply to on-premises versions).
Microsoft 365 Groups (including Teams in MS Teams)
Microsoft 365 Groups provide a way to group and manage records, including MS Teams channel chats, in the context of the Group.
Every Group includes a mailbox (visible in Outlook) and a SharePoint site, and can be linked to a new Team in MS Teams. Teams channel chats are stored in a hidden folder in the Group mailbox. Any documents and records are stored in the ‘Files’ tab of the channel, which surfaces the default ‘Documents’ library in the connected SharePoint site.
If the creation of Teams is allowed from the MS Teams application, every new Team creates a Microsoft 365 Group (with the same name) and a SharePoint site (with the same name). However, the mailbox (with the hidden folder for channel chats) is not visible from Outlook.
(The exception here is private channels. If these are allowed: (a) the chat content is stored in the Exchange mailbox of each participant, and (b) a new SharePoint site is created for the ‘Files’.)
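The storage locations described in this section can be summarised as a lookup table. This is a minimal sketch that restates the text above; the key names (`channel`, `private_channel`, `chat`, `files`) and the `where_stored` function are made up for the example and are not part of any Microsoft API.

```python
# Storage locations for Teams content, as described in this post.
# The keys are hypothetical labels chosen for this example.
STORAGE_LOCATIONS = {
    ("channel", "chat"): "hidden folder in the Microsoft 365 Group mailbox",
    ("channel", "files"): "default 'Documents' library in the Group's SharePoint site",
    ("private_channel", "chat"): "Exchange mailbox of each participant",
    ("private_channel", "files"): "separate SharePoint site created for the private channel",
}


def where_stored(channel_type: str, content_kind: str) -> str:
    """Return the storage location for a kind of Teams content."""
    try:
        return STORAGE_LOCATIONS[(channel_type, content_kind)]
    except KeyError:
        raise ValueError(
            f"No location recorded for {content_kind!r} in {channel_type!r}"
        ) from None
```

A table like this is useful when deciding which retention policy covers which content, because the chat and the files of the same channel sit in different services.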
The relationship between the content created by the Group is most obviously visible from the ‘Activity’ web part of the SharePoint site of the Group as can be seen in the screenshot below. This shows (right to left), an original incoming email from Outlook in the Group’s mailbox, the copy saved to the SharePoint document library, and the Word document reply. The specific context of the record (= the ‘file’) – ‘Correspondence 2020’ – is defined by the document library.
What about records in 1:1 Teams chat
As with OneDrive, Teams 1:1 chat should not be used to create or capture records, but may be used as a ‘working’ space.
However, ‘should’ and ‘reality’ can be different things. There are two ways to address this:
- Explicitly, through communication to end-users. Make it clear that Teams 1:1 chat and OneDrive are NOT to be used to create or capture records. Applying short-term retention policies to this content may assist with reducing (or increasing) this risk.
- Implicitly, through monitoring and retention policies. Apply longer-term retention policies to the content and use Content Search/eDiscovery to look for content that may be records. Additionally, review the content of the OneDrive of departed staff and ensure that any records are kept.
Implications for managing records
The implications for collating, grouping and aggregating records in Microsoft 365 are as follows.
- SharePoint document libraries will continue to be the primary aggregation for managing corporate records, including emails copied from Outlook.
- Organisations should establish an architecture model for SharePoint sites that are used to manage records. The model may include a mix of the following: (a) sites mapped to business functions with libraries mapped to business activities and retention classes, (b) entire sites used to create and capture records relating to a single activity, where the entire site is mapped to a retention class, and (c) MS Groups (and Teams) with an associated SharePoint site, where the Group (mailbox/SharePoint site) is subject to a single retention class (and the Team channel chat also).
- More effort, in terms of site/library set up, metadata, access controls, retention and end-of-retention process is likely to be required for the management of high-level, high-risk and permanent records.
- Personal mailboxes in Exchange will continue to exist as a form of aggregation, and consideration should be given to having different retention policies for different ‘types’ of mailbox, to ensure that any email that could be a record is not deleted too quickly.
Addendum – Other options that collate, group and aggregate content in Microsoft 365
As noted earlier, all of the content created or captured in Microsoft 365 is stored in the backend Azure substrate. Consequently, it is possible to search across all or part of that content to find related information and, where required, export it to a different location.
The global Content Search is accessed from the Compliance portal and access requires elevated privileges – Global Admin or Compliance Admin.
Searches are created as cases and are based on keywords, conditions (such as ‘Sender’ for emails), and locations – all or specific. When a new content search is created or run, the Global Admins are alerted, providing a form of oversight in addition to audit logs.
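To illustrate how keywords and a condition combine into a single search, the sketch below builds a keyword query string. It is an assumption-laden example: the `build_search_query` function is hypothetical, and the use of the `from:` property for the ‘Sender’ condition is an assumption about how that condition maps to query syntax, so check the Content Search documentation before relying on it. Locations are not part of the query string; they are selected separately when the search is set up.

```python
from typing import Iterable, Optional


def build_search_query(keywords: Iterable[str], sender: Optional[str] = None) -> str:
    """Combine keywords (OR-ed together) with an optional sender condition.

    Multi-word keywords are quoted so they are treated as phrases.
    The 'from:' property name is an assumption for this example.
    """
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    query = " OR ".join(terms)
    if len(terms) > 1:
        # Group the alternatives so the sender condition applies to all of them.
        query = f"({query})"
    if sender:
        condition = f'from:"{sender}"'
        query = f"{query} AND {condition}" if query else condition
    return query
```

For example, `build_search_query(["virus", "unusual outbreak"], "a@example.com")` produces `(virus OR "unusual outbreak") AND from:"a@example.com"`.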
While content searches find content that is related to the search parameters, and legal holds can then be applied to that content, they do not create any form of aggregation in a recordkeeping sense.
The Graph, Delve, Discovery
Microsoft describe the Graph as being ‘the gateway to data and intelligence in Microsoft 365 [that can be used via the Microsoft Graph API] to access the tremendous amount of data in Microsoft 365, Windows 10, and Enterprise Mobility + Security’ and ‘… build apps that support scenarios spanning across productivity, collaboration, education, people and workplace intelligence, and much more’. (Source: ‘Overview of Microsoft Graph’)
The Graph is commonly represented in diagrams similar to the one below.
Most end-users will encounter the Graph through either Delve or the Discover option in both the office.com portal and their OneDrive for Business accounts.
It is not uncommon for end-users to express surprise at the content (that they have access to) that is presented. Commonly this will show documents that a colleague is working on, or connections between people. Disabling Delve does not fix permissions; if a person has access to a document that appears in Delve, they will be able to search for it and find it that way.
Over time, the Graph can also provide other information based on the relationships or ‘signals’ it finds between all the different content in Microsoft 365.
While the Graph can present groups of records that have some relationship to the end-user, it does not aggregate those records or maintain a single consistent view. However, the Graph powers the new Project Cortex that does do something similar.
Project Cortex was announced by Microsoft in November 2019. To quote the announcement, Project Cortex:
- Uses advanced AI to deliver insights and expertise in the apps you use every day, to harness collective knowledge and to empower people and teams to learn, upskill and innovate faster.
- Uses AI to reason over content across teams and systems, recognizing content types, extracting important information, and automatically organizing content into shared topics like projects, products, processes and customers. Cortex then creates a knowledge network based on relationships among topics, content, and people.
From a recordkeeping aggregation point of view, a core functionality of Project Cortex is its ability to create ‘topic cards’ based on the rich metadata that makes up all the content in Microsoft 365. Again to quote the announcement:
- Project Cortex securely collects content that is created and shared every day in Microsoft 365—including files, conversations, recorded meetings and video—and it categorizes the content based on its type, and tags it with extracted metadata.
- AI then applies advanced topic mining logic—whether its content contained in Microsoft 365 or connected from external systems—to identify topics and relate content to those topics.
- Topics can reflect any knowledge that’s important, including customers, products, projects, policies and procedures. Technically, AI is creating knowledge entities, a new object class, in the Microsoft Graph. The relationships between those topics—those knowledge entities—and the experiences that connect this knowledge with people creates your knowledge network.
Topic cards – or ‘knowledge entities’ – are a form of AI-generated aggregation.
However, topic cards will only present information that an end-user has access to and so the nirvana of presenting emails or Teams 1:1 chats in these cards as a form of aggregation for recordkeeping purposes is not likely to be realised through Project Cortex.