Ever since emails first appeared as a way to communicate more than 30 years ago they have been a problem for records management, for two main reasons.
Emails (and attachments) are created and captured in a separate (email) system, and are stored in mailboxes that are inaccessible to records managers (a bit like ‘personal’ drives).
The only way to manage them in the context of other records was/is to print and file or copy them to a separate recordkeeping system, leaving the originals in place.
Thirty-plus years of email has left a trail of mostly inaccessible digital debris. An unknown volume of records remains locked away in ‘personal’ and archived mailboxes. Often, the only way to find these records is via legal eDiscovery, but even that can be limited in terms of how far back you can go.
The report noted (from page 58) three common approaches to the preservation of legacy emails:
Migration (to MBOX, EML or even XML)
In a follow-up, the Australian IDM magazine published an article in March 2020 by one of the CLIR report authors (Chris Prom). The article, titled ‘The Future of Past Email is PDF’, suggested that PDF may be (or become) a more suitable long-term solution for preservation of legacy emails.
Preservation is one thing, what about access
There is little point in preserving important records if they cannot be accessed. The two must go together. In fact, preservation without the ability to access a record is not a long way from destruction through negligence.
Assuming emails can be migrated to a long-term and accessible format, what then?
No-one (except possibly well-funded archival institutions) is seriously likely to attempt to move or copy individual legacy emails to pre-defined and pre-existing containers or aggregations of other records. This would be like printing individual emails and storing them in the same paper file or box in which other records on the same subject are stored.
Access to legacy emails in a digitally accessible, metadata-rich format like PDF provides a range of potential opportunities to ‘harvest’ and make use of the content, including through machine learning and artificial intelligence.
These options have been available for close to twenty years in the eDiscovery world, but only to support specific legal requirements.
Search, discovery and retention/disposal tools available in the Microsoft 365 Compliance portal, along with the underlying Graph and AI tools (including SharePoint Syntex) provide the potential to manage legacy content, including emails.
The starting point is migrating all those old legacy emails to an accessible format.
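As a rough illustration of what such a migration step can look like, the sketch below uses Python's standard `mailbox` module to split an MBOX file into individual EML files. The file names, the sample message and the helper names (`build_sample_mbox`, `export_mbox_to_eml`) are assumptions made for this example only; it does not cover PST files, attachment rendering or conversion to PDF, all of which need other tools.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path
from typing import List


def build_sample_mbox(path: Path) -> None:
    """Create a one-message MBOX file to stand in for a legacy export (sample data only)."""
    box = mailbox.mbox(str(path))
    message = EmailMessage()
    message["From"] = "roger@example.com"
    message["To"] = "charles@example.com"
    message["Subject"] = "Unusual virus outbreak"
    message.set_content("Have you seen the reports?")
    box.add(message)
    box.close()  # close() flushes pending changes to disk


def export_mbox_to_eml(mbox_path: Path, out_dir: Path) -> List[Path]:
    """Write every message in an MBOX file to its own .eml file and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(str(mbox_path))
    try:
        for index, message in enumerate(box):
            target = out_dir / f"message-{index:05d}.eml"
            # as_bytes() serialises headers and body without the MBOX "From " separator line
            target.write_bytes(message.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "legacy.mbox"
        build_sample_mbox(source)
        for eml in export_mbox_to_eml(source, Path(tmp) / "eml"):
            print(eml.name)
```

One file per message keeps each email individually addressable, which matters if retention or disposal decisions are later made per record rather than per mailbox.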
When people chat in Microsoft Teams (MS Teams), a ‘compliance’ copy of the chat is saved to either personal or (Microsoft 365) Group mailboxes. This copy is subject to retention policies, and can be found and exported via Content Search.
But what happens if there is no Exchange Online mailbox? It seems the chats become inaccessible which could be an issue from a recordkeeping and compliance point of view.
This post explains what happens, and why it may not be a good idea (from a compliance and recordkeeping point of view) to disable the Exchange Online mailbox option as part of licence provisioning.
Licences and Exchange Online mailboxes
When an end-user is allocated a licence for Microsoft 365, a decision (sometimes incorporated into a script) is made about which of the purchased licences – and apps in those licences – will be assigned to that person.
E1, E3 and E5 licences include ‘Exchange Online’ as an option under ‘Apps’. This option is checked by default (along with many of the other options), but it can be disabled (as shown below).
If the checkbox option is disabled as part of the licence assigning process (not after), the end-user won’t have an Exchange mailbox and so won’t see the Outlook option when they log on to the office.com portal. (Note – If they have an on-premise mailbox, that will continue to exist; nothing changes.)
Having an Exchange Online mailbox is important if end-users are using MS Teams, because the ‘compliance’ copy of 1:1 chat messages in MS Teams is stored in a hidden folder (/Conversation History/Team Chat) in the Exchange Online mailbox of every participant in the chat. If the mailbox doesn’t exist, no compliance copy is made, so the chat isn’t accessible for compliance purposes and may be deleted.
If end-users chat with other end-users who don’t have an Exchange mailbox as shown in the example below, the same thing happens – no compliance copy is kept. The chat remains inaccessible (unless the Global Admins take over the account).
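Where licence assignment is scripted, a simple guard can flag cases in which the Exchange Online plan is about to be switched off. The sketch below builds the request body for the Microsoft Graph `assignLicense` action (which accepts `addLicenses` entries containing a `skuId` and a list of `disabledPlans`) and checks it before anything is sent. The GUIDs are placeholders, not real identifiers; the real SKU and service plan IDs would need to be looked up in the tenant. The function names are hypothetical and no request is actually made.

```python
from typing import Dict, List

GRAPH_BASE = "https://graph.microsoft.com/v1.0"

# Placeholder GUIDs for illustration only. Real values differ per licence and
# must be looked up in the tenant (for example from the subscribed SKUs list).
E3_SKU_ID = "00000000-0000-0000-0000-000000000003"
EXCHANGE_PLAN_ID = "00000000-0000-0000-0000-000000000101"


def build_assign_license_request(user_id: str, sku_id: str,
                                 disabled_plans: List[str]) -> Dict:
    """Describe (but do not send) a Graph assignLicense call for one user."""
    return {
        "method": "POST",
        "url": f"{GRAPH_BASE}/users/{user_id}/assignLicense",
        "body": {
            "addLicenses": [
                {"skuId": sku_id, "disabledPlans": list(disabled_plans)}
            ],
            "removeLicenses": [],
        },
    }


def exchange_is_disabled(request: Dict, exchange_plan_id: str) -> bool:
    """Return True if the request would switch off the Exchange Online plan."""
    return any(
        exchange_plan_id in licence["disabledPlans"]
        for licence in request["body"]["addLicenses"]
    )


if __name__ == "__main__":
    request = build_assign_license_request(
        "roger@example.com", E3_SKU_ID, [EXCHANGE_PLAN_ID]
    )
    if exchange_is_disabled(request, EXCHANGE_PLAN_ID):
        print("Warning: no Exchange Online mailbox will be provisioned; "
              "Teams 1:1 chat compliance copies will not be kept for this user.")
```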
The exchange above, between Roger Bond and Charles, includes some specific key words. As we will see below, these chats cannot be found via a Content Search.
(On a related note, if the ability to create private channels is enabled and end-users without a mailbox create a private channel and chat there, the chats are also not saved because a compliance copy of private channel chats is stored in the mailboxes of the individual participants.)
Searching for chats when no mailbox exists
As we can see above, the word ‘mosquito’ was contained in the chat messages between Roger and Charles.
Content Searches are carried out via the Compliance portal and are more or less the same as eDiscovery searches in that they are created as cases.
From the Content Search option, a new search is created by clicking on ‘+New Search’, as shown below. The word ‘mosquito’ has been added as a keyword.
We then need to determine where the search will look. In this case the search will look through all the options shown below, including all mailboxes and Teams messages.
When the search was run, the results area showed the words ‘No results found’.
Clicking on ‘Status details’ in the search results, the following information is displayed – ‘0 items’ found. The ‘5 unindexed items’ is unrelated to this search and simply indicates that there are 5 unindexed items.
Double-checking the results
To confirm the results were accurate, another search was conducted where the end-user originally did not have a mailbox, and then was assigned one.
If the end-user didn’t have a mailbox but the other recipient/s of the message did, the Content Search found one copy of the chat message in the mailbox of the other participants. Only one item was found.
When the Exchange Online option was enabled for the end-user who previously did not have a mailbox (so they were now assigned a mailbox), a copy of the chat was found in the mailbox of both participants, as shown in the details below (‘2 items’).
Summary and implications
If end users chat in the 1:1 area of MS Teams and don’t have an Exchange Online mailbox, no compliance copy of the chat will be saved, and so it will not be found via Content Search.
If any of the participants in the 1:1 chat have an Exchange Online mailbox, the chat will appear in the mailboxes of those participants.
If all participants in the 1:1 chat have an Exchange Online mailbox, the chat will be found in the mailbox of all participants.
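The three cases above can be expressed as a small model. This is only a restatement of the observed behaviour in code, not a description of how Microsoft 365 is implemented; the `Participant` class and function names are invented for the illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Participant:
    name: str
    has_exchange_online_mailbox: bool


def compliance_copy_locations(participants: List[Participant]) -> List[str]:
    """Names of the participants whose mailbox receives a compliance copy of a 1:1 chat."""
    return [p.name for p in participants if p.has_exchange_online_mailbox]


def content_search_item_count(participants: List[Participant]) -> int:
    """Copies of a single chat message that a Content Search could find (one per mailbox)."""
    return len(compliance_copy_locations(participants))


if __name__ == "__main__":
    roger = Participant("Roger Bond", has_exchange_online_mailbox=False)
    charles = Participant("Charles", has_exchange_online_mailbox=True)
    print(compliance_copy_locations([roger, charles]))
    print(content_search_item_count([roger, charles]))
```

With neither participant holding a mailbox the count is zero, which corresponds to the ‘No results found’ outcome described earlier.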
Further to the above:
If end users can delete chats (via Teams policies) and don’t have a mailbox, no copy of the chat will exist.
If end-users with a mailbox can delete Teams chats, but a retention policy has been applied to the chats, the chats will be retained as per the retention policy (in a hidden folder).
And finally, if you allow private channels, end-users can create private channels in the Organisation Team. The chats in these private channels are usually stored in the personal mailboxes of participants (not the Group mailbox) – so these chats will also be inaccessible and cannot be found via Content Search.
The implications for the above are that, if you need to ensure that personal chat messages can be accessed (from Content Search), then the participants in the chat must have an Exchange Online mailbox.
Further, if you allow deletion of chats but need to be able to recover them for compliance purposes, a retention policy should be applied to Teams 1:1 chat.
The international standard for records management, ISO 15489-1:2016 (‘Information and documentation – Records management – Part 1: Concepts and Principles’), defines records as ‘information created, received, and maintained as evidence and as an asset by an organization or person, in pursuit of legal obligations or in the transaction of business’.
Among other things, the standard notes that records systems may exist in a variety of forms, not necessarily as or in a single or dedicated application. It also underlines the importance of appraisal; that is, the recurrent analysis of business context, business activity, processes and risk for the purpose of determining what records to make and keep and how to manage them over time – especially given the complexity of contemporary recordkeeping.
In terms of risk, the standard states that risk management is required to develop strategies for managing records, and that the management of records is a risk management strategy in itself.
Unlike traditional electronic document and records management (EDRM) systems that are used to store copies of records created and stored in other applications (‘exception management’), the Microsoft 365 environment is a single system in which records are a sub-set of the entire content (‘exception identification’).
This post discusses how records can be collated, grouped and aggregated in Microsoft 365 to meet requirements for managing records. It emphasises the point made in the international standard that the risk to records should be understood and minimised.
Records and context
Records are usually created or captured in some form of context – for example a business activity or project. This in turn provides the basis for collating, grouping or aggregating those records according to that context – commonly, a ‘subject’ or ‘topic’.
Records may be a subset of a broader subject (or series). They may be relevant or relate to more than one context or subject.
Digital records that may have no obvious context when they are first created or captured (for example a casual email about an ‘unusual virus outbreak’ in November 2019) may form part of a specific context only when their value is recognised (‘global pandemic’).
Grouping digital records
Grouping records in the digital world has up until now usually involved copying a digital record, created or captured in one system (such as email or a network file share), to a digital ‘file’ in another system such as an electronic document and records management (EDRM) system. The digital ‘file’ in those systems is a virtual representation; the records are actually stored in a file share, linked by metadata in the form of a file number.
The grouping of digital records as exceptions had (and continues to have) several flaws:
It assumed that all types of digital records could be stored in a digital ‘file’ from where they could be faithfully and reliably rendered (and not just stored as zipped versions of exported content from the originating system).
It relied on the willingness of end-users (often after training) and/or a technical third-party system, to copy a record to the system. This ‘exception management’ meant that some records were not copied to the EDRMS.
It was a ‘point in time’ capture. The original digital record remained in the system where it was created or captured, and might also be attached to emails and from there saved to multiple other locations.
There was no way of knowing if all the records in the file were all the records relating to the subject.
Where are the records created or captured in Microsoft 365
Most business records in Microsoft 365 will be created or captured in Outlook/Exchange mailboxes, SharePoint site libraries or MS Teams (which stores chat in Exchange mailboxes and documents in SharePoint or OneDrive). (For the purpose of this post, OneDrive is seen as a personal working space that should not be used to store business records.)
Regardless of whether they are created or captured in Exchange or SharePoint (including via Teams), all of the content – records and non records – created or captured in Microsoft 365 is stored in the Azure substrate. This effectively means that records in Microsoft 365 are a sub-set of all the other content stored in the Azure substrate.
Consequently, the management of records in Microsoft 365 involves exception identification. That is, identifying records and ensuring they are managed appropriately as much as possible where they are captured or created – and placing other controls over all the other content as necessary.
Everything created and stored in Microsoft 365 – including all the very rich metadata associated with every digital record – is subject to the Graph. The Graph identifies relationships and ‘signals’ not only between digital content but between people (agents) and business activities.
The Graph powers Delve and Discovery and the soon-to-be-released Project Cortex, presenting information (they have access to) to end-users that can sometimes be unsettling for people used to working in relative privacy. See below for further discussion about Project Cortex.
Additionally, as all the content in Microsoft 365 is stored in the Azure back-end, most of it can be searched and (where necessary) exported through the Content Search option in the Compliance portal, a capability that supports eDiscovery. This capability means that even when records are not ‘manually’ identified as records, there is a better chance they will be found.
How are records aggregated in Microsoft 365
There are three main ways that records are, or can be, aggregated in Microsoft 365: Exchange mailboxes, SharePoint site libraries, and Microsoft Groups that have a mailbox and a SharePoint site and can be linked to (or created from) a Team in MS Teams.
Exchange aggregates email records by:
Personal mailboxes, accessible only to the ‘owner’ (end-user).
Shared mailboxes, accessible to those who have access.
Microsoft 365 Group mailboxes, accessible to the members of the Group (including anyone added to the Group).
Although a mailbox is a form of aggregation, there is no way to relate or link emails stored there with other related records stored in SharePoint unless they are copied to a SharePoint document library, as can be seen in the example below. This is recommended if an organisation wants to keep emails together with other records.
Emails copied to a SharePoint document library are a ‘point in time’ copy; there may be additional replies to the email, forming a thread that isn’t captured.
The alternatives to copying emails to SharePoint are:
Leave all emails in mailboxes and use Content Search to find and export them to SharePoint as a PST.
Create a Microsoft 365 Group with an associated mailbox and SharePoint site, so that the records are retained in the context of the Group.
In any case, all mailboxes should be subject to a minimum retention period to ensure that any email that might be a record is preserved for that period. Certain mailboxes (for example, senior or key staff members) may be kept for longer periods and then exported for permanent storage.
SharePoint document libraries are logical aggregations for the storage of records, including emails copied from Exchange mailboxes.
Ideally, individual libraries that are used for the storage of records should map to a business activity and/or records retention class; this mapping should be reflected in the library name.
NOTE: Individual document libraries should not be used to store records relating to multiple subjects or mapping to more than one retention class or policy.
Document libraries may be assigned as much metadata as required, and content stored in them can be defined through the use of metadata and/or content types.
Microsoft 365 Groups (including Teams in MS Teams)
Microsoft 365 Groups provide a way to group and manage records, including MS Teams channel chats, in the context of the Group.
Every Group includes a mailbox (visible in Outlook) and a SharePoint site, and can be linked to a new Team in MS Teams. Teams channel chats are stored in a hidden folder in the Group mailbox. Any documents and records are stored in the ‘Files’ tab of the channel, which surfaces the default ‘Documents’ library in the connected SharePoint site.
If the creation of Teams is allowed from the MS Teams application, every new Team creates a Microsoft Group (with the same name) and a SharePoint site (with the same name), however the mailbox (with the hidden folder for channel chats) is not visible from Outlook.
(The exception here is private channels; if these are allowed: (a) the chat content is stored in the Exchange mailbox of each participant, and (b) a new SharePoint site is created for the ‘Files’.)
The relationship between the content created by the Group is most obviously visible from the ‘Activity’ web part of the SharePoint site of the Group as can be seen in the screenshot below. This shows (right to left), an original incoming email from Outlook in the Group’s mailbox, the copy saved to the SharePoint document library, and the Word document reply. The specific context of the record (= the ‘file’) – ‘Correspondence 2020’ – is defined by the document library.
What about records in 1:1 Teams chat
As with OneDrive, Teams 1:1 chat should not be used to create or capture records, but may be used as a ‘working’ space.
However, ‘should’ and ‘reality’ can be different things. There are two ways to address this:
Explicitly, through communication to end-users. Make it clear that Teams 1:1 chat and OneDrive are NOT to be used to create or capture records. Applying short-term retention policies to this content may assist with reducing (or increasing) this risk.
Implicitly, through monitoring and retention policies. Apply longer-term retention policies to the content and use Content Search/eDiscovery to look for content that may be records. Additionally, review the content of the OneDrive of departed staff and ensure that any records are kept.
Implications for managing records
The implications for collating, grouping and aggregating records in Microsoft 365 are as follows.
SharePoint document libraries will continue to be the primary aggregation for managing corporate records, including emails copied from Outlook.
Organisations should establish an architecture model for SharePoint sites that are used to manage records. The model may include a mix of the following: (a) sites mapped to business functions with libraries mapped to business activities and retention classes, (b) entire sites used to create and capture records relating to a single activity, where the entire site is mapped to a retention class, and (c) MS Groups (and Teams) with an associated SharePoint site, where the Group (mailbox/SharePoint site) is subject to a single retention class (and the Team channel chat also).
More effort, in terms of site/library set up, metadata, access controls, retention and end-of-retention process is likely to be required for the management of high-level, high-risk and permanent records.
Personal mailboxes in Exchange will continue to exist as a form of aggregation, and consideration should be given to having different retention policies for different ‘types’ of mailbox, to ensure that any email that could be a record is not deleted too quickly.
Addendum – Other options that collate, group and aggregate content in Microsoft 365
As noted earlier, all of the content created or captured in Microsoft 365 is stored in the backend Azure substrate. Consequently, it is possible to search across all or part of that content to find related information and, where required, export it to a different location.
The global Content Search is accessed from the Compliance portal and access requires elevated privileges – Global Admin or Compliance Admin.
Searches are created as cases and are based on keywords, conditions (such as ‘Sender’ for emails), and locations – all or specific. When a new content search is created or run, the Global Admins are alerted, providing a form of oversight in addition to audit logs.
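Behind the search form, the keywords and conditions end up as a keyword query. The sketch below shows one way such a query string might be composed. It assumes KQL-style syntax with the `from` and `kind` properties (for example `kind:microsoftteams` for Teams content); these property names and values should be checked against the current Microsoft documentation before use, and the helper names are hypothetical. Locations are chosen separately and are not part of the query string.

```python
from typing import List, Optional


def _quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are treated as a phrase."""
    return f'"{term}"' if " " in term else term


def build_keyword_query(keywords: List[str],
                        sender: Optional[str] = None,
                        kind: Optional[str] = None) -> str:
    """Combine keywords (OR) with optional sender and item-kind conditions (AND)."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    terms = [_quote(keyword) for keyword in keywords]
    clause = terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"
    parts = [clause]
    if sender:
        parts.append(f"from:{_quote(sender)}")
    if kind:
        parts.append(f"kind:{kind}")
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_keyword_query(["mosquito"], kind="microsoftteams"))
    print(build_keyword_query(["mosquito", "virus outbreak"],
                              sender="roger@example.com"))
```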
While content searches find content that is related to the search parameters, and legal holds can then be applied to that content, they do not create any form of aggregation in a recordkeeping sense.
The Graph, Delve, Discovery
Microsoft describe the Graph as being ‘the gateway to data and intelligence in Microsoft 365 [that can be used via the Microsoft Graph API] to access the tremendous amount of data in Microsoft 365, Windows 10, and Enterprise Mobility + Security’ and ‘… build apps that support scenarios spanning across productivity, collaboration, education, people and workplace intelligence, and much more.’ (Source ‘Overview of Microsoft Graph’)
The Graph is commonly represented in diagrams similar to the one below.
Most end-users will encounter the Graph through either Delve or the Discover option in both the office.com portal and their OneDrive for Business accounts.
It is not uncommon for end-users to express surprise at the content (that they have access to) that is presented. Commonly this will show documents that a colleague is working on, or connections between people. Disabling Delve does not fix permissions; if a person has access to a document that appears in Delve, they will be able to search for it and find it that way.
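For readers who want to see the kind of data behind these experiences, the Microsoft Graph API exposes ‘insights’ endpoints (trending, used and shared documents) for the signed-in user. The sketch below only constructs the HTTP request and does not send it. It assumes a valid delegated access token obtained elsewhere, and the function name is hypothetical. Whether Delve itself is driven by exactly these endpoints is an assumption on my part. The results are trimmed to what the user already has permission to see, which is consistent with the point above about permissions.

```python
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"
INSIGHT_KINDS = ("trending", "used", "shared")


def build_insights_request(kind: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for one of the user's insights lists."""
    if kind not in INSIGHT_KINDS:
        raise ValueError(f"kind must be one of {INSIGHT_KINDS}")
    return urllib.request.Request(
        f"{GRAPH_BASE}/me/insights/{kind}",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
        method="GET",
    )


if __name__ == "__main__":
    # "TOKEN" is a placeholder; a real token would come from an OAuth sign-in flow.
    request = build_insights_request("trending", "TOKEN")
    print(request.get_method(), request.full_url)
```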
Over time, the Graph can also provide other information based on the relationships or ‘signals’ it finds between all the different content in Microsoft 365.
While the Graph can present groups of records that have some relationship to the end-user, it does not aggregate those records or maintain a single consistent view. However, the Graph powers the new Project Cortex that does do something similar.
Project Cortex was announced by Microsoft in November 2019. To quote the announcement, Project Cortex:
Uses advanced AI to deliver insights and expertise in the apps you use every day, to harness collective knowledge and to empower people and teams to learn, upskill and innovate faster.
Uses AI to reason over content across teams and systems, recognizing content types, extracting important information, and automatically organizing content into shared topics like projects, products, processes and customers. Cortex then creates a knowledge network based on relationships among topics, content, and people.
From a recordkeeping aggregation point of view, a core functionality of Project Cortex is its ability to create ‘topic cards’ based on the rich metadata that makes up all the content in Microsoft 365. Again to quote the announcement:
Project Cortex securely collects content that is created and shared every day in Microsoft 365—including files, conversations, recorded meetings and video—and it categorizes the content based on its type, and tags it with extracted metadata.
AI then applies advanced topic mining logic—whether its content contained in Microsoft 365 or connected from external systems—to identify topics and relate content to those topics.
Topics can reflect any knowledge that’s important, including customers, products, projects, policies and procedures. Technically, AI is creating knowledge entities, a new object class, in the Microsoft Graph. The relationships between those topics—those knowledge entities—and the experiences that connect this knowledge with people creates your knowledge network.
Topic cards – or ‘knowledge entities’ – are a form of AI-generated aggregation.
However, topic cards will only present information that an end-user has access to and so the nirvana of presenting emails or Teams 1:1 chats in these cards as a form of aggregation for recordkeeping purposes is not likely to be realised through Project Cortex.
Office 365 is sometimes referred to as an ‘ecosystem’. In theory this means that records could be stored anywhere across that ecosystem.
Unlike the ‘old’ on-premise world of standalone servers for each Microsoft application (Exchange, SharePoint, Skype) – and where specific retention policies could apply (including the Exchange Messaging Records Management MRM policy), the various elements that make up Office 365 are interconnected.
The most obvious example of this interconnectivity is Microsoft Teams which stores chat content in Exchange and provides access to content stored in both SharePoint (primarily the SharePoint site of the linked Office 365 Group) and OneDrive, and has links to other elements such as Planner.
Records continue to be created and kept in the various applications but retention policies are set centrally and can apply to any or all of the content across the ecosystem.
Managing records in Office 365, and applying retention rules to those records, requires an understanding of at least the key parts of the ecosystem – Exchange, Teams, SharePoint and OneDrive and how they interrelate, and from there establishing a plan for the implementation of retention.
What types of records are created in Office 365?
Records are defined as ‘evidence of business activity’ and are often associated with some form of metadata.
Evidence of business activity is an overarching term that can include:
Documents and notebooks (in the sense of text on a page)
Plans, including both project plans and architectural plans and diagrams
Images/photographs and video
Chat and/or messages
Conversations (audio and/or video based)
Social media posts
All digital records contain some form of metadata, usually displayed as ‘Properties’.
Where are the records stored in Office 365?
Most records created by organisations using Office 365 are likely to be created or stored in the following parts of the ecosystem:
Exchange/Outlook – for emails and calendars.
SharePoint and OneDrive – for documents and notebooks (in the sense of text on a page), plans, images/photographs and video.
Stream – for audio and video recordings.
MS Teams – for chat and/or messages, conversations (audio and/or video based). Note that 1:1 chats are stored in a hidden folder of the Exchange mailbox of the end-user/s participating in the chat, while Teams channel chat is stored in a hidden folder of the linked Office 365 Group mailbox.
Yammer – for (internal) social media posts.
It is also possible to import and archive certain external content such as Twitter tweets and Facebook content in Office 365.
The diagram below provides an overview of the main Office 365 applications and locations where records are created or stored. Under SharePoint, the term ‘Sites’ refers to all types of SharePoint sites, including those associated with Office 365 Groups. Libraries are shown separately because of the potential to apply a retention policy to a library – see below.
Note also that this diagram does not include network file shares (NFS) as the assumption is made that (a) NFS content will be migrated to SharePoint and the NFS made read only, and (b) all new content that would previously have been stored on the NFS is instead saved either to OneDrive for Business (for ‘personal’ or working documents) or SharePoint only.
Creating a plan to manage records retention across Office 365
In previous posts I have recommended that organisations implementing Office 365 have the following:
A basic architecture design model for SharePoint sites, including SharePoint sites linked with Office 365 Groups (and Teams in MS Teams).
A plan for creating and applying retention policies across the ecosystem.
Because SharePoint is the most likely location for records to be stored (aside from Exchange mailboxes and OneDrive accounts), there should be at least one retention policy for every SharePoint site (or group of sites), as well as policies for specific document libraries if the retention for the content in those libraries may be different from the retention on the overall site.
For example, a ‘Management’ site may contain a range of general content as well as specific content that needs to be retained for longer.
The site can be covered by a single implicit retention policy of (say) 7 years. This policy will delete content in the background, based on date created or date modified.
The document library where specific types of records with longer or different retention requirements are stored may have one or more explicit label-based policies applied to those libraries. This content will be retained while the rest of the site content is deleted via the first policy.
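The interaction between the two policies can be sketched as a simple calculation. The model below assumes that, where several retention periods apply to the same item, the longest one determines the earliest date on which it can be deleted. That is a simplification of the actual retention rules, and the 15-year label period is an invented example value.

```python
from datetime import date
from typing import Iterable


def add_years(start: date, years: int) -> date:
    """Add whole years to a date, moving 29 February to 28 February when needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def earliest_deletion_date(created: date, retention_years: Iterable[int]) -> date:
    """Earliest deletion date when the longest applicable retention period wins."""
    return add_years(created, max(retention_years))


if __name__ == "__main__":
    created = date(2020, 3, 1)
    # General site content: only the 7-year site policy applies.
    print(earliest_deletion_date(created, [7]))
    # Library content: 7-year site policy plus a (hypothetical) 15-year label.
    print(earliest_deletion_date(created, [7, 15]))
```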
Structure of a retention plan for records in Office 365
A basic plan for creating and applying retention policies might look something like the following:
User mailboxes – one ‘general’ (implicit) retention policy for all mailboxes (say, 7 years after creation) and another more specific retention policy for specific mailboxes that require longer retention.
SharePoint sites – multiple (implicit) retention policies targeting one or more sites.
SharePoint libraries – multiple (explicit) label-based retention policies that are applied manually. These policies will usually have a retention period that is longer than any implicit retention policy, as any implicit site policy will prevent the deletion of content before it reaches the end of that retention period.
Office 365 Groups (includes the associated mailbox and SharePoint site) – one ‘general’ (implicit) retention policy. See also below.
Teams channel chat – one ‘general’ (implicit) retention policy. Note that this content is stored in a special folder of the Office 365 Group mailbox.
1:1 chat – one ‘general’ (implicit) retention policy. This content is stored in a special folder of the participant mailboxes.
OneDrive documents – one ‘general’ (implicit) retention policy for all ODfB accounts, plus the configuration of retention after the account is inactive.
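A plan like this can also be kept as structured data, which makes it easy to check that no location has been overlooked. The sketch below is one possible representation. The field names and all retention periods are illustrative assumptions (only the ‘say, 7 years’ figure for mailboxes comes from the list above), and nothing here talks to Microsoft 365.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class PlanEntry:
    location: str
    policy_type: str       # "implicit" or "explicit" (label-based)
    retention_years: int   # illustrative values only
    note: str = ""


REQUIRED_LOCATIONS = {
    "User mailboxes", "SharePoint sites", "SharePoint libraries",
    "Office 365 Groups", "Teams channel chat", "Teams 1:1 chat", "OneDrive",
}

PLAN = [
    PlanEntry("User mailboxes", "implicit", 7, "longer policy for specific mailboxes"),
    PlanEntry("SharePoint sites", "implicit", 7),
    PlanEntry("SharePoint libraries", "explicit", 15, "label-based, applied manually"),
    PlanEntry("Office 365 Groups", "implicit", 7, "covers group mailbox and site"),
    PlanEntry("Teams channel chat", "implicit", 7),
    PlanEntry("Teams 1:1 chat", "implicit", 7),
    PlanEntry("OneDrive", "implicit", 7, "plus retention for inactive accounts"),
]


def missing_locations(plan: List[PlanEntry], required: Set[str]) -> List[str]:
    """Locations that the plan should cover but does not."""
    return sorted(required - {entry.location for entry in plan})


def labels_shorter_than_implicit(plan: List[PlanEntry]) -> List[str]:
    """Explicit entries whose period is not longer than the longest implicit period."""
    longest_implicit = max(
        (e.retention_years for e in plan if e.policy_type == "implicit"), default=0
    )
    return [
        e.location for e in plan
        if e.policy_type == "explicit" and e.retention_years <= longest_implicit
    ]


if __name__ == "__main__":
    print(missing_locations(PLAN, REQUIRED_LOCATIONS))
    print(labels_shorter_than_implicit(PLAN))
```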
At a high level, the retention policy plan might look something like the following – ‘implicit’ policies are shown in yellow, SharePoint document libraries may be subject to ‘explicit’, label-based policies. The ‘+7 years’ for OneDrive relates to inactive accounts, a setting set in the OneDrive Admin portal.
To retain content for a Microsoft 365 group, you need to use the Microsoft 365 groups location. Even though a Microsoft 365 group has an Exchange mailbox, a retention policy that includes the entire Exchange location won’t include content in Microsoft 365 group mailboxes. A retention policy applied to a Microsoft 365 group includes both the group mailbox and site. A retention policy applied to a Microsoft 365 group protects the resources created by a Microsoft 365 group, which would include Microsoft Teams.
The actual plan should contain more detail and be included as part of other recordkeeping documentation (perhaps stored on a ‘Records Management’ SharePoint site). The plan should include details about (a) where the policies have been applied and (b) the expected outcomes or actions for the policies, including automatic deletion or disposition review (for document libraries).
Keep in mind that, unless the organisation decides to acquire this option, there is no default backup for content in Office 365 – once a record has been deleted, it is gone forever and there may be no record of this beyond 90 days.
Records managers have been struggling with managing emails as records ever since they first appeared in the workplace.
For a long time the accepted practice, as with other digital records, was to print them out and put them on the appropriate file. With the introduction of electronic document and records management (EDRM) systems, end users were instead required to save or copy documents and emails to an electronic ‘file’ in that system.
In both cases, the emails remained in the user’s ‘personal’ mailbox, where they remained inaccessible for ‘privacy’ reasons. End-users and business areas would (and still do) conduct business via the email system, without these records being available to anyone except the sender and recipient/s. Attachments to emails sent to individual recipients were (and continue to be) not managed as records unless they were printed out or saved to the EDRMS.
Microsoft Office 365 has changed the paradigm for keeping records as described in the linked post, away from the central storage and management of records in one system (while leaving the originals in place), to the decentralised ‘in place’ storage and centralised management of records across Office 365.
This post provides an overview of the three main options for managing email as records in Office 365, in both Exchange and SharePoint.
In summary the options are:
Leave emails in place in Exchange mailboxes (personal and Office 365 Group mailboxes) and apply one or more Office 365 retention policies to mailboxes.
Same as previous point, and use Content Search to retrieve emails as required.
Same as previous point, and only copy specific emails to SharePoint.
Keep in mind while reading this post that chat content from MS Teams is also stored in Exchange mailboxes but that content cannot be copied to SharePoint.
Option 1 – Leave emails in place and apply retention policy
In this option, emails remain stored in personal or Office 365 Group mailboxes. End users may create folders and ‘categorise’ the content as they wish, but no additional attempt is made to further categorise, add metadata to, or group the content according to recordkeeping requirements. The aggregation, from a recordkeeping point of view, is the end-user or Office 365 Group.
All mailboxes are subject to one or more retention policies set in the Office 365 Compliance portal to ensure that no emails are deleted before a pre-defined minimum period.
Note that retention policies can effectively replace a back-up regime used by IT for disaster recovery and investigation purposes.
Emails are aggregated by user name or Office 365 Group and will remain in mailboxes for a minimum period of time as set by the retention policy.
Office 365 Group mailboxes provide the ability to group emails by a more specific subject (the Group name, which could map to a business function – e.g., ‘Correspondence Management’) and have the added positive of having an associated SharePoint site.
The negative with this option, from a recordkeeping point of view, is that all emails – regardless of subject or importance – are grouped by the ‘personal’ or Office 365 Group mailbox, and kept for the period defined in the retention policy. That is, there is no differentiation between (email) records that may need to be kept for a long period of time and those that are transient in nature.
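To illustrate the point, the sketch below models a single mailbox-level retention period applied to every email regardless of subject. It is a simplification and not how Office 365 is implemented: the seven-year period, the function names and the choice to count from the received date are all assumptions made for the example.

```python
from datetime import date

RETENTION_YEARS = 7  # hypothetical period; the real value comes from the organisation's policy


def earliest_deletion_date(received: date, years: int = RETENTION_YEARS) -> date:
    """Return the first date on which an email could be deleted."""
    try:
        return received.replace(year=received.year + years)
    except ValueError:
        # Received on 29 February and the target year is not a leap year.
        return received.replace(year=received.year + years, day=28)


def may_be_deleted(received: date, today: date, years: int = RETENTION_YEARS) -> bool:
    """True once the minimum retention period has passed."""
    return today >= earliest_deletion_date(received, years)


# Every email in the mailbox gets the same treatment, whatever its subject.
mailbox = {
    "Lunch on Friday?": date(2020, 3, 2),
    "Signed contract attached": date(2020, 3, 2),
}
for subject, received in mailbox.items():
    print(subject, "->", earliest_deletion_date(received))
```

Both emails come out with the same earliest deletion date, which is the lack of differentiation described above.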
If there is a requirement to ensure that certain emails are kept in different aggregations or for different periods of time, then option 3 should be considered.
Option 2 – Same as option 1 and use Content Search to retrieve emails
This option is the same as the first option, but the business can make use of Content Search to identify and isolate emails as required. Content Search is more or less the same as the search part of an e-Discovery case.
Note that access to the Content Search area is restricted to Office 365 Global Admins and Compliance Admins. This is because, as can be seen in the screenshot below, a Content Search can be set up to search for any content in email, documents and much more.
Content Searches can be set up from the ‘New Search’ option, or the Administrator can make use of a Guided search or Search by ID List. For the purpose of this post, only the ‘New search’ option will be examined.
Configuring a new Content Search
Each content search can be configured against three main options as shown in the screenshot below: Keywords, Conditions, and Locations. Some searches may require a combination of these three options.
Keywords can be any words that may be found anywhere in the email, including the content.
The available conditions are listed below:
Size (in bytes)
The available search locations include any or all of the options below:
Office 365 group email
Skype for Business
Office 365 Group sites
Exchange public folders
For more detail on how to use Content Search and all the options available, go to this Microsoft site.
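As a rough way to think about how the three options fit together, the sketch below models a search as keywords, conditions and locations, and combines the first two into a single query string. It is only an illustration: the class, its field names and the query syntax are assumptions loosely modelled on keyword queries, not the Content Search interface itself, and the property names `size` and `sent` should be checked against the Microsoft documentation before use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentSearch:
    """Hypothetical model of a saved search; not a Microsoft API."""
    name: str
    keywords: List[str] = field(default_factory=list)
    conditions: Dict[str, str] = field(default_factory=dict)
    locations: List[str] = field(default_factory=list)

    def query(self) -> str:
        """Combine keywords (any of) with conditions (all of)."""
        parts = []
        if self.keywords:
            # Quote multi-word keywords so they are treated as phrases.
            quoted = [f'"{k}"' if " " in k else k for k in self.keywords]
            parts.append("(" + " OR ".join(quoted) + ")")
        for prop, expression in self.conditions.items():
            parts.append(f"({prop}{expression})")
        return " AND ".join(parts)


search = ContentSearch(
    name="Tender correspondence",
    keywords=["contract", "tender 42"],
    conditions={"size": ">=1048576", "sent": ">=2020-01-01"},
    locations=["Office 365 group email", "Exchange public folders"],
)
print(search.query())
print("Locations:", ", ".join(search.locations))
```

Keeping locations separate from the query reflects the way the search form treats them: the query says what to look for, the locations say where to look.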
Running a search
After the search has been configured, it must be run. The speed of the search will depend on the complexity of the search, conditions, locations and the volume of content. Every search will appear in the list of searches that have been saved.
When complete, the search result will show a ‘Status’, showing the number of:
Once the search has completed, the results of the search may be exported. There are two configurable options for exported results.
All items, excluding ones that have unrecognized format, are encrypted, or weren’t indexed for other reasons
All items, including ones that have unrecognized format, are encrypted, or weren’t indexed for other reasons
Only items that have an unrecognized format, are encrypted, or weren’t indexed for other reasons
Exchange content export options:
One PST file for each mailbox
One PST file containing all messages
One PST file containing all messages in a single folder
Enable de-duplication for Exchange content (check box)
Content searches are likely to find and retrieve more relevant emails than might be saved elsewhere, as they look through all emails. Provided a retention policy has been applied to the mailboxes, the content should still be accessible. If the emails have been deleted at the end of a retention period, they will no longer be accessible.
Emails can be exported and – if necessary – the PST copied to a different system (such as SharePoint) for long-term storage with additional metadata as required.
Access to the Content Search option is restricted to Global Admins and Compliance admins, for good reason. Consideration might need to be given to governance or procedural rules. Note that Global Admins are always alerted when a new content search is created or run.
Each search must be pre-configured and run regularly to ensure that all emails are identified.
Content searches may retrieve too much unrelated content.
Option 3 – Same as option 2 and copy only select emails to SharePoint
This option mimics the legacy way of saving a record to a pre-defined separate aggregation, in this case to a SharePoint document library.
It differs from the first two options in that only certain select emails are copied (by end-users or using a third-party application) to specific SharePoint document libraries. It is still, however, possible (and preferable) to apply a retention policy to the original mailboxes.
Content search, which can be used at any time, will find the emails in both Exchange mailboxes and SharePoint as long as they have not been deleted via a retention policy expiry.
The positives with this option are that emails copied to a SharePoint document library:
Are grouped with other related records. This may be important from an organisational recordkeeping point of view, for example for certain key records. Consideration might also be given to setting up an Office 365 Group instead for these specific records.
Can have additional metadata.
Can be retained for a period of time, different from the original mailbox.
The problems with this option are that:
It requires some kind of action to copy the email.
It creates a copy of the email; it doesn’t remove the original.
An email copied to another system may not be the most recent in a thread, especially if that thread is still active.
Does not include the ‘chat’ elements from MS Teams.
Summing up the options
The idea of copying an email to a separate aggregation, container or file for recordkeeping purposes is a legacy concept inherited from the paper recordkeeping period. While attempts were made over the years to mimic that concept in EDRM systems, it has several weaknesses that mostly outweigh the alleged benefits.
Email (in Exchange) and documents (in SharePoint) continue to remain separate in Office 365 but there is now the potential to manage both equally through a combination of retention policies and pre-defined content searches.
The majority of business emails are never captured in separate recordkeeping systems. Microsoft’s centralised retention model and the ability to retrieve emails on the fly mean that it is more efficient and cost effective to leave emails in place. This does not exclude the potential to copy certain select emails to SharePoint.
Additionally, mailboxes associated with Office 365 Groups provide the ability to keep emails in a business context, away from inaccessible ‘personal’ email accounts. Records managers should consider the potential of using Office 365 Group mailboxes in this way for particular types of records.
Records management standards (see below) state that a defining feature of records is that they are associated with metadata – both ‘point of capture’ metadata and ‘process’ metadata that continues to evolve throughout the life of the record.
For at least two decades, the requirement to capture and store metadata for digital records has driven the implementation of centralised electronic document and records management (EDRM) systems, many of which began life as databases used to record metadata about physical records (files and boxes).
EDRM systems were (and still are) used to store copies of digital records created or captured natively in other systems, primarily network file shares and email. End-users were required to copy individual records to the EDRMS, a process that mirrored the storage of records (including printed digital records) in physical files.
Network file shares and email systems were not considered to be suitable as recordkeeping systems because they could not ensure the authenticity, integrity and reliability of records over time, including to manage and preserve metadata about the records stored in them.
The increasing implementation of Office 365, and in particular the use of SharePoint for the storage of records, has highlighted the extent to which recordkeeping metadata can – or even should – be applied to the content stored in that system.
This post discusses the need for metadata in records stored in Office 365, including in Exchange/Outlook, MS Teams, and SharePoint/OneDrive for Business. It concludes that most records stored in Office 365 do not need additional metadata but, where such metadata is required, there is unlimited capability to add it.
Records and metadata
The international standard for records management, ISO 15489:2016, defines a record as ‘information created, received, and maintained as evidence and as an asset by an organization or person, in pursuit of legal obligations or in the transaction of business’.
Records are said to be different from ‘non-records’ because they are associated or described with (mostly added) metadata that describes ‘the context, content and structure of records and their management through time’.
The standard for recordkeeping metadata is ISO 23081:2017. One records management professional (link at the end of the post) noted that there has been reluctant adoption of this standard, mostly because it was ‘too complex’ and ‘academic’, and used ‘foreign terms’. Unspecified vendors were said to have been dismissive of the standard.
Standard for managing digital records – ISO 16175
Part 2 of the standard ISO 16175:2011, ‘Guidelines and functional requirements for digital records management systems’ contains multiple requirements relating to metadata, across three broad categories:
Point of capture metadata. This includes metadata that forms part of the ‘metadata payload’ of the original record (e.g., date created, creator), other metadata added at point of capture, and metadata that provides additional context for the records.
Process metadata. This is metadata that records activities and changes to both the record and metadata over the life of the record.
The need to manage and control metadata over time.
This standard appears to reinforce the requirement for records to be stored and managed in dedicated recordkeeping systems.
On premise document and records management systems: These systems use metadata schema that specify metadata fields to be used in the system.
Cloud systems including Office 365: These systems can make use of enterprise ‘graphs’ that map people to documents and topics. The graph is built from the interactions of people with content across the different workloads of the suite.
Most people now accept the algorithm capabilities of Facebook, LinkedIn, eBay, Amazon and similar online systems to automatically connect us with information relevant to us, without having to add any metadata.
Given the volume and types of digital content, almost all of which has metadata ‘payloads’, how can we ever hope to add the required recordkeeping metadata?
Can’t we just rely on the algorithms and graphs?
How much metadata do you really need?
The answer to this question may depend largely on business, regulatory/compliance and/or government recordkeeping requirements relevant to the organisation and its jurisdiction. In my experience, across multiple very large and also very small organisations:
Most private sector organisations will likely have minimal metadata requirements beyond basic ‘point of capture’ and ‘process’ metadata already recorded in the system where the records are created or captured (including email), unless this is required for specific compliance or regulatory purposes, or where there is risk associated with poor recordkeeping. For example, in a major food processing company, records relating to the manufacture of food were very well documented and managed, while corporate records were managed haphazardly.
Most public sector organisations are required, for government accountability and transparency (and information retrieval) purposes, to apply a minimum set of both ‘point of capture’ and ‘process’ metadata for non-permanent records. Many government agencies have struggled to manage digital records effectively.
A small percentage of records captured or created in government agencies may require more extensive metadata, especially if those records are to be transferred to archival institutions for permanent retention.
Office 365 ‘workloads’
In Office 365, most business records will be created or captured in either Exchange/Outlook (includes MS Teams chats), or SharePoint or OneDrive for Business (for ‘working’ or personal content).
Exchange is a recordkeeping system in that it stores records with consistent metadata. The primary ‘weakness’, in terms of recordkeeping, is that ‘personal’ Exchange mailboxes aggregate records on a range of subjects by an individual user rather than by business subject. The mailboxes of Office 365 Groups, on the other hand, can be used to aggregate records about a business function/activity or subject.
SharePoint is a recordkeeping system that has extensive default metadata and almost unlimited additional metadata capability (see below). OneDrive for Business is a SharePoint service that has the same extensive default metadata capability.
There is, generally speaking, no requirement for organisations that have implemented Office 365 to allow the continued use of network file shares because the ‘save’ and ‘save as’ options in Office/Windows 10 point to SharePoint and OneDrive as the default save locations.
Metadata in Exchange mailboxes/MS Teams
Emails have the same metadata options in the header of every email:
Recipients (To, including CC and BCC)
(Plus more with routing information and security controls including DKIM, SPF, DMARC etc)
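The header metadata described above can be read with very little effort. The sketch below uses Python’s standard `email` package to pull the main fields out of a raw message; the sample message, the addresses and the `header_metadata` function name are made up for the example.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# A made-up message used only to demonstrate the header fields.
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>, carol@example.com
Cc: records@example.com
Subject: Draft contract for review
Date: Mon, 02 Mar 2020 09:15:00 +1100
Message-ID: <abc123@example.com>

Please see the attached draft.
"""


def header_metadata(raw: str) -> dict:
    """Extract the basic metadata carried in an email's headers."""
    msg = message_from_string(raw)
    # Gather every To and Cc header, then split them into individual addresses.
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    return {
        "from": msg["From"],
        "recipients": [addr for _name, addr in getaddresses(recipient_headers)],
        "subject": msg["Subject"],
        "sent": parsedate_to_datetime(msg["Date"]),
        "message_id": msg["Message-ID"],
    }


for key, value in header_metadata(RAW_EMAIL).items():
    print(f"{key}: {value}")
```

Everything returned here was set when the message was sent, which is why it counts as ‘point of capture’ metadata.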
However, no other metadata can be added and some (or most) emails may never form part of the collated record of a given subject.
Because of this ‘limitation’, there has been an assumption ever since email was introduced that emails identified as records would have to be copied to a (separate) recordkeeping system.
In pre-digital days, this meant printing out emails and placing them on a paper file.
In organisations with EDRM systems, this meant copying the email to the EDRMS where additional metadata would be applied.
The original emails generally remained in place in individual mailboxes where they may be subject to backups and journaling in case they needed to be recovered for whatever reason including subpoenas (eDiscovery).
The Office Graph in Office 365 now provides the ability to connect the content in email with other content across that ecosystem, as noted in James Lappin’s post above. This is new – but it doesn’t rely on metadata or copying emails anywhere.
Metadata in SharePoint
As a SharePoint service, OneDrive for Business has the same default metadata columns. Accordingly, it will not be described further here.
What metadata is required?
Organisations that plan to manage records in SharePoint should consider the following questions as part of their overall information architecture design to ensure records are kept in logical aggregations rather than randomly. This is important especially if end-users are allowed to create Office 365 Groups or Teams.
What point of capture and process metadata is required (for compliance, regulatory, recordkeeping purposes)? What is the source of this requirement?
Is there a difference in the metadata requirements for short-term (retain in the organisation) and permanent records that are to be transferred to archival institutions?
Do the required metadata columns already exist in SharePoint?
If they don’t exist, should the additional metadata columns be added as site columns or library columns?
Does any of the metadata need to be mandatory, and/or can it be a default setting – for example, a metadata column that has the default function and/or activity so the user doesn’t need to add this.
Where is the process metadata and how do you view or manage it? (See also below on this subject).
Information architecture and metadata
The information architecture of SharePoint, in terms of managing records as objects (e.g., documents, spreadsheets, images, etc), is relatively simple:
SharePoint site. The primary aggregation that can be linked to a business function (e.g., ‘Financial management’).
Document library/ies. Logical aggregations or containers of records that can be linked to business activities (e.g., ‘Meetings’).
Folders, document sets as content aggregations.
An effective site architecture can replace the requirement for metadata. For example, the name of the SharePoint site can map to a business function, and library names can map to activities, instead of applying a function and activity pair to each record. The URL address for the record provides the context:
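As a rough illustration of that idea, the sketch below reads the function and activity straight out of a record’s URL. The tenant name (`contoso`), the site and library names and the `context_from_url` function are all hypothetical, and the code assumes the common `/sites/<site>/<library>/...` URL shape, which will not hold for every site.

```python
from urllib.parse import unquote, urlparse

# Hypothetical record URL: site = business function, library = activity.
EXAMPLE_URL = (
    "https://contoso.sharepoint.com/sites/FinancialManagement/"
    "Meetings/2020-03%20Agenda.docx"
)


def context_from_url(url: str) -> dict:
    """Derive recordkeeping context from a SharePoint-style URL path."""
    parts = [unquote(p) for p in urlparse(url).path.split("/") if p]
    # Expect at least: sites / <site> / <library> / <file>
    if len(parts) < 4 or parts[0] != "sites":
        raise ValueError(f"Unexpected URL shape: {url}")
    return {
        "function": parts[1],   # SharePoint site name
        "activity": parts[2],   # document library name
        "record": parts[-1],    # file name
    }


print(context_from_url(EXAMPLE_URL))
```

If the site and library names follow the function and activity terms in the business classification scheme, no extra column is needed to carry that pair on each record.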
If additional metadata is still required, SharePoint has extensive and almost unlimited capability.
Every new SharePoint site comes with a standard set of around 240 metadata ‘site columns’. The metadata columns include the Dublin Core metadata items.
New metadata columns can be created at the site level (‘site columns’). These can then be used by all libraries and lists on the site. Here is a useful description of how to add new site columns from ShareGate: SharePoint 101: SharePoint Site Columns.
Every new SharePoint library comes with a standard set of metadata columns – see below. New metadata columns can be created at the library (or list) level, but these columns are only available to that specific library or list.
Default SharePoint document library columns
The default library metadata columns are as follows. Dublin Core metadata items are shown with [DC]:
App Created By
App Modified By
Check In Comment
Checked Out To
Compliance Asset Id
Created By [DC]
Document ID (when enabled as a feature)
Folder Child Count
Item Child Count
Item is a Record
Label applied by
Modified By Name [DC]
Retention label Applied
How metadata is added to records in SharePoint
Every digital record saved to SharePoint will have some form of native metadata (payload). Additional metadata may be added when the document is saved; this may be optional or mandatory.
When a digital record is saved to SharePoint, SharePoint only copies the title or name of the record, not the original created date or author.
When a Microsoft Office document is saved to a SharePoint document library, the Office document stores the library metadata (including the unique Document ID) in its own XML-based properties. This information is retained with the record even when the record is downloaded from SharePoint.
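For anyone curious to see this, a modern Office file is a ZIP package, and its standard properties sit in a part named `docProps/core.xml`. The sketch below reads a few of them with the Python standard library. The function name is hypothetical, and the assumption that library-specific metadata is carried in the package’s `customXml` parts is mine, so the code only lists those parts without interpreting them.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the core properties part of an Office package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def core_properties(package_bytes: bytes) -> dict:
    """Read basic properties from a .docx/.xlsx/.pptx held in memory."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
        # Assumption: library metadata, if present, lives in these parts.
        custom_parts = sorted(
            name for name in package.namelist() if name.startswith("customXml/")
        )

    def text(tag: str):
        element = root.find(tag, NS)
        return element.text if element is not None else None

    return {
        "title": text("dc:title"),
        "creator": text("dc:creator"),
        "created": text("dcterms:created"),
        "last_modified_by": text("cp:lastModifiedBy"),
        "custom_xml_parts": custom_parts,
    }
```

A typical call would be `core_properties(open("minutes.docx", "rb").read())`, where `minutes.docx` stands in for any document downloaded from a library.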
Viewing the metadata
The metadata that describes the content stored in the SharePoint document library may be viewed in multiple ways (via the edit view option), and may be exported (for example if records are to be destroyed or transferred).
Every record includes a version history that provides details of who modified the content, and when (but not what changes were made unless this is recorded).
Process metadata is metadata that records events relating to the record or the aggregation in which it is kept.
Examples of process metadata include when:
Records are viewed or downloaded (date and by whom).
Records are modified (date and by whom, and ideally what changes were made).
Records are copied or moved (date and by whom).
Security controls were changed (date, by whom, and what changes were made).
Records are deleted/destroyed (date and by whom, with what authority).
While ISO 16175 describes the general requirement to keep process metadata, the actual requirement is likely to differ between organisations. Organisations with high compliance requirements, such as certain types of businesses or government, are more likely to want process metadata to be created, accessed when required, and protected against unauthorised modification.
Office 365 process metadata
Office 365 records process metadata in multiple ways in Exchange and SharePoint.
Emails generally cannot be modified after they have been sent. Accordingly, the primary process metadata for emails and Teams chat is likely to be in the deletion records stored in the Office 365 Compliance admin portal audit logs.
SharePoint/OneDrive process metadata is recorded as follows:
Viewed or downloaded, modified, copied or moved: This is recorded in the Office 365 Compliance admin portal audit logs.
Modified: This is recorded in the Date modified and Modified by metadata, as well as the version history (which also keeps the previous actual versions that can be compared if required).
Security changes: This is recorded in the Office 365 Compliance admin portal audit logs.
Destroyed: Depends, but generally this requires information to be captured manually and stored elsewhere. For example, if the content of a document library is to be destroyed, then the metadata (along with details of the original library URL) should first be exported manually and saved somewhere.
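One way to picture the ‘destroyed’ step is as a simple register written out before the content is removed. The sketch below is an assumption about what such a register might contain, not a description of any Office 365 feature: the function name, the column headings and the sample values are all hypothetical, and the item metadata is expected to have been exported from the library first.

```python
import csv
import io
from datetime import datetime, timezone


def destruction_register(library_url: str, items: list, authority: str) -> str:
    """Build CSV text recording what was destroyed, from where, and why."""
    fixed_columns = ["Library URL", "Disposal authority", "Exported at"]
    # Use whatever metadata columns the exported items actually carry.
    item_columns = sorted({key for item in items for key in item})
    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=fixed_columns + item_columns)
    writer.writeheader()
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    for item in items:
        writer.writerow({
            "Library URL": library_url,
            "Disposal authority": authority,
            "Exported at": stamp,
            **item,  # columns missing from an item are left blank
        })
    return output.getvalue()


print(destruction_register(
    "https://contoso.sharepoint.com/sites/Finance/Meetings",  # hypothetical
    [{"Name": "Agenda.docx", "Created By": "Alice"}, {"Name": "Minutes.docx"}],
    "Hypothetical disposal authority 123",
))
```

The resulting file should be saved somewhere that will outlast the library itself, for example alongside other recordkeeping documentation.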
Note that audit log data in Office 365 is only retained for 90 days with an E3 licence, and for 365 days with an E5 licence.
Exchange/Outlook email has basic metadata. It is unlikely that it will ever be possible to add other metadata, unless email is copied to SharePoint document libraries.
Chats from MS Teams are stored in hidden folders in Exchange mailboxes.
Organisations that need to keep certain emails for specific compliance, recordkeeping or archival purposes, should consider capturing these in SharePoint document libraries. Organisations might also consider making more use of Office 365 Group mailboxes for business-specific content as these Groups also include both MS Teams chat and have an associated SharePoint site.
The metadata capabilities of SharePoint are unlimited but not all records need the same degree of metadata.
The majority of records can probably be managed in standard SharePoint document libraries using the default metadata columns, or with one or two additional site or library metadata columns added, where required.
The Office Graph will increasingly be able to bring together records dynamically in the context of the business or the end-user via Project Cortex and Delve, respecting security controls that may be in place. The centralised content search and retention policy capability in Office 365 will also enable businesses to find, retrieve and manage content across both Exchange and SharePoint.
AS/NZS ISO 23081 series, Information and documentation – Records management processes – Metadata for records
The main elements that impact on the management of records in Office 365 are Users (for licences), Roles and Groups, as can be seen in the screenshot.
Users – licencing and applications
Organisations that acquire Office 365 will generally have the relevant licences required (a) to set up and administer SharePoint Online, and (b) for users to use it (and OneDrive for Business).
This post assumes that organisations will have at least an E3 licence which includes SharePoint for end users, visible as an app when they log on to https://office.com, along with all other applications included in the licence, for example Exchange/Outlook, OneDrive for Business, MS Teams and so on. End users with access to these items will also be able to download and use the equivalent mobile device apps.
The three key roles that impact on the management of records in SharePoint are as follows:
Global Admin (GA)
Are responsible for managing the entire Office 365 environment. This includes creating new Groups (Security Groups, Distribution Lists and Office 365 Groups).
Are responsible for assigning key roles, including the SharePoint Administrator and Compliance Administrator (and other roles).
May have responsibility for, and/or the skills and knowledge required to set up and administer SharePoint Online and create new sites for the organisation.
May also be able to create and publish retention policies in the Compliance admin portal.
Note – Organisations that outsource the administration of Office 365 should always have at least one GA account to access the tenant if ever required. If they don’t have a log on, they should have or acquire a very good understanding of the access and privileges afforded to the outsourced company.
SharePoint Administrator (SP Admin)
The SP Admin role will usually be a ‘system’ role that is responsible for managing the SharePoint environment, including OneDrive for Business. As noted above, a GA with the right skills can also manage the SharePoint environment.
Generally speaking, SharePoint Administrators will focus on the technical and configuration aspects of SharePoint. They are not usually responsible for configuring SharePoint to manage records, managing records, or creating and publishing retention policies. This role usually falls to either the GA or Compliance Administrator.
The Compliance Admin role is responsible, among other things, for the creation and publishing of retention labels and policies in the Compliance Admin portal. A GA can perform this role (along with all other roles) if required.
Compliance Admins will usually be responsible for disposition reviews linked with retention labels, and be involved in eDiscovery cases.
The Compliance Admin can search and view the audit logs for all activity across Office 365 and can carry out broad content searches with the ability to export the content of those searches. As this role is relatively powerful, it should be limited to key senior individuals in the organisation.
Office 365 and Security Groups
Office 365 Groups are Azure/Exchange objects just like Security Groups and Distribution Lists. Accordingly, there should be controls around their creation, including naming conventions.
As every Office 365 Group has an associated SharePoint site, organisations should consider restricting the ability for end users to create Office 365 Groups, and only allowing Global Admins and members of a Security Group to do this. Neither SharePoint Admins nor Compliance Admins would normally create AD Groups.
If the ability to create Office 365 Groups is not restricted, an Office 365 Group will be created with an associated SharePoint site whenever:
A new Team is created in MS Teams.
A new Group is created from Outlook.
A new Yammer Group/Community is created.
The ability to share content externally from SharePoint and OneDrive for Business is controlled from the Office 365 Admin portal. This is a global setting that can be disabled by the Global Admins if required.
It is assumed, for the purpose of this post, that that setting is enabled to allow external sharing.
Note that enabling external sharing at the global level does not enable it globally for all SharePoint sites; sites must be individually modified to allow it.
The Compliance admin portal can be accessed by the GAs and also the Compliance Admins (and some other roles). It is where retention labels and policies are created (in line with the corporate file plan/BCS) and published, and disposition reviews are undertaken, so records managers need access.
Other options in this section that relate to the management of records include the audit logs, content search and eDiscovery.
Retention policies may be applied to all the key workloads in Office 365 where records are stored:
OneDrive for Business
Office 365 Groups
Retention labels published as retention policies are visible to and can be applied by end-users. Generally these are more likely to be applied at the document library level rather than to individual records, or in mailboxes or OneDrive for Business.
Retention policies that are not based on labels may be applied to all, or parts of, the four workloads listed above. For example, they may be applied to all, or a sub-set of Exchange mailboxes or OneDrive for Business accounts, or SharePoint sites. Retention policies may also be applied to individual or team chats in MS Teams.
Organisations seeking to use retention policies in Office 365 should understand how these work, have a plan for their implementation, and keep track of what has been applied where.
Retention policies for all mailboxes or all ODfB accounts may replace previous on-premise backup options for those workloads. It is unlikely that end-users will (or will want to) apply retention labels published as policies to individual emails or folders in mailboxes or OneDrive.
SharePoint sites are likely to have explicit retention policies, implicit/invisible retention policies, or a combination of both. Implicit, single period retention policies may be more suitable for entire smaller, short-lived SharePoint sites. Explicit retention policies may be more suitable for the diverse range of content on more complex and long-lasting sites. Some sites may be created and populated around the need to keep a particular type of record for a long period of time – for example, employee records.
The Office 365 audit logs are found in the Compliance admin portal. For an E3 licence, the content in the logs is stored for 90 days.
As audit logs are an important element in keeping records, organisations may need to consider ways to retain this content for a longer period.
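A simple way to plan for this is to work out when each event will drop out of the log and export it before then. The sketch below uses the retention periods mentioned in these posts (90 days for E3, 365 days for E5); the function names, the event structure and the two-week warning window are assumptions, and the exact cut-off behaviour in Office 365 may differ.

```python
from datetime import date, timedelta

# Audit log retention in days, by licence, as described in these posts.
AUDIT_LOG_DAYS = {"E3": 90, "E5": 365}


def last_day_in_log(event_date: date, licence: str) -> date:
    """Approximate last date on which an event is still in the audit log."""
    return event_date + timedelta(days=AUDIT_LOG_DAYS[licence])


def events_to_export(events: list, licence: str, today: date,
                     warning_days: int = 14) -> list:
    """Return events that will leave the log within the warning window."""
    cutoff = today + timedelta(days=warning_days)
    return [
        event for event in events
        if today <= last_day_in_log(event["date"], licence) <= cutoff
    ]


events = [
    {"id": 1, "action": "FileDeleted", "date": date(2020, 1, 1)},
    {"id": 2, "action": "FileDeleted", "date": date(2020, 3, 1)},
]
due = events_to_export(events, "E3", today=date(2020, 3, 25))
print([event["id"] for event in due])
```

Run on a schedule, a check like this gives a records manager a list of events to export before they age out of the log.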
Note – SharePoint document libraries record the name of anyone who edited a document (and also previous versions), but they don’t record the name of anyone who simply viewed it. SharePoint lists also include audit trails, making it possible to track changes in individual rows of a list.
Content searches and eDiscovery
The Compliance admin portal provides two similar options to search for content across Office 365. Both the Content Search and eDiscovery options provide the ability to establish a ‘case’ that can be run more than once.
The eDiscovery option provides the added ability to put content on Legal Hold. Advanced eDiscovery is available with a higher licence.
Click on the links below to read the next two posts:
SharePoint Online Admin centre configuration.
SharePoint site collection provisioning and configuration to manage records.
This post highlights the need to understand how retention works in MS Teams, why it may be related to how long you keep emails (including for backup purposes), and why you need to consider all the elements that make up an Office 365 Group when considering how – and how long – to retain content in MS Teams.
Overview of retention in MS Teams
If you are unfamiliar with how retention works with MS Teams, these two related sites provide very useful detail.
The quote below from the second link is relevant to this post:
‘Teams chats are stored in a hidden SubstrateHolds folder in the mailbox of each user in the chat, and Teams channel messages are stored in a hidden SubstrateHolds folder in the group mailbox for a team. Teams uses an Azure-powered chat service that also stores this data, and by default this service stores the data forever. With a Teams retention policy, when you delete data, the data is permanently deleted from both the Exchange mailboxes and the underlying chat service.’
‘Teams chats and channel messages aren’t affected by retention policies applied to user or group mailboxes in the Exchange email or Office 365 groups locations. Even though Teams chats and channel messages are stored in Exchange, they’re only affected by retention policies applied to the Teams locations.’
One-to-one chat in MS Teams is stored in a hidden folder of the mailbox of each user in the chat. Documents shared in those chats are stored in the OneDrive for Business of the person who shared it.
Group chat in Team channels is stored in a hidden folder of the mailbox of the associated Office 365 Group – and also in an Azure chat service. Documents are stored in the Office 365 Group’s SharePoint site (other SharePoint site libraries may also be linked in a channel).
Another quote from the same post:
‘In many cases, organizations consider private chat data as more of a liability than channel messages, which are typically more project-related conversations.’
Teams content is kept in mailboxes, retention may be similar
In the on-premise past, organisations typically backed up their Exchange mailboxes (and possibly also enabled journaling, to capture emails) for disaster recovery, ‘archiving’ and investigations. Unless a decision is made to invest in cloud back-ups, Office 365 retention policies may also be applied to Exchange mailboxes, effectively replacing the need to back them up. Retention policies applied to Exchange mailboxes don’t affect the Teams chat folder.
Organisations should probably apply the same retention period to both emails and Teams chats as they do to email mailbox backups now. That is, if mailboxes are typically kept for 7 – 10 years after the person leaves the organisation, then keep the Teams chats for the same period.
Note that, even if a poster deletes an item (if that option is enabled), it will still be retained if there is a retention policy.
Suggestions for retention in MS Teams
As there can be different retention requirements, depending on the subject matter, here are some suggestions for retention:
One-to-one chat is like email: you will never know everything that is being said or sent. So a single retention policy that mirrors email would be appropriate.
Teams chat is more likely to be about the subject of the Team, which is based on an Office 365 Group that has its own mailbox and a SharePoint site. In this case, you could consider a retention policy applied to all Office 365 Groups or specific Groups – for example ‘Project Groups’ – and then ensure that the retention policy or policies cover all aspects of the Office 365 Group (mailbox, team chat, SharePoint).
If all the records relating to a particular subject matter (including email, chat and documents) must be retained for 25 years, then you need to understand all the options.
It underscores the need to plan carefully for retention management for all the key workloads in Office 365.
On-premise versions of SharePoint were standalone systems, usually administered by a trained and qualified SharePoint Administrator. Records managers may or may not have had access to or a role in that environment.
Generally, the only other group that would typically have access to SharePoint on-premise were the DBAs who managed the (SQL) database.
SharePoint Online is no longer a standalone system but a core part of the Office 365 ecosystem.
This post describes, for records and information managers, how SharePoint Online needs to be understood in the context of the broader Office 365 administration, and how other admin roles can configure or change settings that can affect SharePoint and OneDrive for Business.
The highest level admin role in Office 365 is the Global Admin (GA).
To protect the security of Office 365, there should be a very small number of GAs. GAs should have unique cloud-only log-ons, preferably using multi-factor authentication for added security. End-user accounts should NEVER be assigned the GA role.
GAs can access everything across Office 365, including the content of emails, SharePoint, OneDrive for Business and MS Teams. All activity carried out by GAs (and anyone else) is recorded in the audit logs.
Organisations that outsource the GA role to third-party companies need to be aware of the capability of the GA role and, ideally, also have at least one GA log-on account so they can, among other things, access the tenant and review the audit logs if required.
The key activities that GAs are responsible for, that impact on the management of SharePoint, are as follows.
Assigning licences. Licences (e.g., E3) provide user access to the various applications in Office 365, including Exchange, SharePoint, OneDrive for Business, MS Teams and Office (via http://www.office.com). Generally speaking it is inadvisable to remove individual options from licences. Note that the SharePoint licence only gives access to use the application; it is not the admin role (next point).
Assigning roles. Roles provide admin access to the core applications (listed in the previous point) and to a range of activities (for example, Billing, Compliance, Security, User Admin). Office 365 Admin roles should always be cloud only and never assigned to normal end-user accounts. This ensures that the person logs on specifically to perform an admin activity, as opposed to a general end-user activity. It is common (and good) practice for admins to be logged on to two separate accounts at the same time.
Creating Groups. Groups are Azure/Exchange objects. The three main types of groups are: (a) Security Groups that control access to resources but are not email enabled; (b) Distribution Lists that provide the ability to email multiple people but don’t control access to resources; and (c) Office 365 Groups that are a cross between Security Groups and Distribution Lists, with much more capability. Office 365 Groups are a core element across Office 365. Every O365 Group has (a) an email mailbox and (b) a SharePoint site. If the ability to create these types of groups is not controlled, every new Team in MS Teams will create an O365 Group with a SharePoint site (with no controls on naming). Accordingly, there needs to be close cooperation between the GA, the SharePoint admin and/or the records/information manager in relation to the creation of O365 Groups.
Enabling external access for SharePoint. This setting allows the GA to determine whether SharePoint sites and OneDrive for Business, and the content in them, can be shared externally. The setting only makes the option available for SharePoint sites but allows ODfB content to be shared externally. Individual sites must still be enabled (by the SharePoint admin) for external access.
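The three group types described in the ‘Creating Groups’ point can be summarised as a small lookup table, which may help when deciding which new groups need a records manager's attention. This is a minimal sketch of the distinctions as stated in this post; the class, field and key names are hypothetical and do not correspond to any Azure or Exchange object model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupType:
    name: str
    controls_access: bool
    mail_enabled: bool
    creates_sharepoint_site: bool


# Characteristics as summarised in this post (illustrative only).
GROUP_TYPES = {
    "security": GroupType("Security Group", True, False, False),
    "distribution": GroupType("Distribution List", False, True, False),
    "office365": GroupType("Office 365 Group", True, True, True),
}


def needs_site_governance(kind: str) -> bool:
    """True if creating this kind of group also creates a SharePoint site."""
    return GROUP_TYPES[kind].creates_sharepoint_site
```

Only the Office 365 Group type returns True here, which is why its creation is the one that needs coordination between the GA, the SharePoint admin and the records manager.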
SharePoint/OneDrive for Business Admin
The SharePoint Admin will normally be a qualified SharePoint administrator and may have administered earlier versions of SharePoint. They will also generally be the OneDrive for Business admin (as OneDrive is a SharePoint service).
The SharePoint Online admin role is much less complex in Office 365 than it was in the on-premise version. Records managers who currently manage an EDRMS could potentially become a SharePoint admin, with some training.
Additional training is required only if the organisation wishes to do additional customisation or development work, integration, or has third-party applications.
The SharePoint Admin has a number of roles:
Configuring SharePoint settings in the admin portal. This is usually a one-off activity that may be reviewed from time to time. Configuration settings should be documented.
Creating new SharePoint standard and communication sites – but NOT ‘modern’ team sites that are based on Office 365 Groups, as noted above. These should be created by the GAs, who will need to be advised about (a) preferred naming conventions (if any) and (b) Group ownership and membership (which flows through to SharePoint site ownership and membership).
Provisioning new sites. This activity involves changing site collection features and site features to enable things like Document IDs and Document Sets. It also includes assigning the initial Site Collection Admin and Site Owner permissions. It may also include some basic additional options such as a new document library or list.
Assigning access and permissions. Records managers who have responsibility for managing records in SharePoint should be added to the Site Collection Admin section, ideally as part of a Security Group. This ensures that records managers can access all SharePoint sites as required (including the Preservation Hold library on sites where implicit retention policies have been applied) and, if they have the responsibility to do so, create and configure new document libraries to manage records. Both Site Collection Admins and Site Owners can apply explicit (visible) retention policies to document libraries and lists, if used.
Monitoring and managing the SharePoint environment, including resolving issues and working with Site Owners.
Managing the OneDrive for Business admin portal, including setting (a) the size of the ODfB storage and (b) the retention period for ODfB accounts after an end-user leaves.
Providing training to Site Owners, if no other training is provided.
The relationship between the various Office 365 admin elements, SharePoint admin, and the end-user experience is described in the graphic below.
SharePoint admins access the SharePoint admin portal by logging on to http://www.office.com, clicking on the ‘Admin’ option, then the SharePoint admin portal (or directly to that admin portal if they save it as a favorite).
End-users access SharePoint by logging on to http://www.office.com and clicking on the ‘SharePoint’ app, or via the mobile app.
Exchange Online Admins
The primary role of the Exchange Online (EXO) admin is to manage that application. The EXO admin may also be the MS Teams admin – see below.
If the creation of Office 365 Groups is not controlled as noted above, both EXO admins and end users can create a new Office 365 Group from Exchange or Outlook which in turn creates a new SharePoint site.
While emails can be copied from Exchange to SharePoint, Microsoft’s model assumes that the vast majority of emails will remain in end-user mailboxes.
Records managers need to work closely with the EXO admin/s and the Compliance admin/s (see below) to ensure that an appropriate Office 365 retention policy is applied to the content of the mailboxes. There may also be a requirement to remove the default MRM policies.
An Office 365 retention policy may initially appear to conflict with previous backup strategies deployed to recover mailboxes in case of disaster or for investigation purposes, but it can support and replace them. This means that a single retention policy that keeps all emails for a specific period of time will be applied to all mailboxes.
MS Teams Admins
The role of the MS Teams admin is to configure and manage the MS Teams environment. As noted above, the EXO admin may also be assigned the role of MS Teams admin.
MS Teams includes two main component parts:
Chat. One-to-one chats are stored in a hidden folder in the Exchange mailboxes of individual users. Channel chats are stored in a hidden folder in the mailbox of the linked Office 365 Group. These hidden folders are not subject to a retention policy applied to the rest of the mailbox.
Documents. These are stored in either (a) OneDrive for Business for one-to-one chat, or (b) in the linked SharePoint site for Teams channels.
Records managers should work with the MS Teams admin and the Compliance Admin to identify how retention policies will be applied to both the chat and SharePoint content in MS Teams.
Microsoft separated the Security and Compliance portals in early 2020. Consequently, there may be an admin to manage each component part – one for Security and one for Compliance.
The Compliance admin portal includes a range of actions relating to the management of information. These actions include:
Data classification. This option is still in preview but, for E5 licence holders, it will allow data to be classified automatically and retention policies applied to that content as an alternative to pre-defined (SharePoint site/library) ‘classification’.
Setting and monitoring alerts.
Viewing reports on various compliance matters, including the status of retention policies.
Creating and monitoring retention labels and policies. This includes retention policies for Exchange mailboxes, SharePoint Online, OneDrive for Business, and MS Teams.
Creating and monitoring data loss prevention policies.
Assigning permissions to individuals.
Managing GDPR data subject requests.
Searching audit logs (90 days of history only).
Searching for content across all of Office 365.
Reviewing disposition for records covered by explicit retention label policies, where this option is enabled.
Some or all of these roles may be performed by senior records or information managers.
The Security admin portal provides access to the following actions, some of which may impact on SharePoint (sensitivity labels in particular).
Reviewing security related reports
Creating and managing sensitivity labels and information types (and also creating and publishing retention labels)
Creating a range of security-related policies, including for devices and threat protection.
SharePoint on-premise was a standalone system that generally did not interact or integrate much with other systems.
SharePoint Online is a core part of the broader Office 365 ecosystem. A range of roles and configuration settings set across that ecosystem have – or can have – a direct impact on SharePoint Online.
Records managers who are involved with SharePoint Online need to understand this crucial difference and either learn or seek to be assigned key roles that impact on the management of records across the Office 365 ecosystem, not just in SharePoint.
The retention of records in Exchange Online (EXO), SharePoint Online (SPO), OneDrive for Business (ODfB) and Office 365 (O365) groups can be achieved through the application of retention labels published in the O365 Security and Compliance admin portal.
This post describes:
How retention labels work (in summary), including the ‘per record’ rather than the container/aggregation retention model.
What happens to content in Office 365 when a retention period expires.
The options and actions that may influence the way retention labels/policies are configured, where and how they are applied, and the outcomes required.
The post highlights the need for information and records managers to be involved in all aspects of governance, site architecture and design, and decisions around specific settings and configuration, as well as being assigned specific roles, when Office 365 is implemented.
A quick summary of how O365 retention labels work
Records retention policies in O365 are based on ‘retention labels’ that are created in the O365 Security and Compliance admin portal under the ‘Classifications’ section. Multiple labels can be applied to a single policy.
Click this link to read Microsoft’s detailed guidance on retention policies.
Each retention label defines one of three potential outcomes at the end of the retention period, if retention is enabled, ‘keep forever’ is not selected, and the label is not used to classify the content as a record*:
The content will be automatically deleted. If the content is in SharePoint, it will first be sent to the Recycle Bin, from which it can be recovered within 90 days.
This option may be suitable for certain types of low value records.
A disposition review will be triggered to notify specific people. As with the previous point, SharePoint content will be sent to the Recycle Bin if a decision is made to delete it.
This option will require additional, human-intervention actions, as described below, if standard records management disposal review processes are followed.
The date when the above action will occur is based on one of four triggers:
Date created
Date last modified
When the label was applied
An event. The ‘out of the box’ (OOTB) event types are:
Employee activity. (Processes related to hiring, performance and termination of an employee)
Expiration or termination of contracts and agreements.
Product lifetime. (Processes relating to last manufacturing date of products).
A new event can also be added.
See this post for Microsoft guidance on event-driven retention.
An additional alternative option is available: ‘Don’t retain the content, just delete it if it’s older than n days/months/years.’ This is similar to the automatic deletion option above and may be suitable for certain types of records.
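The relationship between a label's trigger, retention period and outcome can be sketched as follows. This is a minimal model for illustration, assuming whole-year periods; the class, field names and trigger keywords are hypothetical and are not taken from the Office 365 admin portal or any Microsoft API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionLabel:
    name: str
    trigger: str   # "created", "modified", "labelled" or "event"
    years: int     # retention period (whole years only in this sketch)
    outcome: str   # "delete" or "review"


def add_years(start: date, years: int) -> date:
    """Add whole years, mapping 29 February to 28 February if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February in a non-leap target year
        return start.replace(year=start.year + years, day=28)


def action_date(label: RetentionLabel, dates: dict) -> date:
    """Date on which the label's outcome (delete or review) falls due."""
    return add_years(dates[label.trigger], label.years)
```

For example, a seven-year label triggered by ‘when the label was applied’ on 30 June 2013 would fall due on 30 June 2020, regardless of when the document was created.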
Declaring content as records
* The option to classify or ‘declare’ content as a record is not discussed further as it relates to the way records are managed in the US. Microsoft’s guidance on labels notes that: ‘At a high level, records management means that: (a) Important content is classified as a record by users. (b) A record can’t be modified or deleted. (c) Records are finally disposed of after their stated lifetime is past.’ The standard on records management, ISO 15489, defines a record as ‘evidence of business activities, often (but not exclusively) in the form of a document or object, in any form’. This means that anything can be a record. The record may continue to be modified throughout its life.
When do retention labels become active?
Retention labels become active only when they are published. As part of the publishing process, a decision must be made if the label will apply to all (a single option) or selected parts of the O365 ecosystem:
The Exchange Online (EXO) mailboxes of all or specific recipients, or excluding specific recipients.
All or specific SharePoint Online (SPO) sites, or excluding specific sites.
All or specific OneDrive for Business (ODfB) accounts, or excluding specific accounts.
All or specific O365 Groups, or excluding specific groups. Note that content in Microsoft Teams (MS Teams) is included in the O365 Groups options that include both the SharePoint content and email/Teams chat content.
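The ‘all, specific, or excluding specific’ choice made at publishing time amounts to a simple include/exclude test for each location. The sketch below shows that logic only; the function name and the example site URLs are hypothetical, and it makes no claim about how Office 365 evaluates scope internally.

```python
from typing import Iterable, Optional


def in_scope(location: str,
             included: Optional[Iterable[str]] = None,
             excluded: Iterable[str] = ()) -> bool:
    """Return True if a published label would cover this location.

    included=None stands for the 'all' option; exclusions always win.
    """
    if location in set(excluded):
        return False
    return included is None or location in set(included)


# Hypothetical site addresses, for illustration only.
HR_SITE = "https://contoso.example/sites/hr"
SANDBOX_SITE = "https://contoso.example/sites/sandbox"
```

Writing the intended scope down in this form before publishing makes it easier to confirm that sites such as test or sandbox sites have been deliberately excluded.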
Auto-applying retention labels
Both the retention label and policy sections include the ability to auto-apply a retention policy if certain conditions are met.
Sensitive information types. These are the same types that appear in the Data Loss Prevention (DLP) section, for example ‘Financial data’ or ‘Privacy data’.
Content types and metadata (E5 licences only). See this post by Joanne Klein for a description of these options.
The ability of the first two options to accurately identify content and apply a retention policy should be investigated before they are relied on.
If you publish retention labels to SharePoint or OneDrive, it can take one day for those retention labels to appear for end users. In addition, if you publish retention labels to Exchange, it can take 7 days for those retention labels to appear for end users, and the mailbox needs to contain at least 10 MB of data.
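The publishing delays quoted above can be turned into a simple planning aid. The sketch below treats the one-day and seven-day figures as upper bounds and returns no date for an Exchange mailbox under 10 MB; the function name and the workload keywords are assumptions made for illustration.

```python
from datetime import date, timedelta
from typing import Optional

PUBLISH_DELAY_DAYS = {"sharepoint": 1, "onedrive": 1, "exchange": 7}
MIN_MAILBOX_MB = 10  # mailbox size threshold quoted above


def labels_expected_by(workload: str, published: date,
                       mailbox_mb: Optional[float] = None) -> Optional[date]:
    """Date by which published labels should be visible, or None."""
    if workload == "exchange" and (mailbox_mb is None
                                   or mailbox_mb < MIN_MAILBOX_MB):
        # Labels are not expected to appear in very small mailboxes.
        return None
    return published + timedelta(days=PUBLISH_DELAY_DAYS[workload])
```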
In EXO, the default MRM policy needs to be removed before the new policy applies.
In ODfB, the policy is available to be manually applied on folders or documents. It does not automatically apply to content.
In SPO, the policy can be applied to document libraries or documents. To avoid preventing users from deleting documents that they legitimately need to delete in an active library, it is recommended to apply the policy after the document library has ceased to be active.
Content in Office 365 Groups is covered by either the EXO policy (for email/Teams chat content) or the SPO policy (applied to libraries).
Retention labels apply to individual records within aggregations
Retention labels can be applied to aggregations of records (an entire email mailbox or folder, a SharePoint library or list, an ODfB account, O365 Groups) or individual records. However, the disposal process targets individual records (e.g., individual emails, single documents in SharePoint libraries, individual list items).
That is, even when all the individual records are disposed of, the parent aggregation remains in place, without any indication (sometimes known as a ‘stub’) that the records previously stored in it have been destroyed.
This outcome has implications for the way the outcome of a retention label is set. It requires a choice between (a) delete automatically without review or (b) review before delete.
The latter option is made complicated by the requirement to review individual documents, including potentially in the original container (document library in SPO) and export metadata relating to those records if a record of the deletion is to be retained.
What happens when records reach the end of their retention period
As noted above, the outcome at the end of the retention period (trigger date + n days/months/years) will depend on the settings on the label, in particular:
Where the label was applied (EXO mailbox, SPO library or list, ODfB folder or document, O365 Group)
Whether the records would be deleted automatically or be subject to a disposition review.
If the records are to be deleted automatically:
SPO and ODfB records will be sent to the site/ODfB Recycle Bin for 90 days
EXO emails will be moved to a ‘Cleanup’ area for 14 days, before permanent deletion.
Aside from the audit logs (which by default only go back 90 days), no other record will be kept of the destroyed records.
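The holding periods above give a rough date for permanent deletion once a record has been deleted automatically. The sketch below simply adds the 90-day Recycle Bin and 14-day ‘Cleanup’ figures quoted in this post to the date the retention period ended; the function name and workload keywords are hypothetical, and actual timing in a tenant may differ.

```python
from datetime import date, timedelta

# Holding periods after automatic deletion, as quoted in this post.
HOLDING_DAYS = {"sharepoint": 90, "onedrive": 90, "exchange": 14}


def permanent_deletion_date(workload: str, retention_end: date) -> date:
    """Approximate date after which a deleted record cannot be recovered."""
    return retention_end + timedelta(days=HOLDING_DAYS[workload])
```

Any metadata that needs to survive the deletion would have to be captured before this date.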
If the records are subject to a disposition review, an email is sent to the person nominated. When that person clicks on the link in the email they are taken directly to the ‘Dispositions’ sub-section of the Records Management section of the O365 Security and Compliance centre.
It is arguable that retention policies with disposition review should not be applied to ODfB content as this will require the reviewer to review all the content that has been labelled by a user in their ODfB account.
For more information about this subject see this Microsoft page ‘Overview of disposition reviews‘. Microsoft note, on that page ‘To get access to the Disposition page, reviewers must be members of the Disposition Management role and the View-Only Audit Logs role. We recommend creating a new role group called Disposition Reviewers, adding these two roles to that role group, and then adding members to the role group.’
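The two-role requirement quoted above can be checked with a simple set comparison before nominating a reviewer. This is a sketch only: the role names come from the Microsoft quote, but the functions are hypothetical and do not query Office 365.

```python
REQUIRED_ROLES = {"Disposition Management", "View-Only Audit Logs"}


def missing_roles(assigned_roles: set) -> set:
    """Roles a proposed reviewer still needs to access the Disposition page."""
    return REQUIRED_ROLES - set(assigned_roles)


def can_review(assigned_roles: set) -> bool:
    """True only if both required roles are held."""
    return not missing_roles(assigned_roles)
```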
The dispositions dashboard shows the number of records that are pending disposition against each retention policy label:
Pending disposition tab
When the reviewer clicks on one of the retention policies listed, the following view opens for records ‘Pending disposition’:
An important point to note here is that records are listed individually, not in logical aggregations or collections. It is possible however to use the Search option on the left to filter by author (emails) or SharePoint site and/or site library. It is also possible to export the details (which does not include any unique metadata applied to documents in SharePoint libraries).
All the records displayed may then be selected and a ‘Finalise decision’ dialogue box appears with the following options:
Dispose of the records.
Extend the retention.
Re-label the records.
Disposed items tab
The Dispositions dashboard includes a ‘Disposed items’ tab.
Microsoft note that this tab ‘… shows dispositions [that] were approved for deletion during a disposition review and are now in the process of being permanently deleted. Items that had a different retention label applied or their retention period extended as part of a review won’t appear here.’
Importantly, once records are permanently deleted, they no longer appear in the ‘Disposed Items’ tab. This means that no record will be kept of the records that were destroyed.
Shortcomings of the O365 dispositions/disposal model for records stored in SPO
Only individual records appear, not all the items in a document library
If the retention outcome is based on the ‘created’ or ‘last modified’ date, individual records in SPO document libraries will start to appear as soon as they reach the retention end date. The reviewer may need (or want) to view the original library, which they can identify from the link in the dispositions review pane.
Retention policies prevent deletion
As a retention label prevents the deletion of content by users, and this may put them off using SharePoint, it is recommended that retention in SPO document libraries be based on when the label was applied NOT when it was created or last modified. This will help to ensure that all documents appear in the disposition review area at the same time.
Event based triggers may not be suitable for disposition review
If the retention outcome is based on an event, or is auto applied and a disposition review is required, those records will appear randomly when the event is triggered. It could be difficult for records managers to decide the disposal outcome in this way without referring back to the library.
The dispositions review pane does not display the original metadata
The dispositions review pane displays only very basic metadata from the original library. Again, the reviewer may need to view the original library, export the metadata and store that in a secure location. Note that the exported metadata includes the URL of each original record including the library name.
The document library remains even when all contained records are destroyed
If the reviewer chooses to dispose of the records listed, only the content of the library (the individual documents stored in it) is deleted, not the actual library itself. No record (e.g., a ‘stub’ of the deleted item) is kept in the library of the deleted content.
The ‘Disposed items’ tab only shows records being destroyed
The ‘Disposed items’ tab only shows records in the process of being destroyed. It does not keep a record of what was destroyed. Records managers will need to retain the metadata of what was destroyed, when, based on what disposal authority, and with whose approval.
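One way to keep that record is a simple disposal register written at the time of the review. The sketch below writes one row per destroyed record to CSV; the column names and the sample values are hypothetical, and the source of the rows (for example, metadata exported from the library) is assumed.

```python
import csv
import io

# Hypothetical register columns: what, when, under what authority, approved by whom.
FIELDS = ["url", "title", "destroyed_on", "disposal_authority", "approved_by"]


def write_register(rows, out) -> None:
    """Write destroyed-record metadata to an open text file or buffer."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


sample = [{
    "url": "https://contoso.example/sites/finance/Invoices/inv-001.pdf",
    "title": "Invoice 001",
    "destroyed_on": "2020-06-30",
    "disposal_authority": "Company Records - 7 years",
    "approved_by": "Records Manager",
}]
buffer = io.StringIO()
write_register(sample, buffer)
```

The register itself is a record and would need to be stored somewhere with its own (long) retention period.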
Dispositions really only provides a ‘heads up’ for further action
The Dispositions process may be instead used as a form of ‘heads up’ that records are starting to be due for disposal in a document library. This would allow the records managers (who should be Site Collection administrators) to review the library, export the complete set of metadata, and decide if the entire library can be deleted since it is no longer required.
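The ‘heads up’ approach can be supported by grouping the individually listed items by library, to see which libraries have had every document reach disposition. The sketch below assumes the exported details include a library name for each item and that the total number of documents in each library is known from elsewhere; both the field name and the function are hypothetical.

```python
from collections import Counter


def libraries_fully_due(pending_items, library_sizes):
    """Libraries where every document is now pending disposition.

    pending_items: dicts with a 'library' key (assumed export column).
    library_sizes: total number of documents currently in each library.
    """
    due = Counter(item["library"] for item in pending_items)
    return sorted(
        library for library, count in due.items()
        # Libraries of unknown size are never reported as fully due.
        if count >= library_sizes.get(library, float("inf"))
    )
```

A library that appears in this list is a candidate for the full review, metadata export and deletion described above; one that does not is still only partly due.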
Retention labels in O365 are an effective way of managing the retention and disposal of records in that environment, subject to the following points.
Emails will likely continue to be managed as complete aggregations of records – the mailbox. Users cannot be expected to create logical groupings and apply individual retention labels to those records.
Organisational records policies may mandate specific timeframes for the retention of email (e.g., 1 year), while HR/IT security policies may mandate that whole mailboxes are retained for a period of time after employees leave. It is important to understand the difference between these two models.
Options to automatically transfer emails to SharePoint document libraries via rules may be possible using Flow but these rely on individual users to set up.
Consideration should instead be given to using O365 Group mailboxes, rather than individual personal mailboxes, for specific work related matters. For example, ‘Customer Complaints’, or ‘XYZ Project’.
OneDrive for Business Accounts
ODfB accounts may be covered by two forms of retention:
Retention labels that apply to all ODfB accounts while the account is active. These must be manually applied by users.
A separate retention period set for ODfB accounts after a user leaves the organisation.
If there is a requirement to prevent the deletion of content by a user from their ODfB account, the better way to achieve this is using an eDiscovery case with Legal Hold applied.
As most records will be stored in SharePoint document libraries (including Office 365 Group-based SP libraries), multiple retention labels will be required to address different types of content or retention requirements.
Careful consideration should be given to whether records can be deleted automatically at the end of the retention period or should be subject to disposition review, noting that the automatic deletion provides no opportunity to capture the metadata of the records.
The ‘auto-apply’ or event-based retention option should be used sparingly to avoid a trickle of records for disposal – unless there is enough trust that these can be accurately marked and deleted without review.
Shortcomings in the disposition review process support the following decisions for SharePoint Online content:
The number of retention labels should be minimised to avoid a very long drop-down menu when a label is applied. If current record retention or disposal authorities contain a lot of classes, some of these could potentially be combined into a single class (e.g., ‘Company Records – 7 years’), while the site name and document library name should provide some context to the content to ‘map’ back to the original classes.
Retention labels should be applied when document libraries (or lists) become inactive as this will avoid conflict with users who want to delete content and also ensure that documents are ready for disposition review at the same time.
Retention labels applied to SPO document libraries should include the disposition review option unless a ‘delete only’ label is considered suitable for certain document libraries that clearly contain working documents or Redundant, Outdated and Trivial (ROT) content.
Records managers should review the content of all or most original SPO document libraries, and export the metadata of those libraries for storage in a separate location (such as an ‘archives’ site), or in the original library with the retention label changed to ‘Never Delete’. The original document library can then be deleted.
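The suggestion above to combine disposal classes into fewer labels can be documented as a simple mapping, so that each label can still be traced back to the original classes. The class codes and the second label name below are invented for illustration; only the ‘Company Records’ seven-year example comes from this post.

```python
from collections import defaultdict

# Hypothetical disposal classes mapped to a reduced set of retention labels.
LABEL_FOR_CLASS = {
    "FIN-01": "Company Records - 7 years",
    "FIN-02": "Company Records - 7 years",
    "ADM-09": "Working Documents - delete only",
}


def classes_by_label(mapping):
    """Invert the mapping: which original classes does each label cover?"""
    grouped = defaultdict(list)
    for disposal_class, label in mapping.items():
        grouped[label].append(disposal_class)
    return {label: sorted(classes) for label, classes in grouped.items()}
```

The number of keys in the inverted mapping is the length of the drop-down menu that users will see, which gives a quick check on whether the label set is still small enough.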