When people chat in Microsoft Teams (MS Teams), a ‘compliance’ copy of the chat is saved to either personal or (Microsoft 365) Group mailboxes. This copy is subject to retention policies, and can be found and exported via Content Search.
But what happens if there is no Exchange Online mailbox? It seems the chats become inaccessible which could be an issue from a recordkeeping and compliance point of view.
This post explains what happens, and why it may not be a good idea (from a compliance and recordkeeping point of view) not to disable the Exchange Online mailbox option as part of licence provisioning.
Licences and Exchange Online mailboxes
When an end-user is allocated a licence for Microsoft 365, a decision (sometimes incorporated into a script) is made about which of the purchased licences – and apps in those licences – will be assigned to that person.
E1, E3 and E5 licences include ‘Exchange Online’ as an option under ‘Apps’. This option is checked by default (along with many of the other options), but it can be disabled (as shown below).
If the checkbox option is disabled as part of the licence assigning process (not after), the end-user won’t have an Exchange mailbox and so won’t see the Outlook option when they log on to office.com portal. (Note – If they have an on-premise mailbox, that will continue to exist, nothing changes).
Having an Exchange Online mailbox is important if end-users are using MS Teams, because the ‘compliance’ copy of 1:1 chat messages in MS Teams are stored in a hidden folder (/Conversation History/Team Chat) in the Exchange Online mailbox of every participant in the chat. If the mailbox doesn’t exist, those copies aren’t made and so aren’t accessible and may be deleted.
If end-users chat with other end-users who don’t have an Exchange mailbox as shown in the example below, the same thing happen – no compliance copy is kept. The chat remains inaccessible (unless the Global Admins take over the account).
The exchange above, between Roger Bond and Charles, includes some specific key words. As we will see below, these chats cannot be found via a Content Search.
(On a related note, if the ability to create private channels is enabled and they create a private channel and chat there, the chats are also not saved because a compliance copy of private channel chats are stored in the mailboxes of the individual participants.)
Searching for chats when no mailbox exists
As we can see above, the word ‘mosquito’ was contained in the chat messages between Roger and Charles.
Content Searches are carried out via the Compliance portal and are more or less the same as eDiscovery searches in that they are created as cases.
From the Content Search option, a new search is created by clicking on ‘+New Search’, as shown below. The word ‘mosquito’ has been added as a keyword.
We then need to determine where the search will look. In this case the search will look through all the options shown below, including all mailboxes and Teams messages.
When the search was run, the results area shows the words ‘No results found’.
Clicking on ‘Status details’ in the search results, the following information is displayed – ‘0 items’ found. The ‘5 unindexed items’ is unrelated to this search and simply indicates that there are 5 unindexed items.
Double-checking the results
To confirm the results were accurate, another search was conducted where the end-user originally did not have a mailbox, and then was assigned one.
If the end-user didn’t have a mailbox but the other recipient/s of the message did, the Content Search found one copy of the chat message in the mailbox of the other participants. Only one item was found.
When the Exchange Online option was enabled for the end-user who previously did not have a mailbox (so they were now assigned a mailbox), a copy of the chat was found in the mailbox of both participants, as shown in the details below (‘2 items’).
Summary and implications
If end users chat in the 1:1 area of MS Teams and don’t have an Exchange Online mailbox, no compliance copy of the chat will be saved, and so it will not be found via Content Search.
If any of the participants in the 1:1 chat have an Exchange Online mailbox, the chat will appear in the mailboxes of those participants.
If all participants in the 1:1 chat have an Exchange Online mailbox, the chat will be found in the mailbox of all participants.
Further to the above:
If end users can delete chats (via Teams policies) and don’t have a mailbox, no copy of the chat will exist.
If end-users with a mailbox can delete Teams chats, but a retention policy has been applied to the chats, the chats will be retained as per the retention policy (in a hidden folder).
And finally, if you allow private channels, end-users can create private channels in the Organisation Team. The chats in these private channels are usually stored in the personal mailboxes of participants (not the Group mailbox) – so these chats will also be inaccessible and cannot be found via Content Search.
The implications for the above are that, if you need to ensure that personal chat messages can be accessed (from Content Search), then the participants in the chat must have an Exchange Online mailbox.
Further, if you allow deletion of chats but need to be able to recover them for compliance purposes, a retention policy should be applied to Teams 1:1 chat.
(Updated 3 September 2020 with reference to customised admin roles)
Microsoft 365 is a cloud-based collaboration and content system that includes a wide range of functionality to create, capture and manage records, primarily in SharePoint Online but also in OneDrive for Business, Exchange Online and in MS Teams.
This post outlines the roles and permissions required by records managers to manage records in Microsoft 365.
Whether all the roles and permissions will be granted may depend on a number of factors including technical competence, security and risk. Where they are not granted, records managers will need to ensure that the relevant IT resources can and will set up and manage the recordkeeping functionality as required.
Azure AD/Microsoft 365 Admin Center roles
There are around 50 roles that can be assigned to individuals in the Microsoft 365 admin center or the Azure Admin portal (which includes 11 more roles).
These roles may be grouped as follows:
Admin. For example, Global Admin and the Admins for Exchange Online, MS Teams, and SharePoint Online/OneDrive for Business.
Security and Compliance. For example, Security Admin, Compliance Admin, Compliance Data Admin
Identity management. For example, Authentication Admin, Guest Inviter, Licence Admin, Password Admin, User Admin
Device management. For example, InTune Admin, Printer Admin
Reader. For example, Global Reader, Message Center Reader, Reports Reader, Security Reader
There is no specific ‘records manager’ role in Microsoft 365. The closest in terms of functionality is the Compliance admin role that includes several several sub-roles including ‘RecordManagement’, ‘Disposition Management’ and ‘Retention Management’. Alternatively, a custom role may be created with those (and a couple of other) sub-roles, thereby restricted access to only the sub-roles that are specific to or required by records managers.
In addition to the role and sub-roles required to access the Compliance portal and carry out records management activities, records managers should also be assigned the Global Reader and Reports Reader roles so they can access and view the various dashboards on the Microsoft 365 admin center:
Audit logs. These cover the entire Microsoft 365 environment, kept for only 3 months (E3) or 12 months (E5).
Content Search (effectively eDiscovery)
Information Governance (where retention labels and retention policies are created and managed)
Records Management (which is essentially an extended set of IG functionality, including auto-application of labels, available to E5 licence holders, and disposition management)
Access to the Compliance admin portal is restricted to the Global Admins and Compliance Admin and Compliance Data Admin roles. These two roles include various sub-roles (including sub-roles that are not relevant to records management) that are described in considerable detail in this Microsoft page ‘Permissions in the Security & Compliance center‘.
The sub-roles that are most relevant to records managers are:
RecordManagement (required to manage and dispose record content)
Retention Management (required to create retention labels)
View-Only Audit Logs (audit logs cannot be modified)
Disposition Management (required to manage disposition)
Compliance Search (required to conduct a global ‘case’ search of anything anywhere in the Microsoft 365 platform, including ‘personal’ mailboxes and 1:1 Teams chats)
It is recommended that records managers – or select individuals with higher compliance responsibilities, be assigned either to one of the two Compliance Admin roles, or a custom role group with just the sub-roles listed above. This will enable records managers to access the Compliance portal to create, apply and manage records retention policies. They will also have access to the audit logs and content search options.
Note: The ‘Audit logs’ sub-role is actually assigned via a role group in the Exchange Online admin portal under the Permissions section. The three key roles in this section that contain these sub-roles are ‘Organisation Management’, ‘Compliance Management’ and ‘Records Management’. As the first two contain a very long list of sub-roles, it is recommended that the records manager/s be added to the ‘Records Management’ role group that includes the ‘Audit logs’ and ‘Retention Management’ sub-roles.
SharePoint Admin roles
From an admin point of view, there are essentially three SharePoint admin roles:
SharePoint administrator. This person has access to the SharePoint admin portal, manages the settings, creates and provisions new sites, and monitors the environment. They are usually also responsible for troubleshooting issues and may have some responsibility for development (including scripts) and customisations or integrations. Subject to the size and complexity of the environment, a records manager with good technical skills, including being an EDRM system admin, may be able to take on the role of SharePoint admin with some training. In most cases, however, this is likely to remain a specialised IT role.
Site Collection Administrator. This role sits between the SharePoint Admin and the Site Owner role and provides ‘back-end’ access to the SharePoint site. Generally speaking, the SharePoint Admin will always be a Site Collection Administrator, ideally added via an AD Security Group. If records managers are added to this AD Security Group, and that Group is added to the Site Collection Admin section of every SharePoint, they will have the ability to access every site (with all access and actions recorded in the audit logs). This access can be revoked on individual sites if necessary.
SharePoint Site Owner. The person assigned to this role will usually be someone working in the business area or group responsible for day to day management of the site. Records managers should not be Site Owners as this suggests that the records managers have day to day responsibility for managing the site (creating libraries for example).
Other factors to consider
Any content stored in OneDrive for Business accounts, Exchange mailboxes and MS Teams will remain accessible via a Content Search as long as it exists. If no retention policy has been applied to these workloads and the end-users deletes that content, there is no way to retrieve the deleted content after minimum periods (90 days for ODfB, 14 days for Exchange mailbox content).
The OneDrive portal includes a Storage section that determines how long the content will be retained after the account becomes inactive. This is separate from any retention policy that may be applied to the accounts via the Compliance portal. Records managers should understand these two elements (retention and storage period).
The Exchange Online admin portal includes a number of legacy recordkeeping elements, in particular the Messaging Records Management (MRM) policies in the compliance/retention policies section. Records managers do not need to be assigned the role of Exchange Online admin but need to engage with the admins regarding the application of Microsoft 365 retention policies. While it is possible to apply label-based retention policies to Exchange mailboxes, including advanced auto-application with E5 licences, in practice it may be much simpler to apply a few broad non-label retention policies to mailboxes.
The MS Teams admin portal does not include any recordkeeping settings or elements. However, the records manager should discuss and determine suitable retention requirements for both 1:1 chats and channel chats with the relevant admin. These are created and added via the Compliance admin portal. It is not possible to apply a label-based retention policy to Teams chats, accordingly there is (currently) no disposition review record of what was destroyed.
Records managers need an appropriate level of access to the Microsoft 365 ecosystem to manage records that have been created, captured and stored across the system. The following is recommended:
Global reader and Reports reader. These two roles provide read-only access to dashboards in the Microsoft 365 Admin portal, allowing records managers to review volumes and activities in the various workloads.
Compliance admin or a customised role group. The role group allows the creation and management of records retention policies and dispositions. It also provides access to audit logs and global content searches.
SharePoint admin (optional). This role would be suitable for a records manager with the required level of technical competence to manage SharePoint.
SharePoint Site Collection Admin (via a Security Group). This role allows records managers to access every site where the Security Group has been added to the Site Collection Admin group.
Two types of retention policy can be created in Microsoft 365:
Label-based retention policies, where the label is used to define the retention and retention outcomes. Labels must be published in a retention policy, a process that includes determining where the labels will be applied and appear (‘explicit’) to end users.
Non-label-based retention policies, where the policy includes the retention details and the outcomes. As part of the policy creation, these policies are then applied to specific Microsoft 365 workloads where they are mostly invisible to end-users (except in Exchange mailboxes). In SharePoint and OneDrive for Business, these policies create a Preservation Hold library that is only visible to Site Collection Admins and above.
It is possible to apply both a label-based retention policy and a non-label retention policy to the same SharePoint site. In theory, this would allow for (a) everything on the site to be covered by an overarching retention policy and (b) specific libraries or lists to be covered by a label-based policy.
In practice, it gets a little complicated, as described in this post.
Creating the two labels
For the purpose of this post, I will apply the two types of policy to a SharePoint site (‘FinanceAP’) that contains specific types of financial information that needs to be kept for 7 years, but I want to allow other content on the site to be destroyed after 5 years.
Retention labels are created in the Information Governance section of the Compliance admin portal in Microsoft 365. I created a label titled ‘Financial records’ with a retention period of 7 years. I then published that label to a retention policy named ‘Financial Records – 7 years’ and applied it only to the FinanceAP site.
More than one label can be published in the same policy, making this a useful option if your SharePoint architecture ‘maps to your file plan or Business Classification Scheme (BCS) and your records retention classes are based on either. It also allows you to create and add the same retention class for types of records that occur in multiple functions where the classes have the same retention – for example, ‘Meetings – 7 years’ or ‘Policy – 10 years’.
Once the policy has been published to a site or sites, the option (in Library Settings) to ‘Apply label to items in this list or library’ can be used to choose which label will apply to the content in the library, as shown below.
If the column ‘Retention label’ is checked, the retention label name appears in that column.
Non-label retention policy
Non-label retention policies are also created in the Information Governance section of the Compliance admin portal which also (a little confusingly) lists all the label-based policies as well.
The process of creating these policies includes the retention (e.g, 5 years) and retention outcome (delete) definitions, as well as the location where the policy will be applied.
For the purpose of this post I created a retention label named ‘Financial Working Records – 5 years’ and applied it to the same site (only) as the label-based policy.
I should expect now to find a Preservation Hold library (via Site Contents as a SharePoint admin) when something is deleted.
At this point, I have two retention policies, (a) one label-based and applied to the site, and (b) one that applies to the whole site.
What happens now?
In the document library where the label-based policy has been selected, I can see that the retention label (Financial Records) that has been applied to items in this library.
This means that I cannot delete this document unless (as an end-user with edit rights or admins) the retention label is removed. However, as we will see below, another policy is working behind the scenes.
In a document library where no label-based policy has been applied, I can see that no label appears under the Retention label policy. From an end-user point of view, it appears that the record can be deleted – or is it?
As this site is the subject of an ‘implicit’ or invisible retention policy that has been applied to the entire site, any attempt to delete anything will be captured by the back-end Preservation Hold library seen below via Site Contents (visible to Admins only).
Interestingly, any attempt to delete a document from a library where a label-based retention policy has been applied, which is ‘denied’ in the actual library, is recorded in the Preservation Hold library, although the document remains in the original library.
If anyone with access to the Preservation Hold library tries to delete that item there, they will receive this message:
The only way to remove this item is to remove the policy.
There are two ways to create retention policies in Microsoft 365 – (a) by creating a label and publishing it as retention policy, or (b) creating a retention policy without a label. Both are created from the Information Governance section of the Compliance portal in Microsoft 365.
This post describes how the policies are created and applied, and the outcome for each.
Retention policy based on a label
Retention labels are created from the Labels tab of the Information Governance section in the Compliance portal.
Ideally, the name of the label would be based on a disposal class in a records retention schedule or records disposal authority. For the purpose of this post, the new label was named ‘Test One – Retain five years then delete‘.
Adding the details of the retention provided a quick visual clue to the retention details.
The label was configured to have a retention period of 5 years after the date modified after which the content could be deleted, with no disposition review. No record would be kept of what was deleted.
Alternatively, it would be possible to select the Disposition Review option. This option sends an alert to the account indicated so the records could be reviewed before disposal. However, keep in mind that if the trigger is left as ‘Date created’ or ‘Date modified’, records will be ready for disposition review based on those dates. It might be better to choose ‘Date applied’ instead so that a group of records could be reviewed at the same time – for example, in a SharePoint library.
Retention labels have no life unless they are published. This is achieved from the ‘Publish labels’ option in the same tab.
One or more labels can be published in a retention policy. This could be useful if the records disposal authority contained multiple labels for a single function.
As part of the publishing process, a decision has to be made to apply the policy to somewhere in Microsoft 365. In this case, the decision was made to apply it to a single SharePoint site because that site contained records that mapped to the retention label.
Finally, the new policy was given a name. In this case the policy had only one label so the policy had the same name as the label.
If the organisation’s records disposal authority/retention schedule had multiple retention classes mapped to a business function, the policy could be named after the function and include all the relevant classes created as labels.
All of these labels would appear in the drop down list from the Library Settings on every site where the policy was applied.
But even when it was published in this way and applied to a specific SharePoint site, the label does not do anything. The policy has to be applied to a document library in that site – but even then it wouldn’t do anything until the label was selected from a drop down list.
The screenshot below shows how the specific label is selected on the library and whether it should be applied to all existing items.
The label can now be seen in the ‘Retention Label’ column (when made visible in the view settings).
Anyone who uses the library to add and edit content (site members) will notice an obvious change if they try to delete anything, including from their synced document libraries in File Explorer.
There is, however, a workaround that end-users can use to delete documents with a label: (a) check the box next to the document, (b) open the information panel and (c) remove the label on individual documents.
Once the label is removed it no longer appears against the record.
The document can now be deleted. It will move to the Recycle Bin, from where it could be restored for up to 93 days if a mistake was made.
The lessons to be learned from this are that:
It might be useful to make the library read only after the label is applied.
The label should only be applied when the library became inactive, to allow people to get on with their work while it is still active.
If anyone was concerned about content being deleted from an active library, they could set an alert on the library.
In the meantime, for all the other records that retained their label, when the retention period ended the records in the library would be transferred to the Recycle Bin where they would sit for 93 days before complete and permanent deletion.
If no-one restored a document from the Recycle Bin, the content would be deleted, permanently, forever, without any record. The records managers might need to make a copy of the metadata for the documents to be deleted before they are deleted.
Alternatively, if they had selected the option for a ‘disposition review’ when the label was created, this would alert the records managers (and others) of the need to review the contents of the library.
Once everything had been deleted, all that will be left will be an empty library with no record of what used to be there.
Retention policy without a label
Non-label retention policies are created from the ‘Retention’ tab of the Information Governance portal.
For the purpose of this post, this label will be named ‘Label Two – Retain 5 years then destroy’.
This policy has the same retention period, trigger, and action as Label One – 5 years after date modified – and then the content could be deleted without any record of this being kept. [Note – yes the screenshot below shows 7 years].
As the policy was being created it was applied directly to a SharePoint site.
The policy could start working immediately.
Nobody, apart from the person who created and applied it knew it had been applied. It didn’t appear anywhere on the SharePoint site where it had been applied.
But, as part of the application process, it created a Preservation Hold (PH) library visible only to Admins, including the Site Collection Admins but not the Site Owners.
(Note – if end-users are allowed to create SharePoint sites, or Teams in MS Teams that also creates a SharePoint site, they are made Site Collection Admins by default and can accordingly see the Preservation Hold Library).
Anything that was deleted in advance of the retention policy expiry would be moved to the Preservation Hold library.
The PH library had a specific purpose – to ensure that everything on the site was retained until the end of the retention period.
It was not possible to restore any items from the Preservation Hold library. It was also not possible to ‘clear’ that library. If an administrator tried, they would be advised that ‘something went wrong’.
The only way to delete from the Preservation Hold library would be to remove the policy.
However, if a record was deleted it would be placed in the Recycle Bin for up to 93 days, from where it could be restored. If a record was restored in this way (from the Recycle Bin), the copy in the Preservation Hold library would remain.
For most people this type of policy was fine. They could (appear to) delete records, unaware that nothing was actually deleted if it were subject to a retention policy.
The same thing happened if a similar type of retention policy was applied to OneDrive accounts. End-users could ‘delete’, but nothing was actually deleted as long as the retention policy was active.
At the end of the retention period, the content that was subject to this policy was transferred to the Recycle Bin where it remained for 93 days. It would then be deleted, permanently, forever, without any record.
All that was left would be an empty library with no record of its previous content. The only clue would be in the ID or the ID part of the Document ID (if enabled). If anything new was added to the library, the next number in sequence would indicate how many records had previously been saved to that library.
Lessons to be learned
There are several lessons to be learned.
The first is that you need a plan to create and apply retention labels or retention policies to content across Microsoft 365. The configuration options and implications of each option need to be understood before they are applied.
You need to understand the difference between (a) a retention label published as a policy and applied to a SharePoint site and then a library on that site, and (b) a retention policy applied to an entire SharePoint site.
Retention labels can be mapped to retention schedule/disposal authority classes and then applied to specific libraries on SharePoint sites where the policy is applied. Accordingly, it may be useful to create document libraries that map to those classes, and only store content in those libraries that is covered by a single retention class.
If single SharePoint document libraries are used to store content that maps to different retention classes, then it will become very complicated to accurately map retention labels to content.
In many cases, it may be possible to apply a single non-label retention policy to a single SharePoint site or sites. For example, all project sites might have the same 7 year retention applied to all the content on the site as there may be no real need to apply policies to individual libraries.
Whatever you do, have a plan to act, document your settings, and keep it up to date.
And understand what happens at the end of the retention period. There is no back up.
In the article, Tony describes where and how the chat component of MS Teams is stored and how this might affect eDiscovery.
He also makes the important point that, while it may be possible ‘… to backup Teams by copying the compliance records in an Exchange Online backup … you’ll never be able to restore those items into Teams.’ In other words, it is better to leave the data where it was created – in MS Teams. The post explains why this is the case.
This post draws on the article to describe the factors involving in managing the chat element of Teams as records. It notes that, while is is technically possible to export chat messages (in various ways), it may be much better from a recordkeeping point of view to leave them where they are and subject them to a retention policy.
Two key reasons for leaving chat messages in place are: (a) chat messages are dynamic and may not always be a static ‘thread’, and (b) the chat messages exported from Exchange may not contain the full content of the message.
What is a Teams chat?
A Teams chat consists of one or more electronic messages with at least two participants – a sender and a receiver.
There are two types of chat message in MS Chat:
One-to-one/one-to-many ‘chat’ (top icon above).
Channel-based Teams chat (second icon above). Teams chat is visible to all members of the Team. Within channel-based chats, a person may create a private channel which is visible only the person who created the private channel and any participants.
Messages created in both options could be regarded as records because they may contain evidence of business activity.
However, one-to-one chats have no logical subject or grouping. Only the chat messages in Team channel chat are connected through the context of the Team/channel.
Where and how are chat messages stored?
The following is a summary from Tony Redmond’s article.
Chat messages are stored directly in the backend Azure Cosmos DB (part of the so-called Microsoft 365 ‘substrate’). The version in the database is the complete version of the chat message.
The messages are then copied, less some content elements (for example: reactions, audio records, code snippets), to a hidden folder in either (a) end-user mailboxes for one-to-one chat and private channel chats, and (b) M365 Group mailboxes for channel chat.
Most export options, including the export option in Content Search and eDiscovery, draw their content from the mailbox version of the message. This has potential implications for the completeness of the chat message as a record.
Additionally, any export can only be a ‘point in time’ record unless there is absolute certainty that all chat on a given subject have ceased.
Implications for records managers
In addition to the concerns about a chat message (or exports of them) being complete, there are (at least) two other points relating to the management of chat messages as records in MS Teams:
Knowing if chat messages on any given subject exist.
Applying an appropriate retention policy.
Both of these points are discussed below.
The primary way to locate content on any given subject across Microsoft 365 is via the Content Search option in the Compliance portal. Access to the Content Search option is likely to be restricted. So, if records managers do not have access, they will need to ask the Global Administrators to conduct a search.
Searches can be configured to find content in any or all of the following locations:
Users, Groups, Teams
Office 365 group email
Skype for Business
Teams messages [the copy in the mailbox]
Office 365 group sites
Exchange public folders
Note that content search only works on the copies of the items in the Exchange mailboxes, not the backend Teams database. Accordingly, there is some potential for it to not find some content.
Both the mailbox content and the content discovered by the search can be exported. Teams chat messages can be exported as individual items or as a PST – but note that these message may exclude the elements as described in Tony’s article.
The problem with exporting the content either this way or via other export options (such as described in this post ‘How to export MS Teams chat to html (for backup)‘ (using the Microsoft Graph API) is that it creates a single ‘point in time’ copy; additional content could be added at any time and, if the chats were subject to a retention policy, they may already be deleted.
Managing chat messages ‘in place’ as records
As any export only creates a ‘point in time’ version, it makes more sense from a recordkeeping point of view to leave the chat messages where they are and apply one or more retention policies to ensure the records are preserved.
Ideally, organisations that may create or capture records on a given subject will have taken the time to establish a way for users to do this, including through the creation of a dedicated Microsoft 365 Group with an associated SharePoint site and Team in MS Teams.
For example, if there is a requirement to store all records relating to COVID-19, it would make sense (at the very least) to create a Microsoft 365 Group with that name; this will create: (a) a linked mailbox accessible by all members of the Group, (b) a SharePoint site with the same name, and (c) a Team in MS Teams. All of the content – emails, documents, chat, is linked via the same (subject) Group.
This model makes it easier to aggregate ‘like’ information and apply a single retention policy. It assumes there is (or will be) some degree of control over the creation of Teams (or very good communication to users) to prevent the creation of random Teams, Groups and SharePoint sites – AND to ensure that end-users chat about a given subject within a Team channel, not in one-to-one chat.
What retention period should be applied to chat messages?
The retention period applied to either one-to-one or Team channel messages will depend largely on the organisation’s business or regulatory requirements to keep records. There are two potential models.
The simplest model is to have a single retention policy for one-to-one chats, and a separate retention policy for all Teams channel chats.
As one-to-one chats are stored in the mailboxes of chat participants, it makes sense to retain the chat content for as long as the mailboxes. However, some organisations may seek to minimise the use of chat and have a much reduced retention period – even as little as a few days.
The creation and application of retention policies to Teams channel chat may require additional considerations. For example:
As every Team is based on a Microsoft Group that has its own SharePoint site, it is probably a good idea to establish Teams based on subjects that logically map to a retention class. For example, if ‘customer correspondence’ needs to be kept for a minimum 5 years, and there is a Group/SharePoint site/Team for that subject, then all the content should have the same retention policy – although the Group mailbox and SharePoint site may have a policy applied to the Group, with a separate (but same retention period) applied to the Team.
There may be a number of Teams that contain trivial content that does not need to be retained as records. These Teams could be subject to a specific implicit policy that deletes content after a given period – say 3 years.
In all cases, there is a requirement to plan for retention for records across all the Microsoft 365 workloads.
What happens to chat messages at the end of a retention period?
At the end of a Microsoft 365 retention policy period, both the mailbox version and the database version of the Teams chat message are deleted. To paraphrase Tony’s article, the Exchange Managed Folder Assistant removes expired records from mailboxes. Those deletions are synchronized back to Teams, which then removes the real messages from the backend database.
No record is kept of this deletion action except in the audit logs. Accordingly, if there is a requirement to keep a record of what was destroyed, this will need to be factored in to whatever retention policy is created.
When people chat in Microsoft Teams (MS Teams), a ‘compliance’ copy of the chat is saved to either personal or (Microsoft 365) Group mailboxes. This copy is subject to retention policies, and can be found and exported via Content Search. But what happens if there is no Exchange Online mailbox? It seems the chats become […]
In his April 2007 article titled ‘Useful Void: The Art of Forgetting in the Age of Ubiquitous Computing’ (Harvard University RWP07-022), Viktor Mayer-Schönberger noted that the default human behaviour for millenia was to forget. Only information that needed to be kept would be retained. He noted that the digital world had changed the default to […]
At the 2020 Microsoft Ignite conference, Jeff Teper presented a diagram titled ‘Microsoft 365’. The diagram showed only four icons: Teams, Outlook, Office and Edge. The implication of this diagram was that, for most end-users, Teams is now (or will become) their primary portal into Microsoft 365. As stated by Jeff Teper, SharePoint is a […]
Many organisations have complex records retention requirements that are described in records retention schedules, disposal authorities or records authorities. For example:
There may be different ‘levels’ of retention depending on the ‘state’ of a record. The final versions of certain records may have a longer retention requirement than the working versions.
For each business function there may be multiple types of records, each with their own retention requirement or ‘class’.
In some disposal authorities based on business functions, activities that produce records (for example ‘Meetings’) may appear in multiple functions with the same retention requirement.
This post describes multiple and different types of Microsoft 365 retention policies created with an E3 licence in the Information Governance section of the Compliance admin portal can be applied to a single SharePoint site.
Example retention schedules/disposal authorities
Most records retention schedules or disposal authorities list types (or ‘classes’) of records that are created or captured by the organisation, including through the completion of various activities or transactions, and define how long these records must be kept or retained by the organisation (or transferred to an archival institution).
These record types or classes are usually grouped, by business subject or function.
The following extract, from a private sector company records retention schedule, shows records grouped by subject type (‘Company records’).
In the example below, from the Victorian (Australia) government, records are grouped by function (‘Enquiries and Complaints’).
The diagram below presents a simple view of the examples above. For every subject type or business function, there may be one or more records description (based on the activity or transaction that creates or captures the record) with a corresponding retention period.
How does SharePoint manage records?
SharePoint Online team sites (including the sites linked with Microsoft 365 Groups and MS Teams) may be created to manage the records for a particular business area or function, or for a specific business activity.
Whether a single or multiple document libraries are used, SharePoint sites may contain a mix of record content. It may not always be possible to apply a single retention policy to the site.
For the purpose of this post, we will assume that the organisation has a business function named ‘Client Services’ – a generic name for a business unit that delivers client services.
The Client Services area has several SharePoint sites. One of these sites is named ‘Client Services’.
The ‘Client Services’ site, which has been active for several years, has multiple libraries for the activities it performs, including ‘Meetings’, ‘Procedures’, ‘Working papers’, ‘Rosters’, ‘Marketing’ and so on. Most of these libraries are created annually and consequently the year is added to the library name to help group content more efficiently – for example, ‘Meetings 2018’, ‘Meetings 2019’.
The organisation’s records retention authority has multiple classes for the Client Services function, including:
Marketing – Retain for five years
Meetings – Retain for seven years
Procedures – Retain for seven years.
Rosters – Retain for ten years
There is no class for general ‘working papers’ that may be created in support of the above activities, but the organisation would like to ensure that all content not otherwise covered by one of the ‘explicit’ retention policies above is retained by an ‘implicit’ or background policy.
Creating the Office 365 retention policy
Based on its requirements, the organisation will require two different options.
A single retention policy with a minimum three year retention for content (including ‘working papers’) not covered by any other longer retention period. This will be created as an ‘implicit’ or background policy and applied to the site. Any content that is deleted by the end users will be moved to the invisible (to end users) Preservation Hold library. Records covered by this policy will be automatically deleted – via the Recycle Bin – at the end of the retention period.
Multiple retention labels published in a single retention policy, that is applied on this site or other sites that can be mapped to the same function. This means that, when applied to a document library, every one of the labels will appear in the drop down menu in the library settings to apply a label. Depending on how the label has been configured, the records may be automatically deleted or subject to a disposition review.
Each retention label that is created will include a name and description, and then the label retention settings.
How long it is to be kept (e.g., 7 years).
What happens at the end of that period (delete automatically, disposition review, nothing).
Trigger for disposal – date created, date modified, date labeled. The ‘Date labelled’ option is preferred as it will not prevent day-to-day actions on the library or make the synced version read-only.
This process is repeated for each label. Each label can include the ‘File Plan’ settings, for example any reference numbers, the Function and Activity, and so on.
Here are two of the labels that have been created:
Publishing the labels
After each label has been created, they can then be published together in a single (‘Client Services) retention policy that is applied to the site (Client Services).
The published policy now appears in the list of label-based retention policies. It also appears under the ‘Retention’ tab of the Information Governance section, along with all other published label-based policies and policies that are not based on policies.
Non-label retention policy
The ‘implicit’ or background policy is created directly as a retention policy, without the need for a label. This policy, named ‘Temporary records’, has a three-year retention. It is applied directly to the site (or multiple sites).
Applying the label-based policies to the site
The Client Services site has several libraries as shown below.
We want to apply the label-based policies to the libraries named ‘Meetings 2020’, ‘Rosters 2020’ and ‘Marketing’. The general ‘Documents’ library will be covered by the implicit retention policy for ‘Temporary records’.
To apply the label-based policies to the library, click on the library and navigate to Library Settings where the option to ‘Apply label to items in this list or library’ is found.
A drop down list shows all available label-based policies. As the ‘Client Services’ policy was only applied to this site, only those labels appear. Only one option can be selected for each library.
It is usually a good idea to check the box (hidden behind the list of policies in the screenshot below) to ensure that anything already stored in the library will be covered by the policy.
The Meetings 2020 library has now been assigned the Client Services Meetings – 7 years policy. As soon as this label as been applied:
It will no longer be possible to delete any content.
If the library has been synced to File Explorer, the library in File Explorer will become read only.
The only way to to remove this restriction is to remove the policy. Accordingly, it may be better to apply the label only when the library has become inactive.
Note – The Temporary records implicit policy will continue to operate in the background and will apply to any content in any library or list not covered by an explicit policy. Anything deleted will be moved to the Preservation Hold Library accessible only by the Site Collection Admins or higher.
The final model can be visualised as follows:
The longest retention option will always take precedence. So, if an explicit label-based policy has a retention period of 2 years, and the background implicit retention policy has a retention of 5 years, the content will be kept for 5 years.
Note also that only the content of the libraries or lists is deleted at the end of the retention period. The library or list – and the site – remain.
As described in this post, it is possible to create multiple retention policies and apply them only to a single SharePoint site.
This allow organisations to create targeted groups of retention policies which is likely to be useful in organisations with detailed or function/activity based retention schedules.
Planning is required to ensure that there is appropriate and effective retention coverage for all the content created and captured in all SharePoint sites.
Office 365 is sometimes referred to as an ‘ecosystem’. In theory this means that records could be stored anywhere across that ecosystem.
Unlike the ‘old’ on-premise world of standalone servers for each Microsoft application (Exchange, SharePoint, Skype) – and where specific retention policies could apply (including the Exchange Messaging Records Management MRM policy), the various elements that make up Office 365 are interconnected.
The most obvious example of this interconnectivity is Microsoft Teams which stores chat content in Exchange and provides access to content stored in both SharePoint (primarily the SharePoint site of the linked Office 365 Group) and OneDrive, and has links to other elements such as Planner.
Records continue to be created and kept in the various applications but retention policies are set centrally and can apply to any or all of the content across the ecosystem.
Managing records in Office 365, and applying retention rules to those records, requires an understanding of at least the key parts of the ecosystem – Exchange, Teams, SharePoint and OneDrive and how they interrelate, and from there establishing a plan for the implementation of retention.
What types of records are created in Office 365?
Records are defined as ‘evidence of business activity’ and are often associated with some form of metadata.
Evidence of business activity is an overarching term that can include:
Documents and notebooks (in the sense of text on a page)
Plans, including both project plans and architectural plans and diagrams
Images/photographs and video
Chat and/or messages
Conversations (audio and/or video based)
Social media posts
All digital records contain some form of metadata, usually displayed as ‘Properties’.
Where are the records stored in Office 365?
Most records created organisations using Office 365 are likely to be created or stored in the following parts of the ecosystem:
Exchange/Outlook – for emails and calendars.
SharePoint and OneDrive – for documents and notebooks (in the sense of text on a page), plans, images/photographs and video.
Stream – for audio and video recordings.
MS Teams – for chat and/or messages, conversations (audio and/or video based). Note that 1:1 chats are stored in a hidden folder of the Exchange mailbox of the end-user/s participating in the chat, while Teams channel chat is stored in a hidden folder of the linked Office 365 Group mailbox.
Yammer – for (internal) social media posts.
It is also possible to import and archive certain external content such as Twitter tweets and Facebook content in Office 365.
The diagram below provides a overview of the main Office 365 applications and locations where records are created or stored. Under SharePoint, the term ‘Sites’ refers to all types of SharePoint sites, including those associated with Office 365 Groups. Libraries are shown separately because of the potential to apply a retention policy to a library – see below.
Note also that this diagram does not include network file shares (NFS) as the assumption is made that (a) NFS content will be migrated to SharePoint and the NFS made read only, and (b) all new content that would previously have been stored on the NFS is instead saved either to OneDrive for Business (for ‘personal’ or working documents) or SharePoint only.
Creating a plan to manage records retention across Office 365
In previous posts I have recommended that organisations implementing Office 365 have the following:
A basic architecture design model for SharePoint sites, including SharePoint sites linked with Office 365 Groups (and Teams in MS Teams).
A plan for creating and applying retention policies across the ecosystem.
Because SharePoint is the most likely location for records to be stored (aside from Exchange mailboxes and OneDrive accounts), there should be at least one retention policy for every SharePoint site (or group of sites), as well as policies for specific document libraries if the retention for the content in those libraries may be different from the retention on the overall site.
For example, a ‘Management’ site may contain a range of general content as well as specific content that needs to be retained for longer.
The site can be covered by a single implicit retention policy of (say) 7 years. This policy will delete content in the background, based on date created or data modified.
The document library where specific types of records with longer or different retention requirements are stored may have one or more explicit label-based policies applied to those libraries. This content will be retained while the rest of the site content is deleted via the first policy.
Structure of a retention plan for records in Office 365
A basic plan for creating and applying retention policies might look something like the following:
User mailboxes – one ‘general’ (implicit) retention policy for all mailboxes (say, 7 years after creation) and another more specific retention policy for specific mailboxes that require longer retention.
SharePoint sites – multiple (implicit) retention policies targeting one or more sites.
SharePoint libraries – multiple (explicit) label-based retention policies that are applied manually. These policies will usually a retention policy that is longer than any implicit retention policy as any implicit site policy will prevent the deletion of content before it reaches the end of that retention period.
Office 365 Groups (includes the associated mailbox and SharePoint site) – one ‘general’ (implicit) retention policy. See also below.
Teams channel chat – one ‘general’ (implicit) retention policy. Note that this content is stored in a special folder of the Office 365 Group mailbox.
1:1 chat – one ‘general’ (implicit) retention policy. This content is stored in a special folder of the participant mailboxes.
OneDrive documents – one ‘general’ (implicit) retention policy for all ODfB accounts, plus the configuration of retention after the account is inactive.
At a high level, the retention policy plan might look something like the following – ‘implicit’ policies are shown in yellow, SharePoint document libraries may be subject to ‘explicit’, label-based policies. The ‘+7 years’ for OneDrive relates to inactive accounts, a setting set in the OneDrive Admin portal.
To retain content for a Microsoft 365 group, you need to use the Microsoft 365 groups location. Even though an Microsoft 365 group has an Exchange mailbox, a retention policy that includes the entire Exchange location won’t include content in Microsoft 365 group mailboxes. A retention policy applied to an Microsoft 365 group includes both the group mailbox and site. A retention policy applied to an Microsoft 365 group protects the resources created by an Microsoft 365 group, which would include Microsoft Teams.
The actual plan should contain more detail and included as part of other recordkeeping documentation (perhaps stored on a ‘Records Management’ SharePoint site). The plan should include details about (a) where the policies have been applied and (b) the expected outcomes or actions for the policies, including automatic deletion or disposition review (for document libraries).
Keep in mind that, unless the organisation decides to acquire this option, there is no default backup for content in Office 365 – once a record had been deleted, it is gone forever and there may be no record of this beyond 90 days.
Records managers have been struggling with managing emails as records ever since they first appeared in the workplace.
For a long time the accepted practice, as with other digital records, was to print them out and put them on the appropriate file. With the introduction of electronic document and records management (EDRM) systems, end users were instead required to save or copy documents and emails to an electronic ‘file’ in that system.
In both cases, the emails remained in the user’s ‘personal’ mailbox, where they remained inaccessible for ‘privacy’ reasons. End-users and business areas would (and still do) conduct business via the email system, without these records being available to anyone except the sender and recipient/s. Attachments to emails sent to individual recipients were (and continue to be) not managed as records unless they were printed out or saved to the EDRMS.
Microsoft Office 365 has changed the paradigm for keeping records as described in the linked post, away from the central storage and management of records in one system (while leaving the originals in place), to the decentralised ‘in place’ storage and centralised management of records across Office 365.
This post provides an overview of the three main options for managing email as records in Office 365, in both Exchange and SharePoint.
In summary the options are:
Leave emails in place in Exchange mailboxes (personal and Office 365 Group mailboxes) and apply one or more Office 365 retention policies to mailboxes.
Same as previous point, and use Content Search to retrieve emails as required.
Same as previous point, and only copy specific emails to SharePoint
Keep in mind while reading this post that chat content from MS Teams is also stored in Exchange mailboxes but that content cannot be copied to SharePoint.
Option 1 – Leave emails in place and apply retention policy
In this option, emails remained stored in personal or Office 365 Group mailboxes. End users may create folders and ‘categorise’ the content as they wish, but no additional attempt is made to further categorise, add metadata to, or group the content according to recordkeeping requirements. The aggregation, from a recordkeeping point of view, is the end-user or Office 365 Group.
All mailboxes are subject to one or more retention policies set in the Office 365 Compliance portal to ensure that no emails are deleted before a pre-defined minimum period.
Note that retention policies can effectively replace a back-up regime used by IT for disaster recovery and investigation purposes purposes.
Emails are aggregated by user name or Office 365 Group and will remain in mailboxes for a minimum period of time as set by the retention policy.
Office 365 Group mailboxes provide the ability to group emails by a more specific subject (the Group name, which could map to a business function – e.g., ‘Correspondence Management’) and have the added positive of having an associated SharePoint site.
The negative with this option, from a recordkeeping point of view, is that all emails – regardless of subject or importance – are grouped by the ‘personal’ or Office 365 Group mailbox, and kept for the period defined in the retention policy. That is, there is no differentiation between (email) records that may need to be kept for a long period of time and those that are transient in nature.
If there is a requirement to ensure that certain emails are kept in different aggregations or for different periods of time, then option 3 should be considered.
Option 2 – Same as option 1 and use Content Search to retrieve emails
This option is the same as the first option, but the business can make use of Content Search to identify and isolate emails as required. Content Search is more or less the same as the search part of an e-Discovery case.
Note that access to the Content Search area is restricted to Office 365 Global Admins and Compliance Admins. This is because, as can be seen in the screenshot below, a Content Search can be set up to search for any content in email, documents and much more.
Content Searches can be set up from the ‘New Search’ option, or the Administrator can make use of a Guided search or Search by ID List. For the purpose of this email, only the ‘New search’ will be examined.
Configuring a new Content Search
Each content search can be configured against three main options as shown in the screenshot below: Keywords, Conditions, and Locations. Some searches may require a combination of these three options.
Keywords can be any words that may be found anywhere in the email, including the content.
The available conditions are listed below:
Size (in bytes)
The available search locations include any or all of the options below:
Office 365 group email
Skype for Business
Office 365 Group sites
Exchange public folders
For more detail on how to use Content Search and all the options available, go to this Microsoft site.
Running a search
After the search has been configured, it must be run. The speed of the search will depend on the complexity of the search, conditions, locations and the volume of content. Every search will appear in the list of searches that have been saved.
When complete, the search result will show a ‘Status’, showing the number of:
Once the search has completed, the results of the search may be exported. There are two configurable options for exported results.
All items, excluding ones that have unrecognized format, are encrypted, or weren’t indexed for other reasons
All items, including ones that have unrecognized format, are encrypted, or weren’t indexed for other reasons
Only items that have an unrecognized format, are encrypted, or weren’t indexed for other reasons
Exchange content export options:
One PST file for each mailbox
One PST file containing all messages
One PST file containing all messages in a single folder
Enable de-duplication for Exchange content (check box)
Content searches are likely to find and retrieve more relevant emails than might be saved elsewhere, as it looks through all emails. Provided a retention policy has been applied to the mailboxes, the content should still be accessible. If the emails have been deleted at the end of a retention policy, they will not be accessible any more.
Emails can be exported and – if necessary – the PST copied to a different system (such as SharePoint) for long-term storage with additional metadata as required.
Access to the Content Search option is restricted to Global Admins and Compliance admins, for good reason. Consideration might need to be given to governance or procedural rules. Note that Global Admins are always alerted when a new content search is created or run.
Each search must be pre-configured and run regularly to ensure that all emails are identified.
Content searches may retrieve too much unrelated content.
Option 3 – Same as option 2 and copy only select emails to SharePoint
This option mimics the legacy way of saving a record to a pre-defined separate aggregation, in this case to a SharePoint document library.
It differs from the first two options in that only certain select emails are copied (by end-users or using a third-party application) to specific SharePoint document libraries. It is still, however, possible (and preferable) to apply a retention policy to the original mailboxes.
Content search, which can be used at any time, will find the emails in both Exchange mailboxes and SharePoint as long as they have not been deleted via a retention policy expiry .
The positives with this option are that emails copied to a SharePoint document library:
Are grouped with other related records. This may be important from an organisational recordkeeping point of view, for example for certain key records. Consideration might also be given to setting up an Office 365 Group instead for these specific records.
Can have additional metadata.
Can be retained for a period of time, different from the original mailbox.
The problems with this option are that:
It requires some kind of action to copy the email.
It creates a copy of the email, it doesn’t remove the original.
An email copied to another system may not be the most recent in a thread, especially if that thread is still active.
Does not include the ‘chat’ elements from MS Teams.
Summing up the options
The idea of copying an email to a separate aggregation, container or file for recordkeeping purposes is a legacy concept inherited from the paper recordkeeping period. While attempts were made over the years to mimic that concept in EDRM systems, it has several weaknesses that mostly outweigh the alleged benefits.
Email (in Exchange) and documents (in SharePoint) continue to remain separate in Office 365 but there is now the potential to manage both equally through a combination of retention policies and pre-defined content searches.
The majority of business emails are never captured in separate recordkeeping systems. Microsoft’s centralised retention model and ability to apply to retrieve emails on the fly mean that it is more efficient and cost effective to leave emails in place. This does not exclude the potential to copy certain select emails to SharePoint.
Additionally, mailboxes associated with Office 365 Groups provide the ability to keep emails in a business context, away from inaccessible ‘personal’ email accounts. Records managers should consider the potential of using Office 365 Group mailboxes in this way for particular types of records.
Records management standards (see below) state that a defining feature of records is that they are associated with metadata – both ‘point of capture’ metadata and ‘process’ metadata that continues to evolve throughout the life of the record.
For at least two decades, the requirement to capture and store metadata for digital records has driven the implementation of centralised electronic document and records management EDRM systems, many of which began life as databases used to record metadata about physical records (files and boxes).
EDRM systems were (and still are) used to store copies of digital records created or captured natively in other systems, primarily network file shares and email. End-users were required to copy individual records to the EDRMS, a process that mirrored the storage of records (including printed digital records) in physical files.
Network file shares and email systems were not considered to be suitable as recordkeeping systems because they could not ensure the authenticity, integrity and reliability of records over time, including to manage and preserve metadata about the records stored in them.
The increasing implementation of Office 365, and in particular the use of SharePoint for the storage of records, has highlighted the extent to which recordkeeping metadata can – or even should – be applied to the content stored in that system.
This post discusses the need for metadata in records stored in Office 365, including in both Exchange/Outlook, MS Teams, and SharePoint/OneDrive for Business. It concludes that most records stored in Office 365 do not need additional metadata but, where such metadata is required, there is unlimited capability to add it.
Records and metadata
The international standard for records management, ISO 15489:2016, defines a record as ‘information created, received, and maintained as evidence and as an asset by an organization or person, in pursuit of legal obligations or in the transaction of business’.
Records are said to be different from ‘non-records’ because they are associated or described with (mostly added) metadata that describes ‘the context, content and structure of records and their management through time’.
The standard for recordkeeping metadata is ISO 23081:2017. One records management professional (link at the end of the post) noted that there has been reluctant adoption of this standard, mostly because it was ‘too complex’ and ‘academic’, and used ‘foreign terms’. Unspecified vendors were said to have been dismissive of the standard.
Standard for managing digital records – ISO 16175
Part 2 of the standard ISO 16175:2011, ‘Guidelines and functional requirements for digital records management systems’ contains multiple requirements relating to metadata, across three broad categories:
Point of capture metadata. This includes metadata that forms part of the ‘metadata payload’ of the original record (e.g., date created, creator), other metadata added at point of capture, and metadata that provides additional context for the records.
Process metadata. This is metadata that records activities and changes to both the record and metadata over the life of the record.
The need to manage and control metadata over time.
This standard appears to reinforce the requirement for records to be stored and managed in dedicated recordkeeping systems.
On premise document and records management systems: These systems use metadata schema that specify metadata fields to be used in the system.
Cloud systems including Office 365: These systems can make use of enterprise ‘graphs’ that map people to documents and topics. The graph is built from the interactions of people with content across the different workloads of the suite.
Most people now accept the algorithm capabilities of Facebook, LinkedIn, eBay, Amazon and similar online systems to automatically connect us with information relevant to us, without having to add any metadata.
Given the volume and types of digital content, almost all of which has metadata ‘payloads’, how can we ever hope to add the required recordkeeping metadata?
Can’t we just rely on the algorithms and graphs?
How much metadata do you really need?
The answer to this question may depend largely on business, regulatory/compliance and/or government recordkeeping requirements relevant to the organisation and its jurisdiction. In my experience, across multiple very large and also very small organisations:
Most private sector organisations will likely have minimal metadata requirements beyond basic ‘point of capture’ and ‘process’ metadata already recorded in the system where the records are created or captured (including email), unless this is required for specific compliance or regulatory purposes, or where there is risk associated with poor recordkeeping. For example, in a major food processing company, records relating to the manufacture of food were very well documented and managed, while corporate records were managed haphazardly.
Most public sector organisations are required, for government accountability and transparency (and information retrieval) purposes, to apply a minimum set of both ‘point of capture’ and ‘process’ metadata for non-permanent records. Many government agencies have struggled to manage digital records effectively.
A small percentage of records captured or created in government agencies may require more extensive metadata, especially if those records are to be transferred to archival institutions for permanent retention.
Office 365 ‘workloads’
In Office 365, most business records will be created or captured in either Exchange/Outlook (includes MS Teams chats), or SharePoint or OneDrive for Business (for ‘working’ or personal content).
Exchange is a recordkeeping system in that it stores records with consistent metadata. The primary ‘weakness’, in terms of recordkeeping, is that ‘personal’ Exchange mailboxes aggregate records on a range of subjects by an individual user rather than by business subject. The mailboxes of Office 365 Groups, on the other hand, can be used to aggregate records about a business function/activity or subject.
SharePoint is a recordkeeping system that has extensive default metadata and almost unlimited additional metadata capability (see below). OneDrive for Business is a SharePoint service that has the same extensive default metadata capability.
There is, generally speaking, no requirement for organisations that have implemented Office 365 to allow the continued use of network file shares because the ‘save’ and ‘save as’ options in Office/Windows 10 points to SharePoint and OneDrive as the default save locations.
Metadata in Exchange mailboxes/MS Teams
Emails have the same metadata options in the header of every email:
Recipients (To, including CC and BCC)
(Plus more with routing information and security controls including DKIM, SPF, DMARC etc)
However, no other metadata can be added and some (or most) emails may never form part of the collated record of a given subject.
Because of this ‘limitation’, there has been an assumption ever since email was introduced that emails identified as records would have to be copied to a (separate) recordkeeping system.
In pre-digital days, this meant printing out emails and placing them on a paper file.
In organisations with EDRM systems, this meant copying the email to the EDRMs where additional metadata would be applied.
The original emails generally remained in place in individual mailboxes where they may be subject to backups and journaling in case they needed to be recovered for whatever reason including subpoenas (eDiscovery).
The Office Graph in Office 365 now provides the ability to connect the content in email with other content across that ecosystem, as noted in James Lappin’s post above. This is new – but it doesn’t rely on metadata or copying emails anywhere.
Metadata in SharePoint
As a SharePoint service, OneDrive for Business has the same default metadata columns. According it will not be described further here.
What metadata is required?
Organisations that plan to manage records in SharePoint should consider the following questions as part of their overall information architecture design to ensure records are kept in logical aggregations rather than randomly. This is important especially if end-users are allowed to create Office 365 Groups or Teams.
What point of capture and process metadata is required (for compliance, regulatory, recordkeeping purposes)? What is the source of this requirement?
Is there a difference in the metadata requirements for short-term (retain in the organisation) and permanent records that are to be transferred to archival institutions?
Do the required metadata columns already exist in SharePoint?
If they don’t exist, should the additional metadata columns be added as site columns or library columns?
Does any of the metadata need to be mandatory, and/or can it be a default setting – for example, a metadata column that has the default function and/or activity so the user doesn’t need to add this.
Where is the process metadata and how do you view or manage it? (See also below on this subject).
Information architecture and metadata
The information architecture of SharePoint, in terms of managing records as objects (e.g., documents, spreadsheets, images, etc), is relatively simple:
SharePoint site. The primary aggregation that can be linked to a business function (e.g., ‘Financial management’).
Document library/ies. Logical aggregations or containers of records that can be linked to business activities (e.g., ‘Meetings’).
Folders, document sets as content aggregations.
An effective site architecture can replace the requirement for metadata. For example, the name of the SharePoint site can map to a business function, and library names can map to activities, instead of applying a function and activity pair to each record. The URL address for the record provides the context:
If additional metadata is still required, SharePoint has extensive and almost unlimited capability.
Every new SharePoint site comes with a standard set of around 240 metadata ‘site columns’. The metadata columns include the Dublin Core metadata items.
New metadata columns can be created at the site level (‘site columns’). These are then can be used by all libraries and lists on the site. Here is a useful description of how to add new site columns from ShareGate: SharePoint 101: SharePoint Site Columns.
Every new SharePoint library comes with a standard set of metadata columns – see below. New metadata columns can be created at the library (or list) level, but these columns are only available to that specific library or list.
Default SharePoint document library columns
The default library metadata columns are as follows. Dublin Core metadata items are shown with [DC]:
App Created By
App Modified By
Check In Comment
Checked Out To
Compliance Asset Id
Created By [DC]
Document ID (when enabled as a feature)
Folder Child Count
Item Child Count
Item is a Record
Label applied by
Modified By Name [DC]
Retention label Applied
How metadata is added to records in SharePoint
Every digital record saved to SharePoint will have some form of native metadata (payload). Additional metadata may be added when the document is saved; this may be optional or mandatory.
When a digital record is saved to SharePoint, SharePoint only copies the title or name of the record, not the original created date or author.
When a Microsoft Office document is saved to a SharePoint document library, the Office document stores the library metadata (including the unique Document ID) in its own XML-based properties. This information is retained with the record even when the record is downloaded from SharePoint.
Viewing the metadata
The metadata that describes the content stored in the SharePoint document library may be viewed in multiple ways (via the edit view option), and may be exported (for example if records are to be destroyed or transferred).
Every record includes a version history that provides details of who modified the content, and when (but not what changes were made unless this is recorded).
Process metadata is metadata that records events relating to the record or the aggregation in which it is kept.
Examples of process metadata include when:
Records are viewed or downloaded (date and by whom).
Records are modified (date and by whom, and ideally what changes were made).
Records are copied or moved (date and by whom).
Security controls were changed (date, by whom, and what changes were made).
Records are deleted/destroyed (date and by whom, with what authority).
While ISO 16175 describes the general requirement to keep process metadata, the actual requirement is likely to differ between organisations. Organisations with high compliance requirements, such as certain types of businesses or government, are more likely to want process metadata to be created, accessed when required, and protected against unauthorised modification.
Office 365 process metadata
Office 365 records process metadata in multiple ways in Exchange and SharePoint.
Emails generally cannot be modified after they have been sent. Accordingly, the primary process metadata for emails and Teams chat is likely to be in the deletion records stored in the Office 365 Compliance admin portal audit logs.
SharePoint/OneDrive process metadata is recorded as follows:
Viewed or downloaded, modified, copied or moved: This is recorded in the Office 365 Compliance admin portal audit logs.
Modified: This is recorded in the Date modified and Modified by metadata, as well as the version history (which also keeps the previous actual versions that can be compared if required).
Security changes: This is recorded in the Office 365 Compliance admin portal audit logs.
Destroyed: Depends, but generally this requires the capture of information manually, then stored elsewhere. For example, if the content of a document library is to be destroyed, then the metadata (along with details of the original library URL) should be exported (manually) first and saved somewhere. This is a manual process.
Note that audit log data in Office 365 is only retained for 90 days with an E3 licences, 365 days for an E5 licence.
Exchange/Outlook email has basic metadata. It is unlikely that it will ever be possible to add other metadata, unless email is copied to SharePoint document libraries.
Chats from MS Teams are stored in hidden folders in Exchange mailboxes.
Organisations that need to keep certain emails for specific compliance, recordkeeping or archival purposes, should consider capturing these in SharePoint document libraries. Organisations might also consider making more use of Office 365 Group mailboxes for business-specific content as these Groups also include both MS Teams chat and have an associated SharePoint site.
The metadata capabilities of SharePoint are unlimited but not all records need the same degree of metadata.
The majority of records can probably be managed in standard SharePoint document libraries using the default metadata columns, or with one or two additional site or library metadata columns added, where required.
The Office Graph will increasingly be able to bring together records dynamically in the context of the business or the end-user via Project Cortex and Delve, respecting security controls that may be in place. The centralised content search and retention policy capability in Office 365 will also enable businesses to find, retrieve and manage content across both Exchange and SharePoint.
AS/NZS ISO 23081 series, Information and documentation – Records management processes – Metadata for records
Most people should be aware that pressing the ‘delete’ option for a file stored on a computer doesn’t actually delete the item, it only makes the file ‘invisible’. The actual file is still accessible on the disk and can be retrieved relatively easily or using forensic tools until the space it was stored on is overwritten.
Traditional legacy electronic document and records management (EDRM) systems have two components:
A database (e.g., SQL, Oracle) where the metadata about the records are stored
A linked file share where the actual objects are stored, most of which are copies of emails or network file share files that remain in their original location.
In most on-premise systems, email mailboxes, network file shares, and the EDRMS database and linked file share are likely to be backed up.
When a digital record comes to the end of its retention and is subject to a ‘destruction’ process, how do you know if the record has actually been destroyed? And even if it is, how can you be sure that the original isn’t still stored in a mailbox, network file share, or a back up?
This post examines what actually happens when a file is ‘deleted’ from a Windows NT File System (NTFS), and questions whether digital records stored in an EDRMS are really destroyed at the end of the retention period.
The Windows NTFS Master File Table (MFT)
Details of every file stored on a computer drive will be found in the NTFS Master File Table (MFT).
In some ways, the MFT operates like a traditional electronic document management system – it is a kind of database that it records metadata about the attributes of the digital objects stored on the drive. These attributes include the following:
As noted in the diagram above, the details stored by the MFT include the $File_Name and $Data attributes.
The $File_Name attributes include the actual name of the file as well as when it was created and modified, and its size. This is the information that can be seen via File Explorer and is often copied to the EDRMS metadata.
The $Data attribute contains details of where the actual data in the file is stored on the disk (in 0s and 1s) or the complete data if the file is small enough to fit in the MFT record.
If the MFT record has many attributes or the file data is stored in multiple fragments on a disk (for example as a file is being edited), additional MFT ‘extension’ records may be created.
When a file is deleted, the MFT records the deletion.
If the file is simply deleted, the record will remain on the disk and can be recovered from the Recycle Bin.
If the file is deleted through SHIFT-DEL or emptying the Recycle Bin, the MFT will be updated to the ‘Deleted’ state and update the cluster bitmap section to set the file’s cluster (where the data is stored) as being free for reuse. The MFT record remains until it is re-used or the data clusters are allocated in whole or part to another file.
So, in summary, ‘deleting’ a file does not actually delete it. It may either:
Store the file in the Recycle Bin, making it relatively easy to recover, or
Change the MFT record to show the file as being deleted but leave the file data on the desk until it is overwritten.
How does an EDRMS store and manage files?
The following summary relates to a well-known Electronic Document and Records Management System (EDRMS). Other systems may work differently but the point is that records managers should understand exactly how they work and what happens when electronic files are destroyed at the end of a retention period.
Most EDRM systems are made up of two parts:
A database (SQL, Oracle etc) to store the metadata about the record.
An attached file store that stores the actual digital objects.
When EDRM systems are used to register paper or physical records (files and boxes), only the database is used.
When digital records are uploaded to the EDRMS:
The metadata in the original file, including the file type, original file name, date created, date modified and author are ‘captured’ by the system and recorded in the new database record.
Additional metadata may be added, including a content or record ‘type’.
The record will usually be associated with a ‘container’ (e.g., ‘file’). This containment makes the record appear to be ‘contained’ within that container, whereas in fact it is simply a metadata record of an object stored elsewhere.
The original record filename is changed to random characters (to make it harder to find, in theory) and then stored on the attached (usually Windows NTFS) file store, often in a series of folders.
A link is made between the database record and the record object stored in the file store (the MFT record).
When the end-user opens the EDRMS, they can search for or navigate to containers/files and see what appears to be the digital objects ‘stored’ in that container/file. In reality, they are seeing a link to the object stored (randomly) in the file store.
What happens when an EDRMS record is destroyed?
If there is no requirement to extend their retention, or keep them on a legal hold, records may be destroyed at the conclusion of a retention period.
For physical records, this usually means destroying the physical objects so they cannot be recovered, a process that may include bulk shredding or pulping.
For digital records, however, there may be less certainty about the outcome of the destruction. While the EDRMS may flag the record as being ‘destroyed’ it is not completely clear if the destruction process has actually destroyed the records and overwritten the digital records in a way that ensures its destruction to the same level as destroyed paper files.
If the original associated NTFS file share becomes full and a new one is used, the original is likely to be made read only.
There is likely to be a backup of the EDRMS.
The original records uploaded to the EDRMS probably continue to exist on network files shares, in email, or in back up tapes.
Digital forensics can be used to recover ‘deleted’ files from the associated file share.
Consider this scenario:
An email containing evidence of something is saved to a container in an EDRMS.
The container of records is ‘destroyed’ after the retention period expires.
A legal case arises after the container is ‘destroyed’
A subpoena is made for all records, including those specific records.
Has the record actually been destroyed, or could it still be recoverable, including from backups or the digital originals?
Is it really possible to destroy digital records, and does it matter?
Yes, records can be destroyed by overwriting the cluster where the record is kept, and some EDRM systems may offer this option.
Do EDRM systems overwrite the cluster when a digital record is destroyed in line with your records retention and disposal authorities, or simply mark the record as being deleted, when it is still technically recoverable?
Could the record still exist in the network file shares or email, or in backups of these or the EDRMS?
Might it be possible to recover the record with digital forensics tools?
Does it matter?
It might be worth asking IT and your EDRMS vendor.