Posted in Classification, Compliance, Information Management, Records management, Retention and disposal

Classifying records in Microsoft 365

The classification of records is a fundamental recordkeeping activity. It is defined in the international standard ISO 15489-1:2016 (Information and Documentation – Records Management) as the ‘systematic identification and/or arrangement of business activities and/or records into categories according to logically structured conventions, methods and procedural rules’. (Terms and Definitions, 3.4)

The purpose of classification is defined by State Records NSW as follows: ‘In records management, records are classified according to the business functions and activities which generate the records. This functional approach to classification means that classification can be used for a range of records management purposes, including appraisal and disposal, determining handling, storage and security requirements, and setting user permissions, as well as providing a basis for titling and indexing’. (Records Classification, accessed 13 January 2021.)

The ever-increasing volume of digital records, the many different ways to create them, and the multitude of record types and storage locations have made it more difficult to classify records manually with accuracy and consistency, including through the creation of pre-defined ‘containers’ or aggregations based on classification terms. Despite this, the requirement to link the classification of records with their retention and disposal remains.

For over three decades, Microsoft’s applications and technology platforms have been used to create, capture, store and manage records. Some of these records (in the earlier period) were printed and placed on paper files, or stored (from around 2000) in dedicated electronic document and records management (EDRM) systems.

But the volume and type of digital content, including new types of records (e.g., chat messages) and storage locations, continue to grow. In response, Microsoft has invested heavily in addressing the need to classify records ‘at scale’.

This post looks at various ways to classify records, for retention and disposition purposes, in Microsoft 365.

The old-school, manual method – metadata

Most of the records in Microsoft 365 will be created, captured or stored in one of the four primary workloads: Exchange mailboxes, SharePoint sites/libraries, MS Teams chats (a ‘compliance copy’ of which is stored in Exchange mailboxes), and OneDrive for Business libraries. Some records may also exist in Yammer or other web page content (e.g., intranet).

Most SharePoint sites, as well as Teams (which have a SharePoint site), will be created to meet some business need to create, capture, store and share records; that is, the site or team purpose may be based on a business function or activity. This grouping can itself serve as a form of classification: by SharePoint site (e.g., function) or document library (e.g., activity).

Records may be stored in multiple document libraries, or within a folder structure of a single library.

A number of methods (some of which rely on others) can be used to add classification (and other) metadata to records stored in SharePoint document libraries:

1 – Creating the classification taxonomy in the Managed Metadata Service (MMS)/Term Store via the SharePoint Admin portal – Content Services – Term store, and then applying these terms in content types that are then deployed in SharePoint sites.

An example of Business Classification Terms in the MMS

2 – Creating global content types from the SharePoint Admin portal, in the Content Services – Content type gallery area (see ‘Finance Document’ example below) and then deploying these in specific SharePoint sites where site columns that contain classification terms will be added.

3 – Creating site columns that contain classification terms, including from the MMS, and adding these to global or site content types or document libraries where they can be applied to records.

In this example, the site column ‘BCS Function’ maps to the MMS BCS terms

4 – Creating site content types and adding site columns (including MMS-based columns), then adding these content types to document libraries.

In this example, the MMS-based column now appears in the library columns

However, most of the above is somewhat complicated and cumbersome, and would normally only be used for, and manually applied to, specific types of records.

The simplest way to apply BCS/File Plan terms at the document (or document set) level is to (a) store records relating to the same BCS function or activity in the same library, and (b) create site or library columns with default values and add these to the library. This means that the default terms are applied automatically as soon as a new record is uploaded, including when they are shared/inherited from the site columns added to a document set that ‘contains’ a document content type.

Example metadata columns shared from the document set content type
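The default-value approach can be illustrated with a toy model. This is a minimal sketch, not the SharePoint API: the `Library` class, its `upload` method and the example terms ‘Financial Management’ and ‘Accounts Payable’ are all hypothetical. Each new item starts from the library's default column values, so the uploader never has to pick a term, while an explicit value can still override a default.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Library:
    """Toy stand-in for a SharePoint document library (hypothetical, not the SharePoint API)."""

    name: str
    # Column name -> default value stamped onto every new item.
    defaults: Dict[str, str] = field(default_factory=dict)
    items: List[Dict[str, str]] = field(default_factory=list)

    def upload(self, filename: str, metadata: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        # Defaults first; any value set explicitly by the user overrides them.
        item = {**self.defaults, **(metadata or {}), "Name": filename}
        self.items.append(item)
        return item


# Hypothetical library holding records for one BCS function and activity.
accounts_payable = Library(
    name="Accounts Payable",
    defaults={"BCS Function": "Financial Management", "BCS Activity": "Accounts Payable"},
)
record = accounts_payable.upload("invoice-1042.pdf")
print(record["BCS Function"], "/", record["BCS Activity"])
```

The design point is that classification happens by location: putting the record in the right library is the only decision the user makes.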

However, keep in mind that SharePoint is just one of the workloads where records are stored.

Records in the form of emails, chats and ‘personal’ content (as well as Yammer messages and web pages) are created in and stored across the other workloads. Some attempt may be made to copy these other records (especially emails) to SharePoint sites, but this becomes complicated, or impossible, with content such as Teams chat messages.

In most cases (and according to Microsoft’s own recommendations), it is better to leave the records where they were created or captured (‘in place’), and apply centralised compliance controls (classification, retention labels and policies) to this content.

Leaving the records in place in this way does not exclude the ability to create SharePoint sites, and document libraries in those sites, that map to classification terms, or to use the site column approach described above, but these are more likely to be exceptions.

In fact, some form of logical structure is almost certain anyway as most end-users will probably want to access and manage information in their own specific work context (the Team/SharePoint site).

Trainable classifiers

Because not all records are stored in SharePoint, and given the ever-increasing volume of digital content stored across the Microsoft 365 platform, Microsoft needed to find a way to classify records ‘at scale’.

The solution was to use machine learning (ML) via trainable classifiers, accessed in the ‘Data Classification’ section of the Microsoft 365 Compliance portal. This capability is only available with E5 licences.

The trainable classifiers solution was released to General Availability on 12 January 2021 (‘Announcing GA of machine learning trainable classifiers for your compliance needs’, accessed 13 January 2021).

See the Microsoft web page ‘Learn about trainable classifiers’ to learn more about this option. To quote from that page:

‘This classification method is particularly well suited to content that isn’t easily identified by either the manual or automated pattern matching methods. This method of classification is more about training a classifier to identify an item based on what the item is, not by elements that are in the item (pattern matching).’

Organisations (including E3 licence holders) may make use of five pre-defined trainable classifiers: Resumes, Source Code, Targeted Harassment, Profanity and Threat. (A sixth classifier, ‘Offensive Language’, is to be deprecated.) Custom classifiers require an E5 licence.

Custom classifiers require ‘significantly more work’ than the pre-existing classifiers and the process is quite involved (see the process flow diagram in the ‘Learn about’ page link above) but in summary it involves the following steps:

  • Creating the custom classifier.
  • Creating a set of manually selected example records (50 to 500) in a dedicated SharePoint Online site as the ‘seed’. This would include a range of emails in the seed examples.
  • Testing the classifier with the seeded documents.
  • Re-training with additional content – both positive and negative matches.
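The seed, test and re-train loop above can be illustrated with a toy classifier. This sketch swaps in a tiny multinomial naive Bayes model purely to show the workflow; it is not Microsoft's model or API, and the example texts are hypothetical. Unlike the real process, this toy is seeded with a few negative examples from the start, and it uses only four seed items where the real process calls for 50 to 500.

```python
import math
from collections import Counter
from typing import List, Tuple


class ToyClassifier:
    """Tiny multinomial naive Bayes; an illustration only, not Microsoft's model."""

    def __init__(self) -> None:
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, examples: List[Tuple[str, bool]]) -> None:
        # Each example is (text, is_match). Re-training simply adds more counts.
        for text, is_match in examples:
            self.word_counts[is_match].update(text.lower().split())
            self.doc_counts[is_match] += 1

    def _log_score(self, words: List[str], label: bool) -> float:
        vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Laplace smoothing (+1) avoids log(0) for unseen words and empty classes.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for w in words:
            score += math.log((self.word_counts[label][w] + 1) / (total_words + vocab_size))
        return score

    def predict(self, text: str) -> bool:
        words = text.lower().split()
        return self._log_score(words, True) > self._log_score(words, False)


clf = ToyClassifier()
# Seed: manually selected positive and negative examples (hypothetical texts).
clf.train([
    ("curriculum vitae work experience education referees", True),
    ("resume employment history education skills", True),
    ("invoice amount due payment terms", False),
    ("meeting agenda minutes actions", False),
])
# Test: items the classifier has not seen before.
print(clf.predict("education and work experience"))  # True
print(clf.predict("payment due on invoice"))  # False
# Re-train: feed corrected results back in as additional examples.
clf.train([("cover letter and resume attached", True)])
```

The point of the loop is that each round of testing exposes mistakes, and the corrected items become new training examples.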

Once the classifier is published, it can be used to identify and classify related content across SharePoint Online, Exchange, and OneDrive (but not Teams).

The page ‘Default crawled file name extensions and parsed file types’ provides details of all the file types that can be classified in this way. Note that it is not clear whether trainable classifiers can crawl the compliance copy of Teams chat messages stored in hidden folders in Exchange mailboxes.

Label-based retention policies can then be automatically applied to content that has been identified through the trainable classifier.

Note, however, that the classifier does not ‘group’, aggregate or ‘present’ (list) the records for review (except broadly via the Content Explorer), although the label applied to the records can be searched via the ‘Content Search’ option in the Compliance portal. This is still a much better option than having no idea how many records of a particular classification may exist in Exchange mailboxes, OneDrive accounts, or general SharePoint sites. It does require some degree of ‘letting go’ of the ability to view and browse content classified this way, and trusting the system.

The main limitation of trainable classifiers is that they require an E5 or E5 Compliance licence.

The other limitation is the management of the disposition of records that have been identified with trainable classifiers and had a label-based retention policy applied. There are significant shortcomings with the current ‘Disposition Review’ process, specifically the lack of adequate metadata to review records due for disposal or the details of what has been destroyed.

SharePoint Syntex

Another, more limited, option might be to use SharePoint Syntex (see ‘Introduction to Microsoft SharePoint Syntex’ for an overview). Its range is restricted to SharePoint and, it seems, to records that have a relatively consistent structure and format.

SharePoint Syntex evolved out of Project Cortex’s ability to extract and capture metadata from records. It can also be used through its ‘Document Understanding Model’ (DUM) to provide a way to classify records stored in SharePoint Online (only). It makes use of a ‘seeding’ model that is similar to trainable classifiers (and may be based on the same underlying AI engine).

Broadly speaking, the DUM works on the basis of loading a small ‘seed’ set of (relatively consistently formatted) example files into a dedicated Content Center (or Centers). This is very similar to the process of using trainable classifiers, except that the latter does not require a ‘content center’ SharePoint site to be created.

  • The example files are ‘trained’ by being ‘classified’ through the document understanding process based on a set of ‘explanation types’ that are used to help find the relevant content. The three explanation types are: (a) phrase list (a list of words, phrases, numbers, or other characters used in the document or information that you are extracting); (b) pattern list (patterns of numbers, letters, or other characters); and (c) proximity (describes how close other explanations are to each other).
  • The document understanding model (DUM) produced through the explanation types is associated (and deployed) with a new or existing content type. 
  • Once applied to a SharePoint site library, the DUM/content type provides the basis for identifying and tagging (with metadata) other similar records in the location (e.g., the library) where the DUM has been deployed. 
  • If the documents have consistent content such as invoices, certain data from those documents can be extracted as metadata. 
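The three explanation types can be illustrated with plain string and regular-expression matching. This is a minimal sketch of the concepts only, not how Syntex implements them; the function names, the sample invoice text and the `INV-` number pattern are hypothetical.

```python
import re
from typing import List


def phrase_list_match(text: str, phrases: List[str]) -> bool:
    """Phrase list: does any listed word or phrase appear (case-insensitive)?"""
    lowered = text.lower()
    return any(p.lower() in lowered for p in phrases)


def pattern_list_match(text: str, patterns: List[str]) -> List[str]:
    """Pattern list: every substring that matches one of the regular expressions."""
    return [m.group(0) for p in patterns for m in re.finditer(p, text)]


def within_proximity(text: str, first: str, second: str, max_tokens: int) -> bool:
    """Proximity: does `second` occur within `max_tokens` tokens after `first`?"""
    tokens = text.lower().split()
    a, b = first.lower(), second.lower()
    for i, token in enumerate(tokens):
        if token == a and b in tokens[i + 1 : i + 1 + max_tokens]:
            return True
    return False


# Hypothetical invoice text used as a sample document.
sample = "Invoice number INV-2041 issued 12 January 2021. Total due: 450.00"

if phrase_list_match(sample, ["invoice", "total due"]) and within_proximity(sample, "invoice", "number", 2):
    # Both explanations agree, so extract the invoice number as metadata.
    print(pattern_list_match(sample, [r"INV-\d{4}"]))  # ['INV-2041']
```

Combining explanation types is what makes the approach work for consistently structured documents: a phrase alone is weak evidence, but a phrase plus a nearby pattern is much stronger.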

Retention labels may be applied to records classified using SharePoint Syntex, as described on the page ‘Apply a retention label to a document understanding model’.

Summing up – which one should be used?

The answer to this question will depend on your compliance requirements.

Smaller organisations may be able to set up SharePoint sites and document libraries with site columns/metadata that map to their business classification scheme or file plan, and copy emails to those libraries. There may be little need to use AI-based classification methods.

In large and more complex organisations (with E5 licences), especially those with a lot of content stored across Exchange mailboxes and SharePoint sites (including Teams-based sites) there will most certainly be a need for some form of AI-based classification in addition to classification-mapped SharePoint sites (and Teams).

Organisations with E3 licences might use the manual methods described above for specific types of records, and consider acquiring additional E5 Compliance licences to make use of trainable classifiers or SharePoint Syntex for other records.

Posted in Classification, Compliance, Electronic records, Governance, Information Management, Microsoft Teams, Office 365 Groups, Products and applications, Records management, Retention and disposal, SharePoint Online

Managing MS Teams chat as records

(The image above was part of a collector’s album issued in 1930 by Echte Wagner, a German margarine company. Source – https://flashbak.com/wonderful-futuristic-visions-of-germany-by-artists-in-1930-381451/)

On 19 May 2020, Tony Redmond published a very helpful article on the Office 365 for IT Pros website titled ‘Using Teams Compliance Data for eDiscovery’.

In the article, Tony describes where and how the chat component of MS Teams is stored and how this might affect eDiscovery.

He also makes the important point that, while it may be possible ‘… to backup Teams by copying the compliance records in an Exchange Online backup … you’ll never be able to restore those items into Teams.’ In other words, it is better to leave the data where it was created – in MS Teams. The post explains why this is the case. 

This post draws on the article to describe the factors involved in managing the chat element of Teams as records. It notes that, while it is technically possible to export chat messages (in various ways), it may be much better from a recordkeeping point of view to leave them where they are and subject them to a retention policy.

Two key reasons for leaving chat messages in place are: (a) chat messages are dynamic and may not always be a static ‘thread’, and (b) the chat messages exported from Exchange may not contain the full content of the message. 

What is a Teams chat?

A Teams chat consists of one or more electronic messages with at least two participants – a sender and a receiver. 


There are two types of chat messages in MS Teams:

  • One-to-one/one-to-many ‘chat’ (top icon above).
  • Channel-based Teams chat (second icon above). Channel chat is visible to all members of the Team. Within a Team, a person may create a private channel, which is visible only to the person who created it and the participants they add.

Messages created in both options could be regarded as records because they may contain evidence of business activity.

However, one-to-one chats have no logical subject or grouping. Only the chat messages in Team channel chat are connected through the context of the Team/channel. 

Where and how are chat messages stored?

The following is a summary from Tony Redmond’s article.

Chat messages are stored directly in the backend Azure Cosmos DB (part of the so-called Microsoft 365 ‘substrate’). The version in the database is the complete version of the chat message.

The messages are then copied, less some content elements (for example: reactions, audio recordings, code snippets), to a hidden folder in either (a) end-user mailboxes, for one-to-one chat and private channel chats, or (b) M365 Group mailboxes, for channel chat.

Most export options, including the export option in Content Search and eDiscovery, draw their content from the mailbox version of the message. This has potential implications for the completeness of the chat message as a record.

Additionally, any export can only be a ‘point in time’ record unless there is absolute certainty that all chat on a given subject has ceased.
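Both concerns (completeness and ‘point in time’) can be shown with a toy model. In this minimal sketch the `ChatMessage` class and the two functions are hypothetical; the compliance copy is modelled simply as the full message with reactions and code snippets stripped (audio recordings are left out for brevity), and an export is a list taken from those copies at one moment.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ChatMessage:
    sender: str
    body: str
    reactions: Tuple[str, ...] = ()
    code_snippets: Tuple[str, ...] = ()


def compliance_copy(msg: ChatMessage) -> ChatMessage:
    """Mailbox copy: modelled as the full message with some elements stripped."""
    return replace(msg, reactions=(), code_snippets=())


def export_snapshot(thread: List[ChatMessage]) -> List[ChatMessage]:
    """An export is drawn from the mailbox copies at one moment in time."""
    return [compliance_copy(m) for m in thread]


# The backend (Cosmos DB) version of a hypothetical thread.
thread = [ChatMessage("alice", "Draft attached", reactions=("like",), code_snippets=("SELECT 1",))]
snapshot = export_snapshot(thread)
thread.append(ChatMessage("bob", "Approved"))  # the conversation carries on

print(snapshot[0].reactions)  # ()
print(len(thread), len(snapshot))  # 2 1
```

The exported copy is both thinner than the original message and already out of date once the conversation continues.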

Implications for records managers

In addition to the concerns about a chat message (or exports of them) being complete, there are (at least) two other points relating to the management of chat messages as records in MS Teams:

  • Knowing if chat messages on any given subject exist. 
  • Applying an appropriate retention policy. 

Both of these points are discussed below. 

Finding content

The primary way to locate content on any given subject across Microsoft 365 is via the Content Search option in the Compliance portal. Access to the Content Search option is likely to be restricted. So, if records managers do not have access, they will need to ask the Global Administrators to conduct a search. 

Content searches are very powerful. The Microsoft article ‘Keyword queries and search conditions for Microsoft 365’ provides details on how to search. A simple search consists of a keyword query, with the option to add conditions.


Searches can be configured to find content in any or all of the following locations:

  • Users, Groups, Teams
    • Exchange email
    • Office 365 group email
    • Skype for Business
    • Teams messages [the copy in the mailbox]
    • To-Do
    • Sway
    • Forms
  • SharePoint
    • SharePoint sites
    • OneDrive accounts
    • Office 365 group sites
    • Teams sites
  • Exchange public folders

Note that content search only works on the copies of the items in the Exchange mailboxes, not the backend Teams database. Accordingly, there is some potential for it to not find some content.
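Keyword queries are written in Keyword Query Language (KQL). The sketch below simply assembles a query string to paste into the Content Search keyword box. The `kind:microsoftteams` value comes from Microsoft's list of searchable KQL properties, but check it against the article cited above; the `build_query` helper and the example keywords are hypothetical.

```python
from typing import List, Optional


def build_query(keywords: List[str], kinds: Optional[List[str]] = None) -> str:
    """Combine keywords (OR) with optional item-type conditions (AND) into a KQL string."""
    # Quote each keyword so hyphenated or multi-word terms are treated as phrases.
    query = "(" + " OR ".join(f'"{k}"' for k in keywords) + ")"
    if kinds:
        # For example, kind:microsoftteams restricts results to Teams chat items.
        query += " AND (" + " OR ".join(f"kind:{k}" for k in kinds) + ")"
    return query


print(build_query(["COVID-19", "pandemic plan"], kinds=["microsoftteams"]))
```

Because the search runs against the mailbox copies, a query like this can only find what was copied there.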

Both the mailbox content and the content discovered by the search can be exported. Teams chat messages can be exported as individual items or as a PST, but note that these messages may exclude the elements described in Tony’s article.

The problem with exporting the content, either this way or via other export options (such as the Microsoft Graph API method described in the post ‘How to export MS Teams chat to html (for backup)’), is that it creates a single ‘point in time’ copy; additional content could be added at any time and, if the chats were subject to a retention policy, some may already have been deleted.

Managing chat messages ‘in place’ as records

As any export only creates a ‘point in time’ version, it makes more sense from a recordkeeping point of view to leave the chat messages where they are and apply one or more retention policies to ensure the records are preserved. 

Ideally, organisations that may create or capture records on a given subject will have taken the time to establish a way for users to do this, including through the creation of a dedicated Microsoft 365 Group with an associated SharePoint site and Team in MS Teams. 

For example, if there is a requirement to store all records relating to COVID-19, it would make sense (at the very least) to create a Microsoft 365 Group with that name; this will create: (a) a linked mailbox accessible by all members of the Group, (b) a SharePoint site with the same name, and (c) a Team in MS Teams. All of the content (emails, documents and chat) is linked via the same (subject) Group.

This model makes it easier to aggregate ‘like’ information and apply a single retention policy. It assumes there is (or will be) some degree of control over the creation of Teams (or very good communication to users) to prevent the creation of random Teams, Groups and SharePoint sites – AND to ensure that end-users chat about a given subject within a Team channel, not in one-to-one chat. 

What retention period should be applied to chat messages?

The retention period applied to either one-to-one or Team channel messages will depend largely on the organisation’s business or regulatory requirements to keep records. There are two potential models. 

The simplest model is to have a single retention policy for one-to-one chats, and a separate retention policy for all Teams channel chats.

As one-to-one chats are stored in the mailboxes of chat participants, it makes sense to retain the chat content for as long as the mailboxes. However, some organisations may seek to minimise the use of chat and have a much reduced retention period – even as little as a few days. 
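The arithmetic of a retention period is simple, but writing it down helps when comparing options. This minimal sketch assumes the period is counted from the date a message was created, and uses hypothetical periods of 7 days for one-to-one chat and roughly seven years for channel chat. Actual deletion is carried out by background processes, so it may lag the date calculated here.

```python
from datetime import date, timedelta
from typing import Dict

# Hypothetical retention periods, in days, for the two chat policies.
CHAT_RETENTION_DAYS: Dict[str, int] = {
    "one-to-one": 7,  # short period to discourage chat as a record store
    "channel": 7 * 365,  # roughly seven years for channel conversations
}


def eligible_for_deletion(chat_type: str, created: date) -> date:
    """Date from which a message passes its retention period (counted from creation)."""
    return created + timedelta(days=CHAT_RETENTION_DAYS[chat_type])


print(eligible_for_deletion("one-to-one", date(2021, 1, 13)))
```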

The creation and application of retention policies to Teams channel chat may require additional considerations. For example:

  • As every Team is based on a Microsoft 365 Group that has its own SharePoint site, it is probably a good idea to establish Teams based on subjects that logically map to a retention class. For example, if ‘customer correspondence’ needs to be kept for a minimum of 5 years, and there is a Group/SharePoint site/Team for that subject, then all the content should have the same retention period, although the Group mailbox and SharePoint site may have one policy applied via the Group, with a separate policy (with the same retention period) applied to the Team.
  • There may be a number of Teams that contain trivial content that does not need to be retained as records. These Teams could be subject to a specific implicit policy that deletes content after a given period – say 3 years. 
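The mapping between Teams and retention classes can be kept as a simple register. In this minimal sketch the class names, periods and Team names are hypothetical. A mapped Team gets separate policies for its Group content and its channel messages, but with the same period; an unmapped Team is flagged for review rather than silently given the ‘trivial’ period.

```python
from typing import Dict, Optional

# Hypothetical retention classes (name -> minimum years).
RETENTION_CLASSES: Dict[str, int] = {"customer correspondence": 5, "trivial": 3}

# Hypothetical register mapping each Team (and its Group) to a class.
TEAM_CLASS: Dict[str, str] = {
    "Customer Correspondence": "customer correspondence",
    "Social Club": "trivial",
}


def policies_for_team(team: str) -> Optional[Dict[str, int]]:
    """Years to apply per location, or None if the Team has not been classified."""
    cls = TEAM_CLASS.get(team)
    if cls is None:
        return None  # flag for review rather than guessing a period
    years = RETENTION_CLASSES[cls]
    # Separate policies, same period, so all of the Team's content ages together.
    return {"Group mailbox and site": years, "Team channel messages": years}


print(policies_for_team("Customer Correspondence"))
print(policies_for_team("Project X"))
```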

In all cases, there is a requirement to plan for retention for records across all the Microsoft 365 workloads. 

What happens to chat messages at the end of a retention period?

At the end of a Microsoft 365 retention policy period, both the mailbox version and the database version of the Teams chat message are deleted. To paraphrase Tony’s article, the Exchange Managed Folder Assistant removes expired records from mailboxes. Those deletions are synchronized back to Teams, which then removes the real messages from the backend database.

No record is kept of this deletion action except in the audit logs. Accordingly, if there is a requirement to keep a record of what was destroyed, this will need to be factored into whatever retention policy is created.
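One way to keep a record of destruction is to capture the metadata of expiring items before the retention policy removes them. Microsoft 365 does not do this itself; this minimal sketch only shows the ordering (register first, then dispose). The `Item` fields, the `dispose` function and the sample data are hypothetical; in practice the item metadata might come from a Content Search export report.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date
from typing import List, TextIO


@dataclass
class Item:
    item_id: str
    location: str
    created: date


def dispose(items: List[Item], retention_days: int, today: date, register: TextIO) -> List[Item]:
    """Write a register row for each expired item before removing it; return the survivors."""
    writer = csv.writer(register)
    survivors = []
    for item in items:
        if (today - item.created).days >= retention_days:
            writer.writerow([item.item_id, item.location, item.created.isoformat(), today.isoformat()])
        else:
            survivors.append(item)
    return survivors


register = io.StringIO()
items = [
    Item("msg-001", "Team: Finance / General", date(2017, 1, 1)),
    Item("msg-002", "Team: Finance / General", date(2020, 12, 1)),
]
remaining = dispose(items, retention_days=3 * 365, today=date(2021, 1, 13), register=register)
print(register.getvalue().strip())
```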

 

Posted in Classification, Electronic records, Information Management, Office 365, Records management, Retention and disposal

Planning for retention management in Microsoft 365

‘Fools rush to implement retention without thought’ – Tony Redmond, 13 April 2017

Tony Redmond’s quote above, as well as the rest of the article in ‘Bringing Compliance to Office 365 Groups’, is as relevant today as it was in 2017.

Tony is a contributing author to the e-book ‘Office 365 for IT Pros’, essential reading for anyone doing anything with Microsoft 365. Page 921 of the May 2020 edition contains the following paragraph, which expands on the quote above and contains perhaps the best guidance available on this subject:

It is sensible to write down each of the retention labels that you plan to use before creating anything. It is much easier to delay the release of a label and the training of users to use the label properly than it is to launch a label into general circulation only to discover that you later need to withdraw it. Another thing to consider is how easy it is for users to decide between different retention labels when the time comes for them to apply a label. Too many labels, misleading names, or too much choice can lead to frustration and bad decisions.

How do you go about writing down each of the retention labels as part of a plan – especially for a Microsoft 365 environment that is already in full swing?

This post provides some suggestions to help you do this.

What is your records retention and disposal status?

A good starting point is to establish the current records retention and disposal status for your organisation. Do you have a records retention schedule, also known as a disposal authority or records authority? 

If you have one of these documents, it would be useful to review it, as a key part of the process is to ‘map’ the records retention classes to specific records across the various Microsoft 365 ‘workloads’ (e.g., Exchange, SharePoint, OneDrive, MS Teams), not just in one system (such as SharePoint).

You will need to know what and where these workloads are.

Where (and what) are the records in Microsoft 365?

If you are a records manager then there is a reasonably good chance that you have very little access to, or visibility of, all the content stored across Microsoft 365.

You may have access to one or more SharePoint sites, but unless you are a SharePoint Admin or Site Collection Admin on every site, your visibility will be very limited.

Most of the records in Microsoft 365 will be stored in Exchange, SharePoint, OneDrive for Business, or MS Teams.

  • Emails created and sent by users are stored in Exchange mailboxes. There may also be public mailboxes. Unless there is a plan (or third-party app) to copy these (or some of these) emails out of Exchange (e.g., to SharePoint), most email records will probably remain in users’ mailboxes.
  • Records that, in the past, would have been saved to a network file share (or EDRMS) will now be in SharePoint Online (corporate content) or OneDrive for Business (ODfB) (personal/working content).
  • One-to-one chat messages in MS Teams are stored in a hidden area of the Exchange mailbox of each user who participates in the chat. Any documents shared in this chat area are stored in the OneDrive for Business of the person who shared the document.
  • Channel-based Team chat messages in MS Teams are stored in a hidden area of the Exchange mailbox of the Office 365 Group linked with the Team. Any documents shared in this chat area are stored in the SharePoint site of the Office 365 Group linked with the Team.

So, fundamentally, records are stored in two primary workloads: Exchange mailboxes and SharePoint/OneDrive for Business.

What are the retention options?

There are two retention options in Microsoft 365. Both are configured in the Compliance portal of Microsoft 365. Access to this portal requires special privileges, which may not always be granted to records managers.

The two options are:

  • Retention labels published as retention policies and then applied to the various workloads (Exchange email, SharePoint, OneDrive, Office 365 Groups (Exchange/SharePoint content)). These are sometimes described as ‘explicit’ policies because they are visible to end users. Organisations with an E5 licence can extend the way these labels are applied and retention managed.
  • Retention policies that are applied directly to the various workloads (Exchange email, Exchange public folders, SharePoint, OneDrive, Office 365 Groups (Exchange/SharePoint content)). These are sometimes described as ‘implicit’ policies because they are not visible to end users. These policies automatically delete content at the end of a retention period, without any review possible.

Records managers will need to determine how to ‘translate’ each records retention class into one of the two options above, and how and where it will be applied in Microsoft 365.

Some of the options may also require the creation of new records retention classes – for example for the chat element in Microsoft Teams.
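The ‘translation’ step can be expressed as a simple rule. This minimal sketch rests on the point made above that implicit policies delete content at the end of the period without any review: a class that needs disposal review, or that users must be able to see and apply, becomes a retention label, and everything else can be an implicit policy. The `RetentionClass` fields and the class codes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RetentionClass:
    code: str
    years: int
    needs_disposal_review: bool
    users_must_see_it: bool


def choose_option(rc: RetentionClass) -> str:
    """Translate a retention class into one of the two Microsoft 365 options."""
    # Implicit policies delete silently at the end of the period, so any class
    # that needs review (or that users must see) becomes a retention label.
    if rc.needs_disposal_review or rc.users_must_see_it:
        return "retention label (explicit)"
    return "retention policy (implicit)"


print(choose_option(RetentionClass("FIN-05", 7, needs_disposal_review=True, users_must_see_it=False)))
```

Running every class in the schedule through a rule like this gives a first draft of the label list to write down before anything is created.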

A suggested first model

Exchange mailboxes

Your IT team probably already has some form of back-up regime (‘archive’) for mailboxes, used for disaster recovery and investigation purposes.

It might be worth creating two policies for mailboxes:

  • All end-user mailboxes could have a single ‘implicit’ retention policy (e.g., 7 years).
  • Mailboxes for specific staff (e.g., senior managers) could have a second, longer, ‘implicit’ retention policy. This policy will take over when the first one expires, but just for the mailboxes identified.

The use of retention policies in this way can replace the need for mailbox backups. No emails will ever actually be deleted while the retention policy is in place and all content can be retrieved via the Content Search option in the Compliance Portal. 

Content Searches can also be used to retrieve and export emails.
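How the two mailbox policies interact can be checked with a small calculation. Microsoft's documented ‘principles of retention’ state that, where several retention settings apply to the same item, the longest retention period wins. This minimal sketch applies that rule; the `MailboxPolicy` class, the 15-year period and the contoso addresses are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class MailboxPolicy:
    name: str
    years: int
    mailboxes: Optional[Set[str]] = None  # None means "all mailboxes"


def effective_years(mailbox: str, policies: List[MailboxPolicy]) -> int:
    """Longest applicable retention period, following the 'longest retention wins' principle."""
    applicable = [p.years for p in policies if p.mailboxes is None or mailbox in p.mailboxes]
    return max(applicable, default=0)


policies = [
    MailboxPolicy("All staff", 7),
    MailboxPolicy("Senior managers", 15, {"ceo@contoso.com"}),
]
print(effective_years("ceo@contoso.com", policies))  # 15
print(effective_years("staff1@contoso.com", policies))  # 7
```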

OneDrive for Business

As with end-user mailboxes, OneDrive for Business accounts are generally inaccessible to records managers. To ensure that the content in those accounts is not deleted, a single Microsoft implicit retention policy of, say, 7 years could be applied to all ODfB accounts.  This policy will create a hidden (to the user) ‘Preservation Hold’ library on the ODfB account.

Anything ‘deleted’ by the end user during the retention period will be moved to the Preservation Hold library, which is visible to Global Admins and SharePoint Admins by appending this path to the ODfB account URL – /_layouts/15/viewlsts.aspx?view=14
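Because the path is relative, it needs the account URL in front of it. This minimal sketch builds the full links for a list of accounts; the helper name and the tenant and user in the example URL are hypothetical.

```python
from typing import List


def preservation_hold_views(account_urls: List[str]) -> List[str]:
    """Append the relative path quoted above to each absolute ODfB account URL."""
    return [url.rstrip("/") + "/_layouts/15/viewlsts.aspx?view=14" for url in account_urls]


# Hypothetical tenant and user.
for url in preservation_hold_views(["https://contoso-my.sharepoint.com/personal/jane_contoso_com/"]):
    print(url)
```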

In addition, the OneDrive settings include the option (under ‘Storage’ in the ODfB admin portal) to retain OneDrive accounts for a period of time after they become inactive.

All content in these locations is accessible from a Content Search.

SharePoint

SharePoint is likely to be the most complicated in terms of retention policies if there is a requirement to keep content for different periods of time in accordance with the retention schedules/records disposal authorities.

There are likely to be three main options in relation to SharePoint content:

  • One or more implicit retention policy/ies applied to one or more sites. When applied to a SharePoint site, a ‘Preservation Hold’ library retains anything that is ‘deleted’ by end users.
  • One or more explicit label-based retention policies applied to one or more sites. When applied to a SharePoint site, the option to apply it appears for each document library on the site. Once applied (manually), end users cannot delete anything and if the library is synced to File Explorer, the File Explorer view of the library will be read only.
  • A combination of implicit and explicit retention policies.

The decision about which policy to apply to which site will depend on your SharePoint architecture and the content stored in each site. For example:

  • A SharePoint site that only stores records that map to one records retention class could have either a single implicit policy (if there is no requirement for disposal review) or a single explicit policy that is applied manually to each library.
  • A SharePoint site that contains records that map to multiple retention classes for one business function, together with ‘working papers’, could have (a) one implicit policy to cover the working papers and (b) one label-based retention policy with multiple labels, one for each class. For (b), this means that a specific retention label can be applied to each library as required.
  • SharePoint sites linked with Office 365 Groups and Teams. Depending on the content in the site, it may be possible to apply a single retention policy for all M365 Groups (which covers both the SharePoint site and the mailbox), or a similar policy created for a Group of SharePoint sites (which excludes the mailbox).
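The three examples above can be summarised as a decision rule. This minimal sketch is one reading of those examples, not a Microsoft feature: the function name, its parameters and the class codes are hypothetical. Group-connected sites are scoped to Microsoft 365 Groups (covering site and mailbox), and other sites to SharePoint sites.

```python
from typing import List, Set, Tuple

Policy = Tuple[str, str, str]  # (policy type, what it covers, location scope)


def site_policies(classes: Set[str], needs_review: bool = False,
                  has_working_papers: bool = False, group_connected: bool = False) -> List[Policy]:
    """Suggest policies for one SharePoint site, following the examples above."""
    # Group-connected sites can be covered by a Microsoft 365 Groups policy
    # (site and mailbox); standard sites use a SharePoint sites scope.
    scope = "Microsoft 365 Groups" if group_connected else "SharePoint sites"
    if len(classes) == 1 and not has_working_papers:
        (only,) = classes
        kind = "label-based (explicit)" if needs_review else "implicit"
        return [(kind, only, scope)]
    plan: List[Policy] = []
    if has_working_papers:
        plan.append(("implicit", "working papers", scope))
    # One label per class, applied library by library.
    plan.extend(("label-based (explicit)", c, scope) for c in sorted(classes))
    return plan


print(site_policies({"FIN-01", "FIN-02"}, has_working_papers=True))
```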

MS Teams

As noted above, the chat content in MS Teams is stored in Exchange mailboxes – (a) the mailbox of each participant for one-to-one chat, and (b) the mailbox of the Office 365 Group for channel-based chat.

You may consider having a relatively short-term retention period for one-to-one chat. The retention period for channel-based chat will depend on the subject matter and should, ideally, be the same as for the linked SharePoint site. For example:

  • A Team set up for a specific business function and activity (or activities) will have channel-based chat and a linked SharePoint site. Both should be subject to the same retention period.
  • A Team set up for low-level discussion about a subject that may not be covered by any retention period could be subject to a general retention policy for the chat and the SharePoint content.

Bringing it together

As noted at the beginning of the post, if you are going to use retention policies in Microsoft 365 you need a plan and you need to document it. It doesn’t matter too much if the environment is already active.

However, you will need to have discussions with your Microsoft 365 Global Admins, Compliance Admins and SharePoint Admins and know where the content is stored.

  • The Global Admins can give you a list of every Office 365 Group and Team in MS Teams (these are connected: every Team is based on an O365 Group).
  • The SharePoint Admins (or Global Admins) can give you a list of every SharePoint site.

There are some potential ‘quick wins’, such as agreement with IT regarding Exchange mailboxes, OneDrive for Business accounts, and MS Teams.

The more complex requirement is to map the classes in your records retention schedules/disposal authority to content stored in SharePoint, including for standard sites (not linked with Microsoft Groups), communication sites, and sites linked to Office 365 Groups.

You can start to do this by having a list of all the sites exported from the SharePoint Admin portal. This should allow you to see how many sites exist, how much content they hold, and if they are active or not.
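Once exported, the site list can be summarised with a short script. This minimal sketch uses only the Python standard library. The column headers (`URL`, `Storage used (GB)`, `Last activity (UTC)`), the ISO date format and the 365-day inactivity threshold are all assumptions; check the headers in your own export and adjust them.

```python
import csv
import io
from datetime import date
from typing import Dict, Iterable

# Assumed column headers: check them against your own export file and adjust.
URL_COL = "URL"
STORAGE_COL = "Storage used (GB)"
ACTIVITY_COL = "Last activity (UTC)"


def summarise_sites(rows: Iterable[Dict[str, str]], today: date, inactive_after_days: int = 365) -> dict:
    """Count sites, total their storage, and list those with no recent activity."""
    count, total_gb, inactive = 0, 0.0, []
    for row in rows:
        count += 1
        total_gb += float(row[STORAGE_COL] or 0)
        last = row[ACTIVITY_COL]
        # Assumes ISO dates (YYYY-MM-DD); a blank value is treated as inactive.
        if not last or (today - date.fromisoformat(last)).days > inactive_after_days:
            inactive.append(row[URL_COL])
    return {"sites": count, "storage_gb": round(total_gb, 2), "inactive": inactive}


# Inline sample standing in for the exported CSV file (hypothetical sites).
sample = io.StringIO(
    "URL,Storage used (GB),Last activity (UTC)\n"
    "https://contoso.sharepoint.com/sites/finance,1.5,2021-01-10\n"
    "https://contoso.sharepoint.com/sites/oldproject,0.25,2018-06-01\n"
)
summary = summarise_sites(csv.DictReader(sample), today=date(2021, 1, 13))
print(summary)
```

For a real export, open the file with `open("sites.csv", newline="", encoding="utf-8-sig")` (the file name is hypothetical) so that a leading byte-order mark does not corrupt the first header.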

It is probably a good idea for the records manager to be included as a Site Collection Administrator, for example by being a member of a Security Group added to every SharePoint site. This will help the records manager gain visibility of the content of each site; however, they should be very careful about browsing the content, as everything is recorded in audit logs.

Document and plan

The outcome of all these actions should be one or more documents that describe (a) where records are stored and (b) the retention policy and action that will apply to those records.

  • For Exchange mailboxes, OneDrive for Business accounts, and MS Teams, this may be a single line for each policy.
  • For SharePoint, there should be a listing of every site and the retention policy or policies that apply to that site.
  • Additionally, for SharePoint sites where an explicit label-based retention policy is applied, the listing should show which libraries this has been applied to. If a disposal review option has been selected, there should be a process to ensure that the metadata of the library where the records are stored is exported and stored in a different location. The original library may then be deleted.
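The plan itself can be kept as a simple CSV listing, which also makes gaps easy to spot. In this minimal sketch the `PlanEntry` fields, policy names and site URLs are hypothetical. The writer returns any explicit (label-based) entry that does not yet list the libraries the label has been applied to.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import List, TextIO


@dataclass
class PlanEntry:
    workload: str  # e.g. Exchange, OneDrive, Teams, SharePoint
    location: str  # "All mailboxes", or a specific site URL
    policy: str
    policy_type: str  # "implicit" or "explicit"
    libraries: List[str] = field(default_factory=list)


def write_plan(entries: List[PlanEntry], out: TextIO) -> List[str]:
    """Write the plan as CSV; return explicit entries that do not list their libraries."""
    writer = csv.writer(out)
    writer.writerow(["Workload", "Location", "Policy", "Type", "Libraries"])
    gaps = []
    for e in entries:
        writer.writerow([e.workload, e.location, e.policy, e.policy_type, "; ".join(e.libraries)])
        if e.policy_type == "explicit" and not e.libraries:
            gaps.append(e.location)
    return gaps


entries = [
    PlanEntry("Exchange", "All mailboxes", "Mailbox 7 years", "implicit"),
    PlanEntry("SharePoint", "https://contoso.sharepoint.com/sites/finance", "Finance labels", "explicit",
              ["Invoices", "Budgets"]),
    PlanEntry("SharePoint", "https://contoso.sharepoint.com/sites/hr", "HR labels", "explicit"),
]
buffer = io.StringIO()
gaps = write_plan(entries, buffer)
print(gaps)
```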