Posted in Electronic records, Governance, Information Management, Microsoft 365, Microsoft Teams, Records management, Retention and disposal, SharePoint Online

A basic retention model for Microsoft Teams

In my previous post about managing inactive Teams, the third option listed was to apply retention policies to those Teams. It included the graphic below.

This post provides more details of a basic retention model that can be applied to both active and inactive Teams.

Key takeaways

Key takeaways from this post for records and information managers:

  • Every Team has a ‘Posts’ (group chat messages) and ‘Files’ (documents etc) tab, and usually also starts with a Wiki tab (which can be removed). Other tabs may be added via the + option.
  • A Team in Microsoft Teams is not a single container or aggregation for the capture and storage of records. Almost all the records in a Team are stored in a hidden folder in Exchange Online (EXO) mailboxes (posts) or SharePoint Online (SPO) (files). Some records (conversations) may also be created and captured in the EXO mailbox of the associated Microsoft 365 (M365) Group.
  • It is not possible to apply a single retention policy to a Team; at least two separate policies will be required – one policy for the Team channel posts of EVERY team, and one or more policies for the content captured in SPO sites (files) or groups of sites.
  • Some records, created in and accessible from Teams, may be stored in other M365 applications (e.g., Tasks, Forms, WhiteBoard, etc) or third-party applications. It is not possible to apply any Microsoft 365 retention policy to records created by or captured in these applications.
  • Records and information managers should have access to the details (not necessarily the content) of every M365 Group, Team, and SPO site in order to establish a plan for the creation and application of retention policies to Teams. At a minimum, they should be assigned the Global Reader role (for details of M365 Groups and SPO sites) and the Compliance admin role (for retention policies).
  • It is relatively easy to overcomplicate the retention model for Teams, for example by applying separate retention labels to different folders and sub-folders in each channel ‘files’ tab.
  • Try to keep the model simple for as long as possible.

Core components of a Team

The main components of every Team are shown in the diagram below. If private channels are not allowed in the organisation, ignore the top two left and right elements.

The relationship of a Team to its M365 Group, Exchange mailbox and SharePoint site, showing where the content is stored (dotted lines).

As shown in the diagram above:

  • Every Team is directly linked with an M365 Group. Every M365 Group has an Exchange Online (EXO) mailbox and a SharePoint Online (SPO) site.
    • The Team, M365 Group, SPO site, and mailbox address (teamname@) all share the same name. The original name (which should be brief, <20 characters if possible) and the display name may be different.
    • The Owners and Members of the Team are the Owners and Members of the M365 Group and those Groups are added to the SPO site Owners and Members permission groups respectively.
  • A ‘compliance copy’ of every post in a normal channel is copied from the Azure-based Teams chat service (which is always inaccessible) to a hidden folder of the EXO mailbox of the M365 Group linked with the Team.
    • Where private channels are allowed, a ‘compliance copy’ of every post in a private channel is copied to a hidden folder of the ‘personal’ EXO mailboxes of all participants in the private channel.
  • Any content created or captured in the ‘Files’ tab of the Team channels is stored in the SPO site of the M365 Group linked with the Team. If any lists are created, they are either stored on the same SPO site or are linked from another site.
    • Where private channels are allowed, a separate SPO site is created (using the name of the ‘parent’ site followed by a hyphen then the private channel name, e.g., parentsitename-privatechannelnamesite). Any content created or captured in the ‘Files’ tab is stored in that SPO site.

So, a Team is a combination of at least four elements: the Teams user-interface (and back-end database), an M365 Group, a SPO site, and an EXO mailbox. The mailbox is used for three main purposes:

  • Email-based ‘conversations’ (when used).
  • Calendaring.
  • Storage of Teams posts.

This is why it is not possible to apply a single retention policy to a Team.
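The storage relationships described above can be summarised in a small lookup. The following Python sketch is illustrative only: the content-type keys and the descriptive strings are labels chosen for this example, not Microsoft identifiers, and the mapping simply restates the points in this section.

```python
# Illustrative map of where each kind of Team content is stored, restating
# the description above. Keys and descriptions are labels invented for this
# example; they are not Microsoft identifiers.
STORAGE_LOCATION = {
    "channel_post": ("EXO", "hidden folder in the M365 Group mailbox"),
    "private_channel_post": ("EXO", "hidden folder in each participant's mailbox"),
    "group_conversation": ("EXO", "M365 Group mailbox"),
    "channel_file": ("SPO", "site of the M365 Group"),
    "private_channel_file": ("SPO", "separate site for the private channel"),
}


def workloads_for(content_types):
    """Return the set of workloads (EXO or SPO) that hold the given content."""
    return {STORAGE_LOCATION[c][0] for c in content_types}


# A Team with only posts and files already spans two workloads.
print(sorted(workloads_for(["channel_post", "channel_file"])))
```

Because even the simplest Team spans both workloads, any retention coverage has to be assembled from more than one policy.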

The basic retention model

The basic retention model for Teams assumes the following:

  • If the organisation’s retention schedule/disposal authority does not include coverage for Team posts (channel messages) and general Team chats, there is a legally defensible policy that defines how long Team channel (including private channel) posts (and chats) will be retained. Note: This policy will define a single retention period for ALL posts and a separate retention period for ALL chats.
  • Records and information managers know the details of every M365 Group, Team (including number of private channels) and SPO site (including last activity and number of files).
  • One or more retention policies will be created for SPO sites.
  • One or more retention policies may be created for M365 Groups.
  • Unless it is done ‘manually’, there will be no review process before the content is destroyed at the end of the retention period.
  • No label-based retention policies will be applied (at this point). They may be added later as required (see below).
  • Unless the option to auto-expire M365 Groups is used, there will be a manual process to delete inactive and empty M365 Groups or Teams; deleting either will also delete the linked SPO site.

Creating retention policies

Retention policies are created in the Information Governance section of the M365 Compliance admin portal under ‘Retention policies’.

Generally speaking, organisations should not create many of these policies as they should ideally target entire workloads (all SPO sites, all EXO mailboxes, etc) or in some cases major groupings (e.g., EXO mailboxes of senior executives, all other mailboxes).

And remember, these policies do NOT destroy the container (Team, SPO site, EXO mailbox), only the content in those containers.

Every new retention policy has three parts.

Name

The name of the retention policy should be easily recognisable, for example ‘Teams channel posts 7 years’ (all encompassing, for all channel posts, see next dot point), or ‘General SPO site retention 7 years’. The name section also includes a description that should always be used to link the policy to details in a retention schedule/disposal authority or corporate policy.

Location

The ‘location’ element is where the complexity arises as it is not possible to create a single retention policy for all the elements in a Team. Selecting either ‘Teams channel messages’ or ‘Teams private channel messages’ will disable all other options. It is not possible to select ‘SharePoint sites’ or ‘Microsoft 365 Groups’ AND any of the Teams options in the same policy.

Because of this limitation, at least two separate retention policies will be required for a basic retention model, with an additional one for private channels (if required):

  • A retention policy for either all or selected SharePoint sites, including private channel sites. The simplest model is to create a single retention policy for all SharePoint sites. This creates a preservation hold library on every site, retaining all deleted content for the minimum period required. Alternatively, and especially if there is a way to ‘group’ SPO sites (e.g., all project team sites), create retention policies for those groups and add in the site names. Always keep in mind that a retention policy applied to the SPO site has no connection with or impact on the channel posts.
  • A retention policy for all Teams channel messages. Note that this cannot include or exclude any Teams – it’s all or none. Depending on the retention selected for channel posts (next point), this could mean that channel posts are destroyed before (or after) the Team’s SPO content.
  • A retention policy for all Teams private channel posts. Similar to the previous point, this is an ‘all or none’ policy.
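The location rule described above can be expressed as a simple check. This is a minimal Python sketch, not a Microsoft API: the location labels are the ones named in this post, the function name is hypothetical, the rule modelled is only the one stated here (a Teams location must be selected on its own), and the third policy name is an invented example following the naming pattern suggested earlier.

```python
# Location labels as named in this post. The checking logic illustrates the
# rule described above; it does not call any Microsoft service.
TEAMS_LOCATIONS = {"Teams channel messages", "Teams private channel messages"}


def is_valid_location_choice(locations):
    """Return True if the locations could be selected together in one policy.

    Rule modelled (from the text above): selecting either Teams location
    disables every other option, so a Teams location must stand alone.
    """
    chosen = set(locations)
    if not chosen:
        return False
    if chosen & TEAMS_LOCATIONS:
        return len(chosen) == 1
    return True


# The basic model: separate policies, each with a valid location choice.
# The policy names are examples only.
basic_model = {
    "General SPO site retention 7 years": ["SharePoint sites"],
    "Teams channel posts 7 years": ["Teams channel messages"],
    "Teams private channel posts 7 years": ["Teams private channel messages"],
}

for name, locations in basic_model.items():
    print(name, "->", is_valid_location_choice(locations))

# A single policy that tries to cover a whole Team is rejected.
print(is_valid_location_choice(["SharePoint sites", "Teams channel messages"]))
```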

If the Team is also making use of the M365 Group’s ‘conversations’ in Outlook, consideration may also be given to creating a retention policy for M365 Groups (or included/excluded Groups). This policy will cover (a) Group ‘conversations’ and (b) the SharePoint site linked with the Group/Team. It will NOT cover the Team channel posts that may be stored in the M365 Group EXO mailbox. Note: It is possible to select just the M365 Group mailbox OR the M365 Group’s SPO site in this policy via a PowerShell script.

Retention period

Retention options are shown in the screenshot below. These options are the same for every retention policy.

Retention policies either automatically delete content after a minimum period or do nothing (includes the ‘retain items forever’ option). There is no disposition review. This means that the content in the SPO site and Team channel (including any ‘deleted’ content, which is not actually deleted, just hidden) simply disappears when the retention period expires.
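Because channel posts and SPO content are covered by different policies, it can help to work out when each retention period ends. The sketch below is date arithmetic only, written in Python with a hypothetical function name and hypothetical periods; it assumes the caller supplies whichever date the policy counts from, and it does not model when the service actually removes content.

```python
from datetime import date


def deletion_due(start, years):
    """Return the date on which a retention period of `years` ends.

    `start` is whichever date the policy counts from; choosing that date
    is left to the caller and is an assumption of this sketch.
    """
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 28 February.
        return start.replace(year=start.year + years, day=28)


# Hypothetical example: posts kept for 3 years, files for 7 years.
start = date(2021, 6, 30)
posts_due = deletion_due(start, 3)
files_due = deletion_due(start, 7)
print(posts_due, files_due, posts_due < files_due)
```

With these example periods the channel posts reach the end of their retention well before the files in the same Team, which is the kind of mismatch that should be documented.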

Retention variations

Organisations may of course have different requirements or decide to apply retention differently. Each of these will still be some variation on the above model.

In most cases, there should be at least one retention policy in place for each of the different elements that make up a Team – the M365 Group, the SPO site, the channel posts, the private channel posts. Whether those policies have the same retention period will be up to the organisation to determine, but in all cases, the details should be documented somewhere as currently this information is not easily available.

Retention labels

It is not possible to apply retention labels to Teams channel or private channel posts (or chats). There is only one option, and that is a single retention policy for each of these.

Retention labels may be applied to the content stored in the Team’s linked SPO site, and these may be applied instead of using retention policies. This may be an effective model when combined with auto-expiry of M365 Groups as this (auto-expiry) will not occur if the content is subject to an active retention policy or retention label.

However, applying labels to the content stored in each Team channel ‘files’ tab has the potential to be a very complicated model that will become almost impossible to monitor or manage in time.

Each channel ‘files’ tab maps to a folder with the same name in the Documents library of the linked SPO site. As each Team channel may have been created for the records of a different subject with a different retention requirement, this means that each folder (or potentially even sub-folders) in the library may have a different label.

As retention labels (and policies) apply to individual items in the library (but not the folder), this means that individual items, stored in folders, that are subject to disposition review will come up for review in the future.

The application of multiple retention labels to folders within the single Document library of the SPO site is already complicated; having to review some of the individual items as part of a disposition review in the future is just adding to the complexity.

My view is that Teams should, as far as possible, ‘contain’ records relating to the same subject with the same single retention period that can be applied to the entire SPO site. Applying individual labels to folders or sub-folders within a single document library is a complex model both to apply and manage into the future.

What to do with empty Teams?

As noted already, retention policies (and labels) do not delete the SPO site, Team or M365 Group, only the content stored in them. Each of these ‘containers’ remains after the content within it has been destroyed.

Accordingly, it is advisable for records and information managers to (a) have access to the details of every SPO site, Team and M365 Group and (b) work closely with IT to determine when these containers can be deleted (and document that activity). Otherwise, the M365 environment will be left with the hollow shells of sites, Teams and Groups.
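One way to support that conversation with IT is to flag containers that are both empty and inactive. The following Python sketch assumes a hypothetical inventory kept by the records team; the field names, the example rows and the 365-day threshold are all assumptions made for illustration, and the output is only a list of candidates for review, not a deletion instruction.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SiteRecord:
    """One row of a hypothetical inventory kept by the records team."""
    name: str
    file_count: int
    last_activity: date


def deletion_candidates(sites, today, inactive_days=365):
    """Return names of sites that are empty and inactive for `inactive_days`."""
    return [
        s.name
        for s in sites
        if s.file_count == 0 and (today - s.last_activity).days >= inactive_days
    ]


# Invented example rows.
inventory = [
    SiteRecord("Project Alpha", 0, date(2019, 3, 1)),
    SiteRecord("Finance Team", 412, date(2021, 8, 20)),
    SiteRecord("Old Working Group", 0, date(2021, 7, 1)),
]
print(deletion_candidates(inventory, today=date(2021, 9, 1)))
```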

Further reading

The following Microsoft links provide further details on this subject.

Learn about retention policies and retention labels

Learn about retention for Microsoft Teams

Learn about retention for SharePoint and OneDrive

Create and configure retention policies

Apply retention labels to files in SharePoint or OneDrive

Teams messages about retention policies

Featured image: http://www.pexels.com

Posted in Artificial Intelligence, Classification, Electronic records, Information Management, Microsoft Viva, Products and applications, Records management

Are auto-generated topic cards the future for aggregations of records?

Humans have a natural instinct for grouping, classification and categorisation of things. It helps us find what we are looking for and gives us a sense of satisfaction, whether it be household items, computer storage, or much broader social and population groupings.

Humans have created and kept records ever since we developed a way to record them, on stones, clay shards, papyrus, bamboo sheets, vellum, paper and various other means. Multiple records were aggregated in ways that made sense to the people who created or kept them and wanted to find them again.

The introduction of computers at work from the late 1980s/early 1990s began the decline of traditional ways of aggregating records about a particular subject together in a physical ‘file’, although that practice has persisted to the present day because it was and still is easier to refer to. Lawyers (or more often the legal clerks) still attend digital courtrooms armed with printed copies of (usually digital) evidence and other materials for this reason.

Lawyers off to court – Image credit Sven Vik (NYC TV News Videographer)

The ‘problem’ of digital aggregations

While physical files provided the ability to store anything (printable) about a given subject in the one location, digital ‘files’ (or aggregations) suffered from the fact that emails and other content are created or stored in completely different locations.

The only way to keep emails together with other content about the same subject was for end-users to copy them to a network file share folder location or a digital recordkeeping system. In almost every case, the original email remained in the mailbox where it might still have an active life. Some email mailboxes became a primary (or alternative) storage location for both emails and attachments (as did some desktops!).

Keeping all digital records about a given subject in a single aggregation was never an easy task. It was never possible to be sure that everything was captured because it relied on end-users.

The email mailbox – SharePoint conundrum

In the same way that organisations decided to store copies of emails in network file shares or an EDRM system, it was easy to see SharePoint as the replacement for both.

But Microsoft have never made it easy to ‘natively’ copy an email from Outlook to SharePoint. There isn’t even a download option for emails. Emails can be dragged and dropped to synced document libraries, and various third-party products exist, but the process usually relies on end-users (a) to copy the emails and (b) to copy them consistently. Neither of these can be guaranteed.

And, of course, the records created and captured in Microsoft 365 are not just in Outlook mailboxes and SharePoint. A number of other apps create content that could be records (for example Yammer conversations, Teams chats, calendar entries, Planner tasks, even Whiteboard diagrams). Few of these records can be saved to SharePoint.

So, are digital aggregations impossible?

There is nothing stopping organisations doing whatever they can or want to group related records together. In Microsoft 365, the most logical way to do this is in SharePoint document libraries (the ‘Files’ tab in Teams channels). An entire SharePoint site (or Team in MS Teams) provides a form of meta-grouping; that is, multiple document libraries grouped by the SharePoint site/Team.

But if we stand back for a moment, to look at the (Microsoft 365) forest, what we see is not just individual trees (SharePoint sites, Exchange mailboxes and so on). Just as in a forest the roots of all the trees connect via mycorrhizal networks, sometimes known as ‘wood wide webs’, something similar happens in Microsoft 365 (and many other online systems, including Facebook).

Trees networking

The equivalent of networks in these systems are the ‘graphs’.

Like other graphs, the Microsoft Graph draws on all the rich data created and stored by end-users, in this case across the Microsoft 365 ecosystem – our corporate relationships, who we connect with and how, what we are communicating or writing, what we like, the way we use our time and so on. The graph learns what is popular or trending and makes suggestions (while respecting permissions) as to what we might want to see or know about.

Project Alexandria and Viva

According to a post in the Microsoft Research blog published in April 2021 and titled ‘Alexandria in Microsoft Viva Topics: from big data to big knowledge’, Project Alexandria is ‘a research project within Microsoft Research Cambridge dedicated to discovering entities, or topics of information, and their associated properties from unstructured documents’.

The blog post also noted that ‘Alexandria technology plays a central role in the recently announced Microsoft Viva Topics, an AI product that automatically organizes large amounts of content and expertise, making it easier for people to find information and act on it’.

The Alexandria pipeline – from unstructured text to structured knowledge (From the blog post above)

The outcomes sound similar to traditional ‘manually’ created aggregations, although they don’t replace them. In fact, the more that content is manually curated, the more likely that Viva Topics can accurately connect them and other related content that might otherwise be missed.

While Viva Topics might appear to be primarily focussed on supporting knowledge management outcomes and is currently limited to content stored in SharePoint, the technology has potential implications for records management, in particular for the age-old issue of how to find all information about a given subject (or know that a pre-defined aggregation contains all relevant information).

Viva Topic cards

As noted already, there is nothing stopping organisations from creating aggregations in ways that make sense to them and their end-users. SharePoint document libraries are the most logical form of aggregation that also happen to allow complex metadata, versioning and other features typically associated with EDRM systems. SharePoint document libraries are just one of several ways that content may be aggregated; Exchange mailboxes are another.

But, in most organisations, potentially relevant information AND records are frequently hidden from view in personal mailboxes and OneDrive accounts, in Teams chats, and in other applications (e.g., Planner). Viva Topics has the potential to leverage this information.

Once set up (as described in ‘Set up Microsoft Viva Topics’), Microsoft Viva begins to work its magic, discovering topics. An example of a discovered topic (from ‘Manage topics at scale in Microsoft Viva Topics’) is shown below.

While Topics are still limited to SharePoint content and people, there is potential to extend this model even further by including details about emails, chat messages or other content across the Microsoft 365 ecosystem – even if that information cannot be seen. For example:

  • Topic Name
  • Suggested people (perhaps grouped by AD manager or business area)
  • Suggested files and pages (you can see)
  • Authors of (n number of) emails that are related to the topic with an indication of volume over given periods (e.g., ‘251 emails in the past 6 months’) or a graphic representing this activity
  • Names of Teams that contain (n number of) chat messages related to the topic.
  • Participants in Teams 1:1 chats that contain (n number of) messages related to the topic.
  • Volume and date range of other related content (e.g., Tasks, Whiteboards, Forms, Yammer conversations).

Could Topic cards be the new aggregations?

Topic cards have the potential to resolve the age-old dilemma of digital aggregations, but they are unlikely to replace pre-defined ways to aggregate records including by copying emails to SharePoint document libraries. Those older methods will continue to exist for a long time.

But more importantly, they have the potential to draw out or highlight content that would otherwise be hidden from view – even if that content remains inaccessible.

When configured, Viva Topics already appear in search results, enhancing search outcomes.

It is only a matter of time before the probabilistic programming techniques of Project Alexandria, with expert human curation, begin to provide the type of high precision knowledge base construction for all relevant content about a given subject, first described by Microsoft researchers in May 2019.

Perhaps they may even support or link with retention and disposal processes, highlighting records due for disposal within a given period or even preventing their premature disposal.

Posted in Electronic records, Information Management, Planner, Records management, Retention and disposal, Tasks

Managing tasks as records in Microsoft 365 Planner/Tasks

There are several ways to create, record and assign tasks in organisations. These may include:

  • Personal tasks (or calendar entries) in email applications such as Outlook, or set via the Microsoft ‘To Do’ application.
  • Team and Group-based tasks created and managed in various ways, including on physical white boards, via Microsoft 365 Planner/Tasks or ‘Tasks by Planner for Teams’.
  • Project-based tasks, including in Microsoft Project or other similar applications. Depending on the type of project (e.g., agile or waterfall), this may also involve tasks pinned on Kanban boards.
  • Activity-based tasks, including in dedicated task-based software such as Jira, Trello, etc.

This post describes the three main elements of tasks in Planner/Tasks (including via Teams), where the records are stored, and recordkeeping considerations.

An important point to consider while reading this post is whether you regard tasks in Planner (or Tasks by Planner for Teams) as records. If your answer is yes, then you will need to think about how these records will be managed.

(Thanks to the team at Office365 for IT Pros for some of the detail in this post).

What is Planner?

The Planner option in office.com

To quote from the e-book ‘Office 365 for IT Pros’, Microsoft Planner (also known as ‘Tasks by Planner and To Do’ in Teams) is ‘a lightweight task-oriented planning application’ that is based on membership of Microsoft 365 Groups (click link if you are unfamiliar with Microsoft 365 Groups).

The Planner app in Teams

While there is some functional similarity between Microsoft Project and Planner, organisations soon (or will need to) learn which one is most appropriate for their business needs. Based on my own experience:

  • MS Project is best for tracking activities and tasks for major projects.
  • Planner is useful for general group task assignment and tracking of those tasks.

What are the three main elements of tasks in Planner?

Every task in Planner has three main elements:

  • Data. The details of the task itself including the ‘bucket’ it belongs to, progress, priority, dates, notes and a checklist.
  • Attachments. This may include either uploaded documents or links. Two tasks cannot have the same attachment, for reasons explained below.
  • Comments. These are effectively ‘conversations’.

When a new task is added via Planner or Teams (Tasks by Planner for Teams) via the ‘+ Add task’ option, an end-user simply needs to enter the task name, set a due date (if required), and assign it (if required).

Adding a task

After the new task has been created, the end-user may click on the three dot menu to add a label, assign the task, copy it, copy a link to it, move it, or delete it. Note that deleting a task does NOT delete any attachments or comments.

Task 3-dot menu options

The end-user may also click on the name of the task, which offers the options shown below to add attachments or make comments.

What is stored where?

Task data

According to Office 365 for IT Pros, ‘Planner stores the metadata for plans, including information describing the tasks and buckets that make up each plan, in an Azure data service’. (Click this link to learn in which country your Planner data is stored.)

The accessible metadata about each plan can be seen when the plan is exported to Excel.

  • Task ID (for example: QXkIWsgkqkO5rLu5pvfMhQgAEyXz)
  • Task Name
  • Bucket Name
  • Progress
  • Priority
  • Assigned To
  • Created By
  • Created Date
  • Start Date
  • Due Date
  • Late (true/false)
  • Completed Date
  • Completed By
  • Description (= Notes)
  • Completed Checklist Items
  • Checklist Items
  • Labels

As can be seen, the plan metadata does not include or show references to attachments or comments. There is no way of knowing from the exported data if the task had any attachments or comments.
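To show what can be done with the exported metadata, the following Python sketch summarises a few rows in the shape of the export. It assumes the Excel export has been saved as CSV; the column names come from the list above, but the row values (including the Progress values), the subset of columns and the function name are invented for illustration.

```python
import csv
import io
from collections import Counter

# A few rows in the shape of the export described above, saved as CSV.
# Column names come from the list above; the row values are invented.
EXPORT_CSV = """Task ID,Task Name,Bucket Name,Progress,Late,Checklist Items
A1,Draft site plan,To do,Not started,false,
A2,Review permissions,To do,In progress,true,Check owners;Check members
A3,Publish site,Done,Completed,false,
"""


def summarise(csv_text):
    """Count tasks by Progress value and list the names of late tasks."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_progress = Counter(row["Progress"] for row in rows)
    late = [row["Task Name"] for row in rows if row["Late"].lower() == "true"]
    return by_progress, late


by_progress, late = summarise(EXPORT_CSV)
print(dict(by_progress))
print(late)
```

A summary like this can only report on the columns that exist in the export, so it cannot say anything about attachments or comments.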

Task attachments

Any task can have attachments or links to other content. When uploaded ‘from computer’, these attachments are not stored in Planner but in the Documents library of the Team’s SharePoint site (the ‘Files’ tab), at the same level as (public) channel folders, as described in detail below. There is no option to choose where they will be saved.

This can be quite confusing, especially as all attachments uploaded from a computer, for all Tasks may be stored in the same location, without reference to the task. (This underlines the importance of saving the required attachments to the Teams channel Files tab first).

In the example below, the Teams channel ‘New Sites’ has a plan named ‘New sites tasks’. A task (‘Does this seem right’) has been added with an attachment ‘ExamplePDFA’. (Note, the visual of the document is a check-box option; only one visual can be displayed if there are multiple attachments).

Example task with an attachment.

As noted already, if uploaded from a computer, an attachment is actually stored in the Documents library at the same level as the channel folders, which means they are not visible from the Files tab for the channel as can be seen in the screenshot below.

The task attachments are NOT stored in the channel Files tab

To get to the task attachments from Teams you have two options:

  • Go to the ‘General’ channel, click on the ‘Files’ tab, then click on the ‘Documents’ option (to the left of ‘> General’). ALL attachments to ALL tasks for every channel in the entire Team are stored in this location. This needs to be kept in mind if anyone syncs the library to File Explorer as there is no indication that these attachments belong to a task in Planner.
  • By clicking on ‘Open in SharePoint’ and then navigating to the top of the Documents library as can be seen below.

In the same way that the task data exported to Excel does not show any reference to attachments, attachments uploaded from a computer (or, for that matter, attachments from Teams files) show no reference to the related task.
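Since uploaded task attachments sit at the top of the Documents library rather than inside a channel folder, one rough way to spot them is to list files that are not in any folder. The Python sketch below works on a hypothetical list of paths; the path layout, the file extensions and the function name are assumptions for illustration, and a root-level file may also be something a user added directly, so the result is only a list of candidates.

```python
from pathlib import PurePosixPath


def root_level_files(paths, library="Documents"):
    """Return files stored directly in the library, not inside any folder."""
    result = []
    for p in paths:
        parts = PurePosixPath(p).parts
        # Exactly two parts means "<library>/<file name>", i.e. no folder.
        if len(parts) == 2 and parts[0] == library:
            result.append(parts[1])
    return result


# Hypothetical listing of the Documents library for the example Team.
listing = [
    "Documents/General/Meeting notes.docx",
    "Documents/New Sites/Site checklist.xlsx",
    "Documents/ExamplePDFA.pdf",
    "Documents/ExamplePDFA 1.pdf",
]
print(root_level_files(listing))
```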

From a retention point of view:

  • If retention labels have been applied to the Team’s folders in SharePoint, these labels will not apply to uploaded documents linked with tasks.
  • If a retention policy has been applied to the entire site, then these attachments will be deleted in line with that policy.

The following could happen:

  • Anyone with delete rights, not knowing why these uploaded documents exist, could simply delete them.
  • A member of the Team or Group could add more content to the library at the same level as the uploaded attachments, especially if they are working via File Explorer. (Keep in mind that a new channel is NOT created when a new folder is created in the library at the same level as the channel linked folders.)

Also, if the person who created or is editing the tasks ‘removes’ the document from the three dot menu next to an existing attachment, that attachment is not deleted from the library, which is why there are two documents titled ExamplePDFA above, one with the extra ‘ 1’.

Removing an attachment doesn’t delete it, adding to the potential confusion.

Although it may be difficult to enforce in reality, asking end-users to attach or create a link to a document already stored in a Teams Files tab is better practice.

Task Comments

Task Comments are threaded conversations that are captured in the Microsoft 365 Group’s mailbox. If the Team was created first, the M365 Group mailbox will not be visible to the end users in their Outlook client. However, they will receive a copy of the conversation in their normal inbox.

In the example task below, which was created in a Team with a visible Outlook mailbox, there is one initial comment to indicate the task was created, then two additional notes.

In the Outlook client, each of these added comments is visible as a thread ‘in reply’ to the original task.

Curiously, the copy that appears in the end-user’s Inbox also shows the retention period for all other Inbox emails. It is not clear if this retention policy will apply to the task conversations or not.

The header of the thread in the Inbox shows a retention policy, not visible in the one above.

Managing records in Planner/Tasks

Are tasks records?

If organisations decide that tasks are records, they will need to consider how they will be managed given:

  • The way that Planner stores task data, attachments, and comments separately. Planner task data is made visible via the Teams interface; it is not stored in Teams.
  • The ability for members of Teams to create multiple plans with multiple tasks with multiple uploaded attachments (all stored in the same location without reference to the task it relates to).
  • The fact that a Group/Team may create a range of different types of content, not just in Teams.
  • The inability to apply retention policies to tasks in Planner, while retention policies might affect uploaded attachments, Teams files or comments as conversations in Outlook.
  • The inability to close or archive a plan, or export all the content as a single entity.

At a minimum, all the task data could be exported to Excel and stored somewhere – perhaps even on the Team’s SharePoint site. The exported data will not include any attachments or comments (neither of which is referenced in the Excel export). One problem with this approach may be deciding when and if the task data is to be exported, and if the original plan should then be deleted – who is responsible?

If organisations decide that tasks are not records, they should still consider how to manage the various elements of each task and plan from a retention point of view.

  • At what point can a plan be deleted? Does the deletion need to be recorded somewhere?
  • What if the Team decides to delete it anyway? There is currently no information governance/retention coverage for Planner but attachments and comments (if any) may remain.

Perhaps the easiest approach is to regard Planner tasks as low-level working content, not really records, in the same way that tasks in the former Outlook were generally overlooked as being records.

Posted in Artificial Intelligence, EDRMS, Electronic records, Exchange Online, Information Management, Microsoft 365, Microsoft Teams, Records management, SharePoint Online

Different approaches for managing records with Microsoft 365

The COVID pandemic from early 2020 led to the requirement for many employees to work from home (WFH). IT Departments scrambled to enable this capability, many making use of Microsoft (MS) Teams that was already bundled with their Microsoft 365 licences.

The rapid enabling and uptake (rather than an actual ‘implementation’) of MS Teams was more often than not achieved without much consideration for recordkeeping requirements or an overall plan for using Microsoft 365.

MS Teams became popular quickly, increasing from around 30 million daily active users in early 2020 to around 250 million monthly active users by mid 2021 (Source: ZDNet quoting Microsoft’s latest results). End-users could chat with each other and with external people (and on their phones too!), have video meetings, create new teams with channels and private channels, share and collaborate on content via the ‘Files’ tab in Teams, create and manage tasks, and more. They also continued to use email.

Anecdotal evidence suggests that the capture of records to on-premise electronic document and records management systems (EDRMS) declined from early 2020. One reason suggested for this was that it was too hard to save some cloud records such as Teams chats or content from the Files tab to an on-premise system. Alternative approaches for managing records with Microsoft 365 began to evolve.

This post discusses four approaches to managing records in Microsoft 365, summarised in the diagram below.

Which approach have you taken? Answer my (anonymous) short survey here (Microsoft Forms).

Approach 1 – EDRMS + key Microsoft 365 applications to create and capture

Approach 1 – EDRMS plus the main Microsoft 365 applications

This model has two elements:

  • Retaining an existing centralised recordkeeping system (the EDRMS) for the storage of records.
  • Using email, Teams, SharePoint or OneDrive to create or capture records to be copied to the EDRMS, and leaving other content (in theory non-records) ‘in place’.

The main positive aspect of this model is that records are (in theory) captured and managed in the EDRMS with all the traditional recordkeeping options. Some leading EDRMS vendors now offer solutions that integrate with Microsoft 365 and make it easier to capture records from Microsoft 365. But the model is still based on a centralised recordkeeping system and the requirement for end-users to copy content identified as records.

The main negative aspects of this model include the following points:

  • End-users still have to identify and copy records to the EDRMS.
  • Not all records created or captured in Microsoft 365 can be copied to the EDRMS.
  • Additional products or add-ons may be required to enable the copying.
  • The record is copied to the EDRMS, not moved, so remains in place with no controls.
  • Records that remain stored in Microsoft 365 applications may not be subject to the same degree of recordkeeping controls available in the EDRMS. Unless they acquire a third-party product (see next approach) to overcome this problem (which is unlikely for cost reasons), organisations must use the out of the box recordkeeping capability in Microsoft 365. This capability may not meet all requirements for keeping records if not properly configured.
  • There is a real risk that some records that remain in Microsoft 365 may be lost, especially if settings allow content to be deleted and there is no retention policy or backup.
  • EDRM system admins and records managers will need to learn a lot more about Microsoft 365.
  • The unified logs in Microsoft 365 only retain the details for 3 months (E3) or 12 months (E5) – although SharePoint’s versioning history can provide a lot of ‘modified’ event metadata for the life of the document (up to the maximum number of versions allowed). (Update: Microsoft 365 customers can retain the audit log for up to 10 years with an add-on license. Many export audit data to a SIEM such as Azure Sentinel where they can retain the log for as long as they want.)

On a positive note, however, Microsoft 365 includes a wide range of search, audit, monitoring and reporting tools, as well as security and protection controls, that improve the ability for records managers to find, manage and protect records (or potential records) in Exchange mailboxes, MS Teams chats and posts, SharePoint sites and OneDrive accounts AND put that content on a legal hold. So, as long as those options are enabled, the risk of losing records is reduced.

Approach 2 – Third-party application + Microsoft 365 applications for creation, capture and storage

Approach 2 – Third-party product plus Microsoft 365

A number of Microsoft partners have developed applications to manage records in Microsoft 365. Several have been available for a decade or more, originally designed to manage records primarily in on-premise SharePoint environments.

Most of these third-party applications were developed to comply with the same recordkeeping standards used by EDRMS vendors. These applications are generally either:

  • Replacements for EDRM systems (often requiring migration from the EDRMS).
  • New implementations where there was no EDRMS beforehand.

It is not common to see both an EDRMS and one of these third-party products being used together, because of licensing cost reasons.

The main positive aspect of using a third-party dedicated application is that records created or captured in Microsoft 365 can stay there and be managed according to recordkeeping requirements. Some of these applications are invisible to end-users, making them even more attractive.

The main potential negative aspect of using a third-party application, which is the same for any other vendor product, is that it creates a dependency on the vendor to maintain the product. Microsoft 365 continues to evolve and any third-party application must keep up with these changes. Two questions might be asked:

  • Will this dependency become a ‘tech debt’ liability in the future, if a ‘better’ option comes along?
  • How hard will it be to transfer to a different vendor in the future? Generally speaking this is less likely if the vendor is an established Microsoft partner, but the question should still be asked. For example, many organisations decided to use the Google suite of products but have now decided to use Microsoft 365.

Organisations seeking to implement third-party applications to manage records in Microsoft 365 should have a very detailed understanding of the underlying Microsoft 365 environment beforehand and the impact the third-party application might have on this environment. Some of the considerations might include:

  • The requirement to provide the third-party vendor with admin (including global admin) access to the Microsoft 365 tenant. Is this a security concern?
  • The location of records – in some cases, third-party vendors may use, move or back up content to one of their Microsoft 365 tenants. Is this a security concern? How can you monitor activity on your content if it’s not in your tenant?
  • The use of the central Term Store or Content Types to support the application. Will this create a dependency or make it harder for people to work, for example by requiring end-users to select Content Types or add metadata?
  • Changes to SharePoint settings and architecture, including the addition of hidden columns. Will these changes be consistent with your own architecture model?
  • How and where event metadata (audit logs) will be captured and managed.
  • How retention outcomes will be managed.

Approach 3 – One or more Microsoft 365 applications are the default ‘recordkeeping systems’ (no EDRMS or other application)

Approach 3 – Individual systems highlighted are the ‘recordkeeping’ systems

This approach focuses on the applications where most records are likely to be created or captured in Microsoft 365 – Exchange mailboxes, MS Teams, SharePoint, and OneDrive for Business – and therefore considers other content created and/or stored in other Microsoft 365 applications (e.g., Yammer, Forms, Planner/Tasks, etc) as being non-records.

There are several variations on this model including the following:

  • Outlook and Teams are the primary ‘recordkeeping systems’ as they are the two applications that are most used. Teams has been positioned as the primary interface for both SharePoint and OneDrive (via the ‘Files’ tab). The ability to also access both SharePoint and OneDrive from File Explorer via the sync option makes it even less likely that SharePoint or OneDrive will be accessed by end-users.
  • All four applications are the recordkeeping systems, using the various controls and settings available in the various admin portals, as well as the Compliance admin portal for retention policies.
  • SharePoint is the primary recordkeeping system, configured to mimic EDRMS capability. In this case, end-users would be expected to copy emails from Outlook or records from OneDrive, similar to the way they would have to do this for an EDRMS. Various controls and settings, such as ‘back end’ retention policies, might be applied to the other main applications to ensure that any records in those systems (such as Teams chats or emails) are not destroyed before a given period.

The main positive aspects of this approach are (a) simplicity and (b) cost savings, mostly by not having to purchase an EDRMS or third-party application.

However, these potential positives should not compromise the requirement for both IT and records management to have a very good understanding of, detailed approach to, and governance for, managing records in Microsoft 365. In other words, simply saying that one or more of these four applications is the recordkeeping system is not sufficient; additional work is required to ensure that records stored in them are managed appropriately.

There are several potential negative aspects of this model:

  • With the exception of SharePoint, none of the other three systems can be configured to manage records based on standards used for EDRM systems. Given that SharePoint has been positioned behind the Teams user interface, and SharePoint document libraries can be synced via Teams to File Explorer, any recordkeeping functionality configured in SharePoint should in theory be accessible or useable via Teams and possibly also File Explorer, but this is mostly not the case. So, SharePoint on its own, accessed via the browser only, is not really an option. Additionally, without effective controls, the Files (SharePoint) element of Teams has the potential to become the future equivalent of legacy network file shares full of redundant, outdated and trivial content.
  • If only one or two systems are considered to be the recordkeeping systems, there is a risk that records may not be saved and/or could be lost, especially if end-users can delete records and there is no backup option.
  • Managing records in this way requires both access to and a very good understanding of the applications designated to be the recordkeeping systems by both IT and records managers.
  • Retention policies (either the base level information governance or more expensive records management) may not be adequate, in terms of both application and coverage, and retention outcome management.
  • Exporting the records to another system or transferring them to another organisation, could become a complex task.
  • Accessing audit logs over a long period may be difficult (see first approach, last dot point, above).

Approach 4 – All of Microsoft 365 is the recordkeeping system

This approach is similar to the previous one except that it takes a broader approach and requires a degree of ‘letting go’ of the standards used by EDRMS systems (and third-party products). It is also the Microsoft default.

The approach assumes that records may be created or captured anywhere in Microsoft 365, saved to Microsoft 365 via archive connectors, or accessed (subject to access controls) via search connectors. Records are managed ‘in place’, meaning wherever they are created or captured, using a range of tools already available in Microsoft 365. Additional ‘in place’ controls allow certain items to be declared as records.

The approach requires both a very good technical understanding of the Microsoft 365 environment and effective governance by IT and records managers. If internal skills are lacking, it may also require a third-party organisation to implement the system – but based on what recordkeeping model? A reliance on a third-party to implement the recordkeeping elements has several risks (see below).

The main positives of this approach include the following:

  • Records that are created or captured in the Microsoft 365 environment remain there. There is no requirement to copy them to a separate system.
  • Some records, such as emails, can be copied to SharePoint if required.
  • The combination of Teams and SharePoint sites allows for multiple models to manage records – for example, high value records could be managed in a dedicated SharePoint site with multiple dedicated libraries and additional controls (metadata, retention, permissions etc), whereas low level records could be managed in the single ‘Documents’ library presented as the Files tab in a Team, or via File Explorer.
  • All the content (records and non-records) stored across Exchange, Teams, SharePoint/OneDrive can be searched (subject to roles and permissions). This allows records managers (and others such as Legal) to identify if records may be hidden in personal mailboxes or Teams chat or OneDrive accounts.
  • Minimum retention periods can be applied to all the content (not just records), ensuring that records that may be hidden in Teams chats, OneDrive accounts, or personal mailboxes, will be retained for minimum periods. This option also helps to reduce the volume of redundant, outdated and trivial content that may build up over time otherwise.
  • Retention labels can be applied, including automatically (and using machine learning), to records in mailboxes, SharePoint sites and OneDrive accounts (but not Teams chats or posts, yet).

The main negatives of this model are the same as those listed for the previous model, with more focus on the need for both IT and records managers to have a very detailed understanding of, and establish effective governance for, the entire environment where records may be created or captured, not just the main four applications. This requires some effort to achieve and should not be understated. It is not uncommon to see IT staff with Global Admin managing the entire Microsoft 365 environment using default settings and/or records managers with little technical knowledge or appropriate access struggling to understand how the environment works and drawing on experience with EDRM systems.

Some organisations may engage third-party implementation specialists to configure and set up the environment. Organisations that decide to go down this path should ensure they have the details of this configuration and can support it in the longer run, or the environment (or parts of it) could end up becoming difficult to manage or support over time.

Approach 5 – A potential future model

Microsoft 365 includes a wide range of settings, options and capabilities that have a significant impact on the way records can and will be managed across Microsoft 365 in the future.

Microsoft 365 will continue to evolve over time, including in ways that will support how records are managed. But it is important to keep in mind that Microsoft 365, or its component applications, is not and will never be an EDRMS based on standards such as DOD 5015.2. Microsoft 365 is too complex, and the volume and type of content stored in it too large, for any part of it to be considered the ‘records management’ system.

A new approach is required for the identification and management of records. This approach may draw on existing recordkeeping standards and concepts but is likely to rely more heavily on new and evolving ways to work with information, including records.

Some of these ways have been around for a decade or so in the form of graph-based machine learning (ML), process automation, and artificial intelligence (AI). Examples include Google, Facebook, LinkedIn, Netflix, Amazon, eBay and so on. These examples have one thing in common – they all take advantage of the various ‘signals’ and ‘digital exhaust’ voluntarily offered by their users to identify and present things that match those users’ interests – jobs, friends, things to purchase, movies. Post something on Facebook or (perhaps) talk about a particular subject near your phone, and related ads will appear.

So, what is different about Microsoft 365? End-users are related to each other thanks to Active Directory, they connect and communicate with others via email or Teams, they share content, they attend meetings. All of these signals (and a lot more) feed into the underlying Graph and allow connections and suggestions to be made.

There is nothing stopping organisations setting up dedicated SharePoint sites with multiple well-named libraries to manage certain records and leaving other content and records to the world of Teams Files. But all of this information can be related based on context, including who created it, what team that person was in, who they connect with, what access they have and so on.

Perhaps by 2035, the primary approach to records management will be relying on all the digital connections and signals, machine learning, the Graph and AI to identify all related records in context, not just the ones neatly placed in a SharePoint document library. Records may be automatically identified as important and needing stronger controls based on this context – who created, sent or received it, whether it relates to a subject that is trending (or was in the past).

Instead of just a simple pre-defined aggregation of records (which will still be a valid way to aggregate records), future aggregations will include a wider range of content, created automatically, likely presented in the form of ‘cards’.

Viva Topics is an interesting precursor to this possible future model.

Viva Topics presented in Teams

The following text is from the Microsoft page ‘Alexandria in Microsoft Viva Topics: from big data to big knowledge’:

Looking further ahead, Alexandria’s ability to extract information automatically gives us the opportunity to customize the knowledge discovery process. By automatically retrieving the set of types and properties being talked about in an organization’s documents, Alexandria can create a knowledge base with a bespoke schema exactly tailored to the needs of each organization and using the familiar language and terminology that people in the organization are used to. Read more about the proposed schema-based design in our research paper.

We are only beginning to dream of the experiences that an automatically created and updated knowledge base can enable, but it is already clear that it could transform the future of how we work. The era of big knowledge is coming sooner than you might think.

Whatever the new approach is, managing records in Microsoft 365 will require new skills on the part of information and records managers.

Posted in Conservation and preservation, Digital preservation, Electronic records, Records management, Retention and disposal

The challenge of identifying born-digital records

A recent ‘functional and efficiency’ review into the National Archives of Australia (also known as the ‘Tune Review’, published on 30 January 2021) noted the ‘rapid and ever-evolving challenges of the digital world’.

It stated that ‘the definition of a ‘record’ needs to reflect current international standards, be more directly applied to digital technologies, and more clearly provide for direct capture of records that are susceptible to deletion, such as emails, texts or online messages’.

The review also highlighted the difficulties associated with ingesting digital records ‘via manual intensive activities (due to lack of interoperable systems)’ and proposed a new model based on the ‘continuous automated appraisal of [Agency] digital records that would require a combination of artificial intelligence and skilled archivists’.

The review underlined the challenges of identifying and managing born-digital records, and the need for better solutions.

This post explores the challenges of accurately identifying born-digital records in order to manage them.

Identifying and protecting records

Records usually provide evidence of something that happened – an action, an activity or process, a decision, or a current state (including a photograph or video record). They may have or be associated with descriptive metadata used to provide context to the records and guide or determine retention.

Like all other types of evidence, the authenticity, integrity and reliability of records should be protected for as long as they must be kept.

In the paper world, this outcome was achieved by storing physical records (including the printed version of born-digital records) on paper files or in physical storage spaces.

For the past twenty years or so, this outcome was achieved for (some) digital records by (mostly manually) copying them from a network drive or email system (or via a connector) to a dedicated electronic records management (ERM) system and then ‘locking’ them in that system to prevent unauthorised change or deletion. Most ERM systems consisted of a database for the metadata and an associated network drive file store for the objects.

The main problem with this centralised storage model – however good it might be at protecting copies of records stored in it – was that the original versions, along with all the other records that were not identified or could not be copied to the ERMS, remained where they were created or captured.

And the records stored ‘in’ the ERMS were actually stored on a network file share on a server that was (a) accessible to IT, and (b) almost always backed up. So, yet more copies existed.

The challenge of born-digital records

There are several key challenges with born-digital records:

  • Consistently and accurately identifying (or ‘declaring’) all records in all formats created or captured in all locations. For too long, the focus has primarily been on emails and anything that can be saved to a network drive with the onus of identifying a record on end-users.
  • Ensuring their authenticity, reliability and integrity over time. For records stored in the ERMS, this has usually involved locking them from edit, including through the ‘declaration’ process, or preventing deletion. But in almost all cases, the original version (in email, on the network drives), could continue to be modified. Other records that were not identified or stored in an ERMS may be deleted.
  • Ensuring that born-digital records will remain accessible for as long as they are required.

It is not possible to consistently and accurately manually (or even automatically) identify every born-digital record that an organisation creates or captures to ensure their authenticity, reliability, integrity or accessibility over time. Only a small percentage of born-digital records are copied to an ERMS.

Records remain hidden in personal mailboxes, personal drives and third-party (often unauthorised) systems. Records may exist in multiple forms and formats, sometimes created or stored in ‘private’ systems or on social media platforms. They may take the form of text or instant messages or social networking posts and threads. They may be drawings, images, voice or video recordings.

Even if a record is identified, it is not always possible to save it to an ERMS. Text or instant messages on mobile devices are a case in point that has been a problem for at least two decades. More recent examples include chat messages, reactions (emojis, comments), and recordings of online meetings.

And even if a high percentage of born-digital records could be stored in the ERMS, the original versions will almost always remain where they were created or captured.

A different approach is needed.

Triaging records?

One approach to the problem would be to accept that not all records have equal value. That is, not all records need to be managed the same way.

To some degree, this way of thinking is already reflected in the classes of records in retention schedules and the attention paid to each:

  • Records that have permanent or archival value and need to be transferred to archival institutions.
  • Specific types of records that must be created or kept by the organisation for minimum periods (sometimes quite long but not ‘forever’), for legal, compliance or auditing purposes.
  • Records that are not subject to legal or compliance requirements but which the organisation decides to keep for a minimum period of time.
  • Everything else.

Triaging records means that they can be managed as required at each level, but nothing is missed. It requires a risk management approach.

For records of permanent value, it means ensuring that these records receive the most attention and that every effort is made to ensure that they are and can be identified (declared) and managed accordingly. This would include ensuring that it is possible to identify and capture these records in the systems used to create or capture them, for example, key emails.

A similar approach would be taken to records that need to be kept for legal, compliance or auditing purposes but with an understanding that some of these records (e.g., emails) may remain in the original system where they were created or captured. Technological solutions may be used to identify or tag these records. The destruction of these records should be subject to some form of review and a record kept of the approval and what was destroyed.

All other records would remain stored wherever they were created or captured and be subject to minimum retention periods, after which they can be destroyed without review – but with a record kept of the basic metadata of each record (including original storage location).
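The triage levels described above can be sketched in code. This is a minimal illustration only: the tier names, the `Handling` fields and the `triage` function are hypothetical, and the handling rules are one reading of the paragraphs above rather than a definitive scheme.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PERMANENT = "permanent or archival value"
    COMPLIANCE = "legal, compliance or audit requirement"
    BUSINESS = "kept by organisational decision"
    EVERYTHING_ELSE = "everything else"


@dataclass(frozen=True)
class Handling:
    actively_identified: bool  # effort is made to identify, declare or tag
    disposal: str              # what happens at the end of retention


# Hypothetical handling rules, one per tier (an assumption, not a standard).
HANDLING = {
    Tier.PERMANENT: Handling(True, "transfer to archives"),
    Tier.COMPLIANCE: Handling(True, "destroy after review, keep approval record"),
    Tier.BUSINESS: Handling(False, "destroy without review, keep basic metadata"),
    Tier.EVERYTHING_ELSE: Handling(False, "destroy without review, keep basic metadata"),
}


def triage(archival_value, legal_requirement, organisation_keeps):
    """Assign the highest applicable tier so that nothing is missed."""
    if archival_value:
        return Tier.PERMANENT
    if legal_requirement:
        return Tier.COMPLIANCE
    if organisation_keeps:
        return Tier.BUSINESS
    return Tier.EVERYTHING_ELSE


tier = triage(archival_value=False, legal_requirement=True, organisation_keeps=False)
print(tier.name, "->", HANDLING[tier].disposal)
```

The point of the sketch is that every record lands in exactly one tier, including the ‘everything else’ tier, so the level of effort varies but no record falls outside the model.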

Protecting – or proving – the authenticity, integrity and reliability of records

The assumption behind the protection of records is that they should not be changed or deleted.

The reality, with digital records, is that they may change at any time through new threads, new revisions, new chats, or even through photoshopping.

A more realistic approach may be to use information about what was changed, by whom, and when – not to protect the record but to provide an evidentiary trail to prove what it is or was. The ‘smoking gun’ evidence for most born-digital records is the metadata that is recorded when it was captured or modified, not (necessarily) the added descriptive metadata.

For example:

  • Someone may author a document (metadata records each revision, and each revision can be viewed).
  • The document may be approved electronically (recorded in metadata).
  • Someone then modifies the approved version.
  • All of the above is recorded in the ‘modified’, ‘modified by’ and approval metadata.
  • The record should (or may) also record who viewed the record, and when.

EXIF metadata stored on images provides a similar form of evidence (and may even include GPS information).
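The document example above can be sketched as a simple event trail. This is an illustration under stated assumptions: the `Event` structure, the action names and the `modified_after_approval` function are hypothetical and do not correspond to any particular system’s audit log or version history format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    when: datetime
    who: str
    action: str  # hypothetical values: "created", "modified", "approved", "viewed"


def modified_after_approval(events):
    """Return the modification events that occurred after the latest approval."""
    ordered = sorted(events, key=lambda e: e.when)
    approvals = [e for e in ordered if e.action == "approved"]
    if not approvals:
        return []
    last_approval = approvals[-1].when
    return [e for e in ordered if e.action == "modified" and e.when > last_approval]


trail = [
    Event(datetime(2021, 3, 1, 9, 0), "alice", "created"),
    Event(datetime(2021, 3, 2, 14, 30), "alice", "modified"),
    Event(datetime(2021, 3, 3, 10, 0), "bob", "approved"),
    Event(datetime(2021, 3, 4, 16, 45), "carol", "modified"),
    Event(datetime(2021, 3, 5, 8, 15), "dave", "viewed"),
]

for event in modified_after_approval(trail):
    print(event.who, "modified the approved version at", event.when.isoformat())
```

Nothing here prevents the change to the approved version; the trail simply makes it possible to show that the change happened, who made it, and when.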

Which record is more likely to be accepted as evidence:

  • A record stored in an EDRMS, versions or revisions of which may exist in multiple other places, including on network file shares, email systems and even backup tapes?
  • A record stored in a system that shows the full set of metadata about access and changes, or the most recent thread of an email discussion?

Conclusions

At the end of the day, it should be possible to confirm the authenticity, reliability and integrity of records based on information/metadata that forms part of the born-digital record: who created it and when, the context in which it was created and its relationship with other records.

Perhaps, instead of focussing on trying to identify and capture all born-digital objects that might be records and ‘protecting’ a version of that record, it may be more practical and easier to leave most records where they were created or captured (and retained by retention policies) and use change or revision metadata to provide evidence of authenticity.

This may, in the end, be a much easier way to protect the authenticity of records than having to rely on manual identification or declaration.

Posted in Artificial Intelligence, Classification, Electronic records, Information Management, Microsoft 365, Records management, Retention and disposal

Can Microsoft technology classify records better than a human?

In late 2012, IDM magazine published an article I co-authored with Umi Asma Mokhtar in Malaysia titled ‘Can technology classify records better than a human?’

The article drew on research into recent advances in technology to assist in legal discovery, known as ‘computer-assisted coding’, or ‘predictive coding’, including the following two articles:

Grossman and Cormack’s article noted that ‘a technology-assisted review process involves the interplay of humans and computers to identify the documents in a collection that are responsive to a production request, or to identify those documents that should be withheld on the basis of privilege’. By contrast, an ‘exhaustive manual review’ required ‘one or more humans to examine each and every document in the collection, and to code them as responsive (or privileged) or not’.

The article noted, somewhat gently, that ‘relevant literature suggests that manual review is far from perfect’.

Peck’s article contained similar conclusions. He also noted how computer-based coding was based on an initial ‘seed set’ of documents identified by a human; the computer then identified the properties of those documents and used that to code other similar documents. ‘As the senior reviewer continues to code more sample documents, the computer predicts the reviewer’s coding’ (hence predictive coding).
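The seed-set idea can be illustrated with a toy example. This sketch is an assumption-laden simplification: it scores a new document by how often its words appeared in the human-coded seed documents, which is far cruder than the statistical models used by real predictive coding products. The function names and the sample documents are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def train(seed):
    """seed: list of (text, is_relevant) pairs coded by a human reviewer."""
    counts = {True: Counter(), False: Counter()}
    for text, is_relevant in seed:
        counts[is_relevant].update(tokens(text))
    return counts


def predict(counts, text):
    """Predict relevance by comparing word overlap with each seed group."""
    words = tokens(text)
    relevant_score = sum(counts[True][w] for w in words)
    other_score = sum(counts[False][w] for w in words)
    return relevant_score > other_score


seed_set = [
    ("Contract renewal terms for the supplier agreement", True),
    ("Signed supplier contract and payment schedule", True),
    ("Lunch menu for the staff canteen", False),
    ("Car park closed for the staff picnic", False),
]

model = train(seed_set)
print(predict(model, "Draft contract with the new supplier"))
print(predict(model, "Staff picnic menu"))
```

As the reviewer codes more documents, they would be added to `seed_set` and the counts rebuilt, which is the iterative loop Peck describes.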

By 2011, this new technology was challenging old methods of manual review and classification. Despite some scepticism and slow uptake (for example, see this 2015 IDM article ‘Predictive Coding – What happened to the next big thing?’), by 2021, it had become an accepted option to support discovery, sometimes involving offshore processing for high volumes of content.

Meanwhile, in an almost unnoticed part of the technology woods, Microsoft acquired Equivio in January 2015. In its press release ‘Microsoft acquires Equivio, provider of machine learning-powered compliance solutions’, Microsoft stated that the product:

‘… applies machine learning … enabling users to explore large, unstructured sets of data and quickly find what is relevant. It uses advanced text analytics to perform multi-dimensional analyses of data collections, intelligently sorting documents into themes, grouping near-duplicates, isolating unique data, and helping users quickly identify the documents they need. As part of this process, users train the system to identify documents relevant to a particular subject, such as a legal case or investigation. This iterative process is more accurate and cost effective than keyword searches and manual review of vast quantities of documents.’ 

It added that the product would be deployed in Office 365.

Classifying records

The concept of classification for records was defined in paragraph 7.3 of part 1 of the Australian Standard (AS) 4390, released in 1996. The standard defined classification as:

‘… the process of devising and applying schemes based on the business activities generating records, whereby they are categorised in systematic and consistent ways to facilitate their capture, retrieval, maintenance and disposal. Classification includes the determination of naming conventions, user permissions and security restrictions on records’.

The definition provided a number of examples of how the classification of business activities could act as a ‘powerful tool to assist in many of the processes involved in the management of records, resulting from those activities’. This included ‘determining appropriate retention periods for records’.

The only problem with the concept was the assumption that all records could be classified in this way, in a single recordkeeping system. Unless they were copied to that system, emails largely escaped classification.

Fast forward to 2020

Managing all digital records according to recordkeeping standards has always been a problem. Electronic records management (ERM) systems managed the records that were copied into them, but a much higher percentage remained outside their control – in email systems, network file shares and, increasingly over the past 10 years, created and captured on a host of alternative systems including third-party and social media platforms.

By the end of 2019, Microsoft had built a comprehensive single ecosystem to create, capture and manage digital content, including most of the records that would have been previously consigned to an ERMS. And then COVID appeared and working from home became common. All of a sudden (almost), it had to be possible to work online. Online meeting and collaboration systems such as Microsoft Teams took off, usually in parallel with email. Anything that required a VPN to access became a problem.

2021 – Automated classification for records (maybe)

The Microsoft 365 ecosystem generated a huge volume of new content scattered across four main workloads – Exchange/Outlook, SharePoint, OneDrive and Teams. A few other systems such as Yammer also added to the mix.

Most of this information was not subject to any form of classification in the recordkeeping sense. The Microsoft 365 platform included the ability to apply retention policies to content but there was a disconnect between classification and retention.

Microsoft announced Project Cortex at Ignite in 2019. According to the announcement, Project Cortex:

  • Uses advanced AI to deliver insights and expertise in the apps that are used every day, to harness collective knowledge and to empower people and teams to learn, upskill and innovate faster.
  • Uses AI to reason over content across teams and systems, recognizing content types, extracting important information, and automatically organizing content into shared topics like projects, products, processes and customers.
  • Creates a knowledge network based on relationships among topics, content, and people.

Project Cortex drew on technological capabilities present in Azure’s Cognitive Services and the Microsoft Graph. It is not known to what extent the Equivio product, acquired in 2015, was integrated with these solutions but, from all the available details, it appears the technology is at least connected in one way or another.

During Ignite 2020, Microsoft announced SharePoint Syntex and trainable classifiers, either of which could be deployed to classify information and apply retention rules.

Trainable classifiers

Trainable classifiers were made generally available (GA) in January 2021.

Trainable classifiers sound very similar to the predictive coding capability that appeared from 2011. However, they:

  • Use the power of Machine Learning (ML) to identify categories of information. This is achieved by creating an initial ‘seed’ of data in a SharePoint library, creating a new trainable classifier and pointing it at the seed, then reviewing the outcomes. More content is added to ensure accuracy.
  • Can be used to identify similar content in Exchange mailboxes, SharePoint sites, OneDrive for Business accounts, and Microsoft 365 Groups and apply a pre-defined retention label to that content.

In theory, this means it might be possible to identify a set of similar records – for example, financial documents – and apply the same retention label to them. The Content Explorer in the Compliance admin portal will list the records that are subject to that label.
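
The seed-and-score idea behind this kind of classification can be illustrated with a toy example. The sketch below is not how trainable classifiers are implemented; it assumes a simple bag-of-words similarity score, and the label names, sample text and threshold are all hypothetical.

```python
import math
import re
from collections import Counter

def tokens(text):
    """Lower-case word tokens; a deliberately crude stand-in for real text analytics."""
    return re.findall(r"[a-z]+", text.lower())

def centroid(seed_docs):
    """Sum the word counts of the human-selected seed documents."""
    total = Counter()
    for doc in seed_docs:
        total.update(tokens(doc))
    return total

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def suggest_label(doc, seeds_by_label, threshold=0.3):
    """Return the best-matching label, or None if no score reaches the threshold."""
    doc_vec = Counter(tokens(doc))
    scores = {label: cosine(doc_vec, centroid(seeds)) for label, seeds in seeds_by_label.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

# Hypothetical seed sets chosen by a reviewer, keyed by a hypothetical retention label.
SEEDS = {
    "Financial - 7 years": [
        "Invoice for payment of account, amount due and tax total",
        "Purchase order and invoice amount for supplier payment",
    ],
    "Meetings - 3 years": [
        "Minutes of the committee meeting, agenda items and actions",
        "Agenda for the project meeting with actions and attendees",
    ],
}

print(suggest_label("Supplier invoice: total amount due for payment", SEEDS))
```

Whatever the real scoring method, the outcome depends on which documents the reviewer places in each seed set, which is the point made about human bias later in this post.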

SharePoint Syntex

SharePoint Syntex was announced at Ignite in September 2020 and made generally available in early 2021.

The original version of Syntex (as part of Project Cortex) was targeted at the ability to extract metadata from forms, a capability that has existed with various other scanning/OCR products for at least a decade. The capability that was released in early 2021 included the base metadata extraction capability as well as a broader capability to classify content and apply a retention label.

The two Syntex capabilities, described in a YouTube video from Microsoft titled ‘Step-by-Step: How to Build a Document Understanding Model using Project Cortex‘, are:

  • Classification. This capability involves the following steps: (a) Creation of (SharePoint site) Content Center; (b) Creation of a Document Understanding Model (DUM) for each ‘type’ of record; the DUM can create a new content type or point to an existing one; the DUM can also link with the retention label to be applied; (c) Creation of an initial seed of records (positives and a couple of negatives); (d) Creation of Explanations that help the model find records by phrase, proximity, or pattern (matching, e.g., dates); (e) Training; (f) Applying the model to SharePoint sites or libraries. The outcome of the classification is that matching records in the location where it is pointed are assigned to the Content Type (replacing any previous one) and tagged with a retention label (also replacing any previous one).
  • Extraction. This capability has similar steps to the classification option except that the Explanations identify what metadata is to be extracted from where (again based on phrase, proximity or pattern) to what metadata column. The outcome of extraction is that the matching records include the extracted metadata in the library columns (in addition to the Content Type and retention label).
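
The three kinds of Explanation mentioned above (phrase, proximity and pattern) can be pictured with ordinary string matching. The following is a conceptual sketch only: it does not use or reproduce any Syntex API, and the function names, column names, sample text and date format are assumptions made for illustration.

```python
import re

# Pattern explanation: an assumed d/m/yyyy style date.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def has_phrase(text, phrase):
    """Phrase explanation: the exact phrase appears (case-insensitive)."""
    return phrase.lower() in text.lower()

def find_near(text, phrase, pattern, max_words=5):
    """Proximity explanation: first pattern match within max_words words after the phrase."""
    start = text.lower().find(phrase.lower())
    if start == -1:
        return None
    following = text[start + len(phrase):].split()[:max_words]
    match = pattern.search(" ".join(following))
    return match.group(0) if match else None

def extract_metadata(text):
    """Map extracted values to hypothetical library column names."""
    return {
        "IsContract": has_phrase(text, "service agreement"),
        "StartDate": find_near(text, "commencing on", DATE_PATTERN),
    }

sample = "This Service Agreement is made between the parties, commencing on 1/07/2021 for two years."
print(extract_metadata(sample))
```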

As with trainable classifiers, Syntex uses Machine Learning to classify records, but Syntex also has the ability to extract metadata. Syntex can only classify or extract data from SharePoint libraries.

Trainable classifiers or Syntex?

Both options require the organisation to create an initial seed of content and to use Machine Learning to develop an understanding of the content, in order to classify it.

The models are similar; the primary difference is that trainable classifiers can work on content stored in email, SharePoint and OneDrive, whereas Syntex is currently restricted to SharePoint.

Predictive coding

On 18 March 2021, Microsoft announced the pending (April 2021) preview release of an enhanced predictive coding module for advanced eDiscovery in Microsoft 365.

The announcement, pointing to this roadmap item, noted that eDiscovery managers would be able to create and train relevance models within Advanced eDiscovery using as few as 50 documents, to prioritize review.

So, can Microsoft technology classify records better than humans?

In their 1999 book ‘Sorting Things Out: Classification and its Consequences‘ (MIT Press), Geoffrey Bowker and Susan Leigh Star noted that ‘to classify is human’ and that classification was ‘the sleeping beauty of information science’ and ‘the scaffolding of information infrastructures’.

But they also noted how ‘each standard and category valorizes some point of view and silences another. Standards and classifications (can) produce advantage or suffering’ (quote from review in link above).

Technology-based classification is, in theory, impartial. It categorises what it finds through machine learning and algorithms. But technology-based classification requires human review of the initial and subsequent seeds. Accordingly, such classification has the potential to be skewed by the reviewer’s bias or predilections, for example in the selection of one set of preferred or ‘matching’ records over another.

Ultimately, a ‘match’ is based on a scoring ‘relevancy’ algorithm. Perhaps the technology can classify better than humans, but whether the classification is accurate may depend on the human to make accurate, consistent and impartial decisions.

Either way, the manual classification of records is likely to go the same way as the manual review of legal documents for discovery.

Image source: Providence Public Library Flickr

Posted in Access controls, Information Management, Information Security, Microsoft 365, Microsoft Teams, Office 365 Groups, SharePoint Online

Understanding permission groups in Teams and SharePoint

One of the most confusing aspects of Teams and SharePoint in Microsoft 365 is the relationship between permission groups used to control access to both of these resources. This is especially the case as every Team in MS Teams has an associated SharePoint site (the ‘Files’ tab).

This post explains how permission groups work between MS Teams, Microsoft 365 Groups and SharePoint.

SharePoint permission groups

Before discussing how Teams permissions relate to SharePoint, here is a brief reminder of how SharePoint permissions work.

SharePoint has always had three default permission groups, as shown in the screenshot below. The URL name of the site always prefixes the words Owners, Members and Visitors.

Site Owners

  • People (including in a Group, see below) added to the Owners permission group have full access (full control) to all parts of the site and are usually responsible for managing the SharePoint site. There would normally be two or three site owners.

Site Members

  • People (including in a Group, see below) added to the Members permission group have add/edit (contribute) rights.

Site Visitors

  • People added to the Visitors permission group have read-only (view) rights.

These permissions are set at the site level and inherited by everything in the site, unless that inheritance is broken and unique permissions are applied. Additional permission groups can be created as necessary but most SharePoint sites only use the default Owners, Members and Visitors groups.
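
The inheritance behaviour described above can be sketched as a small model. This is an illustration only, not SharePoint's implementation or API; the class, the site and group names, and the permission-level strings (which follow the wording used in this post) are assumptions.

```python
class Securable:
    """A site, library, folder or item that either inherits or has unique permissions."""

    def __init__(self, name, parent=None, permissions=None):
        self.name = name
        self.parent = parent
        # None means 'inherit from the parent'.
        self.unique = dict(permissions) if permissions is not None else None

    def break_inheritance(self, permissions):
        """Stop inheriting and apply unique permissions to this object only."""
        self.unique = dict(permissions)

    def permissions(self):
        """Walk up to the nearest object that holds its own permissions."""
        node = self
        while node.unique is None and node.parent is not None:
            node = node.parent
        return node.unique or {}

# Hypothetical site with the three default groups set at the site level.
site = Securable("Finance", permissions={
    "Finance Owners": "Full control",
    "Finance Members": "Contribute",
    "Finance Visitors": "Read-only",
})
library = Securable("Documents", parent=site)
folder = Securable("Payroll", parent=library)

# Unique permissions on one folder: Visitors lose access there, nothing else changes.
folder.break_inheritance({"Finance Owners": "Full control", "Finance Members": "Contribute"})

print(library.permissions())
print(folder.permissions())
```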

Microsoft 365 Groups

Microsoft 365 Groups were introduced in 2017 and control access to resources, like Security Groups.

However, unlike Security Groups, which usually provide access to individual resources (such as a single SharePoint site, or Line of Business (LOB) system), Microsoft 365 Groups control access to multiple linked Microsoft 365 resources.

Microsoft 365 groups, distribution lists, mail-enabled security groups, and security groups (collectively referred to as Active Directory (AD) groups) are all created in the ‘Groups’ area of the Microsoft 365 Admin portal.

When a new group is created, the following options appear.

As noted above, Microsoft 365 groups are recommended. It is important to understand the relationship between Microsoft 365 groups, Teams and SharePoint.

A new group has a visible mailbox and a Team is created by default

When a new Microsoft 365 group is created (from the dialogue above), at least one Owner must be specified. The Owner/s are responsible for managing the Members group. Creating the group also creates:

  • An Exchange mailbox with the same email @ name as the Microsoft 365 group. The mailbox is visible in Outlook to the members of the Group.
  • A SharePoint site with the same URL name as the Microsoft 365 group.
  • By default (unless the checkbox is unchecked), a new Team is also created in MS Teams.

When a new Team is created from MS Teams, or a new SharePoint Team site is created:

  • A Microsoft 365 Group is created, with an Exchange mailbox and a SharePoint site (‘Files’ tab).
  • The name of the Team becomes the name of the Group and the SharePoint site.
  • The mailbox is not visible in Outlook and is only used for calendaring and for the storage of Teams chats (in a hidden folder).

Importantly, when a new Microsoft 365 group or Team is created (which creates a Microsoft 365 group), the Group Owners: (a) are the same as the Team Owners and (b) are added to the SharePoint Owners permission group, as explained below.

Group/Team Owners and Members

In other words, the Microsoft 365 group owners (group) is added to the SharePoint site owners permission group – a ‘group within a group’.

That is, the Microsoft 365 group controls access to the Team and the SharePoint site as shown in the diagram below. Security Groups may also be added to the Microsoft 365 Group site, but this does not provide access to the Team.

The relationship between Microsoft 365 Groups, Teams and SharePoint

This ‘group within a group’ model is visible from the ‘Site Permissions’ section of the gear/cog icon as shown below (the name of the Microsoft 365 Group/Team/SharePoint site is ‘SharePoint Admin’). The SharePoint Admin Group Owners (group) is in the SharePoint site owners group, and the SharePoint Admin Group Members (group) is in the Site members group.

If a mouse hovers over the Group ‘icon’ (in the above example, GO or GM), it is possible to view the members of the Group and, for Owners, to modify that list. Confusingly, the ‘GM’ in the SharePoint site permissions group becomes ‘SG’ in the drop down list.

You can also see the ‘group within group’ model from the back-end ‘Advanced permissions’ section of the SharePoint site, but you cannot manage the Microsoft 365 Group members here.

Implementing the model

As with Security Groups, the members of Microsoft 365 Groups will usually be a logical group of people who require access to something, in this case access to the SharePoint site or the Team (for chat, files, or other resources).

The main thing to remember is that membership of the (backend) Microsoft 365 Group provides access to BOTH the Team and the Team’s SharePoint site (the ‘Files’ tab in a Team).

  • Every Team in MS Teams will usually consist of the members of a logical group with a common interest – a business unit, project team, or with some other work relationship, for example, the members of a committee. The Team Owners are responsible for managing the Team Members.
  • The Team Owners are the SharePoint site owners and are responsible for managing the site if they decide to access it directly. The Team Members are the SharePoint site members and have the ability to add or edit content, usually via the ‘Files’ tab in Teams.

Note: Security Groups with the same members as Microsoft 365 Groups (and Teams) may already exist. There is no need to add a Security Group if it has the same members as a Microsoft 365 Group.

As noted earlier, a Group/Team does not have visitors with read-only rights. Every Member of the Team has add/edit access to both the Team and its associated SharePoint site.

  • If there is a requirement to give specific other people either add/edit or read-only access to the SharePoint site, that outcome is achieved by adding people by name, or a Security Group, to either the SharePoint Members or Visitors group.
  • If there is a requirement to give everyone in the organisation either add/edit rights, or read-only access, to the SharePoint site, that outcome is achieved by adding ‘Everyone except external users’ to either the SharePoint Members or Visitors group.
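
The ‘group within a group’ model, and the difference between Team access and site-only access, can be sketched as follows. This is a simplified illustration of the relationships described in this post, not a call to any Microsoft 365 API; all user, group and prefix names are hypothetical.

```python
# Microsoft 365 Group behind the Team: its owners and members (hypothetical users).
m365_group = {"owners": {"alice"}, "members": {"bob", "carol"}}

# A Security Group added directly to the SharePoint site (hypothetical).
security_groups = {"Auditors": {"dave"}}

# SharePoint permission groups; entries are user names or nested group references.
site_groups = {
    "Owners": {"group:owners"},    # M365 Group Owners nested in site Owners
    "Members": {"group:members"},  # M365 Group Members nested in site Members
    "Visitors": {"sg:Auditors", "erin"},
}

def expand(entry):
    """Resolve a nested group reference to the set of users it contains."""
    if entry.startswith("group:"):
        return m365_group[entry.split(":", 1)[1]]
    if entry.startswith("sg:"):
        return security_groups[entry.split(":", 1)[1]]
    return {entry}

def site_access(user):
    """Return the highest SharePoint permission group the user falls into, or None."""
    for name in ("Owners", "Members", "Visitors"):
        if any(user in expand(entry) for entry in site_groups[name]):
            return name
    return None

def team_access(user):
    """Only membership of the Microsoft 365 Group gives access to the Team."""
    return user in m365_group["owners"] | m365_group["members"]

for user in ("alice", "bob", "dave", "erin"):
    print(user, site_access(user), team_access(user))
```

In this sketch, people added to the site through a Security Group or by name can reach the SharePoint site but not the Team, which matches the behaviour described above.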

External guests may also be added to the Team and the Team’s SharePoint site.