Office 365 is sometimes referred to as an ‘ecosystem’. In theory this means that records could be stored anywhere across that ecosystem.
Unlike the ‘old’ on-premise world of standalone servers for each Microsoft application (Exchange, SharePoint, Skype), where specific retention policies could apply (including the Exchange Messaging Records Management (MRM) policy), the various elements that make up Office 365 are interconnected.
The most obvious example of this interconnectivity is Microsoft Teams which stores chat content in Exchange and provides access to content stored in both SharePoint (primarily the SharePoint site of the linked Office 365 Group) and OneDrive, and has links to other elements such as Planner.
Records continue to be created and kept in the various applications but retention policies are set centrally and can apply to any or all of the content across the ecosystem.
Managing records in Office 365, and applying retention rules to those records, requires an understanding of at least the key parts of the ecosystem – Exchange, Teams, SharePoint and OneDrive and how they interrelate, and from there establishing a plan for the implementation of retention.
What types of records are created in Office 365?
Records are defined as ‘evidence of business activity’ and are often associated with some form of metadata.
Evidence of business activity is an overarching term that can include:
Documents and notebooks (in the sense of text on a page)
Plans, including both project plans and architectural plans and diagrams
Images/photographs and video
Chat and/or messages
Conversations (audio and/or video based)
Social media posts
All digital records contain some form of metadata, usually displayed as ‘Properties’.
Where are the records stored in Office 365?
Most records created by organisations using Office 365 are likely to be created or stored in the following parts of the ecosystem:
Exchange/Outlook – for emails and calendars.
SharePoint and OneDrive – for documents and notebooks (in the sense of text on a page), plans, images/photographs and video.
Stream – for audio and video recordings.
MS Teams – for chat and/or messages, conversations (audio and/or video based). Note that 1:1 chats are stored in a hidden folder of the Exchange mailbox of the end-user/s participating in the chat, while Teams channel chat is stored in a hidden folder of the linked Office 365 Group mailbox.
Yammer – for (internal) social media posts.
It is also possible to import and archive certain external content such as Twitter tweets and Facebook content in Office 365.
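The list of locations above can be sketched as a simple lookup table. This is only an illustration: the record type keys and the helper name `where_stored` are assumptions made for the example, not Microsoft terminology.

```python
# Where each kind of record is usually created or stored, as listed above.
# The keys are illustrative labels chosen for this sketch.
RECORD_LOCATIONS = {
    "email": "Exchange/Outlook",
    "calendar": "Exchange/Outlook",
    "document": "SharePoint and OneDrive",
    "notebook": "SharePoint and OneDrive",
    "plan": "SharePoint and OneDrive",
    "image": "SharePoint and OneDrive",
    "recording": "Stream",
    "1:1 chat": "Exchange (hidden folder in each participant's mailbox)",
    "channel chat": "Exchange (hidden folder in the Office 365 Group mailbox)",
    "social post": "Yammer",
}


def where_stored(record_type: str) -> str:
    """Return the usual storage location for a record type, or 'unknown'."""
    return RECORD_LOCATIONS.get(record_type.lower(), "unknown")


print(where_stored("channel chat"))
```

A table like this can be a useful starting point for the retention plan, because each location is something a retention policy can be applied to.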
The diagram below provides an overview of the main Office 365 applications and locations where records are created or stored. Under SharePoint, the term ‘Sites’ refers to all types of SharePoint sites, including those associated with Office 365 Groups. Libraries are shown separately because of the potential to apply a retention policy to a library – see below.
Note also that this diagram does not include network file shares (NFS) as the assumption is made that (a) NFS content will be migrated to SharePoint and the NFS made read only, and (b) all new content that would previously have been stored on the NFS is instead saved either to OneDrive for Business (for ‘personal’ or working documents) or SharePoint only.
Creating a plan to manage records retention across Office 365
In previous posts I have recommended that organisations implementing Office 365 have the following:
A basic architecture design model for SharePoint sites, including SharePoint sites linked with Office 365 Groups (and Teams in MS Teams).
A plan for creating and applying retention policies across the ecosystem.
Because SharePoint is the most likely location for records to be stored (aside from Exchange mailboxes and OneDrive accounts), there should be at least one retention policy for every SharePoint site (or group of sites), as well as policies for specific document libraries if the retention for the content in those libraries may be different from the retention on the overall site.
For example, a ‘Management’ site may contain a range of general content as well as specific content that needs to be retained for longer.
The site can be covered by a single implicit retention policy of (say) 7 years. This policy will delete content in the background, based on date created or date modified.
The document library where specific types of records with longer or different retention requirements are stored may have one or more explicit label-based policies applied to those libraries. This content will be retained while the rest of the site content is deleted via the first policy.
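The interaction between the site policy and a longer label can be sketched as a small date calculation. This is a minimal illustration, assuming that when both a site policy and a label apply to an item, the longer retention period determines the earliest deletion date. The function names and the 10 year label in the usage line are hypothetical.

```python
from datetime import date
from typing import Optional


def add_years(start: date, years: int) -> date:
    """Add whole years to a date, mapping 29 February to 28 February if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def earliest_deletion(last_modified: date, site_years: int,
                      label_years: Optional[int] = None) -> date:
    """Earliest date an item can be deleted: the longest applicable period wins."""
    periods = [site_years]
    if label_years is not None:
        periods.append(label_years)
    return add_years(last_modified, max(periods))


# General content on the site versus content in a labelled library.
print(earliest_deletion(date(2020, 3, 27), site_years=7))
print(earliest_deletion(date(2020, 3, 27), site_years=7, label_years=10))
```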
Structure of a retention plan for records in Office 365
A basic plan for creating and applying retention policies might look something like the following:
User mailboxes – one ‘general’ (implicit) retention policy for all mailboxes (say, 7 years after creation) and another more specific retention policy for specific mailboxes that require longer retention.
SharePoint sites – multiple (implicit) retention policies targeting one or more sites.
SharePoint libraries – multiple (explicit) label-based retention policies that are applied manually. These policies will usually have a retention period that is longer than any implicit retention policy, as any implicit site policy will already prevent the deletion of content before it reaches the end of that retention period.
Office 365 Groups (includes the associated mailbox and SharePoint site) – one ‘general’ (implicit) retention policy. See also below.
Teams channel chat – one ‘general’ (implicit) retention policy. Note that this content is stored in a special folder of the Office 365 Group mailbox.
1:1 chat – one ‘general’ (implicit) retention policy. This content is stored in a special folder of the participant mailboxes.
OneDrive documents – one ‘general’ (implicit) retention policy for all ODfB accounts, plus the configuration of retention after the account is inactive.
At a high level, the retention policy plan might look something like the following – ‘implicit’ policies are shown in yellow, SharePoint document libraries may be subject to ‘explicit’, label-based policies. The ‘+7 years’ for OneDrive relates to inactive accounts, a setting set in the OneDrive Admin portal.
To retain content for a Microsoft 365 group, you need to use the Microsoft 365 groups location. Even though a Microsoft 365 group has an Exchange mailbox, a retention policy that includes the entire Exchange location won’t include content in Microsoft 365 group mailboxes. A retention policy applied to a Microsoft 365 group includes both the group mailbox and site. A retention policy applied to a Microsoft 365 group protects the resources created by a Microsoft 365 group, which would include Microsoft Teams.
The actual plan should contain more detail and be included as part of other recordkeeping documentation (perhaps stored on a ‘Records Management’ SharePoint site). The plan should include details about (a) where the policies have been applied and (b) the expected outcomes or actions for the policies, including automatic deletion or disposition review (for document libraries).
Keep in mind that, unless the organisation decides to acquire this option, there is no default backup for content in Office 365 – once a record has been deleted, it is gone forever and there may be no record of this beyond 90 days.
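One way to keep such a plan checkable is to record it as structured data. The sketch below is an assumption about how that could look, not a Microsoft format: the class, the policy names, the location labels and the 10 year label period are all hypothetical. It does reflect the point above that group mailboxes and group sites are covered by the Microsoft 365 Groups location rather than the Exchange location.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PolicyEntry:
    name: str
    location: str   # where the policy has been applied
    kind: str       # "implicit" (policy) or "explicit" (label-based)
    years: int
    outcome: str    # e.g. "automatic deletion" or "disposition review"


# Which policy location covers each kind of content (labels are illustrative).
CONTENT_LOCATION: Dict[str, str] = {
    "user mailbox": "Exchange",
    "group mailbox": "Microsoft 365 Groups",   # not the Exchange location
    "group site": "Microsoft 365 Groups",
    "sharepoint site": "SharePoint sites",
    "sharepoint library": "SharePoint libraries",
    "onedrive account": "OneDrive",
    "teams channel chat": "Teams channel chat",
    "1:1 chat": "1:1 chat",
}

PLAN: List[PolicyEntry] = [
    PolicyEntry("General mailboxes", "Exchange", "implicit", 7, "automatic deletion"),
    PolicyEntry("Management site", "SharePoint sites", "implicit", 7, "automatic deletion"),
    PolicyEntry("Long-term records label", "SharePoint libraries", "explicit", 10, "disposition review"),
    PolicyEntry("General groups", "Microsoft 365 Groups", "implicit", 7, "automatic deletion"),
    PolicyEntry("Channel chat", "Teams channel chat", "implicit", 7, "automatic deletion"),
    PolicyEntry("1:1 chat", "1:1 chat", "implicit", 7, "automatic deletion"),
    PolicyEntry("OneDrive accounts", "OneDrive", "implicit", 7, "automatic deletion"),
]


def policies_for(content_type: str, plan: List[PolicyEntry]) -> List[PolicyEntry]:
    """Return the plan entries whose location covers the given content type."""
    location = CONTENT_LOCATION[content_type]
    return [p for p in plan if p.location == location]


def uncovered(plan: List[PolicyEntry]) -> List[str]:
    """List content types that no policy in the plan covers."""
    return [c for c in CONTENT_LOCATION if not policies_for(c, plan)]


print(uncovered(PLAN))
```

Running `uncovered` against a draft plan is a quick way to spot a location, such as group mailboxes, that has been assumed to be covered by another policy.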
Few organisations create original records on paper any more. Almost all the paper records that are created these days are the printed versions of born-digital records.
In a somewhat ironic twist, many organisations seek to digitise (or ‘scan’) the printed versions of born-digital records.
And yet, there apparently continues to be an ongoing problem in many organisations (particularly government organisations) about ‘going digital’.
Why is ‘going digital’ so hard for records management?
On one hand, allowing people to even print and store the printed version of digital records on paper files helps to perpetuate the problem of going digital.
On the other hand, many older style recordkeeping systems require content to be copied from the system where it was created, captured or stored, to another. The requirement to copy a record (if it is not automated) requires a conscious, voluntary (and selective) action on the part of the end-user. It does not guarantee that the copied record is the final version of a document or, in the case of email, that there are no additional replies in a thread. And the original remains in the originating system.
Additionally, some types of records cannot easily be copied to a centralised recordkeeping system. Examples include Twitter tweets, Facebook content, instant messaging texts, chat, video, and conferencing text, audio and video. And even when they can, there is no certainty that the version saved is the most recent.
The elephant in the room – digital recordkeeping has not evolved
In my opinion, the primary reason why digital records continue to be printed, and why organisations find it hard to ‘go digital’, is because many recordkeeping systems and practices have not evolved with the digital world.
Instead, they remain based on the idea that all records should be stored, with added metadata, in a central recordkeeping system. Anything that does not fit this model, and any system that doesn’t meet all the standards for keeping records in this way, is regarded as ‘non-compliant’.
Vendors of these traditional, centralised recordkeeping systems highlight how these systems meet recordkeeping compliance requirements, which in turn further cements these systems as being the only way compliance requirements can be met. These systems increasingly are unable to capture the full range of digital content and consequently, ‘going digital’ becomes a problem – because the system isn’t working.
It’s a vicious cycle.
How paper recordkeeping turned into digital recordkeeping
Until the early 1990s, paper files (and boxes) were really the only way we had to store records. During the late 1980s and 1990s, many organisations acquired databases to keep track of these paper files and the boxes in which they were stored.
At the beginning of the digital world, in the early to mid 1990s, in the absence of any other method, digital records were usually printed and placed on the same paper files.
By the end of the 1990s, the databases that were originally used to keep track of paper files were adapted to manage digital records in digital ‘files’ (folders, containers).
But the opportunity was missed to evolve the paper recordkeeping paradigm into something more suitable for digital records.
Additionally, none of the leading software manufacturers, Microsoft in particular, did anything to incorporate recordkeeping in their various systems and applications.
Recordkeeping systems used to manage digital content retained the same ‘filing’ concept where end-users, after receiving suitable training, had to (voluntarily) copy the digital record (including emails) to the digital ‘file’, leaving the original in place.
The idea of a central recordkeeping system, to where all records are to be copied, makes almost no sense in the digital world. For almost twenty years, it has overlooked or even ignored the ever-increasing volume and types of digital records and persisted with a centralised model.
In my opinion, the problem of ‘going digital’ for many organisations has been directly related to the fact that recordkeeping systems have not evolved from the original centralised model.
It doesn’t make sense in a digital world.
Fixing the problem
In my opinion, one of the key problems is not so much that many older style recordkeeping systems are based around a paper recordkeeping paradigm (because this paradigm can still be valid, especially for high value or archival records), but that organisations think they should manage all records according to the same paradigm, or otherwise they will not somehow ‘comply’ (especially with government recordkeeping requirements).
Records may have different ‘value’. There are a lot of low-quality, low-value records.
Some records can go from being innocuous (‘OK’ in an email reply) to being critical very quickly (when the ‘OK’ becomes evidence of fraud).
Not all records need to have complex recordkeeping metadata. In fact, most digital records already have extensive metadata payloads.
Emails will continue to remain separate from other records.
Only a small percentage of records need to be kept for a long time.
Digital records can be categorised or classified in multiple ways over time. Pre-defined classification applied to a digital record may not accurately capture the full context (or potential context) of a record and may even impede it.
Digital records may remain active even after they are captured, including as new versions, new replies in a thread, modified images and so on.
When managed well, digital records can be managed and accessed in place, in the system in which they were created or captured.
Digital records that need special attention, including records that require long-term storage, can still be managed in ‘files’ or ‘containers’, but this needs to be implemented in a way that is simple for end-users to understand.
Organisations should, in my opinion:
Embrace digital recordkeeping.
Abandon the idea that all records must be copied to a central recordkeeping system.
Accept that any system can contain records – including line of business systems that also capture documents as records – and focus on how to manage the records in those systems.
Use the recordkeeping capability of the systems where records are created or captured.
Focus most effort on records of high value, or records that need to be kept for a long time including for archival purposes.
Let end-users create and work with born-digital records where they are created or captured, without the additional overhead of having to copy these to another system.
Implement high-level architecture models and monitor where information is being stored.
Use a combination of global retention policies and auto-classification to protect the integrity, reliability and authenticity of records.
Use search and discovery to find content, wherever it is stored, whenever it is required.
On 27 March 2020 I asked, via Twitter, whether organisations that rolled out MS Teams will wonder in the future who created all the random (and randomly-named) SharePoint sites.
The reason for this question was because many organisations, scrambling to establish ways for staff to work from home, decided to make use of MS Teams in their (often newly implemented) Office 365 suite of apps.
I have seen multiple organisations since late 2019 ask ‘who created all those SharePoint sites?’ when they reviewed the list. The current COVID-19 work-from-home situation will only make this situation ‘worse’ and, without effective oversight or controls, result in the creation of multiple uncontrolled SharePoint sites.
Unlike other products like Zoom, WhatsApp, FaceTime and Skype, however, MS Teams is not a standalone product, but a core element in the Microsoft Office 365 ecosystem.
The key point is this – every Team in MS Teams has a linked SharePoint site (and an Exchange mailbox, where all the chat content is stored). You can’t disable these options.
What happens if you create a Team in MS Teams?
The good thing about the one-to-one chat element of MS Teams is that it’s relatively intuitive and easy to use, including on the mobile app. You only need to tell users it’s like Skype or WhatsApp, but for internal use only, and most pick it up quickly.
The Teams part of MS Teams is not quite as intuitive, but early adopters generally understand the basic concepts – that a Team has members, and you can have multiple chat channels for each Team.
Once end-users understand how a Team works (and this can take some time because one-to-one chat can include multiple people), they might notice this option at the bottom left of the app:
Creating a new team sounds like a great idea, so end-users may try:
My guess is that end-users are more likely to want to ‘build a team from scratch’ as shown below, because the second option doesn’t really make sense.
There is a good chance they will want the Team to be ‘Private’, although they may not fully understand what this means. A Public Team sounds like a Yammer Group (or Community).
So far, so good, the end-user can give the Team any name they like:
At the bottom of the naming screen is the option to ‘Create’. The end-user is then invited to add members to their new Team. This seems a fairly obvious step, and they can add whoever they want. New members are by default ‘Members’ but they can be changed to ‘Owners’ if necessary. There is no control over this process.
The new team now appears on the left-hand menu of MS Teams:
The new team opens at the default ‘General’ channel.
On the main part of the Team, the following options are offered:
Along the top, ‘Posts’, ‘Files’, ‘Wiki’ and a + to add more applications. (Hint – the ‘Files’ option points to the SharePoint site that has been created behind the scenes).
Across the middle, three options to ‘Add more people’, ‘Create more channels’ and ‘Open the FAQ’
At the bottom, the option to ‘Start a new conversation’ with various other options including the ‘Meet now’ video option.
The end-user can now get on with chatting, sharing files, and adding apps to do other things.
But what else has happened?
As noted above, the ‘Files’ tab in the General channel gives a clue to the existence of the connected SharePoint site. End-users may not care terribly much about this, for them it provides the option to create, upload, share and collaborate on files.
A new Office 365 Group is created
But before we get to the SharePoint site, it’s important to understand the one-to-one relationship between a Team in MS Teams and an Office 365 Group. If you do not know what an Office 365 Group is, please read this Microsoft guidance on Office 365 Groups.
In very simple terms:
Every new Team in MS Teams creates a new Office 365 Group.
The Owner of the Office 365 Group is the Owner of the team; the members of the Group are the Members of the team, as added by the person who created the Team.
The new Office 365 Group appears in the list of Groups in the Office 365 Admin portal, as shown below. Access to this part of the Admin portal is normally restricted to Global Admins (who would normally be responsible for creating other types of AD Groups, such as Security Groups and Distribution Lists).
A new Exchange mailbox has been created
Note that the process has also created an Exchange mailbox with a Group email address. The new Exchange mailbox will now appear in the Outlook client of everyone in the Team – something they are unlikely to notice.
As noted above, all the chat messages in the Team are stored in a hidden folder in the Exchange mailbox for the Team.
A new SharePoint site has been created
If we go across to the SharePoint Admin portal, which is normally restricted to Global Admins and SharePoint Admins, we can see that a new SharePoint site has been created, and is owned by the ‘Group owners’.
The SharePoint Admin has had no involvement in the creation, naming, or structure of this new site. And, just to add another factor, the SharePoint Admin cannot access the site – see below.
The Team owner may not realise it, but they now have a SharePoint site. The new site’s ‘Documents’ library appears in the ‘Files’ tab as shown below.
And, just to add a confusing element, the site includes the invitation (at the bottom left) to create a new Team!
As noted above the SharePoint Admin can ‘see’ that this site exists in the list of sites but cannot actually access it. The Global Admin, on the other hand, can access it.
So the person responsible for managing SharePoint across the organisation cannot access the SharePoint site, which is not a good thing from an information governance point of view.
The reason they cannot access the site is because they were not added to the Site Collection Admin Group when the site was created. And, just to make it a bit more confusing, the ‘Users and Permissions’ section of Site Settings, where the ‘Site collection administrators’ section is found (see screenshot below), does not appear in Office 365 Group-based SharePoint sites.
So, how does the SharePoint Admin get access to this site to configure and manage it? There are two ways:
The Global Admin can go to /_layouts/15/mngsiteadmin.aspx (after the site name URL) and add them (or a Security Group with them in it) there.
The SharePoint Admin can click on the site details in the SharePoint admin portal and add him/herself as an Owner. This puts them in the Site Collection Admin section along with the Group Owner.
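For the first option, the address of the site collection administrators page is simply the site URL followed by the path given above. A minimal sketch of building it, where the helper name and the example tenant and site names are hypothetical:

```python
def site_admin_page(site_url: str) -> str:
    """Append the site collection administrators page path to a site URL."""
    return site_url.rstrip("/") + "/_layouts/15/mngsiteadmin.aspx"


# Hypothetical tenant and site name, used for illustration only.
print(site_admin_page("https://contoso.sharepoint.com/sites/ProjectX/"))
```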
This post began with a simple question – if organisations allow end-users to create Teams to work from home, how will they manage all the SharePoint sites that are created through the process described above?
There is no one answer to this question but it’s worth understanding exactly what happens – and what else is created (including Planner) – when a Team is created. Organisations seem to go one of two ways:
Let end users create Teams and deal with the consequences later, including attempts at auto-classification and retention policy application across the various elements of the new Office 365 Group – mailbox, SharePoint site, Team chat. This is the Microsoft default and the preference of many organisations that don’t have compliance issues or can accept the risks of uncontrolled information stores.
Control the creation of Teams, but make any controlled process as easy as possible for end-users to keep them working quickly, and manage the content in mailboxes, SharePoint and Teams proactively. While not the preferred option, it will help with the management of corporate information down the track.
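Whichever approach is taken, it helps to keep in mind what a single new Team brings with it. The sketch below is a toy model of the walkthrough in this post, not how Office 365 is implemented. The class and function names are hypothetical, and the access rule is simplified to what was described above: the site is administered by the group owners, the Global Admin can access it, and the SharePoint Admin cannot until added.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Team:
    name: str
    owners: List[str]
    members: List[str]
    site_admins: List[str] = field(default_factory=list)

    def linked_resources(self) -> List[str]:
        """Resources created behind the scenes for every new Team."""
        return [
            "Office 365 Group",
            "Exchange group mailbox (stores channel chat)",
            "SharePoint site (behind the 'Files' tab)",
            "Planner",
        ]


def create_team(name: str, creator: str, members: List[str]) -> Team:
    """The creator becomes the owner; the group owners administer the site."""
    return Team(name=name, owners=[creator], members=list(members),
                site_admins=[creator])


def can_access_site(team: Team, user: str, is_global_admin: bool = False) -> bool:
    """Simplified rule: Global Admins, site admins and team members get in."""
    return is_global_admin or user in team.site_admins or user in team.members


team = create_team("Project X", "alice", ["bob"])
print(team.linked_resources())
print(can_access_site(team, "sharepoint_admin"))   # not yet added to the site
team.site_admins.append("sharepoint_admin")        # added as an Owner
print(can_access_site(team, "sharepoint_admin"))
```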
Records managers have been struggling with managing emails as records ever since they first appeared in the workplace.
For a long time the accepted practice, as with other digital records, was to print them out and put them on the appropriate file. With the introduction of electronic document and records management (EDRM) systems, end users were instead required to save or copy documents and emails to an electronic ‘file’ in that system.
In both cases, the emails remained in the user’s ‘personal’ mailbox, where they remained inaccessible for ‘privacy’ reasons. End-users and business areas would (and still do) conduct business via the email system, without these records being available to anyone except the sender and recipient/s. Attachments to emails sent to individual recipients were (and continue to be) not managed as records unless they were printed out or saved to the EDRMS.
Microsoft Office 365 has changed the paradigm for keeping records as described in the linked post, away from the central storage and management of records in one system (while leaving the originals in place), to the decentralised ‘in place’ storage and centralised management of records across Office 365.
This post provides an overview of the three main options for managing email as records in Office 365, in both Exchange and SharePoint.
In summary the options are:
Leave emails in place in Exchange mailboxes (personal and Office 365 Group mailboxes) and apply one or more Office 365 retention policies to mailboxes.
Same as previous point, and use Content Search to retrieve emails as required.
Same as previous point, and only copy specific emails to SharePoint
Keep in mind while reading this post that chat content from MS Teams is also stored in Exchange mailboxes but that content cannot be copied to SharePoint.
Option 1 – Leave emails in place and apply retention policy
In this option, emails remain stored in personal or Office 365 Group mailboxes. End users may create folders and ‘categorise’ the content as they wish, but no additional attempt is made to further categorise, add metadata to, or group the content according to recordkeeping requirements. The aggregation, from a recordkeeping point of view, is the end-user or Office 365 Group.
All mailboxes are subject to one or more retention policies set in the Office 365 Compliance portal to ensure that no emails are deleted before a pre-defined minimum period.
Note that retention policies can effectively replace a back-up regime used by IT for disaster recovery and investigation purposes.
Emails are aggregated by user name or Office 365 Group and will remain in mailboxes for a minimum period of time as set by the retention policy.
Office 365 Group mailboxes provide the ability to group emails by a more specific subject (the Group name, which could map to a business function – e.g., ‘Correspondence Management’) and have the added positive of having an associated SharePoint site.
The negative with this option, from a recordkeeping point of view, is that all emails – regardless of subject or importance – are grouped by the ‘personal’ or Office 365 Group mailbox, and kept for the period defined in the retention policy. That is, there is no differentiation between (email) records that may need to be kept for a long period of time and those that are transient in nature.
If there is a requirement to ensure that certain emails are kept in different aggregations or for different periods of time, then option 3 should be considered.
Option 2 – Same as option 1 and use Content Search to retrieve emails
This option is the same as the first option, but the business can make use of Content Search to identify and isolate emails as required. Content Search is more or less the same as the search part of an e-Discovery case.
Note that access to the Content Search area is restricted to Office 365 Global Admins and Compliance Admins. This is because, as can be seen in the screenshot below, a Content Search can be set up to search for any content in email, documents and much more.
Content Searches can be set up from the ‘New search’ option, or the Administrator can make use of a Guided search or Search by ID List. For the purpose of this post, only the ‘New search’ option will be examined.
Configuring a new Content Search
Each content search can be configured against three main options as shown in the screenshot below: Keywords, Conditions, and Locations. Some searches may require a combination of these three options.
Keywords can be any words that may be found anywhere in the email, including the content.
The available conditions are listed below:
Size (in bytes)
The available search locations include any or all of the options below:
Office 365 group email
Skype for Business
Office 365 Group sites
Exchange public folders
For more detail on how to use Content Search and all the options available, go to this Microsoft site.
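How the three options combine can be sketched with a small in-memory filter. This does not call any Microsoft API and is not how Content Search is implemented; it only illustrates the logic. The class and field names are hypothetical, and it assumes that every keyword must be present, that size is a minimum, and that an empty location list means all locations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    location: str      # e.g. "Exchange email", "Office 365 group email"
    text: str
    size_bytes: int


@dataclass
class ContentSearch:
    keywords: List[str] = field(default_factory=list)
    min_size_bytes: Optional[int] = None                  # the size condition
    locations: List[str] = field(default_factory=list)    # empty = everywhere

    def matches(self, item: Item) -> bool:
        """An item is a hit only if it passes location, condition and keywords."""
        if self.locations and item.location not in self.locations:
            return False
        if self.min_size_bytes is not None and item.size_bytes < self.min_size_bytes:
            return False
        text = item.text.lower()
        return all(k.lower() in text for k in self.keywords)


def run_search(search: ContentSearch, items: List[Item]) -> List[Item]:
    """Return the items that satisfy the search."""
    return [i for i in items if search.matches(i)]


items = [
    Item("Exchange email", "Contract renewal approved", 2048),
    Item("Office 365 group email", "Contract draft attached", 512000),
    Item("Office 365 Group sites", "Lunch menu", 1024),
]
search = ContentSearch(keywords=["contract"], min_size_bytes=100000,
                       locations=["Office 365 group email"])
print([i.text for i in run_search(search, items)])
```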
Running a search
After the search has been configured, it must be run. The speed of the search will depend on the complexity of the search, conditions, locations and the volume of content. Every search will appear in the list of searches that have been saved.
When complete, the search result will show a ‘Status’, showing the number of:
Once the search has completed, the results of the search may be exported. There are two sets of configurable options for exported results.
Output options:
All items, excluding ones that have unrecognized format, are encrypted, or weren’t indexed for other reasons
All items, including ones that have unrecognized format, are encrypted, or weren’t indexed for other reasons
Only items that have an unrecognized format, are encrypted, or weren’t indexed for other reasons
Exchange content export options:
One PST file for each mailbox
One PST file containing all messages
One PST file containing all messages in a single folder
Enable de-duplication for Exchange content (check box)
Content searches are likely to find and retrieve more relevant emails than might be saved elsewhere, as they look through all emails. Provided a retention policy has been applied to the mailboxes, the content should still be accessible. If the emails have been deleted at the end of a retention policy, they will not be accessible any more.
Emails can be exported and – if necessary – the PST copied to a different system (such as SharePoint) for long-term storage with additional metadata as required.
Access to the Content Search option is restricted to Global Admins and Compliance admins, for good reason. Consideration might need to be given to governance or procedural rules. Note that Global Admins are always alerted when a new content search is created or run.
Each search must be pre-configured and run regularly to ensure that all emails are identified.
Content searches may retrieve too much unrelated content.
Option 3 – Same as option 2 and copy only select emails to SharePoint
This option mimics the legacy way of saving a record to a pre-defined separate aggregation, in this case to a SharePoint document library.
It differs from the first two options in that only certain select emails are copied (by end-users or using a third-party application) to specific SharePoint document libraries. It is still, however, possible (and preferable) to apply a retention policy to the original mailboxes.
Content search, which can be used at any time, will find the emails in both Exchange mailboxes and SharePoint as long as they have not been deleted via a retention policy expiry.
The positives with this option are that emails copied to a SharePoint document library:
Are grouped with other related records. This may be important from an organisational recordkeeping point of view, for example for certain key records. Consideration might also be given to setting up an Office 365 Group instead for these specific records.
Can have additional metadata.
Can be retained for a period of time, different from the original mailbox.
The problems with this option are that:
It requires some kind of action to copy the email.
It creates a copy of the email, it doesn’t remove the original.
An email copied to another system may not be the most recent in a thread, especially if that thread is still active.
It does not include the ‘chat’ elements from MS Teams.
Summing up the options
The idea of copying an email to a separate aggregation, container or file for recordkeeping purposes is a legacy concept inherited from the paper recordkeeping period. While attempts were made over the years to mimic that concept in EDRM systems, it has several weaknesses that mostly outweigh the alleged benefits.
Email (in Exchange) and documents (in SharePoint) continue to remain separate in Office 365 but there is now the potential to manage both equally through a combination of retention policies and pre-defined content searches.
The majority of business emails are never captured in separate recordkeeping systems. Microsoft’s centralised retention model and the ability to retrieve emails on the fly mean that it is more efficient and cost effective to leave emails in place. This does not exclude the potential to copy certain select emails to SharePoint.
Additionally, mailboxes associated with Office 365 Groups provide the ability to keep emails in a business context, away from inaccessible ‘personal’ email accounts. Records managers should consider the potential of using Office 365 Group mailboxes in this way for particular types of records.
Most information management professionals (in the context of this post – records managers, information managers, librarians) are familiar with the use and application of metadata.
The metadata in their domain of work may:
Form part of the built-in properties of a digital record, and remain with it wherever it is stored, as part of its metadata ‘payload’. Commonly this is the title or name, date created, and creator/author.
Record (usually additional) details about, or provide the context for an object when it is captured or registered in a system (‘point of capture’ metadata). This metadata may include classification terms or numbers, object types, access and security controls, storage location, and the container or aggregation.
Record various actions and events through the life of the object (‘process metadata’), including when the record was accessed/used, modified or deleted/destroyed, and by whom.
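The three kinds of metadata listed above can be pictured as three parts of a single record description. The following is only a minimal sketch: the class names, field names and sample values are illustrative assumptions, not taken from any particular system.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ProcessEvent:
    """One 'process metadata' entry: what happened, when, and by whom."""
    action: str
    when: datetime
    who: str


@dataclass
class RecordMetadata:
    # Built-in 'payload' properties that travel with the record.
    payload: dict
    # 'Point of capture' metadata added when the record is registered.
    capture: dict = field(default_factory=dict)
    # 'Process' metadata accumulated over the life of the record.
    events: list = field(default_factory=list)

    def log(self, action: str, who: str, when: datetime) -> None:
        """Append a process event without touching payload or capture data."""
        self.events.append(ProcessEvent(action, when, who))


# Hypothetical sample record.
record = RecordMetadata(
    payload={"title": "Board minutes", "creator": "A. Author",
             "created": datetime(2020, 3, 1)},
    capture={"classification": "Governance - Meetings"},
)
record.log("viewed", "B. Reader", datetime(2020, 3, 2))
record.log("modified", "A. Author", datetime(2020, 3, 5))
print([event.action for event in record.events])
```

The point of the separation is that payload and capture metadata are set once, while the list of process events keeps growing for as long as the record exists.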
This post discusses how metadata, and the metadata skills and related knowledge of information management professionals, are closely related to a broader set of skills and knowledge that includes enterprise data modelling. Information management professionals might consider learning more about this subject as a career path.
What is data modelling?
Most of the definitions for the term ‘data modelling’ have the same three ‘layers’ (or variations), usually shown in the form of a pyramid:
Conceptual: A (usually simple) model that shows all the high-level data entities and their relationships across an organisation. For example ‘Customer’, ‘Employee’, ‘Property’, ‘Organisation’. See below for an example.
Logical: A more detailed model of each entity in the conceptual model, showing the multiple logical entities that exist for each conceptual entity, their attributes and associations (relationships). The attributes for each level 2 entity are more or less metadata fields. See below for an example.
Physical: A database schema or framework for how data is actually stored in a database. Physical data models are usually very complex and are often regarded as Intellectual Property (IP) by product vendors.
This post focuses only on the first two layers, conceptual and logical.
Elements that make up data models
Both the conceptual and logical data models are made up of entities, attributes and relationships/associations.
At the conceptual level, entities are the highest level groupings that are related to each other. For example: ‘Services’ or ‘Products’, ‘Vendors’, ‘Employers’, ‘Property’, ‘Customers’, ‘Accounts’ or ‘Organisation’.
At the logical level, entities are the various data elements that make up the level 1 entity. One way to think of this would be to consider a data entry screen where all the details about an entity must be recorded. For example, what data would or should be captured about an Employee, or a Service? In complex organisations, second level logical models may have several hundred entities or may need to be broken down into related sub-entities.
At the conceptual level, attributes may simply be the logical layer entity names and a definition for each.
At the logical level, entity attributes define each of the entities. Depending on the complexity of the model, level 2 entities may be single entities (for example, ‘Employment Type’) or grouped (for example, ‘Personal details’ or ‘Contact details’); in these cases, the attributes become the equivalent of metadata or field names, e.g., ‘Surname’, ‘First name’, ‘Gender’, ‘Date of Birth’.
At the conceptual level, relationships are usually relatively simple. For example, an employee ‘works in’ the organisation; the organisation ‘sells’ services or products to customers who ‘pay’ for them.
At the logical level, relationships are also relatively simple but there will be more of them as there are more entities. For example, an employee ‘has’ a position in the organisation, and ‘has’ a salary level. The employee ‘has’ personal details (e.g., name, date of birth).
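For readers who find code easier to follow than diagrams, the same three building blocks (entities, attributes and relationships) can be written down in a few lines. This is a rough sketch only; the `Entity` and `Relationship` classes are invented for illustration and reuse the examples given above.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    level: int                                  # 1 = conceptual, 2 = logical
    attributes: list = field(default_factory=list)


@dataclass
class Relationship:
    source: str
    verb: str
    target: str

    def describe(self) -> str:
        return f"{self.source} {self.verb} {self.target}"


# Conceptual (level 1): a few high-level entities and one simple relationship.
employee = Entity("Employee", level=1)
organisation = Entity("Organisation", level=1)
conceptual = [Relationship("Employee", "works in", "Organisation")]

# Logical (level 2): a grouped entity whose attributes look like field names.
personal = Entity(
    "Personal details",
    level=2,
    attributes=["Surname", "First name", "Gender", "Date of Birth"],
)
logical = [Relationship("Employee", "has", "Personal details")]

for relationship in conceptual + logical:
    print(relationship.describe())
```

Note how the level 2 attributes are effectively metadata field names, which is why this kind of modelling sits close to skills that information management professionals already have.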
Example Level 1 conceptual model
The following diagram is an example of a conceptual model, showing the high-level entities and the relationships between them. What are the level 1 entities in your organisation?
In many organisations, line of business systems can be mapped to each of these entities because that is where the data for those entities is stored. For example:
Employee information may be managed in a Human Resources Information System
Accounts or financial information may be managed in a Financial Management Information System.
Client information may be managed in a Client/Customer Relationship System and other client-specific systems.
Property information management may be managed in a Property Management System.
Unstructured data, in the form of records, relating to each of these entities, may be managed in a centralised document and records management system, network file shares, email, or other alternatives.
Example level 2 logical model
The following diagram is an example of an actual logical model for the entity ‘Employee’, showing the multiple entities and the relationships between them. If you have identified the level 1 entities in your organisation, what are the level 2 entities?
If you plan to create data models, it is a good idea to use appropriate software; neither Visio nor PowerPoint is suitable for level 2 data modelling. The level 2 model above was created using the software application ‘Enterprise Architect’ from Sparx Systems. This system allows multiple entities to be created independently and then brought together in a model as required, with relationships automatically indicated. In the model above, the level 1 entity precedes the level 2 entity name. There are three entities: Employee, Common (as these entities are common to multiple level 1 entities), and Organisation. The attributes for each entity indicate whether they form part of a group (e.g., ‘Contact details’), or are metadata attributes in their own right (for example ‘Family Name’ or ‘Date of Birth’ in the ‘Personal Details’ entity, which is a grouping related to the ‘Employee Details’ entity).
In the same organisation, the ‘Client’ level 2 entity data model contained several hundred entities and so several sub-entities (based on the ‘Client type’ entity) were created.
Data models and data dictionaries
The metadata attributes depicted for each entity in the level 2 model show only the ‘field’ name. For example the Personal Details entity includes the field names ‘Family Name’, ‘First Name’, ‘Gender’ etc. The data model does not normally provide further detail.
Instead, details for all entities, including their attributes and associations, can be defined in data dictionaries. The following are examples of the information that should be defined for each level 2 entity attribute:
Name (e.g., ‘Gender’)
Text/String (‘free text’).
Choice (and the actual choice options)
Relationship (e.g., with other attributes)
This type of detail and form should be familiar to most information management professionals. Data dictionaries can also include other more specific metadata entity details as well, including things like Function, Activity, Document Type and so on, even if these are not represented in actual data models.
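As a rough illustration of what a data dictionary entry might hold, the sketch below defines one free-text attribute and one choice attribute and checks values against them. The `DictionaryEntry` class, the attribute definitions and the choice options are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DictionaryEntry:
    name: str
    data_type: str                        # "text" (free text) or "choice"
    choices: Optional[tuple] = None       # only used when data_type == "choice"
    related_to: Optional[str] = None      # name of a related attribute, if any

    def is_valid(self, value: str) -> bool:
        """Choice fields accept only listed options; text fields any non-blank string."""
        if self.data_type == "choice":
            return value in (self.choices or ())
        return isinstance(value, str) and value.strip() != ""


# Hypothetical entries for two attributes of an 'Employee' logical entity.
family_name = DictionaryEntry("Family Name", "text", related_to="First Name")
employment_type = DictionaryEntry(
    "Employment Type", "choice", choices=("Full-time", "Part-time", "Casual")
)

print(family_name.is_valid("Nguyen"))        # accepted: non-blank free text
print(employment_type.is_valid("Casual"))    # accepted: one of the listed options
print(employment_type.is_valid("casual "))   # rejected: not an exact listed option
```

The last check shows why a choice field tends to produce cleaner data than free text: variations in spelling, case or spacing are rejected at the point of entry rather than discovered later during analysis.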
The value of data models and data dictionaries
Conceptual and logical data models – and the data dictionaries that describe the details in these models – are essential information artifacts, especially in larger organisations with multiple business systems. They:
provide an easy to understand, visual conceptual and logical overview of all the data across the organisation;
can be used in discussions with third-party database vendors, as they define an ideal objective against which acquisition decisions may be made;
help to understand why there are issues or problems with data quality. For example, a database may allow free text data for important data elements that then cannot be easily analysed;
help to define issues with or support analytical and business intelligence outputs;
show that the organisation is serious about managing data consistently and appropriately;
have the potential to help reduce costs or increase efficiencies. For example, a critical database may allow for free text entry (which takes time and is prone to error) when a choice option would be far more efficient and accurate.
The pathway from metadata to data modelling
As noted at the beginning of this post, most information management professionals have the skills and knowledge required to manage metadata. Information management professionals can draw on these core skills to develop or refine data models for the organisation.
In the first instance, knowing if data models even exist would be a good step. Even if they already exist, a discussion with IT (or the relevant person responsible for managing the data models) could be a good idea.
Records management standards (see below) state that a defining feature of records is that they are associated with metadata – both ‘point of capture’ metadata and ‘process’ metadata that continues to evolve throughout the life of the record.
For at least two decades, the requirement to capture and store metadata for digital records has driven the implementation of centralised electronic document and records management (EDRM) systems, many of which began life as databases used to record metadata about physical records (files and boxes).
EDRM systems were (and still are) used to store copies of digital records created or captured natively in other systems, primarily network file shares and email. End-users were required to copy individual records to the EDRMS, a process that mirrored the storage of records (including printed digital records) in physical files.
Network file shares and email systems were not considered to be suitable as recordkeeping systems because they could not ensure the authenticity, integrity and reliability of records over time, including to manage and preserve metadata about the records stored in them.
The increasing implementation of Office 365, and in particular the use of SharePoint for the storage of records, has highlighted the extent to which recordkeeping metadata can – or even should – be applied to the content stored in that system.
This post discusses the need for metadata in records stored in Office 365, including in Exchange/Outlook, MS Teams, and SharePoint/OneDrive for Business. It concludes that most records stored in Office 365 do not need additional metadata but, where such metadata is required, there is almost unlimited capability to add it in SharePoint.
Records and metadata
The international standard for records management, ISO 15489:2016, defines a record as ‘information created, received, and maintained as evidence and as an asset by an organization or person, in pursuit of legal obligations or in the transaction of business’.
Records are said to be different from ‘non-records’ because they are associated or described with (mostly added) metadata that describes ‘the context, content and structure of records and their management through time’.
The standard for recordkeeping metadata is ISO 23081:2017. One records management professional (link at the end of the post) noted that there has been reluctant adoption of this standard, mostly because it was ‘too complex’ and ‘academic’, and used ‘foreign terms’. Unspecified vendors were said to have been dismissive of the standard.
Standard for managing digital records – ISO 16175
Part 2 of the standard ISO 16175:2011, ‘Guidelines and functional requirements for digital records management systems’ contains multiple requirements relating to metadata, across three broad categories:
Point of capture metadata. This includes metadata that forms part of the ‘metadata payload’ of the original record (e.g., date created, creator), other metadata added at point of capture, and metadata that provides additional context for the records.
Process metadata. This is metadata that records activities and changes to both the record and metadata over the life of the record.
The need to manage and control metadata over time.
This standard appears to reinforce the requirement for records to be stored and managed in dedicated recordkeeping systems.
On premise document and records management systems: These systems use metadata schema that specify metadata fields to be used in the system.
Cloud systems including Office 365: These systems can make use of enterprise ‘graphs’ that map people to documents and topics. The graph is built from the interactions of people with content across the different workloads of the suite.
Most people now accept the algorithm capabilities of Facebook, LinkedIn, eBay, Amazon and similar online systems to automatically connect us with information relevant to us, without having to add any metadata.
Given the volume and types of digital content, almost all of which has metadata ‘payloads’, how can we ever hope to add the required recordkeeping metadata?
Can’t we just rely on the algorithms and graphs?
How much metadata do you really need?
The answer to this question may depend largely on business, regulatory/compliance and/or government recordkeeping requirements relevant to the organisation and its jurisdiction. In my experience, across multiple very large and also very small organisations:
Most private sector organisations will likely have minimal metadata requirements beyond basic ‘point of capture’ and ‘process’ metadata already recorded in the system where the records are created or captured (including email), unless this is required for specific compliance or regulatory purposes, or where there is risk associated with poor recordkeeping. For example, in a major food processing company, records relating to the manufacture of food were very well documented and managed, while corporate records were managed haphazardly.
Most public sector organisations are required, for government accountability and transparency (and information retrieval) purposes, to apply a minimum set of both ‘point of capture’ and ‘process’ metadata for non-permanent records. Many government agencies have struggled to manage digital records effectively.
A small percentage of records captured or created in government agencies may require more extensive metadata, especially if those records are to be transferred to archival institutions for permanent retention.
Office 365 ‘workloads’
In Office 365, most business records will be created or captured in either Exchange/Outlook (includes MS Teams chats), or SharePoint or OneDrive for Business (for ‘working’ or personal content).
Exchange is a recordkeeping system in that it stores records with consistent metadata. The primary ‘weakness’, in terms of recordkeeping, is that ‘personal’ Exchange mailboxes aggregate records on a range of subjects by an individual user rather than by business subject. The mailboxes of Office 365 Groups, on the other hand, can be used to aggregate records about a business function/activity or subject.
SharePoint is a recordkeeping system that has extensive default metadata and almost unlimited additional metadata capability (see below). OneDrive for Business is a SharePoint service that has the same extensive default metadata capability.
There is, generally speaking, no requirement for organisations that have implemented Office 365 to allow the continued use of network file shares because the ‘save’ and ‘save as’ options in Office/Windows 10 point to SharePoint and OneDrive as the default save locations.
Metadata in Exchange mailboxes/MS Teams
Emails have the same metadata options in the header of every email:
Recipients (To, including CC and BCC)
(Plus more with routing information and security controls including DKIM, SPF, DMARC etc)
However, no other metadata can be added and some (or most) emails may never form part of the collated record of a given subject.
Because of this ‘limitation’, there has been an assumption ever since email was introduced that emails identified as records would have to be copied to a (separate) recordkeeping system.
In pre-digital days, this meant printing out emails and placing them on a paper file.
In organisations with EDRM systems, this meant copying the email to the EDRMS where additional metadata would be applied.
The original emails generally remained in place in individual mailboxes where they may be subject to backups and journaling in case they needed to be recovered for whatever reason including subpoenas (eDiscovery).
The Office Graph in Office 365 now provides the ability to connect the content in email with other content across that ecosystem, as noted in James Lappin’s post above. This is new – but it doesn’t rely on metadata or copying emails anywhere.
Metadata in SharePoint
As a SharePoint service, OneDrive for Business has the same default metadata columns. Accordingly, it will not be described further here.
What metadata is required?
Organisations that plan to manage records in SharePoint should consider the following questions as part of their overall information architecture design to ensure records are kept in logical aggregations rather than randomly. This is important especially if end-users are allowed to create Office 365 Groups or Teams.
What point of capture and process metadata is required (for compliance, regulatory, recordkeeping purposes)? What is the source of this requirement?
Is there a difference in the metadata requirements for short-term (retain in the organisation) and permanent records that are to be transferred to archival institutions?
Do the required metadata columns already exist in SharePoint?
If they don’t exist, should the additional metadata columns be added as site columns or library columns?
Does any of the metadata need to be mandatory, and/or can it be a default setting? For example, a metadata column could have the default function and/or activity so the user doesn’t need to add this.
Where is the process metadata and how do you view or manage it? (See also below on this subject).
Information architecture and metadata
The information architecture of SharePoint, in terms of managing records as objects (e.g., documents, spreadsheets, images, etc), is relatively simple:
SharePoint site. The primary aggregation that can be linked to a business function (e.g., ‘Financial management’).
Document library/ies. Logical aggregations or containers of records that can be linked to business activities (e.g., ‘Meetings’).
Folders, document sets as content aggregations.
An effective site architecture can replace the requirement for metadata. For example, the name of the SharePoint site can map to a business function, and library names can map to activities, instead of applying a function and activity pair to each record. The URL address for the record then provides the context.
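To make the idea concrete, here is a small sketch that reads the function and activity straight out of a record's URL. It assumes a URL of the form `/sites/<site>/<library>/.../<file>`; the tenant name `example`, the site and library names, and the `context_from_url` helper are all invented for illustration, and real library paths may differ (for instance, the default library has its own path name).

```python
from urllib.parse import unquote, urlparse


def context_from_url(url: str) -> dict:
    """Derive function/activity context from an assumed /sites/<site>/<library>/ path."""
    parts = [unquote(part) for part in urlparse(url).path.split("/") if part]
    # Expected shape: ["sites", <site>, <library>, ..., <file>]
    if len(parts) < 4 or parts[0] != "sites":
        raise ValueError("URL does not match the assumed /sites/<site>/<library>/ layout")
    return {"function": parts[1], "activity": parts[2], "record": parts[-1]}


# Hypothetical record URL: site named after a function, library after an activity.
url = ("https://example.sharepoint.com/sites/FinancialManagement/"
       "Meetings/2020-03-minutes.docx")
print(context_from_url(url))
```

In this arrangement nobody has to tag the document with ‘Financial management’ or ‘Meetings’; the location already says it.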
If additional metadata is still required, SharePoint has extensive and almost unlimited capability.
Every new SharePoint site comes with a standard set of around 240 metadata ‘site columns’. The metadata columns include the Dublin Core metadata items.
New metadata columns can be created at the site level (‘site columns’). These can then be used by all libraries and lists on the site. Here is a useful description of how to add new site columns from ShareGate: SharePoint 101: SharePoint Site Columns.
Every new SharePoint library comes with a standard set of metadata columns – see below. New metadata columns can be created at the library (or list) level, but these columns are only available to that specific library or list.
Default SharePoint document library columns
The default library metadata columns are as follows. Dublin Core metadata items are shown with [DC]:
App Created By
App Modified By
Check In Comment
Checked Out To
Compliance Asset Id
Created By [DC]
Document ID (when enabled as a feature)
Folder Child Count
Item Child Count
Item is a Record
Label applied by
Modified By Name [DC]
Retention label Applied
How metadata is added to records in SharePoint
Every digital record saved to SharePoint will have some form of native metadata (payload). Additional metadata may be added when the document is saved; this may be optional or mandatory.
When a digital record is saved to SharePoint, SharePoint only copies the title or name of the record, not the original created date or author.
When a Microsoft Office document is saved to a SharePoint document library, the Office document stores the library metadata (including the unique Document ID) in its own XML-based properties. This information is retained with the record even when the record is downloaded from SharePoint.
Viewing the metadata
The metadata that describes the content stored in the SharePoint document library may be viewed in multiple ways (via the edit view option), and may be exported (for example if records are to be destroyed or transferred).
Every record includes a version history that provides details of who modified the content, and when (but not what changes were made unless this is recorded).
Process metadata is metadata that records events relating to the record or the aggregation in which it is kept.
Examples of process metadata include when:
Records are viewed or downloaded (date and by whom).
Records are modified (date and by whom, and ideally what changes were made).
Records are copied or moved (date and by whom).
Security controls were changed (date, by whom, and what changes were made).
Records are deleted/destroyed (date and by whom, with what authority).
While ISO 16175 describes the general requirement to keep process metadata, the actual requirement is likely to differ between organisations. Organisations with high compliance requirements, such as certain types of businesses or government, are more likely to want process metadata to be created, accessed when required, and protected against unauthorised modification.
Office 365 process metadata
Office 365 records process metadata in multiple ways in Exchange and SharePoint.
Emails generally cannot be modified after they have been sent. Accordingly, the primary process metadata for emails and Teams chat is likely to be in the deletion records stored in the Office 365 Compliance admin portal audit logs.
SharePoint/OneDrive process metadata is recorded as follows:
Viewed or downloaded, modified, copied or moved: This is recorded in the Office 365 Compliance admin portal audit logs.
Modified: This is recorded in the Date modified and Modified by metadata, as well as the version history (which also keeps the previous actual versions that can be compared if required).
Security changes: This is recorded in the Office 365 Compliance admin portal audit logs.
Destroyed: This depends on the circumstances, but generally the relevant information must be captured manually and stored elsewhere. For example, if the content of a document library is to be destroyed, the metadata (along with details of the original library URL) should first be exported manually and saved somewhere.
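The shape of such an export can be sketched in a few lines. This is not a SharePoint export routine; it simply shows how rows of library metadata, obtained by whatever manual means, could be written to a CSV together with the original library URL. The `export_metadata` helper, the chosen columns and the sample rows are assumptions for illustration.

```python
import csv
import io


def export_metadata(library_url: str, rows: list) -> str:
    """Return CSV text with one line per record, each prefixed by the library URL."""
    columns = ["Library URL", "Name", "Created By",
               "Modified By Name", "Retention label Applied"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({"Library URL": library_url, **row})
    return buffer.getvalue()


# Hypothetical rows copied from a library view before its content is destroyed.
rows = [
    {"Name": "2020-03-minutes.docx", "Created By": "A. Author",
     "Modified By Name": "B. Editor", "Retention label Applied": "Keep 7 years"},
    {"Name": "2020-04-minutes.docx", "Created By": "A. Author",
     "Modified By Name": "A. Author", "Retention label Applied": "Keep 7 years"},
]
print(export_metadata(
    "https://example.sharepoint.com/sites/FinancialManagement/Meetings", rows))
```

The resulting file would then need to be stored somewhere that is itself subject to appropriate retention.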
Note that audit log data in Office 365 is only retained for 90 days with an E3 licence, and for 365 days with an E5 licence.
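The practical effect of these retention periods is easy to check with simple date arithmetic. The sketch below uses the two figures quoted in the note above; the function name and the sample dates are hypothetical.

```python
from datetime import date, timedelta

# Retention periods in days, using the figures quoted in the text above.
AUDIT_RETENTION_DAYS = {"E3": 90, "E5": 365}


def audit_entry_available(event_date: date, today: date, licence: str) -> bool:
    """True if an audit log entry for event_date would still be within retention."""
    return today - event_date <= timedelta(days=AUDIT_RETENTION_DAYS[licence])


event = date(2020, 1, 1)
today = date(2020, 6, 1)   # 152 days after the event
print(audit_entry_available(event, today, "E3"))
print(audit_entry_available(event, today, "E5"))
```

An organisation that needs process metadata for longer than these periods would therefore have to export the audit log data before it expires.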
Exchange/Outlook email has basic metadata. It is unlikely that it will ever be possible to add other metadata, unless email is copied to SharePoint document libraries.
Chats from MS Teams are stored in hidden folders in Exchange mailboxes.
Organisations that need to keep certain emails for specific compliance, recordkeeping or archival purposes, should consider capturing these in SharePoint document libraries. Organisations might also consider making more use of Office 365 Group mailboxes for business-specific content as these Groups also include both MS Teams chat and have an associated SharePoint site.
The metadata capabilities of SharePoint are unlimited but not all records need the same degree of metadata.
The majority of records can probably be managed in standard SharePoint document libraries using the default metadata columns, or with one or two additional site or library metadata columns added, where required.
The Office Graph will increasingly be able to bring together records dynamically in the context of the business or the end-user via Project Cortex and Delve, respecting security controls that may be in place. The centralised content search and retention policy capability in Office 365 will also enable businesses to find, retrieve and manage content across both Exchange and SharePoint.
AS/NZS ISO 23081 series, Information and documentation – Records management processes – Metadata for records
In the last few months, as more and more organisations implement Office 365, I have been asked one of two questions relating to teams:
From IT – How do we stop end users creating a new Team in MS Teams
From end users – Why can’t I create a new Team?
This post is for end-users, to help understand why the ability to create a new Team in MS Teams has been disabled.
A Team is (much) more than it appears
The simple reason is the flow-on effect (see below) and the need for IT to maintain control over the environment, especially the creation of SharePoint sites.
The diagram below, an extract of a larger diagram created by Matt Wade (credit below image), visually shows what happens when a new Team is created (and, for that matter, various other elements).
A new Team creates a range of other things (described below) including a SharePoint site. The SharePoint site that is created is visible as the ‘Files’ tab in the Team channel, as you can see below:
A Team is directly linked with an Office 365 Group
The thing that links all these things together is what are called ‘Office 365 Groups’ (O365 Groups).
O365 Groups only exist in Office 365 and are like a cross between: (a) an Active Directory (AD) Security Group (that controls/grants access to IT resources and systems) and (b) usually small Distribution Lists (a list of people you can email) – but with a lot more functionality.
What do you get with every Office 365 Group?
As can be seen in the diagram above, every O365 Group creates a number of other Office 365 elements. Each Group:
Has at least one owner. This is the person who creates the Group, and becomes the linked SharePoint site owner and the owner of the Team. If there is only one owner and that owner leaves, there is no-one to manage the Group, SharePoint site and Team members. This is one good reason why this should be centralised in IT (who usually create all other AD group types).
Has members. Members usually belong to a logical and generally smaller (<30 people) business unit or work team, similar to membership of an AD Security Group. Membership of the Group (and Team and SharePoint site) is managed by the Owner.
Has a dedicated SharePoint site. The URL of the site is the same as the Group. The members of the Group have default add/edit rights to the SharePoint site. Others, and AD Security Groups, can also be added to the SharePoint site directly (for example, as visitors) but that only gives them access to the site, NOT the Team or the mailbox.
Has an email address/mailbox. The mailbox for the Group appears in the Outlook of every member of the group. You can send and receive mails to/from that Group (similar to a Distribution List).
Has a Planner and a OneNote notebook.
Can be linked to a Team in MS Teams when the Group is created.
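The ownership and access rules described in the points above can be summarised in a small model. This is a sketch of the logic only, not of how Office 365 implements it; the `Office365Group` class and the sample names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Office365Group:
    name: str
    owners: set
    members: set
    # People added directly to the SharePoint site (e.g. as visitors) only.
    site_visitors: set = field(default_factory=set)

    def has_single_owner(self) -> bool:
        """A Group with one owner has no-one left to manage it if that owner leaves."""
        return len(self.owners) == 1

    def can_access(self, person: str, resource: str) -> bool:
        in_group = person in self.owners or person in self.members
        if resource == "site":
            return in_group or person in self.site_visitors
        if resource in ("mailbox", "team"):
            return in_group        # site-only access does not extend this far
        raise ValueError(f"unknown resource: {resource}")


# Hypothetical Group with one owner, two members and one site visitor.
group = Office365Group(
    name="Finance Projects",
    owners={"alex"},
    members={"blake", "casey"},
    site_visitors={"drew"},
)
print(group.has_single_owner())
print(group.can_access("blake", "mailbox"))
print(group.can_access("drew", "site"), group.can_access("drew", "team"))
```

The single-owner check is the kind of thing a central IT team might review periodically if Group creation is not centralised.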
What happens if you allow end-users to create Teams?
Conversely, if you create a Team in MS Teams, it creates everything in the previous dot points but with no controls for:
Office 365 Group/Team naming. End-users can create a Team with whatever name they want, which then assigns the same name to the Office 365 Group and SharePoint site.
Group membership. The person who creates the Team becomes the Owner of the O365 Group and is responsible for managing the Group/Team membership.
SharePoint site structure including document library/ies and folders. If the Team uses only the default ‘Documents’ library, it is very likely to create multiple folders, including via File Explorer. The likely outcome is the mess that is often found on network file shares.
Everything else that comes with every Team, including Planner and OneNote.
Some organisations have allowed their employees to create new Teams in MS Teams and then had to retrospectively clean up the mess created by random SharePoint sites, poor Team names, confusion between O365 Group members and AD Security Group membership and quite a bit more.
Should we even use Teams?
Yes. Read this post from CMSWire titled ‘The State of Play with MS Teams‘ to see why it is a very useful application to implement. Three points from that article:
Chat is the most used function in Teams, making up 70% to 95% of all messages. Chat has 13 times as many messages as Teams channels. Chat is being used to keep local teams connected in real time.
Staff, on average, are members of three teams but are mostly active in one. While most employees have a “favored” team, Teams operating as forums or communities were identified to help employees engage beyond their local team.
The most active team has 25 members, all active and connected to each other, interacting at the rate of 365 channel interactions per day, or 14 interactions per member per day. This does not include chat.
Note that the most active team has 25 members. This underlines the point made earlier that Office 365 Groups work best when there are fewer than 30 members.
Where is the data stored?
Finally, where is the data stored?
Chats between individual participants (outside Team channels) are stored in a hidden folder in each participant’s mailbox.
Documents shared in those chats are stored in the OneDrive of participants.
Chats in the Team channels
Chats are stored in a hidden folder in the Office 365 Group’s mailbox.
Documents stored in these channels are stored in the O365 Group’s linked SharePoint site.
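The storage locations described above amount to a simple lookup by context and content type. The sketch below restates them as a table; it assumes that the first pair of statements refers to chats between individual participants outside Team channels, and the key names and `storage_location` helper are invented for illustration.

```python
# (context, content type) -> where the content is stored, as described in the text.
STORAGE_LOCATIONS = {
    ("participant chat", "chat"):
        "hidden folder in each participant's Exchange mailbox",
    ("participant chat", "document"):
        "OneDrive of participants",
    ("team channel", "chat"):
        "hidden folder in the Office 365 Group's mailbox",
    ("team channel", "document"):
        "the Office 365 Group's linked SharePoint site",
}


def storage_location(context: str, content: str) -> str:
    """Look up where a given kind of Teams content ends up."""
    return STORAGE_LOCATIONS[(context, content)]


for (context, content), location in STORAGE_LOCATIONS.items():
    print(f"{context} / {content}: {location}")
```

Seen this way, the records implications are clearer: retention for Teams content has to cover both Exchange mailboxes and SharePoint/OneDrive.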
Should we use Teams?
Yes, definitely, but understand what is happening ‘under the hood’ if you allow end-users to create new Teams.
Organisations that are new to Office 365 should consider disabling the ability for end-users to create Teams by disabling the ability for end-users to create Office 365 Groups.
Smaller organisations can leave the option available but ensure that there is a guide for the creation of new Teams, including naming conventions and Group/Team membership management.
It will generally be better to centralise the creation of MS Teams in IT as they will normally be responsible for the creation of Active Directory Security Groups and should therefore be responsible for the creation of the more powerful Office 365 Groups.
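Whoever ends up creating Teams, a short checklist applied to each request can help keep names and membership under control. The following sketch is one possible shape for such a check. The naming pattern (a short upper-case prefix, a hyphen, then a descriptive name) is purely hypothetical, and the membership threshold reflects the ‘fewer than 30 members’ guideline mentioned earlier.

```python
import re

# Hypothetical naming convention: 2-5 capital letters, a hyphen, then a short name.
NAME_PATTERN = re.compile(r"[A-Z]{2,5}-[A-Za-z0-9 ]{3,40}")
MAX_MEMBERS = 30   # Groups are said to work best with fewer members than this


def review_team_request(name: str, member_count: int) -> list:
    """Return a list of issues to raise with the requester; empty means no issues."""
    issues = []
    if not NAME_PATTERN.fullmatch(name):
        issues.append("name does not follow the naming convention")
    if member_count >= MAX_MEMBERS:
        issues.append("membership is at or above 30; consider a smaller Team")
    return issues


print(review_team_request("FIN-Budget Planning", 12))
print(review_team_request("my cool team", 45))
```

Each organisation would substitute its own convention; the value lies in applying some agreed rule consistently before the Team, Group and SharePoint site are created.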
This post describes how Office 365 Groups, with their associated mailbox and SharePoint site, can be useful when there is a business requirement to manage aggregations of email and document-based records for a given function or subject, and why this option should be considered as part of your architecture model for managing records in Office 365 and SharePoint.
What are Office 365 Groups?
Office 365 Groups are a relatively new concept and still remain poorly understood, especially in organisations implementing Office 365.
Because they are not well understood, organisations may implement all the elements of Office 365 without necessarily being aware that:
Every new Group in Outlook, Team in MS Teams, Planner, and Yammer group creates an Office 365 Group with an associated SharePoint site.
End-users can create SharePoint sites from their individual SharePoint portal (the default ‘Team site’ option creates an Office 365 Group)
SharePoint sites will proliferate without any controls over naming or content.
In very simple terms, Office 365 Groups:
Are Azure Active Directory (AD) objects, similar to Security Groups (also known as AD Groups).
Like Security Groups, have members and give those members access to certain (but usually different) IT resources.
Have an associated Exchange mailbox and SharePoint site (that links these two in the same context), and can be linked with a Team in MS Teams. The members of the Group have access to both the mailbox and the SharePoint site. Members of Security Groups may also be added to the SharePoint site but won’t be able to see the mailbox or the Teams chat.
Are similar to Distribution Lists or a shared mailbox in the sense that all the members of the Group can receive emails from the one mailbox.
Include other functionality, including Planner, and can be linked with Yammer groups/communities.
The records management problem that Office 365 Groups can help to resolve
One of the key recordkeeping problems in many organisations (aside from managing digital records generally) is the disconnect between email and other forms of records on the same subject or context.
Emails are created, sent to and from, and stored in mostly inaccessible ‘personal’ accounts (with the odd shared mailbox). Emails may have attachments and the attachment may be the only version of the record. Shared mailboxes partially help this issue but there remains a disconnect between emails and other records.
Other digital records (including saved email attachments) may be stored across network file shares (including ‘personal drives’ and the local C drive) and other locations, including USB drives and unofficial cloud-based systems. Unofficial and unapproved storage locations heighten information security risks, both from storing official records in unofficial locations and from the potential for malicious links.
In the early computer days the only way to keep these records together was to print them and put them on a file, a practice that (sadly) continues to exist. Over the last 20 years, electronic document and records management (EDRM) systems have provided a similar functionality by requiring end-users to copy original documents to a digital version of the file (leaving the originals in place in most cases).
The problem that most organisations face is that records about the same subject or business context, in multiple forms (email, documents) and formats (including chat, messaging and social media) are stored across multiple systems (Outlook, network file shares, personal drives, personal apps on mobile devices).
As a case in point, in 2018 I asked a business unit of so-called ‘mobile workers’ how they kept in touch and ‘collaborated’. Their responses included Facebook Messenger, WhatsApp, Dropbox (and others) and private emails.
How can Office 365 Groups help recordkeeping?
As noted above, Office 365 Groups have a mailbox, SharePoint site and a Team in MS Teams.
If set up correctly, a single Office 365 Group can provide a single point of context for both email and other digital records including chat, without end-users having to copy or move content anywhere else.
The following are examples (from real life experience) of how Office 365 Groups can be used to keep related records in context.
Organisations often have a central point for the management of all incoming correspondence. In the past, this was likely to be a shared mailbox and correspondence might be uploaded (including after scanning of paper mail) to an EDRMS or other system for routing and responses.
Instead of using a shared mailbox, the Office 365 Group mailbox can be used to receive all emails (where they can remain), and its connected SharePoint site can be used for the storage of digital content including template responses (that could also be stored in Content Types created on the site). A Team in MS Teams may also be created to provide a forum for discussion about the correspondence or other matters – for example, a channel to discuss draft responses.
All the members of the Office 365 Group can access the mailbox, chat and collaborate on content within Team channels, and use the SharePoint site. They can use the ‘share’ option (rather than attaching documents to emails) to share drafts with others, or use the ‘out of the box’ ‘Request Sign Off’ flow available in every document library to seek approval.
Additionally, if the Group’s SharePoint site front page has the default ‘Site Activity’ web part displayed, this will show new emails coming in to the Group’s mailbox as shown in the image below. These emails are only accessible to the members of the Group.
Storing records by function/activity/subject
Office 365 Groups could be used to manage the records of a particular business function and activity, particularly those where there is a lot of email.
For example, the functions of ‘Fleet Management’ (or ‘Asset Management’), ‘Property Management’, or even ‘Financial Management’ are all likely to have both email and document type records.
Emails relating to the function can be sent to and managed from the Group’s mailbox by the members of the Group, rather than using a shared mailbox (which is disconnected from the other records).
Documents (including emails from the mailbox, if required) relating to the function can be stored in SharePoint document libraries that map to the activities that are being performed. For example, in a ‘Meetings’ library.
There may be a single retention policy for the entire site or one or more label-based retention policies used on individual libraries, or a combination of both (the longest retention policy will take precedence).
In this way, emails and documents about the same subject can be managed from a single Office 365 Group.
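The precedence rule mentioned above (the longest retention period wins) can be illustrated with a minimal sketch. This is not how Office 365 is configured or queried; the `RetentionRule` class, its fields and the example policy names are hypothetical, and the real service applies further rules that this sketch ignores.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionRule:
    """A hypothetical retention rule: a name, a scope and a period in years."""
    name: str
    scope: str          # "site" or "library" (illustrative values only)
    retain_years: int


def effective_retention(rules: list[RetentionRule]) -> RetentionRule | None:
    """Return the rule with the longest retention period, or None if no rule applies."""
    if not rules:
        return None
    return max(rules, key=lambda rule: rule.retain_years)


# Example: a site-wide policy plus a label applied to a 'Meetings' library.
site_policy = RetentionRule("Fleet site policy", "site", 5)
meetings_label = RetentionRule("Meetings label", "library", 7)

winner = effective_retention([site_policy, meetings_label])
print(winner.name, winner.retain_years)
```

In this example the library label keeps the content for longer than the site policy, so it is the one that determines how long a document in the ‘Meetings’ library is retained.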
Senior executive management
Office 365 Groups can be used to support and secure communication between senior executives. A Group provides them with a single restricted-access mailbox and SharePoint site, access to both of which is controlled by membership of the Office 365 Group. It also provides them with a Team in MS Teams.
Additional security measures can be applied to more sensitive information including:
More restricted security on some parts of the SharePoint site
Data Loss Prevention policies for very sensitive information
Retention policies to prevent the deletion of content (or capture ‘deleted’ content in a hidden Preservation Hold library).
Stronger monitoring of all activity by end-user.
IT Operations/Service Desk
The IT Service Desk is a common point of contact in most organisations and most service desks will have a shared mailbox to review and triage incoming emails. They will also have a requirement to keep records relating to service support issues.
An Office 365 Group, perhaps named ‘ITServiceDesk’, can be established.
The Group mailbox can be the central point of email contact for the Service Desk (ITServiceDesk@organisation.name).
The associated SharePoint site can be used for the storage of service support documents and other content.
The Group’s Team in MS Teams can have multiple channels to support each application or aspect of IT as required. The Service Desk can use the one-to-one chat and sharing screen capability to resolve service issues.
Office 365 Groups can change the way we work
As noted above, ‘personal’ emails and the use of mostly uncontrolled network file shares (and other storage locations) have been a common way of working for three decades.
Changing these work habits can be hard. However, change can be brought about with minimal impact, provided end-users are assured that the content they need will still be accessible and protected, and that there is a tangible benefit in adopting the new ways of working.
These changes should start small, with a focus on mainly small business units that would benefit from being ‘converted’ to an Office 365 Group. The changes that come with an Office 365 Group include:
Replacing shared mailboxes and some (mostly smaller) distribution lists with Office 365 Group mailboxes – still accessed from Outlook.
Replacing network file shares with a File Explorer-based view of the Group’s SharePoint site libraries (via the sync option). This will allow end-users to continue to work in a familiar way. Note that added metadata is not visible from File Explorer and certain metadata options, such as making a column mandatory, will cause the File Explorer view to become read only.
Introducing end-users to MS Teams initially for one-to-one chat, and pointing out how their Group also has a Team where they can chat and access the SharePoint site and other resources.
Demonstrating how the browser-based version of a SharePoint site can show (via the ‘Activity’ web part) emails coming from the Group’s mailbox, and that document libraries have a range of additional document and records management functionality.
Managing retention for all aspects of the Group, ensuring that content is kept (and can be recovered) for as long as required. This action can be hidden from end-users.
The end result should be better management of all records relating to specific subjects or needing to be kept in context.
What about other records stored in email and SharePoint sites?
This post has focused on the benefits of using Office 365 Groups to manage certain types of records within a given context. This model may not be suitable for all types of records. For example:
End-users will probably, for years to come, create, send and manage emails in ‘personal’ email accounts. Email won’t go away as it provides a useful medium for certain types of communication.
End-users will also continue to use their personal space in OneDrive to create and store records that should be stored in SharePoint. It is not easy to monitor this content and so end-user training is essential to ensure that final versions of records are stored in SharePoint.
Security Groups are still valid, especially for groups of 30 or more, and can be used in parallel with Office 365 Groups. For example, a small business area may have an Office 365 Group-based SharePoint site and decide to give read only access to the members of a Security Group (with different members), by adding that Security Group to the Site Visitors group. Any member of the Security Group who also happens to be a member of the Office 365 Group will continue to have Member (add/edit) access.
High-level business departments/divisions may prefer to retain standard SharePoint sites with limited Member access and with Visitor (read only) access controlled via Security Groups (especially those with hundreds of members). An Office 365 Group with more than 30/50 members is still possible, but the benefits compared with using Security Groups are debatable, especially if a Team is linked with the Office 365 Group. Experience using Yammer since 2012 suggests that (a) Teams with more than 30 people can become very quiet and (b) a mailbox with more than 30 members is not useful.
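The way access combines when an Office 365 Group and a Security Group are both granted on the same site can be sketched as follows. This is an illustrative model only, not a SharePoint API: the group names, user names and the `effective_access` helper are hypothetical. It assumes that a user receives the highest level granted through any group they belong to.

```python
from __future__ import annotations

# Illustrative access levels, ranked from lowest to highest.
ACCESS_RANK = {"none": 0, "read": 1, "edit": 2}


def effective_access(
    user: str,
    site_grants: dict[str, str],
    group_members: dict[str, set[str]],
) -> str:
    """Return the highest access level the user gains through any group granted on the site."""
    levels = [
        level
        for group, level in site_grants.items()
        if user in group_members.get(group, set())
    ]
    return max(levels, key=ACCESS_RANK.__getitem__, default="none")


# Hypothetical membership: bob belongs to both groups.
group_members = {
    "Fleet O365 Group": {"alice", "bob"},
    "Operations SG": {"bob", "carol"},
}
# The Office 365 Group has Member (edit) access; the Security Group sits in Site Visitors (read).
site_grants = {"Fleet O365 Group": "edit", "Operations SG": "read"}

for name in ("alice", "bob", "carol", "dave"):
    print(name, effective_access(name, site_grants, group_members))
```

Here bob keeps edit access through the Office 365 Group even though the Security Group only grants read access, while carol can only read and dave has no access at all.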
Office 365 Groups:
Should be part of the information architecture model for managing records in Office 365. They are well-suited for lower-level and/or small business units with fewer than 30 members as experience suggests that the more people in the group the less likely everyone will actively participate and contribute.
Can and should, in many instances, replace existing functionality including shared mailboxes and network file shares.
Will allow end-users to continue to work in familiar ways, via Outlook and File Explorer, while offering new options to communicate and collaborate.
Should reduce the volume of ‘personal’ emails and attachments to emails.
May enable the creation, and facilitate the storage and management of records relating to the same business context, including within a function/activity pair.
May never replace the requirement for email or ‘standard’ high level business division or departmental SharePoint sites.
Most people should be aware that pressing the ‘delete’ option for a file stored on a computer doesn’t actually delete the item; it only makes the file ‘invisible’. The actual file is still on the disk and can be retrieved, relatively easily or using forensic tools, until the space it was stored on is overwritten.
Traditional legacy electronic document and records management (EDRM) systems have two components:
A database (e.g., SQL, Oracle) where the metadata about the records are stored
A linked file share where the actual objects are stored, most of which are copies of emails or network file share files that remain in their original location.
In most on-premise systems, email mailboxes, network file shares, and the EDRMS database and linked file share are likely to be backed up.
When a digital record comes to the end of its retention and is subject to a ‘destruction’ process, how do you know if the record has actually been destroyed? And even if it is, how can you be sure that the original isn’t still stored in a mailbox, network file share, or a back up?
This post examines what actually happens when a file is ‘deleted’ from a Windows NT File System (NTFS), and questions whether digital records stored in an EDRMS are really destroyed at the end of the retention period.
The Windows NTFS Master File Table (MFT)
Details of every file stored on a computer drive will be found in the NTFS Master File Table (MFT).
In some ways, the MFT operates like a traditional electronic document management system: it is a kind of database that records metadata about the attributes of the digital objects stored on the drive. These attributes include the following:
As noted in the diagram above, the details stored by the MFT include the $File_Name and $Data attributes.
The $File_Name attributes include the actual name of the file as well as when it was created and modified, and its size. This is the information that can be seen via File Explorer and is often copied to the EDRMS metadata.
The $Data attribute contains details of where the actual data in the file is stored on the disk (in 0s and 1s) or the complete data if the file is small enough to fit in the MFT record.
If the MFT record has many attributes or the file data is stored in multiple fragments on a disk (for example as a file is being edited), additional MFT ‘extension’ records may be created.
When a file is deleted, the MFT records the deletion.
If the file is simply deleted, the record will remain on the disk and can be recovered from the Recycle Bin.
If the file is deleted through SHIFT-DEL or by emptying the Recycle Bin, the MFT record will be updated to the ‘Deleted’ state and the cluster bitmap will be updated to mark the file’s clusters (where the data is stored) as free for reuse. The MFT record remains until it is re-used or the data clusters are allocated in whole or part to another file.
So, in summary, ‘deleting’ a file does not actually delete it. It may either:
Store the file in the Recycle Bin, making it relatively easy to recover, or
Change the MFT record to show the file as being deleted but leave the file data on the disk until it is overwritten.
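The behaviour summarised above can be illustrated with a toy model. This is a deliberately simplified sketch and not the real NTFS on-disk layout: the `MftRecord` and `ToyVolume` classes are hypothetical, and each file occupies a single ‘cluster’ to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MftRecord:
    """Toy stand-in for an MFT record (not the real on-disk layout)."""
    file_name: str
    clusters: list[int]
    in_use: bool = True


@dataclass
class ToyVolume:
    """A tiny model of a volume: raw clusters, a cluster bitmap and an MFT."""
    clusters: dict[int, bytes] = field(default_factory=dict)
    allocated: set[int] = field(default_factory=set)   # stands in for the cluster bitmap
    mft: list[MftRecord] = field(default_factory=list)

    def write_file(self, name: str, data: bytes) -> MftRecord:
        cluster = len(self.clusters)          # next unused cluster number
        self.clusters[cluster] = data
        self.allocated.add(cluster)
        record = MftRecord(name, [cluster])
        self.mft.append(record)
        return record

    def delete_file(self, record: MftRecord) -> None:
        """Mimic a permanent delete: flag the record and free the clusters, but touch no data."""
        record.in_use = False
        self.allocated.difference_update(record.clusters)

    def visible_files(self) -> list[str]:
        return [r.file_name for r in self.mft if r.in_use]

    def recover(self, record: MftRecord) -> bytes:
        """Read the data back, as a forensic tool might, if it has not been overwritten."""
        return b"".join(self.clusters[c] for c in record.clusters)


volume = ToyVolume()
memo = volume.write_file("memo.docx", b"confidential memo")
volume.delete_file(memo)

print(volume.visible_files())   # the file no longer appears in a listing
print(volume.recover(memo))     # but its bytes are still held in the 'clusters'
```

The point of the sketch is that `delete_file` changes only bookkeeping. The data remains readable until something else is written over the freed cluster.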
How does an EDRMS store and manage files?
The following summary relates to a well-known Electronic Document and Records Management System (EDRMS). Other systems may work differently but the point is that records managers should understand exactly how they work and what happens when electronic files are destroyed at the end of a retention period.
Most EDRM systems are made up of two parts:
A database (SQL, Oracle etc) to store the metadata about the record.
An attached file store that stores the actual digital objects.
When EDRM systems are used to register paper or physical records (files and boxes), only the database is used.
When digital records are uploaded to the EDRMS:
The metadata in the original file, including the file type, original file name, date created, date modified and author are ‘captured’ by the system and recorded in the new database record.
Additional metadata may be added, including a content or record ‘type’.
The record will usually be associated with a ‘container’ (e.g., ‘file’). This containment makes the record appear to be ‘contained’ within that container, whereas in fact it is simply a metadata record of an object stored elsewhere.
The original record filename is changed to random characters (to make it harder to find, in theory) and then stored on the attached (usually Windows NTFS) file store, often in a series of folders.
A link is made between the database record and the record object stored in the file store (the MFT record).
When the end-user opens the EDRMS, they can search for or navigate to containers/files and see what appears to be the digital objects ‘stored’ in that container/file. In reality, they are seeing a link to the object stored (randomly) in the file store.
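The two-part model described above (a metadata database linked to a file store of randomly named objects) can be sketched in a few lines. This is an illustration under stated assumptions, not the design of any particular EDRMS: the table layout, the `register_record` and `open_record` helpers, and the container number are all hypothetical, and SQLite stands in for the SQL or Oracle database.

```python
from __future__ import annotations

import sqlite3
import tempfile
import uuid
from pathlib import Path


def register_record(db: sqlite3.Connection, store: Path, source: Path, container: str) -> str:
    """Copy `source` into the file store under a random name and record its metadata."""
    stored_name = uuid.uuid4().hex                      # random name, unrelated to the title
    (store / stored_name).write_bytes(source.read_bytes())
    db.execute(
        "INSERT INTO records (original_name, container, stored_name) VALUES (?, ?, ?)",
        (source.name, container, stored_name),
    )
    db.commit()
    return stored_name


def open_record(db: sqlite3.Connection, store: Path, original_name: str) -> bytes:
    """Follow the database link to the object held in the file store."""
    row = db.execute(
        "SELECT stored_name FROM records WHERE original_name = ?", (original_name,)
    ).fetchone()
    return (store / row[0]).read_bytes()


# Set up a throwaway file store and an in-memory metadata database.
workdir = Path(tempfile.mkdtemp())
store = workdir / "filestore"
store.mkdir()
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (original_name TEXT, container TEXT, stored_name TEXT)")

original = workdir / "contract.txt"
original.write_text("signed contract")
stored_name = register_record(db, store, original, container="FILE-2020/001")

print(open_record(db, store, "contract.txt"))   # content reached via the metadata link
print(original.exists())                        # the original stays where it was
```

Note that registering the record makes a copy. The original file is untouched, which is why a later ‘destruction’ inside the system says nothing about copies that remain elsewhere.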
What happens when an EDRMS record is destroyed?
If there is no requirement to extend their retention, or keep them on a legal hold, records may be destroyed at the conclusion of a retention period.
For physical records, this usually means destroying the physical objects so they cannot be recovered, a process that may include bulk shredding or pulping.
For digital records, however, there may be less certainty about the outcome of the destruction. While the EDRMS may flag the record as being ‘destroyed’, it is not completely clear whether the destruction process has actually destroyed the record and overwritten the digital object in a way that ensures its destruction to the same level as destroyed paper files.
If the original associated NTFS file share becomes full and a new one is used, the original is likely to be made read only.
There is likely to be a backup of the EDRMS.
The original records uploaded to the EDRMS probably continue to exist on network file shares, in email, or on backup tapes.
Digital forensics can be used to recover ‘deleted’ files from the associated file share.
Consider this scenario:
An email containing evidence of something is saved to a container in an EDRMS.
The container of records is ‘destroyed’ after the retention period expires.
A legal case arises after the container is ‘destroyed’.
A subpoena is made for all records, including those specific records.
Has the record actually been destroyed, or could it still be recoverable, including from backups or the digital originals?
Is it really possible to destroy digital records, and does it matter?
Yes, records can be destroyed by overwriting the cluster where the record is kept, and some EDRM systems may offer this option.
Do EDRM systems overwrite the cluster when a digital record is destroyed in line with your records retention and disposal authorities, or simply mark the record as being deleted, when it is still technically recoverable?
Could the record still exist in the network file shares or email, or in backups of these or the EDRMS?
Might it be possible to recover the record with digital forensics tools?
Does it matter?
It might be worth asking IT and your EDRMS vendor.
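To make the idea of overwriting concrete, the following is a minimal sketch of overwriting a file’s contents before deleting it. It is an illustration only, not a certified sanitisation method and not the behaviour of any specific EDRMS; the `overwrite_and_delete` helper is hypothetical. On SSDs, copy-on-write file systems or volumes with snapshots, an in-place overwrite may not reach the original blocks, and it does nothing about backups or copies held elsewhere.

```python
import os
import tempfile
from pathlib import Path


def overwrite_and_delete(path: Path, passes: int = 1) -> None:
    """Overwrite a file's contents in place with random bytes, then remove it."""
    size = path.stat().st_size
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            handle.write(os.urandom(size))   # fine for small files; chunk this for large ones
            handle.flush()
            os.fsync(handle.fileno())        # ask the OS to push the bytes to the device
    path.unlink()


demo = Path(tempfile.mkdtemp()) / "record.txt"
demo.write_text("evidence of business activity")
overwrite_and_delete(demo)
print(demo.exists())
```

Even where a system offers an option like this, the questions above about originals and backups still apply.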
The SharePoint Online admin centre contains a number of configuration options and settings. Most of these settings relate to the administration of SharePoint as a service and are not described further unless they relate to the management of records.
The section named ‘Active sites’ lists all active sites, including details of storage used and when it was last modified. The list can be exported as a csv file.
The records management team should have a retention plan for every SharePoint site, including Office 365 Group-based sites and communication sites. The SharePoint Admin and the Records Manager/s should review the list from time to time to see where content is stored and whether any sites could potentially be deleted.
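A periodic review of the exported list could be supported with a short script such as the sketch below. The column names (‘URL’, ‘Last modified’), the date format and the sample rows are assumptions made for illustration; the real export may use different headings and formats, so they would need to be adjusted to match the actual csv file.

```python
from __future__ import annotations

import csv
import io
from datetime import datetime, timedelta


def stale_sites(csv_text: str, as_of: datetime, max_age_days: int = 365) -> list[str]:
    """Return the URLs of sites whose last-modified date is older than max_age_days."""
    cutoff = as_of - timedelta(days=max_age_days)
    stale = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assumed column name and date format; adjust to match the real export.
        last_modified = datetime.strptime(row["Last modified"], "%Y-%m-%d")
        if last_modified < cutoff:
            stale.append(row["URL"])
    return stale


# Hypothetical sample standing in for an exported 'Active sites' list.
sample = """Site name,URL,Storage used (GB),Last modified
Fleet Management,https://example.sharepoint.com/sites/fleet,1.2,2020-03-01
Old Project,https://example.sharepoint.com/sites/oldproject,0.4,2017-06-15
"""

print(stale_sites(sample, as_of=datetime(2020, 6, 1)))
```

A list like this is only a starting point for discussion with the site owners. A site that has not been modified recently may still hold records that must be kept.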
Creating new sites
As noted in the screenshot above, the SharePoint Admin can create a new site directly from this portal, or it may be scripted.
Organisations that are new to Office 365, and especially larger organisations that want to manage corporate records in SharePoint, might consider restricting – at least initially – the ability for end users to create new SharePoint sites, as well as new Teams in MS Teams and Groups in Outlook, both of which also create SharePoint sites via the Office 365 Group.
If there is no control over the creation (at least initially) of SharePoint sites, the number of sites could grow exponentially with no regard to corporate recordkeeping requirements. Sites holding important records may be abandoned or forgotten, or be invisible to people who need to see them.
As soon as there is sufficient critical mass in terms of SharePoint sites for business areas, and training and awareness for end users, these controls may be loosened.
There are three options to create new sites from this portal:
Team site. These create an Office 365 Group with Members who become the Members (add/edit) of the SharePoint site. It is recommended that an Office 365 Group is created first to ensure consistency in Group naming and controls. These types of sites, with a Team in MS Teams, may work better for smaller business units or project teams with fewer than 30 staff. They are also more likely to contain ‘working documents’ or have content (including the connected mailbox) that can be covered by a single retention policy.
Other options (sites). The options here are Team Sites, Document Center, Enterprise Wiki, Publishing Site. Team sites created here are best for large departmental or divisional sites where access can be controlled through AD Security Groups. These types of sites are more likely to last for several years, contain formal, final versions of records stored in controlled and well-named document libraries, and be subject to more than one retention policy (including both site and library policies).
All new sites must be provisioned, which is described further below.
The SharePoint admin can only assign, from the admin portal, Site (Collection) Administrator permissions for individual SPO sites. Site Owners, Members and Visitors are assigned in the individual sites once they are created.
Generally speaking, Site Owners should work in the business unit that ‘owns’ the SharePoint site. Site Owners should not be the head of the business unit unless they are prepared to manage the SharePoint site.
Site Administrators are the Site Collection Administrators found in that section of the permissions ribbon menu for the site, under ‘Advanced permissions settings’.
All SharePoint Admins should be Site Collection Admins
Site Collection Admins should be grouped in a Security Group (so each site doesn’t have to be modified every time, only the SG)
If the SharePoint Admin is not listed in the Site Collection Admin group (including via the recommended SG), they may get ‘access denied’ if they try to open the site directly. They can, however, still see the site and modify the admins from the SP Admin portal.
The Primary Admin is by default ‘Company Administrator’. It is good practice to: (a) create a single SG for SharePoint Admins, and (b) remove Company Administrator as it doesn’t really need to be there – GAs can access the SP Admin portal anyway.
It is recommended that a key or senior records or information manager be added to the Site Collection Administrator Security Group that is added to all SharePoint sites, to provide access to all of the content if required. This access can be removed on a case-by-case basis if there are concerns about the security of the content in those sites.
External sharing is disabled by default on each site, even if it is enabled globally. A decision must be made for each site to allow external sharing.
External sharing allows records to be shared directly with external parties, rather than being attached to emails. This provides better security for those records, as the ability to prevent download of the record can also be applied.
Hub sites (or sub-sites?)
Hub sites (top level and ‘subsidiary’ sites) are effectively the replacement for sub-sites in SharePoint. See below regarding the architecture of SharePoint sites.
More features – Records Management
The SharePoint admin portal has a ‘classic’ setting under ‘More features’ called ‘Records Management’. This is not what it appears to be – it is in fact a way to set up ‘send to connections’ to ‘send’ (actually copy) content to a Records Center.
There are a number of problems with this (one of which is that it copies the most recent version and re-creates it in a new library) and it is not recommended for the management of records.
The OneDrive Admin portal includes a ‘Storage’ section that defines how much storage users will get as well as a setting for how long the content will be retained.
Records managers should be involved in discussions around the retention of OneDrive for Business content both while the account is active (via an Office 365 retention policy) and after the account is de-activated (via the setting here).