Posted in Compliance, Conservation and preservation, Electronic records, Governance, Information Management, Information Security, Legal, Records management, Retention and disposal, Security

Destroying digital records – are they really destroyed?

Most people should be aware that pressing the ‘delete’ option for a file stored on a computer doesn’t actually delete the item, it only makes the file ‘invisible’. The actual file is still accessible on the disk and can be retrieved relatively easily or using forensic tools until the space it was stored on is overwritten.

Traditional legacy electronic document and records management (EDRM) systems have two components:

  • A database (e.g., SQL, Oracle) where the metadata about the records are stored
  • A linked file share where the actual objects are stored, most of which are copies of emails or network file share files that remain in their original location.

In most on-premise systems, email mailboxes, network file shares, and the EDRMS database and linked file share are likely to be backed up.

When a digital record comes to the end of its retention and is subject to a ‘destruction’ process, how do you know if the record has actually been destroyed? And even if it is, how can you be sure that the original isn’t still stored in a mailbox, network file share, or a back up?

This post examines what actually happens when a file is ‘deleted’ from a Windows NT File System (NTFS), and questions whether digital records stored in an EDRMS are really destroyed at the end of the retention period.

The Windows NTFS Master File Table (MFT)

Details of every file stored on a computer drive will be found in the NTFS Master File Table (MFT).

In some ways, the MFT operates like a traditional electronic document management system – it is a kind of database that it records metadata about the attributes of the digital objects stored on the drive. These attributes include the following:

attriblist

As noted in the diagram above, the details stored by the MFT include the $File_Name and $Data attributes.

  • The $File_Name attributes include the actual name of the file as well as when it was created and modified, and its size.  This is the information that can be seen via File Explorer and is often copied to the EDRMS metadata.
  • The $Data attribute contains details of where the actual data in the file is stored on the disk (in 0s and 1s) or the complete data if the file is small enough to fit in the MFT record.

If the MFT record has many attributes or the file data is stored in multiple fragments on a disk (for example as a file is being edited), additional MFT ‘extension’ records may be created.

When a file is deleted, the MFT records the deletion.

  • If the file is simply deleted, the record will remain on the disk and can be recovered from the Recycle Bin.
  • If the file is deleted through SHIFT-DEL or emptying the Recycle Bin, the MFT will be updated to the ‘Deleted’ state and update the cluster bitmap section to set the file’s cluster (where the data is stored) as being free for reuse. The MFT record remains until it is re-used or the data clusters are allocated in whole or part to another file.

So, in summary, ‘deleting’ a file does not actually delete it. It may either:

  • Store the file in the Recycle Bin, making it relatively easy to recover, or
  • Change the MFT record to show the file as being deleted but leave the file data on the desk until it is overwritten.

How does an EDRMS store and manage files?

The following summary relates to a well-known Electronic Document and Records Management System (EDRMS). Other systems may work differently but the point is that records managers should understand exactly how they work and what happens when electronic files are destroyed at the end of a retention period.

Most EDRM systems are made up of two parts:

  • A database (SQL, Oracle etc) to store the metadata about the record.
  • An attached file store that stores the actual digital objects.

When EDRM systems are used to register paper or physical records (files and boxes), only the database is used.

When digital records are uploaded to the EDRMS:

  • The metadata in the original file, including the file type, original file name, date created, date modified and author are ‘captured’ by the system and recorded in the new database record.
  • Additional metadata may be added, including a content or record ‘type’.
  • The record will usually be associated with a ‘container’ (e.g., ‘file’). This containment makes the record appear to be ‘contained’ within that container, whereas in fact it is simply a metadata record of an object stored elsewhere.
  • The original record filename is changed to random characters (to make it harder to find, in theory) and then stored on the attached (usually Windows NTFS) file store, often in a series of folders.
  • A link is made between the database record and the record object stored in the file store (the MFT record).

When the end-user opens the EDRMS, they can search for or navigate to containers/files and see what appears to be the digital objects ‘stored’ in that container/file. In reality, they are seeing a link to the object stored (randomly) in the file store.

What happens when an EDRMS record is destroyed?

If there is no requirement to extend their retention, or keep them on a legal hold, records may be destroyed at the conclusion of a retention period.

For physical records, this usually means destroying the physical objects so they cannot be recovered, a process that may include bulk shredding or pulping.

For digital records, however, there may be less certainty about the outcome of the destruction. While the EDRMS may flag the record as being ‘destroyed’ it is not completely clear if the destruction process has actually destroyed the records and overwritten the digital records in a way that ensures its destruction to the same level as destroyed paper files. 

Also:

  • If the original associated NTFS file share becomes full and a new one is used, the original is likely to be made read only.
  • There is likely to be a backup of the EDRMS.
  • The original records uploaded to the EDRMS probably continue to exist on network files shares, in email, or in back up tapes.
  • Digital forensics can be used to recover ‘deleted’ files from the associated file share.

Consider this scenario:

  • An email containing evidence of something is saved to a container in an EDRMS.
  • The container of records is ‘destroyed’ after the retention period expires.
  • A legal case arises after the container is ‘destroyed’
  • A subpoena is made for all records, including those specific records.
  • Has the record actually been destroyed, or could it still be recoverable, including from backups or the digital originals?

Is it really possible to destroy digital records, and does it matter?

Yes, records can be destroyed by overwriting the cluster where the record is kept, and some EDRM systems may offer this option.

But:

  • Do EDRM systems overwrite the cluster when a digital record is destroyed in line with your records retention and disposal authorities, or simply mark the record as being deleted, when it is still technically recoverable?
  • Could the record still exist in the network file shares or email, or in backups of these or the EDRMS?
  • Might it be possible to recover the record with digital forensics tools?
  • Does it matter?

It might be worth asking IT and your EDRMS vendor.

References:

 

 

Posted in Electronic records, Exchange Online, Governance, Information Management, Microsoft Teams, Office 365, Office 365 Groups, Products and applications, Records management, Retention and disposal, SharePoint Online

Setting up SharePoint Online to manage records (as part of Office 365) – Part 1/3

This is the first of three posts that describe the main elements involved in setting up SharePoint Online to manage records.

This post focuses on the recordkeeping related elements in the Office 365 and Compliance admin portals:

  • Office 365 Admin – Licences, Roles and AD Groups (including Office 365 Groups)
  • Compliance Admin – Retention labels and policies (and some more options)

The second post focuses on SharePoint Online Admin centre configuration.

The third and last post focuses on SharePoint site collection provisioning and configuration to manage records

Office 365 admin center

O365AdminPortalUsersRolesGroups

The main elements that impact on the management of records in Office 365 are Users (for licences), Roles and Groups, as can be seen in the screenshot.

Users – licencing and applications

Organisations that acquire Office 365 will generally have the relevant licences required (a) to set up and administer SharePoint Online, and (b) for users to use it (and OneDrive for Business).

This post assumes that organisations will have at least an E3 licence which includes SharePoint for end users, visible as an app when they log on to https://office.com, along with all other applications included in the licence, for example Exchange/Outlook, OneDrive for Business, MS Teams and so on. End users with access to these items will also be able to download and use the equivalent mobile device apps.

Roles

The three key roles that impact on the management of records in SharePoint are as follows:

Global Admin (GA)

Global Admins:

  • Are responsible for managing the entire Office 365 environment. This includes creating new Groups (Security Groups, Distribution Lists and Office 365 Groups).
  • Are responsible for assigning key roles, including the SharePoint Administrator and Compliance Administrator (and other roles).
  • May have responsibility for, and/or the skills and knowledge required to set up and administer SharePoint Online and create new sites for the organisation.
  • May also be able to create and publish retention policies in the Compliance admin portal.

Note – Organisations that outsource the administration of Office 365 should always have at least one GA account to access the tenant if ever required. If they don’t have a log on, they should have or acquire a very good understanding of the access and privileges afforded to the outsourced company. 

SharePoint Administrator (SP Admin)

The SP Admin role will usually be a ‘system’ role that is responsible for managing the SharePoint environment, including OneDrive for Business. As noted above, a GA with the right skills can also manage the SharePoint environment. 

Generally speaking, SharePoint Administrators will focus on the technical and configuration aspects of SharePoint. They are not usually responsible for confirugint SharePoint to manage records, managing records, or creating and publishing retention policies. This role usually falls to either the GA or Compliance Administrator.

Compliance Administrator

The Compliance Admin role is responsible, among other things, for the creation and publishing of retention labels and policies in the Compliance Admin portal. A GA can perform this role (along with all other roles) if required.

Compliance Admins will usually be responsible for disposition reviews linked with retention labels, and be involved in eDiscovery cases.

The Compliance Admin can search and view the audit logs for all activity across Office 365 and can carry out broad content searches with the ability to export the content of those searches. As this role is relatively powerful, it should be limited to key senior individuals in the organisation.

Office 365 and Security Groups

Office 365 Groups are Azure/Exchange objects just like Security Groups and Distribution Lists. Accordingly, there should be controls around their creation, including naming conventions.

As every Office 365 Group has an associated SharePoint site, organisations should consider restricting the ability for end users to create Office 365 Groups, and only allowing Global Admins and members of a Security Group to do this. Neither SharePoint Admins or Compliance Admins would normally create AD Groups.

If the ability to create Office 365 Groups is not restricted, an Office 365 Group will be created with an associated SharePoint site whenever:

  • A new Team is created in MS Teams.
  • A new Group is created from Outlook.
  • A new Yammer Group/Community is created.

External sharing

The ability to share content externally from SharePoint and OneDrive for Business is controlled from the Office 365 Admin portal. This is a global setting that can be disabled by the Global Admins if required.

It is assumed, for the purpose of this post, that that setting is enabled to allow external sharing.

Note that enabling external sharing at the global level does not enable it globally for all SharePoint sites; sites must be individually modified to allow it.

Compliance Admin

The Compliance admin portal can be accessed by the GAs and also the Compliance Admins (and some other roles). It is where retention labels and policies are created (in line with the corporate file plan/BCS) and published, and disposition reviews are undertaken, so records managers need access.

Other options in this section that relate to the management of records include the audit logs, content search and eDiscovery.

Retention policies

Retention policies may be applied to all the key workloads in Office 365 where records are stored:

  • Exchange Online
  • SharePoint Online
  • OneDrive for Business
  • MS Teams
  • Office 365 Groups

Retention labels published as retention policies are visible to and can be applied by end-users. Generally these are more likely to be applied at the document library level rather than to individual records, or in mailboxes or OneDrive for Business.

Retention policies that are not based on labels may be applied to all, or parts of, the four workloads listed above. For example, they may be applied to all, or a sub-set of Exchange mailboxes or OneDrive for Business accounts, or SharePoint sites. Retention policies may also be applied to individual or team chats in MS Teams.

Organisations seeking to use retention policies in Office 365 should understand how these work, have a plan for their implementation, and keep track of what has been applied where.

  • Retention policies for all mailboxes or all ODfB accounts may replace previous on-premise backup options for those workloads. It is unlikely that end-users will (or will want to) apply retention labels published as policies to individual emails or folders in mailboxes or OneDrive.
  • SharePoint sites are likely to have either or a combination of explicit and implicit/invisible retention policies. Implicit, single period retention policies may be more suitable for entire smaller, short-lived SharePoint sites. Explicit retention policies may be more suitable for the diverse range of content on more complex and long-lasting sites. Some sites may be created and populated around the need to keep a particular type of record for a long period of time – for example, employee records.

Audit logs

The Office 365 audit logs are found in the Compliance admin portal. For an E3 licence, the content in the logs is stored for 90 days.

As audit logs are an important element in keeping records, organisations may need to consider ways to retain this content for a longer period.

Note – SharePoint document libraries record the name of anyone who edited a document (and also previous versions), but they don’t record the name of anyone who simply viewed it. SharePoint lists also include audit trails, making it possible to track changes in individual rows of a list.

Content searches and eDiscovery

The Compliance admin portal provides two similar options to search for content across Office 365. Both the Content Search and eDiscovery options provide the ability to establish a ‘case’ that can be run more than once.

The eDiscovery option provides the added ability to put content on Legal Hold. Advanced eDiscovery is available with a higher licence.

Next

Click on the links below to read the next two posts:

  • SharePoint Online Admin centre configuration.
  • SharePoint site collection provisioning and configuration to manage records.
Posted in Compliance, Electronic records, Exchange Online, Information Management, Microsoft Teams, Office 365 Groups, Products and applications, Records management, Retention and disposal, SharePoint Online

Understanding and applying retention policies to content in MS Teams

This post highlights the need to understand how retention works in MS Teams, why it may be related to how long you keep emails (including for backup purposes), and why you need to consider all the elements that make up an Office 365 Group when considering how – and how long – to retain content in MS Teams.

Overview of retention in MS Teams

If you are unfamiliar with how retention works with MS Teams, these two related sites provide very useful detail.

overview_of_security_and_compliance_in_microsoft_teams_image1
Image from the first link above – Security Compliance Overview

The quote below from the second link is relevant to this post:

‘Teams chats are stored in a hidden SubstrateHolds folder in the mailbox of each user in the chat, and Teams channel messages are stored in a hidden SubstratesHolds folder in the group mailbox for a team. Teams uses an Azure-powered chat service that also stores this data, and by default this service stores the data forever. With a Teams retention policy, when you delete data, the data is permanently deleted from both the Exchange mailboxes and the underlying chat service.’

and

‘Teams chats and channel messages aren’t affected by retention policies applied to user or group mailboxes in the Exchange email or Office 365 groups locations. Even though Teams chats and channel messages are stored in Exchange, they’re only affected by retention policies applied to the Teams locations.’

In summary:

  • One-to-one chat in MS Teams is stored in a hidden folder of the mailbox of each user in the chat. Documents shared in those chats are stored in the OneDrive for Business of the person who shared it.
  • Group chat in Team channels is stored in a hidden folder of the mailbox of the associated Office 365 Group – and also in an Azure chat service. Documents are stored in the Office 365 Group’s SharePoint site (other SharePoint site libraries may also be linked in a channel).

Another quote from the same post:

‘In many cases, organizations consider private chat data as more of a liability than channel messages, which are typically more project-related conversations.’

Teams content is kept in mailboxes, retention may be similar

Typically, in the on-premise past, organisations will have backed up their Exchange mailboxes (and possibly also enabled journaling, to capture emails), for disaster recovery, ‘archiving’ and investigations. Unless a decision is made to invest in cloud back-ups, Office 365 retention policies may also be applied to Exchange mailboxes, effectively replacing the need to back them up. Retention policies applied to Exchange mailboxes don’t affect the teams chat folder.

Organisations should probably apply the same retention period to both emails and Teams chats as they do to email mailbox backups now. That is, if mailboxes are typically kept for 7 – 10 years after the person leaves the organisation, then keep the Teams chats for the same period.

Note that, even if a poster deletes an item (if that option is enabled), it will still be retained if there is a retention policy.

Suggestions for retention in MS Teams

As there can be different retention requirements, depending on the subject matter, here are some suggestions for retention:

  • One-to-one chat is like email, you will never know everything that is being said or sent. So a single retention policy that mirrors email would be appropriate.
  • Teams chat is more likely to be about the subject of the Team, which is based on an Office 365 Group, its own mailbox, and has a SharePoint site. In this case, you could consider a retention policy applied to all Office 365 Groups or specific Groups – for example ‘Project Groups’, then ensure that the retention policy or policies cover all aspects of the Office 365 Group (mailbox, team chat, SharePoint).
  • If all the records relating to a particular subject matter (including email, chat and documents) must be retained for 25 years, then you need to understand all the options.

It underscores the need to plan carefully for retention management for all the key workloads in Office 365.