Posted in Classification, Compliance, Exchange Online, Information Management, Microsoft Teams, Office 365, Office 365 Groups, Products and applications, Records management, Retention and disposal, SharePoint Online, Training and education

Planning for records retention in Office 365

Office 365 is sometimes referred to as an ‘ecosystem’. In theory this means that records could be stored anywhere across that ecosystem.

Unlike the ‘old’ on-premises world of standalone servers for each Microsoft application (Exchange, SharePoint, Skype), where specific retention policies could apply (including the Exchange Messaging Records Management (MRM) policy), the various elements that make up Office 365 are interconnected.

The most obvious example of this interconnectivity is Microsoft Teams which stores chat content in Exchange and provides access to content stored in both SharePoint (primarily the SharePoint site of the linked Office 365 Group) and OneDrive, and has links to other elements such as Planner.

Records continue to be created and kept in the various applications but retention policies are set centrally and can apply to any or all of the content across the ecosystem.

Managing records in Office 365, and applying retention rules to those records, requires an understanding of at least the key parts of the ecosystem – Exchange, Teams, SharePoint and OneDrive – and how they interrelate, and from there establishing a plan for the implementation of retention.

What types of records are created in Office 365?

Records are defined as ‘evidence of business activity’ and are often associated with some form of metadata.

Evidence of business activity is an overarching term that can include:

  • Emails
  • Calendars
  • Documents and notebooks (in the sense of text on a page)
  • Plans, including both project plans and architectural plans and diagrams
  • Images/photographs and video
  • Chat and/or messages
  • Conversations (audio and/or video based)
  • Social media posts

All digital records contain some form of metadata, usually displayed as ‘Properties’.

Where are the records stored in Office 365?

Most records created by organisations using Office 365 are likely to be created or stored in the following parts of the ecosystem:

  • Exchange/Outlook – for emails and calendars.
  • SharePoint and OneDrive – for documents and notebooks (in the sense of text on a page), plans, images/photographs and video.
  • Stream – for audio and video recordings.
  • MS Teams – for chat and/or messages, conversations (audio and/or video based). Note that 1:1 chats are stored in a hidden folder of the Exchange mailbox of the end-user/s participating in the chat, while Teams channel chat is stored in a hidden folder of the linked Office 365 Group mailbox.
  • Yammer – for (internal) social media posts.
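The list above can be read as a simple lookup from record type to likely storage location. The following is a minimal sketch of that lookup in Python; the dictionary and function names are assumptions made for illustration and are not part of any Office 365 API.

```python
from typing import Dict, List

# Illustrative lookup built from the list above. The keys and the function
# name are assumptions for this sketch, not Office 365 identifiers.
RECORD_LOCATIONS: Dict[str, List[str]] = {
    "email": ["Exchange/Outlook"],
    "calendar": ["Exchange/Outlook"],
    "document": ["SharePoint", "OneDrive"],
    "notebook": ["SharePoint", "OneDrive"],
    "plan": ["SharePoint", "OneDrive"],
    "image": ["SharePoint", "OneDrive"],
    "video": ["SharePoint", "OneDrive", "Stream"],
    "audio recording": ["Stream"],
    "1:1 chat": ["Exchange (hidden folder in each participant's mailbox)"],
    "channel chat": ["Exchange (hidden folder in the Office 365 Group mailbox)"],
    "social media post": ["Yammer"],
}


def locations_for(record_type: str) -> List[str]:
    """Return the likely storage locations for a record type, or [] if unknown."""
    return RECORD_LOCATIONS.get(record_type.lower(), [])
```

A lookup like this can be a useful starting point for a retention plan because it shows at a glance which locations need a policy for a given type of record.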

It is also possible to import and archive certain external content such as Twitter tweets and Facebook content in Office 365.

The diagram below provides an overview of the main Office 365 applications and locations where records are created or stored. Under SharePoint, the term ‘Sites’ refers to all types of SharePoint sites, including those associated with Office 365 Groups. Libraries are shown separately because of the potential to apply a retention policy to a library – see below.

O365WheretheRecordsare

Note also that this diagram does not include network file shares (NFS) as the assumption is made that (a) NFS content will be migrated to SharePoint and the NFS made read only, and (b) all new content that would previously have been stored on the NFS is instead saved either to OneDrive for Business (for ‘personal’ or working documents) or SharePoint only.

Creating a plan to manage records retention across Office 365

In previous posts I have recommended that organisations implementing Office 365 have the following:

  • A basic architecture design model for SharePoint sites, including SharePoint sites linked with Office 365 Groups (and Teams in MS Teams).
  • A plan for creating and applying retention policies across the ecosystem.

Because SharePoint is the most likely location for records to be stored (aside from Exchange mailboxes and OneDrive accounts), there should be at least one retention policy for every SharePoint site (or group of sites), as well as policies for specific document libraries if the retention for the content in those libraries may be different from the retention on the overall site.

For example, a ‘Management’ site may contain a range of general content as well as specific content that needs to be retained for longer. 

  • The site can be covered by a single implicit retention policy of (say) 7 years. This policy will delete content in the background, based on date created or date modified.
  • The document library where specific types of records with longer or different retention requirements are stored may have one or more explicit label-based policies applied to those libraries. This content will be retained while the rest of the site content is deleted via the first policy.
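The interaction between the two policies can be sketched as a small date calculation. This is a simplified illustration only: it assumes the rule described in this post (a longer label-based period on a library takes precedence over the shorter site-wide period), and the function names are hypothetical. Actual Office 365 retention behaviour involves more rules than this.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Add whole years to a date, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February does not exist in the target year.
        return d.replace(year=d.year + years, day=28)


def earliest_disposal(last_modified: date,
                      site_years: int,
                      label_years: Optional[int] = None) -> date:
    """Earliest date on which content could be deleted.

    Assumption for this sketch: when a label applies, the longer of the
    two retention periods wins.
    """
    years = site_years if label_years is None else max(site_years, label_years)
    return add_years(last_modified, years)
```

For example, `earliest_disposal(date(2020, 4, 14), 7)` gives 14 April 2027 for general site content, while the same document in a library labelled for 15 years would not be eligible for disposal until 14 April 2035.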

Structure of a retention plan for records in Office 365

A basic plan for creating and applying retention policies might look something like the following:

  • User mailboxes – one ‘general’ (implicit) retention policy for all mailboxes (say, 7 years after creation) and another more specific retention policy for specific mailboxes that require longer retention.
  • SharePoint sites – multiple (implicit) retention policies targeting one or more sites.
  • SharePoint libraries – multiple (explicit) label-based retention policies that are applied manually. These policies will usually have a retention period that is longer than any implicit retention policy, as any implicit site policy will prevent the deletion of content before it reaches the end of that retention period.
  • Office 365 Groups (includes the associated mailbox and SharePoint site) – one ‘general’ (implicit) retention policy. See also below.
  • Teams channel chat – one ‘general’ (implicit) retention policy. Note that this content is stored in a special folder of the Office 365 Group mailbox.
  • 1:1 chat – one ‘general’ (implicit) retention policy. This content is stored in a special folder of the participant mailboxes.
  • OneDrive documents – one ‘general’ (implicit) retention policy for all ODfB accounts, plus the configuration of retention after the account is inactive.
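A plan like this can also be kept as structured data so that gaps are easy to spot. The sketch below records the locations and policy types from the list above; the class, field and function names are assumptions made for illustration, and no retention periods are included because they will differ for every organisation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PlannedPolicy:
    location: str   # where the policy is applied
    kind: str       # "implicit" or "explicit" (label-based)
    note: str = ""


# Entries taken from the list above; wording of the notes is illustrative.
RETENTION_PLAN: List[PlannedPolicy] = [
    PlannedPolicy("User mailboxes", "implicit",
                  "one general policy, plus a more specific policy where longer retention is required"),
    PlannedPolicy("SharePoint sites", "implicit", "multiple policies targeting one or more sites"),
    PlannedPolicy("SharePoint libraries", "explicit", "label-based, applied manually"),
    PlannedPolicy("Office 365 Groups", "implicit", "covers the Group mailbox and SharePoint site"),
    PlannedPolicy("Teams channel chat", "implicit", "stored in the Office 365 Group mailbox"),
    PlannedPolicy("1:1 chat", "implicit", "stored in the participant mailboxes"),
    PlannedPolicy("OneDrive documents", "implicit", "plus retention for inactive accounts"),
]


def locations_without_policy(plan: List[PlannedPolicy],
                             required_locations: List[str]) -> List[str]:
    """Return the required locations that have no planned policy."""
    covered = {policy.location for policy in plan}
    return [loc for loc in required_locations if loc not in covered]
```

Calling `locations_without_policy(RETENTION_PLAN, ["SharePoint sites", "Yammer"])` returns `["Yammer"]`, which flags a location that the plan does not yet cover.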

At a high level, the retention policy plan might look something like the following – ‘implicit’ policies are shown in yellow, SharePoint document libraries may be subject to ‘explicit’, label-based policies. The ‘+7 years’ for OneDrive relates to inactive accounts, a setting set in the OneDrive Admin portal.

O365WheretheRecordsare2

Regarding Microsoft Office 365 Groups, Microsoft notes the following on this page about managing retention in Office 365:

To retain content for a Microsoft 365 group, you need to use the Microsoft 365 groups location. Even though an Microsoft 365 group has an Exchange mailbox, a retention policy that includes the entire Exchange location won’t include content in Microsoft 365 group mailboxes. A retention policy applied to an Microsoft 365 group includes both the group mailbox and site. A retention policy applied to an Microsoft 365 group protects the resources created by an Microsoft 365 group, which would include Microsoft Teams.

The actual plan should contain more detail and be included as part of other recordkeeping documentation (perhaps stored on a ‘Records Management’ SharePoint site). The plan should include details about (a) where the policies have been applied and (b) the expected outcomes or actions for the policies, including automatic deletion or disposition review (for document libraries).

Keep in mind that, unless the organisation decides to acquire this option, there is no default backup for content in Office 365 – once a record has been deleted, it is gone forever and there may be no record of this beyond 90 days.

Posted in Digital preservation, Digitisation, Electronic records, Information Management, Records management, Retention and disposal

Why is ‘going digital’ a problem for records management?

Few organisations create original records on paper any more. Almost all the paper records that are created these days are the printed versions of born-digital records.

In a somewhat ironic twist, many organisations seek to digitise (or ‘scan’) the printed versions of born-digital records.

And yet, there apparently continues to be an ongoing problem in many organisations (particularly government organisations) about ‘going digital’.

Why is ‘going digital’ so hard for records management?

On one hand, allowing people to even print and store the printed version of digital records on paper files helps to perpetuate the problem of going digital.

On the other hand, many older style recordkeeping systems require content to be copied from one system where it has been created, captured or stored, to another. The requirement to copy a record (if it is not automated) requires a conscious, voluntary (and selective) action on the part of the end-user. It does not guarantee that the copied record is the final version of a document or, in the case of email, that there are no additional replies in a thread. And the original remains in the originating system.

Additionally, some types of records cannot easily be copied to a centralised recordkeeping system. Examples include Twitter tweets, Facebook content, instant messaging texts, chat, video, and conferencing text, audio and video. And even when they can, there is no certainty that the version saved is the most recent.

The elephant in the room – digital recordkeeping has not evolved

In my opinion, the primary reason why digital records continue to be printed, and why organisations find it hard to ‘go digital’, is because many recordkeeping systems and practices have not evolved with the digital world.

Instead, they remain based on the idea that all records should be stored, with added metadata, in a central recordkeeping system. Anything that does not fit this model, and any system that doesn’t meet all the standards for keeping records in this way, is regarded as ‘non-compliant’.

Vendors of these traditional, centralised recordkeeping systems highlight how these systems meet recordkeeping compliance requirements, which in turn further cements these systems as being the only way compliance requirements can be met. These systems increasingly are unable to capture the full range of digital content and consequently, ‘going digital’ becomes a problem – because the system isn’t working.

It’s a vicious cycle.

How paper recordkeeping turned into digital recordkeeping

Until the early 1990s, paper files (and boxes) were really the only way we had to store records. During the late 1980s and 1990s, many organisations acquired databases to keep track of these paper files and the boxes in which they were stored.

At the beginning of the digital world, in the early to mid 1990s, in the absence of any other method, digital records were usually printed and placed on the same paper files.

By the end of the 1990s, the databases that were originally used to keep track of paper files were adapted to manage digital records in digital ‘files’ (folders, containers).

But the opportunity was missed to evolve the paper recordkeeping paradigm into something more suitable for digital records.

Additionally, none of the leading software manufacturers, Microsoft in particular, did anything to incorporate recordkeeping in their various systems and applications.

Recordkeeping systems used to manage digital content retained the same ‘filing’ concept where end-users, after receiving suitable training, had to (voluntarily) copy the digital record (including emails) to the digital ‘file’, leaving the original in place.

The idea of a central recordkeeping system, to where all records are to be copied, makes almost no sense in the digital world. For almost twenty years, it has overlooked or even ignored the ever-increasing volume and types of digital records and persisted with a centralised model.

In my opinion, the problem of ‘going digital’ for many organisations has been directly related to the fact that recordkeeping systems have not evolved from the original centralised model.

It doesn’t make sense in a digital world.

Fixing the problem

In my opinion, one of the key problems is not so much that many older style recordkeeping systems are based around a paper recordkeeping paradigm (because this paradigm can still be valid, especially for high value or archival records), but that organisations think they should manage all records according to the same paradigm, or otherwise they will not somehow ‘comply’ (especially with government recordkeeping requirements).

In reality:

  • Records may have different ‘value’. There are a lot of low-quality, low-value records.
  • Some records can go from being innocuous (‘OK’ in an email reply) to being critical very quickly (when the ‘OK’ becomes evidence of fraud).
  • Not all records need to have complex recordkeeping metadata. In fact, most digital records already have extensive metadata payloads.
  • Emails will continue to remain separate from other records.
  • Only a small percentage of records need to be kept for a long time.
  • Digital records can be categorised or classified in multiple ways over time. Pre-defined classification applied to a digital record may not accurately capture the full context (or potential context) of a record and may even impede it.
  • Digital records may remain active even after they are captured, including as new versions, new replies in a thread, modified images and so on.
  • When managed well, digital records can be managed and accessed in place, in the system in which they were created or captured.
  • Digital records that need special attention, including records that require long-term storage, can still be managed in ‘files’ or ‘containers’, but this needs to be implemented in a way that is simple for end-users to understand.

Organisations should, in my opinion:

  • Embrace digital recordkeeping.
  • Abandon the idea that all records must be copied to a central recordkeeping system.
  • Accept that any system can contain records – including line of business systems that also capture documents as records – and focus on how to manage the records in those systems.
  • Use the recordkeeping capability of the systems where records are created or captured.
  • Focus most effort on records of high value, or records that need to be kept for a long time including for archival purposes.
  • Let end-users create and work with born-digital records where they are created or captured, without the additional overhead of having to copy these to another system.
  • Implement high-level architecture models and monitor where information is being stored.
  • Use a combination of global retention policies and auto-classification to protect the integrity, reliability and authenticity of records.
  • Use search and discovery to find content, wherever it is stored, whenever it is required.
Posted in Governance, Information Management, Microsoft Teams, Office 365, Office 365 Groups, Products and applications, SharePoint Online

What happens when you create a Team in MS Teams

On 27 March 2020 I asked, via Twitter, whether organisations that rolled out MS Teams will wonder in the future who created all the random (and randomly-named) SharePoint sites.

20200414_122632

The reason for this question was because many organisations, scrambling to establish ways for staff to work from home, decided to make use of MS Teams in their (often newly implemented) Office 365 suite of apps.

I have seen multiple organisations since late 2019 ask ‘who created all those SharePoint sites?’ when they reviewed the list. The current COVID-19 work-from-home situation will only make this situation ‘worse’ and, without effective oversight or controls, result in the creation of multiple uncontrolled SharePoint sites.

Unlike other products like Zoom, WhatsApp, FaceTime and Skype, however, MS Teams is not a standalone product, but a core element in the Microsoft Office 365 ecosystem.

The key point is this – every Team in MS Teams has a linked SharePoint site (and an Exchange mailbox, where all the chat content is stored). You can’t disable these options.

What happens if you create a Team in MS Teams?

The good thing about the one-to-one chat element of MS Teams is that it’s relatively intuitive and easy to use, including on the mobile app. You only need to tell users it’s like Skype or WhatsApp, but for internal users only, and most pick it up quickly.

The Teams part of MS Teams is not quite as intuitive, but early adopters generally understand the basic concepts – that a Team has members, and you can have multiple chat channels for each Team.

Once end-users understand how a Team works (and this can take some time because one-to-one chat can include multiple people), they might notice this option at the bottom left of the app:

JoinCreateTeam

Creating a new team sounds like a great idea, so end-users may try:

JoinCreateTeam2

My guess is that end-users are more likely to want to ‘build a team from scratch’ as shown below, because the second option doesn’t really make sense.

JoinCreateTeam3

There is a good chance they will want the Team to be ‘Private’, although may not fully understand what this means. A Public Team sounds like a Yammer Group (or Community).

JoinCreateTeam4

So far, so good, the end-user can give the Team any name they like:

JoinCreateTeam5

At the bottom of the naming screen is the option to ‘Create’. The end-user is then invited to add members to their new Team. This seems a fairly obvious step, and they can add whoever they want. New members are by default ‘Members’ but they can be changed to ‘Owners’ if necessary. There is no control over this process.

JoinCreateTeam6

The new team now appears on the left-hand menu of MS Teams:

JoinCreateTeam7

The new team opens at the default ‘General’ channel.

On the main part of the Team, the following options are offered:

  • Along the top, ‘Posts’, ‘Files’, ‘Wiki’ and a + to add more applications. (Hint – the ‘Files’ option points to the SharePoint site that has been created behind the scenes).
  • Across the middle, three options to ‘Add more people’, ‘Create more channels’ and ‘Open the FAQ’
  • At the bottom, the option to ‘Start a new conversation’ with various other options including the ‘Meet now’ video option.

The end-user can now get on with chatting, sharing files, and adding apps to do other things.

But what else has happened?

As noted above, the ‘Files’ tab in the General channel gives a clue to the existence of the connected SharePoint site. End-users may not care terribly much about this, for them it provides the option to create, upload, share and collaborate on files.

A new Office 365 Group is created

But before we get to the SharePoint site, it’s important to understand the one-to-one relationship between a Team in MS Teams and an Office 365 Group. If you do not know what an Office 365 Group is, please read this Microsoft guidance on Office 365 Groups.

In very simple terms:

  • Every new Team in MS Teams creates a new Office 365 Group.
  • The Owner of the Office 365 Group is the Owner of the team; the members of the Group are the Members of the team, as added by the person who created the Team.

The new Office 365 Group appears in the list of Groups in the Office 365 Admin portal, as shown below. Access to this part of the Admin portal is normally restricted to Global Admins (who would normally be responsible for creating other types of AD Groups, such as Security Groups and Distribution Lists).

A new Exchange mailbox has been created

Note that the process has also created an Exchange mailbox with a Group email address. The new Exchange mailbox will now appear in the Outlook client of everyone in the Team – something they are unlikely to notice.

JoinCreateTeam8

As noted above, all the chat messages in the Team are stored in a hidden folder in the Exchange mailbox for the Team.

A new SharePoint site has been created

If we go across to the SharePoint Admin portal, which is normally restricted to Global Admins and SharePoint Admins, we can see that a new SharePoint site has been created, and is owned by the ‘Group owners’.

JoinCreateTeam9

The SharePoint Admin has had no involvement in the creation, naming, or structure of this new site. And, just to add another factor, the SharePoint Admin cannot access the site – see below.

The Team owner may not realise it, but they now have a SharePoint site. The new site’s ‘Documents’ library appears in the ‘Files’ tab as shown below.

JoinCreateTeam11

And, just to add a confusing element, the site includes the invitation (at the bottom left) to create a new Team!

JoinCreateTeam10

As noted above the SharePoint Admin can ‘see’ that this site exists in the list of sites but cannot actually access it. The Global Admin, on the other hand, can access it.

JoinCreateTeam12

So the person responsible for managing SharePoint across the organisation cannot access the SharePoint site, which is not a good thing from an information governance point of view.

The reason they cannot access the site is because they were not added to the Site Collection Admin Group when the site was created. And, just to make it a bit more confusing, the ‘Users and Permissions’ section of Site Settings, where the ‘Site collection administrators’ section is found (see screenshot below), does not appear in Office 365 Group-based SharePoint sites.

SPOSiteSettings

So, how does the SharePoint Admin get access to this site to configure and manage it? There are two ways:

  • The Global Admin can go to /_layouts/15/mngsiteadmin.aspx (after the site name URL) and add them (or a Security Group with them in it) there.
  • The SharePoint Admin can click on the site details in the SharePoint admin portal and add him/herself as an Owner. This puts them in the Site Collection Admin section along with the Group Owner.
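For the first option, the address is simply the site URL followed by the path given above. As a minimal sketch (the function name and the example site URL are hypothetical):

```python
def site_admin_url(site_url: str) -> str:
    """Build the Site Collection Administrators page URL for a SharePoint site."""
    # The path comes from the first option above; any trailing slash on the
    # site URL is removed first so that the result has no double slash.
    return site_url.rstrip("/") + "/_layouts/15/mngsiteadmin.aspx"


# Hypothetical example site:
# site_admin_url("https://contoso.sharepoint.com/sites/ProjectX")
```

The Global Admin can then open the resulting address in a browser and add the SharePoint Admin (or a Security Group) there.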

Summary

This post began with a simple question – if organisations allow end-users to create Teams to work from home, how will they manage all the SharePoint sites that are created through the process described above?

There is no one answer to this question but it’s worth understanding exactly what happens – and what else is created (including Planner) – when a Team is created. Organisations seem to go one of two ways:

  • Let end users create Teams and deal with the consequences later, including attempts at auto-classification and retention policy application across the various elements of the new Office 365 Group – mailbox, SharePoint site, Team chat. This is the Microsoft default and the preference of many organisations that don’t have compliance issues or can accept the risks of uncontrolled information stores.
  • Control the creation of Teams, but make any controlled process as easy as possible for end-users to keep them working quickly, and manage the content in mailboxes, SharePoint and Teams proactively. While not the preferred option, it will help with the management of corporate information down the track.

 

Posted in Governance, Information Management, Microsoft Teams, Products and applications, Records management, SharePoint Online

Why end-users cannot create a Team in MS Teams – a common question

In the last few months, as more and more organisations implement Office 365, I have been asked one of two questions relating to teams:

  • From IT – How do we stop end users creating a new Team in MS Teams?
  • From end users – Why can’t I create a new Team?

This post is for end-users, to help understand why the ability to create a new Team in MS Teams has been disabled.

A Team is (much) more than it appears

The simple reason is because of the flow-on effect (see below) and the need for IT to maintain control over the environment, especially the creation of SharePoint sites.

The diagram below, an extract of a larger diagram created by Matt Wade (credit below image), visually shows what happens when a new Team is created (and, for that matter, various other elements).

O365GroupsTeamsetc
Source: @thatmattwade / https://www.jumpto365.com/infographics/everyday-guide-to-office-365-groups

A new Team creates a range of other things (described below) including a SharePoint site. The SharePoint site that is created is visible as the ‘Files’ tab in the Team channel, as you can see below:
image.png

A Team is directly linked with an Office 365 Group

The thing that links all these things together is what are called ‘Office 365 Groups’ (O365 Groups).

O365 Groups only exist in Office 365 and are like a cross between: (a) an Active Directory (AD) Security Group (that controls/grants access to IT resources and systems) and (b) usually small Distribution Lists (a list of people you can email) – but with a lot more functionality.

What do you get with every Office 365 Group?

As can be seen in the diagram above, every O365 Group creates a number of other Office 365 elements. Each Group:

  • Has at least one owner. This is the person who creates the Group, and becomes the linked SharePoint site owner and the owner of the Team. If there is only one owner and that owner leaves, there is no-one to manage the group, SharePoint site and Team members. This is one good reason why this should be centralised in IT (who usually create all other AD group types).
  • Has members. Members usually belong to a logical and generally smaller (<30 people) business unit or work team, similar to membership of an AD Security Group. Membership of the Group (and Team and SharePoint site) is managed by the Owner.
  • Has a dedicated SharePoint site. The URL of the site is the same as the Group. The members of the Group have default add/edit rights to the SharePoint site. Others, and AD Security Groups, can also be added to the SharePoint site directly (for example, as visitors) but that only gives them access to the site, NOT the Team or the mailbox.
  • Has an email address/mailbox. The mailbox for the Group appears in the Outlook of every member of the group. You can send and receive mails to/from that Group (similar to a Distribution List).
  • Has a Planner and a OneNote notebook.
  • Can be linked to a Team in MS Teams when the Group is created.
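Two of the points above, the risk of a single owner and the suggested size of fewer than 30 members, lend themselves to a simple check. The sketch below is illustrative only: the class and function names are assumptions, it does not call any Office 365 API, and the group details would need to be exported from the Admin portal by some other means.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Office365Group:
    name: str
    owners: List[str] = field(default_factory=list)
    members: List[str] = field(default_factory=list)


def governance_warnings(group: Office365Group, max_members: int = 30) -> List[str]:
    """Return warnings for a group based on the points in the list above."""
    warnings = []
    if len(group.owners) == 0:
        warnings.append(f"{group.name}: no owner")
    elif len(group.owners) == 1:
        warnings.append(f"{group.name}: single owner; nobody can manage the group if they leave")
    # The post suggests groups work best with fewer than 30 members.
    if len(group.members) >= max_members:
        warnings.append(f"{group.name}: {len(group.members)} members (suggested: fewer than {max_members})")
    return warnings
```

A group with two owners and a handful of members returns an empty list; a group with one owner returns a single warning.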

What happens if you allow end-users to create Teams?

Conversely, if you create a Team in MS Teams, it creates everything in the previous dot points but with no controls for:

  • Office 365 Group/Team naming. End-users can create a Team with whatever name they want, which then assigns the same name to the Office 365 Group and SharePoint site.
  • Group membership. The person who creates the Team becomes the Owner of the O365 Group and is responsible for managing the Group/Team membership.
  • SharePoint site structure including document library/ies and folders. If the Team uses only the default ‘Documents’ library, it is very likely to create multiple folders, including via File Explorer. The likely outcome is the mess that is often found on network file shares.
  • Everything else that comes with every Team, including Planner and OneNote.

Some organisations have allowed their employees to create new Teams in MS Teams and then had to retrospectively clean up the mess created by random SharePoint sites, poor Team names, confusion between O365 Group members and AD Security Group membership and quite a bit more.

Should we even use Teams?

Yes. Read this post from CMSWire titled ‘The State of Play with MS Teams‘ to see why it is a very useful application to implement. Three points from that article:

  • Chat is the most used function in Teams, making up 70% to 95% of all messages. Chat has 13 times the number of messages than Teams channels. Chat is being used to keep local teams connected in real time.
  • Staff, on average, are members of three teams but are mostly active in one. While most employees have a “favored” team, Teams operating as forums or communities were identified to help employees engage beyond their local team.
  • The most active team has 25 members, all active and connected to each other, interacting at the rate of 365 channel interactions/per day or 14 interactions/per member/per day. This does not include chat.

Note that the most active team has 25 members. This underlines the point made earlier that Office 365 Groups work best when there are fewer than 30 members.

Where is the data stored?

Finally, where is the data stored?

  • One-to-one chats:
    • Chats are stored in a hidden folder in the participants’ mailboxes.
    • Documents are stored in the OneDrive of participants.
  • Chats in the Team channels
    • Chats are stored in a hidden folder in the Office 365 Group’s mailbox.
    • Documents stored in these channels are stored in the O365 Group’s linked SharePoint site.

Should we use Teams?

Yes, definitely, but understand what is happening ‘under the hood’ if you allow end-users to create new Teams.

Organisations that are new to Office 365 should consider disabling the ability for end-users to create Teams by disabling the ability for end-users to create Office 365 Groups.

Smaller organisations can leave the option available but ensure that there is a guide for the creation of new Teams, including naming conventions and Group/Team membership management.
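A naming convention is easier to follow if it can be checked. The sketch below assumes an entirely hypothetical convention (a business unit code of two to five capital letters, a hyphen, then a descriptive name of 3 to 41 letters, digits and spaces); each organisation would substitute its own rule.

```python
import re

# Hypothetical convention for this sketch only, e.g. "FIN-Budget 2020":
# 2-5 capital letters, a hyphen, then 3-41 letters, digits or spaces,
# starting with a letter or digit.
TEAM_NAME_PATTERN = re.compile(r"[A-Z]{2,5}-[A-Za-z0-9][A-Za-z0-9 ]{2,40}")


def is_valid_team_name(name: str) -> bool:
    """Return True if the whole name matches the (hypothetical) convention."""
    return TEAM_NAME_PATTERN.fullmatch(name) is not None
```

Under this assumed rule, `is_valid_team_name("FIN-Budget 2020")` is accepted and `is_valid_team_name("my cool team")` is rejected.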

It will generally be better to centralise the creation of MS Teams in IT as they will normally be responsible for the creation of Active Directory Security Groups and should therefore be responsible for the creation of the more powerful Office 365 Groups.

Posted in Compliance, Conservation and preservation, Electronic records, Governance, Information Management, Information Security, Legal, Records management, Retention and disposal, Security

Destroying digital records – are they really destroyed?

Most people should be aware that pressing the ‘delete’ option for a file stored on a computer doesn’t actually delete the item, it only makes the file ‘invisible’. The actual file is still accessible on the disk and can be retrieved relatively easily or using forensic tools until the space it was stored on is overwritten.

Traditional legacy electronic document and records management (EDRM) systems have two components:

  • A database (e.g., SQL, Oracle) where the metadata about the records are stored
  • A linked file share where the actual objects are stored, most of which are copies of emails or network file share files that remain in their original location.

In most on-premise systems, email mailboxes, network file shares, and the EDRMS database and linked file share are likely to be backed up.

When a digital record comes to the end of its retention and is subject to a ‘destruction’ process, how do you know if the record has actually been destroyed? And even if it is, how can you be sure that the original isn’t still stored in a mailbox, network file share, or a back up?

This post examines what actually happens when a file is ‘deleted’ from a Windows NT File System (NTFS), and questions whether digital records stored in an EDRMS are really destroyed at the end of the retention period.

The Windows NTFS Master File Table (MFT)

Details of every file stored on a computer drive will be found in the NTFS Master File Table (MFT).

In some ways, the MFT operates like a traditional electronic document management system – it is a kind of database that records metadata about the attributes of the digital objects stored on the drive. These attributes include the following:

attriblist

As noted in the diagram above, the details stored by the MFT include the $File_Name and $Data attributes.

  • The $File_Name attributes include the actual name of the file as well as when it was created and modified, and its size.  This is the information that can be seen via File Explorer and is often copied to the EDRMS metadata.
  • The $Data attribute contains details of where the actual data in the file is stored on the disk (in 0s and 1s) or the complete data if the file is small enough to fit in the MFT record.

If the MFT record has many attributes or the file data is stored in multiple fragments on a disk (for example as a file is being edited), additional MFT ‘extension’ records may be created.

When a file is deleted, the MFT records the deletion.

  • If the file is simply deleted, the record will remain on the disk and can be recovered from the Recycle Bin.
  • If the file is deleted through SHIFT-DEL or by emptying the Recycle Bin, the MFT record is changed to the ‘Deleted’ state and the cluster bitmap is updated to mark the file’s clusters (where the data is stored) as free for reuse. The MFT record remains until it is re-used, and the data remains until the clusters are allocated in whole or part to another file.

So, in summary, ‘deleting’ a file does not actually delete it. It may either:

  • Store the file in the Recycle Bin, making it relatively easy to recover, or
  • Change the MFT record to show the file as being deleted but leave the file data on the disk until it is overwritten.
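
The behaviour summarised above can be illustrated with a toy model. The sketch below is not the real NTFS on-disk format; the `ToyVolume` and `MftRecord` names are hypothetical and exist only to show that a permanent delete changes a flag and frees clusters without touching the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class MftRecord:
    """Toy stand-in for an MFT record (not the real on-disk layout)."""
    file_name: str
    clusters: list[int]
    in_use: bool = True


@dataclass
class ToyVolume:
    """Hypothetical volume: cluster contents, an allocation bitmap and an MFT."""
    cluster_data: dict[int, bytes] = field(default_factory=dict)
    allocated: set[int] = field(default_factory=set)
    mft: list[MftRecord] = field(default_factory=list)

    def write_file(self, name: str, chunks: list[bytes]) -> MftRecord:
        # Store each chunk in the lowest-numbered free cluster.
        clusters = []
        candidate = 0
        for chunk in chunks:
            while candidate in self.allocated:
                candidate += 1
            self.allocated.add(candidate)
            self.cluster_data[candidate] = chunk
            clusters.append(candidate)
        record = MftRecord(name, clusters)
        self.mft.append(record)
        return record

    def permanently_delete(self, record: MftRecord) -> None:
        # Like SHIFT-DEL: flag the record and free its clusters.
        # The cluster contents are deliberately left untouched.
        record.in_use = False
        self.allocated.difference_update(record.clusters)

    def recover(self, record: MftRecord) -> bytes:
        # In principle what a recovery tool does: follow the stale record.
        return b"".join(self.cluster_data[c] for c in record.clusters)


volume = ToyVolume()
memo = volume.write_file("memo.txt", [b"confidential ", b"memo"])
volume.permanently_delete(memo)
recovered_before = volume.recover(memo)  # the full content is still there

# A later file re-uses the first freed cluster and overwrites part of the data.
volume.write_file("new.txt", [b"XXXXXXXXXXXXX"])
recovered_after = volume.recover(memo)
```

In this model the ‘deleted’ file is fully recoverable until another file is written, after which only the part stored in the re-used cluster is lost.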

How does an EDRMS store and manage files?

The following summary relates to a well-known Electronic Document and Records Management System (EDRMS). Other systems may work differently but the point is that records managers should understand exactly how they work and what happens when electronic files are destroyed at the end of a retention period.

Most EDRM systems are made up of two parts:

  • A database (SQL, Oracle etc) to store the metadata about the record.
  • An attached file store that stores the actual digital objects.

When EDRM systems are used to register paper or physical records (files and boxes), only the database is used.

When digital records are uploaded to the EDRMS:

  • The metadata in the original file, including the file type, original file name, date created, date modified and author are ‘captured’ by the system and recorded in the new database record.
  • Additional metadata may be added, including a content or record ‘type’.
  • The record will usually be associated with a ‘container’ (e.g., ‘file’). This containment makes the record appear to be ‘contained’ within that container, whereas in fact it is simply a metadata record of an object stored elsewhere.
  • The original record filename is changed to random characters (to make it harder to find, in theory) and then stored on the attached (usually Windows NTFS) file store, often in a series of folders.
  • A link is made between the database record and the record object stored in the file store (the MFT record).
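
The capture steps above can be sketched in a few lines. This is a simplified illustration and not the design of any particular EDRMS: the `records` table, its column names, and the `open_register` and `capture` functions are assumptions made for the example, with SQLite standing in for the database and a temporary folder standing in for the file store.

```python
import secrets
import shutil
import sqlite3
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def open_register(db_path: str) -> sqlite3.Connection:
    """Create the (hypothetical) metadata database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS records (
               id INTEGER PRIMARY KEY,
               original_name TEXT, modified TEXT, size_bytes INTEGER,
               record_type TEXT, container TEXT, store_path TEXT)"""
    )
    return conn


def capture(conn: sqlite3.Connection, source: Path, store_root: Path,
            record_type: str, container: str) -> int:
    """Copy a file into the store under a random name and register it."""
    stat = source.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat()

    # Random file name, placed in a sub-folder of the file store.
    opaque = secrets.token_hex(16)
    target_dir = store_root / opaque[:2]
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / opaque
    shutil.copy2(source, target)  # a copy: the original stays where it was

    # The 'container' is only a metadata value; store_path is the link
    # between the database record and the object in the file store.
    cursor = conn.execute(
        "INSERT INTO records (original_name, modified, size_bytes,"
        " record_type, container, store_path) VALUES (?, ?, ?, ?, ?, ?)",
        (source.name, modified, stat.st_size, record_type, container, str(target)),
    )
    conn.commit()
    return cursor.lastrowid


workdir = Path(tempfile.mkdtemp())
source = workdir / "Contract v2.docx"
source.write_bytes(b"example content")

conn = open_register(":memory:")
record_id = capture(conn, source, workdir / "store", "Contract", "File 2020/123")
row = conn.execute(
    "SELECT original_name, container, store_path FROM records WHERE id = ?",
    (record_id,),
).fetchone()
```

Note that after capture there are two copies of the content: the renamed object in the file store and the untouched original.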

When the end-user opens the EDRMS, they can search for or navigate to containers/files and see what appears to be the digital objects ‘stored’ in that container/file. In reality, they are seeing a link to the object stored (randomly) in the file store.

What happens when an EDRMS record is destroyed?

If there is no requirement to extend their retention, or keep them on a legal hold, records may be destroyed at the conclusion of a retention period.

For physical records, this usually means destroying the physical objects so they cannot be recovered, a process that may include bulk shredding or pulping.

For digital records, however, there may be less certainty about the outcome of the destruction. While the EDRMS may flag the record as being ‘destroyed’, it is not always clear whether the destruction process has actually overwritten the digital records in a way that ensures their destruction to the same level as destroyed paper files.

Also:

  • If the original associated NTFS file share becomes full and a new one is used, the original is likely to be made read-only.
  • There is likely to be a backup of the EDRMS.
  • The original records uploaded to the EDRMS probably continue to exist on network file shares, in email, or on backup tapes.
  • Digital forensics can be used to recover ‘deleted’ files from the associated file share.

Consider this scenario:

  • An email containing evidence of something is saved to a container in an EDRMS.
  • The container of records is ‘destroyed’ after the retention period expires.
  • A legal case arises after the container is ‘destroyed’.
  • A subpoena is made for all records, including those specific records.
  • Has the record actually been destroyed, or could it still be recoverable, including from backups or the digital originals?

Is it really possible to destroy digital records, and does it matter?

Yes, records can be destroyed by overwriting the clusters where the record’s data is kept, and some EDRM systems may offer this option.
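
As a rough illustration of the difference between deleting and overwriting, the sketch below writes zeros over a file’s content before removing it. The `overwrite_in_place` function is hypothetical and is not how any specific EDRMS works. It also rests on an assumption that does not always hold: that the file system writes the new bytes to the same physical location. On solid-state drives and copy-on-write file systems this is not guaranteed, and the sketch does nothing about copies held elsewhere.

```python
import os
import tempfile
from pathlib import Path


def overwrite_in_place(path: Path, passes: int = 1) -> None:
    """Overwrite a file's existing content with zero bytes."""
    size = path.stat().st_size
    with open(path, "r+b") as handle:  # open for update without truncating
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1024 * 1024)
                handle.write(b"\x00" * chunk)
                remaining -= chunk
            # Ask the operating system to push the writes to the device.
            handle.flush()
            os.fsync(handle.fileno())


workdir = Path(tempfile.mkdtemp())
target = workdir / "a1b2c3"
target.write_bytes(b"evidence of business activity")

overwrite_in_place(target)
overwritten = target.read_bytes()  # now only zero bytes
target.unlink()                    # then remove the file entry itself
```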

But:

  • Do EDRM systems overwrite the clusters when a digital record is destroyed in line with your records retention and disposal authorities, or do they simply mark the record as deleted while it is still technically recoverable?
  • Could the record still exist in the network file shares or email, or in backups of these or the EDRMS?
  • Might it be possible to recover the record with digital forensics tools?
  • Does it matter?
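
One way to test the second question is to look for surviving copies by content rather than by name. The sketch below assumes that a SHA-256 hash of the record was kept (for example at the time of capture), which not every system will do; `find_copies` and the folder names are hypothetical. It only finds byte-identical copies in locations that can be read as ordinary folders, so it will miss edited versions, email attachments and backup tapes.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in blocks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def find_copies(record_hash: str, search_roots: list[Path]) -> list[Path]:
    """Return every file under the search roots whose content matches the hash."""
    matches = []
    for root in search_roots:
        for candidate in root.rglob("*"):
            if candidate.is_file() and sha256_of(candidate) == record_hash:
                matches.append(candidate)
    return sorted(matches)


# Hypothetical set-up: a file share and a restored backup folder.
workdir = Path(tempfile.mkdtemp())
share = workdir / "file_share"
backup = workdir / "backup" / "2020"
share.mkdir(parents=True)
backup.mkdir(parents=True)

content = b"minutes of the meeting"
(share / "Minutes.docx").write_bytes(content)
(backup / "Minutes - copy.docx").write_bytes(content)
(share / "other.txt").write_bytes(b"unrelated")

record_hash = hashlib.sha256(content).hexdigest()
copies = find_copies(record_hash, [share, workdir / "backup"])
copy_names = sorted(path.name for path in copies)
```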

It might be worth asking IT and your EDRMS vendor.

References: