Posted in Compliance, Conservation and preservation, Electronic records, Governance, Information Management, Information Security, Legal, Records management, Retention and disposal, Security

Destroying digital records – are they really destroyed?

Most people should be aware that pressing the ‘delete’ option for a file stored on a computer doesn’t actually delete the item; it only makes the file ‘invisible’. The actual file remains on the disk and can be retrieved, either relatively easily or with forensic tools, until the space it occupied is overwritten.

Traditional legacy electronic document and records management (EDRM) systems have two components:

  • A database (e.g., SQL, Oracle) where the metadata about the records are stored
  • A linked file share where the actual objects are stored, most of which are copies of emails or network file share files that remain in their original location.

In most on-premise systems, email mailboxes, network file shares, and the EDRMS database and linked file share are likely to be backed up.

When a digital record comes to the end of its retention period and is subject to a ‘destruction’ process, how do you know if the record has actually been destroyed? And even if it has, how can you be sure that the original isn’t still stored in a mailbox, network file share, or a backup?

This post examines what actually happens when a file is ‘deleted’ from a Windows NT File System (NTFS), and questions whether digital records stored in an EDRMS are really destroyed at the end of the retention period.

The Windows NTFS Master File Table (MFT)

Details of every file stored on a computer drive will be found in the NTFS Master File Table (MFT).

In some ways, the MFT operates like a traditional electronic document management system – it is a kind of database that records metadata about the attributes of the digital objects stored on the drive. These attributes include the following:

attriblist

As noted in the diagram above, the details stored by the MFT include the $File_Name and $Data attributes.

  • The $File_Name attributes include the actual name of the file as well as when it was created and modified, and its size.  This is the information that can be seen via File Explorer and is often copied to the EDRMS metadata.
  • The $Data attribute contains details of where the actual data in the file is stored on the disk (in 0s and 1s) or the complete data if the file is small enough to fit in the MFT record.

If the MFT record has many attributes or the file data is stored in multiple fragments on a disk (for example as a file is being edited), additional MFT ‘extension’ records may be created.

When a file is deleted, the MFT records the deletion.

  • If the file is simply deleted, the record will remain on the disk and can be recovered from the Recycle Bin.
  • If the file is deleted through SHIFT-DEL or by emptying the Recycle Bin, the MFT record is updated to the ‘Deleted’ state and the cluster bitmap is updated to mark the file’s clusters (where the data is stored) as free for reuse. The MFT record remains until it is re-used, and the data remains until the clusters are allocated in whole or part to another file.

So, in summary, ‘deleting’ a file does not actually delete it. It may either:

  • Store the file in the Recycle Bin, making it relatively easy to recover, or
  • Change the MFT record to show the file as being deleted but leave the file data on the disk until it is overwritten.
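
To make the second case concrete, here is a minimal sketch of a toy volume. It is not the real NTFS on-disk layout: the `MftRecord` and `ToyVolume` classes and their fields are invented for illustration, and real clusters are fixed-size blocks rather than arbitrary byte strings. It only shows that ‘deleting’ flips a flag and frees the clusters in the bitmap, while the data stays readable until something else reuses the space.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MftRecord:
    """Toy stand-in for an MFT record (not the real on-disk layout)."""
    file_name: str
    clusters: List[int]   # which clusters hold the file's data
    in_use: bool = True   # cleared when the file is 'deleted'


@dataclass
class ToyVolume:
    disk: Dict[int, bytes] = field(default_factory=dict)    # cluster -> contents
    bitmap: Dict[int, bool] = field(default_factory=dict)   # cluster -> allocated?

    def _allocate(self) -> int:
        # Take the lowest-numbered cluster that the bitmap shows as free.
        cluster = 0
        while self.bitmap.get(cluster, False):
            cluster += 1
        self.bitmap[cluster] = True
        return cluster

    def write_file(self, name: str, chunks: List[bytes]) -> MftRecord:
        clusters = []
        for chunk in chunks:
            cluster = self._allocate()
            self.disk[cluster] = chunk
            clusters.append(cluster)
        return MftRecord(name, clusters)

    def delete_file(self, record: MftRecord) -> None:
        # 'Deletion' only updates the record and the bitmap; data is untouched.
        record.in_use = False
        for cluster in record.clusters:
            self.bitmap[cluster] = False

    def read_clusters(self, record: MftRecord) -> bytes:
        # What a recovery tool would find by following the old record.
        return b"".join(self.disk[c] for c in record.clusters)


volume = ToyVolume()
memo = volume.write_file("memo.docx", [b"confidential ", b"memo"])
volume.delete_file(memo)
recovered_before_reuse = volume.read_clusters(memo)   # still the full content

volume.write_file("new.txt", [b"XXXXXXXXXXXXX"])      # reuses the first freed cluster
recovered_after_reuse = volume.read_clusters(memo)    # now only partly recoverable
```

In this toy model the second cluster still holds part of the original file even after a new file has been written, which is the ‘in whole or part’ situation described above.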

How does an EDRMS store and manage files?

The following summary relates to a well-known Electronic Document and Records Management System (EDRMS). Other systems may work differently but the point is that records managers should understand exactly how they work and what happens when electronic files are destroyed at the end of a retention period.

Most EDRM systems are made up of two parts:

  • A database (SQL, Oracle etc) to store the metadata about the record.
  • An attached file store that stores the actual digital objects.

When EDRM systems are used to register paper or physical records (files and boxes), only the database is used.

When digital records are uploaded to the EDRMS:

  • The metadata in the original file, including the file type, original file name, date created, date modified and author are ‘captured’ by the system and recorded in the new database record.
  • Additional metadata may be added, including a content or record ‘type’.
  • The record will usually be associated with a ‘container’ (e.g., ‘file’). This containment makes the record appear to be ‘contained’ within that container, whereas in fact it is simply a metadata record of an object stored elsewhere.
  • The original record filename is changed to random characters (to make it harder to find, in theory) and then stored on the attached (usually Windows NTFS) file store, often in a series of folders.
  • A link is made between the database record and the record object stored in the file store (the MFT record).

When the end-user opens the EDRMS, they can search for or navigate to containers/files and see what appears to be the digital objects ‘stored’ in that container/file. In reality, they are seeing a link to the object stored (randomly) in the file store.
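
The two-part model can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular EDRMS product works: the `records` table, its columns, and the `capture_record` and `flag_destroyed` functions are all hypothetical. The sketch stores metadata in a database, stores the object under a random name in a separate file store, and then shows what a metadata-only ‘destruction’ would leave behind.

```python
import sqlite3
import tempfile
import uuid
from pathlib import Path


def capture_record(db, store: Path, title: str, content: bytes) -> int:
    """Metadata goes to the database; the object goes to the file store."""
    stored_name = uuid.uuid4().hex   # random name replaces the original filename
    (store / stored_name).write_bytes(content)
    cursor = db.execute(
        "INSERT INTO records (title, stored_name, status) VALUES (?, ?, 'active')",
        (title, stored_name),
    )
    return cursor.lastrowid


def flag_destroyed(db, record_id: int) -> None:
    """Hypothetical 'destruction' that only updates the metadata record."""
    db.execute("UPDATE records SET status = 'destroyed' WHERE id = ?", (record_id,))


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE records "
    "(id INTEGER PRIMARY KEY, title TEXT, stored_name TEXT, status TEXT)"
)
store = Path(tempfile.mkdtemp())   # stands in for the linked file share

record_id = capture_record(db, store, "Contract.docx", b"signed contract")
flag_destroyed(db, record_id)

status, stored_name = db.execute(
    "SELECT status, stored_name FROM records WHERE id = ?", (record_id,)
).fetchone()
object_still_on_disk = (store / stored_name).exists()
```

Here the database reports the record as destroyed while the object is still sitting in the file store. Whether a real system behaves this way is exactly the question to put to the vendor.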

What happens when an EDRMS record is destroyed?

If there is no requirement to extend their retention, or keep them on a legal hold, records may be destroyed at the conclusion of a retention period.

For physical records, this usually means destroying the physical objects so they cannot be recovered, a process that may include bulk shredding or pulping.

For digital records, however, there may be less certainty about the outcome of the destruction. While the EDRMS may flag the record as being ‘destroyed’, it is not always clear whether the destruction process has actually overwritten the digital records in a way that ensures their destruction to the same level as destroyed paper files.

Also:

  • If the original associated NTFS file share becomes full and a new one is used, the original is likely to be made read only.
  • There is likely to be a backup of the EDRMS.
  • The original records uploaded to the EDRMS probably continue to exist on network file shares, in email, or on backup tapes.
  • Digital forensics can be used to recover ‘deleted’ files from the associated file share.

Consider this scenario:

  • An email containing evidence of something is saved to a container in an EDRMS.
  • The container of records is ‘destroyed’ after the retention period expires.
  • A legal case arises after the container is ‘destroyed’.
  • A subpoena is made for all records, including those specific records.
  • Has the record actually been destroyed, or could it still be recoverable, including from backups or the digital originals?

Is it really possible to destroy digital records, and does it matter?

Yes, records can be destroyed by overwriting the clusters where the record is kept, and some EDRM systems may offer this option.

But:

  • Do EDRM systems overwrite the cluster when a digital record is destroyed in line with your records retention and disposal authorities, or simply mark the record as being deleted, when it is still technically recoverable?
  • Could the record still exist in the network file shares or email, or in backups of these or the EDRMS?
  • Might it be possible to recover the record with digital forensics tools?
  • Does it matter?

It might be worth asking IT and your EDRMS vendor.
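
As a rough illustration of what ‘overwriting’ means at the file level, the sketch below overwrites a file’s contents in place with random bytes before deleting it. The function name `overwrite_and_delete` is made up for this example, and the approach carries real caveats: on SSDs, copy-on-write file systems, or volumes with snapshots, an in-place write is not guaranteed to land on the same physical storage, and nothing here touches backups or other copies. It also builds the random data in memory, so it suits small files only.

```python
import os
import tempfile
from pathlib import Path


def overwrite_and_delete(path: Path, passes: int = 1) -> None:
    """Overwrite a file's contents in place with random bytes, then delete it."""
    size = path.stat().st_size
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            handle.write(os.urandom(size))
            handle.flush()
            os.fsync(handle.fileno())   # ask the OS to push the write to the device
    path.unlink()


target = Path(tempfile.mkdtemp()) / "record.bin"
target.write_bytes(b"evidence" * 100)
overwrite_and_delete(target)
```

Even where this works as intended, it only deals with one copy of the record, so the questions in the list above still apply.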


Posted in Disasters, Electronic records, Information Management, Information Security, Legal, Office 365, Records management, Retention and disposal, SharePoint Online, Training and education

Why is it so hard to ‘go digital’?

I visited a local fast-food outlet recently and could not help but notice the ‘Lever Arch’ binders in the small office behind the counter. A small two-drawer filing cabinet was also located below the desk.

20191002_125518

It made me wonder – in this day and age when pretty much everyone has access to the internet including via their smart phone, why are there any paper records?

And, why is it so hard to ‘go digital’, when so many better and safer digital options are available?

Reasons for not going digital

People probably want to keep paper records in this digital age for a few fairly common reasons, all of which I’ve encountered over the years.

  • Ease of access. It is much ‘easier’ to access a record if it’s in the folder with an obvious name, like ‘Rosters’.
  • Speed of access. You can access a paper record in a couple of seconds. Accessing the same record on a computer means logging on then searching or navigating to where it is stored (potentially including on personal removable storage devices).
  • Easier to archive. At the end of a given period the records can ‘simply’ be placed in an archive box and sent off for archiving.
  • Keeping digital records is too ‘hard’.
  • The company doesn’t offer any other option.
  • ‘Computers are hard’.
  • No obvious or pressing business reason to go digital.
  • A preference for paper, or belief that paper records must be kept.

Which of the above have you encountered? Let me know via this anonymous form:

https://forms.office.com/Pages/ResponsePage.aspx?id=DQSIkWdsW0yxEjajBLZtrQAAAAAAAAAAAAN__td1WRVUM0hJM0g2Q1NCWFdLS0JYM0k5QUlOUVUxRC4u

Keeping paper records can be risky

Keeping paper records can be all well and good, unless this sort of thing happens:

burger-king-fire-hed-2017-1260x840
Source: https://finance.yahoo.com/news/burger-king-used-photos-real-105654804.html

If you keep paper records when better digital options exist, you are taking a calculated risk that doing so is ‘OK’.

Of course, not all businesses (a) store the only copy of their physical records locally or (b) burn down (including by being constructed in fire-prone areas). However, these are not the only risks. Other risks include:

  • Flooding, from burst pipes, storms, or floodwaters. Water-damaged records are not easy to recover.
  • Damage from falling objects, including trees or other objects falling from the sky.
  • Theft or vandalism.
  • Business closure and leaving records behind in the abandoned building.
  • Any combination of the above.

What’s the backup for physical records?

What’s the backup for these paper records when disaster strikes?

Generally, unless the physical records have been transferred off-site, or they are the printed version of a digital original that can still be accessed, there isn’t one.

Is there a better, digital way?

Yes.

Printed records are likely to fall into several broad categories, each of which can be managed in its own way. For example, in the business above:

  • Policies and procedures, including ‘operating manuals’ and similar types of instructions are likely to be the printed version of digital originals. They can be made available on the company intranet or, if one doesn’t exist, sent via email.
  • Financial records (e.g., invoices). Again, these are likely to be the printed version of a digital original. If they were in printed form when received (e.g., by mail, with a delivery), the company should (a) ask for digital copies to be sent by email, or (b) scan them and store them digitally.
  • Rosters and general documents relating to groups of employees (as opposed to individual staff ‘files’). Rosters could still be printed for display purposes, but the original should be kept in digital form.
  • Staff files. The format of these may depend on the organisation, but there should be no reason for ‘local’ staff files to be kept in an organisation that has a centralised HR system.
  • Other types of business documents. If necessary, these could be scanned and kept in digital form.

And, of course, all of these could be kept in Office 365, including SharePoint for document storage and MS Teams for teams chat, including for front line workers.

Additional training and support may be required to help these areas ‘go digital’.

 

 

Posted in Classification, Compliance, Information Classification, Information Management, Office 365, Products and applications, Records management, Retention and disposal, Security

Office 365 Security and Compliance – classification label changes

Microsoft have improved the Classification section in the Office 365 Security and Compliance centre. The change will help to reduce confusion and make it easier for records managers and security administrators to focus on their individual needs.

Previous user interface

The primary change is to the menu interface. The previous menu options, shown in the screenshot below, showed only ‘Labels’ and ‘Label policies’.

O365_Classifications_Labels

When the previous ‘Labels’ option was selected, a new screen with two tabs ‘Sensitivity’ (default) and ‘Retention’ was displayed, as shown below.

O365_Classifications_Labels

The sensitivity or retention tab had to be selected to create or publish a new label. The user interface was unclear and the difference between creating and publishing a label was not obvious.

New user interface

The sensitivity and retention elements have now been separated and placed under the primary ‘Classification’ menu option as shown below.

O365_Compliance_ClassificationLabels23Aug19.JPG

Now, ‘Labels’ and ‘Label policies’ are two tabs under the relevant section as can be seen below.

 

O365_Compliance_ClassificationLabelsRetention23Aug19

O365_Compliance_ClassificationLabelsSensitivity23Aug19

The options to create and publish labels remain the same.

Posted in Information Management, Information Security, Products and applications, Records management, SharePoint 2013, SharePoint Online

Migrating to SharePoint Online – Part 1 (Planning)

We implemented SharePoint 2010 in early 2012 and then upgraded to SharePoint 2013 in early 2015. After acquiring Office 365 enterprise licences in April 2016 we began to plan for the migration of our existing on-premise environment to SharePoint Online. After testing the migration process with inactive sites, we started to migrate active sites from early 2018. We expect to complete all the migrations by 31 December 2018.

This post, the first of three, outlines the factors that influenced and guided how we approached the migration. Our approach may not be the same as your approach, but many of the basic principles may be similar.

Overview of our SharePoint environment pre-migration

A key principle for our SharePoint environment since 2012 was to avoid customisation and dependencies, and use the product ‘out of the box’ (OOTB) as much as possible.

  • Customisation would almost always require some degree of development and ongoing maintenance. It also meant that upgrades could be more complex and expensive.
  • Dependencies of any sort – be they integration components or third-party add-ons – could also make upgrades more complex and expensive.

Governance model

We also implemented a ‘balanced’ controlled environment, following the technical design models for SharePoint 2010 described by Microsoft (extract in image below), which recommended that organisations strike a balance across three key governance elements:

SharePoint2010GovernanceBalance

Source: https://docs.microsoft.com/en-us/previous-versions/office/sharepoint-server-2010/cc303422(v%3doffice.14)

  • IT Governance. Centrally managed or locally managed?
  • Information Management. Tightly managed or loosely managed?
  • Application Management. Strictly managed or loosely managed development?

In our environment, the ability to create new SharePoint sites and sub-sites required the completion of a (SharePoint) online form and was restricted to the SharePoint Administrators. This enabled us to prevent uncontrolled growth in the environment and to ensure that all new sites were created within a pre-defined – but not overly strict – architecture design model.

Upgrade to SharePoint 2013 in early 2015

Our SharePoint site collections were created across five web applications: team (approximately 120 sites), project (approx. 120 sites), publication, apps, and intranet. Most of the corporate records were stored in team or project sites, as well as a single ‘apps’ site. Our apps sites (fewer than 10) were set up to address small business problems that in the past might have been addressed by using Microsoft Access.

Thanks to our OOTB model, we were able to upgrade to SharePoint 2013 over a weekend, with almost no errors. The only site we could not upgrade was the intranet which remains (as at August 2018) in ‘compatibility mode’.

Note: It is not possible to migrate directly from SharePoint 2010 to SharePoint Online. It must be upgraded to SharePoint 2013 or SharePoint 2016 first.

The situation in 2016

In May 2016 we changed our Microsoft Enterprise Agreement to an Office 365 subscription model. Our reasons for going to Office 365 were driven by multiple factors, including the need for mobile access to information.

It is important to remember that SharePoint Online is only one element among many others in Office 365. That is, while it is technically possible to do it, SharePoint would not normally be migrated on its own to SharePoint Online. Any migration must take into account a range of considerations relating to the broader Office 365 environment, including (but not limited to):

  • Office 365 licences (and what this meant for our users with Office installed on existing computers which were being upgraded to new Windows 10-based devices as part of a separate project)
  • Active Directory syncing so users can access the environment.
  • Exchange mailbox migrations so SharePoint-based, email-linked Flow workflows can work.
  • OneDrive for Business, as a SharePoint service to replace ‘personal’ drives on network file shares.
  • Security controls and records retention policies, set from the Office 365 Security and Compliance admin portal, as well as audit logs in that same portal.
  • Office 365 Groups with associated SharePoint sites, Yammer groups (which can be linked with Office 365 Groups) and Microsoft Teams (which can also be linked with Office 365 Groups).
  • ‘Classic’ and modern team sites, Office 365 Group-based sites, and communication sites.
  • The SharePoint user portal.
  • The mobile app, and how sub-sites are accessed.
  • The ever-changing SharePoint Online environment in which anything described as ‘classic’ is likely to be deprecated at some point, and new features appear.

Migrating multiple web applications to one

We needed to plan our migration process, moving away from our five web applications to a new model. We knew that, with the exception of our customised intranet, we would probably be able to migrate almost all of our sites relatively easily because we had always kept to the OOTB model.

Fortunately, Microsoft produced a very useful 12-page document which provided a good overview describing how it ran its own SharePoint migration, and good advice for how we might do our own migration.

SharePoint_to_the_cloud_MSpaper.JPG

Learn how Microsoft ran its own migration

We had a range of factors to take into account.

  • One of our initial decisions was not to migrate any active site until all Exchange mailboxes were migrated (and preferably, end-users had new Windows 10 devices). As it turned out, the decision to migrate mailboxes was delayed and as a result we would end up migrating most sites first.
  • We needed to work out how to migrate our content as it was no longer possible to do a ‘lift and shift’. We investigated the market and made the decision to acquire a migration tool, ShareGate, to do the migrations ourselves. We would later find the same tool useful to migrate personal drives to OneDrive for Business.
  • We identified the likelihood that we would create new SharePoint Online sites in parallel with the migration of on-premise sites; this was partially because some existing on-premise sites with multiple sub-sites would be split into separate sites instead, but also because the new SharePoint was so much more versatile and would likely be popular.

The new architecture model

An important point to note is that the new SharePoint Online architecture model provided the opportunity to re-think our SharePoint model and, to some extent, clean up or leave unwanted SharePoint content behind. To quote the Microsoft site above, ‘the best migration is no migration’.

As noted above, we had five primary web applications in our SharePoint 2013 environment. These had to be migrated (or re-created, in the case of publication sites) under one of only two paths (/teams or /sites) to one of three site options:

  • ‘Classic’ sites (the default for all team and project sites)
  • Office 365 Group-based team sites
  • Communication sites (re-created page-based content)

That is:

  • Migrated team and project sites would become classic team sites under either (a) /teams/sitename path or (b) /teams/prj_sitename path, respectively. There were some exceptions:
    • Some sites with multiple sub-sites would be split up into multiple independent sites (including using the new ‘hub’ sites).
    • A couple of team sites would become communication sites.
    • Team sites that crossed multiple organisational business areas would be created as classic team sites under the /sites/sitename path.
  • Most publication sites that used the publishing features would need to be re-created as communication sites under the /sites/sitename path. There were some exceptions:
    • Some publication sites would become team sites instead.
    • The intranet would be managed separately as, at the very least, it would need to be re-created in SharePoint Online. It could not be migrated ‘as is’.
  • Application sites would become team sites.
  • Some existing sites or sub-sites might be migrated to SharePoint sites linked to Office 365 Groups, with the naming prefix of either GRP_ or PRJ_.

The above ‘mapping’ model was an early decision that did not change.
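
The core of this mapping can be written down as a small function, which is a handy way to check that every site in an inventory has a destination. This is a sketch only: the function name `target_path` and the site type labels are invented for the example, the exceptions listed above (split sites, hub sites, Office 365 Group sites) are not modelled, and placing application sites under /teams is an assumption, since the mapping above only says they become team sites.

```python
def target_path(site_type: str, site_name: str, cross_business: bool = False) -> str:
    """Return the SharePoint Online path for an on-premise site (default rules only)."""
    if site_type == "project":
        return f"/teams/prj_{site_name}"
    if site_type == "publication":
        # Re-created as a communication site.
        return f"/sites/{site_name}"
    if site_type == "team" and cross_business:
        # Team sites that cross multiple business areas.
        return f"/sites/{site_name}"
    if site_type in ("team", "apps"):
        # Assumption: application sites follow the team site path.
        return f"/teams/{site_name}"
    raise ValueError(f"No mapping rule for site type: {site_type}")


examples = {
    "finance team site": target_path("team", "finance"),
    "website project site": target_path("project", "website"),
    "cross-business team site": target_path("team", "safety", cross_business=True),
    "news publication site": target_path("publication", "news"),
}
```

Raising an error for unknown site types is deliberate: a site with no rule should be looked at by a person rather than silently placed somewhere.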

Preparatory work

We also commenced work on the following:

  • Reviewing all existing sites to determine which sites would be migrated or discarded – see below.
  • Re-developing our SharePoint Architecture documentation for the Online version.
  • Investigating and documenting all Office 365 admin and Office 365 Security and Compliance admin configuration settings, and determining roles. This process, which required Global Admin access, included establishing records retention policies (from mid 2018) in the Security and Compliance admin portal.
  • Re-developing our existing SharePoint admin documentation for the Online version, including all the configuration settings. We included the OneDrive for Business config settings in this same document as it is a SharePoint service.
  • Understanding how the new environment worked, and would work.
  • Re-establishing our SharePoint Admin and SharePoint User Group sites in SharePoint Online.
  • Creating a range of ‘test’ sites to better understand the new environment.
  • Creating an initial schedule for the migration of sites, targeting inactive sites first.
  • Assigning the initial batches of Office 365 licences.
  • Developing a repeatable process to migrate sites using ShareGate. In our environment the steps involved were:
    • Identify the need to migrate the site.
    • Register a new site request in our SharePoint Admin portal.
    • Register the task in our Jira task management system.
    • Create the SharePoint Online site (via a script linked to the request).
    • Migrate the on-premise site, make it read only with a re-direct notice on the front page (and a three month deletion notice*).
    • Prepare the migrated site, including swapping the classic default home page to a modern home page.
    • Hand over the site to the business owners and close the task.

* In practice many of these sites still remained after 6 months.

As part of our review process, we identified around a dozen sites that had one or more of the following elements, which meant we had to devote more time to their migration (‘custom workload’ in the Microsoft document above):

  • Complex workflows which would need to be re-created.
  • Integration with other systems (mostly via BizTalk).
  • Links with ETL processes.

We also identified around 50 sites that would not be migrated:

  • Sites that were unused or had no content of value (often because the original was still on a drive).
  • Sites that did not need to be migrated, for example if their content had been migrated to a different business system.
  • Test sites.

Sites that were no longer used but contained records that needed to be kept were to be migrated with the word ‘Archive’ appended to the end of the site URL name, assigned a site retention policy, and then made read only.

By August 2017, we had identified that 250 site collections would be migrated to SharePoint Online. We acquired ShareGate in September 2017 and were ready to start migrating.

In Part 2 of this series of posts I will describe the migration process and the lessons we learned along the way.

Posted in Classification, Compliance, Digital preservation, Electronic records, Governance, Information Management, Information Security, Office 365, Records management, Retention and disposal, SharePoint Online

Office 365 – Applying retention periods to SharePoint document libraries and disposal/disposition actions

Records retention policies are created in the Security and Compliance Admin portal, Classifications section of Office 365, as noted in my previous post of 9 March 2018 on the subject.

This post describes how these are applied to document libraries and what happens when the records reach their disposal/disposition period.

Note: In Australia we refer to the disposal of records. In the US this is called disposition.

Setting up retention policies

Organisations may have complex or quite simple records retention policies. An important point to keep in mind in Office 365 is how many policies should be displayed to the end user to choose from.

Ideally, there should be fewer than a dozen classes so they are easy to choose from (see below). There is nothing stopping you creating 100 or 500 policies, but all of them will appear in the drop down list to choose from. Microsoft say they are working on ‘grouping’ policies, so this may help to fix the issue.

For some organisations, it may be useful to distill or group retention policies down to a smaller number.

  • For example, specific retention policies for certain types of records, and one (or two) for ‘all other’ records. The key, as we will see below, is naming them so they are obvious and easy to apply.

Viewing available retention policies

Retention policies that have been created appear in the Security and Compliance Admin portal, under Classifications > Labels.

O365_Classifications_Labels

Note: Labels must be published before they become visible to end users.

When you click on Labels, you can then see all the retention policies that have been created (but not necessarily published).

The screenshot below shows just the very top policy (a test/demonstration policy with a 7 day retention period) in a list of policies.

O365_Classifications_Labels_List.png

Note: Policies can be auto-applied, provided each policy has sufficient ability to identify the records it should be applied to.

Published policies appear in the Data Governance, Dispositions section:

O365_DataGovernance_Dispositions.png

The Dispositions section displays policies that have been published and are visible to end users in the Office 365 areas selected when the policy was created (e.g., Exchange, SharePoint, OneDrive etc).

O365_DataGovernance_Dispositions_List.png

Applying the policy in a SharePoint document library

To apply the policy to a SharePoint document library, go to the document library, library settings, and you will see the option to add the retention policy: ‘Apply label to items in this list or library’.

O365_RetentionPolicy_LibrarySet1.PNG

The ‘Apply Label’ dialogue shows the option to apply the label to existing items (recommended) and a drop down which shows all the published retention policies.

O365_RetentionPolicy_LibrarySet2.PNG

In this example below, there are four policies including the test policy.

O365_RetentionPolicy_LibrarySet3

The policy now applies to all records stored in that document library.

Managing disposal/disposition

When the records reach the end of the retention period configured in the policy, the person designated to be informed about the retention will receive an email notifying them of the need to review the dispositions.

O365_Dispositions_EmailNotification.png

Note: the person (or mailbox) receiving this email MUST be assigned to the Records Management role in the Security and Compliance Admin portal, Permissions section. No-one else will see the records due for disposal otherwise (not even the Global Admins, unless they have also been delegated to that role).

The records person clicks on the link ‘Go there now’ and it opens the following section in the Office 365, Security and Compliance Admin portal, showing the documents that are pending disposition. A number of options are available to sort by Type, to search, and to filter by several options.

 

O365_Dispositions_DocListing

The following options appear if a single document is selected. Note the option to extend the retention period or apply a different label, as well as the ability to delete the item permanently.

O365_Dispositions_Doc_OneDocument

Filtering options are displayed below.

O365_DataGovernance_Dispositions_Filters

Finally, the records manager can choose all the documents in the list and complete three bulk actions as shown.

O365_DataGovernance_Dispositions_BulkActions.png

Positives and negatives

The positive of this method of disposing of documents is that all records from any location will appear in a single view that can be filtered, with actions taken as required.

The negative is that potentially thousands of documents might appear in this listing every single day, making it difficult to decide what can be deleted or not.

However, as it’s possible to filter by the retention policy, that at least should make it relatively easy to identify what can be destroyed. The more fine-grained the policies, the fewer records should appear.

Organisations that have function-based disposal classes should find that all records relating to the same function appear for disposal under that function.

Another potential negative is that records may not always appear in the same context, whether it be subject- or function-based. For example, a collection of documents (often known as a ‘file’) may not appear in the disposition listing as a collection but as a set of records that are only connected by the disposal policy name. Does this matter?

Recording disposal actions

A key requirement for most organisations is keeping a record of what was destroyed.

At the moment the only apparent option to do this is to apply filters and export the list, using the handy ‘Export’ option to keep a record of what was destroyed. That CSV file can then be stored in a control library to ensure a record is kept. This type of action requires a degree of control to ensure it happens every time.
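
Once the export exists, a short script can summarise it and produce a fingerprint to file alongside it. The sketch below is illustrative only: the column names (‘Title’, ‘Location’, ‘Label’), the sample rows, and the ‘Financial 7 years’ label are assumptions, and the columns in a real export may well differ. It counts the exported items per label and computes a SHA-256 digest of the file contents, which makes it possible to show later that the filed register has not been altered.

```python
import csv
import hashlib
import io
from collections import Counter

# Hypothetical export contents; a real export may use different columns.
EXPORT_TEXT = (
    "Title,Location,Label\n"
    "Roster-March.xlsx,/teams/ops,Test 7 day\n"
    "Roster-April.xlsx,/teams/ops,Test 7 day\n"
    "Invoice-0042.pdf,/teams/finance,Financial 7 years\n"
)


def summarise_by_label(export_text: str) -> Counter:
    """Count exported items under each retention label."""
    rows = csv.DictReader(io.StringIO(export_text))
    return Counter(row["Label"] for row in rows)


def fingerprint(export_text: str) -> str:
    """SHA-256 digest of the export, to store with the disposal register."""
    return hashlib.sha256(export_text.encode("utf-8")).hexdigest()


summary = summarise_by_label(EXPORT_TEXT)
digest = fingerprint(EXPORT_TEXT)
```

The per-label counts give a quick sanity check against what was approved for destruction before the register is filed.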

It may also be possible to identify what was destroyed – and by whom – in the audit logs. This is being investigated.