Posted in Records management, Electronic records, Digital preservation, XML, Retention and disposal, Conservation and preservation, Exchange 2010, Information Management, Exchange 2013, Access controls, Exchange Online

The enduring problem of emails as records

Ever since emails first appeared as a way to communicate more than 30 years ago they have been a problem for records management, for two main reasons.

  • Emails (and attachments) are created and captured in a separate (email) system, and are stored in mailboxes that are inaccessible to records managers (a bit like ‘personal’ drives).
  • The only way to manage them in the context of other records was/is to print and file or copy them to a separate recordkeeping system, leaving the originals in place.

Thirty-plus years of email has left a trail of mostly inaccessible digital debris. An unknown volume of records remains locked away in ‘personal’ and archived mailboxes. Often, the only way to find these records is via legal eDiscovery, but even that can be limited in terms of how far back you can go.

Options for the preservation of legacy emails

The Council on Library and Information Resources (CLIR) published a detailed report in August 2018 titled ‘The Future of Email Archives: A Report from the Task Force on Technical Approaches to Email Archives’.

The report noted (from page 58) three common approaches to the preservation of legacy emails:

  • Bit-Level Preservation
  • Migration (to MBOX, EML or even XML)
  • Emulation
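As a rough illustration of the migration option, the sketch below uses Python's standard `mailbox` module to split a legacy MBOX file into individual EML files. The function name, the output file naming pattern and the paths are assumptions made for this example only; the CLIR report does not prescribe any particular tool, and a real migration would also need to check attachments, encodings and fixity, which this sketch does not attempt.

```python
import mailbox
from pathlib import Path


def mbox_to_eml(mbox_path, out_dir):
    """Write each message in an MBOX file to its own .eml file.

    Returns the list of files written. The naming pattern is hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # create=False: fail rather than silently create an empty mailbox
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for index, key in enumerate(box.iterkeys()):
            raw = box.get_bytes(key)  # message bytes without the mbox 'From ' line
            target = out / f"message-{index:05d}.eml"
            target.write_bytes(raw)
            written.append(target)
    finally:
        box.close()
    return written
```

Calling `mbox_to_eml("legacy.mbox", "eml_out")` would leave the original MBOX untouched and produce one EML file per message, which can then be indexed or converted further.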

In a follow-up, the Australian IDM magazine published an article in March 2020 by one of the CLIR report authors (Chris Prom). The article, titled ‘The Future of Past Email is PDF’, suggested that PDF may be (or become) a more suitable long-term solution for preservation of legacy emails.

Preservation is one thing, what about access?

There is little point in preserving important records if they cannot be accessed. The two must go together. In fact, preservation without the ability to access a record is not a lot different from destruction through negligence.

Assuming emails can be migrated to a long-term and accessible format, what then?

No-one (except perhaps well-funded archival institutions) is seriously likely to attempt to move or copy individual legacy emails to pre-defined and pre-existing containers or aggregations of other records. This would be like printing individual emails and storing them in the same paper file or box in which other records on the same subject are stored.

Access to legacy emails in a digitally accessible, metadata-rich format like PDF provides a range of potential opportunities to ‘harvest’ and make use of the content, including through machine learning and artificial intelligence.

These options have been available for close to twenty years in the eDiscovery world, but only to support specific legal requirements.

Search, discovery and retention/disposal tools available in the Microsoft 365 Compliance portal, along with the underlying Graph and AI tools (including SharePoint Syntex) provide the potential to manage legacy content, including emails.

The starting point is migrating all those old legacy emails to an accessible format.

Posted in Digital preservation, Digitisation, Electronic records, Information Management, Records management, Retention and disposal

Why is ‘going digital’ a problem for records management?

Few organisations create original records on paper any more. Almost all the paper records that are created these days are the printed versions of born-digital records.

In a somewhat ironic twist, many organisations seek to digitise (or ‘scan’) the printed versions of born-digital records.

And yet, there apparently continues to be an ongoing problem in many organisations (particularly government organisations) about ‘going digital’.

Why is ‘going digital’ so hard for records management?

On one hand, allowing people to even print and store the printed version of digital records on paper files helps to perpetuate the problem of going digital.

On the other hand, many older style recordkeeping systems require content to be copied from one system where it has been created, captured or stored, to another. The requirement to copy a record (if it is not automated) requires a conscious, voluntary (and selective) action on the part of the end-user. It does not guarantee that the copied record is the final version of a document or, in the case of email, that there are no additional replies in a thread. And, the original remains in the originating system.

Additionally, some types of records cannot easily be copied to a centralised recordkeeping system. Examples include Twitter tweets, Facebook content, instant messaging texts, chat, video, and conferencing text, audio and video. And even when they can, there is no certainty that the version saved is the most recent.

The elephant in the room – digital recordkeeping has not evolved

In my opinion, the primary reason why digital records continue to be printed, and why organisations find it hard to ‘go digital’, is because many recordkeeping systems and practices have not evolved with the digital world.

Instead, they remain based on the idea that all records should be stored, with added metadata, in a central recordkeeping system. Anything that does not fit this model, and any system that doesn’t meet all the standards for keeping records in this way, is regarded as ‘non-compliant’.

Vendors of these traditional, centralised recordkeeping systems highlight how these systems meet recordkeeping compliance requirements, which in turn further cements these systems as being the only way compliance requirements can be met. These systems increasingly are unable to capture the full range of digital content and consequently, ‘going digital’ becomes a problem – because the system isn’t working.

It’s a vicious cycle.

How paper recordkeeping turned into digital recordkeeping

Until the early 1990s, paper files (and boxes) were really the only way we had to store records. During the late 1980s and 1990s, many organisations acquired databases to keep track of these paper files and the boxes in which they were stored.

At the beginning of the digital world, in the early to mid 1990s, in the absence of any other method, digital records were usually printed and placed on the same paper files.

By the end of the 1990s, the databases that were originally used to keep track of paper files were adapted to manage digital records in digital ‘files’ (folders, containers).

But the opportunity was missed to evolve the paper recordkeeping paradigm into something more suitable for digital records.

Additionally, none of the leading software manufacturers, Microsoft in particular, did anything to incorporate recordkeeping in their various systems and applications.

Recordkeeping systems used to manage digital content retained the same ‘filing’ concept where end-users, after receiving suitable training, had to (voluntarily) copy the digital record (including emails) to the digital ‘file’, leaving the original in place.

The idea of a central recordkeeping system, to which all records are to be copied, makes almost no sense in the digital world. For almost twenty years, recordkeeping practice has overlooked or even ignored the ever-increasing volume and types of digital records and persisted with a centralised model.

In my opinion, the problem of ‘going digital’ for many organisations has been directly related to the fact that recordkeeping systems have not evolved from the original centralised model.

It doesn’t make sense in a digital world.

Fixing the problem

In my opinion, one of the key problems is not so much that many older style recordkeeping systems are based around a paper recordkeeping paradigm (because this paradigm can still be valid, especially for high value or archival records), but that organisations think they should manage all records according to the same paradigm, or otherwise they will somehow not ‘comply’ (especially with government recordkeeping requirements).

In reality:

  • Records may have different ‘value’. There are a lot of low-quality, low-value records.
  • Some records can go from being innocuous (‘OK’ in an email reply) to being critical very quickly (when the ‘OK’ becomes evidence of fraud).
  • Not all records need to have complex recordkeeping metadata. In fact, most digital records already have extensive metadata payloads.
  • Emails will continue to remain separate from other records.
  • Only a small percentage of records need to be kept for a long time.
  • Digital records can be categorised or classified in multiple ways over time. Pre-defined classification applied to a digital record may not accurately capture the full context (or potential context) of a record and may even impede it.
  • Digital records may remain active even after they are captured, including as new versions, new replies in a thread, modified images and so on.
  • When managed well, digital records can be managed and accessed in place, in the system in which they were created or captured.
  • Digital records that need special attention, including records that require long-term storage, can still be managed in ‘files’ or ‘containers’, but this needs to be implemented in a way that is simple for end-users to understand.

Organisations should, in my opinion:

  • Embrace digital recordkeeping.
  • Abandon the idea that all records must be copied to a central recordkeeping system.
  • Accept that any system can contain records – including line of business systems that also capture documents as records – and focus on how to manage the records in those systems.
  • Use the recordkeeping capability of the systems where records are created or captured.
  • Focus most effort on records of high value, or records that need to be kept for a long time including for archival purposes.
  • Let end-users create and work with born-digital records where they are created or captured, without the additional overhead of having to copy these to another system.
  • Implement high-level architecture models and monitor where information is being stored.
  • Use a combination of global retention policies and auto-classification to protect the integrity, reliability and authenticity of records.
  • Use search and discovery to find content, wherever it is stored, whenever it is required.

Posted in Digital preservation, Disasters, Electronic records, Governance, Information Management, OneDrive for Business, Records management, Retention and disposal

Managing the retention of content stored in OneDrive for Business accounts

The methods available to manage the retention of content stored by end-users in their Office 365 OneDrive for Business (ODfB) accounts are not always well understood.

Organisations may initially default (in their thinking) to backing up the content because that’s what was always done in the past. A change of thinking may be required.

This post:

  • Explains some of the key differences between ‘home drives’ and ODfB accounts.
  • Highlights the need for organisations to understand their business requirements for retention of ‘personal’ content, and not assume traditional backup methods are the only option.
  • Also highlights the need for organisations to understand the potential risks (and potentially unnecessary additional costs) associated with backing up Office 365 content.
  • Describes two simple options for the retention of content stored in ODfB accounts.
  • Suggests that organisations can probably use a combination of a single Office 365 retention policy and a change to the storage retention period for inactive accounts, instead of backups to achieve the same outcome.

What are ‘home drives’?

In many organisations, home drives are usually a dedicated area on a network file share designed to allow end-users to store ‘working’ documents and ‘personal’ content.

Using the network file shares for home drives ensures that the content stored in them is backed up as part of standard disaster recovery processes while the user is still active (for disaster recovery and to recover deleted items) and still accessible (as an ‘archive’) after they leave the organisation.

In some organisations, home drives may instead be an area on the user’s computer (C drive). Any content stored on local computers is not backed up.

Generally speaking, home drives – whether on a network file share or on the user’s computer – are not accessible once the end-user leaves the office. This has given rise to the fairly regular use of USB storage devices or uncontrolled, internet-based, file storage systems such as Dropbox.

How is ODfB different from home drives?

In organisations that implement Office 365, ODfB is the replacement for ‘home’ or ‘personal’ drives.

Although they offer similar functionality for end-users (in terms of the ability to access the content from File Explorer), ODfB accounts are fundamentally different in several ways.

  • The content can be accessed on almost any device. No VPN is required.
  • With Windows 10 devices, the content is synced to and can be accessed via File Explorer. This makes ODfB an almost identical replacement for existing home drives in terms of look and feel, and functionality (plus even more functionality, such as the ability to share directly).
  • There is no accessible back up – Microsoft is entirely responsible for disaster recovery. If organisations want to back up ODfB accounts from Office 365, they will need to acquire a third-party product. The ability to establish retention for the content (last two dot points below) may make the need for back up redundant.
  • There is a 90 day Recycle Bin accessible via the browser-based interface. This allows end-users to restore the content they deleted themselves within that time-frame.
  • Organisations can set a storage retention period that will apply once the end-user leaves and their account is deactivated.
  • Organisations can also set a retention policy that will prevent the deletion of content while the user remains active.

The last two options are the subject of this post.

Access to and retention of home drives vs ODfB accounts

In many organisations, the content stored by end-users in their home drives is considered to be ‘private’ to them, despite the system being owned by the organisation.

While they can be accessed easily by network administrators with elevated privileges, it is not uncommon (often for audit purposes) for IT to have to seek special approval from someone senior to access the content of a home drive either while the end-user is still employed or after they have left. In these cases, IT will either access the active drive or request the back up tape to restore the content. 

The content in home drives, when backed up, remains as long as the backup media is accessible.

In Office 365, Global Administrators can access the ODfB accounts of any active user. They do this by going to the Office 365 Admin portal and, under the ‘Users’ section, clicking the end-user account name and then going to the ‘OneDrive’ tab where the option to ‘Get access to files’ is displayed. Any access to ODfB accounts, by anyone (including Global Admins), is recorded in the audit logs.

[Note: As at January 2020, the old ‘My Sites’ options in SharePoint still exist. These options allow the Global Admins or SharePoint Admins to assign someone, or a Security Group, as a Secondary Admin for all ODfB accounts. This option is largely redundant because Global Admins can access the content anyway.]

The default retention period for ODfB content is 30 days after the end user’s account is disabled.

What exactly are you trying to achieve?

As noted, there are some fundamental differences between ‘home drives’ and ODfB.

Consequently, organisations ideally should re-examine their business requirements for access to and the retention of ‘personal content’ both while the user account is active and when it is made inactive, and not assume that old backup options remain valid.

For example, consider the use of backup tapes:

  • The primary purpose of backup tapes is to support disaster recovery. These made sense when IT owned the servers, but they make less sense when Microsoft own them and are responsible for disaster recovery. Is Microsoft’s disaster recovery capability sufficient or suitable?
  • Backup tapes were (and still are) often used as a type of ‘archive’, allowing organisations to recover data from active and inactive home drives for an indefinite period of time.

The bottom line is – what business outcome/s do you want? Generally, these are likely to be:

  • The ability to recover content stored on personal drives after a disaster (not just when the end-user has deleted something).
  • The ability to access and retain content while the user is active or after they become inactive.

An additional business requirement might be to reduce the use of ‘home drives’ for business related content.

Retention options for content stored in ODfB

ODfB ships with two default retention options:

  • Recycle Bin. Any ODfB content deleted by an end-user goes to the Recycle Bin for 90 days.
  • Inactive content retention. When an end-user account is deactivated, the content remains accessible for a default period of 30 days.

Neither of these two options on their own, without modification, is likely to meet business requirements to achieve some form of back-up equivalent capability and the ability to access content in ODfB for a period of time.

It is likely that most business requirements (to replace backups) will be met instead via a combination of the following:

  • Creating a single Office 365 retention policy applied to all ODfB accounts that prevents content in those accounts from being deleted for a given period of time.
  • Extending the default retention period for the content in deactivated accounts from 30 days to a much longer period, for example 7 years.

Office 365 Retention Policy

To ensure that content is kept (and accessible, even after being ‘deleted’ by the user) while the user is active, and after they leave, (a) create a single Retention Policy in the ‘Information Governance’ section of the Office 365 Compliance portal and (b) apply it to all ODfB accounts by choosing ‘https://tenantname-my.sharepoint.com’.

ODfBRetentionPolicy.JPG

Once published, the retention policy creates a ‘Preservation Hold library’, visible only to the Global Admins, that stores any content that is modified or deleted by the end-user during the retention period.

At the end of the retention period, the content in the Preservation Hold library and anything else that has reached the end of the retention period is sent to the Recycle Bin where it is kept for 90 days before being permanently deleted.
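The timing described above can be worked through with a small sketch. It assumes the retention period is counted in whole years from a known start date (whether that is the creation or last modified date depends on how the policy is configured) and uses the 90 day Recycle Bin period mentioned in this post. The function names are hypothetical and this is not how Office 365 calculates anything internally.

```python
from datetime import date, timedelta

RECYCLE_BIN_DAYS = 90  # Recycle Bin period described in this post


def add_years(d, years):
    """Shift a date by whole years, treating 29 February as 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February has no equivalent in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def disposal_timeline(retention_start, retention_years):
    """Return (date retention ends, date of permanent deletion)."""
    retention_ends = add_years(retention_start, retention_years)
    permanently_deleted = retention_ends + timedelta(days=RECYCLE_BIN_DAYS)
    return retention_ends, permanently_deleted
```

For a 7 year policy, `disposal_timeline(date(2020, 1, 15), 7)` gives a retention end date of 15 January 2027, with permanent deletion 90 days after that.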

ODfBPresHoldLib.JPG

This type of retention policy effectively replaces the need for a back up of home drives, provided the organisation:

  • Accepts the risk that Microsoft may not be able to recover all or some of the content in the case of a disaster. Note that this risk also applies to Exchange, SharePoint and MS Teams content.
  • Understands that, if it decides to attempt to back up ODfB, restoring from back up may not be as simple as it used to be when the organisation owned and managed the relevant servers. What, exactly, will you back up to, and how will you read the data?

ODfB Storage Retention

The second retention option relates to the ODfB accounts of departed users, or inactive accounts.

ODfB includes the option to retain files in ODfB for a specific period of time after the end-user account is deactivated. This is set in the ODfB Admin portal under ‘Storage’.

ODfBStorage.JPG

At the end of the period of time specified, the content is sent to the Recycle Bin after which it is deleted permanently.

Summary

Many organisations are likely to approach the retention of ODfB content in the same way they did for home drive content, by considering backup options first, often ‘because that’s what we’ve always done’.

Organisations implementing Office 365 should:

  • Define their business requirements for the retention of home drive/ODfB content
  • Examine, understand and consider if retention options in Office 365 result in the same outcome
  • Understand the potential risks of relying on Microsoft to provide a reliable service including in a disaster situation
  • Understand the complexity (and risks) of backing up (and recovering) content from Office 365.

In many cases, retention options in Office 365 may provide the required outcome at a much lower cost.

Posted in Classification, Compliance, Digital preservation, Electronic records, Governance, Information Management, Information Security, Office 365, Records management, Retention and disposal, SharePoint Online

Office 365 – Applying retention periods to SharePoint document libraries and disposal/disposition actions

Records retention policies are created in the Security and Compliance Admin portal, Classifications section of Office 365, as noted in my previous post of 9 March 2018 on the subject.

This post describes how these are applied to document libraries and what happens when the records reach their disposal/disposition period.

Note: In Australia we refer to the disposal of records. In the US this is called disposition.

Setting up retention policies

Organisations may have complex or quite simple records retention policies. An important point to keep in mind in Office 365 is how many policies should be displayed to the end user to choose from.

Ideally, there should be fewer than a dozen classes so they are easy to choose from (see below). There is nothing stopping you creating 100 or 500 policies, but all of them will appear in the drop down list to choose from. Microsoft say they are working on ‘grouping’ policies, so this may help to fix the issue.

For some organisations, it may be useful to distill or group retention policies down to a smaller number.

  • For example, specific retention policies for certain types of records, and one (or two) for ‘all other’ records. The key, as we will see below, is naming them so they are obvious and easy to apply.

Viewing available retention policies

Retention policies that have been created appear in the Security and Compliance Admin portal, under Classifications > Labels.

O365_Classifications_Labels

Note: Labels must be published before they become visible to end users.

When you click on Labels, you can then see all the retention policies that have been created (but not necessarily published).

The screenshot below shows just the very top policy (a test/demonstration policy with a 7 day retention period) in a list of policies.

O365_Classifications_Labels_List.png

Note: Policies can be auto-applied, provided the policy has sufficient ability to identify what records they should be applied to.

Published policies appear in the Data Governance, Dispositions section:

O365_DataGovernance_Dispositions.png

The Dispositions section displays policies that have been published and are visible to end users in the Office 365 areas selected when the policy was created (e.g., Exchange, SharePoint, OneDrive etc).

O365_DataGovernance_Dispositions_List.png

Applying the policy in a SharePoint document library

To apply the policy to a SharePoint document library, go to the document library, library settings, and you will see the option to add the retention policy: ‘Apply label to items in this list or library’.

O365_RetentionPolicy_LibrarySet1.PNG

The ‘Apply Label’ dialogue shows the option to apply the label to existing items (recommended) and a drop down which shows all the published retention policies.

O365_RetentionPolicy_LibrarySet2.PNG

In the example below, there are four policies including the test policy.

O365_RetentionPolicy_LibrarySet3

The policy now applies to all records stored in that document library.

Managing disposal/disposition

When the records reach the end of the retention period configured in the policy, the person designated to be informed about the retention will receive an email notifying them of the need to review the dispositions.

O365_Dispositions_EmailNotification.png

Note: the person (or mailbox) receiving this email MUST be assigned to the Records Management role in the Security and Compliance Admin portal, Permissions section. Otherwise, no-one else will see the records due for disposal (not even the Global Admins, unless they have also been delegated to that role).

The records person clicks on the link ‘Go there now’ and it opens the following section in the Office 365, Security and Compliance Admin portal, showing the documents that are pending disposition. A number of options are available to sort by Type, to search, and to filter by several options.

O365_Dispositions_DocListing

The following options appear if a single document is selected. Note the option to extend the retention period or apply a different label, as well as the ability to delete the item permanently.

O365_Dispositions_Doc_OneDocument

Filtering options are displayed below.

O365_DataGovernance_Dispositions_Filters

Finally, the records manager can choose all the documents in the list and complete three bulk actions as shown.

O365_DataGovernance_Dispositions_BulkActions.png

Positives and negatives

The positives of this method of disposing of documents are that all records from any location will appear in a single view that can be filtered and actions taken as required.

The negatives are that potentially thousands of documents might appear in this listing every single day, making it difficult to decide what can be deleted or not.

However, as it’s possible to filter by the retention policy, that at least should make it relatively easy to identify what can be destroyed. The more fine-grained the policies, the fewer records should appear.

Organisations that have function-based disposal classes should find that all records relating to the same function appear for disposal under that function.

Another potential negative is that records may not always appear in the same context, whether it be subject- or function-based. For example, a collection of documents (often known as a ‘file’) may not appear in the disposition listing as a collection but as a set of records that are only connected by the disposal policy name. Does this matter?

Recording disposal actions

A key requirement for most organisations is keeping a record of what was destroyed.

At the moment the only apparent option to do this is to apply filters and export the list, using the handy ‘Export’ option to keep a record of what was destroyed. That CSV file can then be stored in a control library to ensure a record is kept. This type of action requires a degree of control to ensure it happens every time.
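As a sketch of how an exported listing could be turned into a simple disposal register, the example below counts the exported items by retention label using Python's standard `csv` module. The column name ‘Label’ and the sample values are assumptions for illustration only; the actual columns in the Office 365 export may differ and should be checked against a real export.

```python
import csv
import io
from collections import Counter


def summarise_disposal_export(csv_text, label_column="Label"):
    """Count exported disposition items per retention label.

    'Label' is a hypothetical column name, not a confirmed export field.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter()
    for row in reader:
        # Group rows with a missing or empty label under a placeholder
        counts[row.get(label_column) or "(no label)"] += 1
    return dict(counts)
```

The resulting summary (label name and number of items) could be stored alongside the exported CSV file in the control library as a record of each disposal run.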

It may also be possible to identify what was destroyed – and by whom – in the audit logs. This is being investigated.

Posted in Digital preservation, Electronic records, Legal, Products and applications, Records management, Retention and disposal, SharePoint 2013

SharePoint 2013 Site Disposal Policies

SharePoint 2013 includes the option to set a disposal date on site collections. This article describes how to configure a SharePoint 2013 site collection to include a site disposal policy.

Default settings

A site cannot be deleted (either manually or automatically) unless a Site Policy has been set up (exception – the SharePoint Administrator has permissions to do this).

Without a Site Policy, the default settings under the Site Closure and Deletion option (see below) are as follows:

  • Site Closure – ‘Close this site now’ click box default: greyed out.
  • Site Deletion – ‘This site will be deleted on:’ Default: ‘Never’.
  • Site Policy – Default:  ‘No Site Policy’.

Setting up a Site Policy

New site policies are created under Site Settings > Site Collection Administration > Site Policies. Once created, the policy is applied under Site Settings > Site Administration > Site Closure and Deletion. While you can create multiple policies, only one policy can be selected at a time under the Site Closure and Deletion option.

There are no default policies; the first time Site Policies is opened, the Site Policies section provides only one option – ‘Create’. Each policy must have a Name and may have a Description. The name and description can be the class description from a records retention schedule, using ‘after date created’ or ‘after date closed’ as the triggers (see below).

Site Closure and Deletion options

There are three options under Site Closure and Deletion:

  • Do not close or delete site automatically. The default option.
  • Delete sites automatically. This option deletes a site on a pre-defined date after it was created or closed.
  • Close and delete sites automatically. This option first closes the site and then deletes it on pre-defined dates.

In addition there is a check box ‘Site Collection Closure’ that allows the site collection to be made read only when it is closed.

Delete sites automatically

When this option is selected the following appears:

  • Set Deletion Event. The two options provided are ‘Site closed date’ and ‘Site created date’, plus n days, months, or years.
  • (Check box) ‘Send an email notification to site owners this far in advance of deletion:’ (i.e., to warn them of the pending deletion) – n days, months or years. Default setting is 3 months.
  • (Check box) ‘Send follow-up notifications every:’ (i.e., to remind site owners of the pending deletion) – n days, months, or years. Default setting is 14 days.
  • (Check box) ‘Owners can postpone imminent deletion for:’ (i.e., to postpone the proposed deletion) – n days, months or years. Default setting is 1 month.
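To show how these settings interact, the sketch below works out the key dates for a site from the deletion event date, using the default values listed above (first notification 3 months in advance, follow-ups every 14 days, postponement of 1 month). It is an illustration of the arithmetic only. The function names are hypothetical, and the way SharePoint itself handles month lengths is not documented here, so the end-of-month clamping is an assumption.

```python
import calendar
from datetime import date, timedelta


def add_months(d, months):
    """Shift a date by whole months, clamping to the last day of the target month."""
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def deletion_schedule(event_date, retain_months, notify_months=3,
                      follow_up_days=14, postpone_months=1):
    """Return the deletion date, notification dates and postponed deletion date."""
    deletion = add_months(event_date, retain_months)
    first_notice = add_months(deletion, -notify_months)
    reminders = []
    reminder = first_notice + timedelta(days=follow_up_days)
    while reminder < deletion:  # follow-ups stop once the deletion date is reached
        reminders.append(reminder)
        reminder += timedelta(days=follow_up_days)
    return {
        "deletion": deletion,
        "first_notice": first_notice,
        "reminders": reminders,
        "deletion_if_postponed": add_months(deletion, postpone_months),
    }
```

For example, a site created on 31 January 2014 with a policy of ‘Site created date’ plus 7 years (84 months) would be due for deletion on 31 January 2021, with the first notification to site owners on 31 October 2020.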

Close and delete sites automatically

This option is identical to Delete Sites Automatically except that it also includes a date when the site can be closed – after which a deletion event date is set followed by the same three options above.

Site Closure and Deletion

As noted above, a Site Policy must exist before a site can be closed and deleted using these options. The Site Policy must be selected otherwise the default options (see above) apply.

  • If the Site Policy is based on the Delete Sites Automatically option, the option to ‘Close this site now’ becomes available. If the option ‘Site Closed Date’ was selected, the site will not be deleted (at the pre-defined time) until this option is selected. If the option ‘Site Created Date’ was selected there is no requirement to ‘manually’ close the site.
  • If the policy is based on the Close and Delete Sites Automatically option, the option to ‘Close this site now’ becomes available. This allows the site to be closed earlier, otherwise the deletion date will be automatically calculated from the site policy setting and displayed next to the Site Closure and the Site Deletion options.
  • If no policy is selected, the default settings will apply; this means that the site cannot be closed.

Further reading

Overview of site policies in SharePoint 2013 (Microsoft).

Posted in Digital preservation, Electronic records, Exchange 2010, Governance, Information Management, Legal, Products and applications, Records management, Retention and disposal

Applying recordkeeping policies to email – Microsoft Messaging Records Management (MRM)

The problem

The problem of managing emails as records is summed up in the following statements:

“Many organizations have yet to define an email retention policy. More than one‐quarter of organizations have not yet established any sort of email retention policy despite the fact that there are a growing body of statutory requirements and legal obligations to preserve business records, including those stored in email. Among the nearly three‐quarters of organizations that have established an email retention policy, only two‐thirds of these organizations indicate that their users are fully aware of the policy.” Michael Osterman, “Messaging Archiving and Document Management Markets Trends, 2009-20112”, dated May 2009.

‘Over 40 years after the invention of email, relatively few institutions have developed policies, implementation strategies, procedures, tools and services that support the longterm preservation of records generated via this transformative communication mechanism.’  Christopher J Prom, ‘Preserving Email’, DPC Technology Watch Report 11-01, December 2011. www.dpconline.org/component/docman/doc_download/739-dpctw11-01.pdf

Storing business records in context

Traditional records management theory recommends that there should be a clear relationship between records about a particular subject or issue, regardless of format, and the business context that gave rise to them. (AS ISO 15489-2002: 9.3 Records Capture)

In the paper world, this was achieved by the co-location of related records in a physical file.

In the electronic world, this is usually achieved through the application of metadata. Business classification and naming systems applied to electronic folders generally achieve this; electronic systems also support a range of cross-subject metadata that allows records to be organised in different contexts.

Additional, business context-specific metadata can be applied to emails (including from integrated business applications – for example, an email saved to TRIM will show the TRIM record number in its email metadata properties).  However, this ability (as with Properties in Office documents) is rarely enabled or used.

Instead, and as with Office documents, we tend to let users ‘categorise’ their emails (and documents on network shares) through folders – although not all users do this.  (Interestingly, online email systems like Google’s Gmail use labels, a form of tagging, instead of folders.)

Are emails documents?

The short answer is yes (in the Australian legal evidence context), but they are documents that, much like XML-based Office documents such as .docx, are made up of structured data that displays as a single ‘document’.

Part of the problem with emails as records is the perception (on the part of users who have never had to face court) that they are not documents, but messages.  The ability to use the system to send or receive ‘private’ messages exacerbates this perception.

The problem of storing emails as records

Emails have been a constant problem and challenge for records managers and recordkeeping since they first appeared in the early 1990s.

The three main approaches to keeping emails have been to (a) print to paper, (b) save to a recordkeeping system, and (c) save to a drive.

Print to paper, while relatively common in many organisations even now, is probably the poorest (and some might say ‘silliest’) option in the digital world as (a) it is dependent on users, (b) printed emails usually lose their message headers, (c) the printed copies cannot be searched electronically, and (d) the original emails remain on the Exchange system and are still discoverable.

Saving emails to a recordkeeping system, while better than printing, is an inadequate option because (a) it is usually dependent on users to do it, (b) the email still remains in the Exchange system, and (c) it can sometimes result in the email being saved in a different format that is not necessarily suitable for long-term preservation (e.g., TRIM’s .vmbx).  There is also the problem of users saving ‘dumb’ emails with (valuable) attachments, which can make the attachment harder to find, identify or access.  Some systems (such as SharePoint 2010) include email-enabled storage locations.

Chris Prom, in a posting on his ‘Practical E-Records’ blog titled ‘Facilitating the Generation of Archives in the Facebook Age’, notes that:

‘…the formal recordkeeping systems previously used by many organizations for electronic records have died or have one foot firmly in the grave.  At the same time, the habits that individuals use in producing, consuming, storing, filing, searching, and interpreting records are themselves undergoing constant change.  People adopt new communication technologies at an ever-quickening pace.   Divergent personal practices, rather than the centralized electronic systems, are the harsh reality that confronts our profession’.

Saving to a drive is also a poor option, and is usually driven by users’ own wish to ‘keep’ emails.  Emails saved to drives (a) will still remain in the Exchange system, (b) may lose their header information, and (c) are not necessarily saved in appropriate or accessible formats.

In relation to the last point, Outlook does not make the choice easy for an end user: there are usually five formats to choose from, so which is the right one?  Users will usually choose whatever is the default (.msg), but this isn’t necessarily the best long-term option (which is MIME or EML – the latter described by the National Archives of Australia (NAA) as ‘an acceptable open file format for long term storage’).
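To show why EML is regarded as an open format, here is a minimal sketch using Python’s standard `email` library. It is not how Outlook or Exchange export mail: the addresses, subject and function names are invented for illustration. The sketch serialises a message, headers included, to plain MIME text and reads it back.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def build_example_message() -> EmailMessage:
    """Build a small message with the headers a record would need to retain."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"          # hypothetical addresses
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Contract variation approved"
    msg["Date"] = "Fri, 01 Jun 2012 09:30:00 +1000"
    msg["Message-ID"] = "<example-1@example.com>"
    msg.set_content("The variation was approved at today's meeting.")
    return msg


def to_eml_bytes(msg: EmailMessage) -> bytes:
    """Serialise to the MIME text stored in an .eml file (CRLF line endings)."""
    return msg.as_bytes(policy=policy.SMTP)


def from_eml_bytes(data: bytes) -> EmailMessage:
    """Parse .eml content back into a message object."""
    return BytesParser(policy=policy.default).parsebytes(data)
```

Writing the bytes returned by `to_eml_bytes` to a file with an `.eml` extension gives a text file that any MIME-aware tool can open, with the message headers kept alongside the body.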

In all cases, keeping these emails in the business context to which they relate has been a constant problem for records managers.  As a consequence, there is a tendency on the part of almost all businesses to leave and manage emails where they are (i.e., in Exchange).

Microsoft Exchange 2010 – Messaging Records Management

To try to address this problem, Microsoft introduced ‘Mailbox Manager Policies’ in Exchange Server 2003.

This was followed by ‘Messaging Records Management’ based on Managed Folders in Exchange Server 2007 (a feature that remains in Exchange 2010).

Exchange Server 2010 adds a new model for managing emails as records under the same ‘Messaging Records Management’ name, based on retention tags and retention policies.  Microsoft describe it as follows:

‘Messaging records management (MRM) is the records management technology in Microsoft Exchange Server 2010 that helps organizations reduce the legal risks associated with e-mail. MRM makes it easier to keep the messages needed to comply with company policy, government regulations, or legal needs, and to remove content that has no legal or business value. This is accomplished through the use of retention policies or managed folders’. (Source: http://technet.microsoft.com/en-us/library/dd335093)

As Microsoft notes, however (on the same page), MRM does not prevent users from deleting messages; it is really only designed to remove them at the end of a given period.  Microsoft recommend ‘journaling’ emails where there are specific business reasons to keep them for longer (such as legal proceedings or the need to ensure specific email is kept), or applying the Legal Hold functionality.

The key elements of MRM are Retention Tags and Retention Policies.

There are three types of Retention Tags: (1) Default Policy Tags (DPTs), (2) Retention Policy Tags (RPTs), and (3) Personal Tags (which are an ‘opt-in’ on the email client).

  • Retention Policy Tags (RPTs) are used on default folders (e.g., inbox, junk mail, sent, deleted). Users cannot change the RPT but can override it with a Personal Tag.
  • Default Policy Tags apply automatically to untagged items, that is, items that have no tag applied directly or inherited from their folder.  A Retention Policy can contain only one default policy tag.
  • Personal Tags can be applied by users to their own custom folders or individual emails.

In most cases, users make the decision, and the retention that applies depends on where the email is located.  If there is actual or anticipated litigation, a Retention Hold can be applied to the user’s mailbox; however, this does not prevent users deleting emails. It only suspends the retention policies.  The Legal Hold option should be applied to prevent deletion.  Once this option is applied, Legal Hold ‘captures any deleted or edited items into a special folder that’s neither accessible nor changeable by the user’.

All retention tags include: a Tag Name, a Tag Type, an age limit (in days) with an action to take, and comments.

The actions available are:

  • Delete And Allow Recovery – This action will perform a hard delete, sending the message to the dumpster. The user will be able to recover the item using the Recover Deleted Items dialog box in Outlook 2010 or Outlook Web App.
  • Mark As Past Retention Limit – This action will mark an item as past the retention limit, displaying the message using strikethrough text in Outlook 2007, 2010 or Outlook Web App.
  • Move To Archive – This action moves the message to the user’s archive mailbox (see below).
  • Move To Deleted Items – This action will move the message to the Deleted Items folder.
  • Permanently Delete – This action will permanently delete the message, which cannot then be restored using the Recover Deleted Items dialog box.

Once the tags are created, they can be added to a Retention Policy and this policy, in turn, is then applied to specific mailboxes – one policy per mailbox.
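The relationship between tags, policies and items can be sketched as a small model. This is an illustrative Python sketch only, not Exchange code or its cmdlets: the class names, the precedence order (personal tag, then folder tag, then default tag) and the simple day count are assumptions based on the description above, and Exchange’s actual age calculation is more involved.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class RetentionTag:
    name: str
    tag_type: str        # "default", "folder" (an RPT) or "personal"
    age_limit_days: int
    action: str          # e.g. "Move To Archive", "Permanently Delete"


@dataclass
class RetentionPolicy:
    # One policy per mailbox; at most one default tag in this model.
    default_tag: Optional[RetentionTag] = None
    # RPTs keyed by default folder name, e.g. "Deleted Items".
    folder_tags: Dict[str, RetentionTag] = field(default_factory=dict)


@dataclass
class Item:
    folder: str
    received: date
    personal_tag: Optional[RetentionTag] = None


def effective_tag(policy: RetentionPolicy, item: Item) -> Optional[RetentionTag]:
    """Pick the tag that governs an item: personal, then folder, then default."""
    if item.personal_tag is not None:
        return item.personal_tag
    if item.folder in policy.folder_tags:
        return policy.folder_tags[item.folder]
    return policy.default_tag


def due_action(policy: RetentionPolicy, item: Item, today: date) -> Optional[str]:
    """Return the retention action due today, or None if the item is within its limit."""
    tag = effective_tag(policy, item)
    if tag is None:
        return None
    age_days = (today - item.received).days
    return tag.action if age_days >= tag.age_limit_days else None
```

In this model an email in Deleted Items with a 30-day folder tag is removed after 30 days, unless the user has applied a personal tag with a longer limit to that item.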

The ‘auto-tagging’ feature, once 500 items have been tagged, will automatically tag items in a user’s mailbox based on their past tagging activities.

So, is MRM the answer to managing emails as records? 

Yes and no.  From a recordkeeping perspective, MRM:

  • Does nothing to ensure that records are kept in the business activity or functional context to which they relate, unless (of course) the emails are the only form of record that exists for the business activity.
  • Does not stop users from deleting emails.

On a positive note, MRM:

  • Attempts to address the problem of email retention.
  • Allows the application of a retention policy to emails that might be stored in a business-context Outlook mailbox or folder. As well, Exchange features like Legal Hold and Journaling allow further controls to be implemented.

Archiving

Exchange 2010 now includes a ‘personal archives’ feature, which allows users to save emails to their own archive mailbox instead of saving emails to drives or using Personal Storage (.pst) files. A good article on this subject can be found at this location: http://mohamedridha.com/2011/11/07/exchange-2010-online-archiving-and-retention-tagspolicies-a-practical-example/

Sources (all retrieved 1 June 2012)

Posted in Conservation and preservation, Digital preservation, Electronic records, Records management, Retention and disposal

Ensuring long term access to digital information

(This is a version of an article written for the RMAA magazine Informaa Quarterly, due to be published in May 2010).
In February 2010, the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF-SDPA, brtf.sdsc.edu), a US-based group established in 2007 and funded by several private and public organisations, published a report titled ‘Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information’.
The BRTF-SDPA report examined the long-term preservation of digital information from an economic perspective, noting that ‘… economically sustainable preservation (of digital information) is … an urgent societal problem’.  The report quotes a 2008 IDC report stating that the volume of information now created exceeds all available storage.
The BRTF focussed its attention on digital information created within four key areas: scholarly discourse; research data; commercially owned cultural content; and collectively produced web content.  The report did not examine digital information produced by public sector agencies because there are already ‘… well articulated mandates for preservation and well defined organisations with clear roles and responsibilities’ to preserve the digital information produced by those agencies.
The report confirms the frequently cited preservation and conservation mantra that the main business case for preservation is use.  The dilemma for those making decisions about preservation is that access – and therefore use – is impossible without preservation; however, if there is no demand for access, there will be no preservation. What to preserve is the problem.
Identifying what should be preserved for later use takes significant effort and requires the agreement of a range of stakeholders: those who own the information, those who select it, those who preserve it, those who pay for preservation, and those who will eventually benefit.  The interests of these stakeholders need to be aligned as much as possible; and yet those who make preservation decisions now must attempt to do so without any real idea of what future stakeholders may want to access.
The report makes the point that a key threat to ‘persistent access’ is the costs involved, particularly where the costs outweigh the perceived benefits.
The report presents digital information as economic goods that have four essential attributes: the derived demand for access rather than preservation; their nature as depreciable durable assets that can suffer from physical degradation and loss of functionality; the ubiquity of access (known as ‘non rivalrous consumption’) which can lead to ‘free riding’; and the temporally dynamic and path-dependent nature of the digital preservation process throughout the lifecycle of the information.
These attributes, according to the report, mean that problems may be encountered aligning incentives to preserve among beneficiaries, owners and preservers.  The closer the alignment, the more likely that appropriate preservation actions will be taken.  Weak or misguided incentives to preserve are the greatest risk to preservation.
According to the report, the six key conditions necessary to ensure the economic sustainability of digital information are: recognising the benefits; selecting materials with long-term value; providing incentives for preservation; establishing effective governance arrangements; allocating resources; and ensuring that timely actions are taken before digital information is lost.
As the report notes, solving the economic challenges of digital preservation is neither easy nor insuperable.  A careful balance needs to be established between the perceived future  value of digital information, incentives for its preservation, and the roles and responsibilities of key stakeholders.