Posted in Conservation and preservation, Digital preservation, Electronic records, Records management, Retention and disposal

The challenge of identifying born-digital records

A recent ‘functional and efficiency’ review into the National Archives of Australia (also known as the ‘Tune Review’, published on 30 January 2021) noted the ‘rapid and ever-evolving challenges of the digital world’.

It stated that ‘the definition of a ‘record’ needs to reflect current international standards, be more directly applied to digital technologies, and more clearly provide for direct capture of records that are susceptible to deletion, such as emails, texts or online messages’.

The review also highlighted the difficulties associated with ingesting digital records ‘via manual intensive activities (due to lack of interoperable systems)’ and proposed a new model based on the ‘continuous automated appraisal of [Agency] digital records that would require a combination of artificial intelligence and skilled archivists’.

The review underlined the challenges of identifying and managing born-digital records, and the need for better solutions.

This post explores the challenges of accurately identifying born-digital records in order to manage them.

Identifying and protecting records

Records usually provide evidence of something that happened – an action, an activity or process, a decision, or a current state (including a photograph or video record). They may have or be associated with descriptive metadata used to provide context to the records and guide or determine retention.

Like all other types of evidence, the authenticity, integrity and reliability of records should be protected for as long as they must be kept.

In the paper world, this outcome was achieved by storing physical records (including the printed version of born-digital records) on paper files or in physical storage spaces.

For the past twenty years or so, this outcome was achieved for (some) digital records by (mostly manually) copying them from a network drive or email system (or via a connector) to a dedicated electronic records management (ERM) system and then ‘locking’ them in that system to prevent unauthorised change or deletion. Most ERM systems consisted of a database for the metadata and an associated network drive file store for the objects.

The main problem with this centralised storage model – however good it might be at protecting copies of records stored in it – was that the original versions, along with all the other records that were not identified or could not be copied to the ERMS, remained where they were created or captured.

And the records stored ‘in’ the ERMS were actually stored on a network file share on a server that was (a) accessible to IT, and (b) almost always backed up. So, yet more copies existed.

The challenge of born-digital records

There are several key challenges with born-digital records:

  • Consistently and accurately identifying (or ‘declaring’) all records in all formats created or captured in all locations. For too long, the focus has primarily been on emails and anything that can be saved to a network drive with the onus of identifying a record on end-users.
  • Ensuring their authenticity, reliability and integrity over time. For records stored in the ERMS, this has usually involved locking them from edit, including through the ‘declaration’ process, or preventing deletion. But in almost all cases, the original version (in email, on the network drives), could continue to be modified. Other records that were not identified or stored in an ERMS may be deleted.
  • Ensuring that born-digital records will remain accessible for as long as they are required.

It is not possible to consistently and accurately manually (or even automatically) identify every born-digital record that an organisation creates or captures to ensure their authenticity, reliability, integrity or accessibility over time. Only a small percentage of born-digital records are copied to an ERMS.

Records remain hidden in personal mailboxes, personal drives and third-party (often unauthorised) systems. Records may exist in multiple forms and formats, sometimes created or stored in ‘private’ systems or on social media platforms. They may take the form of text or instant messages or social networking posts and threads. They may be drawings, images, voice or video recordings.

Even if a record is identified, it is not always possible to save it to an ERMS. Text or instant messages on mobile devices are a case in point that has been a problem for at least two decades. More recent examples include chat messages, reactions (emojis, comments), and recordings of online meetings.

And even if a high percentage of born-digital records could be stored in the ERMS, the original versions will almost always remain where they were created or captured.

A different approach is needed.

Triaging records?

One approach to the problem would be to accept that not all records have equal value. That is, not all records need to be managed the same way.

To some degree, this way of thinking is already reflected in classes in the structure of records retention schedules and the attention paid to each:

  • Records that have permanent or archival value and need to be transferred to archival institutions.
  • Specific types of records that must be created or kept by the organisation for a minimum period (sometimes quite long but not ‘forever’), for legal, compliance or auditing purposes.
  • Records that are not subject to legal or compliance requirements but which the organisation decides to keep for a minimum period of time.
  • Everything else.

Triaging records means that they can be managed as required at each level, but nothing is missed. It requires a risk management approach.

For records of permanent value, or that are subject to legal or compliance requirements, it means ensuring that these records receive the most attention and that every effort is made to ensure that they are, and can be, identified (declared) and managed accordingly. This would include ensuring that it is possible to identify and capture these records in the systems used to create or capture them, for example, key emails.

A similar approach would be taken to records that need to be kept for legal, compliance or auditing purposes but with an understanding that some of these records (e.g., emails) may remain in the original system where they were created or captured. Technological solutions may be used to identify or tag these records. The destruction of these records should be subject to some form of review and a record kept of the approval and what was destroyed.

All other records would remain stored wherever they were created or captured, subject to minimum retention periods after which they can be destroyed without review – but with a record kept of the basic metadata of each record (including original storage location).
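The triage levels above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the tier names, the sample schedule entries and the mapping of the third class to 'destroy without review' are one reading of the paragraphs above, not a rule taken from any retention schedule.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    PERMANENT = 1        # permanent or archival value
    REGULATED = 2        # minimum period set by legal, compliance or audit rules
    BUSINESS = 3         # minimum period chosen by the organisation
    EVERYTHING_ELSE = 4


@dataclass(frozen=True)
class Treatment:
    declare: bool        # actively identify (declare) the record
    disposal: str        # what happens at the end of retention


# Hypothetical mapping of tiers to handling (an assumption for illustration).
TREATMENT = {
    Tier.PERMANENT: Treatment(True, "transfer to archival institution"),
    Tier.REGULATED: Treatment(True, "destroy after review"),
    Tier.BUSINESS: Treatment(False, "destroy without review"),
    Tier.EVERYTHING_ELSE: Treatment(False, "destroy without review"),
}

# Hypothetical schedule entries; a real schedule would be far larger.
SCHEDULE = {
    "board minutes": Tier.PERMANENT,
    "invoices": Tier.REGULATED,
    "project notes": Tier.BUSINESS,
}


def triage(record_class):
    """Return the tier and treatment for a record class; nothing is missed."""
    tier = SCHEDULE.get(record_class, Tier.EVERYTHING_ELSE)
    return tier, TREATMENT[tier]


tier, treatment = triage("chat reaction")
print(tier.name, "->", treatment.disposal)
```

The point of the default is that anything not listed still lands in a tier, so every record has a defined minimum treatment even if it is never declared.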

Protecting – or proving – the authenticity, integrity and reliability of records

The assumption behind the protection of records is that they should not be changed or deleted.

The reality, with digital records, is that they may change at any time through new threads, new revisions, new chats, or even through photoshopping.

A more realistic approach may be to use information about what was changed, by whom, and when – not to protect the record but to provide an evidentiary trail to prove what it is or was. The ‘smoking gun’ evidence for most born-digital records is the metadata that is recorded when it was captured or modified, not (necessarily) the added descriptive metadata.

For example:

  • Someone may author a document (metadata records each revision, and each revision can be viewed).
  • The document may be approved electronically (recorded in metadata).
  • Someone then modifies the approved version.
  • All of the above is recorded in the ‘modified’, ‘modified by’ and approval metadata.
  • The record should (or may) also record who viewed the record, and when.

EXIF metadata stored on images provides a similar form of evidence (and may even include GPS information).
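The document example above can be sketched as a small event trail. This is a minimal illustration, assuming a hypothetical `Event` structure and action names; it does not represent the metadata model of any particular system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    when: datetime
    who: str
    action: str  # "revised", "approved", "modified" or "viewed"


def changes_after_approval(trail):
    """Return the changes recorded after the most recent approval."""
    ordered = sorted(trail, key=lambda event: event.when)
    approvals = [event for event in ordered if event.action == "approved"]
    if not approvals:
        return []
    approved_at = approvals[-1].when
    return [
        event for event in ordered
        if event.when > approved_at and event.action in ("revised", "modified")
    ]


# Hypothetical trail for one document.
trail = [
    Event(datetime(2021, 2, 1, 9, 0), "author", "revised"),
    Event(datetime(2021, 2, 2, 14, 30), "approver", "approved"),
    Event(datetime(2021, 2, 3, 8, 15), "editor", "modified"),
    Event(datetime(2021, 2, 3, 8, 20), "reader", "viewed"),
]
late_changes = changes_after_approval(trail)
for event in late_changes:
    print(event.who, event.action, event.when.isoformat())
```

The record itself was not locked, yet the trail still shows that the approved version was changed, by whom, and when.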

Which record is more likely to be accepted as evidence:

  • A record stored in an EDRMS, versions or revisions of which may exist in multiple other places, including on network file shares, email systems and even backup tapes, or
  • A record stored in a system that shows the full set of metadata about access and changes, or the most recent thread of an email discussion?


At the end of the day, it should be possible to confirm the authenticity, reliability and integrity of records based on information/metadata that forms part of the born-digital record: who created it and when, the context in which it was created and its relationship with other records.

Perhaps, instead of focussing on trying to identify and capture all born-digital objects that might be records and ‘protecting’ a version of that record, it may be more practical and easier to leave most records where they were created or captured (and retained by retention policies) and use change or revision metadata to provide evidence of authenticity.

This may, in the end, be a much easier way to protect the authenticity of records than having to rely on manual identification or declaration.

Posted in Archiving third party content, Connectors, Conservation and preservation, Electronic records, Information Management, Microsoft 365, Microsoft Graph, Records management, Retention and disposal, Solutions

Using Microsoft 365 connectors to support records management

Microsoft 365 includes a range of connectors, in three categories, that can be used to support the management of records created by other applications. The three categories are:

  • Search connectors, that find content created by and/or stored in a range of internal and external applications, including social media.
  • Archive connectors, that import and archive content created by third-party applications.
  • API connectors, that support business processes such as capturing email attachments.

This post explores how these connectors can assist with the management of records.

The recordkeeping dilemma

Finding, capturing and managing records across an ever increasing volume of digital content and content types has been one of the biggest challenges for recordkeeping since the early 2000s.

The primary method of managing digital records for most of the past 20 years has been to require digital records (mostly emails and other digital content created on file shares) to be saved to or stored in an electronic document and records management system (EDRMS). The EDRMS was established as ‘the’ recordkeeping system for the organisation.

EDRM systems were also used to manage paper records which, over the past 20 years, have mostly contained the printed version of born-digital records that remain stored in the systems where they were created or captured.

There were two fundamental flaws in the EDRMS model. The first was an expectation that end-users would be willing to save digital records to the EDRMS. The second was that the original digital record remained in place where it was created or captured, usually ignored but often the source of rich pickings for eDiscovery.

The introduction of web-based email and document storage systems, smart phones, social media and personal messaging applications from around 2005 (in addition to already existing text messaging/SMS messages) further challenged the concept of a centralised recordkeeping system; in many cases, the only option to save these records was to print and scan, screenshot and save the image, or save to PDF, none of which were particularly effective in capturing the full set of records.

The hasty introduction from early 2020 of ‘work from home’ applications such as Zoom and Microsoft Teams has been a further blow to these methods.

In place records management

To the chagrin of records managers around the world, Microsoft never made it easy to save an email from Outlook to another system. Emails stubbornly remained stored in Exchange mailboxes with no sign of integration with file shares.

And for good reason – they have a different purpose and architecture to support that purpose. It would be similar to asking when it would be possible to create and send an email in Word.

The introduction of Office 365 (later Microsoft 365) from the mid 2010s changed the paradigm from a centralised model – where records were all copied to a central location and the originals left where they were created or captured, to a de-centralised or ‘in place’ model – where records are mostly left where they were created or captured.

The decentralised model does not exclude the ability to store copies of some records (e.g., emails) in other applications (e.g., SharePoint document libraries), but these are exceptions to the general rule.

It also does not exclude the ability to import or migrate content from third-party applications where necessary for recordkeeping purposes.

Microsoft 365 connectors

Microsoft 365 includes a wide range of options to connect with both internal and external systems. Many of these connectors simplify business processes and support integration models.

Connectors may also be used to support recordkeeping requirements, in three broad categories.

The three connectors

Archive connectors

Archive connectors allow organisations to import and archive data from third-party systems such as social media, instant messaging and document collaboration* platforms. Most of this data will be stored in Exchange mailboxes, where it can be subject to retention policies, eDiscovery and legal holds.

(*This option is still limited via connectors, but also see below under Search).

The social media and instant messaging data that can be archived in this way currently includes Facebook (business pages), LinkedIn company page data, Twitter, Webex Teams, Webpages, WhatsApp, Workplace from Facebook, Zoom Meetings. For the full listing, and a detailed description of what is required to connect each service, see this Microsoft description ‘Archive third-party data‘.

An important thing to keep in mind is that the data will be archived to an Exchange mailbox; this will require an account to be created for the purpose. Any data archived to the mailbox will contribute to the overall storage quotas.

Search connectors

Search connectors (also known as Microsoft Graph connectors) index third-party data that then appears in Microsoft search results, including via Bing (the ‘Work’ tab) and via SharePoint Online.

Most ECM/EDRM systems are listed, which means that organisations that continue to use those systems can allow end-users to find content from a single search point, only surfacing content that users are permitted to see.

The following is an example of what a Bing search looks like in the ‘Work’ tab (when enabled).

Example Bing search showing the Work tab

Note: as at 17 November 2020, Microsoft’s page ‘Overview of Microsoft Graph connectors‘ (which includes a very helpful architecture diagram) states that these are ‘currently in preview status available for tenants in Targeted release.’

There are two main types of search connector:

  • Microsoft built: Azure Data Lake Storage Gen2, Azure DevOps, Azure SQL, Enterprise websites, MediaWiki, Microsoft SQL, and ServiceNow.
  • Partner built. Includes the following on-premise and online document management/ECM/EDRM connectors – Alfresco, Alfresco Content Services, Box, Confluence, Documentum, Facebook Workplace, File Share (on prem), File System (on prem), Google Drive, IBM Connections, Lotus Notes, iManage, MicroFocus Content Manager (HPE Records Manager, HP TRIM), Objective, OneDrive, Open Text, Oracle, SharePoint (on prem), Slack, Twitter, Xerox DocuShare, Yammer

See the ‘Microsoft Graph connectors gallery’ web page for the full set of current connectors.

A consideration when deploying search connectors is the quality of the data that will be surfaced via searches. Duplicate content is likely to be a problem in identifying the single – or most recent – source of truth of any particular digital record, especially when the organisation has required records to be copied from one system (mailbox/file share) to another (EDRMS).
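One way to gauge the duplicate problem before surfacing several systems in search is to group items by a hash of their content. The sketch below is an assumption-laden illustration (the locations and the `find_duplicates` helper are hypothetical) and is not part of any Microsoft connector.

```python
import hashlib
from collections import defaultdict


def find_duplicates(items):
    """Group (location, content) pairs by the SHA-256 digest of the content."""
    groups = defaultdict(list)
    for location, content in items:
        groups[hashlib.sha256(content).hexdigest()].append(location)
    # Only digests seen in more than one location are duplicates.
    return [locations for locations in groups.values() if len(locations) > 1]


# Hypothetical content held in three systems.
items = [
    ("mailbox/attachment/plan.docx", b"project plan v1"),
    ("fileshare/projects/plan.docx", b"project plan v1"),
    ("edrms/store/8f3a21", b"project plan v1"),
    ("fileshare/projects/plan-v2.docx", b"project plan v2"),
]
duplicates = find_duplicates(items)
print(duplicates)
```

This only finds byte-identical copies. An email saved to an EDRMS as a PDF, or a document whose properties were changed on upload, would hash differently and would need other matching methods.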

API Connectors

API connectors provide a way for Microsoft 365 to access and use content, including in third-party applications. To quote from the Microsoft ‘Connectors‘ web page:

‘A connector is a proxy or a wrapper around an API that allows the underlying service to talk to Microsoft Power Automate, Microsoft Power Apps, and Azure Logic Apps. It provides a way for users to connect their accounts and leverage a set of pre-built actions and triggers to build their apps and workflows.’

To see the complete list and for more information about each connector, see the Microsoft web page ‘Connector reference overview‘.

Each connector provides two things:

  • Actions. These are changes initiated by an end-user.
  • Triggers. There are two types of triggers: Polling and Push. Triggers may notify the app when a specific event occurs, resulting in an action. See the above web page for more details.

API connectors can support records management requirements in different ways (such as triggering an action when a specific event occurs) but they should not be confused with archiving or search connectors.
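The difference between polling and push triggers can be shown with a generic sketch. The class and variable names are hypothetical and this is not the Power Automate or Logic Apps API; it only illustrates who initiates the check.

```python
class Source:
    """Hypothetical system that holds items and can notify subscribers."""

    def __init__(self):
        self.items = []
        self.subscribers = []

    def add(self, item):
        self.items.append(item)
        for callback in self.subscribers:
            callback(item)  # push: the source calls out as the event occurs


class PollingTrigger:
    """Asks the source for anything new each time poll() is called."""

    def __init__(self, source, action):
        self.source = source
        self.action = action
        self.seen = 0  # number of items already processed

    def poll(self):
        new_items = self.source.items[self.seen:]
        self.seen = len(self.source.items)
        for item in new_items:
            self.action(item)


polled, pushed = [], []
source = Source()
poller = PollingTrigger(source, polled.append)
source.subscribers.append(pushed.append)

source.add("email with attachment")
before_poll = (list(polled), list(pushed))  # push has fired, polling has not
poller.poll()
after_poll = (list(polled), list(pushed))
print(before_poll, after_poll)
```

For recordkeeping, the practical difference is timing: a push trigger can act when the event happens, while a polling trigger acts on the next check, so an item deleted between checks could be missed.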

Summing up

The connectors available in Microsoft 365 support the model of keeping records in place where they were first created or captured. They enable the ability to archive data from third-party cloud applications, search for data in those (and on-premise) applications, and trigger actions based on events.

The use of connectors should be part of an overall strategic plan for managing records across the organisation. This may include a business decision to continue using an ECM/EDRMS in addition to the content created and captured in Microsoft 365. Ideally, however, the content in the ECM/EDRMS should not be a copy of what already exists in Microsoft 365.

Posted in Access controls, Conservation and preservation, Digital preservation, Electronic records, Exchange 2010, Exchange 2013, Exchange Online, Information Management, Records management, Retention and disposal, XML

The enduring problem of emails as records

Ever since emails first appeared as a way to communicate more than 30 years ago they have been a problem for records management, for two main reasons.

  • Emails (and attachments) are created and captured in a separate (email) system, and are stored in mailboxes that are inaccessible to records managers (a bit like ‘personal’ drives).
  • The only way to manage them in the context of other records was/is to print and file or copy them to a separate recordkeeping system, leaving the originals in place.

Thirty-plus years of email has left a trail of mostly inaccessible digital debris. An unknown volume of records remains locked away in ‘personal’ and archived mailboxes. Often, the only way to find these records is via legal eDiscovery, but even that can be limited in terms of how far back you can go.

Options for the preservation of legacy emails

The Council on Library and Information Resources (CLIR) published a detailed report in August 2018 titled ‘The Future of Email Archives: A Report from the Task Force on Technical Approaches to Email Archives’.

The report noted (from page 58) three common approaches to the preservation of legacy emails:

  • Bit-Level Preservation
  • Migration (to MBOX, EML or even XML)
  • Emulation
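As an illustration of the migration approach, the sketch below uses Python's standard `mailbox` and `email` modules to split an MBOX file into individual EML files. The function name, file names and sample message are hypothetical, and this is not the method described in the CLIR report. A real migration would also need to deal with attachments, encodings and verification of the output.

```python
import email
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def mbox_to_eml(mbox_path, out_dir):
    """Write each message in an MBOX file to its own .eml file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    box = mailbox.mbox(str(mbox_path))
    try:
        for index, message in enumerate(box):
            target = out_dir / f"message-{index:05d}.eml"
            target.write_bytes(message.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written


# Build a small sample MBOX file, convert it, then read one EML back.
with tempfile.TemporaryDirectory() as tmp:
    mbox_path = Path(tmp) / "legacy.mbox"
    sample = EmailMessage()
    sample["From"] = "sender@example.com"
    sample["To"] = "recipient@example.com"
    sample["Subject"] = "Board minutes"
    sample.set_content("Minutes attached.")

    box = mailbox.mbox(str(mbox_path))
    box.add(sample)
    box.flush()
    box.close()

    written = mbox_to_eml(mbox_path, Path(tmp) / "eml")
    count = len(written)
    subject = email.message_from_bytes(written[0].read_bytes())["Subject"]

print(count, subject)
```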

In a follow-up article published in March 2020 in the Australian IDM magazine, one of the CLIR report authors (Chris Prom) suggested that PDF may be (or become) a more suitable long-term solution for the preservation of legacy emails. The article was titled ‘The Future of Past Email is PDF’.

Preservation is one thing, what about access

There is little point in preserving important records if they cannot be accessed. The two must go together. In fact, preservation without the ability to access a record is not a lot different from destruction through negligence.

Assuming emails can be migrated to a long-term and accessible format, what then?

No-one (except possibly well-funded archival institutions) is seriously likely to attempt to move or copy individual legacy emails to pre-defined and pre-existing containers or aggregations of other records. This would be like printing individual emails and storing them in the same paper file or box in which other records on the same subject are stored.

Access to legacy emails in a digitally accessible, metadata-rich format like PDF provides a range of potential opportunities to ‘harvest’ and make use of the content, including through machine learning and artificial intelligence.

These options have been available for close to twenty years in the eDiscovery world, but to support specific legal requirements.

Search, discovery and retention/disposal tools available in the Microsoft 365 Compliance portal, along with the underlying Graph and AI tools (including SharePoint Syntex) provide the potential to manage legacy content, including emails.

The starting point is migrating all those old legacy emails to an accessible format.

Posted in Compliance, Conservation and preservation, Electronic records, Governance, Information Management, Information Security, Legal, Records management, Retention and disposal, Security

Destroying digital records – are they really destroyed?

Most people should be aware that pressing the ‘delete’ option for a file stored on a computer doesn’t actually delete the item, it only makes the file ‘invisible’. The actual file is still accessible on the disk and can be retrieved relatively easily or using forensic tools until the space it was stored on is overwritten.

Traditional legacy electronic document and records management (EDRM) systems have two components:

  • A database (e.g., SQL, Oracle) where the metadata about the records are stored
  • A linked file share where the actual objects are stored, most of which are copies of emails or network file share files that remain in their original location.

In most on-premise systems, email mailboxes, network file shares, and the EDRMS database and linked file share are likely to be backed up.

When a digital record comes to the end of its retention and is subject to a ‘destruction’ process, how do you know if the record has actually been destroyed? And even if it is, how can you be sure that the original isn’t still stored in a mailbox, network file share, or a back up?

This post examines what actually happens when a file is ‘deleted’ from a Windows NT File System (NTFS), and questions whether digital records stored in an EDRMS are really destroyed at the end of the retention period.

The Windows NTFS Master File Table (MFT)

Details of every file stored on a computer drive will be found in the NTFS Master File Table (MFT).

In some ways, the MFT operates like a traditional electronic document management system – it is a kind of database that records metadata about the attributes of the digital objects stored on the drive. These attributes include the following:


As noted in the diagram above, the details stored by the MFT include the $File_Name and $Data attributes.

  • The $File_Name attributes include the actual name of the file as well as when it was created and modified, and its size.  This is the information that can be seen via File Explorer and is often copied to the EDRMS metadata.
  • The $Data attribute contains details of where the actual data in the file is stored on the disk (in 0s and 1s) or the complete data if the file is small enough to fit in the MFT record.

If the MFT record has many attributes or the file data is stored in multiple fragments on a disk (for example as a file is being edited), additional MFT ‘extension’ records may be created.

When a file is deleted, the MFT records the deletion.

  • If the file is simply deleted, the record will remain on the disk and can be recovered from the Recycle Bin.
  • If the file is deleted through SHIFT-DEL or by emptying the Recycle Bin, the MFT record is updated to the ‘Deleted’ state and the cluster bitmap is updated to mark the file’s clusters (where the data is stored) as free for reuse. The MFT record remains until it is re-used or the data clusters are allocated in whole or part to another file.

So, in summary, ‘deleting’ a file does not actually delete it. It may either:

  • Store the file in the Recycle Bin, making it relatively easy to recover, or
  • Change the MFT record to show the file as being deleted but leave the file data on the disk until it is overwritten.
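The behaviour summarised above can be illustrated with a toy model. This is not real NTFS: the `ToyVolume` class is hypothetical, each file occupies a single cluster, and the MFT is reduced to a dictionary. It only shows why data survives a 'delete' until the space is reused.

```python
class ToyVolume:
    """Very simplified model of an MFT plus cluster bitmap (not real NTFS)."""

    def __init__(self, cluster_count):
        self.clusters = [b""] * cluster_count
        self.in_use = [False] * cluster_count  # the cluster bitmap
        self.mft = {}                          # file name -> MFT-style record

    def write(self, name, data):
        index = self.in_use.index(False)       # first free cluster
        self.clusters[index] = data
        self.in_use[index] = True
        self.mft[name] = {"cluster": index, "deleted": False}

    def delete(self, name):
        record = self.mft[name]
        record["deleted"] = True                # the record is only flagged
        self.in_use[record["cluster"]] = False  # space marked free for reuse

    def read_cluster_of(self, name):
        """What a forensic tool would find at the file's old location."""
        return self.clusters[self.mft[name]["cluster"]]


volume = ToyVolume(cluster_count=2)
volume.write("minutes.docx", b"approved minutes")
volume.delete("minutes.docx")
after_delete = volume.read_cluster_of("minutes.docx")     # data still present

volume.write("notes.txt", b"unrelated notes")             # reuses the free cluster
after_overwrite = volume.read_cluster_of("minutes.docx")  # original data gone

print(after_delete, after_overwrite)
```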

How does an EDRMS store and manage files?

The following summary relates to a well-known Electronic Document and Records Management System (EDRMS). Other systems may work differently but the point is that records managers should understand exactly how they work and what happens when electronic files are destroyed at the end of a retention period.

Most EDRM systems are made up of two parts:

  • A database (SQL, Oracle etc) to store the metadata about the record.
  • An attached file store that stores the actual digital objects.

When EDRM systems are used to register paper or physical records (files and boxes), only the database is used.

When digital records are uploaded to the EDRMS:

  • The metadata in the original file, including the file type, original file name, date created, date modified and author are ‘captured’ by the system and recorded in the new database record.
  • Additional metadata may be added, including a content or record ‘type’.
  • The record will usually be associated with a ‘container’ (e.g., ‘file’). This containment makes the record appear to be ‘contained’ within that container, whereas in fact it is simply a metadata record of an object stored elsewhere.
  • The original record filename is changed to random characters (to make it harder to find, in theory) and then stored on the attached (usually Windows NTFS) file store, often in a series of folders.
  • A link is made between the database record and the record object stored in the file store (the MFT record).

When the end-user opens the EDRMS, they can search for or navigate to containers/files and see what appears to be the digital objects ‘stored’ in that container/file. In reality, they are seeing a link to the object stored (randomly) in the file store.
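The two-part model described above can be sketched with a small database and a folder acting as the file store. This is a hypothetical illustration (the table layout, the `capture` function and the file names are assumptions), not the design of any specific EDRMS.

```python
import secrets
import sqlite3
import tempfile
from pathlib import Path


def capture(db, store, source, container):
    """Copy a file into the store under a random name and register it."""
    stored_name = secrets.token_hex(8)  # random name replaces the original
    (store / stored_name).write_bytes(source.read_bytes())
    db.execute(
        "INSERT INTO records (original_name, container, stored_name) "
        "VALUES (?, ?, ?)",
        (source.name, container, stored_name),
    )
    return stored_name


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    share, store = root / "share", root / "store"
    share.mkdir()
    store.mkdir()
    original = share / "contract.docx"
    original.write_bytes(b"signed contract")

    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE records (original_name TEXT, container TEXT, stored_name TEXT)"
    )
    capture(db, store, original, "Contracts 2021")

    # The 'container' is only a column value; the object lives in the store.
    container, stored_name = db.execute(
        "SELECT container, stored_name FROM records"
    ).fetchone()
    stored_copy_matches = (store / stored_name).read_bytes() == b"signed contract"
    original_still_exists = original.exists()
    db.close()

print(container, stored_copy_matches, original_still_exists)
```

After capture there are two copies: the one in the store, linked from the database, and the original on the share, which the system knows nothing more about.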

What happens when an EDRMS record is destroyed?

If there is no requirement to extend their retention, or keep them on a legal hold, records may be destroyed at the conclusion of a retention period.

For physical records, this usually means destroying the physical objects so they cannot be recovered, a process that may include bulk shredding or pulping.

For digital records, however, there may be less certainty about the outcome of the destruction. While the EDRMS may flag the record as being ‘destroyed’, it is not completely clear whether the destruction process has actually destroyed the records and overwritten them in a way that ensures their destruction to the same level as destroyed paper files. Several factors add to this uncertainty:


  • If the original associated NTFS file share becomes full and a new one is used, the original is likely to be made read only.
  • There is likely to be a backup of the EDRMS.
  • The original records uploaded to the EDRMS probably continue to exist on network files shares, in email, or in back up tapes.
  • Digital forensics can be used to recover ‘deleted’ files from the associated file share.

Consider this scenario:

  • An email containing evidence of something is saved to a container in an EDRMS.
  • The container of records is ‘destroyed’ after the retention period expires.
  • A legal case arises after the container is ‘destroyed’.
  • A subpoena is made for all records, including those specific records.
  • Has the record actually been destroyed, or could it still be recoverable, including from backups or the digital originals?

Is it really possible to destroy digital records, and does it matter?

Yes, records can be destroyed by overwriting the cluster where the record is kept, and some EDRM systems may offer this option.


  • Do EDRM systems overwrite the cluster when a digital record is destroyed in line with your records retention and disposal authorities, or simply mark the record as being deleted, when it is still technically recoverable?
  • Could the record still exist in the network file shares or email, or in backups of these or the EDRMS?
  • Might it be possible to recover the record with digital forensics tools?
  • Does it matter?

It might be worth asking IT and your EDRMS vendor.
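For context when asking those questions, the sketch below shows what 'overwrite, then delete' means at the level of a single file. It is a minimal illustration with a hypothetical function name, not how any EDRMS implements destruction. Overwriting in place does not guarantee erasure on SSDs with wear levelling, on copy-on-write or journaling file systems, or in backups and other copies.

```python
import os
import tempfile
from pathlib import Path


def overwrite_and_delete(path, passes=1):
    """Overwrite a file's bytes in place with random data, then remove it."""
    path = Path(path)
    size = path.stat().st_size
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            handle.write(os.urandom(size))
            handle.flush()
            os.fsync(handle.fileno())  # ask the OS to push the write to disk
    path.unlink()


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "3f9a1c.bin"  # hypothetical object in a file store
    target.write_bytes(b"record due for destruction")
    overwrite_and_delete(target)
    gone = not target.exists()

print(gone)
```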




Posted in Classification, Conservation and preservation, Electronic records, Information Management, Records management, Retention and disposal, SharePoint Online

SharePoint Content Types – A records management view

SharePoint Content Types (CTs) have been a fundamental element in SharePoint architecture since they were introduced in 2007. They are also the cause of some divided opinions, in favour of (a) using multiple, ‘custom’ CTs to control content, or (b) minimising their use in favour of alternative options such as metadata choices.

A recent tweet from Nate Chamberlain (@chambernate) read as follows:

You can have multiple content types in lists and libraries. They can all have unique metadata, templates, forms, retention policies, workflows, etc. Content types are EVERYTHING.

While CTs can do a lot as suggested by Nate, it is important to understand what they are, and what they can do, as part of an overall architecture model if you are planning to use SharePoint to manage records.

SharePoint administrators (and information architects and others) may have different views on this subject. There is, on one hand, the view that multiple ‘custom’ CTs are a good thing as they provide more control over content. On the other hand, there is the view that too many CTs are a bad thing because they are not easy to implement (see below for details) and it makes the environment harder to manage.

The key is getting the balance right. In my own opinion, based on a decade of working with SharePoint, it is better to create custom CTs only where they provide specific useful functionality and otherwise to use metadata columns and/or folders.

Content Types have been around since 2007

Content Types (CTs) were introduced in SharePoint 2007 (SP2007, usually known as MOSS2007), the immediate precursor to SP2010. CTs, it was said, would allow organisations to:

  • Have multiple document types associated with a single library.
  • Have a different document template in each CT.
  • Have different workflows and metadata for each CT.

List CTs could also be used to capture different metadata in the same list.

What are Document Types?

Examples of document types might include:

  • Contracts
  • HR documents (e.g., specific to a staff member)
  • Invoices
  • Plans
  • Minutes

Most organisations will have (and share) a relatively standard set of document types.

While it is tempting to create a CT for each document type (similar to the way some document management systems do this), in my experience this can create considerable overhead for very little benefit.

In my view, document types should only ‘map’ to custom ‘content types’ if there is a reason to use CTs for a specific purpose (see below for examples).

Otherwise, it is far easier and more efficient (in terms of management) to create additional metadata columns including, if required, a ‘choice’ metadata column to define the document type.

Every Content Type has a parent

Every new site collection includes a range of default CTs and each has a parent. The primary default CTs are:

  • ‘Document’, ‘Folder’ (and, where enabled, ‘Document Set’) for document libraries.
  • ‘Item’ for lists.
  • ‘Event’ (parent is ‘Item’) for calendars.

Your starting point, when thinking about CTs, is whether you will (a) use the default CTs or (b) create custom CTs – or (c) a combination of both. Most organisations will have a combination.
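The parent relationship described above can be modelled as a simple chain, where each CT inherits the columns of its parent. The following is a minimal sketch only: the `ContentType` class, the column names and the ‘Staff File’ custom CT are illustrative assumptions, and nothing here calls SharePoint itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContentType:
    """Hypothetical model of a CT: a name, an optional parent, and its own columns."""
    name: str
    parent: Optional["ContentType"] = None
    columns: tuple = ()

    def lineage(self):
        """Return the CT names from this CT up to its root parent."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def all_columns(self):
        """Return columns inherited from the parents, followed by this CT's own."""
        inherited = self.parent.all_columns() if self.parent else ()
        return inherited + self.columns


# Default CTs (column names are illustrative, not a full list).
item = ContentType("Item", columns=("Title",))
document = ContentType("Document", parent=item, columns=("Name",))
event = ContentType("Event", parent=item, columns=("Start Time", "End Time"))

# A custom CT (hypothetical) based on the default 'Document' CT.
staff_file = ContentType("Staff File", parent=document, columns=("Staff ID",))

print(staff_file.lineage())
print(staff_file.all_columns())
```

The point of the sketch is that a custom CT never stands alone: whatever is defined on its parents comes with it, which is one reason a large number of custom CTs becomes harder to manage.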

By way of reference, we had 500 site collections and fewer than 30 custom CTs, each used for a specific purpose. We did not use the CT Hub but created CTs directly on the sites where they were required.

How many custom Content Types do you really need?

The answer is, ‘it depends’.

The case for multiple custom CTs:

  • Custom CTs allow organisations to control specific types of content at a more granular level than the document library.
  • Having multiple custom CTs requires more overhead to create, implement and manage. First, each custom CT must be created via Site Settings. Second, in the settings of each document library where the CT is to be used, the option to allow management of CTs must be enabled. Finally, each CT must be added to each document library (or list).
  • Questions to consider: Who will enable the library setting and then add each CT that is required? Who will manage metadata conflicts if different CTs use the same or very similar metadata? How will end users react to having to select a CT every time they just want to save a document (including via synced libraries in File Explorer)?

The case for fewer CTs:

  • Fewer or more selective use of custom CTs allows organisations to manage content at the logical aggregation of a document library rather than CT-linked content within that aggregation, using custom CTs only where necessary. This requires less overhead to manage.
  • Site Collection Administrators or Site Owners can create CTs when and only if they are needed.
  • Site Owners (and end users) can more easily manage content within a document library by using folders and/or metadata to group content and/or apply workflows.
  • Choice metadata columns (e.g., for ‘Document type’) may be a better option.
  • Site Owners can create document libraries with names that comply with naming conventions and apply retention policies to the entire library.

The bottom line – just because you can create multiple custom CTs, it doesn’t mean you should. It is important to understand the broader architecture of the entire SP environment and what you are seeking to achieve.

As a general recommendation, I would suggest creating only the CTs that you really need (based on a content management design), and consider using separate libraries (or sites) and/or metadata to define content types. For example, consider having a single library called ‘Meetings’ or ‘Policies’ with useful metadata columns.

More custom CTs will mean a more complex environment to manage, especially from a records management point of view.

How are Content Types created and then applied?

New custom CTs are created via Site Settings – Site Content Types.


Every new CT is based on an existing (‘parent’) CT that exists in a grouping of ‘parent content types’. For example:

  • The ‘Document’ CT is found in the ‘Document Content Types’ group. Surprisingly that group also includes ‘Master Page’, ‘Site Page’ among others.
  • The ‘Document Set’ CT is the only option in the ‘Document Set Content Types’ group.
  • The ‘Item’ CT is one of multiple options in the ‘List Content Types’ group. That group also includes ‘Event’, found in calendars; ‘Event’ is the child of ‘Item’.

Once the new CT is created, additional columns can be added to it, including from the Managed Metadata Service (MMS), if used. The new, custom CT can now be added anywhere on the site.

Two steps are required to add a CT to a document library or list:

  • Go to the library or list settings and click on ‘Advanced’.
  • In the ‘Advanced’ section, click on the option to ‘Allow management of content types’.


Note that enabling this option has an impact on the ‘+ New’ option in the library ribbon menu – see below.

Once enabled, a new ‘Content Types’ section appears in the Library Settings, displaying all the CTs now available in the library. Initially, only the standard ‘Document’ CT will appear, shown as the default option.

Someone (SP Admin, Site Collection Admin or Site Owner) must now click on ‘Add from existing site content types’ and choose the CT to be added. This action could probably be scripted; however, it is a one-off event.

Once the custom CTs are added, another (but only one) CT may be set as the default. The others are visible by default but can be made invisible if required.

Any unique columns in any of the CTs will now appear in the ‘Columns’ section, showing the source of the column (‘Used in’).
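The sequence above (enable management of CTs, add an existing site CT, then choose a default) can be summarised in a small in-memory model. This is a sketch under stated assumptions: the `Library` class and its method names are hypothetical, it does not call any SharePoint API, and listing the default CT first in the menu is an assumption of the model.

```python
class Library:
    """Hypothetical stand-in for the CT settings of one document library."""

    def __init__(self, name):
        self.name = name
        self.manage_content_types = False   # the 'Advanced' setting, off to begin with
        self.content_types = ["Document"]   # only the standard CT is present at first
        self.default = "Document"
        self.hidden = set()

    def allow_management_of_content_types(self):
        """Step 1: enable 'Allow management of content types'."""
        self.manage_content_types = True

    def add_from_site_content_types(self, ct_name, site_content_types):
        """Step 2: add a CT that already exists at the site level."""
        if not self.manage_content_types:
            raise RuntimeError("Enable 'Allow management of content types' first")
        if ct_name not in site_content_types:
            raise ValueError(f"{ct_name!r} is not a site content type")
        if ct_name not in self.content_types:
            self.content_types.append(ct_name)

    def set_default(self, ct_name):
        """Only one CT can be the default at any time."""
        if ct_name not in self.content_types:
            raise ValueError(f"{ct_name!r} has not been added to this library")
        self.default = ct_name

    def new_menu(self):
        """Visible CTs, with the default listed first (an assumption of this sketch)."""
        visible = [ct for ct in self.content_types if ct not in self.hidden]
        return sorted(visible, key=lambda ct: ct != self.default)


site_cts = {"Document", "Staff File", "Contract"}  # hypothetical site CTs
lib = Library("Staff Files")
lib.allow_management_of_content_types()
lib.add_from_site_content_types("Staff File", site_cts)
lib.set_default("Staff File")
print(lib.new_menu())
```

Even in this toy form, each library needs its own sequence of steps, which is where the overhead of many custom CTs across many libraries comes from.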

Folders or document sets?

While it is possible to create folder-based CTs, with unique metadata, to group documents, it may be more useful to create document set-based CTs as these provide much more flexibility. Unlike folders, document sets also have a unique document ID (the same one that is applied to all documents in the library, with a different sequential number).

How CTs are used

Once they are enabled, the new CTs now appear in the ‘+ New’ drop down menu for the library. End users may now choose which CT they want to add to the library (or it may be the default).


The option to create an Office Online document is no longer available. (These options can be added back via the ‘Edit New Menu’ option; however, there may be reasons not to allow this.)

Each document-based CT includes the default ‘Document’ template. This means that if an end-user clicks a custom CT from the ‘New’ menu, it opens Word Online. Should it be necessary for the end user to, say, create an Excel document, the following steps must be followed:

  • Click on library settings
  • Click on the custom CT
  • Click on ‘Advanced settings’ (which will show ‘template.dotx’ as the default template)
  • Upload a new document and choose the file – e.g., a blank spreadsheet with a sensible name, or a given template that must be used every time. Note that the template must be able to be opened in the online versions of the Office applications.

If the document library contains mandatory metadata, and the end user is working directly from the library in SPO, they will be required to add the mandatory metadata.

If the end-user saves a document to a synced library in File Explorer, and that library includes CTs, the end user will be asked to select the CT from a drop down list in a separate dialogue. If the CT includes any mandatory metadata, it will not be possible to add a new document via File Explorer.

What about retention policies?

The ability to apply retention policies to CTs is only available as an ‘auto-apply’ option to those who have an E5 licence. Essentially, when a new retention policy is created (based on a label that defines the retention options), it can be auto-applied to specific CTs. For more details, see this article by Joanne Klein.

What this means is that individual documents contained in the CT (not the CT itself) will be subject to disposal at the end of the retention period. As noted in my earlier post on managing retention outcomes, disposal may be automatic or subject to a review.

Specific use cases and careful consideration of the options and implications need to go into planning for the use of retention policies on custom CTs. While the option may appear to be a simple way to manage the retention outcomes of specific types of content, in reality it could be very hard to manage in large organisations, especially as individual documents are the subject of disposal, not the aggregation (CT or library).

Joanne Klein’s article above refers to a CT named ‘Contract document’. While this is just an example, most contract documents will be subject to disposal review ‘n’ years after the contract has expired. Therefore, retention should be based on a metadata column named something like ‘Contract Expiry Date’. Many organisations, in my experience, keep contract documents well after the contract expiry date. The use of retention policies on contract-related CTs needs to be considered carefully.
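To make the contract example concrete, the disposal review date would be calculated from the ‘Contract Expiry Date’ column rather than from the date the document was created or modified. The sketch below is an illustration only: the function name and the seven-year period are assumptions, and it is not how Microsoft 365 evaluates retention labels.

```python
from datetime import date


def disposal_review_date(contract_expiry: date, retention_years: int) -> date:
    """Return the date on which a contract document becomes due for disposal review.

    The period starts at the contract expiry date, not at the date the
    document was created.
    """
    target_year = contract_expiry.year + retention_years
    try:
        return contract_expiry.replace(year=target_year)
    except ValueError:
        # 29 February has no equivalent in a non-leap year; fall back to 28 February.
        return contract_expiry.replace(year=target_year, day=28)


# Hypothetical contract that expired on 30 June 2020, kept for 7 more years.
print(disposal_review_date(date(2020, 6, 30), 7))
```

A document created in 2015 under a contract that expires in 2020 would therefore not reach review until 2027, which is why a policy keyed only to the CT, with no reference to the expiry date, may not deliver the intended outcome.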

Where custom CTs could be useful

Custom CTs can be useful to manage ‘files’ within a broader grouping or aggregation. The following are actual examples from my organisation:

  • To manage individual staff files (created as custom document sets), in a document library that is used to manage all staff files.
  • To manage a collection of contract documents (stored within a custom document set), in a document library (or libraries) that has retention enabled at the library level. In this example, folders would probably be just as suitable.
  • To manage building/property documents (again stored within a custom document set) in one or more document libraries.

The primary reason that using custom CTs here is better than using simple folders is that the custom document sets can contain specific metadata that is needed, such as ‘Staff ID’ or ‘Staff Name’, ‘Building Number’ or ‘Building address’, or ‘Company’.


Content types are, indeed, a core part of SharePoint. However, unless your organisation is small or you only want to control selected content via the use of custom CTs, it may be easier and more efficient to minimise their use and use metadata choice columns instead.

Applying retention policies to CTs – assuming that is considered to be a viable records management option – is only available to organisations that have an E5 licence. This option should be considered carefully, especially the outcomes when the retention period expires.

Custom CTs can be very useful when used correctly. It is important to have a good understanding of how they work and where they can be applied most usefully. Otherwise, there is a high potential to add considerable complexity and management overhead to your SharePoint environment.

Posted in Conservation and preservation, Digital preservation, Electronic records, Records management, Retention and disposal

Ensuring long term access to digital information

(This is a version of an article written for the RMAA magazine Informaa Quarterly, due to be published in May 2010).
In February 2010, the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (BRTF-SDPA), a US-based group established in 2007 and funded by several private and public organisations, published a report titled ‘Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information’.
The BRTF-SDPA report examined the long-term preservation of digital information from an economic perspective, noting that ‘… economically sustainable preservation (of digital information) is … an urgent societal problem’.  The report quotes a 2008 IDC report stating that the volume of information now created exceeds all available storage.
The BRTF focussed its attention on digital information created within four key areas: scholarly discourse; research data; commercially owned cultural content; and collectively produced web content.  The report did not examine digital information produced by public sector agencies because there are already ‘… well articulated mandates for preservation and well defined organisations with clear roles and responsibilities’ to preserve the digital information produced by those agencies.
The report confirms the frequently cited preservation and conservation mantra that the main business case for preservation is use.  The dilemma for those making decisions about preservation is that access – and therefore use – is impossible without preservation; however, if there is no demand for access, there will be no preservation. What to preserve is the problem.
Identifying what should be preserved for later use requires significant effort and the agreement of a range of stakeholders: those who own the information, those who select, preserve and pay for its preservation, and those who will eventually benefit. The interests of these stakeholders need to be aligned as much as possible; and yet those who make preservation decisions now must attempt to do so without any real idea of what future stakeholders may want to access.
The report makes the point that a key threat to ‘persistent access’ is the costs involved, particularly where the costs outweigh the perceived benefits.
The report presents digital information as economic goods that have four essential attributes: the derived demand for access rather than preservation; their nature as depreciable durable assets that can suffer from physical degradation and loss of functionality; the ubiquity of access (known as ‘non rivalrous consumption’) which can lead to ‘free riding’; and the temporally dynamic and path dependent nature of the digital preservation process throughout the lifecycle of the information.
These attributes, according to the report, mean that problems may be encountered aligning incentives to preserve among beneficiaries, owners and preservers.  The closer the alignment, the more likely that appropriate preservation actions will be taken.  Weak or misguided incentives to preserve are the greatest risk to preservation.
According to the report, the six key conditions necessary to ensure the economic sustainability of digital information are: recognition of the benefits; selecting materials with long-term value; providing incentives for preservation; establishing effective governance arrangements; allocating resources; and ensuring that timely actions are taken before digital information is lost.
As the report notes, solving the economic challenges of digital preservation is neither easy nor insuperable.  A careful balance needs to be established between the perceived future  value of digital information, incentives for its preservation, and the roles and responsibilities of key stakeholders.
Posted in Conservation and preservation, Legal, Records management, Retention and disposal

A brief history of the origins of the Statute of Limitations

(NOTE: This article was completely revised on 1 February 2010; all original content was changed).
The retention – and eventual disposal – of records is a common business practice, despite occasional concerns about what gets destroyed.  Justice Scalia, in Arthur Andersen LLP v United States (No. 04-368, 2004) said as much about the destruction of records relating to Enron by Arthur Andersen: ‘… we all know that what are euphemistically termed “record-retention programs” are, in fact, record-destruction programs, and that one of the purposes of the destruction is to eliminate from the files information that private individuals can use for lawsuits and that Government investigators can use for investigations.’
A key factor in all records disposal programs is determining how long records should be kept.  In many parts of the English speaking world, seven years is frequently cited as the minimum period that records must be kept. But what is the origin or significance of this period of time?
It seems, based on the available evidence, that the seven year period is based on an arbitrary period or time limit of six years, set in 1623.  Some jurisdictions with English legal traditions around the world have retained the same minimum six year period, for example, ‘An action for an account shall not be brought in respect of any matter which arose more than six years before the commencement of the action.’ (s4(2) Limitation Act 1950 (New Zealand)). Others have decided to go with seven years, based on (it would seem) the expiry of the six year period.
It has been claimed by some commentators that the seven year period is based on Deuteronomy 15:1 – 2, which refers to the release of debts after a seven year period, and Deuteronomy 31:10 which has similar references.  There was also, as we will see, Jewish influence on English property law during the same period which set the scene for the eventual creation of statutes of limitations, but neither provides a credible link to the specific period of time that was chosen.  If anything, the origins of a set timeframe for (legal) actions can be traced to Roman Law but, again, the links with early English property law are not strong.
Roman Law
Roman law, as outlined in the Twelve Tables, included the principle, for property-related matters, of usucapio, literally ‘taking by use’ (Table VI.5).  ‘Usucapio of movable things requires one year’s possession for its completion; but usucapio of an estate and buildings two years.’
The concept of usucapio is in many respects the basis for the English expressions ‘possession is nine-tenths of the law’, and ‘finders keepers’.  The timeframes defined in the original tables were eventually extended by Justinian but ‘it remained in principle a method of acquiring ownership’.  (House of Lords, R v Oxfordshire County Council and Others, 24 June 1999).
Roman law also established the concepts of possession (possessio) and ownership (dominium). (see reference in sources).
Of interest is that Henry de Bracton, a Royal judge during the time of Henry III, wrote extensively on Roman Law, although his writings and the value of them have been disputed (see Wikipedia article).
English Law
English law, on the other hand, never accepted the idea that long possession of property was the basis for ownership or acquiring title. Instead, the continual possession of property over a passage of time removed the original owner’s right to claim it back.
Blackstone’s Commentaries on the Laws of England (1765-1769) notes that William I introduced feudal tenures into England after 1066.  An essential part of this new governance model was that ‘the king is the universal lord and original proprietor of all the lands in his kingdom; and that no man doth or can possess any part of it, but what has mediately or immediately been derived as a gift from him, to be held upon feodal services’.  This meant that the tenant’s possessory right in land was limited to usufruct, as granted by the King, who retained absolute dominion over the land.
Usufruct means that the tenant, or ‘fief’ was required to render service to the sovereign in return for the privilege of using the land.
According to Judith Shapiro, William also brought with him Jews who were owned by him and became his moneylenders.  Jews could not own land, but they could lend money using land as the collateral security, and presumably over a period of time.  While the contracts established at the time (‘shetar’, also known as ‘Jewish gage’) did include a clause from the bible (Deuteronomy 24:10-11) protecting debtors, it is highly doubtful that they released debt at the end of 7 years – in fact, Deuteronomy 15:3 clearly distinguishes ‘foreigners’ from this requirement.
According to the UK Law Commission April 2009 report ‘Why does the present law need reform’, the first limitation periods applied only to land-related actions.
Henry I succeeded William in 1100 and reigned until his death (of gluttony) in 1135.  Henry I brought about many changes to English feudal law recognised in documents such as ‘Leges Henrici Primi’ (written around 1115) and ‘Quadripartitus’.  One important change introduced was a limit on the date by which a ‘disseisor’ (that is, a person claiming ownership of land as a result of adverse possession (‘assize of novel disseisin’)) could claim ownership.
In R v Oxfordshire County Council and others, 1999, it is noted that ‘… the medieval real actions for the recovery of seisin were subject to limitation by reference to past events’.
Shapiro (ibid) notes that, during Henry II’s reign (1154 – 1189), ‘… the King’s court assumed an increasing share of litigation that had previously only been heard in local courts.  This was done through the issuance of Royal writs, including the new ‘writ of debt’, used to collect loans of money.
Writs of Entry were also created during this period, according to Joseph Biancalana.  Writs of entry were used to allege that a defendant had no entry into land other than by a transaction or taking that did not authorise him to hold the land, for a period of years (‘ad terminum qui preterit’), defined in three degrees.
Biancalana claims that the timeframe set out in the three degrees was developed from the writs of ‘gage’ (debt).
Shapiro claims that, eventually, the Jewish moneylending practices became ‘a weapon of socio-economic changes that tore the fabric of feudal society and established the power of liquid wealth in place of land holding.’  Riots broke out in 1190 and many of the original documents were destroyed (leading, incidentally, to the creation in 1200 of local Archives (Archae) and duplicate copies).
The UK Law Commission report noted above states that,
  • Before 1237, ‘… plaintiffs could not claim land on the basis of seisin before the day in 1135 when Henry I died.’
  • In 1237, the Statute of Merton, 20 Hen III (1235) stated that a writ of right for land-related claims could not refer back to any time before the coronation of Henry II in 1154.
  • In 1275, the Statute of Westminster, 3 Ed I c 39 moved this date forward to the coronation of Richard I in 1189.
  • These dates were not changed again until the 1540 Act of Limitation, which prescribed 60, 50, and 30 year limitation periods for land-related writs of right, writs of mort d’ancestor, and claims based on possession of the claimant, respectively.
R v Oxfordshire County Council further notes that ‘as time went on, proof of lawful origin … became for practical purposes impossible … the evidence was not available …’ to assess claims of novel disseisin.  Judges apparently instructed juries that ‘if there was evidence of enjoyment for the period of living memory, they could assume that the right had existed since 1189’.  As time wore on, this clearly became impossible to prove.
Finally, the Statute of Limitations Act 1623 fixed a 20 year period for ‘writs of formedon’ (UK Law Commission report).
However, these changes still proved difficult in practice and often relied on ‘legal fictions of presumed grants’ (R v Oxfordshire) effectively based on ‘time immemorial’ (that is, since 1189).
Until the passage of the Act in 1623, no limitation periods existed for other, non land-related claims. (UK Law Commission report) The new Act included limitation periods for non-land-related claims as follows:
  • Two years: Actions for slander
  • Four years: Actions of trespass to the person, assault, menace, battery, wounding and imprisonment
  • Six years: Actions on the case (other than slander); actions for account, other than such accounts as concern the trade of merchandise between merchant and merchant, their factors or servants; actions of trespass, detinue, action sur trover, and replevin for taking away of goods or cattle; actions of debt grounded upon any lending or contract without specialty; actions of debt for arrears of rent; and actions of trespass to land.
The 1623 Act also provided for an extension of time where the plaintiff was under the age of 21, a married woman (‘feme covert’), mentally disabled (‘non compos mentis’), imprisoned, or ‘beyond the seas’.
The UK Law Commission states, on page 5 that ‘we have been unable to trace any information on the reason why the six year period was thought appropriate’.  They add that ‘No limitation period applied to contracts under seal (that is, specialties), actions of account between merchants, their servants or factors, actions brought for debt under a special statute, or actions brought on a record’.
Limitation periods for land related actions were reviewed by the Real Property Commissioners in 1829.  The Commissioners recommended the retention of the 20 year period, implemented in the Real Property Limitation Act 1833 and the Prescription Act 1832.  The Commissioners also found that no limitation periods applied in some cases, including where seisin did not need to be alleged. In addition, no limitation period applied to actions by the Church. The 20 year period was then reduced to 12 years by the Real Property Limitation Act 1874. (UK Law Commission report)
Limitation periods were further reviewed in 1936 and recommendations made.  These included:
  • That a single limitation period of six years should apply to actions in simple contract, and actions in tort.
  • A new limitation period of 12 years (down from 20) was created for actions on a specialty.

According to the UK Law Commission report, the six year period ‘which at present applies to the majority of such actions … is familiar to the general public’.

Sources

  • ‘The Shetar’s Effect on English Law – A Law of the Jews Becomes the Law of the Land’, Judith Shapiro, The Georgetown Law Journal, Vol. 71, pp. 1179-1200.
  • ‘The Origin and Early History of the Writs of Entry’, Joseph Biancalana, Law and History Review, Vol. 25, No. 3, Fall 2007.
  • ‘Final Report on Limitation and Notice of Actions’, Western Australian Law Reform Commission, 1997.
  • ‘Ownership and Possession in the Early Common Law’, Joshua C. Tate, Southern Methodist University (SMU) – Dedman School of Law, American Journal of Legal History, Vol. 48, pp. 280-313, 2006; SMU Dedman School of Law Legal Studies Research Paper No. 5.
  • UK Law Commission, ‘Why does the Present Law need Reform?’, April 2009.
  • Blackstone’s Commentaries on the Laws of England (1765-1769).
  • ‘A Man and His Money’, Harvey Reeves Calkins, 1915.
  • ‘A treatise on the law of actions relating to real property’, Henry Roscoe, 1825.