Posted in Classification, Compliance, Electronic records, Governance, Information Management, Microsoft Teams, Office 365 Groups, Products and applications, Records management, Retention and disposal, SharePoint Online

Managing MS Teams chat as records

(The image above was part of a collector’s album issued in 1930 by Echte Wagner, a German margarine company. Source –

On 19 May 2020, Tony Redmond published a very helpful article on the Office 365 for IT Pros website titled ‘Using Teams Compliance Data for eDiscovery’.

In the article, Tony describes where and how the chat component of MS Teams is stored and how this might affect eDiscovery.

He also makes the important point that, while it may be possible ‘… to backup Teams by copying the compliance records in an Exchange Online backup … you’ll never be able to restore those items into Teams.’ In other words, it is better to leave the data where it was created – in MS Teams. The post explains why this is the case. 

This post draws on the article to describe the factors involved in managing the chat element of Teams as records. It notes that, while it is technically possible to export chat messages (in various ways), it may be much better from a recordkeeping point of view to leave them where they are and subject them to a retention policy.

Two key reasons for leaving chat messages in place are: (a) chat messages are dynamic and may not always be a static ‘thread’, and (b) the chat messages exported from Exchange may not contain the full content of the message. 

What is a Teams chat?

A Teams chat consists of one or more electronic messages with at least two participants – a sender and a receiver. 


There are two types of chat message in MS Teams:

  • One-to-one/one-to-many ‘chat’ (top icon above).
  • Channel-based Teams chat (second icon above). Teams chat is visible to all members of the Team. Within channel-based chats, a person may create a private channel which is visible only to the person who created the private channel and any participants.

Messages created in both options could be regarded as records because they may contain evidence of business activity.

However, one-to-one chats have no logical subject or grouping. Only the chat messages in Team channel chat are connected through the context of the Team/channel. 

Where and how are chat messages stored?

The following is a summary from Tony Redmond’s article.

Chat messages are stored directly in the backend Azure Cosmos DB (part of the so-called Microsoft 365 ‘substrate’). The version in the database is the complete version of the chat message.

The messages are then copied, less some content elements (for example: reactions, audio records, code snippets), to a hidden folder in either (a) end-user mailboxes for one-to-one chat and private channel chats, or (b) M365 Group mailboxes for channel chat.

Most export options, including the export option in Content Search and eDiscovery, draw their content from the mailbox version of the message. This has potential implications for the completeness of the chat message as a record.

Additionally, any export can only be a ‘point in time’ record unless there is absolute certainty that all chats on a given subject have ceased.

Implications for records managers

In addition to the concerns about a chat message (or exports of them) being complete, there are (at least) two other points relating to the management of chat messages as records in MS Teams:

  • Knowing if chat messages on any given subject exist. 
  • Applying an appropriate retention policy. 

Both of these points are discussed below. 

Finding content

The primary way to locate content on any given subject across Microsoft 365 is via the Content Search option in the Compliance portal. Access to the Content Search option is likely to be restricted. So, if records managers do not have access, they will need to ask the Global Administrators to conduct a search. 

Content searches are very powerful. This Microsoft article, ‘Keyword queries and search conditions for Microsoft 365’, provides details on how to search. The screenshot below shows an example of a very simple keyword query with the option to add conditions.


Searches can be configured to find content in any or all of the following locations:

  • Users, Groups, Teams
    • Exchange email
    • Office 365 group email
    • Skype for Business
    • Teams messages [the copy in the mailbox]
    • To-Do
    • Sway
    • Forms
  • SharePoint
    • SharePoint sites
    • OneDrive accounts
    • Office 365 group sites
    • Teams sites
  • Exchange public folders

Note that content search only works on the copies of the items in the Exchange mailboxes, not the backend Teams database. Accordingly, there is some potential for it to not find some content.
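To make the keyword query idea a little more concrete, here is a minimal sketch of how a query limited to Teams chat items might be put together. The helper name `build_teams_query` is made up for this post, and the `kind` and `sent` properties are my assumption of the relevant search properties; check them against the Microsoft article before relying on them.

```python
from datetime import date


def build_teams_query(keywords, sent_from=None, sent_to=None):
    """Build a keyword query string for Teams chat items (illustrative only)."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    # Quote each keyword so phrases such as "working from home" stay together.
    quoted = ['"{}"'.format(k.strip()) for k in keywords]
    parts = ["(" + " OR ".join(quoted) + ")"]
    # Assumption: kind:microsoftteams limits results to Teams chat items.
    parts.append("kind:microsoftteams")
    # Assumption: the 'sent' property accepts ISO-style dates.
    if sent_from is not None:
        parts.append("sent>=" + sent_from.isoformat())
    if sent_to is not None:
        parts.append("sent<=" + sent_to.isoformat())
    return " AND ".join(parts)


query = build_teams_query(
    ["COVID-19", "working from home"], date(2020, 3, 1), date(2020, 5, 31)
)
print(query)
```

The resulting string would be pasted into the keyword box of a Content Search. It only searches the mailbox copies of the messages, so the caveat above still applies.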

Both the mailbox content and the content discovered by the search can be exported. Teams chat messages can be exported as individual items or as a PST – but note that these messages may exclude the elements described in Tony’s article.

The problem with exporting the content either this way or via other export options (such as the one described in the post ‘How to export MS Teams chat to html (for backup)’, which uses the Microsoft Graph API) is that it creates a single ‘point in time’ copy; additional content could be added at any time and, if the chats were subject to a retention policy, they may already be deleted.

Managing chat messages ‘in place’ as records

As any export only creates a ‘point in time’ version, it makes more sense from a recordkeeping point of view to leave the chat messages where they are and apply one or more retention policies to ensure the records are preserved. 

Ideally, organisations that may create or capture records on a given subject will have taken the time to establish a way for users to do this, including through the creation of a dedicated Microsoft 365 Group with an associated SharePoint site and Team in MS Teams. 

For example, if there is a requirement to store all records relating to COVID-19, it would make sense (at the very least) to create a Microsoft 365 Group with that name; this will create: (a) a linked mailbox accessible by all members of the Group, (b) a SharePoint site with the same name, and (c) a Team in MS Teams. All of the content – emails, documents, chat – is linked via the same (subject) Group.

This model makes it easier to aggregate ‘like’ information and apply a single retention policy. It assumes there is (or will be) some degree of control over the creation of Teams (or very good communication to users) to prevent the creation of random Teams, Groups and SharePoint sites – AND to ensure that end-users chat about a given subject within a Team channel, not in one-to-one chat. 

What retention period should be applied to chat messages?

The retention period applied to either one-to-one or Team channel messages will depend largely on the organisation’s business or regulatory requirements to keep records. There are two potential models. 

The simplest model is to have a single retention policy for one-to-one chats, and a separate retention policy for all Teams channel chats.

As one-to-one chats are stored in the mailboxes of chat participants, it makes sense to retain the chat content for as long as the mailboxes. However, some organisations may seek to minimise the use of chat and have a much reduced retention period – even as little as a few days. 

The creation and application of retention policies to Teams channel chat may require additional considerations. For example:

  • As every Team is based on a Microsoft 365 Group that has its own SharePoint site, it is probably a good idea to establish Teams based on subjects that logically map to a retention class. For example, if ‘customer correspondence’ needs to be kept for a minimum of 5 years, and there is a Group/SharePoint site/Team for that subject, then all the content should have the same retention period – although the Group mailbox and SharePoint site may have a policy applied to the Group, with a separate policy (with the same retention period) applied to the Team.
  • There may be a number of Teams that contain trivial content that does not need to be retained as records. These Teams could be subject to a specific implicit policy that deletes content after a given period – say 3 years. 

In all cases, there is a requirement to plan for retention for records across all the Microsoft 365 workloads. 

What happens to chat messages at the end of a retention period?

At the end of a Microsoft 365 retention policy period, both the mailbox version and the database version of the Teams chat message are deleted. To paraphrase Tony’s article, the Exchange Managed Folder Assistant removes expired records from mailboxes. Those deletions are synchronized back to Teams, which then removes the real messages from the backend database.

No record is kept of this deletion action except in the audit logs. Accordingly, if there is a requirement to keep a record of what was destroyed, this will need to be factored in to whatever retention policy is created. 



Posted in Classification, Electronic records, Information Management, Office 365, Records management, Retention and disposal

Planning for retention management in Microsoft 365

‘Fools rush to implement retention without thought’ – Tony Redmond, 13 April 2017

Tony Redmond’s quote above, as well as the rest of the article in ‘Bringing Compliance to Office 365 Groups’, is as relevant today as it was in 2017.

Tony is a contributing author to the e-book ‘Office 365 for IT Pros’, essential reading for anyone doing anything with Microsoft 365. Page 921 of the May 2020 edition contains the following paragraph, which expands on the quote above and contains probably the best guidance ever required in relation to this subject:

It is sensible to write down each of the retention labels that you plan to use before creating anything. It is much easier to delay the release of a label and the training of users to use the label properly than it is to launch a label into general circulation only to discover that you later need to withdraw it. Another thing to consider is how easy it is for users to decide between different retention labels when the time comes for them to apply a label. Too many labels, misleading names, or too much choice can lead to frustration and bad decisions.

How do you go about writing down each of the retention labels as part of a plan – especially for a Microsoft 365 environment that is already in full swing?

This post provides some suggestions to help you do this.

What is your records retention and disposal status?

A good starting point is to establish the current records retention and disposal status for your organisation. Do you have a records retention schedule, also known as a disposal authority or records authority? 

If you have one of these documents, it would be useful to review it, as a key part of the process is to ‘map’ the records retention classes to specific records across the various Microsoft 365 ‘workloads’ (e.g., Exchange, SharePoint, OneDrive, MS Teams etc), not just in one system (such as SharePoint).

You will need to know what and where these workloads are.

Where (and what) are the records in Microsoft 365?

If you are a records manager then there is a reasonably good chance that you have very little access to, or visibility of, all the content stored across Microsoft 365.

You may have access to one or more SharePoint sites, but unless you are a SharePoint Admin or Site Collection Admin on every site, your visibility will be very limited.

Most of the records in Microsoft 365 will be stored in Exchange, SharePoint, OneDrive for Business, or MS Teams.

  • Emails created and sent by users are stored in Exchange mailboxes. There may also be public mailboxes. Unless there is a plan (or third-party app) to copy these (or some of these) emails out of Exchange (e.g., to SharePoint), most email records will probably remain in users’ mailboxes.
  • Records that, in the past, would have been saved to a network file share (or EDRMS) will now be in SharePoint Online (corporate content) or OneDrive for Business (ODfB) (personal/working content).
  • Chat messages in MS Teams are stored in a hidden area of the Exchange mailbox of each user who participates in the chat. Any documents shared in this chat area are stored in the OneDrive for Business of the person who shared the document.
  • Channel-based Team chat messages in MS Teams are stored in a hidden area of the Exchange mailbox of the Office 365 Group linked with the Team. Any documents shared in this chat area are stored in the SharePoint site of the Office 365 Group linked with the Team.

So, fundamentally, records are stored in two primary workloads: Exchange mailboxes and SharePoint/OneDrive for Business.

What are the retention options?

There are two retention options in Microsoft 365. Both are configured in the Compliance portal of Microsoft 365. Access to this portal requires special privileges, which may not always be granted to records managers.

The two options are:

  • Retention labels published as retention policies and then applied to the various workloads (Exchange email, SharePoint, OneDrive, Office 365 Groups (Exchange/SharePoint content)). These are sometimes described as ‘explicit’ policies because they are visible to end users. Organisations with an E5 licence can extend the way these labels are applied and retention managed.
  • Retention policies that are applied directly to the various workloads (Exchange email, Exchange public folders, SharePoint, OneDrive, Office 365 Groups (Exchange/SharePoint content)). These are sometimes described as ‘implicit’ policies because they are not visible to end users. These policies automatically delete content at the end of a retention period, without any review possible.

Records managers will need to determine how to ‘translate’ each records retention class into one of the two options above, and how and where it will be applied in Microsoft 365.

Some of the options may also require the creation of new records retention classes – for example for the chat element in Microsoft Teams.

A suggested first model

Exchange mailboxes

Your IT probably already has some form of back-up regime (‘archive’) for mailboxes, used for disaster recovery and investigation purposes.

It might be worth creating two policies for mailboxes:

  • All end-user mailboxes could have a single ‘implicit’ retention policy (e.g., 7 years).
  • Mailboxes for specific staff (e.g., senior managers) could have a second, longer, ‘implicit’ retention policy. This policy will take over when the first one expires, but just for the mailboxes identified.

The use of retention policies in this way can replace the need for mailbox backups. No emails will ever actually be deleted while the retention policy is in place and all content can be retrieved via the Content Search option in the Compliance Portal. 

Content Searches can also be used to retrieve and export emails.

OneDrive for Business

As with end-user mailboxes, OneDrive for Business accounts are generally inaccessible to records managers. To ensure that the content in those accounts is not deleted, a single Microsoft implicit retention policy of, say, 7 years could be applied to all ODfB accounts.  This policy will create a hidden (to the user) ‘Preservation Hold’ library on the ODfB account.

Anything ‘deleted’ by the end user during the retention period will be moved to the Preservation Hold library, which is visible to the Global Admins and SharePoint Admins from this URL – /_layouts/15/viewlsts.aspx?view=14

In addition, the OneDrive settings include the option (under ‘Storage’ in the ODfB admin portal) to retain OneDrive accounts for a period of time after they are inactive.

All content in these locations is accessible from a Content Search.


SharePoint

SharePoint is likely to be the most complicated in terms of retention policies if there is a requirement to keep content for different periods of time in accordance with the retention schedules/records disposal authorities.

There are likely to be three main options in relation to SharePoint content:

  • One or more implicit retention policy/ies applied to one or more sites. When applied to a SharePoint site, a ‘Preservation Hold’ library retains anything that is ‘deleted’ by end users.
  • One or more explicit label-based retention policies applied to one or more sites. When applied to a SharePoint site, the option to apply it appears for each document library on the site. Once applied (manually), end users cannot delete anything and if the library is synced to File Explorer, the File Explorer view of the library will be read only.
  • A combination of implicit and explicit retention policies.

The decision to apply what policy to what site will depend on your SharePoint architecture and the content stored in each site. For example:

  • A SharePoint site that only stores records that map to one records retention class could have either a single implicit policy (if there is no requirement for disposal review) or a single explicit policy that is applied manually to each library.
  • A SharePoint site that contains records that map to multiple retention classes for one business function, as well as ‘working papers’, could have (a) one implicit policy to cover the working papers and (b) one label-based retention policy with multiple labels – one for each class. This means, for (b), that a specific retention label can be applied to each library as required.
  • SharePoint sites linked with Office 365 Groups and Teams. Depending on the content in the site, it may be possible to apply a single retention policy for all M365 Groups (which covers both the SharePoint site and the mailbox), or a similar policy created for a Group of SharePoint sites (which excludes the mailbox).

MS Teams

As noted above, the chat content in MS Teams is stored in Exchange mailboxes – (a) the mailbox of each participant for one-to-one chat, and (b) the mailbox of the Office 365 Group for channel-based chat.

You may consider having a relatively short-term retention period for one-to-one chat. The retention period for the channel based chat will depend on the subject matter and should – ideally – be the same as for the linked SharePoint site. For example:

  • A Team set up for a specific business function and activity (or activities) will have channel based chat and a linked SharePoint site. Both should be subject to the same retention period.
  • A Team set up for low-level discussion about a subject that may not be covered by any retention period could be subject to a general retention policy for the chat and the SharePoint content.

Bringing it together

As noted at the beginning of the post, if you are going to use retention policies in Microsoft 365 you need a plan and you need to document it. It doesn’t matter too much if the environment is already active.

However, you will need to have discussions with your Microsoft 365 Global Admins, Compliance Admins and SharePoint Admins and know where the content is stored.

  • The Global Admins can give you a list of every Office 365 Group and Team in MS Teams (these are connected – every Team is based on an O365 Group).
  • The SharePoint Admins (or Global Admins) can give you a list of every SharePoint site.

There are some potential ‘quick wins’, such as agreement with IT regarding Exchange mailboxes, OneDrive for Business accounts, and MS Teams.

The more complex requirement is to map the classes in your records retention schedules/disposal authority to content stored in SharePoint, including for standard sites (not linked with Microsoft 365 Groups), communication sites, and sites linked to Office 365 Groups.

You can start to do this by having a list of all the sites exported from the SharePoint Admin portal. This should allow you to see how many sites exist, how much content they hold, and if they are active or not.
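As a rough illustration, the exported list could be summarised with a few lines of Python. This is only a sketch: the column headings, site URLs and figures below are invented sample data, and a real export will use its own headings that would need to be substituted.

```python
import csv
import io
from datetime import date, datetime

# Invented sample data with hypothetical column names.
SAMPLE_EXPORT = """Site name,URL,Storage used (GB),Last activity
Client Services,/sites/clientservices,12.5,2020-05-01
Old Project,/sites/oldproject,3.2,2018-11-20
COVID-19,/sites/covid19,0.8,2020-05-18
"""


def summarise_sites(csv_text, today, inactive_after_days=365):
    """Return (site count, total GB, names of sites with no recent activity)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    total_gb = sum(float(r["Storage used (GB)"]) for r in rows)
    inactive = []
    for r in rows:
        last = datetime.strptime(r["Last activity"], "%Y-%m-%d").date()
        # Treat a site as inactive if nothing has happened for over a year.
        if (today - last).days > inactive_after_days:
            inactive.append(r["Site name"])
    return len(rows), total_gb, inactive


count, total_gb, inactive = summarise_sites(SAMPLE_EXPORT, today=date(2020, 6, 1))
print(count, round(total_gb, 1), inactive)
```

The one-year threshold for ‘inactive’ is an arbitrary choice for the example; each organisation will need to decide what inactive means for its own sites.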

It is probably a good idea for the records manager to be included as a Site Collection Administrator, including by being a member of a Security Group added to every SharePoint site. This will help the records manager gain visibility of the content of each site; however, they should be very careful about browsing the content as everything is recorded in audit logs.

Document and plan

The outcome of all these actions should be one or more documents that describe (a) where records are stored and (b) the retention policy and action that will apply to those records.

  • For Exchange mailboxes, OneDrive for Business accounts, and MS Teams, this may be a single line for each policy.
  • For SharePoint, there should be a listing of every site and the retention policy or policies that apply to that site.
  • Additionally, for SharePoint sites where an explicit label-based retention policy is applied, the listing should show which libraries this has been applied to. If a disposal review option has been selected, there should be a process to ensure that the metadata of the library where the records are stored is exported and stored in a different location. The original library may then be deleted.
Posted in Classification, Compliance, Electronic records, Information Management, Records management, Retention and disposal, SharePoint Online

Applying multiple retention policies to a SharePoint Online site

Many organisations have complex records retention requirements that are described in records retention schedules, disposal authorities or records authorities. For example:

  • There may be different ‘levels’ of retention depending on the ‘state’ of a record. The final versions of certain records may have a longer retention requirement than the working versions.
  • For each business function there may be multiple types of records, each with their own retention requirement or ‘class’.
  • In some disposal authorities based on business functions, activities that produce records (for example ‘Meetings’) may appear in multiple functions with the same retention requirement.

This post describes how multiple and different types of Microsoft 365 retention policies, created with an E3 licence in the Information Governance section of the Compliance admin portal, can be applied to a single SharePoint site.

Example retention schedules/disposal authorities

Most records retention schedules or disposal authorities list types (or ‘classes’) of records that are created or captured by the organisation, including through the completion of various activities or transactions, and define how long these records must be kept or retained by the organisation (or transferred to an archival institution).

These record types or classes are usually grouped by business subject or function.

The following extract, from a private sector company records retention schedule, shows records grouped by subject type (‘Company records’).


In the example below, from the Victorian (Australia) government, records are grouped by function (‘Enquiries and Complaints’).


The diagram below presents a simple view of the examples above. For every subject type or business function, there may be one or more records descriptions (based on the activity or transaction that creates or captures the record), each with a corresponding retention period.


How does SharePoint manage records?

SharePoint Online team sites (including the sites linked with Microsoft 365 Groups and MS Teams) may be created to manage the records for a particular business area or function, or for a specific business activity.

Whether a single or multiple document libraries are used, SharePoint sites may contain a mix of record content. It may not always be possible to apply a single retention policy to the site.

Use case

For the purpose of this post, we will assume that the organisation has a business function named ‘Client Services’ – a generic name for a business unit that delivers client services.


The Client Services area has several SharePoint sites. One of these sites is named ‘Client Services’.

The ‘Client Services’ site, which has been active for several years, has multiple libraries for the activities it performs, including ‘Meetings’, ‘Procedures’, ‘Working papers’, ‘Rosters’, ‘Marketing’ and so on. Most of these libraries are created annually and consequently the year is added to the library name to help group content more efficiently – for example, ‘Meetings 2018’, ‘Meetings 2019’.

The organisation’s records retention authority has multiple classes for the Client Services function, including:

  • Marketing – Retain for five years
  • Meetings – Retain for seven years
  • Procedures – Retain for seven years
  • Rosters – Retain for ten years

There is no class for general ‘working papers’ that may be created in support of the above activities, but the organisation would like to ensure that all content not otherwise covered by one of the ‘explicit’ retention policies above is retained by an ‘implicit’ or background policy.

Creating the Office 365 retention policy

Based on its requirements, the organisation will require two different options.

  • A single retention policy with a minimum three year retention for content (including ‘working papers’) not covered by any other longer retention period. This will be created as an ‘implicit’ or background policy and applied to the site. Any content that is deleted by the end users will be moved to the invisible (to end users) Preservation Hold library. Records covered by this policy will be automatically deleted – via the Recycle Bin – at the end of the retention period.
  • Multiple retention labels published in a single retention policy that is applied to this site or other sites that can be mapped to the same function. This means that, when applied to a document library, every one of the labels will appear in the drop down menu in the library settings to apply a label. Depending on how the label has been configured, the records may be automatically deleted or subject to a disposition review.

Label-based retention policies – retention settings

Each retention label that is created will include a name and description, and then the label retention settings.

  • How long it is to be kept (e.g., 7 years).
  • What happens at the end of that period (delete automatically, disposition review, nothing).
  • Trigger for disposal – date created, date modified, date labelled. The ‘Date labelled’ option is preferred as it will not prevent day-to-day actions on the library or make the synced version read-only.
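
The three settings above can be thought of as a small record for each label. The sketch below is one way to model that and to work out when a retention period would end for a given trigger. The class and function names are made up for illustration, the example values are assumptions, and the date arithmetic is a simplification rather than a description of how Microsoft 365 calculates the date internally.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RetentionLabel:
    name: str
    retain_years: int
    end_action: str  # "delete", "disposition review" or "nothing"
    trigger: str     # "created", "modified" or "labelled"


def add_years(start, years):
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return start.replace(year=start.year + years, day=28)


def retention_end(label, created, modified, labelled):
    """Work out when the retention period ends for one item."""
    trigger_dates = {"created": created, "modified": modified, "labelled": labelled}
    return add_years(trigger_dates[label.trigger], label.retain_years)


# Illustrative values only.
meetings = RetentionLabel(
    "Client Services Meetings – 7 years", 7, "disposition review", "labelled"
)
end = retention_end(
    meetings,
    created=date(2018, 2, 1),
    modified=date(2019, 6, 30),
    labelled=date(2020, 6, 1),
)
print(end)
```

With the ‘labelled’ trigger, the period runs from the day the label was applied, not from the (earlier) created or modified dates.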

This process is repeated for each label. Each label can include the ‘File Plan’ settings, for example any reference numbers, the Function and Activity, and so on.

Here are two of the labels that have been created:


Publishing the labels

After each label has been created, they can then be published together in a single (‘Client Services’) retention policy that is applied to the site (Client Services).


The published policy now appears in the list of label-based retention policies. It also appears under the ‘Retention’ tab of the Information Governance section, along with all other published label-based policies and policies that are not based on labels.


Non-label retention policy

The ‘implicit’ or background policy is created directly as a retention policy, without the need for a label. This policy, named ‘Temporary records’, has a three-year retention. It is applied directly to the site (or multiple sites).

Applying the label-based policies to the site

The Client Services site has several libraries as shown below.

We want to apply the label-based policies to the libraries named ‘Meetings 2020’, ‘Rosters 2020’ and ‘Marketing’. The general ‘Documents’ library will be covered by the implicit retention policy for ‘Temporary records’.


To apply the label-based policies to the library, click on the library and navigate to Library Settings where the option to ‘Apply label to items in this list or library’ is found.


A drop down list shows all available label-based policies. As the ‘Client Services’ policy was only applied to this site, only those labels appear. Only one option can be selected for each library.

It is usually a good idea to check the box (hidden behind the list of policies in the screenshot below) to ensure that anything already stored in the library will be covered by the policy.


The Meetings 2020 library has now been assigned the Client Services Meetings – 7 years policy. As soon as this label has been applied:

  • It will no longer be possible to delete any content.
  • If the library has been synced to File Explorer, the library in File Explorer will become read only.

The only way to remove this restriction is to remove the policy. Accordingly, it may be better to apply the label only when the library has become inactive.

Note – The Temporary records implicit policy will continue to operate in the background and will apply to any content in any library or list not covered by an explicit policy. Anything deleted will be moved to the Preservation Hold Library accessible only by the Site Collection Admins or higher.

The final model can be visualised as follows:



The longest retention option will always take precedence. So, if an explicit label-based policy has a retention period of 2 years, and the background implicit retention policy has a retention of 5 years, the content will be kept for 5 years.
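
The precedence rule can be sketched in a few lines of Python using the classes from the Client Services use case. This is an illustration of the logic only. Matching a library to a label by its name is my own shortcut for the example (in practice the label is applied manually in the library settings), and the function names are hypothetical.

```python
import re

# Retention classes from the Client Services use case (years).
LABEL_YEARS = {"Marketing": 5, "Meetings": 7, "Procedures": 7, "Rosters": 10}
BACKGROUND_YEARS = 3  # the 'Temporary records' implicit policy


def label_for_library(library_name):
    """Strip a trailing year (e.g. 'Meetings 2020') and look up the class."""
    base = re.sub(r"\s+\d{4}$", "", library_name.strip())
    return base if base in LABEL_YEARS else None


def effective_retention(label_years, background_years=BACKGROUND_YEARS):
    """The longest applicable period wins; no label means background only."""
    if label_years is None:
        return background_years
    return max(label_years, background_years)


for library in ["Meetings 2020", "Rosters 2020", "Marketing", "Documents"]:
    label = label_for_library(library)
    years = effective_retention(LABEL_YEARS.get(label))
    print(library, "->", label or "Temporary records", years)

# The example from the text: a 2-year label under a 5-year background policy.
print(effective_retention(2, background_years=5))
```

The general ‘Documents’ library has no matching label, so it falls back to the three-year background policy.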

Note also that only the content of the libraries or lists is deleted at the end of the retention period. The library or list – and the site – remain.


As described in this post, it is possible to create multiple retention policies and apply them only to a single SharePoint site.

This allows organisations to create targeted groups of retention policies, which is likely to be useful in organisations with detailed or function/activity based retention schedules.

Planning is required to ensure that there is appropriate and effective retention coverage for all the content created and captured in all SharePoint sites.

Posted in Digitisation, Records management, Retention and disposal

Storing offsite or digitising paper records – which is more cost-effective?

As organisations increasingly move to digital recordkeeping, many are left with paper records (often stored in commercially provided offsite storage) that need to be retained or kept for a minimum period.

Some organisations may consider digitising these records in the belief that this may be more cost effective and useful than keeping them in paper form.

This argument is not always correct and is often based on a poor understanding of the total costs associated with either option.

This article outlines the indicative costs associated with both offsite storage and digitisation. It concludes that it is almost always more cost effective to keep inactive paper records in commercial offsite storage than it is to digitise them.

Note, in this post, the box size is assumed to be 310L x 390W x 250D (mm), or similar that can store 20 files with 100 pages each, or up to 2000 individual pages. All costs are shown in Australian dollars.

Offsite (commercial) storage

Commercial offsite storage helps to free up space, improve the quality of storage and reduce potential risks.

A decision to store paper records in offsite storage is often based on what appears to be a relatively low storage rate per box per month, rather than the total costs for the life of the box, from collection to destruction. 

Offsite storage cost elements

Before deciding to digitise records, it is a good idea to understand all the costs (and potential costs) associated with current or proposed offsite storage, listed below.

  • Cost of the box and barcode label. Around A$3 per box.
  • Delivery cost per box. Around A$1.50 per box (when delivered in a pack of 10).
  • Collection of boxes for storage. Depends on volume collected, but likely to be around A$3 per box for a typical small-size collection.
  • Registration of boxes for storage. Often around A$2 per box.
  • Retrieval from storage, delivery, collection and return to storage. Around A$15 per box, per retrieval and return. This price will vary depending on a number of factors, including how many boxes are retrieved or delivered, and whether the retrieval is priority or urgent. A single urgent box retrieval, especially out of hours, can cost as much as A$100.
  • Annual storage. This cost can range from A$2.40 to A$12 per box per year depending on volumes and contracted rates. For this post I am using an average annual cost of A$6, which works out to A$42 for seven years.
  • Destruction including retrieval from storage. Around A$5 per box.

These costs can be documented in a simple Excel-based cost calculator. Offsite storage providers usually have a similar model to set costs.  
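As a rough illustration of what such a calculator does, the following Python sketch adds up the indicative rates listed above for a single box. The class and function names are hypothetical, the default rates are simply the 'around' figures from this post, and a real contract will have its own rates and additional ancillary charges.

```python
from dataclasses import dataclass


@dataclass
class OffsiteRates:
    """Indicative per-box rates in Australian dollars (figures from this post)."""
    box_and_label: float = 3.00
    delivery: float = 1.50
    collection: float = 3.00
    registration: float = 2.00
    storage_per_year: float = 6.00       # average annual rate used in this post
    retrieval_and_return: float = 15.00  # standard (non-urgent) retrieval
    destruction: float = 5.00


def lifetime_cost_per_box(rates: OffsiteRates, years: int, retrievals: int = 0) -> float:
    """Total cost of one box from purchase of the box through to destruction."""
    one_off = (rates.box_and_label + rates.delivery + rates.collection
               + rates.registration + rates.destruction)
    storage = rates.storage_per_year * years
    ancillary = rates.retrieval_and_return * retrievals
    return one_off + storage + ancillary


rates = OffsiteRates()
print(lifetime_cost_per_box(rates, years=7))                # no retrievals
print(lifetime_cost_per_box(rates, years=7, retrievals=2))  # two standard retrievals
```

With these default rates, a box that is never retrieved costs A$56.50 over seven years, and two standard retrievals lift that to A$86.50.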

Indicative costs for offsite storage

Depending on volumes and the contracted rates, the total average lifetime cost per box in storage for 7 years can be as little as A$35 (based on actual experience with ~35,000 boxes in storage), or well over A$100. Additional retrievals or other activities (often described as ‘ancillary costs’) will add to this cost – see below.

Longer storage periods, retrievals (especially urgent or priority retrievals) and collection and return to storage will add to the cost.

Controlling offsite storage costs

Offsite storage costs can get out of hand when:

  • Offsite storage contracts are devolved to business units that use different companies or have different rates with the same company.
  • There is no central control point for collections or retrievals, or proactive management, including regular disposal, of boxes in storage.
  • Boxes are sent to storage and forgotten. Proactive management of boxes in offsite storage, and especially regular authorised disposal of the records they contain, is essential to keeping a lid on costs.
  • Nobody knows what’s in the boxes, resulting in very costly requests to retrieve and document the boxes, or risk-based decisions to destroy them.


There are usually two options to digitise records – (a) in-house or (b) outsourced.

Unless the organisation has all the necessary equipment to digitise records, it is usually more cost-effective to outsource the process to a dedicated commercial provider. The provider may not be the same company where the boxes are stored.

Digitisation standards

Before any digitisation exercise is undertaken it is important to understand and establish the minimum acceptable requirements for the digitised records, for example single or multi-page PDFs (or PDF/A), searchability (OCR), colour or grey scale, dots per inch (DPI) and so on. This is especially the case if the digitised versions are to replace the paper records as the only record.

The National Archives of Australia (NAA) has published a list of ‘Scanning Specifications‘ (PDF) which also makes reference to AS/NZS ISO 13028:2012, ‘Information and documentation – Implementation guidelines for digitization of records’ for a guide to suitable quality assurance checks.

The NAA guidance states that, for digitised documents where colour is present, the minimum requirements are:

  • Format: PDF (PDF/A encouraged [but not mandated]); JPEG 2000, PNG or TIFF
  • Resolution: 300 dpi
  • Scanning ratio: 100%
  • Colour profile: colour
  • Bit-depth: 8 bits per channel RGB
  • Colour management: embedded ICC colour profile encouraged
  • Searchability: OCR encouraged (PDF or PDF/A complies)

These requirements, or a variation on them, must be included with any request for quotation, or included in the outsourced provider’s response. Once the digitisation is underway regular samples should be taken to confirm compliance and scan quality.
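One way to make sample checks repeatable is to compare the properties of each sampled scan against the minimum requirements listed above. The Python sketch below is a minimal illustration only: the `ScanProperties` fields and the `check_colour_scan` function are hypothetical names, how the properties are read from an actual file is not covered, and the items that are only 'encouraged' (ICC profile, OCR) are deliberately not checked.

```python
from dataclasses import dataclass

# Formats accepted for colour documents in the list above.
ACCEPTED_FORMATS = {"PDF", "PDF/A", "JPEG 2000", "PNG", "TIFF"}


@dataclass
class ScanProperties:
    """Properties of one sampled scan (hypothetical structure for illustration)."""
    file_format: str
    dpi: int
    scale_percent: int
    colour_mode: str
    bits_per_channel: int


def check_colour_scan(scan: ScanProperties) -> list[str]:
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if scan.file_format not in ACCEPTED_FORMATS:
        problems.append(f"format {scan.file_format!r} is not an accepted format")
    if scan.dpi < 300:
        problems.append(f"resolution {scan.dpi} dpi is below 300 dpi")
    if scan.scale_percent != 100:
        problems.append(f"scanning ratio is {scan.scale_percent}%, expected 100%")
    if scan.colour_mode != "RGB":
        problems.append(f"colour mode {scan.colour_mode!r} is not RGB colour")
    if scan.bits_per_channel < 8:
        problems.append(f"bit depth {scan.bits_per_channel} is below 8 bits per channel")
    return problems


sample = ScanProperties("PDF/A", dpi=200, scale_percent=100,
                        colour_mode="RGB", bits_per_channel=8)
for problem in check_colour_scan(sample):
    print(problem)
```

A check like this only covers technical properties. It does not replace a visual inspection of the sampled images.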

Digitisation cost elements

The following describes the main actions and indicative costs associated with digitising a box of records:

  • Collection of records (in boxes). Price will vary depending on volume – the lower the number, the higher the cost. Note – if the boxes are already in offsite storage, the storage provider will charge a ‘retrieval’ fee.
  • Document preparation (‘doc prep’). This involves pulling apart files, removing staples, and adding separation sheets (e.g., between files). This activity is usually charged at an hourly price (often around A$40 per hour) and usually based on around 700 pages of document preparation per hour. Note that this activity does NOT include re-creating the files post-scanning – see below.
  • Image scanning. Probably the cheapest element of the quote as good scanning companies use very fast scanners. Scanning is likely to be charged at a few cents per page (e.g., $0.04). Note that the scanning process also counts the pages that are scanned, which should equal the total number of images, a simple quality check.
  • Optical Character Recognition (OCR). This is done by the machine at the same time as scanning and is also charged in cents per page. Note that OCR’ing the documents makes the difference between a searchable PDF and a ‘dumb’ (image only) PDF. If the digitisation involves multiple separate pages in a ‘file’, the output may be a single multi-page PDF for all those pages.
  • Indexing. This is usually a manual process unless the indexing metadata can be acquired from the original documents during the digitisation process (for example, if a file number is always in the same place on the cover of a file and is readable). Indexing costs are often charged per field and are also likely to be quoted in cents. As an example, if a file or document requires five indexing fields and they are charged at $0.05 per field, the cost per file or document will be $0.25. Indexing data is often provided in the form of a CSV file to accompany the PDF outputs.
  • Quality assurance checks. Often not quoted, but it is essential that all records that are scanned are subject to some form of quality assurance check. This may be simply a visual check of the images for things like skewing or dropped colour, but it may also include things like page/image counts.
  • Output to a storage medium – external drive, USB or cloud storage. Cost varies.
  • Re-packing of boxes (but NOT re-stapling of documents in files) – charged per hour.
  • Disposal of the original paper records (if requested). This is commonly around A$5 – A$7 per box.
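To see how these elements combine, the Python sketch below estimates the cost of digitising one box using the example rates quoted in the list above and the box size assumed in this post (20 files, 2,000 pages). The function and parameter names are hypothetical. The OCR rate and the remaining elements (collection, quality assurance, output, re-packing, disposal) are not priced in this post, so they default to zero and would need to come from an actual quote.

```python
def digitisation_cost_per_box(
    pages: int = 2000,                # box capacity assumed in this post
    files: int = 20,                  # files per box assumed in this post
    prep_rate_per_hour: float = 40.0,
    prep_pages_per_hour: int = 700,
    scan_per_page: float = 0.04,
    ocr_per_page: float = 0.0,        # assumption: not priced here, use the quoted rate
    index_fields_per_file: int = 5,
    index_per_field: float = 0.05,
    other_costs: float = 0.0,         # collection, QA, output, re-packing, disposal
) -> dict[str, float]:
    """Break a per-box digitisation estimate into its main cost elements (A$)."""
    costs = {
        "doc_prep": pages / prep_pages_per_hour * prep_rate_per_hour,
        "scanning": pages * scan_per_page,
        "ocr": pages * ocr_per_page,
        "indexing": files * index_fields_per_file * index_per_field,
        "other": other_costs,
    }
    costs["total"] = sum(costs.values())
    return costs


estimate = digitisation_cost_per_box()
for element, cost in estimate.items():
    print(f"{element:>9}: A${cost:,.2f}")
```

With these example rates, document preparation (about A$114) costs more than the scanning itself (A$80), and the estimate comes to roughly A$199 per box before OCR, collection, quality assurance, output and re-packing are added.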

If you decide to use an outsourced provider, you should ask about the storage of the scan images that are likely to be still stored in their system, and the process for ensuring that these will be deleted after the job has completed.

Indicative costs for digitised records

The following costs are from an actual quote provided by a specialised and dedicated digitisation company. The company documented all the costs associated with the work, as listed above.

  • 48 boxes – A$6,135, or A$127.82 per box. (Simple job)
  • 1,210 files in approx 80 boxes – A$15,600, or A$195 per box. (Complex job that required additional cataloguing of the files, on top of indexing data fields.)

As can be seen, it is likely to be more expensive to digitise a box of records than it is to store it. Generally speaking, digitisation is likely to cost around A$200 per box.

Digitisation usually costs more

From the above cost comparison it should be obvious that, even using a dedicated digitisation company, the costs to scan a box of files may be considerably higher than leaving them in storage. There would have to be a good reason to want to digitise the records.

On the other hand, if the organisation has really poor (high) contracted rates for collection, storage, retrievals and destruction, it may work out about even. However, keep in mind that most offsite storage companies have a ‘hostage fee’ for boxes in storage. This fee will be charged if the boxes are ‘permanently’ removed from storage, adding to the total cost for digitisation.
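A simple way to test this for a given contract is to work out how many more years a box would need to stay in storage before the remaining storage cost matches the one-off cost of digitising it. The Python sketch below is a deliberately simplified illustration with hypothetical names. It assumes the box is already in storage, counts only annual storage and eventual destruction on the storage side, and ignores retrievals, the cost of storing the digitised files and the cost of disposing of the paper after scanning. The exit fee defaults to zero because no figure is given in this post.

```python
def breakeven_years(
    digitisation_per_box: float,
    storage_per_year: float,
    destruction_per_box: float = 5.0,  # indicative destruction cost from this post
    exit_fee_per_box: float = 0.0,     # assumption: permanent removal fee, if any
) -> float:
    """Years of further storage at which storing a box costs the same as digitising it."""
    digitise_now = digitisation_per_box + exit_fee_per_box
    return (digitise_now - destruction_per_box) / storage_per_year


print(breakeven_years(200.0, 6.0))   # average storage rate used in this post
print(breakeven_years(200.0, 12.0))  # top of the storage rate range
```

Under these assumptions, a box stored at A$6 per year would need to remain in storage for about 32 more years before storage cost as much as a A$200 digitisation job. At A$12 per year the figure is about 16 years.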

Digitising in-house

Many organisations decide to digitise paper files in-house, sometimes for ‘security’ or ‘privacy’ reasons.

This is often not cost-effective and the digitised record quality may not meet minimum standards (listed above) for recordkeeping, especially if they are ‘dumb’ scans (without OCR), a common output of most multi-function devices (printer/scanner).

Companies that specialise in digitisation of paper records often have security and/or government security clearances. This should be confirmed with the vendor.

Storing digitised records – and destroying originals

Organisations that decide to digitise paper records need to consider where the digitised records will be stored, and what should be the fate of the original paper records.

Storing digitised records

There is little point spending so much to digitise records if they are to be saved to a network file share or left on an external hard drive.

Digitised paper records should be imported (along with the indexing metadata) into a recordkeeping system, ideally one that does not also store the original born-digital records!

Destroying the original paper after digitisation

Organisations are often unwilling to destroy the original copies of digitised paper records. Two reasons are often quoted:

  • The paper originals are still required by law (which is usually not the case).
  • ‘Just in case’ they need to refer back to the originals because there is a problem, or perceived problem, with the digitised version or with digital recordkeeping (or an absence of it).

There is often a reluctance to destroy the paper originals simply because they are the originals (often printed from a digital original), which leads to the somewhat bizarre outcome that the organisation continues to store:

  • the original digital version; AND
  • the printed version; AND
  • the digitised version of the printed original.

This is, sadly, a common scenario in many organisations.

Nevertheless, it is often safe to destroy the originals after a given period – 3 to 6 months is common. Many government records disposal authorities include a class for the purpose of destroying the original paper versions of records that have been digitised.


Digitising or scanning paper records may be cost effective if the records need to be retrieved regularly or accessed frequently, including for public access. Keeping the original paper versions of digitised records can add to the cost (and potential for confusion over the ‘original’ version).

Offsite storage of paper files may be more cost effective if the records need to be kept for less than 10 years, are rarely retrieved while in storage, and the monthly storage cost is minimal.

Offsite storage arrangements should be reviewed regularly, including to identify records for disposal. Organisations often fail to do this, which increases costs over time. It is not in the interest of the offsite storage providers to be (overly) proactive about the destruction of boxes in storage. Some may send out a regular reminder of boxes due for disposal based on a standard 7-year period, but if no reply is received, no action is taken.

Note that the process of destroying the printed version of born-digital records does not destroy the original digital records. These are likely to remain stored on drives, in email mailboxes and on backup tapes.