Ever since email first appeared as a way to communicate more than 30 years ago, it has been a problem for records management, for two main reasons.
- Emails (and attachments) are created and captured in a separate (email) system, and are stored in mailboxes that are inaccessible to records managers (a bit like ‘personal’ drives).
- The only way to manage them in the context of other records was/is to print and file or copy them to a separate recordkeeping system, leaving the originals in place.
Thirty-plus years of email has left a trail of mostly inaccessible digital debris. An unknown volume of records remains locked away in ‘personal’ and archived mailboxes. Often, the only way to find these records is via legal eDiscovery, but even that can be limited in terms of how far back you can go.
Options for the preservation of legacy emails
The Council on Library and Information Resources (CLIR) published a detailed report in August 2018 titled ‘The Future of Email Archives: A Report from the Task Force on Technical Approaches to Email Archives’.
The report noted (from page 58) three common approaches to the preservation of legacy emails:
- Bit-Level Preservation
- Migration (to MBOX, EML or even XML)
- Emulation
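As a rough illustration of the migration approach, the sketch below splits a single MBOX file into one EML file per message using only the Python standard library. It is a minimal example under stated assumptions, not the method described in the CLIR report: the function name `mbox_to_eml` and the output naming scheme (`message-000000.eml`) are hypothetical, and a real migration would also need to handle checksums, malformed messages and provenance metadata.

```python
import mailbox
from pathlib import Path


def mbox_to_eml(mbox_path, out_dir):
    """Write each message in an MBOX file to its own .eml file.

    Hypothetical helper for illustration; returns the list of files written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # create=False: fail if the source mailbox is missing instead of creating it
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for index, message in enumerate(box):
            # Assumed naming scheme: sequential, zero-padded file names
            target = out / f"message-{index:06d}.eml"
            # as_bytes() serialises headers and body without the MBOX "From " line
            target.write_bytes(message.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written
```

Calling `mbox_to_eml("legacy.mbox", "eml_out")` would leave the original MBOX file untouched and produce a folder of individual messages that other tools can then index or convert further.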
In March 2020, the Australian IDM magazine published a follow-up article by one of the CLIR report authors (Chris Prom). The article, titled ‘The Future of Past Email is PDF’, suggested that PDF may be (or become) a more suitable long-term solution for preservation of legacy emails.
Preservation is one thing, what about access?
There is little point in preserving important records if they cannot be accessed. The two must go together. In fact, preservation without the ability to access a record is not a lot different from destruction through negligence.
Assuming emails can be migrated to a long-term and accessible format, what then?
No-one (except possibly well-funded archival institutions) is seriously likely to attempt to move or copy individual legacy emails to pre-defined and pre-existing containers or aggregations of other records. This would be like printing individual emails and storing them in the same paper file or box that other records on the same subject are stored in.
Access to legacy emails in a digitally accessible, metadata-rich format like PDF provides a range of potential opportunities to ‘harvest’ and make use of the content, including through machine learning and artificial intelligence.
These options have been available for close to twenty years in the eDiscovery world, but only to support specific legal requirements.
Search, discovery and retention/disposal tools available in the Microsoft 365 Compliance portal, along with the underlying Graph and AI tools (including SharePoint Syntex), provide the potential to manage legacy content, including emails.
The starting point is migrating all those old legacy emails to an accessible format.