Most people should be aware that choosing the ‘delete’ option for a file stored on a computer doesn’t actually delete the item; it only makes the file ‘invisible’. The actual file is still on the disk and can be retrieved, either relatively easily or with forensic tools, until the space it occupied is overwritten.
Legacy electronic document and records management (EDRM) systems have two components:
- A database (e.g., SQL, Oracle) where the metadata about the records are stored
- A linked file share where the actual objects are stored, most of which are copies of emails or network file share files that remain in their original location.
In most on-premises systems, email mailboxes, network file shares, and the EDRMS database and linked file share are likely to be backed up.
When a digital record comes to the end of its retention period and is subject to a ‘destruction’ process, how do you know if the record has actually been destroyed? And even if it has, how can you be sure that the original isn’t still stored in a mailbox, network file share, or a backup?
This post examines what actually happens when a file is ‘deleted’ from a Windows NT File System (NTFS), and questions whether digital records stored in an EDRMS are really destroyed at the end of the retention period.
The Windows NTFS Master File Table (MFT)
Details of every file stored on a computer drive will be found in the NTFS Master File Table (MFT).
In some ways, the MFT operates like a traditional electronic document management system: it is a kind of database that records metadata about the attributes of the digital objects stored on the drive. These attributes include the following:
As noted in the diagram above, the details stored by the MFT include the $File_Name and $Data attributes.
- The $File_Name attributes include the actual name of the file as well as when it was created and modified, and its size. This is the information that can be seen via File Explorer and is often copied to the EDRMS metadata.
- The $Data attribute contains details of where the actual data in the file is stored on the disk (in 0s and 1s) or the complete data if the file is small enough to fit in the MFT record.
If the MFT record has many attributes or the file data is stored in multiple fragments on a disk (for example as a file is being edited), additional MFT ‘extension’ records may be created.
When a file is deleted, the MFT records the deletion.
- If the file is simply deleted, the record will remain on the disk and can be recovered from the Recycle Bin.
- If the file is deleted through SHIFT-DEL or by emptying the Recycle Bin, the MFT record is marked as ‘Deleted’ and the cluster bitmap is updated to show the file’s clusters (where the data is stored) as free for reuse. The MFT record remains until it is re-used, and the data remains until the clusters are allocated in whole or part to another file.
So, in summary, ‘deleting’ a file does not actually delete it. It may either:
- Store the file in the Recycle Bin, making it relatively easy to recover, or
- Change the MFT record to show the file as being deleted but leave the file data on the disk until it is overwritten.
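The behaviour described above can be illustrated with a toy model. This is a minimal sketch, not the real NTFS on-disk layout: the names `MftRecord`, `ToyVolume`, `in_use` and `bitmap` are invented for illustration, and the ‘disk’ is just a Python dictionary. It shows only that deletion changes bookkeeping while the stored content stays where it was.

```python
from dataclasses import dataclass, field


@dataclass
class MftRecord:
    """Toy stand-in for an MFT record (not the real on-disk layout)."""
    file_name: str
    clusters: list[int]   # where the $Data attribute says the content lives
    in_use: bool = True   # cleared when the file is 'deleted'


@dataclass
class ToyVolume:
    disk: dict[int, bytes] = field(default_factory=dict)   # cluster number -> content
    bitmap: dict[int, bool] = field(default_factory=dict)  # cluster number -> allocated?
    mft: list[MftRecord] = field(default_factory=list)

    def write_file(self, name: str, chunks: list[bytes]) -> MftRecord:
        start = len(self.disk)
        clusters = list(range(start, start + len(chunks)))
        for number, chunk in zip(clusters, chunks):
            self.disk[number] = chunk
            self.bitmap[number] = True
        record = MftRecord(name, clusters)
        self.mft.append(record)
        return record

    def delete_file(self, record: MftRecord) -> None:
        # Only the bookkeeping changes: cluster contents are left untouched.
        record.in_use = False
        for number in record.clusters:
            self.bitmap[number] = False

    def recover(self, record: MftRecord) -> bytes:
        # In principle what a recovery tool does: follow the stale record.
        return b"".join(self.disk[n] for n in record.clusters)


volume = ToyVolume()
memo = volume.write_file("memo.txt", [b"confidential ", b"memo"])
volume.delete_file(memo)
recovered = volume.recover(memo)
print(memo.in_use, recovered)
```

In this model the content becomes unrecoverable only when a later `write_file` call happens to reuse the freed clusters, which mirrors the ‘until it is overwritten’ condition.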
How does an EDRMS store and manage files?
The following summary relates to a well-known Electronic Document and Records Management System (EDRMS). Other systems may work differently but the point is that records managers should understand exactly how they work and what happens when electronic files are destroyed at the end of a retention period.
Most EDRM systems are made up of two parts:
- A database (SQL, Oracle, etc.) to store the metadata about the record.
- An attached file store that stores the actual digital objects.
When EDRM systems are used to register paper or physical records (files and boxes), only the database is used.
When digital records are uploaded to the EDRMS:
- The metadata in the original file, including the file type, original file name, date created, date modified and author are ‘captured’ by the system and recorded in the new database record.
- Additional metadata may be added, including a content or record ‘type’.
- The record will usually be associated with a ‘container’ (e.g., ‘file’). This containment makes the record appear to be ‘contained’ within that container, whereas in fact it is simply a metadata record of an object stored elsewhere.
- The original record filename is changed to random characters (to make it harder to find, in theory) and then stored on the attached (usually Windows NTFS) file store, often in a series of folders.
- A link is made between the database record and the record object stored in the file store (the MFT record).
When the end-user opens the EDRMS, they can search for or navigate to containers/files and see what appears to be the digital objects ‘stored’ in that container/file. In reality, they are seeing a link to the object stored (randomly) in the file store.
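The two-part design described above can be sketched in a few lines. This is an illustrative assumption, not how any particular EDRMS product works: the table layout, the `capture` and `flag_destroyed` functions, and the use of SQLite and a temporary folder in place of a real database and NTFS file store are all hypothetical. The sketch shows the object being stored under a random name, the metadata record linking to it, and a ‘destruction’ that updates only the metadata.

```python
import os
import sqlite3
import tempfile
import uuid

file_store = tempfile.mkdtemp()    # stands in for the attached file store
db = sqlite3.connect(":memory:")   # stands in for the SQL/Oracle database
db.execute(
    "CREATE TABLE records ("
    "id INTEGER PRIMARY KEY, original_name TEXT, container TEXT, "
    "stored_path TEXT, destroyed INTEGER DEFAULT 0)"
)


def capture(original_name: str, content: bytes, container: str) -> int:
    """Store the object under a random name and record its metadata."""
    stored_path = os.path.join(file_store, uuid.uuid4().hex)
    with open(stored_path, "wb") as handle:
        handle.write(content)
    cursor = db.execute(
        "INSERT INTO records (original_name, container, stored_path) "
        "VALUES (?, ?, ?)",
        (original_name, container, stored_path),
    )
    return cursor.lastrowid


def flag_destroyed(record_id: int) -> None:
    """Hypothetical 'destruction' that only updates the metadata record."""
    db.execute("UPDATE records SET destroyed = 1 WHERE id = ?", (record_id,))


record_id = capture("contract.docx", b"signed contract", "Contracts 2015")
flag_destroyed(record_id)

destroyed, stored_path = db.execute(
    "SELECT destroyed, stored_path FROM records WHERE id = ?", (record_id,)
).fetchone()
object_still_on_disk = os.path.exists(stored_path)
print(destroyed, object_still_on_disk)
```

If a system behaved like this sketch, the database would report the record as destroyed while the object itself remained in the file store, which is exactly the situation worth checking with the vendor.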
What happens when an EDRMS record is destroyed?
If there is no requirement to extend their retention, or keep them on a legal hold, records may be destroyed at the conclusion of a retention period.
For physical records, this usually means destroying the physical objects so they cannot be recovered, a process that may include bulk shredding or pulping.
For digital records, however, there may be less certainty about the outcome of the destruction. While the EDRMS may flag the record as being ‘destroyed’, it is not always clear whether the destruction process has actually overwritten the digital objects in a way that destroys them to the same level as shredded or pulped paper files. In addition:
- If the original associated NTFS file share becomes full and a new one is used, the original is likely to be made read only.
- There is likely to be a backup of the EDRMS.
- The original records uploaded to the EDRMS probably continue to exist on network file shares, in email, or on backup tapes.
- Digital forensics can be used to recover ‘deleted’ files from the associated file share.
Consider this scenario:
- An email containing evidence of something is saved to a container in an EDRMS.
- The container of records is ‘destroyed’ after the retention period expires.
- A legal case arises after the container is ‘destroyed’.
- A subpoena is made for all records, including those specific records.
- Has the record actually been destroyed, or could it still be recoverable, including from backups or the digital originals?
Is it really possible to destroy digital records, and does it matter?
Yes, records can be destroyed by overwriting the clusters where the record is kept, and some EDRM systems may offer this option.
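As a rough illustration of the idea, the sketch below overwrites a file’s content with zeros before deleting it. The function names are invented for this example and it is not taken from any EDRMS. It also works at the file level, not the cluster level: on SSDs, or on file systems that relocate, compress or journal data, the write may not land on the original clusters, and it does nothing about copies in backups, mailboxes or file shares.

```python
import os
import tempfile


def overwrite_in_place(path: str) -> None:
    """Replace the file's content with zeros without changing its length."""
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        handle.write(b"\x00" * size)   # fine for small files; chunk for large ones
        handle.flush()
        os.fsync(handle.fileno())      # ask the OS to push the write to the device


def overwrite_and_delete(path: str) -> None:
    """Overwrite first, then remove the directory entry."""
    overwrite_in_place(path)
    os.remove(path)


# Demonstration on a throwaway file.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(b"record content")

overwrite_in_place(demo_path)
with open(demo_path, "rb") as handle:
    content_after_overwrite = handle.read()

overwrite_and_delete(demo_path)
```

Purpose-built secure deletion tools and storage-level sanitisation go further than this, so the sketch is best read as a way of framing the questions that follow, not as a destruction procedure.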
- Do EDRM systems overwrite the cluster when a digital record is destroyed in line with your records retention and disposal authorities, or simply mark the record as being deleted, when it is still technically recoverable?
- Could the record still exist in the network file shares or email, or in backups of these or the EDRMS?
- Might it be possible to recover the record with digital forensics tools?
- Does it matter?
It might be worth asking IT and your EDRMS vendor.