Posted in Compliance, Exchange Online, In Place Records, Information Management, Records management, SharePoint Online

In place vs in place management of records in Microsoft 365

The ability to manage records ‘in place’ in SharePoint has existed since around 2013. But this is not the same thing as leaving records where they were created or captured and managing them there – ‘in place’.

This post explains the difference between the two ‘in place’ options. In brief:

  • The Microsoft ‘in place’ model is based on making the distinction between non-records content and content declared as records (as per DOD 5015.2), that may be stored in the same SharePoint site, or using Exchange in-place options.
  • The other ‘in place’ model is simply based on leaving records and other content where they were created or captured, and managing it there – including (where necessary) by applying the ‘in place’ options in the previous point.

The Microsoft in-place model

SharePoint

The Microsoft in-place model for managing records in SharePoint is based on the requirement to comply with the US Department of Defense (DOD) standard titled ‘Design Criteria Standard for Electronic Records Management Software Applications’, usually known by its authority number – DOD Directive 5015.2, Department of Defense Records Management Program, originally published in 11 April 1997.

Section C2.2.3 ‘Declaring and Filing Records’ of the standard defines 26 specific requirements for declaring and filing records, including the following points:

  • The capability to associate the attributes of one or more record folder(s) to a record, or for categories to be managed at the record level, and to provide the capability to associate a record category to a record
  • Mandatory record metadata.
  • Restrictions on who can create, edit, and delete record metadata components, and their associated selection lists.
  • Unique computer-generated record identifiers for each record, regardless of where that record is stored.
  • The capability to create, view, save, and print the complete record metadata, or user-specified portions thereof, in user-selectable order.
  • The ability to prevent subsequent changes to electronic records stored in its supported repositories and preserving the content of the record, once filed
  • Not permitting modification of certain metadata fields.
  • The capability to support multiple renditions of a record.
  • The capability to increment versions of records when filing.
    Linking the record metadata to the record so that it can be accessed for display, export.
  • Enforcement of data integrity, referential integrity, and relational integrity.

Microsoft’s initial guidance on configuring in place records management describes how to activate and apply this functionality primarily in SharePoint on-premise. It is still possible to apply this in SharePoint Online (but see below). The SharePoint in place model refers to a mixed content approach where both records and non-records can be managed in the same location (an EDMS with RM capability):

Managing records ‘in place’ also enables these records to be part of a collaborative workspace, living alongside other documents you are working on.

The same link above, however, also refers to newer capability that was introduced with the Microsoft 365 Records Management solution in the Compliance admin portal. This new capability allows organisations to use retention labels instead to declare content as records when the label is applied, which ‘effectively replaces the need to use the Records Center or in-place records management features.’

The guidance also noted that, ‘… moving forward, for the purpose of records management, we recommend using the Compliance Center solution instead of the Records Center.’

Exchange

A form of in-place management has also been available for Exchange on-premise mailboxes, with in place archiving based on using archive mailboxes – see the Microsoft guidance ‘In-Place Archiving for in Exchange Server‘.

One draw-back of this model is that the (email) records in these mailboxes were not covered by the same DOD 5015.2 rigor as those in SharePoint, but they could at least be isolated and protected against modification or deletion, for retention, control and compliance purposes.

Archive mailboxes, including ‘auto-expanding archives’, also exist in Exchange Online. In the Exchange Online archiving service description, it is noted that:

Microsoft Exchange Online Archiving is a Microsoft 365 cloud-based, enterprise-class archiving solution for organizations that have deployed Microsoft Exchange Server 2019, Microsoft Exchange Server 2016, Microsoft Exchange Server 2013, Microsoft Exchange Server 2010 (SP2 and later), or subscribe to certain Exchange Online or Microsoft365 plans. Exchange Online Archiving assists these organizations with their archiving, compliance, regulatory, and eDiscovery challenges while simplifying on-premises infrastructure, and thereby reducing costs and easing IT burdens.

The new ‘in place’ model

A newer form of in-place records management has appeared with Microsoft 365.

Essentially, the new model simply means leaving records where they were created or captured – in Exchange mailboxes, SharePoint sites, OneDrive or Teams (and so on). and applying additional controls where it is required.

The newer model of in place records management is based on the assumptions that:

  • It will never be possible to accurately or consistently identify and/or declare or manage every record that might exist across the Microsoft 365 ecosystem. For example, it is not possible to declare Teams chats or Yammer messages.
  • Only some and mostly high value or permanent records, will be subject to specific additional controls, including records declaration and label-based retention.
  • The authenticity, integrity and reliability of a some records may be based more on system information (event metadata) about its history, than by locking a point-in-time version.

Microsoft appear to support this dual in place model with their information governance (broader controls) and records management (specific controls, including declaration) approach to the management of content and records across Microsoft 365, as described in the Microsoft guidance ‘Information Governance in Microsoft 365‘, which includes the graphic below, modified to show the relationship between the two in place concepts.

The relationship between Microsoft’s ‘in place’ focussed records management, and managing everything (including records) in place.

In place co-existence

Can both in place models exist? Yes. There is nothing to prevent both in place models existing in the same environment, in which some records may need to be better managed or controlled than others, but it is important to understand the distinction between the two when it comes to applying controls.

Image: Quarantine Building, Portsea, Victoria Australia. Andrew Warland 2021

Posted in In Place Records, Records management, Retention and disposal

Setting retention labels on folders in SharePoint document libraries

A common question asked by many organisations is whether Microsoft 365 (M365) retention policies – labels in particular – can be applied to folders in SharePoint document libraries so the content in those folders will have the same label.

The quick answer is yes, but it is a manual process and – for all its perceived benefits – is likely to be more of an administrative and support burden and not worth the effort. Folders should NOT be thought of as the replacement for ‘files’ (aggregations of individual records), but more like dividers in a lever arch (= the document library).

This post describes how labels can be applied to and work with folders, including in SharePoint sites linked with Teams. It also suggests alternative options.

How retention labels are applied to a library

Retention labels are created in the M365 Compliance portal under either the Information Governance > Labels or Records Management > File Plan sections.

Where labels are created in the Information Governance section of the Compliance portal

Labels created in the Compliance portal do not do anything once created; they must be applied to content in various ways to make them work. This includes by:

  • Publishing one or more labels as part of a retention policy to various locations including SharePoint sites, Exchange mailboxes and OneDrive (but not Teams – see screenshot of available locations below). In this scenario, each label will be visible to – and selectable by – end-users.
  • Auto-applying them to the same locations based on various options, including (for E5) trainable classifiers
  • Adding them to Content Types used in SharePoint Syntex.
Locations where labels can be published

Publishing labels to SharePoint sites

When one or more labels are published to SharePoint sites they don’t do anything until they are ‘manually’ enabled through one of the following options:

  • On each individual document library via the library settings option ‘Apply labels to items in this list or library’ (see screenshot below)
  • On each individual folder in a library (via the information panel, see screenshot below)
  • On each individual object in the library (also via the information panel)

Applying a label to the library

A label can be applied to the entire document library via Library Settings, as shown in the screenshot below.

Note that the ‘None’ option is shorter if no labels have been published here

If the drop-down option is set to ‘None’, and there are no options to choose from, it means that no labels have been published to this SharePoint site.

If labels exist, they will appear in the drop down list (below the default ‘None’). Note that only one label can be set as the default for the library. If the check box ‘Apply label to existing items in the library’ is selected, this will apply the label to all existing items. It will also likely override any existing label that may have been applied.

When the retention label has been applied to a library, the label only applies to the non-folder objects stored in the library as can be seen in the option below. That is, the retention label is NOT applied to the folders by default.

Document library folders without retention labels

The retention label can be seen when the folder is opened:

Content inside a folder, with retention labels applied

Applying a retention label to a folder or document

It is also possible to apply a retention label to a folder or object stored in the folder via the information panel, even when a default library label has been set, as shown below. This can be done on each individual folder including, for Teams-based sites, each folder that maps to a channel.

When applied to a folder in this way, any content stored in the folder will inherit that retention label.

Documents stored inside the June folder have inherited the folder’s label

If a default label has already been applied at the library level, the folder-based label will replace it, although in testing this, one of the original default labels wasn’t replaced automatically as shown below, but could be manually changed via the information panel.

Implications for Teams-based records (Files)

Every Team in MS Teams has an associated SharePoint site linked with the underlying Microsoft 365 Group.

Every non-private channel in the Team maps to a folder in the Documents library of the SharePoint site as can be seen in the two screenshots below. (Every private channel has a separate SharePoint site that would be covered by a separate retention policy).

The Team’s channels
Four channel-linked folders in a Teams-based SharePoint site

Keep in mind that retention labels remove the ability to delete objects stored in the library (including via the ‘Files’ tab in a Team). If end-users are working in Teams, this could be annoying and potentially put them off using Teams. However, end-users can remove the label by navigating to the SharePoint site and removing the label via the Information panel.

Why folder-based retention labels may not be a good idea

The default options to apply retention labels to content stored in SharePoint document libraries are:

  • By applying them at the library level. This can apply the label to all existing (and future) content stored in the library but does not apply to folders.
  • Through the auto-application of labels.
  • Via SharePoint Syntex using labels on Content Types.

Applying retention labels to individual folders in a document library is a manual-intensive process, one that may be a waste of time given the potential number of libraries that can exist and the ease with which they can be removed by end users.

Additionally, applying retention labels to the channel-linked folders of Teams may be pointless if end users:

  • Store documents at the same level as the channel-linked folders; that is, ‘above’ the folder structure.
  • Create new folders via a synced library or SharePoint. These folders are not linked to channels.
  • Create new libraries in the SharePoint site.

Keep it simple

It is very easy to deliberately or inadvertently establish over-complicated retention settings for content stored in SharePoint, especially as there is currently no simple way to see what label has been applied where.

Given the retention period linked with retention policies generally, there is a good chance that the person who applied the labels may not be around when the retention period expires, or to keep an eye on what has been applied or changed over time.

The best retention intentions may be overruled by practical necessity.

The best retention model, in my opinion, is a simple one that does not get in the way of end-users but ensures that records will be kept for a minimum period required. So, instead of applying retention labels to folders, especially on Teams-based SharePoint site libraries, it is recommended to:

  • Start by trying to avoid mixing content with different retention periods in the same SharePoint site or Team, or document library. That will make it easier to manage the retention outcomes. (If you can’t avoid mixing content, you may need to use auto-application of labels including via Syntex or trainable classifiers).
  • Use ‘back-end’ safety net retention policies applied to all SharePoint sites. This ensures a minimum retention period and does not get in the way of end-user activities.
  • Use retention labels on site libraries where more granular retention is required. Ideally, apply them as the default to all the content in a single document library (including the default library for all Teams-based SharePoint sites) and – preferably – only apply the labels when the content is inactive and the library can be made read only, to protect the records from that point.
  • Only use multiple labels on folders when (a) all the labels applied to the site relate to the same function/activity pair or subject matter, and (b) the content is largely inactive. Ideally, avoid folder-based retention to avoid complication in the future.

Posted in Information Management, Records management, SharePoint Online

Managing digital photos as records in SharePoint Online

Photographs (or, for that matter, any visual work) can be more powerful as a record of something than a written record. Some of the iconic events in history are only known from the visual record.

Photos (and videos) are ubiquitous in the digital age. They are captured in multiple ways and stored on a range of media including computers, portable drives, mobile devices and the cloud (including online storage and social media).

Digital photos can be easily changed or ‘photoshopped’, to use the more common term.

But records management requires us to ensure the authenticity, reliability and integrity of records. How can we ensure this for photographic records?

This post examines how SharePoint can be used to manage (some) photos as records.

What is a digital photo?

For the purpose of this post, a digital photo is any image that is captured and stored electronically as bits in a recognised image format. It excludes digital photos (including scanned or digitised paper records) that are embedded in PDF files.

To quote from the Wikipedia article titled ‘Digital Image‘, a digital photo is:

‘… an image composed of picture elements, also known as pixels, each with finite, discrete quantities of numeric representation for its intensity or gray level that is an output from its two-dimensional functions fed as input by its spatial coordinates denoted with x, y on the x-axis and y-axis, respectively. Depending on whether the image resolution is fixed, it may be of vector or raster type. By itself, the term ‘digital image’ usually refers to raster images or bitmapped images (as opposed to vector images).’

According to the definition, a digital photo is a collection of bits set out in a map that a computer interprets to display it on a screen. For more detail on bitmaps, see this Microsoft page ‘Bitmaps‘.

When to use SharePoint to store digital photos

Almost any digital object that can be stored electronically can be captured in SharePoint. But, just because they can doesn’t mean they should.

It is important to keep in mind that SharePoint may be not be the most suitable option for storing digital photographs in general. The volume, size and intended usage of the photos are all likely to influence the decision to use SharePoint. Organisations may need to consider other ways to store and manage digital photos (and other digital media) such as dedicated digital asset management systems.

However, SharePoint may be a good option to store digital photos as records if those photos:

  • Need to be kept as a record of something.
  • Support or relate to other content in the same document library. For example, photos that relate to or support building plans (which themselves may be scanned images in PDF form) or construction.
  • Are about a specific subject. For example, photos of damage to a physical object.
  • Are relatively low in volume and file size.

Factors to consider before storing digital photos in SharePoint

The following points should be considered before storing any digital photo (or digital media) in SharePoint.

File names

Digital photos are usually stored on devices that capture them with meaningless, device-generated names such as ‘20210423_123192321’, ‘DSC_0330’, or ‘n594015825_1706121_4959’. These types of names provide no clue to the content when the image is saved to SharePoint, as shown below.

Ideally, the device-generated name should be replaced with a more meaningful name as shown below.

We can see from the unique Document ID that it is the same record. The version history also tells us that something was changed but the file size hasn’t changed.

Keep in mind that every change to the file name or any other metadata added to the library will create a new copy (‘version’) of the image. In the version history above, there are now four versions each 3 MB in size.

The Activity section of the information panel to the right (which also provides a preview of the image) also provides some key information to support the version history information:

Does changing the name potentially compromise a key element of metadata? Who should change the name, and when?

Organisations should consider establishing naming conventions for photographs to help with findability and context.

Format

Almost any type of digital object can be stored in SharePoint. SharePoint also supports the ability to view almost every type of contemporary digital photo format (as described in this Microsoft page) so format should not be a problem when it comes to viewing the file.

However, if the organisation plans to store digital photos in older or less common formats, it will need to ensure that these can be accessed for as long as required. This will generally mean ensuring that the appropriate software application is available to anyone who needs to access (open/view) the photos.

Alternatively, consideration should be given to saving the photos in different, more common formats.

Size/Resolution

The size of a digital photograph may affect its usability as a record. The two photos below are of the same scene but the first photo is very low resolution (= small size, 63 KB), while the second is a section of a much large photo (= large size, 3.7 MB). The second version is clearly a more accurate record.

Digital photos used as records should provide sufficient detail to be useful as records. Accordingly, in most instances, the original photo, not a re-sized version, should be saved.

Reliability as a record

As noted above, photos can be easily modified, sometimes making it difficult to know if it accurately reflects the image it purports to capture. This is also an issue for any form of visual record, including drawings.

Metadata created when the photo was taken and stored in the photo’s properties (including EXIF metadata) can provide evidence of the reliability of the photo as a record, even when the record was stored in SharePoint. The following are the key metadata properties that are usually created and stored with a digital photo:

  • Size.
  • (Date) Created. The date and time when the photograph was created as a photo.
  • Date taken. This is, for most digital photos, the same date and time as the ‘created’ date.
  • (Date) Modified. This should be very close to the original date and time created on the original photograph, but may be several seconds different. If the photograph has been modified (including simple photo adjustments or ‘photoshopped’), it will show a different date and time which might give a clue to its reliability as a record.
  • Image dimensions, width, height, horizontal and vertical resolution, bit depth
  • Camera details including F-stop, exposure time, ISO speed, flash mode

Storing digital photos in SharePoint Online

Note that SharePoint previously had the option to create a dedicated ‘Picture Library’. This is no longer necessary as any document library can be used to store any digital object, and SharePoint Online document libraries now have additional options to view the photos.

Any digital photo can now be saved to a standard SharePoint document library, including via the ‘Files’ tab in MS Teams.

Metadata

Unlike some EDRM systems, SharePoint does not ‘capture’ or extract the details of digital objects when they are saved to a document library. Instead, these remain with the original digital object.

If digital photos are to be saved to a SharePoint Online document library, consideration should be given to using existing or additional metadata columns to describe the image, for example, the ‘Image’ column (small version preview that is stored separately in the Site Assets library >’Lists’ folder > GUID-named folder).

The following screenshot shows a preview image of the main photo.

Viewing options

SharePoint includes three options to view the content in a document library list view: List, Compact List, or Tiles.

The following is an example of a single digital photo stored in a document library (same as the one in the other options earlier) set to the ‘Tiles’ view:

Reliability as a record

Once saved to SharePoint, the photo’s reliability as a record can be determined from version history.

Additional protections may include access/permission controls, retention and/or information security labels.

Summary

SharePoint can be used to store (some) digital photos as records, but it should not generally be used as a general storage location for digital photos. Other dedicated digital asset management systems may be more suitable for that purpose.

Before storing digital photos in SharePoint, organisations should establish procedural rules or principles including (re-) naming conventions, format and file size requirements.

SharePoint does not extract and store the metadata from digital objects, which means that digital photos should retain metadata that shows when they were created or modified, and other details about the photograph which will provide evidence to support the authenticity of the photo as a record. Some consideration should be given to adding additional metadata to help describe digital photos as records.

A combination of version history, access controls, information protection and retention labels should provide sufficient controls to ensure the reliability, integrity and authenticity of records – or at the very least provide the evidence of changes that may be made.

There are two sides to the question of authenticity, reliability and integrity:

  • Knowing if the photo that has been uploaded is a correct record of what it purports to be.
  • Preventing the photo from modification or deletion or tracking any modifications that may occur.

It might seem impossible to know if a photograph is what is purports to be, but its metadata payload may provide the detail required.

Possibly photoshopped photo

Feature Image: Source not known.

Posted in Classification, Governance, Information Management, Microsoft 365, Records management, Retention and disposal

Classifying records in Microsoft 365

There are three main options in Microsoft 365 to apply recordkeeping classification terms to (some) records:

  • Metadata columns added to SharePoint sites, including those added to Content Types and/or added directly to document libraries.
  • Taxonomy terms stored in the central Term Store, including those added as site columns and added to site content types and/or added directly to document libraries. The only difference with the first option is that with the Term Store the classification terms are stored and managed centrally and are therefore available to every SharePoint site.
  • Retention labels that: (a) ‘map’ to classification terms; (b) are linked with a File Plan that includes the classification terms; (c) are either the same as (a) or (b) and are used in with a Document Understanding Model in SharePoint Syntex; or (d) the same as (a) or (b) and used with conjunction with Trainable Classifiers.

The first two options can only be applied to content stored in SharePoint. Retention labels may be applied to emails and OneDrive content. None of the three options can be applied to Teams chats. Also note that there is no connection between the SharePoint Term Store and the File Plan, both of which can be used to store classification terms.

This post:

  • Defines the meaning of classification from a recordkeeping point of view.
  • Describes each of the above options and their limits.
  • Discusses the requirement to classify records and other options in Microsoft 365.

What is classification?

Humans are natural-born classifiers. We see it in the way we store cutlery or linen, or other household items or personal records.

Business records also need some form of classification. But what does that mean? The 2002 version of the records management standard ISO 15489, defines classification as:

‘the systematic identification and arrangement of business activities and/or records into categories according to logically structured conventions, methods and procedural rules represented in a classification system’. (ISO 15489.1 2017 clause 3.5).

The standard also states (4.2.1) that a classification scheme based on business activities, along with a records disposition authority and a security and access classification scheme, were the principal instruments used in records management operations.

The classification of records in business is important to establish their context and help finding them.

Microsoft 365 includes various options to apply classification terms to records.

Metadata columns in SharePoint

The simplest way to classify records stored in SharePoint document libraries is to either create site columns containing the classification terms and add those columns to document libraries, or create them directly in those libraries.

Benefits

Adding site or library columns is relatively simple. As classification terms are usually in the form of a (hierarchical) list, it is simple to add one choice or lookup column for function and another for activities.

A lookup column can bring across a value from another column when an item is selected; for example, if the look up list places ‘Accounting’ (Activity) in the same list row as ‘Financial Management’, selection of ‘Accounting’ will bring across ‘Financial Management’ as a separate (linked) column.

Default values (or even one value) can be set meaning that records added to a library (that only contains records with those classification terms) can be assigned the same classification terms each time without user intervention.

Negatives

SharePoint choice or lookup columns do not allow for hierarchical views or values to be displayed from the list view so the context for the classification terms may not be obvious unless both function and activity are listed.

The Term Store

The Term Store, also known as the Managed Metadata Service (MMS) has existed in SharePoint as a option to create and centrally manage classification and taxonomy terms in SharePoint only for at least a decade.

In 2020, access to the Term Store was re-located from its previous location (https://tenantname-admin.sharepoint.com/_layouts/15/TermStoreManager.aspx) to the SharePoint Online admin portal under the ‘Content Services’ section:

The location of the Term Store in the SharePoint admin center

Organisations can create multiple sets of taxonomies or ‘term groups’ (e.g, ‘BCS’ or ‘People’) within the Term Store. Each Term Group consists of the following:

  • Term Sets. These generally could map to a business function. Each Term Set has a name and description, and four tabs with the following information: (a) General: Owner, Stakeholders, Contact, Unique ID (GUID); (b) Usage settings: Submission policy, Available for tagging, Sort order; (c) Navigation: Use term set for site navigation or faceted navigation – both disabled by default; (d) Advanced: Translation options, custom properties.
  • Terms. These generally could map to an activity. Each Term has a name and three tabs: (a) General: Language, translation, synonyms and description; (b) Usage settings: Available for tagging, Member of (Term Set), Unique ID (GUID); (c) Advanced: Shared custom properties, Local custom properties.

In the example below, the Term Set (function) of ‘Community Relations’ has three Terms (activities).

Once they have been created in the Term Store, term set or terms can be added to a SharePoint site, either as a new site or local library/list column, as shown in the two screenshots below:

First, create a new column and select Managed Metadata

Then scroll down to Term Set Settings and choose the term set to be used.

Once added as a site column, the new column may be added to a Content Type that is added to a library, or directly on the library or list.

A Term Store-based column added to a library via a Content Type.

Benefits

The primary benefit using the central Term Store terms via a Managed Metadata column is that the Term Store is the ‘master’ classification scheme providing consistency in classification terms for all SharePoint sites.

As we will see below, Term Store terms may be used to help with the application of retention labels (which themselves may ‘map’ to classification terms in a function/activity-based retention schedule).

Negatives

Using metadata terms from the Term Store is almost identical with using a choice or lookup column. The only real difference is that the Term Store provides a ‘master’ and consistent list of classification (and other) terms.

Term store classification terms, including in Content Types, may only be used on a minority of SharePoint sites.

Additionally:

  • It is not possible to select a Term Set (e.g., the function level), only a Term within a Term Set.
  • Only the selected classification Term appears in the library metadata, without the parent Term Set or visual hierarchy reference to that Term Set – see screenshot below. Technically only that Term is searchable. It is not possible to view a global listing of all records classified according to function and activity.
  • If multiple choices are allowed, a record may be classified according to more than one Term. This may cause issues with grouping, sorting or filtering the content of a library in views.
How the Term appears in the library column.

As we will see below, there is no connection between the classification Terms in Term Sets and the categorisation options available when creating new retention labels via a File Plan. ‘Business Function’ or ‘Category’ choices in the File Plan do not connect with the Term Store.

Term Store terms and Content Types can only be used to classify content stored in SharePoint.

Retention labels

Retention labels in Microsoft 365 can be used in an indirect way to classify records in SharePoint, email and OneDrive because they can be ‘mapped’ to classification elements.

For example, a label may be based on the following elements:

  • Function: Financial Management
  • Activity: Accounting
  • Description: Accounting records
  • Retention: 7 years

Every retention label contains the following options:

  • Name. The name can provide simple details of the classification, for example: ‘Financial Management Accounting – 7 years’.
  • Description for users. This can be the full wording of the retention class.
  • Description for admins. This can contain details of how to apply or interpret the class, if required.
  • Retention settings (e.g., 7 years after date created/modified or label applied).

Benefits

Where the classification terms map to a retention class, the process of applying a retention label to an individual record, email or OneDrive content could potentially be seen as classifying those records against the classification scheme.

The Data Classification section in the Microsoft 365 Compliance portal provides an overview of the volume of records in SharePoint, OneDrive or Exchange that have a specific retention class:

Negatives

Not every record in every SharePoint document library may be subject to a retention label. Many records (for example in Teams-based SharePoint sites) may be subject to a ‘back end’ retention policy applied to the entire site (which creates a Preservation Hold library).

A retention label applied to a record doesn’t actually add any classification terms to the record.

Retention labels don’t map in any way to Term Store classification terms, except in SharePoint Syntex – see below (but this only applies to SharePoint content).

Retention labels/File Plan combination

The File Plan option (Records Management > File Plan, requires E5 licences) can also be used to add classification terms to a retention label as shown in the screenshot below. Note that there is no link with the Term Store.

Benefits

Records (including emails) that have been assigned a retention label could, in theory, be regarded as having been classified in this way because the label contains (or references) the classification terms.

Negatives

When applied to content in SharePoint, OneDrive or Exchange, retention labels linked with the File Plan do not show the File Plan classification terms. It may be possible to write a script that displays all records with the terms from the File Plan, but it may be easier to do this using the Data Classification option described above.

Retention labels/SharePoint Syntex combination

SharePoint Syntex provides a way to apply retention labels to records, stored in SharePoint, that have been identified through the Document Understanding Model process.

Benefits

As can be seen in the screenshot above, each new DU model allows similar types of records (in the example above, ‘Statements of Work’) to be associated with a new or existing Content Type that can include a Term Store Term – for SharePoint records only – and a retention label. This provides three types of ‘classification’:

  • Grouping by record type (e.g., Statement of Work, Invoice)
  • Linking (of sorts) between the records ‘classified’ in this way and a Term Store term added as a metadata column to the Content Type.
  • Assigning of a retention label. This provides the same form of retention label-based classification described above.

Furthermore, if the Extraction option is also used, data extracted to SharePoint columns can be based on choices listed in the Term Store metadata.

Negatives

SharePoint Syntex only works for records – and only those records that have some form of consistency – stored in SharePoint.

Retention labels/trainable classifiers combination

Trainable classifiers are another way that could be used to identify related records and apply a retention label to those records. Microsoft 365 includes six ‘out of the box’ trainable classifiers that will not be of much value to records managers for the classification of records:

  • Source code
  • Harassment
  • Profanity
  • Threat
  • Resume
  • Offensive language (to be deprecated)

The creation of new trainable classifiers requires an E5 licence; they are created through the Data Classification area of the Microsoft 365 Compliance admin portal. Machine Learning is used to identify related records to create the trainable classifiers.

Once created, a retention label may be auto-applied to content stored in SharePoint or Exchange mailboxes using the classifier.

The option to auto-apply a label based on a trainable classifier.

Benefits

The primary outcome (from a recordkeeping classification point of view) of using trainable classifiers is the application of a retention label to content stored in SharePoint and Exchange mailboxes. It can also be used to apply a sensitivity label to that content.

Negatives

It is unlikely that every record will be classified according to every classification option.

Trainable classifiers only work with SharePoint and Exchange mailboxes.

Classifying records per workload

The options are summarised below for each main workload:

  • SharePoint: Use local site or library columns, Term Store terms or retention labels (mapped to a File Plan as necessary), applied manually or automatically, including via SharePoint Syntex or trainable classifiers.
  • Exchange mailboxes: The only feasible option to classify these records is to manually or auto-apply retention labels that are mapped to a classification, including a trainable classifer.
  • OneDrive: Manually or auto-apply retention labels mapped to a classification.
  • Teams. It is not possible to classify Teams chats with the options available.

Is classification necessary?

The classification model described in ISO 15489 and other standards was based on the idea that records would be stored in a central recordkeeping system where they would be subject to and tagged by the terms contained a classification scheme, often applied at the aggregation level (e.g., a file).

Microsoft 365 is not a recordkeeping system but a collection of multiple applications that may create or capture records, primarily in Exchange mailboxes, SharePoint, OneDrive and MS Teams (and also Yammer).

There is no central option to classify records in the recordkeeping sense. The closest options are:

  • The grouping of records in SharePoint sites (and Teams, each of which has a SharePoint site) and libraries that map to business functions and activities.
  • The use of metadata, either terms set in the central Term Store or created in local sites/libraries, to ‘classify’ individual records (including emails) stored in SharePoint document libraries. Each item in the library might have a default classification, or could be classified differently.
  • The use of retention labels that ‘map’ to function/activity pairs in a records disposal authority/schedule. These may be applied, manually or automatically, to content stored in SharePoint, OneDrive and Exchange mailboxes.

Neither of the above may apply, or be applied consistently, to all SharePoint sites, Exchange mailboxes, OneDrive accounts. And neither can be applied to Teams chats.

A different approach to this problem is required, one that likely will likely involve greater use of Artificial Intelligence (AI) and Machine Learning (ML) methods to identify and enable the grouping of records, and provide visualisations of the records so-classified.

Image: Werribee Mansion, Victoria, Australia stairwell (Andrew Warland photo)

Posted in EDRMS, In Place Records, Products and applications, Records management, SharePoint Online, Solutions

SharePoint is not an EDRMS

In my February 2021 post A brief history of electronic document and records management systems and related standards, I quoted from a presentation by Philip Bantin in 2001 that summarised the difference between the two systems.

  • An electronic document management system (EDMS) supported day-to-day use of documents for ongoing business. Among other things, this meant that the records stored in the system could continue to be modified and exist in several versions. Records could also be deleted.
  • An electronic records management system (ERMS) was designed to provide a secure repository for authentic and reliable business records. Although it contained the same or similar document management functionality as an EDMS, a key difference was that records stored in an ERMS could not be modified or deleted.

It could be argued that SharePoint began its life in 2001 as an EDMS and began to include ERMS functionality when SharePoint 2010 was released. So, how is it not an EDRMS?

In simple terms, unlike a traditional EDRMS model that was intended to be the repository for all business records, SharePoint is only one of several repositories that are used to store and manage business records. Instead of centralising the storage and management of records, systems such as Microsoft 365 provide centralised management of records wherever they are stored.

The EDRMS model

Almost all EDRM systems since the mid-1990s have had the following characteristics:

  • Compliance with a standard. For example, DOD 5015.2 (1997), AS 4390 (1996)/ISO 15489 (2001), ISO 16175 (2010), VERS (1999), MoReq (2001)/MoReq2, etc.
  • A relational database (for the various functions, metadata and controls (see diagram below) and a separate file share (for the actual content/records), accessed via a user interface.
  • An expectation that most or all (digital) records would be copied from other systems used to create or capture them to the EDRMS for ongoing storage and protection. This outcome might be achieved through integration with other systems, for example to copy emails to the EDRMS.

The integrated EDM/ERM system model was described in the following diagram in 2008 by the International Standards Organisation committee, ISO/TC171/SC2, in its document ‘Document management applications’:

There are two key problems with the EDRMS model:

  • They do not (and cannot) realistically capture, store or classify all business records. Emails have remained a problem in this regard for at least two decades. Additionally, there is now a much wider range of born-digital records that remain uncaptured.
  • Many of the born-digital records that were copied to the EDRMS continue to exist where they were first created or captured.

Any organisation that has been subject to a legal subpoena or Freedom of Information (FOI) request for records will know the limited extent to which EDRM systems provide evidence of all business activities or decision making.

And yet, almost every organisation has a requirement to manage records. There is clearly a disconnect between the various requirements and standards to keep records and the ability to keep them.

But SharePoint (on its own) is not the answer.

Why isn’t SharePoint an EDRMS?

SharePoint was never promoted or marketed as an EDRM system. It is not even a relational database (see this 2019 post by Sergey Golubenko ‘Why SharePoint is no good as a database‘ for details).

To paraphrase from Tony Redmond’s recent (21 March 2021) post ‘SharePoint Hits 20: Memories and Reflections‘, SharePoint was originally designed to allow content (including page-based content in the form of intranets) to be accessed via the web, thereby replacing network file shares.

In practice, most organisations retained their network file shares, built their intranet using SharePoint, and otherwise used SharePoint to manage only some digital content.

It could be argued SharePoint started out as an EDM system and became more like an ERMS when more recordkeeping capability was introduced in SharePoint 2011. But that doesn’t make it an EDRMS in the traditional sense of being a central repository of business records.

The reality is that records are, have been, and will continue to be created or captured in many places:

  • Email systems. In Microsoft 365, Exchange Online mailboxes are also used to store the ‘compliance copy’ of Teams chats messages.
  • Network file shares.
  • SharePoint, including OneDrive.
  • Other online document management systems, including Google Drive.
  • Text, chat and instant messages often created in third-party systems, often completely inaccessible or encrypted.

The type and format of a record can vary considerably. For example:

  • An emoji.
  • A calendar entry.
  • A photograph or video recording (including CCTV recording).
  • The recording of a meeting, in video or transcript form, or both.
  • Virtual reality simulations.
  • Social media posts.
  • 3D and digital drawings (e.g., via digital whiteboards).

And, of course, all the data in line of business systems.

Instead of trying to save records into a single EDRM system, Microsoft 365 provides the ability to apply controls over the management of most records where they were created or captured – in email (including archived social media and other records), Teams chat, SharePoint, or OneDrive, and Yammer.

Is there a need for an EDRMS?

There is nothing stopping organisations acquiring and implementing traditional EDRM systems, or even setting up some SharePoint sites, to manage certain high value or permanent records, including some records that can be copied from other create/capture systems (such as email).

But there is also a need to address the management of all other records that remain in the systems where they were created or captured.

In most modern organisations, this requires broader controls such as applying minimum retention periods at the backend, monitoring usage and activity, and the proactive management of disposals. Not trying to copy them all to SharePoint.

Featured image: Lorenz Stoer, late 1500s.

Posted in Conservation and preservation, Digital preservation, Electronic records, Records management, Retention and disposal

The challenge of identifying born-digital records

A recent ‘functional and efficiency’ review into the National Archives of Australia (also known as the ‘Tune Review’, published on 30 January 2021) noted the ‘rapid and ever-evolving challenges of the digital world’.

It stated that ‘the definition of a ‘record’ needs to reflect current international standards, be more directly applied to digital technologies, and more clearly provide for direct capture of records that are susceptible to deletion, such as emails, texts or online messages’.

The review also highlighted the difficulties associated with ingesting digital records ‘via manual intensive activities (due to lack of interoperable systems)’ and proposed a new model based on the ‘continuous automated appraisal of [Agency] digital records that would require a combination of artificial intelligence and skilled archivists’.

The review underlined the challenges of identifying and managing born-digital records, and the need for better solutions.

This post explores the challenges of accurately and identifying born-digital records in order to manage them.

Identifying and protecting records

Records usually provide evidence of something that happened – an action, an activity or process, a decision, or a current state (including a photograph or video record). They may have or be associated with descriptive metadata used to provide context to the records and guide or determine retention.

Like all other types of evidence, the authenticity, integrity and reliability should be protected for as long as they must be kept.

In the paper world, this outcome was achieved by storing physical records (including the printed version of born-digital records) on paper files or in physical storage spaces.

For the past twenty years or so, this outcome was achieved for (some) digital records by (mostly manually) copying them from a network drive or email system (or via a connector) to a dedicated electronic records management (ERM) system and then ‘locking’ them in that system to prevent unauthorised change or deletion. Most ERM systems consisted of a database for the metadata and an associated network drive file store for the objects.

The main problem with this centralised storage model – however good it might be at protecting copies of records stored in it – was that the original versions, along with all the other records that were not identified or could not be copied to the ERMS, remained where they were created or captured.

And the records stored ‘in’ the ERMS were actually stored on a network file share on a server that was (a) accessible to IT, and (b) almost always backed up. So, yet more copies existed.

The challenge of born-digital records

There are several key challenges with born-digital records:

  • Consistently and accurately identifying (or ‘declaring’) all records in all formats created or captured in all locations. For too long, the focus has primarily been on emails and anything that can be saved to a network drive with the onus of identifying a record on end-users.
  • Ensuring their authenticity, reliability and integrity over time. For records stored in the ERMS, this has usually involved locking them from edit, including through the ‘declaration’ process, or preventing deletion. But in almost all cases, the original version (in email, on the network drives), could continue to be modified. Other records that were not identified or stored in an ERMS may be deleted.
  • Ensuring that born-digital records will remain accessible for as long as they are required.

It is not possible to consistently and accurately manually (or even automatically) identify every born-digital record that an organisation creates or captures to ensure their authenticity, reliability, integrity or accessibility over time. Only a small percentage of born-digital records are copied to an ERMS.

Records remain hidden in personal mailboxes, personal drives and third-party (often unauthorised) systems. Records may exist in multiple forms and formats, sometimes created or stored in ‘private’ systems or on social media platforms. They may take the form of text or instant messages or social networking posts and threads. They may be drawings, images, voice or video recordings.

Even if a record is identified, it is not always possible to save it to an ERMS. Text or instant messages on mobile devices are a case in point that has been a problem for at least two decades. More recent examples include chat messages, reactions (emojis, comments), and recordings of online meetings.

And even if a high percentage of born-digital records could be stored in the ERMS, the original versions will almost always remain where they were created or captured.

A different approach is needed.

Triaging records?

One approach to the problem would be to accept that not all records have equal value. That is, not all records need to be managed the same way.

To some degree, this way of thinking is already reflected in classes in the structure of records retention schedules and the attention paid to each:

  • Records that have permanent or archival value and need to be transferred to archival institutions.
  • Specific types of records that must be created or kept by the organisation for a minimum periods (sometimes quite long but not ‘forever’), for legal, compliance or auditing purposes.
  • Records that are not subject to legal or compliance requirements but which the organisation decides to keep for a minimum period of time.
  • Everything else.

Triaging records means that they can be managed as required at each level, but nothing is missed. It requires a risk management approach.

For records of permanent value, or are subject to legal or compliance requirements, it means that ensuring that these records receive the most attention and every effort it made to ensure that they are and can be identified (declared) and managed accordingly. This would include ensuring that it is possible to identify and capture these records in the systems used to create or capture them, for example, key emails.

A similar approach would be taken to records that need to be kept for legal, compliance or auditing purposes but with an understanding that some of these records (e.g., emails) may remain in the original system where they were created or captured. Technological solutions may be used to identify or tag these records. The destruction of these records should be subject to some form of review and a record kept of the approval and what was destroyed.

For all other records would remain stored wherever they were created or captured and subject to minimum retention periods after which they can be destroyed without review – but a record kept of the basic metadata of each record (including original storage location).

Protecting – or proving – the authenticity, integrity and reliability of records

The assumption behind the protection of records is that they should not be changed or deleted.

The reality, with digital records, is that they may change at any time through new threads, new revisions, new chats, or even through photoshopping.

A more realistic approach may be to use information about what was changed, by whom, and when – not to protect the record but to provide an evidentiary trail to prove what it is or was. The ‘smoking gun’ evidence for most born-digital records is the metadata that is recorded when it was captured or modified, not (necessarily) the added descriptive metadata.

For example:

  • Someone may author a document (metadata records each revision, and each revision can be viewed).
  • The document may be approved electronically (recorded in metadata).
  • Someone then modifies the approved version.
  • All of the above is recorded in the ‘modified’, ‘modified by’ and approval metadata.
  • The record should (or may) also recorded who viewed the record, and when.

EXIF metadata stored on images provides a similar form of evidence (and may even include GPS information).

Which record is more likely to be accepted as evidence:

  • A record stored in an EDRMS, versions or revisions of which may exist in multiple other places, including on network file shares, email system and even backup tapes
  • A record stored in a system that shows the full set of metadata about access and changes, or the most recent thread of an email discussion?

Conclusions

At the end of the day, it should be possible to confirm the authenticity, reliability and integrity of records based on information/metadata that forms part of the born-digital record: who and when it was created, the context in which it was created and its relationship with other records.

Perhaps, instead of focussing on trying to identify and capture all born-digital objects that might be records and ‘protecting’ a version of that record, it may be more practical and easier to leave most records where they were created or captured (and retained by retention policies) and use change or revision metadata to provide evidence of authenticity.

This may, in the end, be a much easier way to protect the authenticity of records than having to rely on manual identification or declaration.

Posted in Artificial Intelligence, Classification, Electronic records, Information Management, Microsoft 365, Records management, Retention and disposal

Can Microsoft technology classify records better than a human?

In late 2012, IDM magazine published an article I co-authored with Umi Asma Mokhtar in Malaysia titled ‘Can technology classify records better than a human?’

The article drew on research into recent advances in technology to assist in legal discovery, known as ‘computer-assisted coding’, or ‘predictive coding’, including the following two articles:

Grossman and Cormack’s article noted that ‘a technology-assisted review process involves the interplay of humans and computers to identify the documents in a collection that are responsive to a production request, or to identify those documents that should be withheld on the basis of privilege‘. By contrast, an ‘exhaustive manual review’ required ‘one or more humans to examine each and every document in the collection, and to code them as response (or privileged) or not‘.

The article noted, somewhat gently, that ‘relevant literature suggests that manual review is far from perfect’.

Peck’s article contained similar conclusions. He also noted how computer-based coding was based on a initial ‘seed set’ of documents identified by a human; the computer then identified the properties of those documents and used that to code other similar documents. ‘As the senior reviewer continues to code more sample documents, the computer predicts the reviewer’s coding‘ (hence predictive coding).

By 2011, this new technology was challenging old methods of manual review and classification. Despite some scepticism and slow uptake (for example, see this 2015 IDM article ‘Predictive Coding – What happened to the next big thing?‘), by 2021, it had become an accepted option to support discovery, sometimes involving offshore processing for high volumes of content.

Meanwhile, in an almost unnoticed part of the technology woods, Microsoft acquired Equivio in January 2015. In its press release ‘Microsoft acquires Equivio, provider of machine learning-powered compliance solutions‘, Microsoft stated that the product:

‘… applies machine learning … enabling users to explore large, unstructured sets of data and quickly find what is relevant. It uses advanced text analytics to perform multi-dimensional analyses of data collections, intelligently sorting documents into themes, grouping near-duplicates, isolating unique data, and helping users quickly identify the documents they need. As part of this process, users train the system to identify documents relevant to a particular subject, such as a legal case or investigation. This iterative process is more accurate and cost effective than keyword searches and manual review of vast quantities of documents.’ 

It added that the product would be deployed in Office 365.

Classifying records

The concept of classification for records was defined in paragraph 7.3 of part 1 of the Australian Standard (AS) 4390, released in 1996. The standard defined classification as:

‘… the process of devising and applying schemes based on the business activities generating records, whereby they are categorised in systematic and consistent ways to facilitate their capture, retrieval, maintenance and disposal. Classification includes the determination of naming conventions, user permissions and security restrictions on records’.

The definition provided a number of examples of how the classification of business activities could act as a ‘powerful tool to assist in many of the processes involved in the management of records, resulting from those activities’. This included ‘determining appropriate retention periods for records’.

The only problem with the concept was the assumption that all records could be classified in this way, in a singular recordkeeping system. Unless they were copied to that system, emails largely escaped classification.

Fast forward to 2020

Managing all digital records according to recordkeeping standards has always been a problem. Electronic records management (ERM) systems managed the records that were copied into them, but a much higher percentage remained outside its control – in email systems, network files shares and, increasingly over the past 10 years, created and captured on host of alternative systems including third-party and social media platforms.

By the end of 2019, Microsoft had built a comprehensive single ecosystem to create, capture and manage digital content, including most of the records that would have been previously consigned to an ERMS. And then COVID appeared and working from home become common. All of a sudden (almost), it had to be possible to work online. Online meeting and collaboration systems such as Microsoft Teams took off, usually in parallel with email. Anything that required a VPN to access became a problem.

2021 – Automated classification for records (maybe)

The Microsoft 365 ecosystem generated a huge volume of new content scattered across four main workloads – Exchange/Outlook, SharePoint, OneDrive and Teams. A few other systems such as Yammer also added to the mix.

Most of this information was not subject to any form of classification in the recordkeeping sense. The Microsoft 365 platform included the ability to apply retention policies to content but there was a disconnect between classification and retention.

Microsoft announced Project Cortex at Ignite in 2019. According to the announcement, Project Cortex:

  • Uses advanced AI to deliver insights and expertise in the apps that are used every day, to harness collective knowledge and to empower people and teams to learn, upskill and innovate faster.
  • Uses AI to reason over content across teams and systems, recognizing content types, extracting important information, and automatically organizing content into shared topics like projects, products, processes and customers.
  • Creates a knowledge network based on relationships among topics, content, and people.

Project Cortex drew on technological capabilities present in Azure’s Cognitive Services and the Microsoft Graph. It is not known to what extent the Equivio product, acquired in 2015, was integrated with these solutions but, from all the available details, it appears the technology is at least connected in one way or another.

During Ignite 2020, Microsoft announced SharePoint Syntex and trainable classifiers, either of which could be deployed to classify information and apply retention rules.

Trainable classifiers

Trainable classifiers were made generally available (GA) in January 2021.

Trainable classifiers sound very similar to the predictive coding capability that appeared from 2011. However, they:

  • Use the power of Machine Learning (ML) to identify categories of information. This is achieved by creating an initial ‘seed’ of data in a SharePoint library, creating a new trainable classifier and pointing it at the seed, then reviewing the outcomes. More content is added to ensure accuracy.
  • Can be used to identify similar content in Exchange mailboxes, SharePoint sites, OneDrive for Business accounts, and Microsoft 365 Groups and apply a pre-defined retention label to that content.

In theory, this means it might be possible to identify a set of similar records – for example, financial documents – and apply the same retention label to them. The Content Explorer in the Compliance admin portal will list the records that are subject to that label.

SharePoint Syntex

SharePoint Syntex was announced at Ignite in September 2020 and made generally available in early 2021.

The original version of Syntex (as part of Project Cortex) was targeted at the ability to extract metadata from forms, a capability that has existed with various other scanning/OCR products for at least a decade. The capability that was released in early 2021 included the base metadata extraction capability as well as a broader capability to classify content and apply a retention label.

The two Syntex capabilities, described in a YouTube video from Microsoft titled ‘Step-by-Step: How to Build a Document Understanding Model using Project Cortex‘, are:

  • Classification. This capability involves the following steps: (a) Creation of (SharePoint site) Content Center; (b) Creation of a Document Understanding Model (DUM) for each ‘type’ of record; the DUM can create a new content type or point to an existing one; the DUM can also link with the retention label to be applied; (c) Creation of an initial seed of records (positives and a couple of negatives); (d) Creation of Explanations that help the model find records by phrase, proximity, or pattern (matching, e.g., dates); (e) Training; (f) Applying the model to SharePoint sites or libraries. The outcome of the classification is that matching records in the location where it is pointed are assigned to the Content Type (replacing any previous one) and tagged with a retention label (also replacing any previous one).
  • Extraction. This capability has similar steps to the classification option except that the Explanations identify what metadata is to be extracted from where (again based on phrase, proximity or pattern) to what metadata column. The outcome of extraction is that the matching records include the extracted metadata in the library columns (in addition to the Content Type and retention label).

As with trainable classifiers, Syntex uses Machine Learning to classify records, but Syntex also has the ability to extract metadata. Syntex can only classify or extract data from SharePoint libraries.

Trainable classifiers or Syntex?

Both options require the organisation to create an initial seed of content and to use Machine Learning to develop an understanding of the content, in order to classify it.

The models are similar, the primary difference is that trainable classifiers can work on content stored in email, SharePoint and OneDrive, whereas Syntex is currently restricted to SharePoint.

Predictive coding

On 18 March 2021, Microsoft announced the pending (April 2021) preview release of an enhanced predictive coding module for advanced eDiscovery in Microsoft 365.

The announcement, pointing to this roadmap item, noted that eDiscovery managers would be able to create and train relevance models within Advanced eDiscovery using as few as 50 documents, to prioritize review.

So, can Microsoft technology classify records better than humans?

In their 1999 book ‘Sorting Things Out: Classification and its Consequences‘ (MIT Press), Geoffrey Bowker and Susan Leigh Star noted that ‘to classify is human’ and that classification was ‘the sleeping beauty of information science’ and ‘the scaffolding of information infrastructures’.

But they also noted how ‘each standard and category valorizes some point or view and silences another. Standards and classifications (can) produce advantage or suffering’ (quote from review in link above).

Technology-based classification in theory is impartial. It categorises what it finds through machine learning and algorithms. But, technology-based classification requires human review of the initial and subsequent seeds. Accordingly such classification has the potential to be skewed according to the way the reviewer’s bias or predilections, the selection of one set of preferred or ‘matching’ records over another.

Ultimately, a ‘match’ is based on a scoring ‘relevancy’ algorithm. Perhaps the technology can classify better than humans, but whether the classification is accurate may depend on the human to make accurate, consistent and impartial decisions.

Either way, the manual classification of records is likely to go the same way as the manual review of legal documents for discovery.

Image source: Providence Public Library Flickr