Archive for the ‘Classification’ Category

Office 365 – new data governance and records retention management features

October 7, 2017

At the September 2017 Ignite conference in Orlando, Florida, Microsoft announced a range of new features coming soon to data governance in Office 365.

These new features build on the options already available in the Security and Compliance section of the Office 365 Admin portal. You can watch the video of the slide presentation here.

Both information technology and records management professionals working in organisations that have Office 365 need to work together to understand these new features and how they will be implemented.

Some of the key catch-phrases to come out of the presentation included ‘keep information in place’, ‘don’t hoard everything’, ‘no more moving everything to one bucket’, ‘three-zone policy’, and ‘defensible deletion process’. The last one is probably the most important.

How do you manage the retention of digital content?

If your organisation is like most others, you will have no effective records retention policy or process for emails or content stored across network file shares and in ‘personal’ drives.

If you have an old-style EDRM system you may have acquired a third-party product and/or tried to encourage users (with some success, perhaps) to store emails in that system, in ‘containers’ set up by records managers.

The problem with most of these traditional methods is that they assume there should be one place to store records relating to a given subject. In reality, attempts to get all related records into the one place conjure up the ‘herding cats’ problem. It’s not easy.

What is Microsoft’s take on this?

For many years now, Microsoft have adopted an alternative approach, one that is not dissimilar to the view taken by eDiscovery vendors such as Recommind. Instead of trying to force users to put records in a single location, it makes more sense to use powerful search and tagging tools to find and manage the retention of records wherever they are stored.

Office 365 already comes with powerful eDiscovery capability, allowing the organisation to search for and put on hold records relating to a given subject, or ‘case’. But it also now has very powerful records retention tools that are about to get even better.

This post builds on my previous post ‘Applying New Retention Policies to Office 365 Content’, and so won’t repeat all of that material.

Where do you start?

A standard starting point for the management of the retention and disposal of records is a records retention schedule. These are also known in the Australian recordkeeping context as disposal authorities, general disposal authorities, and records authorities. They may be very granular and contain hundreds of classes, or take a ‘big bucket’ approach (for example, Australian Federal government RAs).

Records retention schedules usually describe types of records (sometimes grouped by ‘function’ and ‘activity’, or by business area) and how long they must be retained before they can be disposed of, unless they must be kept for a very long time as archival records.

The classes contained in records retention schedules or similar documents become retention policies in Office 365.

Records retention in Office 365

It is really important to understand that records retention management in Office 365 covers the entire environment – Exchange (EXO), SharePoint (SPO), OneDrive for Business (OD), Office 365 Groups (O365G), Skype for Business. Coverage for Microsoft Teams and OneNote is coming soon. Yammer will not be included until at least the second half of 2018.

That is, records retention is not just about documents stored in SharePoint. It’s everything except as noted.

Records managers working in organisations that have implemented (or are implementing) Office 365 need to be on top of this, to understand this way of approaching and managing the records retention process.

Retention policies in Office 365 are set up in the Security and Compliance Admin Centre, a part of the Office 365 Admin portal. Ideally, records managers should be allocated a role to allow them to access this area.

There are two retention policy subsections:

  • Data Governance > Retention > Policy
  • Classification > Labels > Policy

The two subsections are almost identical but have slightly different settings and purposes. Note, however, that all retention policies that are set up are visible in both locations.

The difference between the two options is that:

  • Retention-based policies are (according to Microsoft) intended to be used by IT, typically for ‘global’ policies. For example, a global policy for the retention of emails not subject to any other retention policy.
  • Label-based policies map to the individual classes in a retention schedule or disposal authority.

Note: Organisations that have many hundreds or even thousands of records retention classes will need to create them using PowerShell.
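To illustrate what bulk creation might look like in practice, here is a sketch (in Python, for readability) that turns a retention schedule exported to CSV into one Security and Compliance PowerShell command per class. The CSV column names are invented for this example, and while `New-ComplianceTag` and the parameters shown are, to the best of my knowledge, the relevant cmdlet and switches, treat them as indicative rather than authoritative.

```python
import csv
import io

# A toy retention schedule: two classes with retention period (days)
# and the action to take at the end of the period. Column names here
# are illustrative, not a Microsoft schema.
SCHEDULE_CSV = """name,retention_days,action
Financial records,2555,Delete
Personnel files,3650,KeepAndDelete
"""

def schedule_to_commands(csv_text):
    # Emit one New-ComplianceTag command per retention class.
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        commands.append(
            f'New-ComplianceTag -Name "{row["name"]}" '
            f'-RetentionDuration {row["retention_days"]} '
            f'-RetentionAction {row["action"]} '
            f'-RetentionType CreationAgeInDays')
    return commands

for cmd in schedule_to_commands(SCHEDULE_CSV):
    print(cmd)
```

The generated commands could then be run in a session connected to the Security and Compliance Center, with the records manager supplying and verifying the schedule itself.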

Creating a retention-based policy

Retention-based policies have the following options:


Directly underneath this are two options:

  • Find specific types of records based on keyword searches [COMING > also label-based]
  • Find Data Loss Prevention (DLP) sensitive information types. [COMING > label-based DLP-related policies can be auto-applied]

A decision must then be made as to where this policy will be applied – see below.

Creating a label-based policy

To create a classification label manually, click on ‘Create a label’.



  • Labels are not available until they are published.
  • Labels can be auto-applied

The screenshot below shows the options for creating a new label.


Label-based policies have the following settings:

  • Retain the content for n days/months/years
  • Based on Created or Last Modified [COMING > when labelled, an event*]
  • Then three options: (a) delete it after n days/months/years, (b) subject it to a disposition review process (labels only), or (c) don’t delete it.

* Such as when certain actions take place on the system.
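The retention settings above can be sketched as a small function: given a label’s settings and an item’s dates, work out whether the item is still within its retention period and, if not, what happens next. This is a minimal model for illustration only; the field names are mine, not Office 365’s actual schema.

```python
from datetime import date, timedelta

def disposition(label, created, modified, today):
    # Pick the trigger date the retention period runs from.
    trigger = created if label["based_on"] == "created" else modified
    due = trigger + timedelta(days=label["retain_days"])
    if today < due:
        return "retain"        # still within the retention period
    return label["then"]       # "delete", "review", or "keep"

# A label modelling "retain 7 years from creation, then disposition review".
label = {"retain_days": 7 * 365, "based_on": "created", "then": "review"}
print(disposition(label, date(2010, 1, 1), date(2012, 6, 1), date(2017, 10, 7)))
```

A record created in 2010 under this label is past its seven-year period in late 2017, so it falls due for disposition review rather than being deleted silently.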


Applying the policies

Once a policy has been created it can then be applied to the entire Office 365 environment or to only specific elements, for example EXO, SPO, OD, O365G.

  • IT may want to establish a specific global policy
  • Most other policies will be based on the organisation’s records retention schedule

Once they have been published, labels may then be applied automatically or users can have the option to apply them manually.

In EXO, a user may create a folder and apply the policy there. All emails dragged into that folder will be subject to the same policy.

In SPO, retention policies may be applied to a document library and can be applied automatically as the default setting to all new documents. [COMING > also to a folder and a document set]. Adding a label-based policy to a library also creates a new column so the user can easily see what policy the documents are subject to.

Note: Individual documents stored in the library will be subject to disposal, not the library. 

What about Content Types?

Organisations that have used content types to manage groups of records, including for retention management, will be able to continue to do so, but Microsoft appears to take the view (in the presentation above) that this method should probably be replaced by labelling. This point needs further consideration, as content types are usually used as a way to apply metadata to records.

Note: If the ability to delete content (emails, documents) is enabled, any deleted content subject to a retention policy will be retained in a hidden location. The option also exists when a label-based policy is created to ‘declare’ records based on the application of a label. 

What happens when records are due for disposal?

Once the records reach the end of their retention period, they will be:

  • Deleted
  • Subject to a new disposition review process [COMING in 2017 – see below]
  • Left in place (i.e., nothing happens)

In relation to the second option above, a new ‘Disposition’ section under Data Governance will allow the records manager or other authorised person to review records (tagged for Disposition Review) that have become due for disposal.

This is an important point – only records that have a label with the option ‘Disposition Review’ checked will be subject to review. All other records will be destroyed without review. Therefore, if the organisation needs to keep a record of what was destroyed, the classification label must have ‘Disposition Review’ selected.

Records that are reviewed and approved to be destroyed are marked as ‘Completed’. This means there is a record of everything (subject to disposition review) that has been destroyed, a key requirement for records managers.

Other new or coming features

A number of other new features demonstrated at the Ignite conference are coming.

  • Labels will have a new ‘Advanced’ check box. This option will allow records marked with that label to have any of the following: watermark, header/footer, subject line suffix, colour.
  • Data Governance > Records Management Dashboard. The dashboard will provide an overview of all disposition activity.
  • Data Governance > Access Governance. This dashboard, which supports data leakage controls, will show any items that (a) appear to contain sensitive content and (b) can be accessed by ‘too many’ people.
  • Auto-suggested records retention policies. The system may identify groups of records that do not seem to be subject to a suitable retention policy and make a recommendation to create one.
  • For those parts of the world that need them, new General Data Protection Regulation (GDPR) controls
  • Microsoft Information Protection, to replace Azure Information Protection and provide a single set of controls over all of Microsoft’s platforms.

Can ‘traditional’ records management survive the digital future?

December 23, 2015

At a conference last year I listened with interest to a panel discussion on the subject of ‘paper versus digital’. Perhaps the debate was only meant to be light-hearted but it was clear that quite a few records managers continue to manage paper records and see them as their primary focus. In almost all cases these records are the printed version of a digital original. In most cases those digital originals remain stored on a network drive (or a person’s USB), or attached to an email, or – if you are lucky – copied to the organisation’s electronic document management (EDM) system to become the ‘official’ version.

Only a month ago I saw a colleague, whose position title clearly indicates she is responsible for digital recordkeeping, asking about KPIs for the creation of paper files. Perhaps it was part of the digital recordkeeping strategy to examine this subject, but it underlined for me the persistence of ‘traditional’ recordkeeping concepts: that documents belong in containers, which may be files or groups of files (or series, etc). This is not just an issue for records managers, but for end users as well, who continue to ‘think paper’.

It concerns me that records managers persist with the concept that digital records somehow ‘belong’ in digital files, often directly connected with the descriptive ‘business classification’ system of function and activity (in the sense that the file title or metadata may include the function and activity).

This way of thinking reinforces the concept that these records belong nowhere else, or might have no context outside the container, which of course is not necessarily true. It is, of course, possible to cross-link documents to a different container, but that is a manual exercise. And where does it stop?

Part of the problem has been that digital systems mimic in their appearance and functionality the concept of a file or folder. We see them on network drives, in email, and in the containers of EDM systems. These folders or containers provide a sense of surety, that a document has been ‘stored’ or saved in a specific file. The file, however, is no more than a visual construct; the document’s file/folder/container metadata is what causes the document to appear that way.

This is all the more obvious in some EDM systems that store documents in a file store (often folder-based) and the metadata about the documents in a separate database. Aside from the folder structure, these documents are more or less a bunch of uncontrolled documents with metadata links to the ‘file’ construct in the database.

Of course, keeping records in some form of business context is important both for the management of those records (short and long term) and for retention and disposal purposes. But that doesn’t necessarily mean that documents must be associated with ‘files’ that, according to some records managers (and paper filing theory), should contain no more than 300 documents.

All of this is unnecessarily restrictive and, for end users, results in additional work. It’s no wonder that EDM systems have lagged behind the more ‘traditional’ way that users store documents, in network drives.

We should, I believe, think of digital records as being self-contained and dynamic objects with their own metadata payload that may be relevant in multiple contexts, not a fixed object stored in another fixed object.

Enter Microsoft Groups

In 2015 Microsoft announced the concept of Groups, accessed primarily from Outlook. Conceptually, you could think of a Group as being more similar to an Active Directory distribution list than to a shared mailbox. It is also similar to a Yammer group, but with much more functionality.

From Outlook, a user can create a Group to work collaboratively. Group members can initiate conversations with other members of the group (including Skype discussions) and – importantly – share documents.

So, Microsoft is taking us into the world of Group-centred information collaboration. This is not an experiment on the part of Microsoft, it is part of the world of Office 365 which includes SharePoint Online and OneDrive for Business, Delve, and more, supported by any device to enable users to create, access and share content anywhere, anytime.

Now, not every organisation uses Microsoft products, but an awful lot do. And for those who do, Groups in Office 365 is the future. The impact on traditional recordkeeping practices is likely to be large.

By 2020 or earlier, users will create Groups via Outlook to collaborate. Within the Group they will store digital content (not just documents); documents are stored in SharePoint Online site collections that exist in addition to any other site collections. That’s right – in addition to your controlled site collections, there will be a multitude of Group-specific site collections. Conversations that take place in the Group are stored as items in the inbox of the Group mailbox. In other words, all the content (and perhaps context) will be based around the Group ‘subject’.

To access all this rich content users will have three main options: (a) be an active member of the group; (b) SharePoint search; and (c) Delve. Business intelligence and data analytics, or e-Discovery type products, may also provide a fourth line of access.

Likely impact on recordkeeping

The recordkeeping practices that I believe will be most affected are: (a) metadata, including the application of BCS terms; (b) container-based storage, and (c) retention and disposal. It may also impact the careers of records managers. There will be no control over the creation of new groups (there can be, but why?), so their information content (in Outlook and in the associated SharePoint Online site collection) will become, in many respects, the new containers for records, in addition to other site collections.

On a positive note, I believe users will adapt and adopt this new model of working rapidly. Users, many of whom remain glued to Outlook as a business ‘collaboration’ tool, will find that Groups provide them with the ability to store and share documents via the Outlook interface. They may need to get used to the idea of ‘working out loud’ in Groups, but I think that will happen fairly quickly.

One thing is sure – change is inevitable.

Life without folders – SharePoint 2010

June 4, 2014

For a generation now we have been using folders on network drives, in home computers, and in email systems, to categorise and store digital content. We now do it in the cloud too.

They are a hard habit to break because they make so much sense.

Almost everyone in work today uses folders to categorise, store and retrieve digital content. The extensive use of folders in drives and email systems has presented significant challenges for records managers, both in the management of the records contained in them as well as the implementation of recordkeeping systems.

Let’s be clear – users love being able to categorise information in a way that makes sense.

It’s hard to imagine life without folders. Or is it? The last time I looked, one of the biggest social media systems in the world, Facebook, didn’t offer folders.

What are folders, really?

But folders are no more than a virtual construct, like containers in almost every recordkeeping system. What you see on the screen is a folder (or directory) or container, but it isn’t a real folder like a physical folder that contains paper documents. Documents that appear to exist within a virtual folder simply include a pointer to the folder in their system-generated metadata.

Folders in SharePoint 2010

The primary way to group digital content in SharePoint 2010 is by storing them in document libraries. Document libraries are the highest level of aggregation possible so, in a way, they are similar to the highest level folders in a network drive.

In almost all cases, the initial, instinctive user reaction to a document library is to want to categorise the content using folders. Newly created, out-of-the-box document libraries include the ability to create folders, just like on network drives, and so many users start to replicate the structure of their network drives, or copy the entire content from the network drives across to the library using the ‘Open with Windows Explorer’ option.

But, very quickly, users discover that folders in SharePoint libraries have no navigation clues, no way of knowing if you should navigate up or down to find content. Before long, enthusiasm wanes in the trough of disillusionment and users go back to using network drives and the SharePoint site becomes a ghost town.

SharePoint’s neat little trick – making folders vanish

As noted earlier, folders are no more than a virtual construct. In SharePoint, users can make the folders disappear by creating a view (or modifying the existing one) to change the ‘Folders’ setting from ‘Folders’ to ‘Flat’ to show all items without folders.

As soon as the user clicks OK after changing this setting, all the folders vanish and only the documents stored in the folder structure appear. (Note, this does not work in libraries with Document Sets because Document Sets are regarded as documents, and so both appear).
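The point that a folder is only a metadata construct can be shown with a toy model: a folder view and a flat view are just two renderings of the same list of items, and switching between them moves nothing. This is a deliberately simplified sketch, not SharePoint’s actual data model.

```python
# Each document carries its folder path as a metadata field.
# A "flat" view simply ignores that field; no documents move.
library = [
    {"name": "budget.xlsx", "folder": "Finance/2014"},
    {"name": "minutes.docx", "folder": "Meetings"},
    {"name": "policy.docx", "folder": "Finance/2014"},
]

def folder_view(items, folder):
    # Render only the documents whose metadata points at this folder.
    return [d["name"] for d in items if d["folder"] == folder]

def flat_view(items):
    # Render every document, regardless of its folder metadata.
    return sorted(d["name"] for d in items)

print(folder_view(library, "Finance/2014"))
print(flat_view(library))
```

Both views draw on exactly the same three items; the ‘folder’ exists only as a filter applied at display time.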

With careful guidance, and subject to the volume of documents in the library, users may then query whether folders are even necessary. Or they may ask if there are alternatives (yes, categorisation or document sets).

In my own experience, once they see how to work without folders, users seem to quickly abandon the previously strongly held belief that folders are the best way to categorise documents and use either categorisation or document sets instead.

Can predictive coding be used to classify records?

November 6, 2012

A recent legal case in the United States, Plaintiffs v Peck, may set precedents for the way in which documents are categorised for e-Discovery (through ‘predictive coding’), a development that I think could ultimately impact on the way records are classified.

The presiding judge (Andrew Peck, a United States magistrate judge for the Southern District of New York) wrote an article in Law Technology News in October 2011 titled ‘Search, Forward: Will manual document review and keyword searches be replaced by computer-assisted coding?’.

In his article, Peck described the problems associated with the traditional, manual way of document review. He then refers to ‘… two recent research studies that clearly demonstrate that computerised searches are at least as accurate, if not more so, than manual review’. (The details of these studies are included in the article).

Peck discussed the use of keywords applied to electronic documents, and the poor results that often result (‘average recall was just 20% … (a) result (that) has been replicated … over the past few years’). He notes the generally negative judicial reaction to the use of keywords in e-discovery, partially because the manual process is the ‘gold standard’.

Given the increasingly digital nature of e-discovery, Peck noted the increasing use of ‘computer-assisted coding’, more commonly known as ‘predictive coding’. This methodology is described as ‘tools that use sophisticated algorithms to enable the computer to determine relevance based on interaction with a human reviewer’, using a set of documents to ‘train’ the system.

He stated that, unlike keywords, the better results achieved by auto-classification systems (predictive coding) are likely to make this approach more appealing in US courts in the near future.

The specific case referred to (Moore v. Publicis, a ‘high profile employment discrimination case’) involved 3 million electronic documents, and the need to cull them. The plaintiffs objected to the use of predictive coding and criticised ‘… the use of such a novel method of discovery without supporting evidence or procedures for assessing reliability’.

Another Judge was asked to review the case; Judge Carter upheld the decision ‘after finding it to be well reasoned and therefore not subject to reversal.’

For background to the case, see: Plaintiffs v Peck – A Worthy Addition to your Summer Reading List. 10 July 2012. ELLBLOG.

Recommind has produced a short booklet titled ‘Predictive Coding for Dummies’, available as a free, 36-page PDF. The Dummies Guide notes that ‘Real living, breathing legal experts are essential to predictive coding. These experts use built-in search and analytical tools — including keyword, Boolean and concept search, category grouping, and more than 40 other automatically populated filters — collectively referred to as predictive analytics — to identify documents that need to be reviewed and coded.’ Replace ‘legal experts’ with ‘records managers’ and the role of the records manager is clear.

Of course, finding all the correct documents within a classification is only one part of the requirement. The classification needs to be persistent and connected with other recordkeeping requirements including retention management.

I co-wrote an article for the November issue of IQ, the RIMPA industry quarterly, with Umi Mokhtar from Universiti Kebangsaan Malaysia (UKM) in Malaysia asking the question as to whether technology can classify records better than a human can.

The article noted apparent success rates when the technology was used for legal review, and questioned whether the same technology could be used to classify (or apply classification terms) to records instead of expecting: (a) ‘containers’ to have the right classification terms given their content or (b) users ‘filing’ documents against the correct classification.

I was fortunate to have the chance to sit with one of the predictive coding vendors last week to discuss these issues and my concerns about the effectiveness of the technology to classify records correctly.

I had a similar discussion with a reseller of the same product almost 8 years ago, at the height of EDRMS implementations, when this technology was only seen for its value in finding information.

Roll forward 8 years, with even more massive amounts of digital information being captured and stored, and EDRMS systems capturing only a small fraction of that information, and the technology looks more appealing as a tool that can support digital recordkeeping.

What struck me most about the technology was the way it presents the results to a user. If you didn’t know it was an advanced search and categorisation engine, you might be forgiven for thinking that the screen of results was actually from an EDRMS.

  • On the left-hand side was the classification scheme, which I could browse.
  • Clicking on one of the activities took me to the subject.
  • The list of results showed all or most of the same basic metadata you would expect in your EDRMS; more if it was added when the record was saved.
  • From the results listed I could find similar documents, see similar or related search results, and add public or private tags.

The technology doesn’t just figure out the classification by itself – it has to be trained, and who better to train the system than records managers? Start with the business classification scheme, find 100 records that match a classification, ask the system to find 1,000 more and confirm the matches (and manage exceptions). And so on, until the technology has classified all the digital records you have allowed it to search (including network drives, email, etc.).
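The train-confirm-expand loop described above can be sketched with a deliberately naive bag-of-words classifier: seed documents confirmed by the records manager build a word profile per retention class, and new documents are assigned to the best-matching profile. Real predictive coding engines use far more sophisticated algorithms; this only illustrates the workflow, and all class names and sample text are invented.

```python
from collections import Counter

def profile(seed_docs):
    # Build a word-frequency profile from the seed documents
    # confirmed by the records manager for one class.
    words = Counter()
    for doc in seed_docs:
        words.update(doc.lower().split())
    return words

def classify(doc, profiles):
    # Score each class by how often the document's words appear
    # in its profile; assign the highest-scoring class.
    tokens = doc.lower().split()
    scores = {cls: sum(prof[t] for t in tokens)
              for cls, prof in profiles.items()}
    return max(scores, key=scores.get)

profiles = {
    "FINANCE - Budgeting": profile(
        ["annual budget forecast", "budget variance report"]),
    "PERSONNEL - Leave": profile(
        ["annual leave request", "sick leave certificate"]),
}
print(classify("draft budget forecast for review", profiles))
```

In the real workflow, the records manager would review a sample of each round’s assignments, correct the exceptions, and feed those corrections back in as further training before widening the search.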

The link with disposal is there too. The technology classifies all the digital records you point it at. If your classification system is linked to your retention schedules, you are presented with all the information that can be then subject to a retention rule. Keeping the metadata about the records that you destroy is then a matter of capturing the metadata found in the search and applying additional metadata about the disposal action.

Options for using technology like this might include:

  • Applying it against legacy digital stores, to clean up old digital records.
  • Applying it in conjunction with an EDRMS (or SharePoint), so that the technology assigns the correct classification regardless of where the user puts it (or can file it according to that classification).
  • Applying it against legacy and active digital stores, instead of using an EDRMS.

Criticism of predictive coding

The online legal site published an article on 31 October 2012, ‘Pitting Computers Against Humans in Document Review’. The article:

  • Examined whether a frequently cited (‘TREC’) study (quoted in the article by Peck above) ‘is sufficient to support the conclusions’ that technology is better than humans.
  • Noted several weaknesses in the original TREC study, pointing out that some of the ‘technology-assisted’ teams actually performed miserably.
  • Noted an ‘inherent flaw’: that the TREC study was not a fair comparison of manual reviewers to the technology-assisted teams.

However, the article also concludes that ‘one should not jump to discredit the usefulness of computer assisted methods’; technological solutions depend on the expertise of the people who use them.

Craig Ball offered an interesting reply to this article leading with the comment: ‘Whoever challenges our assumptions and forces us to defend them is performing a valuable service, no matter what their motives’.

Ball noted the ‘sad fact … that human reviewers perform poorly in a consistent fashion, and we needn’t rely upon TREC or the Grossman/Cormack article alone to prove same’. He added ‘the fact is that the errors human reviewers make are rife, even when well trained and -motivated (if we are candid, an all-too-rare and exceptional circumstance)’. And the ‘errors human reviewers make are overwhelmingly not close calls (but) plainly, manifestly errors of the sort that mindless, heartless computers do not make’.

I think it’s possible to replace ‘human reviewers’ in this context with ‘end users’ who don’t understand classification terms (or, generally, recordkeeping).

My own view is that predictive coding technology has the ability to support digital recordkeeping, with the active involvement of records managers, with or without an EDRMS. It has the ability (once trained) to aggregate records by BCS terms and to apply retention rules to those records. (Incidentally, the same concept is used to put records on legal holds to prevent their disposal).

Continuous Classification – Article by Greg Bak

October 14, 2012

Interesting recent (March 2012) academic paper by Greg Bak from the Archival Studies Program, Department of History, University of Manitoba, Canada, titled ‘Continuous classification: capturing dynamic relationships among information resources’.

Bak notes the following in the abstract to the paper:

‘Records classification within electronic records management systems often is constrained by rules derived from paper-era recordkeeping, particularly the rule that one record can have only one file code – a rule that was developed to enable the management of records in aggregate. This paper calls for a transformation of recordkeeping and archival practice through an expanded definition of records classification and through item-level management of electronic records’.

The first part of Bak’s paper discusses classification theory, and the supposed relationship between records classification and biological classification. He makes reference to Sir Hilary Jenkinson’s archival theory, to the Dutch Manual and recent articles by Kate Cumming and Chris Hurley.

He notes the problem of assuming that we can achieve a ‘perfect’ classification and quotes from a Jens Erik Mai article (‘The Modernity of Classification’, September 2010): ‘The arrangement that suits one man’s investigations is a hindrance to another’s’ – a comment that reminds me of the problems associated with file or document titling; how you might title a file or document isn’t necessarily how I am (or anyone else is) going to look for it.

Bak notes, with regard to functional classification, that this is ‘not “natural” but created by archivists and recordkeepers to suit professional recordkeeping purposes, and that it better services the purposes of recordkeepers than those of records creators and users’.

He also suggests that ‘… the single-class rule (i.e., a record can only have one classification applied to it) obscures a key metric for determining the relevance or importance of electronic information resources; their repeated use, and the multiple relationships that result from repeated use.’

Towards the middle of the paper, Bak discusses functions-based records classification, the ‘archival bond’ and archival appraisal. He notes (not surprisingly, for probably many of us) that ‘it has become commonplace among recordkeepers that users of records classifications systems – including records creators, records users and clerical records management staff – do not like function-based classification schemes, either for discovery, filing or retrieval’. Poor classification, he suggests, ‘leads to unofficial parallel recordkeeping (perhaps better termed information hoarding) either within a work unit or by individuals in their own workspace’.

Bak then discusses, in a long section, records classification and electronic records management systems, and then the politics of classification. In this last section he refers to David Weinberger’s 2007 book ‘Everything is Miscellaneous’, and suggests that his analysis ‘has significant implications for records classification’.

Bak concludes the main part of his paper with a discussion of the theories of Peter Scott from the mid-1960s, which led to the introduction of the Australian (or ‘series’) system. He quotes Clive Smith in relation to ‘virtual files’, in which Smith proposed that a correspondence file or dossier ‘… no longer exists physically, but only as a collection of electronic documents that are assembled through some search criteria’ that exists ‘only as long as the search is maintained’.

The last part of Bak’s paper discusses item-level management and the future of archival practice. He notes that ‘we are at a moment in archival history when digital records are compelling us to reconsider archival systems, standards and practices in light of the realities of digital information ecologies’.

Thanks to Greg Bak for feedback on the above.