Posted in Archiving third party content, Connectors, Conservation and preservation, Electronic records, Information Management, Microsoft 365, Microsoft Graph, Records management, Retention and disposal, Solutions

Using Microsoft 365 connectors to support records management

Microsoft 365 includes a range of connectors, in three categories, that can be used to support the management of records created by other applications. The three categories are:

  • Search connectors, that find content created by and/or stored in a range of internal and external applications, including social media.
  • Archive connectors, that import and archive content created by third-party applications.
  • API connectors, that support business processes such as capturing email attachments.

This post explains how these connectors can assist with the management of records.

The recordkeeping dilemma

Finding, capturing and managing records across an ever increasing volume of digital content and content types has been one of the biggest challenges for recordkeeping since the early 2000s.

The primary method of managing digital records for most of the past 20 years has been to require digital records (mostly emails and other digital content created on file shares) to be saved to or stored in an electronic document and records management system (EDRMS). The EDRMS was established as ‘the’ recordkeeping system for the organisation.

EDRM systems were also used to manage paper records which, over the past 20 years, have mostly contained the printed version of born-digital records that remain stored in the systems where they were created or captured.

There were two fundamental flaws in the EDRMS model. The first was an expectation that end-users would be willing to save digital records to the EDRMS. The second was that the original digital record remained in place where it was created or captured, usually ignored but often the source of rich pickings for eDiscovery.

The introduction of web-based email and document storage systems, smart phones, social media and personal messaging applications from around 2005 (in addition to already existing text messaging/SMS messages) further challenged the concept of a centralised recordkeeping system; in many cases, the only option to save these records was to print and scan, screenshot and save the image, or save to PDF, none of which were particularly effective in capturing the full set of records.

The hasty introduction from early 2020 of ‘work from home’ applications such as Zoom and Microsoft Teams has been a further blow to these methods.

In place records management

To the chagrin of records managers around the world, Microsoft never made it easy to save an email from Outlook to another system. Emails stubbornly remained stored in Exchange mailboxes with no sign of integration with file shares.

And for good reason – they have a different purpose and architecture to support that purpose. It would be similar to asking when it would be possible to create and send an email in Word.

The introduction of Office 365 (later Microsoft 365) from the mid 2010s changed the paradigm from a centralised model, where records were all copied to a central location and the originals left where they were created or captured, to a decentralised or ‘in place’ model, where records are mostly left where they were created or captured.

The decentralised model does not exclude the ability to store copies of some records (e.g., emails) in other applications (e.g., SharePoint document libraries), but these are exceptions to the general rule.

It also does not exclude the ability to import or migrate content from third-party applications where necessary for recordkeeping purposes.

Microsoft 365 connectors

Microsoft 365 includes a wide range of options to connect with both internal and external systems. Many of these connectors simplify business processes and support integration models.

Connectors may also be used to support recordkeeping requirements, in three broad categories.

The three connectors

Archive connectors

Archive connectors allow organisations to import and archive data from third-party systems such as social media, instant messaging and document collaboration* platforms. Most of this data will be stored in Exchange mailboxes, where it can be subject to retention policies, eDiscovery and legal holds.

(*This option is still limited via connectors, but also see below under Search).

The social media and instant messaging data that can be archived in this way currently includes Facebook (business pages), LinkedIn company page data, Twitter, Webex Teams, Webpages, WhatsApp, Workplace from Facebook and Zoom Meetings. For the full listing, and a detailed description of what is required to connect each service, see the Microsoft page ‘Archive third-party data’.

An important thing to keep in mind is that the data will be archived to an Exchange mailbox; this will require an account to be created for the purpose. Any data archived to the mailbox will contribute to the overall storage quotas.

Search connectors

Search connectors (also known as Microsoft Graph connectors) index third-party data that then appears in Microsoft search results, including via Bing (the ‘Work’ tab), from http://www.office.com, and via SharePoint Online.

Most ECM/EDRM systems are listed, which means that organisations that continue to use those systems can allow end-users to find content from a single search point, only surfacing content that users are permitted to see.

The following is an example of what a Bing search looks like in the ‘Work’ tab (when enabled).

Example Bing search showing the Work tab

Note: as at 17 November 2020, Microsoft’s page ‘Overview of Microsoft Graph connectors’ (which includes a very helpful architecture diagram) states that these are ‘currently in preview status available for tenants in Targeted release.’

There are two main types of search connector:

  • Microsoft built: Azure Data Lake Storage Gen2, Azure DevOps, Azure SQL, Enterprise websites, MediaWiki, Microsoft SQL, and ServiceNow.
  • Partner built. Includes the following on-premise and online document management/ECM/EDRM connectors – Alfresco, Alfresco Content Services, Box, Confluence, Documentum, Facebook Workplace, File Share (on prem), File System (on prem), Google Drive, IBM Connections, Lotus Notes, iManage, MicroFocus Content Manager (HPE Records Manager, HP TRIM), Objective, OneDrive, Open Text, Oracle, SharePoint (on prem), Slack, Twitter, Xerox DocuShare, Yammer

See the ‘Microsoft Graph connectors gallery’ web page for the full set of current connectors.

A consideration when deploying search connectors is the quality of the data that will be surfaced via searches. Duplicate content is likely to be a problem in identifying the single – or most recent – source of truth of any particular digital record, especially when the organisation has required records to be copied from one system (mailbox/file share) to another (EDRMS).

API Connectors

API connectors provide a way for Microsoft 365 to access and use content, including in third-party applications. To quote from the Microsoft ‘Connectors’ web page:

‘A connector is a proxy or a wrapper around an API that allows the underlying service to talk to Microsoft Power Automate, Microsoft Power Apps, and Azure Logic Apps. It provides a way for users to connect their accounts and leverage a set of pre-built actions and triggers to build their apps and workflows.’

To see the complete list and for more information about each connector, see the Microsoft web page ‘Connector reference overview’.

Each connector provides two things:

  • Actions. These are changes initiated by an end-user.
  • Triggers. There are two types of triggers: Polling and Push. Triggers may notify the app when a specific event occurs, resulting in an action. See the above web page for more details.
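The difference between the two trigger types can be sketched in a few lines of Python. This is only an illustration of the general pattern, not the Power Automate or Logic Apps API: the class names, the `inbox` list and the file names are all hypothetical.

```python
from typing import Callable, List

Action = Callable[[str], None]


class PollingTrigger:
    """Checks a source when asked and fires the action for unseen items."""

    def __init__(self, fetch_items: Callable[[], List[str]], action: Action) -> None:
        self._fetch_items = fetch_items
        self._action = action
        self._seen: set = set()

    def poll(self) -> int:
        """Run one polling cycle and return how many new items were found."""
        fired = 0
        for item in self._fetch_items():
            if item not in self._seen:
                self._seen.add(item)
                self._action(item)
                fired += 1
        return fired


class PushTrigger:
    """The source itself calls notify() as soon as an event occurs."""

    def __init__(self, action: Action) -> None:
        self._action = action

    def notify(self, item: str) -> None:
        self._action(item)


# Hypothetical scenario: capture attachments as they arrive.
captured: List[str] = []
inbox = ["invoice.pdf"]

polling = PollingTrigger(lambda: list(inbox), captured.append)
first_cycle = polling.poll()      # finds invoice.pdf
inbox.append("contract.docx")
second_cycle = polling.poll()     # finds only contract.docx

push = PushTrigger(captured.append)
push.notify("minutes.docx")       # fires immediately, no polling cycle
```

A polling trigger only notices an event on its next cycle, whereas a push trigger depends on the source system being able to call out when something happens.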

API connectors can support records management requirements in different ways (such as triggering an action when a specific event occurs) but they should not be confused with archiving or search connectors.

Summing up

The connectors available in Microsoft 365 support the model of keeping records in place where they were first created or captured. They enable the ability to archive data from third-party cloud applications, search for data in those (and on-premise) applications, and trigger actions based on events.

The use of connectors should be part of an overall strategic plan for managing records across the organisation. This may include a business decision to continue using an ECM/EDRMS in addition to the content created and captured in Microsoft 365. Ideally, however, the content in the ECM/EDRMS should not be a copy of what already exists in Microsoft 365.

Posted in Electronic records, Information Management, Microsoft Graph, Office 365, Semantic Office, XML

Metadata Payloads in the Digital World

For at least twenty years, a core tenet of both document and records management has been the metadata that defined records. A number of metadata schema were developed over the years, including the well-known Dublin Core (http://dublincore.org/documents/dces/) that defined 15 core metadata elements for digital content:

  • Contributor
  • Coverage
  • Creator
  • Date
  • Description
  • Format
  • Identifier
  • Language
  • Publisher
  • Relation
  • Rights
  • Source
  • Subject
  • Title
  • Type

Introduction of XML based documents

Parallel with the development of metadata schema, the introduction of XML-based documents (e.g., .docx, odb) from the early 2000s introduced a new way of both structuring and describing documents. Instead of being external to the document, metadata could be embedded within the document, making it effectively a type of ‘metadata payload’.

Around the same time that XML-based documents were introduced, I wrote about the ‘Semantic Office’. The Semantic Office drew on the same ideas developed and implemented for the ‘Semantic Web’. Conceptually, the idea was quite simple – just as web pages would contain their own embedded metadata in the form of Resource Description Framework (RDF) triples (subject – predicate – object, e.g., sky – is – blue), common office documents such as Outlook, Word and Excel could carry their own embedded metadata ‘payload’.

Some of this metadata is visible in the Properties pane of a record, but only as descriptive terms, not as metadata defined against a specific schema.

The (mostly overlooked and under-reported) outcome of the introduction of XML-based documents was that a document could be stored anywhere and be found again based on the embedded metadata – as opposed to finding it through metadata that was created and managed separately from the record (for example, in a document management system). For some reason, however, the predominant and persistent model for document management has been to store metadata about a document separately from the document.

In most document and records management systems since the late 1990s, digital records (emails included, if they are saved to the DRMS) were/are stored in secure file shares while the metadata about the record (including its ‘file’ or ‘container’ identifier) was stored in a separate database. Visually this gives the user the illusion that the records are stored ‘in’ a container even though they are actually stored in a network file share.

This pervasive document management model is conceptually similar to the way computers record metadata about documents stored in a Windows NT File System (NTFS) in the Windows Master File Table (MFT). MFT entries include details of the size, time and date stamps, permissions, and so on. It assumes that the actual location of the record is recorded in the metadata.
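As a rough illustration of metadata that is held outside the document, the Python sketch below asks the file system for the size, modified time and permissions of a file. It uses the standard `os.stat` call rather than reading the MFT directly; the helper name `external_metadata` and the sample file are assumptions made for the example.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def external_metadata(path: str) -> dict:
    """Return metadata the file system keeps about a file, not the file's content."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
        "permissions": stat.filemode(info.st_mode),
    }


# Create a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("hello")
    sample_metadata = external_metadata(sample_path)
```

None of these values travel with the file: copy it to another system and the new file system records its own values.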

How XML-based documents embed metadata

XML-based Office documents (as well as PDFs and image files), however, retain core metadata information within the document itself. The information is accessible regardless of where the document is stored.

Ironically (perhaps) it may be different from any external metadata used to describe the document.

To view the embedded metadata in a Word document you only need to rename it to .zip and then unzip it. Extracting a zipped Word document reveals (in most cases) several folders and one XML file:

  • [trash] – contains ‘dat’ files (may not be present in all documents)
  • _rels – contains the ‘.rels’ XML document
  • customXml – contains a number of ‘item’ and ‘itemProps’ XML documents
  • docProps – contains three very small files: app.xml, core.xml, custom.xml
  • word – contains a range of XML files and additional folders with other XML files.
  • [Content_Types].xml
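The rename-and-unzip step can also be done programmatically. The following is a minimal Python sketch, using only the standard library, that opens a package as a zip archive and reads the elements in ‘docProps/core.xml’. The function name `read_core_properties` and the tiny in-memory sample package are assumptions for illustration; a real Word document contains many more parts.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"


def read_core_properties(package) -> dict:
    """Return {element name: text} for each child element of docProps/core.xml.

    `package` may be a path to a .docx file or a file-like object.
    """
    with zipfile.ZipFile(package) as archive:
        root = ET.fromstring(archive.read(CORE_PART))
    properties = {}
    for child in root:
        local_name = child.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
        properties[local_name] = (child.text or "").strip()
    return properties


# Stand-in package built in memory so the example is self-contained.
SAMPLE_CORE_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<cp:coreProperties"
    ' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Test document</dc:title>"
    "<dc:creator>Example Author</dc:creator>"
    "<dc:subject></dc:subject>"
    "</cp:coreProperties>"
)

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(CORE_PART, SAMPLE_CORE_XML)
buffer.seek(0)

core_properties = read_core_properties(buffer)
```

To try it on a real document, pass the path to the .docx file instead of the in-memory buffer; no renaming to .zip is needed.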

In one example Word document downloaded from a SharePoint library, the file ‘item4.xml’ in the ‘customXml’ folder contained both XML namespace (xmlns) information as well as the embedded document management elements (highlighted in bold):

A separate XML document also located in the ‘customXml’ folder contained the following core properties, including most of the Dublin Core elements listed above (but note that they are all blank).

Arguably, the body of the record is also a form of metadata, enclosed by the terms <body>text</body>. In the example document downloaded from SharePoint, the body of the document is contained in the file ‘document.xml’ under the ‘word’ folder of the package.

    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se wp14">
    <w:body>
      <w:p w14:paraId="195D8795" w14:textId="77777777" w:rsidR="0001502C" w:rsidRDefault="00880316">
        <w:r>
          <w:t>Test document</w:t>
        </w:r>
      </w:p>
      <w:p w14:paraId="195D8796" w14:textId="77D86E32" w:rsidR="006832E2" w:rsidRDefault="006832E2" w:rsidP="006832E2">
        <w:r>
          <w:t>Lorem ipsum (and the rest of the text, deleted for brevity)</w:t>
        </w:r>
        <w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/>
      </w:p>
      <w:sectPr w:rsidR="006832E2">
        <w:pgSz w:w="11906" w:h="16838"/>
        <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
        <w:cols w:space="708"/>
        <w:docGrid w:linePitch="360"/>
      </w:sectPr>
    </w:body>
    </w:document>

Other core metadata elements are contained in the ‘core.xml’ file:

Why is this important?

The existence of – and ability to make use of – embedded metadata seems to have been overlooked since the introduction of these types of records over 15 years ago. This may have been primarily because no-one had a system in place to access or use that data in any meaningful way.

Instead, most records continued to be defined by metadata that is created or captured and managed separately from the record itself.

The problems with storing metadata separately from the record are that: (a) the external metadata may be different from the embedded metadata, and (b) the external metadata may unnecessarily limit or restrict the ability to see the record in different contexts.

For example, one person may assign a specific metadata term, such as a function from the Business Classification Scheme (BCS) to the digital record, or assign it to a specific ‘container’. Some time later, another person may try to find the same record but discover it is not in the same file, or assigned to the same function term. They are likely to be looking for the record in or from a completely different context.

The only way they may be able to find it is by doing a general search that includes the body or content of the records, something I found to be the case in real life scenarios where users couldn’t find the records they were looking for based on metadata searches.

Of course, metadata is still important, but my point is the difference between embedded metadata that can be added when the document is saved to a document library, and external metadata that is stored separately from the digital record.

Being able to use the metadata embedded in records, wherever they are stored, is a much more powerful way to find and connect this information, similar to the way the application of metadata to web pages facilitates access.

Resource Description Framework

A core part of the world wide web is the application of metadata to web pages to facilitate their discovery in a highly connected world. The core elements of this metadata are defined in the World Wide Web Consortium (W3C)’s Resource Description Framework, or RDF.

To quote the World Wide Web Consortium (W3C):

‘RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications. This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.’ (Source: https://www.w3.org/RDF/)
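The triple model described in the quote can be sketched with plain Python tuples. This is a simplified illustration only: real RDF names subjects and predicates with URIs, and the document names, people and the `match` helper below are hypothetical.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

triples: List[Triple] = [
    ("sky", "is", "blue"),
    ("report.docx", "creator", "Alice"),
    ("minutes.docx", "creator", "Alice"),
    ("report.docx", "subject", "budget"),
]


def match(
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return every triple that agrees with the parts supplied (None = any)."""
    return [
        triple
        for triple in triples
        if (subject is None or triple[0] == subject)
        and (predicate is None or triple[1] == predicate)
        and (obj is None or triple[2] == obj)
    ]


# Follow the named "creator" link to find everything connected to Alice.
created_by_alice = [s for s, _, _ in match(predicate="creator", obj="Alice")]

# Look at every edge that leaves one node of the graph.
about_report = match(subject="report.docx")
```

Because each link is named, two documents that share nothing else can still be connected through a common node such as their creator.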

It is perhaps not surprising that Microsoft named the analytic engine behind Office 365 the Microsoft Graph.

According to Microsoft:

‘Microsoft Graph is made up of resources connected by relationships. For example, a user can be connected to a group through a memberOf relationship, and to another user through a manager relationship. Your app can traverse these relationships to access these connected resources and perform actions on them through the API. You can also get valuable insights and intelligence about the data from Microsoft Graph. For example, you can get the popular files trending around a particular user, or get the most relevant people around a user.’ (Source: https://developer.microsoft.com/en-us/graph/docs/concepts/overview)
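As a hedged illustration of traversing these relationships, the Python sketch below builds the Microsoft Graph v1.0 REST addresses for a user's memberOf, manager and trending-files relationships. It only constructs the URLs; actually calling them requires an authenticated request with an access token, which is not shown. The helper name `relationship_url` and the user `alex@contoso.com` are assumptions for the example.

```python
from urllib.parse import quote

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"


def relationship_url(user: str, relationship: str) -> str:
    """Build the address of one relationship hanging off a user resource."""
    # Keep "@" readable in the user principal name; escape anything unsafe.
    return f"{GRAPH_ROOT}/users/{quote(user, safe='@')}/{relationship}"


# Hypothetical user principal name.
user = "alex@contoso.com"

member_of_url = relationship_url(user, "memberOf")          # groups the user belongs to
manager_url = relationship_url(user, "manager")             # the user's manager
trending_url = relationship_url(user, "insights/trending")  # files trending around the user
```

Each address starts at one node (the user) and names the edge to follow, which is the same subject and predicate pattern described for RDF above.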

[Image: Microsoft Graph]

The RDF model is also used in knowledge management applications such as Protégé, which supports the creation and use of RDF/XML ontologies.

Implications

In my opinion, the implications of XML-based office content (which has been around for over 10 years now) are quite important for records management theory and practice.

While, like traditional EDRM systems, documents are visually displayed ‘in’ the document library, each document retains its own originally assigned metadata even if it is downloaded – unless the user uses the ‘Check for Issues’ – ‘Inspect Document’ option from the Info panel to remove them.

Storing metadata properties directly in the document makes it easier to locate and retrieve documents that have the same, similar or related properties via the Microsoft Graph. In the same way that web pages use RDF triples, this allows otherwise unconnected resources to be linked and presented to the user (subject to any security controls) automatically based on their specific context.

In other words, instead of records being locked to a specific container based on their metadata being stored in a database, records could be discovered and linked wherever they are located based on their embedded metadata.

Relevance of W3 XML schema to Office 365 content

The use of RDF-based metadata embedded in Office documents in Office 365 means that this data can be used to link resources in a way that supports the discovery of the resources. It allows for cross-linking of information. Documents with metadata payloads are one of the many resources that can be connected in this way.

For example, ‘… a user can be connected to a group through a ‘memberOf’ relationship, and to another user through a manager relationship. Your app can traverse these relationships to access these connected resources and perform actions on them through the API. You can also get valuable insights and intelligence about the data from Microsoft Graph. For example, you can get the popular files trending around a particular user, or get the most relevant people around a user.’ (Source: https://developer.microsoft.com/en-us/graph/docs/concepts/overview)

‘Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications. This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.’ (Source: https://www.w3.org/RDF/)

Posted in Delve, Information Management, Microsoft Graph, Office 365, SharePoint Online, Yammer

Knowledge Management in Office 365

A few articles in the past few weeks, and some internal discussions, prompted some thinking around how Office 365 can support knowledge management (KM) – however that may be defined.

What is Knowledge Management?

According to many knowledge management sources online, knowledge management appeared around 1990, and paralleled the rise of document management. Both appear to have arisen as computers appeared (from the mid 1980s) and digital ways of capturing and managing information took hold, while records management was still primarily focused on the management of paper records.

An early (1994) definition for the term ‘knowledge management’ suggested that it was ‘… the process of capturing, distributing, and effectively using knowledge’ (Davenport, 1994; Koenig, 2012).

Bryant Duhon expanded on this somewhat imprecise definition in his 1998 article ‘It’s All in our Heads’ (my emphasis):

‘Knowledge management is a discipline that promotes an integrated approach to identifying, capturing, evaluating, retrieving, and sharing all of an enterprise’s information assets. These assets may include databases, documents, policies, procedures, and previously un-captured expertise and experience in individual workers.’ (Duhon, 1998)

A key element was capturing the knowledge acquired by individuals.

Koenig (2012) noted that ‘Perhaps the most central thrust in KM is to capture and make available, so it can be used by others in the organization, the information and knowledge that is in people’s heads as it were, and that has never been explicitly set down.’

Explicit/implicit versus tacit knowledge

Generally speaking, there is a difference between explicit and implicit knowledge (the information that is recorded) and ‘the information and knowledge that is in people’s heads’ (which walks out the door when people leave).

The latter is defined generally as tacit knowledge. That is, information that is ‘understood or implied, without being stated’, from the Latin tacitus, the past participle of tacere ‘be silent’. (https://en.oxforddictionaries.com/definition/tacit)

I have worked with the issue of how to access and capture the knowledge in the heads of departing employees since around 1984, when I was first made aware that the departure of some very senior and/or long-term staff meant that we would lose access to the information they knew, gained not only from learned knowledge but also in many cases from many decades of personal experience.

At the time it was not my responsibility to worry about it, but I saw attempts to conduct interviews and document procedures and processes with departing (or already departed) employees.

This pre-digital era activity stuck in my head – was interviewing the departed employees the only way to get this information out of their heads?

(As a side note I learned that it was important to interview and talk to my ageing parents and their siblings about their memories and experiences before those memories were lost forever).

Enter the computer age

I consider myself lucky to have been witness over a generation to the change in working practices from paper to digital.

The start of the digital era from the mid 1980s and ubiquitous access to computers on desktops, person to person emails, network file shares and personal folders created another related dilemma – even if the information was created (or captured) by a user, how could it be accessed?

Users were encouraged to put this information in repositories – mostly document management systems – but the fact that email and information on file shares were stored in different servers meant that unless users would actively move emails to a document management system, that information remained hidden away.

What was needed was a way for users to create and store information – emails, documents – wherever they wanted to put it, and for that information to be accessible, restricted only by relevant security controls.

The only systems that seemed to really do this effectively were eDiscovery tools. Perhaps this was not surprising, as the survival (and financial viability) of a company might depend on the ability to find the information that was required.

The rise of smart phones and ubiquitous, always-on, digital communication within the past 10 years has only added to the types of knowledge available and the methods used to capture it.

In my opinion, traditional recordkeeping practices have not kept up and often remain rooted in the idea that knowledge can be stored in a single location or container. How does one capture instant messages sent via encrypted messaging services in a records container?

Microsoft Graph

Microsoft introduced the Microsoft Graph in 2015. The image below demonstrates how the Graph connects content created and stored through the Office 365 (and connected) environment/s.

[Image: Microsoft Graph]

The image above should resonate with most people who work in an office. We send emails, create documents or data, set tasks, make appointments, attend and record meetings, have digital conversations, send messages, connect with colleagues, and maintain personal profiles.

The Microsoft Graph collects and analyses this information and presents it to users based on their context. According to Microsoft:

‘Microsoft Graph is made up of resources connected by relationships. For example, a user can be connected to a group through a memberOf relationship, and to another user through a manager relationship. (The Graph) can traverse these relationships to access these connected resources and perform actions on them through the API. You can also get valuable insights and intelligence about the data from Microsoft Graph. For example, you can get the popular files trending around a particular user, or get the most relevant people around a user.’

(Source for image and text: https://developer.microsoft.com/en-us/graph/docs)

According to Tony Redmond, Microsoft Graph’s REST-based APIs provide ‘… a common access approach to all manner of Office 365 data from Exchange and SharePoint to Teams and Planner’. The Graph Explorer, a newly introduced user interface, extends the ability to access information, wherever it lives. (https://developer.microsoft.com/en-us/graph/graph-explorer)

How does a person access this knowledge?

In my opinion, two key points about tacit knowledge are that:

  • It can be captured easily, just as other digital applications capture information about us, including by what we click on or search for.
  • It can be accessed without a person necessarily having to search for it.

Most of us by now are familiar with the way Facebook, LinkedIn, eBay, Amazon and so on capture information about our interests and present suggestions for what we might like to do next. They do this by understanding our context.

Organisational knowledge management should be the same. Users should go about their business using the various digital applications available to them and other users should be able to see that information or knowledge because they have an interest in the same subject matter, or need to know it to do their work.

Users should be presented with information (subject to any security restrictions) because it relates to their work context or interests. They should not have to go looking for knowledge (although that is an option, just as finding a friend in Facebook is an option); knowledge should come to them.

How does Office 365 do this?

Most Office 365 enterprise or business users will have one or two ways to access this information:

  • Delve (may require a higher licence such as E3 for enterprise clients)
  • The OneDrive for Business ‘Discover’ option.

The ‘Discover’ option allows a user to explore further, to see what others are working on. The response I get to Discover is both positive and slightly startled – the latter because it will be possible to know what others are actually doing.

Why is this important?

The ability to access and ‘harness’ collective knowledge in this way is essential to modern day workplaces.

To quote Microsoft:

‘As the pace of work accelerates, it’s more important than ever that you tap into the collective knowledge of your organisation to find answers, inform decision making, re-purpose successes and learn from lessons of the past’. (Moneypenny, 2017)

Serendipitous discovery

In his 2007 book ‘Everything Is Miscellaneous: The Power of the New Digital Disorder’, David Weinberger spoke about three types of order:

  • The first order is the order of physical things, like how books are lined up on shelves in a library.
  • The second order is the catalogue order. A catalogue typically refers to a physical order; it is still physical, but one can make several catalogs of the same physical order. Weinberger’s prime example is the card catalog of libraries.
  • The third order of order is the digital order, where there is no limit to the number of possible orderings. The digital order frees itself from physical reality, and in it, everything can be connected and related to everything else: Everything is miscellaneous.

The phrase ‘herding cats’ always comes to mind in relation to digital information. It resists order or compartmentalisation.

Further, your order is not my order; my way of browsing or searching may not correspond with your logic for storing or describing it (especially on network file shares!).

The internet pioneered serendipitous discovery. It is now completely taken for granted when, as noted above, we are offered suggested friends in Facebook, jobs in LinkedIn, purchases on eBay and so on. We are presented this information because the application has collected information about what we clicked on, what jobs we do (or did), who our friends are, and what we like to search for.

The idea that our work environment can do the same thing and present information automatically based on our context (information finds us) is sometimes surprising for people used to the second order of things.

 

Davenport, Thomas H. (1994), Saving IT’s Soul: Human Centered Information Management. Harvard Business Review, March-April, 72 (2), pp. 119-131. Duhon, Bryant (1998), It’s All in our Heads. Inform, September, 12 (8). Quoted in Koenig (2012).

Duhon, Bryant (1998), It’s All in our Heads. Inform, September, 12 (8), pp. 8-13.

Koenig, Michael (4 May 2012), What is KM? Knowledge Management Explained, http://www.kmworld.com/Articles/Editorial/What-Is-…/What-is-KM-Knowledge-Management-Explained-82405.aspx, accessed 21 July 2017

Naomi Moneypenny (17 May 2017), Harnessing Collective Knowledge with SharePoint and Yammer, https://techcommunity.microsoft.com/t5/SharePoint-Blog/Harnessing-Collective-Knowledge-with-SharePoint-and-Yammer/ba-p/70164, accessed 21 July 2017

Redmond, Tony (20 July 2017), Exploring Office 365 with the Graph Explorer, https://www.petri.com/exploring-office-365-graph-explorer, accessed 21 July 2017

Weinberger, David, (2007) ‘Everything Is Miscellaneous: The Power of the New Digital Disorder’