Archive for the ‘Microsoft Graph’ Category

Metadata Payloads in the Digital World

March 19, 2019

For at least twenty years, a core tenet of both document and records management has been the metadata that defines records. A number of metadata schemas were developed over the years, including the well-known Dublin Core (http://dublincore.org/documents/dces/), which defines 15 core metadata elements for digital content:

  • Contributor
  • Coverage
  • Creator
  • Date
  • Description
  • Format
  • Identifier
  • Language
  • Publisher
  • Relation
  • Rights
  • Source
  • Subject
  • Title
  • Type

Introduction of XML based documents

In parallel with the development of metadata schemas, the arrival of XML-based document formats (e.g., .docx, .odt) from the early 2000s brought a new way of both structuring and describing documents. Instead of being external to the document, metadata could be embedded within the document, making it effectively a type of ‘metadata payload’.

Around the same time that XML-based documents were introduced, I wrote about the ‘Semantic Office’. The Semantic Office drew on the same ideas developed and implemented for the ‘Semantic Web’. Conceptually, the idea was quite simple – just as web pages would contain their own embedded metadata in the form of Resource Description Framework (RDF) triples (subject – predicate – object, e.g., sky – is – blue), common office documents such as those created in Outlook, Word and Excel could carry their own embedded metadata ‘payload’.
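The triple idea can be sketched in a few lines of Python. This is only an illustration of the subject – predicate – object shape, not an RDF library; the `Triple` type, the `match` helper and the two statements about `report.docx` are hypothetical, and only the sky – is – blue example comes from the text above.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical statements; only "sky - is - blue" comes from the text.
TRIPLES = [
    Triple("sky", "is", "blue"),
    Triple("report.docx", "creator", "A. Author"),
    Triple("report.docx", "subject", "metadata"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that agrees with the parts that were supplied."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything that is stated about one document.
about_report = match(TRIPLES, subject="report.docx")
for t in about_report:
    print(t.subject, "-", t.predicate, "-", t.obj)
```

Because each statement stands on its own, the same list can be queried by subject, by predicate or by object, which is what lets a document be seen from more than one context.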

Some of this metadata is visible in the Properties pane of a record, but only as descriptive terms, not as metadata defined against a specific schema.

The (mostly overlooked and under-reported) outcome of the introduction of XML-based documents was that a document could be stored anywhere and be found again based on the embedded metadata – as opposed to finding it through  metadata that was created and managed separately from the record (for example, in a document management system). For some reason, however, the predominant and persistent model for document management has been to store metadata about a document separately from the document.

In most document and records management systems since the late 1990s, digital records (emails included, if they are saved to the DRMS) were/are stored in secure file shares while the metadata about the record (including its ‘file’ or ‘container’ identifier) was stored in a separate database. Visually this gives the user the illusion that the records are stored ‘in’ a container even though they are actually stored in a network file share.

This pervasive document management model is conceptually similar to the way Windows records metadata about files stored on an NTFS (NT File System) volume in the Master File Table (MFT). MFT entries include details of each file’s size, time and date stamps, permissions, and so on. The model assumes that the actual location of the record is recorded in the metadata.
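As a rough illustration of metadata that is held outside the file, the Python sketch below asks the file system about a throwaway file. It uses the portable `os.stat` call rather than reading the MFT directly (on an NTFS volume the file system answers from its own records), and the dictionary keys are names chosen for this example.

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("Test document")
    path = handle.name

# These values are answered by the file system, not read from the file's content.
info = os.stat(path)
external_metadata = {
    "size_bytes": info.st_size,
    "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
    "mode": oct(info.st_mode),
}
print(external_metadata)

os.remove(path)
```

None of these values travels with the file: copy it to another system and the size will match, but the timestamps and permissions are whatever the new file system records.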

How XML-based documents embed metadata

XML-based Office documents (as well as PDFs and image files), however, retain core metadata information within the document itself. The information is accessible regardless of where the document is stored.

Ironically (perhaps), this embedded metadata may differ from any external metadata used to describe the document.

To view the embedded metadata in a Word document you only need to change its file extension to .zip and then unzip it. Extracting a zipped Word document reveals (in most cases) several folders and one XML file:

  • [trash] – contains ‘dat’ files (may not be present in all documents)
  • _rels – contains the ‘.rels’ XML document
  • customXml – contains a number of ‘item’ and ‘itemProps’ XML documents
  • docProps – contains three very small files: app.xml, core.xml, custom.xml
  • word – contains a range of XML files and additional folders with other XML files.
  • [Content_Types].xml
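The same inspection can be done without renaming anything, because a .docx file is already a zip package. The Python sketch below is a minimal example under stated assumptions: it builds a small stand-in package in memory (with hypothetical values, and not a complete Word file) so that it runs as-is, and `read_core_properties` is a helper name invented here. For a real document, a path such as "example.docx" could be passed instead of the in-memory package.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
CP = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"

# Stand-in package built in memory (hypothetical values, not a complete Word file).
CORE_XML = (
    f'<cp:coreProperties xmlns:cp="{CP}" xmlns:dc="{DC}">'
    "<dc:title>Test document</dc:title>"
    "<dc:creator>A. Author</dc:creator>"
    "<dc:subject></dc:subject>"
    "</cp:coreProperties>"
)
package = io.BytesIO()
with zipfile.ZipFile(package, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("docProps/core.xml", CORE_XML)
    archive.writestr("word/document.xml", "<document/>")


def read_core_properties(source):
    """Return (part names, Dublin Core properties) for a .docx path or file object."""
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        root = ET.fromstring(archive.read("docProps/core.xml"))
    # Keep only elements in the Dublin Core namespace and drop the namespace prefix.
    properties = {
        child.tag.split("}", 1)[1]: (child.text or "")
        for child in root
        if child.tag.startswith("{" + DC + "}")
    }
    return names, properties


parts, core = read_core_properties(package)
print(parts)
print(core)
```

Blank elements come back as empty strings, so a document whose properties were never filled in is easy to spot.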

In one example Word document downloaded from a SharePoint library, the file ‘item4.xml’ in the ‘customXml’ folder contained both XML namespace (xmlns) information as well as the embedded document management elements (highlighted in bold):

A separate XML document, also located in the ‘customXml’ folder, contained the following core properties, including most of the Dublin Core elements listed above (but note that they are all blank).

Arguably, the body of the record is also a form of metadata, enclosed by the <w:body> and </w:body> tags. In the example document downloaded from SharePoint, the body of the document is contained in the file ‘document.xml’ under the ‘word’ folder of the package.

    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se wp14">
      <w:body>
        <w:p w14:paraId="195D8795" w14:textId="77777777" w:rsidR="0001502C" w:rsidRDefault="00880316">
          <w:r>
            <w:t>Test document</w:t>
          </w:r>
        </w:p>
        <w:p w14:paraId="195D8796" w14:textId="77D86E32" w:rsidR="006832E2" w:rsidRDefault="006832E2" w:rsidP="006832E2">
          <w:r>
            <w:t>Lorem ipsum (and the rest of the text, deleted for brevity)</w:t>
          </w:r>
          <w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/>
        </w:p>
        <w:sectPr w:rsidR="006832E2">
          <w:pgSz w:w="11906" w:h="16838"/>
          <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
          <w:cols w:space="708"/>
          <w:docGrid w:linePitch="360"/>
        </w:sectPr>
      </w:body>
    </w:document>
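Pulling the readable text back out of this markup takes very little code. The Python sketch below is a minimal example that works on a cut-down stand-in for ‘document.xml’ (attributes omitted); the `paragraphs` helper is a name invented for the illustration, and a real document would also contain tables, fields and other elements that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace, bound to the "w" prefix in document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Cut-down stand-in for word/document.xml (attributes omitted for brevity).
DOCUMENT_XML = f"""
<w:document xmlns:w="{W}">
  <w:body>
    <w:p><w:r><w:t>Test document</w:t></w:r></w:p>
    <w:p><w:r><w:t>Lorem ipsum</w:t></w:r></w:p>
  </w:body>
</w:document>
"""


def paragraphs(xml_text):
    """Return the text of each w:p paragraph, joining its w:t runs."""
    root = ET.fromstring(xml_text.strip())
    ns = {"w": W}
    return [
        "".join(t.text or "" for t in p.findall(".//w:t", ns))
        for p in root.findall(".//w:p", ns)
    ]


body_text = paragraphs(DOCUMENT_XML)
print(body_text)
```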

Other core metadata elements are contained in the ‘core.xml’ file:

Why is this important?

The existence of – and ability to make use of – embedded metadata seems to have been overlooked since the introduction of these types of records over 15 years ago. This may have been primarily because no-one had a system in place to access or use that data in any meaningful way.

Instead, most records continued to be defined by metadata that is created or captured and managed separately from the record itself.

The problems with storing metadata separately from the record are that: (a) the external metadata may be different from the embedded metadata, and (b) the external metadata may unnecessarily limit or restrict the ability to see the record in different contexts.

For example, one person may assign a specific metadata term, such as a function from the Business Classification Scheme (BCS) to the digital record, or assign it to a specific ‘container’. Some time later, another person may try to find the same record but discover it is not in the same file, or assigned to the same function term. They are likely to be looking for the record in or from a completely different context.

The only way they may be able to find it is by doing a general search that includes the body or content of the records, something I found to be the case in real life scenarios where users couldn’t find the records they were looking for based on metadata searches.

Of course, metadata is still important, but my point is the difference between embedded metadata that can be added when the document is saved to a document library, and external metadata that is stored separately from the digital record.

Being able to use the metadata embedded in records, wherever they are stored, is much more powerful, in the same way that applying metadata to web pages facilitates access to them.

Resource Description Framework

A core part of the world wide web is the application of metadata to web pages to facilitate their discovery in a highly connected world. The core elements of this metadata are defined in the World Wide Web Consortium (W3C)’s Resource Description Framework, or RDF.

To quote the World Wide Web Consortium (W3C):

‘RDF extends the linking structure of the Web to use URIs to name the relationship between things as well as the two ends of the link (this is usually referred to as a “triple”). Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications. This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.’ (Source: https://www.w3.org/RDF/)

It is perhaps not surprising that Microsoft named the analytic engine behind Office 365 the Microsoft Graph.

According to Microsoft:

‘Microsoft Graph is made up of resources connected by relationships. For example, a user can be connected to a group through a memberOf relationship, and to another user through a manager relationship. Your app can traverse these relationships to access these connected resources and perform actions on them through the API. You can also get valuable insights and intelligence about the data from Microsoft Graph. For example, you can get the popular files trending around a particular user, or get the most relevant people around a user.’ (Source: https://developer.microsoft.com/en-us/graph/docs/concepts/overview)

microsoft_graph

The RDF model is also used in knowledge management applications such as Protégé, which supports the creation and use of RDF/XML ontologies.

Implications

In my opinion, the implications of XML-based office content (which has been around for over 10 years now) are quite important for records management theory and practice.

While documents are visually displayed ‘in’ a SharePoint document library, just as they are in traditional EDRM systems, each document retains its own originally assigned metadata even if it is downloaded – unless the user uses the ‘Check for Issues’ – ‘Inspect Document’ option from the Info panel to remove it.

The ability to store metadata properties directly in the document facilitates locating and retrieving documents that have the same, similar or related properties via the Microsoft Graph. In the same way that web pages use RDF triples, this allows otherwise unconnected resources to be linked and presented to the user (subject to any security controls) automatically, based on their specific context.

In other words, instead of records being locked to a specific container based on their metadata being stored in a database, records could be discovered and linked wherever they are located based on their embedded metadata.
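A minimal Python sketch of this idea follows. It is not how the Microsoft Graph works internally; it only shows that once each record carries its own properties, records can be linked across locations without a central container. The locations, property values and the `build_index` and `related` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: location -> metadata carried inside the document itself.
RECORDS = {
    "//fileshare/hr/policy.docx": {"subject": "leave", "creator": "A. Author"},
    "sharepoint:/sites/hr/Leave form.docx": {"subject": "leave", "creator": "B. Writer"},
    "onedrive:/drafts/budget.xlsx": {"subject": "finance", "creator": "A. Author"},
}


def build_index(records):
    """Map each (property, value) pair to the set of locations that carry it."""
    index = defaultdict(set)
    for location, properties in records.items():
        for pair in properties.items():
            index[pair].add(location)
    return index


def related(records, index, location):
    """Find other records sharing at least one embedded property value."""
    hits = set()
    for pair in records[location].items():
        hits |= index[pair]
    hits.discard(location)
    return sorted(hits)


INDEX = build_index(RECORDS)
print(related(RECORDS, INDEX, "//fileshare/hr/policy.docx"))
```

Here the policy document is linked to the leave form through its subject and to the budget spreadsheet through its creator, even though the three sit in different stores.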

Relevance of W3 XML schema to Office 365 content

The use of RDF-based metadata embedded in Office documents in Office 365 means that this data can be used to link resources in a way that supports the discovery of the resources. It allows for cross-linking of information. Documents with metadata payloads are one of the many resources that can be connected in this way.

For example, ‘… a user can be connected to a group through a ‘memberOf’ relationship, and to another user through a manager relationship. Your app can traverse these relationships to access these connected resources and perform actions on them through the API. You can also get valuable insights and intelligence about the data from Microsoft Graph. For example, you can get the popular files trending around a particular user, or get the most relevant people around a user.’ (Source: https://developer.microsoft.com/en-us/graph/docs/concepts/overview)

‘Using this simple model, it allows structured and semi-structured data to be mixed, exposed, and shared across different applications. This linking structure forms a directed, labeled graph, where the edges represent the named link between two resources, represented by the graph nodes. This graph view is the easiest possible mental model for RDF and is often used in easy-to-understand visual explanations.’ (Source: https://www.w3.org/RDF/)


Knowledge Management in Office 365

July 21, 2017

A few articles in the past few weeks, and some internal discussions, prompted some thinking around how Office 365 can support knowledge management (KM) – however that may be defined.

What is Knowledge Management?

According to many knowledge management sources online, knowledge management appeared around 1990 and paralleled the rise of document management. Both appear to have arisen as computers appeared (from the mid 1980s) and digital ways of capturing and managing information took hold, while records management was still primarily focused on the management of paper records.

An early (1994) definition for the term ‘knowledge management’ suggested that it was ‘… the process of capturing, distributing, and effectively using knowledge’ (Davenport, 1994; Koenig, 2012).

Bryant Duhon expanded on this somewhat imprecise definition in his 1998 article ‘It’s All in our Heads’ (my emphasis):

‘Knowledge management is a discipline that promotes an integrated approach to identifying, capturing, evaluating, retrieving, and sharing all of an enterprise’s information assets. These assets may include databases, documents, policies, procedures, and previously un-captured expertise and experience in individual workers.’ (Duhon, 1998)

A key element was capturing the knowledge acquired by individuals.

Koenig (2012) noted that ‘Perhaps the most central thrust in KM is to capture and make available, so it can be used by others in the organization, the information and knowledge that is in people’s heads as it were, and that has never been explicitly set down.’

Explicit/implicit versus tacit knowledge

Generally speaking, there is a difference between explicit and implicit knowledge (the information that is recorded) and ‘the information and knowledge that is in people’s heads’ (and walks out the door when people leave).

The latter is defined generally as tacit knowledge. That is, information that is ‘understood or implied, without being stated’, from the Latin tacitus, the past participle of tacere ‘be silent’. (https://en.oxforddictionaries.com/definition/tacit)

I have worked with the issue of how to access and capture the knowledge in the heads of departing employees since around 1984, when I was first made aware that the departure of some very senior and/or long-term staff meant that we would lose access to the information they knew, gained not only from learned knowledge but also in many cases from many decades of personal experience.

At the time it was not my responsibility to worry about it, but I saw attempts to conduct interviews and document procedures and processes with departing (or already departed) employees.

This pre-digital era activity stuck in my head – was interviewing the departed employees the only way to get this information out of their heads?

(As a side note I learned that it was important to interview and talk to my ageing parents and their siblings about their memories and experiences before those memories were lost forever).

Enter the computer age

I consider myself lucky to have been witness over a generation to the change in working practices from paper to digital.

The start of the digital era from the mid 1980s and ubiquitous access to computers on desktops, person to person emails, network file shares and personal folders created another related dilemma – even if the information was created (or captured) by a user, how could it be accessed?

Users were encouraged to put this information in repositories – mostly document management systems – but the fact that email and information on file shares were stored on different servers meant that unless users actively moved emails to a document management system, that information remained hidden away.

What was needed was a way for users to create and store information – emails, documents – wherever they wanted to put it, and for that information to be accessible, restricted only by relevant security controls.

The only systems that seemed to really do this effectively were eDiscovery tools. Perhaps this was not surprising, as the survival (and financial viability) of a company might depend on the ability to find the information that was required.

The rise of smart phones and ubiquitous, always-on, digital communication within the past 10 years has only added to the types of knowledge available and the methods used to capture it.

In my opinion, traditional recordkeeping practices have not kept up and often remain rooted in the idea that knowledge can be stored in a single location or container. How does one capture instant messages sent via encrypted messaging services in a records container?

Microsoft Graph

Microsoft introduced the Microsoft Graph in 2015. The image below demonstrates how the Graph connects content created and stored through the Office 365 (and connected) environment/s.

microsoft_graph.png

The image above should resonate with most people who work in an office. We send emails, create documents or data, set tasks, make appointments, attend and record meetings, have digital conversations, send messages, connect with colleagues, and maintain personal profiles.

The Microsoft Graph collects and analyses this information and presents it to users based on their context. According to Microsoft:

‘Microsoft Graph is made up of resources connected by relationships. For example, a user can be connected to a group through a memberOf relationship, and to another user through a manager relationship. (The Graph) can traverse these relationships to access these connected resources and perform actions on them through the API. You can also get valuable insights and intelligence about the data from Microsoft Graph. For example, you can get the popular files trending around a particular user, or get the most relevant people around a user.’

(Source for image and text: https://developer.microsoft.com/en-us/graph/docs)

According to Tony Redmond, Microsoft Graph’s REST-based APIs provide ‘… a common access approach to all manner of Office 365 data from Exchange and SharePoint to Teams and Planner’. The Graph Explorer, a newly introduced user interface, extends the ability to access information, wherever it lives. (https://developer.microsoft.com/en-us/graph/graph-explorer)
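To make the ‘resources connected by relationships’ description concrete, the Python sketch below prepares (but does not send) requests against the Microsoft Graph v1.0 REST endpoint for the relationships mentioned in the quote. It assumes an access token has already been obtained, which is outside the scope of this example, and which data comes back depends on the permissions and licences in the tenant. The `RELATIONSHIPS` table and `build_request` helper are names chosen for this illustration, and "<token>" is a placeholder.

```python
import urllib.request

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"

# Relationship paths for the signed-in user in the Microsoft Graph v1.0 REST API.
RELATIONSHIPS = {
    "groups": "/me/memberOf",
    "manager": "/me/manager",
    "trending_files": "/me/insights/trending",
    "relevant_people": "/me/people",
}


def build_request(relationship, access_token):
    """Prepare (but do not send) a GET request for one relationship."""
    url = GRAPH_ROOT + RELATIONSHIPS[relationship]
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


request = build_request("trending_files", access_token="<token>")
print(request.get_method(), request.full_url)
# With a real token, urllib.request.urlopen(request) would return a JSON response.
```

The same requests can be tried interactively in the Graph Explorer before writing any code.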

How does a person access this knowledge?

In my opinion, two key points about tacit knowledge are that:

  • It can be captured easily, just as other digital applications capture information about us, including by what we click on or search for.
  • It can be accessed without a person necessarily having to search for it.

Most of us by now are familiar with the way Facebook, LinkedIn, eBay, Amazon and so on capture information about our interests and present suggestions for what we might like to do next. They do this by understanding our context.

Organisational knowledge management should be the same. Users should go about their business using the various digital applications available to them and other users should be able to see that information or knowledge because they have an interest in the same subject matter, or need to know it to do their work.

Users should be presented with information (subject to any security restrictions) because it relates to their work context or interests. They should not have to go looking for knowledge (although that is an option, just as finding a friend in Facebook is an option); knowledge should come to them.

How does Office 365 do this?

Most Office 365 enterprise or business users will have one or two ways to access this information:

  • Delve (may require a higher licence such as E3 for enterprise clients)
  • The OneDrive for Business ‘Discover’ option.

The ‘Discover’ option allows a user to explore further, to see what others are working on. The response I get to Discover is both positive and slightly startled – the latter because it will be possible to know what others are actually doing.

Why is this important?

The ability to access and ‘harness’ collective knowledge in this way is essential to modern day workplaces.

To quote Microsoft:

‘As the pace of work accelerates, it’s more important than ever that you tap into the collective knowledge of your organisation to find answers, inform decision making, re-purpose successes and learn from lessons of the past’. (Moneypenny, 2017)

Serendipitous discovery

In his 2007 book ‘Everything Is Miscellaneous: The Power of the New Digital Disorder’, David Weinberger spoke about three types of order:

  • The first order is the order of physical things, like how books are lined up on shelves in a library.
  • The second order is the catalogue order. A catalogue typically refers to a physical order; it is still physical, but one can make several catalogues of the same physical order. Weinberger’s prime example is the card catalogue of libraries.
  • The third order of order is the digital order, where there is no limit to the number of possible orderings. The digital order frees itself from physical reality, and in it, everything can be connected and related to everything else: Everything is miscellaneous.

The phrase ‘herding cats’ always comes to mind in relation to digital information. It resists order or compartmentalisation.

Further, your order is not my order; my way of browsing or searching may not correspond with your logic for storing or describing it (especially on network file shares!).

The internet pioneered serendipitous discovery. It is now completely taken for granted when, as noted above, we are offered suggested friends in Facebook, jobs in LinkedIn, purchases on eBay and so on. We are presented with this information because the application has collected information about what we clicked on, what jobs we do (or did), who our friends are, and what we like to search for.

The idea that our work environment can do the same thing and present information automatically based on our context (information finds us) is sometimes surprising for people used to the second order of things.

 

Davenport, Thomas H. (1994), Saving IT’s Soul: Human Centered Information Management. Harvard Business Review, March-April, 72 (2), pp. 119-131. Quoted in Koenig (2012).

Duhon, Bryant (1998), It’s All in our Heads. Inform, September, 12 (8), pp. 8-13. Quoted in Koenig (2012).

Koenig, Michael (4 May 2012), What is KM? Knowledge Management Explained, http://www.kmworld.com/Articles/Editorial/What-Is-…/What-is-KM-Knowledge-Management-Explained-82405.aspx, accessed 21 July 2017

Moneypenny, Naomi (17 May 2017), Harnessing Collective Knowledge with SharePoint and Yammer, https://techcommunity.microsoft.com/t5/SharePoint-Blog/Harnessing-Collective-Knowledge-with-SharePoint-and-Yammer/ba-p/70164, accessed 21 July 2017

Redmond, Tony (20 July 2017), Exploring Office 365 with the Graph Explorer, https://www.petri.com/exploring-office-365-graph-explorer, accessed 21 July 2017

Weinberger, David (2007), Everything Is Miscellaneous: The Power of the New Digital Disorder.