
Are auto-generated topic cards the future for aggregations of records?

Humans have a natural instinct for grouping, classifying and categorising things. It helps us find what we are looking for and gives us a sense of satisfaction, whether it be household items, computer storage, or much broader social and population groupings.

Humans have created and kept records ever since we developed a way to record them, on stones, clay shards, papyrus, bamboo sheets, vellum, paper and various other media. Multiple records were aggregated in ways that made sense to the people who created or kept them and wanted to find them again.

The introduction of computers at work from the late 1980s/early 1990s began the decline of traditional ways of aggregating records about a particular subject together in a physical ‘file’, although that practice has persisted to the present day because it was and still is easier to refer to. Lawyers (or more often the legal clerks) still attend digital courtrooms armed with printed copies of (usually digital) evidence and other materials for this reason.

Lawyers off to court – Image credit Sven Vik (NYC TV News Videographer)

The ‘problem’ of digital aggregations

While physical files provided the ability to store anything (printable) about a given subject in the one location, digital ‘files’ (or aggregations) suffered from the fact that emails and other content are created or stored in completely different locations.

The only way to keep emails together with other content about the same subject was for end-users to copy them to a network file share folder location or a digital recordkeeping system. In almost every case, the original email remained in the mailbox where it might still have an active life. Some email mailboxes became a primary (or alternative) storage location for both emails and attachments (as did some desktops!).

Keeping all digital records about a given subject in a single aggregation was never an easy task. It was never possible to be sure that everything was captured because it relied on end-users.

The email mailbox – SharePoint conundrum

In the same way that organisations decided to store copies of emails in network file shares or EDRM systems, it was easy to see SharePoint as the replacement for both.

But Microsoft have never made it easy to ‘natively’ copy an email from Outlook to SharePoint. There isn’t even a download option for emails. Emails can be dragged and dropped to synced document libraries, and various third-party products exist, but the process usually relies on end-users (a) to copy the emails and (b) to copy them consistently. Neither of these can be guaranteed.

And, of course, the records created and captured in Microsoft 365 are not just in Outlook mailboxes and SharePoint. A number of other apps create content that could be records (for example Yammer conversations, Teams chats, calendar entries, Planner tasks, even Whiteboard diagrams). Few of these records can be saved to SharePoint.

So, are digital aggregations impossible?

There is nothing stopping organisations doing whatever they can or want to group related records together. In Microsoft 365, the most logical way to do this is in SharePoint document libraries (the ‘Files’ tab in Teams channels). An entire SharePoint site (or the Team connected to it) provides a form of meta-grouping; that is, multiple document libraries grouped by the SharePoint site/Team.

But if we stand back for a moment, to look at the (Microsoft 365) forest, what we see is not just individual trees (SharePoint sites, Exchange mailboxes and so on). Just as in a forest the roots of all the trees connect via mycorrhizal networks, sometimes known as ‘wood wide webs’, something similar happens in Microsoft 365 (and many other online systems, including Facebook).

Trees networking

The equivalent of networks in these systems are the ‘graphs’.

Like other graphs, the Microsoft Graph draws on all the rich data created and stored by end-users, in this case across the Microsoft 365 ecosystem – our corporate relationships, who we connect with and how, what we are communicating or writing, what we like, the way we use our time and so on. The graph learns what is popular or trending and makes suggestions (while respecting permissions) as to what we might want to see or know about.

Project Alexandria and Viva

According to a post in the Microsoft Research blog published in April 2021 and titled ‘Alexandria in Microsoft Viva Topics: from big data to big knowledge’, Project Alexandria is ‘a research project within Microsoft Research Cambridge dedicated to discovering entities, or topics of information, and their associated properties from unstructured documents’.

The blog post also noted that ‘Alexandria technology plays a central role in the recently announced Microsoft Viva Topics, an AI product that automatically organizes large amounts of content and expertise, making it easier for people to find information and act on it’.

The Alexandria pipeline – from unstructured text to structured knowledge (From the blog post above)

The outcomes sound similar to traditional ‘manually’ created aggregations, although they don’t replace them. In fact, the more that content is manually curated, the more likely it is that Viva Topics can accurately connect it with other related content that might otherwise be missed.

While Viva Topics might appear to be primarily focussed on supporting knowledge management outcomes and is currently limited to content stored in SharePoint, the technology has potential implications for records management, in particular for the age-old issue of how to find all information about a given subject (or know that a pre-defined aggregation contains all relevant information).

Viva Topic cards

As noted already, there is nothing stopping organisations from creating aggregations in ways that make sense to them and their end-users. SharePoint document libraries are the most logical form of aggregation that also happen to allow complex metadata, versioning and other features typically associated with EDRM systems. SharePoint document libraries are just one of several ways that content may be aggregated; Exchange mailboxes are another.

But, in most organisations, potentially relevant information AND records are frequently hidden from view in personal mailboxes and OneDrive accounts, in Teams chats, and in other applications (e.g., Planner). Viva Topics has the potential to leverage this information.

Once set up (as described in ‘Set up Microsoft Viva Topics’), Microsoft Viva begins to work its magic, discovering topics. An example of a discovered topic (from ‘Manage topics at scale in Microsoft Viva Topics’) is shown below.

While Topics are still limited to SharePoint content and people, there is potential to extend this model even further by including details about emails, chat messages or other content across the Microsoft 365 ecosystem – even if that information cannot be seen. For example:

  • Topic Name
  • Suggested people (perhaps grouped by AD manager or business area)
  • Suggested files and pages (you can see)
  • Authors of (n number of) emails that are related to the topic with an indication of volume over given periods (e.g., ‘251 emails in the past 6 months’) or a graphic representing this activity
  • Names of Teams that contain (n number of) chat messages related to the topic.
  • Participants in Teams 1:1 chats that contain (n number of) messages related to the topic.
  • Volume and date range of other related content (e.g., Tasks, Whiteboards, Forms, Yammer conversations).
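The extended card outlined above could be modelled as a simple data structure. The following is a minimal sketch in Python, not anything Viva Topics exposes: the class names, field names and sample values are all hypothetical and exist only to illustrate the idea of showing counts and date ranges for content the viewer cannot open.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class HiddenActivity:
    """Summary of related content the viewer may not be able to open (hypothetical)."""
    source: str                 # e.g. "email", "Teams chat", "Yammer conversation"
    item_count: int             # number of related items found
    first_seen: date            # start of the date range
    last_seen: date             # end of the date range
    people: list[str] = field(default_factory=list)  # authors or participants


@dataclass
class ExtendedTopicCard:
    """Illustrative model of the extended topic card (hypothetical)."""
    topic_name: str
    suggested_people: list[str]
    visible_files: list[str]    # files and pages the viewer can see
    hidden_activity: list[HiddenActivity] = field(default_factory=list)

    def activity_lines(self) -> list[str]:
        """Return one plain-text line per source, showing volume and date range only."""
        return [
            f"{a.item_count} {a.source} items from "
            f"{a.first_seen.isoformat()} to {a.last_seen.isoformat()}"
            for a in self.hidden_activity
        ]


# Invented sample data, for illustration only.
card = ExtendedTopicCard(
    topic_name="Project Phoenix",
    suggested_people=["A. Author", "B. Reviewer"],
    visible_files=["Project plan.docx"],
    hidden_activity=[
        HiddenActivity("email", 251, date(2021, 1, 1), date(2021, 6, 30), ["A. Author"]),
        HiddenActivity("Teams chat", 40, date(2021, 3, 1), date(2021, 5, 15)),
    ],
)

for line in card.activity_lines():
    print(line)
```

The card exposes only counts, date ranges and names, never message content, which matches the idea that related material can be signalled even when it cannot be seen.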

Could Topic cards be the new aggregations?

Topic cards have the potential to resolve the age-old dilemma of digital aggregations, but they are unlikely to replace pre-defined ways to aggregate records including by copying emails to SharePoint document libraries. Those older methods will continue to exist for a long time.

But more importantly, Topic cards have the potential to draw out or highlight content that would otherwise be hidden from view – even if that content remains inaccessible.

When configured, Viva Topics already appear in search results, enhancing search outcomes.

It is only a matter of time before the probabilistic programming techniques of Project Alexandria, with expert human curation, begin to provide the type of high-precision knowledge base construction, first described by Microsoft researchers in May 2019, for all relevant content about a given subject.

Perhaps Topic cards may even support or link with retention and disposal processes, highlighting records due for disposal within a given period or even preventing their premature disposal.