
Records about the world

All about managing records and information and some of my photos


Posted in Electronic records, Records management, Semantic Office, XML

A Semantic Office

Posted on January 23, 2010; updated February 1, 2010, by Andrew Warland
Abstract
The basic concepts and theory behind the idea of a ‘semantic web’ are already available in XML-based documents produced in Word 2007 and Open Office. This has potential implications for the way records could be managed within organisations, leading to the possibility of a ‘semantic office’.
Background
The conceptual basis for a ‘semantic web’ was first defined by Tim Berners-Lee and Mark Fischetti in their 1999 book ‘Weaving the Web’ (HarperSanFrancisco, chapter 12, ISBN 9780062515872).
At its core, the concept behind the Semantic Web is creating the ability for computers to understand the meaning (‘semantics’) of information in a web page, by presenting that information in a machine-understandable format. In very simple terms, this can be achieved through the use of pre-defined, agreed terms to describe informational elements contained within a web page as individual elements of data. This has led to the suggestion of a ‘World Wide Database’ (Nova Spivack, 2005).
Until recently, the primary way to present information that could be read in a browser was to define it using hypertext mark-up language, or HTML (current standard version HTML 4), which, like XML, derives from the original Standard Generalised Markup Language (SGML).
While HTML is an effective way of presenting and formatting information read by a browser, it (and its successors, including XHTML and more recently HTML 5) does not allow for the definition of all individual informational elements contained within the web page. Extensible mark-up language (XML, released in 1998), Web Ontology Language (OWL) and the Resource Description Framework (RDF, released in 1999), on the other hand, allow individual informational elements to be defined. The framework for defining information in this way is described below.
The concepts behind the description of individual elements within a web page have their origins in the object oriented programming languages of the late 1980s and early 1990s. It can be argued that their origins can be traced further back, to the very early 1980s and the development of relational databases that could link, analyse and present data.
In 1988, B. Pernici from the Politecnico di Milano presented a paper to the Conference on Office Information Systems in Palo Alto. The paper, ‘Supporting OIS design through semantic queries’, proposed a semantic query language to assist in the retrieval of information from conceptual schemas in office systems.
This was, possibly, the first reference to the Semantic Office.
Developments within the field of semantic representation of knowledge appear to have focussed on two main areas from the mid 1980s – the management of data within database systems, and the management of informational or data elements within web based information.
In the early to mid 2000s, possibly with this background and other drivers, both Microsoft and OpenOffice.org began to work on new XML-based document file format standards. Microsoft’s version was called ‘Office Open XML’ and was published in November 2008 as ISO/IEC 29500:2008 (also released as ECMA-376 Office Open XML File Formats – 2nd edition, in December 2008); OpenOffice.org released ‘OpenOffice.org XML’, published in November 2006 as ISO/IEC 26300:2006 Open Document Format for Office Applications (OpenDocument) v1.0.
Interestingly, Office 2007 is apparently not entirely compliant with ISO/IEC 29500:2008, but Microsoft expects that Office 2010 will be. (http://www.microsoft.com/interop/letters/ChrisCapOpenLetter.mspx)
Defining Semantics
The broad concept behind the Semantic Web is that a document (in this case a web page) can contain elements of information (or data) that are described within pre-defined ‘categories’. This allows other data described in the same category but held in other documents (web pages, or documents on a server accessible through the same web page) to be found, retrieved and, more importantly, used in potentially completely different contexts.
In a wide sense, it should allow accessible information in any part of the internet to be retrieved and used in this way.  It turns what was unstructured information into structured information.
As described on the main page of http://semanticweb.org/wiki/Main_Page, ‘The Semantic Web is the extension of the World Wide Web that enables people to share content beyond the boundaries of applications and websites.’
At the core of the Semantic Web is agreement on how information should be described, in the form of metadata (‘information about information’). One of the best-known metadata sets is the Dublin Core, which consists of 15 simple elements including: Title, Creator, Subject, Description, Date.
Metadata sets are often conceptually the same thing as ontologies, taxonomies, and even (to hark back to their origins) data dictionaries. When metadata sets become quite complex, with relationships to other sets, these terms are often used interchangeably, depending on the context of the person using them. Web Ontology Language (OWL) is a technology for developing ontologies.
Agreed sets are known as schemas; the schemas used in a Semantic Web page are declared at the beginning as XML namespaces, or XMLNS (for example xmlns:dc="http://purl.org/dc/terms/"). The presence of this declaration at the top of a web page means that the page contains metadata elements drawn from Dublin Core somewhere within it (for example: <span property="dc:title">How to Publish Linked Data on the Web</span>).
One ontology that has become very common on the net in recent years is ‘Friend of a Friend’, or FOAF. FOAF is a way to describe people and their relationships in an agreed format. FOAF includes metadata elements such as: Person, name, nick, homepage, weblog, knows, interest, plan, based_near, age, OnlineAccount, Group, member, and so on. This would appear in a web page like this: <h1 property="foaf:name">Andrew Warland</h1>
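To make the idea of ‘machine understandable’ mark-up a little more concrete, here is a minimal Python sketch that pulls the property/value pairs out of a page marked up in the way shown above. It is only an illustration: the sample page, the class name PropertyExtractor and the function extract_properties are my own inventions for this example, and the FOAF namespace URI in the sample is the one given in the FOAF specification rather than anything quoted earlier in this post. It also ignores nested elements that carry properties.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []          # finished (property, text) pairs
        self._current = None     # (tag, property) of the element being read
        self._text = []          # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        # Simplification: only track one property element at a time.
        if prop and self._current is None:
            self._current = (tag, prop)
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            self.pairs.append((self._current[1], "".join(self._text).strip()))
            self._current = None


def extract_properties(html):
    """Return the (property, text) pairs found in an HTML string."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical sample page, assembled from the two snippets quoted above.
SAMPLE_PAGE = """
<html xmlns:dc="http://purl.org/dc/terms/" xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <body>
    <h1 property="foaf:name">Andrew Warland</h1>
    <span property="dc:title">How to Publish Linked Data on the Web</span>
  </body>
</html>
"""

if __name__ == "__main__":
    for prop, value in extract_properties(SAMPLE_PAGE):
        print(prop, "=", value)
```

The point is simply that a program does not need to ‘read’ the page; it only needs to know the agreed terms (dc:title, foaf:name) to pick out the data.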
These agreed metadata sets, ontologies or schemas are presented on web pages in XML format, as shown above, using the agreed Resource Description Framework (RDF) for describing information. RDF in its simplest form consists of ‘triples’ made up of a subject, a predicate and an object.
For example: ‘The title of the book is The Semantic Office’. Here, ‘the book’ is the subject, ‘title’ is the predicate, and ‘The Semantic Office’ is the object. To define this in a web page, we would first have to choose an appropriate schema. In this case, Dublin Core seems appropriate:
xmlns:dc="http://purl.org/dc/elements/1.1/"
Within the body of the web page, we would then include:
<dc:title>The Semantic Office</dc:title>
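The subject, predicate and object in this example can be recovered from the mark-up with a few lines of Python. This is a rough sketch only, using the standard library rather than a proper RDF toolkit. The <book> wrapper element, its id attribute and the function name to_triples are assumptions made for the illustration; they are not part of Dublin Core or RDF.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: the <book> element and its id are invented for this
# example so that the title has something to be the title *of*.
SAMPLE = (
    '<book xmlns:dc="http://purl.org/dc/elements/1.1/" id="book-1">'
    "<dc:title>The Semantic Office</dc:title>"
    "</book>"
)


def to_triples(xml_text):
    """Turn each child element into a (subject, predicate, object) tuple."""
    root = ET.fromstring(xml_text)
    subject = root.get("id")
    triples = []
    for child in root:
        # ElementTree expands a prefixed name such as dc:title into
        # "{namespace-uri}localname"; join the two parts into one URI.
        if child.tag.startswith("{"):
            namespace, local = child.tag[1:].split("}", 1)
            predicate = namespace + local
        else:
            predicate = child.tag
        triples.append((subject, predicate, (child.text or "").strip()))
    return triples


if __name__ == "__main__":
    for triple in to_triples(SAMPLE):
        print(triple)
```

Note that the predicate comes out as the full URI http://purl.org/dc/elements/1.1/title, which is what makes the term unambiguous across different documents.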
The relationship to Office Documents
To see the XML contents of a Microsoft docx or xlsx document, simply change the file extension from docx or xlsx to zip. You can then open the zipped package and explore the contents.
The following shows the XML-based content embedded within a recent sample docx document. As you can see, this way of presenting information closely mirrors the way it is presented in Semantic Web formatted documents.
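The same exploration can be done without renaming anything, because Python’s standard zipfile module does not care about the file extension. The sketch below is a minimal illustration: build_sample_package creates a tiny stand-in package in memory (it is not a valid Word document) so that the example runs on its own, and both function names are my own. With a real file you would pass its path to list_parts instead.

```python
import io
import zipfile


def build_sample_package():
    """Build a tiny stand-in for a .docx package in memory (not a valid Word file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", "<placeholder/>")
        package.writestr("word/document.xml", "<placeholder/>")
    buffer.seek(0)
    return buffer


def list_parts(docx_file):
    """Return the names of the files held inside the zipped package.

    `docx_file` may be a path (for example "report.docx") or a file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        return package.namelist()


if __name__ == "__main__":
    for name in list_parts(build_sample_package()):
        print(name)
```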
Within the folder ‘docProps’:
-xmlns:dc="http://purl.org/dc/elements/1.1/"
-<dc:title>Test document</dc:title>
Within the folder ‘word’:
-xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
-xmlns:o="urn:schemas-microsoft-com:office:office"
-xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
-xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
-xmlns:v="urn:schemas-microsoft-com:vml"
-xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
-xmlns:w10="urn:schemas-microsoft-com:office:word"
-xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
-xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
The document itself then contains a range of pre-defined information built around the schemas listed above.
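The title and the namespace declarations listed above can also be read programmatically. The following is a hedged sketch using only the Python standard library. The two XML strings are cut-down, hand-written stand-ins for docProps/core.xml and word/document.xml (a real package contains far more), the cp namespace on the root element is the core-properties namespace as I understand it from the Office Open XML packaging standard, and the function names are invented for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Hand-written, heavily reduced stand-ins for the real package parts.
CORE_XML = (
    "<cp:coreProperties "
    'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Test document</dc:title>"
    "</cp:coreProperties>"
)
DOCUMENT_XML = (
    "<w:document "
    'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" '
    'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
    "<w:body/>"
    "</w:document>"
)


def build_sample_package():
    """Create an in-memory zip that mimics the layout of a .docx package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", CORE_XML)
        package.writestr("word/document.xml", DOCUMENT_XML)
    buffer.seek(0)
    return buffer


def read_title(docx_file):
    """Return the Dublin Core title stored in docProps/core.xml, or None."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    title = root.find("{%s}title" % DC_NS)
    return title.text if title is not None else None


def declared_namespaces(docx_file, part="word/document.xml"):
    """Return a {prefix: uri} dict of the namespaces declared in one part."""
    with zipfile.ZipFile(docx_file) as package:
        data = package.read(part)
    # "start-ns" events yield (prefix, uri) pairs as declarations are met.
    events = ET.iterparse(io.BytesIO(data), events=["start-ns"])
    return dict(namespace for _event, namespace in events)


if __name__ == "__main__":
    print(read_title(build_sample_package()))
    for prefix, uri in declared_namespaces(build_sample_package()).items():
        print(prefix, uri)
```

With a real document, replacing build_sample_package() with the path to the docx file should be all that is needed.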
What are we seeing here?  Microsoft Word documents presented in almost the same way that web pages in the Semantic Web are formed.
If we included recordkeeping metadata in the same structure, we could potentially achieve an internal electronic office environment, accessible via a browser interface, that allows information to be stored, accessed and used in the same way that information in the Semantic Web is used.
So how do we achieve a Semantic Office?
According to Microsoft, this type of additional information is a customised extension of the document information properties, and can only be added using InfoPath.  But, once it’s there, it can then be utilised again and again in standardised ways.
What does this mean? Potentially, it means that end users can create documents with all the required recordkeeping metadata embedded within the document.
Of course, some of this information is already there in the form of basic document properties. But the ability to extend this basic set has ramifications for the way information is created, stored, found, retrieved and used, and these closely resemble the way information is being presented in the Semantic Web.
When we consider how almost every application to manage documents and records is now browser based, the ability to apply Semantic Web and Web 2.0/3.0 tools to this information within the enterprise means that records and the information content of those records could be used in ways that were never previously considered possible.  Some products, including EMC’s Documentum, now include an XML store in addition to the traditional file store and relational database.  (See http://www.emc.com/products/detail/software/xml-store.htm).
For example, instead of consigning documents to pre-defined containers or folders within a file plan, it might instead be possible to define the container, title and classification, and place organisational information about the author and her/his organisational context within the document metadata automatically, as part of the recordkeeping metadata schema. This is, in a sense, an encapsulated object, which is not new; what is new is the ability to do it through the original document (e.g. Word).
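As a purely illustrative sketch of what such embedded recordkeeping metadata might look like, the Python below builds a small XML fragment that mixes Dublin Core elements with a recordkeeping schema, and then finds records by classification rather than by folder. Everything specific here is an assumption: the namespace urn:example:recordkeeping, the element names record, container and classification, and the sample values are invented for the example and do not correspond to any published schema or to how Word stores custom properties.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
RK_NS = "urn:example:recordkeeping"  # hypothetical namespace, for illustration only

# Ask ElementTree to use readable prefixes when serialising.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("rk", RK_NS)


def build_record_metadata(title, creator, container, classification):
    """Build a metadata element combining Dublin Core and recordkeeping terms."""
    root = ET.Element("{%s}record" % RK_NS)
    ET.SubElement(root, "{%s}title" % DC_NS).text = title
    ET.SubElement(root, "{%s}creator" % DC_NS).text = creator
    ET.SubElement(root, "{%s}container" % RK_NS).text = container
    ET.SubElement(root, "{%s}classification" % RK_NS).text = classification
    return root


def find_by_classification(records, classification):
    """Select records by their metadata rather than by where they are filed."""
    tag = "{%s}classification" % RK_NS
    return [record for record in records if record.findtext(tag) == classification]


if __name__ == "__main__":
    # Sample values are made up for the demonstration.
    records = [
        build_record_metadata("Test document", "Andrew Warland", "Projects", "Finance"),
        build_record_metadata("Meeting notes", "Andrew Warland", "Projects", "Governance"),
    ]
    for record in find_by_classification(records, "Finance"):
        print(ET.tostring(record, encoding="unicode"))
```

The design point is that the container and classification travel with the document as data, so the same record can be found and reused in different contexts without depending on a single file plan location.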
Copyright – Andrew Warland 2010

References

Konsynski, B.R., Bracket, L.C., and Bracket, W.E., ‘A model for specification of office communications’, IEEE Trans. on Comm., Vol. COM-30, N. 1, Jan. 1982.

Nutt, G.J. and Ricci, P.A., ‘Quinault: an office modeling system’, Computer, May 1981.

Shipman, D.W., ‘The functional data model and the data languages DAPLEX’, ACM Transactions on Database Systems (TODS), Vol. 6, N. 1, pp. 140-173, March 1981.

Pernici, B., Barbic, F., Fugini, M.G., Maiocchi, R., Rames, J.R., and Rolland, C., ‘C-TODOS: An automatic tool for office system conceptual design’, Politecnico di Milano, Electronics Dept., Rep. N. 87-15, 1987.

Ding, L., Zhou, L., Finin, T., and Joshi, A., ‘How the Semantic Web is Being Used: An Analysis of FOAF’, Proceedings of the 38th International Conference on System Sciences, January 2005.

Spivack, Nova. ‘Towards a world wide database’.  Blog post 27 October 2005.  http://novaspivack.typepad.com/nova_spivacks_weblog/2005/10/towards_a_world.html accessed 23 January 2010.

Capossela, Chris. ‘An Open Letter from Chris Capossela, Senior Vice President, Microsoft Office’. http://www.microsoft.com/interop/letters/ChrisCapOpenLetter.mspx

http://xmlns.com/foaf/spec/

Andrew Warland
