Most information management professionals (in the context of this post – records managers, information managers, librarians) are familiar with the use and application of metadata.
The metadata in their domain of work may:
- Form part of the built-in properties of a digital record, and remain with it wherever it is stored, as part of its metadata ‘payload’. Commonly this is the title or name, date created, and creator/author.
- Record (usually additional) details about, or provide the context for an object when it is captured or registered in a system (‘point of capture’ metadata). This metadata may include classification terms or numbers, object types, access and security controls, storage location, and the container or aggregation.
- Record various actions and events through the life of the object (‘process metadata’), including when the record was accessed/used, modified or deleted/destroyed, and by whom.
This post discusses how the metadata skills and related knowledge of information management professionals are closely related to a broader set of skills and knowledge, including enterprise data modelling. Information management professionals might consider learning more about this subject as a career path.
What is data modelling?
Most of the definitions for the term ‘data modelling’ have the same three ‘layers’ (or variations), usually shown in the form of a pyramid:
- Conceptual: A (usually simple) model that shows all the high-level data entities and their relationships across an organisation. For example ‘Customer’, ‘Employee’, ‘Property’, ‘Organisation’. See below for an example.
- Logical: A more detailed model of each entity in the conceptual model, showing the multiple logical entities that exist for each conceptual entity, their attributes and associations (relationships). The attributes for each logical (level 2) entity are more or less metadata fields. See below for an example.
- Physical: A database schema or framework for how data is actually stored in a database. Physical data models are usually very complex and are often regarded as Intellectual Property (IP) by product vendors.
This post focuses only on the first two layers, conceptual and logical.
Elements that make up data models
Both the conceptual and logical data models are made up of entities, attributes and relationships/associations.
- At the conceptual level, entities are the highest level groupings that are related to each other. For example: ‘Services’ or ‘Products’, ‘Vendors’, ‘Employers’, ‘Property’, ‘Customers’, ‘Accounts’ or ‘Organisation’.
- At the logical level, entities are the various data elements that make up the level 1 entity. One way to think of this is to consider a data entry screen where all the details about an entity must be recorded. For example, what data would or should be captured about an Employee, or a Service? In complex organisations, second level logical models may have several hundred entities or may need to be broken down into related sub-entities.
- At the conceptual level, attributes may simply be the logical layer entity names and a definition for each.
- At the logical level, entity attributes define each of the entities. Depending on the complexity of the model, level 2 entities may be single entities (for example, ‘Employment Type’) or grouped (for example, ‘Personal details’ or ‘Contact details’); in these cases, the attributes become the equivalent of metadata or field names, e.g., ‘Surname’, ‘First name’, ‘Gender’, ‘Date of Birth’.
- At the conceptual level, relationships are usually relatively simple. For example, an employee ‘works in’ the organisation; the organisation ‘sells’ services or products to customers who ‘pay’ for them.
- At the logical level, relationships are also relatively simple but there will be more of them as there are more entities. For example, an employee ‘has’ a position in the organisation, and ‘has’ a salary level. The employee ‘has’ personal details (e.g., name, date of birth).
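The entity, attribute and relationship elements described above can be sketched in a few lines of Python. This is only an illustrative sketch, not how any modelling tool stores its models: the class names `Entity` and `Relationship`, the `level` and `parent` fields, and the ‘Position’ entity are assumptions made for the example, while the entity and attribute names are taken from the examples in this post.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """An entity at the conceptual (level 1) or logical (level 2) layer."""
    name: str
    level: int                          # 1 = conceptual, 2 = logical (assumed convention)
    attributes: List[str] = field(default_factory=list)
    parent: Optional[str] = None        # level 1 entity that a level 2 entity belongs to


@dataclass
class Relationship:
    """A named association from one entity to another."""
    source: str
    verb: str
    target: str

    def describe(self) -> str:
        return f"{self.source} {self.verb} {self.target}"


# Conceptual layer: attributes are simply the names of the logical entities.
employee = Entity("Employee", level=1, attributes=["Personal Details", "Position"])
organisation = Entity("Organisation", level=1)

# Logical layer: attributes are the equivalent of metadata or field names.
personal_details = Entity(
    "Personal Details",
    level=2,
    parent="Employee",
    attributes=["Surname", "First name", "Gender", "Date of Birth"],
)

relationships = [
    Relationship("Employee", "works in", "Organisation"),
    Relationship("Employee", "has", "Personal Details"),
]

for rel in relationships:
    print(rel.describe())
```

Keeping entities and relationships as separate objects loosely mirrors the idea of defining entities independently and bringing them together in a model as required.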
Example Level 1 conceptual model
The following diagram is an example of a conceptual model, showing the high-level entities and the relationships between them. What are the level 1 entities in your organisation?
In many organisations, line of business systems can be mapped to each of these entities because that is where the data for those entities is stored. For example:
- Employee information may be managed in a Human Resources Information System.
- Accounts or financial information may be managed in a Financial Management Information System.
- Client information may be managed in a Client/Customer Relationship System and other client-specific systems.
- Property information may be managed in a Property Management System.
Unstructured data, in the form of records, relating to each of these entities, may be managed in a centralised document and records management system, network file shares, email, or other alternatives.
Example level 2 logical model
The following diagram is an example of an actual logical model for the entity ‘Employee’, showing the multiple entities and the relationships between them. If you have identified the level 1 entities in your organisation, what are the level 2 entities?
If you plan to create data models, it is a good idea to use appropriate software; neither Visio nor PowerPoint is in any way suitable for level 2 data modelling. The level 2 model above was created using the software application ‘Enterprise Architect’ from Sparx Systems. This system allows multiple entities to be created independently and then brought together in a model as required, with relationships automatically indicated. In the model above, the level 1 entity precedes the level 2 entity name. There are three entities: Employee, Common (as these entities are common to multiple level 1 entities), and Organisation. The attributes for each entity indicate whether they form part of a group (e.g., ‘Contact details’), or are metadata attributes in their own right (for example, ‘Family Name’ or ‘Date of Birth’ in the ‘Personal Details’ entity, which is a grouping related to the ‘Employee Details’ entity).
In the same organisation, the ‘Client’ level 2 entity data model contained several hundred entities and so several sub-entities (based on the ‘Client type’ entity) were created.
Data models and data dictionaries
The metadata attributes depicted for each entity in the level 2 model show only the ‘field’ name. For example, the Personal Details entity includes the field names ‘Family Name’, ‘First Name’, ‘Gender’ and so on. The data model does not normally provide further detail.
Instead, details for all entities, including their attributes and associations, can be defined in data dictionaries. The following are examples of the information that should be defined for each level 2 entity attribute:
- Name (e.g., ‘Gender’)
- Data Type, for example Text/String (‘free text’) or Boolean (yes/no)
- Data Format
- Choice (and the actual choice options)
- Relationship (e.g., with other attributes)
This type of detail and form should be familiar to most information management professionals. Data dictionaries can also include other, more specific metadata entity details, such as Function, Activity and Document Type, even if these are not represented in actual data models.
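As a rough illustration, a data dictionary entry with the properties listed above could be represented as follows. This is a sketch under stated assumptions rather than a standard structure: the `DictionaryEntry` class, its field names, the `accepts` method and the choice options for ‘Employment Type’ are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DictionaryEntry:
    """One attribute of a level 2 entity, as described in a data dictionary."""
    name: str
    data_type: str                        # "text", "boolean" or "choice" (assumed labels)
    data_format: Optional[str] = None     # e.g. a maximum length or a date pattern
    choices: Tuple[str, ...] = ()         # only used when data_type is "choice"
    related_to: Optional[str] = None      # name of a related attribute, if any

    def accepts(self, value) -> bool:
        """Return True if the value is valid for this attribute's data type."""
        if self.data_type == "boolean":
            return isinstance(value, bool)
        if self.data_type == "choice":
            return value in self.choices
        if self.data_type == "text":
            return isinstance(value, str)
        return False


# Hypothetical entries; the choice options are invented for illustration.
family_name = DictionaryEntry(name="Family Name", data_type="text")
employment_type = DictionaryEntry(
    name="Employment Type",
    data_type="choice",
    choices=("Full-time", "Part-time", "Casual"),
)

print(family_name.accepts("Nguyen"))          # valid free text
print(employment_type.accepts("Casual"))      # one of the defined options
print(employment_type.accepts("casual "))     # not an exact match for any option
```

Recording the choice options in the dictionary entry itself means the same definition can be used both to document the attribute and to check the values entered against it.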
The value of data models and data dictionaries
Conceptual and logical data models – and the data dictionaries that describe the details in these models – are essential information artifacts, especially in larger organisations with multiple business systems. They:
- provide an easy-to-understand, visual conceptual and logical overview of all the data across the organisation;
- can be used in discussions with third-party database vendors, as they define an ideal objective against which acquisition decisions may be made;
- help to understand why there are issues or problems with data quality. For example, a database may allow free text data for important data elements that then cannot be easily analysed;
- help to identify issues with, or to support, analytical and business intelligence outputs;
- show that the organisation is serious about managing data consistently and appropriately;
- have the potential to help reduce costs or increase efficiencies. For example, a critical database may allow for free text entry (which takes time and is prone to error) when a choice option would be far more efficient and accurate.
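The data quality point about free text can be illustrated with a small sketch. The sample values below are invented for the example, and the variable names are assumptions; the sketch simply counts how often each value occurs when the same six records are entered as free text and as a controlled choice.

```python
from collections import Counter

# Hypothetical 'Employment Type' values typed in as free text.
free_text_values = [
    "Full-time", "full time", "FT", "Part-time", "part time ", "Full-time",
]

# The same six records when the field only allows two defined options.
choice_values = [
    "Full-time", "Full-time", "Full-time", "Part-time", "Part-time", "Full-time",
]

free_text_counts = Counter(free_text_values)
choice_counts = Counter(choice_values)

# Free text splits two real categories into several spellings,
# so a simple count no longer answers "how many full-time employees?".
print(len(free_text_counts), dict(free_text_counts))
print(len(choice_counts), dict(choice_counts))
```

With free text, the analyst has to clean and reconcile the variants before any reporting is possible; with a choice field, the counts can be used directly.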
The pathway from metadata to data modelling
As noted at the beginning of this post, most information management professionals have the skills and knowledge required to manage metadata. Information management professionals can draw on these core skills to develop or refine data models for the organisation.
In the first instance, finding out whether data models exist at all would be a good first step. Even if they do, a discussion with IT (or the relevant person responsible for managing the data models) could be a good idea.