Data Analysis Principles of Federated Master Data Management

This article gives background on some of the challenges of implementing a Federated Master Data Management (MDM) system, including the differences between centralized and federated MDM.

For illustration, a fictitious college system is used.

It is common for organizations to have duplicate information on different systems. For example, student information could be stored in both a Student Information System (SIS) and a Learning Management System (LMS). As more systems are brought online, more data is duplicated.

To solve this, organizations build adaptors to extract and transform the data to keep the myriad systems up to date. This process is known as Data Integration or DI. Anyone who has had even limited IT experience understands the pitfalls that follow when trying to keep a company’s DI process in check. One of the biggest problems is knowing where the truth for any given record is, since it is stored on multiple systems. A truth record for a customer is often spread across multiple systems. This truth record or master record is also known as the Master Data Record or MDR.

MDM solves many DI problems by creating a separate system that either holds the truth data (centralized MDM) or references it (federated MDM). The MDM data analysts and architects attempt to create a universal schema (data domain or domain) that will work for all systems. For example, if there are ten different applications using a student record, the MDM data architect would create a “student” domain that will work for all ten applications. This is not an easy task and includes analysis techniques and MDM features, some of which will be touched on here.

To further clarify, centralized MDM has a data store that holds the MDRs for the domains in an organization (example domains include students, courses, course-sections, faculty, etc.), whereas federated MDM references the data where it lives (College SIS, LMS, Registration, etc.) and extracts or updates it as needed, based on the permissions of the entity making the request.

YOUnite is a hybrid MDM product allowing the MDM architects to define domains as either centralized or federated.

Making MDM Easy is Impossible

‘Nuff said.

Always Start by Analyzing the Use Cases

If instead you start by analyzing the data, you will be adding to an already exceedingly arduous process of normalizing data by analyzing data that isn’t relevant to your MDM needs.

Example: A college system uses a Learning Management System (LMS) that also has features for Ed Planning. However, the college system uses a separate system for Ed Planning, so an MDM analyst would be wasting precious time cataloging the entire LMS schema, since the Ed Planning features in the LMS are not used.

Think in Terms of REST

Asking use case questions in terms of RESTful operations (HTTP GET, PUT, POST and DELETE, following REST principles) helps keep the focus on what can become a very convoluted process of analysis. If the analysis deviates from this, it almost always leads to paralysis. Ultimately, federated MDM breaks transactions down into RESTful operations, and if you know which operations to avoid, a lot of time can be saved.

Example: The College Application system never wants to delete a student once they have been added to the system. Since this is the case, analysis of the DELETE request can be ignored for this application.
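One way to keep track of this is a small table of the operations each application actually needs. The sketch below is illustrative only: the application labels and most of the operation sets are assumptions made for the example. Only the missing DELETE for the College Application system comes from the example above.

```python
# Hypothetical labels and operation sets, for illustration only.
ALL_OPERATIONS = ("GET", "PUT", "POST", "DELETE")

# Operations each application needs for the "student" entity.
REQUIRED_OPERATIONS = {
    "college_application": {"GET", "PUT"},  # never deletes a student
    "sis": {"GET", "PUT", "POST", "DELETE"},  # assumed
    "lms": {"GET", "PUT"},  # assumed
}


def operations_to_analyze(required):
    """Return sorted (application, operation) pairs that need analysis."""
    return sorted(
        (app, op)
        for app, ops in required.items()
        for op in ALL_OPERATIONS
        if op in ops
    )


def operations_to_skip(required):
    """Return sorted (application, operation) pairs that can be ignored."""
    return sorted(
        (app, op)
        for app, ops in required.items()
        for op in ALL_OPERATIONS
        if op not in ops
    )


to_analyze = operations_to_analyze(REQUIRED_OPERATIONS)
to_skip = operations_to_skip(REQUIRED_OPERATIONS)
```

Every pair in to_skip is analysis work that never has to be done.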

Establish the Truth

Out of analysis you discover the truth, i.e. which systems hold the truth values for a given domain. As you catalogue the entities, it is important to note which systems hold the truth, since knowing this reduces the amount of analysis required.

Example: In a college system, the truth for the “name” elements (first, last, etc.) of the student attribute is stored in both the College Application system and the College’s Student Information System (SIS). A Learning Management System (LMS) at the college system should receive name and email address updates when they are made in the College Application system or SIS, but the converse is not true, i.e. the College Application system and SIS do not want student name changes made from the LMS (since name changes made at the college should only be handled by staff with the appropriate permissions to do so).

Knowing this, you no longer need to worry about what issues may arise from sending data from the LMS to other systems and can focus primarily on how data will flow from either the College Application system or the SIS into other systems such as the LMS.
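The direction of flow in this example can be written down as a simple lookup. This is a minimal sketch, not a product feature: the system labels and the function name are hypothetical, and treating "email" the same way as "name" is an assumption drawn from the example above.

```python
# Systems that hold the truth for each student data element.
# Labels are hypothetical; the "email" entry is an assumption.
TRUTH_SOURCES = {
    "name": {"college_application", "sis"},
    "email": {"college_application", "sis"},
}


def should_propagate(element, source_system):
    """Return True if a change to `element` made in `source_system`
    should be pushed out to the other systems."""
    return source_system in TRUTH_SOURCES.get(element, set())


# A name change in the SIS flows outward; one made in the LMS does not.
sis_change_flows = should_propagate("name", "sis")
lms_change_flows = should_propagate("name", "lms")
```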

MDM Data Analysis is a Multi-Dimensional Cross-Cutting Concern

There is no way around it. You must analyze:

The needs of performing specific operations within each system
Attributes stored in those systems and their data elements

...for each of the required HTTP operations (GET, PUT, POST, DELETE) in a RESTful context.

This will uncover most of the challenges and metadata needed (metadata here is data that is not part of the MDR, or that you hoped you would not have to add to the MDR, but is required to properly store the data).

Example: Incoming freshmen at a college need to take an assessment test to determine which English and Math courses they should be placed into. The assessment system holds raw test scores, and the SIS wants to combine the assessment scores with past college and high school course scores from the student’s transcripts and then create its own score. In other words, the SIS wants the assessment test scores but does not store them. It only uses them as an input when creating a course placement ranking.

Adaptors are the custom MDM software that connects the application (e.g. SIS, Assessment, etc.) to the MDM system. They map data domains (and metadata) to operations in the application and follow protocols for data transformation and data governance, i.e. who can see or update what (YOUnite provides fine-grained data governance controls between groups inside an organization).

It is easiest to think in the following terms and build MDM worksheets as follows:

DELETE or GET or POST Entity -> {adaptor1, adaptor2...adaptorN}

PUT Entity?attribute=key&value=value -> {adaptor1, adaptor2...adaptorN}

Ultimately, the MDM architects create a worksheet that contains the required attributes to complete an operation for a given entity for a given adaptor.
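A worksheet of this kind could be modeled as a mapping from (operation, entity, adaptor) to the attributes required. The sketch below is one possible shape, with hypothetical class, method, adaptor and attribute names. It is not the format used by any particular MDM product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorksheetKey:
    """One worksheet row: an operation on an entity for one adaptor."""
    operation: str  # "GET", "PUT", "POST" or "DELETE"
    entity: str
    adaptor: str


@dataclass
class Worksheet:
    rows: dict = field(default_factory=dict)

    def require(self, operation, entity, adaptor, attributes):
        """Record attributes needed to complete the operation."""
        key = WorksheetKey(operation, entity, adaptor)
        self.rows.setdefault(key, set()).update(attributes)

    def required_attributes(self, operation, entity, adaptor):
        """Return the sorted attributes recorded for one row."""
        key = WorksheetKey(operation, entity, adaptor)
        return sorted(self.rows.get(key, set()))

    def adaptors_for(self, operation, entity):
        """Return the sorted adaptors that take part in an operation."""
        return sorted(
            key.adaptor
            for key in self.rows
            if key.operation == operation and key.entity == entity
        )


# Hypothetical entries: PUT student -> {lms, sis}
worksheet = Worksheet()
worksheet.require("PUT", "student", "lms", ["first_name", "last_name", "email"])
worksheet.require("PUT", "student", "sis", ["first_name", "last_name"])
```

Rows that are never added, such as a DELETE for an application that never deletes, simply stay out of the worksheet.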

If an HTTP Operation Is Not Required for an Adaptor, Don't Analyze It

Example: There is never a situation where the analysts for the College Application system want the MDM system to create (POST) a new student. They need to maintain control of that process, so there is no need to analyze the required elements for a POST /student for the College Application system.

Generally Speaking, All Changes to an MDR Should Generate a Notification to All Adaptors Interested in That Data Domain

If the application tied to an adaptor has a well-written RESTful interface, it will allow you to register a callback for changes. If not, you will need to discover a way to detect changes.

Additionally, all new and deleted resources should generate a notification (this is a YOUnite feature).

Example: A college course catalogue system would not get a notification that a student has been deleted from the system but several others would such as the college application system and the college SIS.
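The fan-out described here can be sketched as a registry of which adaptors are interested in which data domain. This is only an illustration of the idea under assumed names. It is not YOUnite's API, and it returns the notifications that would be sent rather than delivering them.

```python
from collections import defaultdict


class NotificationRegistry:
    """Tracks which adaptors are interested in which data domain."""

    def __init__(self):
        self._interested = defaultdict(set)

    def register(self, adaptor, domain):
        """Record that an adaptor wants changes for a domain."""
        self._interested[domain].add(adaptor)

    def notify(self, domain, event, resource_id):
        """Return one notification per adaptor interested in the domain."""
        return [
            {"adaptor": adaptor, "domain": domain,
             "event": event, "id": resource_id}
            for adaptor in sorted(self._interested.get(domain, ()))
        ]


# Hypothetical adaptors and identifiers.
registry = NotificationRegistry()
registry.register("college_application", "student")
registry.register("sis", "student")
registry.register("course_catalogue", "course")

sent = registry.notify("student", "deleted", "student-123")
recipients = [n["adaptor"] for n in sent]
```

Here the course catalogue adaptor is registered only for the course domain, so it receives nothing when a student is deleted.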

If Data Elements Are Used by Only One System, Then Don't Normalize Them Unless They Are Used Inside Another Data Domain

The job of an MDM analyst is to create as little work as possible. A single element added to an MDR has an exponential effect on the complexity of the overall system.

Example: A college system uses an Ed Planning system that tracks meetings between the student and college faculty and staff. Other systems may use the Ed Planning data, but if no other systems use the scheduling data, then the scheduling data can be ignored with respect to MDM.
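The rule in this section can be expressed as a small filter. The sketch below assumes you have already catalogued which systems use each element. The function name, element names and system labels are hypothetical.

```python
def elements_to_normalize(usage, referenced_by_other_domains=frozenset()):
    """Return the sorted elements worth adding to the MDR.

    usage maps an element name to the set of systems that use it.
    An element qualifies if more than one system uses it, or if it
    is referenced from inside another data domain.
    """
    return sorted(
        element
        for element, systems in usage.items()
        if len(systems) > 1 or element in referenced_by_other_domains
    )


# Hypothetical catalogue for the Ed Planning example.
usage = {
    "ed_plan": {"ed_planning", "sis"},
    "meeting_schedule": {"ed_planning"},
}

normalized = elements_to_normalize(usage)
```

With this catalogue only ed_plan qualifies. Passing referenced_by_other_domains={"meeting_schedule"} would bring the scheduling data back in.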

A Couple of Additional Points

The MDM adaptor might need to read and manipulate non-MDR attributes to complete transactions.
When building an MDR worksheet, you also need a “non-MDM” worksheet to keep track of and reference elements not considered for the MDR.