...

It is common for organizations to have duplicate information on different systems. For example, student information could be stored in both a Student Information System (SIS) and a Learning Management System (LMS). As more systems are brought online, more data is duplicated. Disparate systems aren't necessarily a bad thing; they are often a sign that an organization allows groups to acquire systems that meet their needs.

To keep the myriad systems up to date, organizations build adaptors that extract and transform the data. This can start with small subsets of fields and expand gradually. This process is known as Data Integration (DI). Anyone who has had even limited IT experience understands the pitfalls that follow when trying to keep a company’s DI process in check. One of the biggest problems is knowing where the truth for any given record is, since it is stored on multiple systems. A truth record for a customer is often spread across multiple systems. This truth record, or master record, is also known as the Master Data Record (MDR).

MDM solves many DI problems by creating a separate system that either holds the truth data (centralized MDM) or references it (federated MDM). YOUnite handles data by either:

  • Storing it in the YOUnite Data Store
  • Referencing data stored in other systems through YOUnite Adaptors - This is known as federated MDM

The data analysts and architects attempt to create a universal schema (data domain or domain) that will work for all systems. For example, if there are ten different applications using a student record, the data architect would create a “student” domain that works for all ten applications. This is not an easy task and involves analysis techniques and DI/MDM features, some of which will be touched on here.

To further clarify, centralized MDM has a data store (the YOUnite Data Store) that holds domain data in an organization (example domains include students, courses, course-sections, faculty, etc.), whereas federated MDM references the data where it lives (College SIS, LMS, Registration, etc.) and extracts or updates it as needed, based on the permissions of the entity making the request.

YOUnite is a hybrid MDM product that allows the architects to define domains as either centralized or federated.
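To make the centralized/federated distinction concrete, here is a minimal Python sketch. It is not YOUnite's API: the `StorageMode`, `Domain` and `resolve_location` names, the `"data-store"` label and the example domains are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class StorageMode(Enum):
    CENTRALIZED = "centralized"  # records are held in the MDM data store
    FEDERATED = "federated"      # records are referenced where they live


@dataclass(frozen=True)
class Domain:
    # Hypothetical domain descriptor; not a YOUnite type.
    name: str
    mode: StorageMode
    # Systems that hold the data; only meaningful for federated domains.
    sources: tuple = ()


def resolve_location(domain):
    """Return the places a read for this domain would be served from."""
    if domain.mode is StorageMode.CENTRALIZED:
        return ["data-store"]
    return list(domain.sources)


# Hypothetical domains for a college system.
students = Domain("student", StorageMode.FEDERATED, ("SIS", "LMS"))
courses = Domain("course", StorageMode.CENTRALIZED)
```

In this sketch a hybrid deployment is simply a mix of `Domain` entries with different modes, so each domain can be decided on its own.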

The terms MDM, DI and Master Data get used a lot and need clarification:

  • MDM is the process of describing and cataloging data inside of an organization and understanding which stakeholders value which sources of data. 
  • DI is the process of integrating disparate data. This ranges from annual CSV exports and imports between systems to real-time connectors between systems.
  • Master Data is what is considered the source of truth for a given data domain. See "Establish the Truth" below.

Making DI & MDM Easy is Impossible

YOUnite's primary focus has been to make this process as easy and non-intrusive as possible. ‘Nuff said.

Always Start by Analyzing the Use Cases

If instead you start by analyzing the data, you will add to the already exceedingly arduous process of normalizing data by analyzing data that isn’t relevant to your MDM process.

Example: A college system uses a Learning Management System (LMS). The LMS also has features for Ed Planning; however, the college system uses a separate system for Ed Planning, so a data analyst would be wasting their precious time cataloging the entire LMS schema, since the Ed Planning features in the LMS are not used.

...

Asking use case questions in terms of RESTful operations (HTTP GET, PUT, POST and DELETE, following REST principles) helps keep the focus in what can become a very convoluted process of analysis. If the analysis deviates from this, it almost always leads to paralysis. Ultimately, YOUnite breaks transactions down into RESTful operations, and if you know which operations to avoid, a lot of time can be saved.

...

Out of analysis you discover the truth, i.e. which systems hold the truth values for a given domain. As you catalogue the entities, it is important to note which systems hold the truth, since knowing this reduces the amount of analysis required.

As the data governance staff works through the MDM process, truth is often defined by the DGS, but YOUnite provides the flexibility that allows the ZDS to define effective federated master data. In other words, what may be truth for one zone, or for the organization as a whole (what is defined as master data by the DGS), may not be master data for another.

Example: In a college system, the truth for the “name” elements (first, last, etc.) in the student domain is stored in both the College Application system and the college’s Student Information System (SIS). A learning management system (LMS) at the college system should receive name and email address updates when they are made in the College Application system or SIS, but the converse is not true: the College Application system and SIS do not want student name changes made from the LMS (since name changes made at the college should only be handled by staff with the appropriate permissions to do so).

Knowing this, you no longer need to worry about what issues may arise from sending data from the LMS to other systems, and you can focus primarily on how data will flow from either the Application System or SIS into other systems such as the LMS.
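The one-way flow in the example above can be written down as a small rule table. The following Python sketch is only an illustration: the `TRUTH_SOURCES` and `SUBSCRIBERS` tables, the system identifiers and the `propagate` function are hypothetical and do not come from YOUnite.

```python
# Hypothetical table: for each (domain, attribute), the systems treated as truth.
TRUTH_SOURCES = {
    ("student", "name"): {"college-application", "sis"},
}

# Hypothetical table: systems that want to receive updates for the attribute.
SUBSCRIBERS = {
    ("student", "name"): {"college-application", "sis", "lms"},
}


def propagate(domain, attribute, origin):
    """Return the systems that should receive a change made in `origin`.

    A change that originates in a system that is not a truth source
    is not propagated at all.
    """
    key = (domain, attribute)
    if origin not in TRUTH_SOURCES.get(key, set()):
        return []
    # Every subscriber except the system that made the change.
    return sorted(SUBSCRIBERS.get(key, set()) - {origin})
```

With these assumed tables, a name change made in the SIS reaches the College Application system and the LMS, while a name change attempted from the LMS goes nowhere, so flows out of the LMS need no further analysis.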

The MDM

...

Process is a Multi-Dimensional Cross-Cutting Concern

There is no way around it, you must analyze:

...

This will uncover most of the challenges and metadata needed (metadata is data that is not part of the data record, or that you hoped you would not have to add to the data record, but is required to properly store the data).

Example: Incoming freshmen at a college need to take an assessment test to determine which English and Math courses they should be placed into. The assessment system holds raw test scores, and the SIS wants to combine the assessment scores with past college and high school course scores from the student’s transcripts and then create its own score. In other words, the SIS wants the assessment test scores but does not store them; it only uses them as an input when creating a course placement ranking.

Adaptors are the custom DI software that connects an application (e.g. SIS, Assessment, etc.) to the MDM system. They map data domains (and metadata) to operations in the application and follow protocols for data transformation and data governance, i.e. who can see or update what (YOUnite provides fine-grained data governance controls between groups inside an organization).

It is easiest to think in the following terms and build "Data Domain Worksheets" as follows:

DELETE or GET or POST Entity -> {adaptor1, adaptor2...adaptorN}

PUT Entity?attribute=key&value=value -> {adaptor1, adaptor2...adaptorN}

Ultimately, the data architects create a worksheet that contains the required attributes to complete an operation for a given entity for a given adaptor.
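One way to picture such a worksheet is as a lookup from (operation, entity) to the attributes each adaptor requires. The Python sketch below is an assumption-laden illustration: the `worksheet` structure, the function names, the adaptor names and the attribute names are all hypothetical and are not a YOUnite format.

```python
from collections import defaultdict

# worksheet[(operation, entity)][adaptor] -> attributes that adaptor requires
worksheet = defaultdict(dict)


def require(operation, entity, adaptor, attributes):
    """Record the attributes an adaptor needs to complete an operation."""
    worksheet[(operation.upper(), entity)][adaptor] = frozenset(attributes)


def adaptors_for(operation, entity):
    """Adaptors that take part in the operation for this entity."""
    return sorted(worksheet.get((operation.upper(), entity), {}))


def missing_attributes(operation, entity, adaptor, payload):
    """Required attributes that are absent from the payload."""
    required = worksheet.get((operation.upper(), entity), {}).get(adaptor)
    if required is None:
        raise KeyError(f"{adaptor} does not handle {operation} {entity}")
    return sorted(required.difference(payload))


# Hypothetical worksheet rows for a "student" entity.
require("POST", "student", "sis", ["first_name", "last_name", "email"])
require("POST", "student", "lms", ["email"])
```

An operation that an adaptor never needs is simply left out of the worksheet, so nobody has to analyze its required attributes.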

...

Example: There is never a situation where the analysts for the College Application system want YOUnite to create (POST) a new student. They need to maintain control of that process, so there is no need to analyze the required elements for a POST /student for the College Application system.

Generally Speaking, All Changes to an

...

Data Record Should Generate a Notification to All Adaptors Interested in That Data Domain

If the application tied to an adaptor has a well-written RESTful interface, it will allow you to register a callback for changes. If not, you will need to find another way to detect changes.
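When no callback is available, one common fallback is to poll the application and compare snapshots. The Python sketch below shows that idea under stated assumptions: polling is only one possible approach, and the `fingerprint` and `detect_changes` functions and the record shapes are hypothetical rather than part of any YOUnite adaptor.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record, independent of key order."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def detect_changes(previous, current):
    """Compare two snapshots keyed by record id.

    `previous` maps id -> fingerprint saved from the last poll.
    `current` maps id -> full record fetched in this poll.
    Returns (created, updated, deleted, new_fingerprints).
    """
    new_prints = {rid: fingerprint(rec) for rid, rec in current.items()}
    created = sorted(rid for rid in new_prints if rid not in previous)
    deleted = sorted(rid for rid in previous if rid not in new_prints)
    updated = sorted(
        rid for rid, fp in new_prints.items()
        if rid in previous and previous[rid] != fp
    )
    return created, updated, deleted, new_prints
```

An adaptor built this way would store `new_fingerprints` after each poll and turn each created, updated or deleted id into a notification. Storing hashes instead of full records keeps the saved state small, at the cost of not knowing which attribute changed.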

...

If Data Elements Are Used by Only One System, Then Don't Normalize Them Unless They Are Used Inside Another Data Domain

The job of the data analyst is to create as little work as possible. A single element added to a federated data domain has an exponential effect on the complexity of the overall system.

Example: A college system uses an Ed Planning system that tracks meetings between students and college faculty and staff. Other systems may use the Ed Planning data, but if no other system uses the scheduling data, then the scheduling data can be ignored when modeling the student, faculty or college data domains.

A Couple of Additional Points

  • The YOUnite adaptor might need to read and manipulate non-MDR attributes to complete transactions

  • When building an MDR worksheet, you also need a “non MDM” reference data worksheet to keep track of elements not considered for the MDR. This is data that changes infrequently (e.g. States, Countries, etc.) but is commonly cross-referenced by other domains (e.g. customers). Decide where the reference data should reside, and consider storing some or all of it in the YOUnite data store for performance reasons.