...

This page provides background on many of the challenges of orchestrating the Master Data Management (MDM) process when defining data domains in a federated data management system. Moving an organization toward MDM is a rigorous process, but if done properly, it provides a single interface for synchronization, governance, data event notifications, and a golden-source-of-truth operational data store.

...


It is common for organizations to have duplicate information on different systems. For example, student information could be stored in both a Student Information System (SIS) and a Learning Management System (LMS). As more systems are brought online, more data gets duplicated. Disparate systems aren't necessarily a bad thing, but they are often a sign that an organization allows groups to acquire systems that meet their own needs, which adds complexity and can strain the overall system.

...

  • MDM is the process of describing and cataloging data inside an organization and understanding which stakeholders value which sources of data.
  • DI (data integration) is the process of keeping data up to date between disparate systems. Approaches range from annual CSV exports and imports between systems to real-time connectors.
  • Master Data is what is considered the source of truth for a given data domain and for a given department or group (zone) inside the larger organization. See Establish the Truth below.

Making DI & MDM Easy is Generally Impossible 

However, YOUnite's primary focus has been to make this process as easy and non-intrusive as possible.

Start by Analyzing the Use Cases

If you start by analyzing the data and building data dictionaries for every system that plans to work with MDM (the source systems), you will quickly feel like you are trying to boil the ocean. You will be adding to an already arduous process of normalizing data, and by analyzing data that isn't relevant to your MDM process, the time needed to complete the data analysis phase can grow dramatically.

...

  • Data domains 
  • Adaptor development and capabilities
  • Governance requirements
  • Data event notification needs

Establish the Truth

Out of the analysis you discover the truth, i.e., which systems hold the truth values for a given domain. As you catalog the data elements in a data dictionary, note which systems hold the truth for the various stakeholders (zones). Knowing this reduces the amount of analysis required, because it limits the work to the minimum set of data elements needed for a given data domain. It's also important to understand that different zones can have different views of which systems hold the truth values for a given domain; this too must be documented as the data elements for a given data domain are cataloged. Allowing different zones to define where their source of truth originates is one of the distinguishing features of YOUnite.

Note: A zone refers to a collection of systems/applications owned by a group inside an organization.

As the data governance staff works through the MDM process, "truth" is often defined by the Data Governance Steward (DGS). However, YOUnite provides the flexibility for a Zone Data Steward (ZDS) to define effective federated master data. In other words, what is truth for one zone, or for the organization as a whole (what the DGS defines as master data), may not be master data for another zone.
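To make the zone idea concrete, here is a minimal Python sketch of how documented truth choices could be resolved: an organization-wide source of truth per domain (the DGS's choice) that a zone may override (the ZDS's choice). The zone names, system names, and the `source_of_truth` function are hypothetical illustrations, not YOUnite's API or configuration format.

```python
from __future__ import annotations

# Hypothetical organization-wide choices, as a DGS might document them:
# data domain -> system holding the truth values.
ORG_DEFAULT: dict[str, str] = {
    "student": "SIS",
}

# Hypothetical zone-level overrides, as a ZDS might document them:
# (zone, data domain) -> system that zone treats as master.
ZONE_OVERRIDES: dict[tuple[str, str], str] = {
    ("admissions", "student"): "College Application",
}


def source_of_truth(zone: str, domain: str) -> str:
    """Return the system a zone treats as master for a domain.

    A zone-level override wins; otherwise the organization-wide
    choice applies. Raises KeyError if neither is documented.
    """
    override = ZONE_OVERRIDES.get((zone, domain))
    if override is not None:
        return override
    try:
        return ORG_DEFAULT[domain]
    except KeyError:
        raise KeyError(f"no source of truth documented for domain {domain!r}") from None
```

Recording both levels explicitly is what lets the data dictionary show, per zone, which system is authoritative for each domain.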

...

Knowing this, you can rule out any concerns over sending data from the LMS to other systems and focus primarily on how data will flow from either the Application System or the SIS into other systems, such as the LMS.

Think in Terms of REST

Asking use-case questions in terms of RESTful operations (the HTTP GET, PUT, POST, and DELETE methods, following REST principles) can help keep analysis focused. Ultimately, YOUnite breaks transactions down into RESTful operations, so if you know which operations an adaptor will never need, you can skip analyzing them and save a lot of time.

Example: The College Application system never wants to delete a student once they have been added to the system, so analysis of the DELETE request can be skipped for this application.
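The answers to these use-case questions can be recorded per adaptor and used to generate only the analysis tasks that matter. This sketch assumes a hypothetical table of which operations each application needs; the College Application row omits DELETE per the example above and also omits POST, which is an assumption for illustration.

```python
from __future__ import annotations

from enum import Enum


class Op(Enum):
    GET = "GET"
    PUT = "PUT"
    POST = "POST"
    DELETE = "DELETE"


# Hypothetical answers to "which operations will YOUnite ever ask this
# application to perform?" Anything not listed needs no analysis.
ADAPTOR_OPS: dict[str, set[Op]] = {
    "College Application": {Op.GET, Op.PUT},  # no DELETE; no POST (assumed)
    "SIS": {Op.GET, Op.PUT, Op.POST, Op.DELETE},
}


def analysis_tasks(domain: str) -> list[str]:
    """List the operation/adaptor pairs that actually need analysis."""
    tasks = []
    for adaptor, ops in sorted(ADAPTOR_OPS.items()):
        for op in Op:  # declaration order keeps the output stable
            if op in ops:
                tasks.append(f"{op.value} /{domain} on {adaptor}")
    return tasks
```

Here `analysis_tasks("student")` yields six tasks instead of eight, and no DELETE task is generated for the College Application system.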

The MDM Process is a Multi-Dimensional Cross-Cutting Concern

There is no way around it; you must analyze the following two areas...

...

Example: Incoming freshmen at a college need to take an assessment test to determine which English and Math courses they should be placed into. The Assessment system holds the raw test scores, and the SIS wants to combine them with past college and high school course scores from the student’s transcripts to create its own score. In other words, the SIS wants the assessment test scores but does not store them; it only uses them as an input when creating a course placement ranking.

Adaptors are software located within a system that shares data through the YOUnite Data Hub; an adaptor acts as the connection point between that system and the Data Hub. In the example above, adaptors are custom DI software that connects each application (e.g., SIS, Assessment) to the MDM system. They map data domains (and metadata) to operations in the application and follow protocols for data transformation and data governance, i.e., who can see or update what. YOUnite provides fine-grained data governance controls between groups inside an organization.
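The two adaptor responsibilities described here, mapping domain attributes to an application's own fields and respecting governance rules about who can see what, can be sketched as follows. The `Adaptor` class, its field names, and the deny-by-default visibility rule are assumptions for illustration; real YOUnite adaptors follow YOUnite's own protocols, which this sketch does not reproduce.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Adaptor:
    """Hypothetical adaptor sketch; not the YOUnite adaptor API.

    field_map: data-domain attribute -> application-specific field name
    visible:   zone -> set of domain attributes that zone may see
    """
    field_map: dict[str, str]
    visible: dict[str, set[str]] = field(default_factory=dict)

    def to_application(self, record: dict) -> dict:
        """Transform a domain record into the application's field names."""
        return {self.field_map[k]: v for k, v in record.items() if k in self.field_map}

    def to_domain(self, app_record: dict) -> dict:
        """Transform an application record back into domain attributes."""
        reverse = {app: dom for dom, app in self.field_map.items()}
        return {reverse[k]: v for k, v in app_record.items() if k in reverse}

    def filter_for_zone(self, zone: str, record: dict) -> dict:
        """Drop attributes the zone may not see (deny by default)."""
        allowed = self.visible.get(zone, set())
        return {k: v for k, v in record.items() if k in allowed}


# Illustrative usage with made-up field names and a made-up student.
sis = Adaptor(
    field_map={"firstName": "FIRST_NM", "lastName": "LAST_NM", "dob": "BIRTH_DT"},
    visible={"lms": {"firstName", "lastName"}},
)
record = {"firstName": "Jane", "lastName": "Doe", "dob": "2005-04-01"}
```

With this setup, `sis.filter_for_zone("lms", record)` omits `dob`, and a zone with no documented rules sees nothing.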

...

Ultimately, the data architects create a worksheet that lists, for each adaptor, the attributes required to complete each operation on a given entity.
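One way to hold such a worksheet is a nested mapping from adaptor to (entity, operation) to required attributes; taking the union across adaptors then yields the candidate minimal set of data elements for a domain. The structure and sample attributes below are hypothetical.

```python
from __future__ import annotations

# Hypothetical worksheet shape:
# adaptor -> (entity, operation) -> attributes required for that operation.
Worksheet = dict[str, dict[tuple[str, str], set[str]]]

WORKSHEET: Worksheet = {
    "SIS": {
        ("student", "POST"): {"studentId", "firstName", "lastName", "dob"},
        ("student", "PUT"): {"studentId"},
    },
    "LMS": {
        ("student", "POST"): {"studentId", "firstName", "lastName", "email"},
    },
}


def required_elements(worksheet: Worksheet, entity: str) -> set[str]:
    """Union of attributes any adaptor needs for any operation on the entity.

    Anything outside this set is not required by any adaptor operation,
    so it is a candidate to leave out of the data domain.
    """
    needed: set[str] = set()
    for ops in worksheet.values():
        for (ent, _op), attrs in ops.items():
            if ent == entity:
                needed |= attrs
    return needed
```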

Just Because Data Domains Can Be Modeled as Multi-Dimensional Doesn't Mean They Should Be

The JSON modeling tool in YOUnite is very powerful: it allows a data architect to create very complex interdependencies between data domains, but such complexity should be avoided. When designing data domains, follow relational database principles. The following points illustrate a couple of pitfalls to avoid when building structurally complex data domains:

...

To summarize, following sound relational database principles creates a master data ecosystem whose data records are easier to manage and to govern.
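As one illustration of the relational approach, this sketch contrasts a record that embeds a copy of another domain with one that references it by key and joins at read time. The student and school domains, their fields, and the `student_with_school` helper are hypothetical.

```python
from __future__ import annotations

# Embedded (avoid): each student carries a full copy of its school, so a
# change to the school must be propagated into every student record.
embedded_student = {
    "studentId": "S1",
    "name": "Jane Doe",
    "school": {"schoolId": "C1", "name": "Example College", "state": "CA"},
}

# Normalized (prefer): the school is its own data domain and the student
# holds only a foreign key to it.
schools = {"C1": {"schoolId": "C1", "name": "Example College", "state": "CA"}}
students = {"S1": {"studentId": "S1", "name": "Jane Doe", "schoolId": "C1"}}


def student_with_school(student_id: str) -> dict:
    """Join a student to its school at read time, like a relational join."""
    student = students[student_id]
    school = schools[student["schoolId"]]
    return {**student, "school": school}
```

With the normalized form, the school exists once, so a single change event and a single set of governance rules cover it.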

If an HTTP Operation Is Not Required for an Adaptor, Don't Analyze It

Example: There is never a situation where the analysts for the College Application system want YOUnite to create (POST) a new student; they need to maintain control of that process. So there is no need to analyze the elements required for a POST /student on the College Application system.

Generally Speaking, All Changes to a Data Record Should Generate a Change Event to All Adaptors Interested in That Data Domain

If an application tied to an adaptor has a well-written RESTful interface, it will likely allow you to register a callback for changes. If not, you will need to find another way to detect changes.
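When no callback is available, one common option is polling: take periodic snapshots of the application's records, fingerprint each record, and compare snapshots to classify creates, updates, and deletes. The sketch below does that and then fans each change out to every subscriber of the domain. The `ChangeBus` class is a local stand-in for illustration only and does not represent how the YOUnite Data Hub delivers notifications; hashing JSON with sorted keys assumes records are JSON-serializable.

```python
from __future__ import annotations

import hashlib
import json
from collections import defaultdict
from typing import Callable

Record = dict
Listener = Callable[[str, str, str], None]  # (domain, record_id, change_type)


def fingerprint(record: Record) -> str:
    """Stable hash of a record; key order does not affect the result."""
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()


def snapshot(current: dict[str, Record]) -> dict[str, str]:
    """Fingerprints of a poll result, keyed by record id."""
    return {rid: fingerprint(rec) for rid, rec in current.items()}


def detect_changes(previous: dict[str, str], current: dict[str, Record]) -> list[tuple[str, str]]:
    """Compare last-seen fingerprints with a fresh poll of the application.

    Returns (record_id, change_type) pairs; change_type is
    "created", "updated", or "deleted".
    """
    changes = []
    for rid, rec in current.items():
        if rid not in previous:
            changes.append((rid, "created"))
        elif previous[rid] != fingerprint(rec):
            changes.append((rid, "updated"))
    for rid in previous:
        if rid not in current:
            changes.append((rid, "deleted"))
    return changes


class ChangeBus:
    """Fan each change out to every listener subscribed to the domain."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Listener]] = defaultdict(list)

    def subscribe(self, domain: str, listener: Listener) -> None:
        self._subscribers[domain].append(listener)

    def publish(self, domain: str, record_id: str, change_type: str) -> None:
        for listener in self._subscribers[domain]:
            listener(domain, record_id, change_type)
```

After each poll, replace the stored fingerprints with `snapshot(current)` so the next comparison starts from the latest state. Polling trades latency and load for independence from the application's API, which is why a callback is preferable when one exists.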

...

Note: If data synchronization happens outside of MDM, there is a good possibility that MDM won't detect it, and the benefits of unified data governance and data event notifications won't be realized. For information on Data Governance and on developing an Array Advisory Practice that tells adaptor developers how to handle updated arrays, see Data Domains: Arrays.

If Data Elements Are Used by Only One System, Then Don't Normalize Them Unless They Are Used Inside Another Data Domain

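The job of the data analyst is to create as little work as possible. A single element added to a federated data domain has an outsized effect on the complexity of the overall system, since it may need to be mapped, transformed, and governed in every adaptor that works with that domain.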

Example: A college uses an Ed Planning system that tracks meetings between students and college faculty and staff. Other systems may use the Ed Planning data, but if no other system uses the scheduling data, then the scheduling data can be ignored when modeling the student, faculty, or college data domains.
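The rule in this section's heading can be written as a small predicate over two hypothetical inventories: which systems use each element, and which other data domains reference it. The element names below are illustrative only.

```python
from __future__ import annotations

# Hypothetical inventories gathered during analysis.
USED_BY: dict[str, set[str]] = {
    "meetingTime": {"Ed Planning"},               # scheduling data, one system
    "studentId": {"SIS", "LMS", "Ed Planning"},
    "advisorId": {"Ed Planning"},
}
REFERENCED_IN: dict[str, set[str]] = {
    "advisorId": {"faculty"},                     # used inside another domain
}


def should_normalize(element: str,
                     used_by: dict[str, set[str]],
                     referenced_in: dict[str, set[str]]) -> bool:
    """Model an element in a federated domain only if more than one system
    uses it or another data domain references it."""
    systems = used_by.get(element, set())
    return len(systems) > 1 or bool(referenced_in.get(element, set()))
```

Under these inventories, `meetingTime` is left out, while `studentId` (shared) and `advisorId` (referenced by another domain) are modeled.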

The Process is Iterative

Start small and gradually connect more applications and services in the organization to the MDM ecosystem.

A Couple of Additional Points

  • The YOUnite adaptor might need to read and manipulate non-MDM data attributes to complete transactions.

  • When building an MDM worksheet, you also need a reference data worksheet. Reference data changes infrequently (e.g., states, countries) but is commonly cross-referenced by other domains (e.g., customers). Decide where the reference data should reside, and consider storing some or all of it in the YOUnite data store for performance reasons.
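For the performance point, here is a generic sketch of loading infrequently changing reference data once and reusing it to validate cross-references. It uses Python's `functools.lru_cache` as an in-process cache; this is not a YOUnite data-store feature, and `load_states_from_source` is a hypothetical stand-in for wherever the reference data ends up residing.

```python
from __future__ import annotations

from functools import lru_cache
from types import MappingProxyType
from typing import Mapping


def load_states_from_source() -> dict[str, str]:
    """Hypothetical stand-in for reading reference data from its home.

    Returns a small hard-coded sample for illustration.
    """
    return {"CA": "California", "NV": "Nevada", "OR": "Oregon"}


@lru_cache(maxsize=None)
def states() -> Mapping[str, str]:
    # Loaded once and reused. The read-only view stops callers from
    # mutating the shared cached copy.
    return MappingProxyType(load_states_from_source())


def validate_customer(customer: dict) -> list[str]:
    """Return problems found when cross-referencing a customer record."""
    problems = []
    if customer.get("state") not in states():
        problems.append(f"unknown state code: {customer.get('state')!r}")
    return problems
```

Because the cache never expires on its own, call `states.cache_clear()` when the reference data does change.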

...