MDM Background & Notes

From Lou's Back of the Napkin MDM notes:

What features are considered best of breed?

  • Good data quality
  • Proper metadata management
  • Agile SOA
  • Performance tuning

What are our needs?

  • Multi-domain support: A domain is a data model. A domain can have a narrow focus (e.g. "students") or a broad one (e.g. HEDM). HEDM and SAP are examples of single-domain products that are focused on a given vertical.
    • Multi-domain candidates: Informatica, Orchestra, Riversand (.NET only), TIBCO. Oracle is not multi-domain.
      • Some products may allow the user to configure a domain but will only support one (perhaps Oracle fits that bill?)
    • Flexible schemas (models) with the ability to create different versions of the same model. This prevents having to rev all apps tied to a given version of a model.
  • Data modeling: The process of cataloging and reconciling the differences between similar data stored in different systems.
    • The domain modeling process should be fairly interactive and flexible. A bad approach would be having to configure DB tables by hand. A better approach would be an interactive UI that lets the data architect/steward view the schemas/models of the multiple applications being reconciled while building the domain model.
  • Governance models: The resources and processes involved in controlling access to master data.
    • Permissions can become very complicated quickly. Think of NT's nightmarish file permission hierarchy.
    • The simplicity of the permission model needs to be factored into the decision process
      • Easy to set/view for the data stewards and MDM developers
  • Centralized vs Federated (or both): How is the data stored?  Is it stored in a golden DB that is part of the MDM system or does MDM reach in to where the data is stored and re-create master data on the fly?
    • CCC needs both:
      • Federated capability enables buy-in from stakeholders who aren't comfortable or fully on board with MDM.
      • As stakeholders become more comfortable with MDM, and as new systems are brought online or existing systems are upgraded, they will consider the golden DB approach for performance and simplicity reasons.
  • Integration focused vs. information management
  • Synchronization
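
To make the centralized vs. federated distinction above concrete, here is a minimal sketch in Python. It is not taken from any vendor product: the class names (GoldenStore, FederatedView, HybridMDM), the "most trusted source wins" merge rule, and the sample student/course data are all assumptions made for illustration. The idea is that a hybrid MDM layer could serve some domains from a golden DB and assemble others on the fly from the source systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

Record = Dict[str, str]


@dataclass
class GoldenStore:
    """Centralized style: master records are persisted inside the MDM system."""
    records: Dict[str, Record]

    def get(self, key: str) -> Optional[Record]:
        return self.records.get(key)


@dataclass
class FederatedView:
    """Federated style: the master record is assembled on the fly from sources."""
    sources: Dict[str, Callable[[str], Optional[Record]]]  # source name -> lookup
    precedence: List[str]  # source names, most trusted first (assumed rule)

    def get(self, key: str) -> Optional[Record]:
        merged: Record = {}
        # Walk from least to most trusted so trusted values overwrite the rest.
        for name in reversed(self.precedence):
            rec = self.sources[name](key)
            if rec:
                # Skip empty values so a blank field never hides a real one.
                merged.update({k: v for k, v in rec.items() if v})
        return merged or None


@dataclass
class HybridMDM:
    """Routes each domain to the golden DB or to the federated view."""
    golden: GoldenStore
    federated: FederatedView
    centralized_domains: Set[str]

    def get(self, domain: str, key: str) -> Optional[Record]:
        if domain in self.centralized_domains:
            return self.golden.get(key)
        return self.federated.get(key)


# Hypothetical source systems, modeled as plain dictionaries.
sis = {"s1": {"name": "Ada Lovelace", "email": ""}}
lms = {"s1": {"name": "A. Lovelace", "email": "ada@example.edu"}}

federated = FederatedView(
    sources={"sis": sis.get, "lms": lms.get},
    precedence=["sis", "lms"],
)
golden = GoldenStore({"c1": {"title": "Intro to Biology"}})
mdm = HybridMDM(golden, federated, centralized_domains={"course"})

student = mdm.get("student", "s1")  # assembled from sis + lms at read time
course = mdm.get("course", "c1")    # read straight from the golden store
```

In this sketch, moving a domain from federated to centralized is just a matter of adding it to centralized_domains once its stakeholders are comfortable, which mirrors the gradual adoption path described in the notes.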

 

Most vendor products are built for a single data domain, e.g. CRM or Product.

  • We would need to break new ground with educational data.

 

Open Source vendors: 

  • Akeneo - Not in the Gartner quadrant. Not multi-domain. Product focused.
  • Pimcore - Not in the Gartner quadrant. Not multi-domain. Product & Content Management focused.
  • Talend - Not in the Gartner quadrant at all.

Feedback from users:

  • Significant time and expense involved.

Interesting stuff:

  • Kettle ETL (open source)
  • DataCleaner (commercial open source) for data quality