Access the latest Data Dictionary web app here (currently the QA environment): CDD
(You will need to log in to the VPN at aws-vpn-dl-test.ccctechcenter.org)


The CCC Data Dictionary

The CCC Data Dictionary:

  • Is intended to track data and provide a policy for every data structure represented in the data dictionary (ex. a table with columns that point to data instances)
  • Will encompass metadata. Metadata goes beyond the "database of table and field names and formats" to include everything that relates to managing CCC-wide data, such as the contact person or project for a specific data set, or the laws that govern a specific element of a data set.
  • Will have the central goal of ensuring quality data for decision making, which in turn improves students' paths to success.
  • Is intended to be used by:
    • CCC IT staff, data stewards, and data governance officers, all of whom need to integrate their college with the entire data warehouse and MDM system
    • CCCCO officials, who need real statistics, reports, and quality data to inform their decision making
    • CCCTC staff who manage and implement the CCC Data Dictionary, keeping it up to date and ensuring it works to meet the needs of all users

The CCC Data Dictionary will provide different views into the data structures and, in some cases, an opportunity to update data. It has three aspects:

  1. A data source registry (the source and its definition)
  2. A technical data dictionary (column name, length, etc.)
  3. An element data dictionary (metadata)
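To make the relationship between the three aspects concrete, here is a minimal Python sketch that models each aspect as a record type and follows a column's links to its source and element entries. The class names, field names, and sample values (`DataSource`, `TechnicalEntry`, `ElementEntry`, `describe`, "Application A") are hypothetical and do not reflect the CDD's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Hypothetical record types, one per aspect; not the CDD's real schema.
@dataclass
class DataSource:
    """Data source registry entry: the container that holds the data."""
    source_id: str
    application: str   # software application that owns the source
    storage_type: str  # e.g. "SQL RDB", "MongoDB", "PDF"


@dataclass
class ElementEntry:
    """Element (metadata) entry: one shared definition per concept."""
    element_id: str                  # e.g. "firstName"
    definition: str
    parent_id: Optional[str] = None  # parent element, if any


@dataclass
class TechnicalEntry:
    """Technical entry: one physical column in one data source."""
    source_id: str             # refers to DataSource.source_id
    column_name: str
    data_type: str
    length: Optional[int]
    element_id: Optional[str]  # refers to ElementEntry.element_id


def describe(col: TechnicalEntry,
             sources: Dict[str, DataSource],
             elements: Dict[str, ElementEntry]) -> str:
    """Follow a column's links to its registry entry and element definition."""
    src = sources[col.source_id]
    elem = elements.get(col.element_id) if col.element_id else None
    meaning = elem.definition if elem else "(no element link)"
    return f"{src.application}.{col.column_name} ({col.data_type}): {meaning}"


if __name__ == "__main__":
    sources = {"a": DataSource("a", "Application A", "SQL RDB")}
    elements = {"firstName": ElementEntry("firstName", "Student's first name")}
    col = TechnicalEntry("a", "fname", "char", 32, "firstName")
    print(describe(col, sources, elements))
    # prints: Application A.fname (char): Student's first name
```

Keeping the element link on the technical entry (rather than copying the definition into every column) means a definition is edited in one place and every linked column picks it up.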
Data Dictionary Aspects: Definition and Function/Purpose
Data Source Registry

Definition: Data source information, i.e., a list of the containers that hold the data, such as data that describes the:

  • software application (e.g., CCC Apply, MyPath, MDM)
  • kind of data source, i.e., storage type (e.g., SQL RDB, MongoDB, or even unstructured data such as a PDF)
  • data type (e.g., 3rd Normal Form, RDB, S3 JSON object store)
Function/Purpose:

  • Allows the CCC Tech Center and CCCCO to see every data source and its details at a glance, enabling efficient searches and informing smarter future application decisions
  • Increases data quality for decision making
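Seeing every data source "at a glance" comes down to filtering registry records by their descriptive fields. The sketch below assumes a hypothetical `SourceRecord` type and `search_registry` helper; the sample entries are invented placeholders, not real CCC systems.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical data source registry record."""
    name: str
    application: str   # software application that owns the data
    storage_type: str  # e.g. "SQL RDB", "MongoDB", "PDF"
    data_format: str   # e.g. "3rd Normal Form", "S3 JSON object store"


def search_registry(registry: Iterable[SourceRecord], **criteria: str) -> List[SourceRecord]:
    """Return records whose fields equal every given criterion, ignoring case.

    With no criteria, every record is returned. Raises AttributeError if a
    criterion names a field that SourceRecord does not have.
    """
    return [
        record for record in registry
        if all(getattr(record, field).lower() == value.lower()
               for field, value in criteria.items())
    ]


if __name__ == "__main__":
    registry = [
        SourceRecord("src_a", "Application A", "SQL RDB", "3rd Normal Form"),
        SourceRecord("src_b", "Application B", "MongoDB", "JSON documents"),
        SourceRecord("src_c", "Application C", "SQL RDB", "3rd Normal Form"),
    ]
    hits = search_registry(registry, storage_type="sql rdb")
    print([r.name for r in hits])  # prints: ['src_a', 'src_c']
```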
Technical Data

Definition: Data source information such as:

  • column name
  • field length
  • data type (e.g., int, char, byte, string, boolean)
  • link to element data (for example, fname links to first name)
Function/Purpose:

  • Allows the CCC Tech Center and CCC IT staff to quickly identify the data type of every column across all applications, improving the efficiency of data replication (for example, localized data at the SIS level) and supporting future consistency improvements
  • Receives input from scripts (the scripts do not write directly into the data dictionary, because it has public-facing elements)
  • Increases data quality for decision making
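One way to use the technical data for consistency work is to group columns by the element they link to and flag elements whose columns disagree on data type or length. This is a sketch under assumptions: the tuple layout and the `find_type_conflicts` name are hypothetical, and the sample rows stand in for whatever the metadata-collection scripts actually produce.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (application, column_name, data_type, length, element_id); hypothetical layout
Column = Tuple[str, str, str, Optional[int], Optional[str]]


def find_type_conflicts(columns: Iterable[Column]) -> Dict[str, List[Tuple[str, Optional[int]]]]:
    """Return {element_id: distinct (data_type, length) specs} for every element
    whose linked columns do not all share the same type and length."""
    specs: Dict[str, Set[Tuple[str, Optional[int]]]] = defaultdict(set)
    for _app, _column, data_type, length, element_id in columns:
        if element_id is not None:  # unlinked columns cannot be compared
            specs[element_id].add((data_type, length))
    # repr() as the sort key so None lengths sort without a TypeError
    return {
        element_id: sorted(found, key=repr)
        for element_id, found in specs.items()
        if len(found) > 1
    }


if __name__ == "__main__":
    columns = [
        ("Application A", "fname", "char", 32, "firstName"),
        ("Application B", "first", "varchar", 50, "firstName"),
        ("Application C", "firstname", "char", 32, "firstName"),
        ("Application A", "lname", "char", 32, "lastName"),
    ]
    print(find_type_conflicts(columns))
    # prints: {'firstName': [('char', 32), ('varchar', 50)]}
```

The check depends on the element links being populated; columns with no element link are skipped rather than guessed at.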
Element Data

Definition: Data source information that consists of:

  • metadata (data that describes your data)
    • anything that relates to the definition, quality, redundancy, access, timing, exchange, and ownership of data (including glossaries, statistics and indicators that are derived from data)*
    • data standards, the common definitions of data elements that every software application and database must use
    • conceptual definitions 
    • tagged elements that link via foreign key to the technical data dictionary (ex. all first name fields across applications link to a single element entry, "firstName", which holds the CCC definition of first name and links to the technical data dictionary entry, defining "first name is this, and any new application that has a first name field must make it 32 characters long, of data type char, etc.")
    • relationship structures, i.e., parent/child relationships between elements (ex. the data element "zone" can be viewed as the parent of 17 other elements related to the concept of zone; the concept of Report Center would be the parent of 100 elements that define the Report Center)
    • infrastructure metrics

* Source: DataSpecs Metadata Podcast
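The "firstName" and "zone" examples above imply two checks: does a proposed column meet the element's technical standard (char, 32 characters), and which elements sit directly under a given parent. A minimal sketch, assuming a hypothetical `Element` record and helper names (`check_field`, `children`); the zone child elements are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Element:
    """Hypothetical element-dictionary entry carrying its technical standard."""
    name: str
    definition: str
    required_type: Optional[str] = None
    required_length: Optional[int] = None
    parent: Optional[str] = None  # parent element name, if any


def check_field(element: Element, data_type: str, length: Optional[int]) -> List[str]:
    """List the ways a proposed column violates the element's standard."""
    problems = []
    if element.required_type is not None and data_type != element.required_type:
        problems.append(f"type {data_type!r} != {element.required_type!r}")
    if element.required_length is not None and length != element.required_length:
        problems.append(f"length {length} != {element.required_length}")
    return problems


def children(elements: Iterable[Element], parent_name: str) -> List[str]:
    """Names of the direct children of parent_name (one level only)."""
    return sorted(e.name for e in elements if e.parent == parent_name)


if __name__ == "__main__":
    first_name = Element("firstName", "CCC definition of first name", "char", 32)
    print(check_field(first_name, "char", 32))  # prints: []
    print(check_field(first_name, "varchar", 50))
    # prints: ["type 'varchar' != 'char'", 'length 50 != 32']

    elements = [
        Element("zone", "Concept of zone"),
        Element("zoneId", "Zone identifier", "int", parent="zone"),
        Element("zoneDataSteward", "Steward for a zone", parent="zone"),
    ]
    print(children(elements, "zone"))  # prints: ['zoneDataSteward', 'zoneId']
```

`children` returns one level only; walking a deeper hierarchy (such as the 100 elements under Report Center) would call it recursively.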

Function/Purpose:

  • Increases data access and quality for decision making: informs decisions with real numbers that help quantify cost (ex. if legislation dictates a change, such as requiring all student first name fields to be called first_name, we can pull up every instance of first name and report on the cost of making the change)
  • Provides glossary terms where applicable, supporting end-user education about the application, its purpose, and its data (ex. we can build a Glossary UI that shows only the glossary fields and their definitions; multiple UIs could draw on the same data structure)
  • Improves data management through easier reporting
  • Reduces redundant data
  • Provides controlled data views via permissions 
  • Improves data quality as a whole (ex. researchers and/or the CCC Tech Center can run a report on "first name" against the data warehouse that shows every instance of first name across all applications and data sources, so that inconsistencies and redundancies, such as first name columns called "fname" vs. "first" vs. "firstname", can be identified and corrected)
  • Allows for abstractions that may not have a corresponding technical structure, such as "zones", which must be defined. There will be technical elements about zones (like an ID) as well as metadata about zones; each will have both a technical and an element data dictionary entry.
  • Supports machine learning (ex. a glossary entry that defines "zone" could link to other elements related to "zone" and suggest: "Researchers like you may also be interested in 'zone data steward', 'groups', and 'ACL'.")
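The first-name scenarios above (a legislated rename to first_name, and a report exposing "fname" vs. "first" vs. "firstname") both reduce to listing every column linked to one element. A minimal sketch, assuming hypothetical helper names and a simple (source, column_name, element_id) row layout; the count of affected columns is only an input to a cost estimate, not the estimate itself.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (source, column_name, element_id); hypothetical layout
Row = Tuple[str, str, str]


def element_instances(rows: Iterable[Row], element_id: str) -> List[Tuple[str, str]]:
    """Every (source, column_name) pair linked to element_id."""
    return [(source, column) for source, column, elem in rows if elem == element_id]


def naming_variants(rows: Iterable[Row], element_id: str) -> Counter:
    """How often each physical column name is used for the element."""
    return Counter(column for _source, column in element_instances(rows, element_id))


def rename_impact(rows: Iterable[Row], element_id: str, target_name: str) -> List[Tuple[str, str]]:
    """Columns that would have to change if the element must be named target_name."""
    return [pair for pair in element_instances(rows, element_id) if pair[1] != target_name]


if __name__ == "__main__":
    rows = [
        ("Application A", "fname", "firstName"),
        ("Application B", "first", "firstName"),
        ("Application C", "firstname", "firstName"),
        ("Application D", "first_name", "firstName"),
        ("Application A", "lname", "lastName"),
    ]
    print(naming_variants(rows, "firstName"))
    print(len(rename_impact(rows, "firstName", "first_name")))  # prints: 3
```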