Data Warehouse Advisory Group: 08-10-20


Meeting Details

Meeting Date:

Aug 10, 2020

Purpose:

Data Warehouse Advisory Group

Zoom Recording:

https://zoom.us/rec/share/yc11AfLx-DlJRJX9zkrvZ_8fO57ZX6a80yQdqKBcnk1jqlmFtGYP9Jq8pzbIgTOe 

Password: u5@%xK86

Participants:

Crystal Hernandez, Mark Cohen, Steve Klein, Denice Inciong, Alex Jackl, Craig Hayward, Dulce Delgadillo, Dustin Tamashiro, Jake Kevari, Jenni Allen, Louis Delzompo, Z Reisz, Barney Gomez

Agenda

Item

Notes

1

Discuss the prioritization rubric: CCC Data: Data Source Prioritization - Draft and the List of current and proposed data sources

  • Barney Gomez asked a clarifying question about the intent of the data source list. The group discussed that the proposed data sources spreadsheet is intended to collect the data sources that the Advisory Group (and other stakeholders) have expressed interest in accessing through the Data Warehouse.

  • Discussed that the prioritization rubric is intended to define which data sources are a priority for the members of the Data Warehouse Advisory Group. Those sources would then be subject to CO approval and data governance before being brought into the CCC Data Warehouse.

  • Clarification that the sources on the second tab of the spreadsheet are not currently available in the data lake or data warehouse; the tab reflects data that the workgroup would like to make available.

  • Decision: The CCC Data: Data Source Prioritization spreadsheet was adjusted to include a column for "level of access required", so that this information can be captured in the MOUs necessary to bring this data into the DW and share it with colleges/districts. As the data sets are prioritized, the group will capture the level of access that the community is looking for.

  • Discussed the issue of data quality. Decision: adjust the draft prioritization rubric to remove data quality from the prioritization scoring and address data quality as a separate issue.

  • Discussed how best to leverage the IR community's knowledge of working with data elements that are newer or seen as poor quality, and how to share that knowledge. Also discussed the need for a staging area or vetting place where people with expertise can work with the data.

  • Barney Gomez introduced the MDM program as a strategy and tool that would be extremely important for the integrity of these data.

  • Discussed splitting out data sources in the spreadsheet so that raw and processed data appear in separate rows. For example, CASAS TOPS PRO data may exist as raw data, which is then used by two initiatives and then placed in the WestEd Launchboard. Alex Jackl voiced the opinion that the minute you derive values you create a different data set. The IR community is regularly asked to recreate calculations and is looking for raw data as well as calculated data. Alex Jackl spoke to the data harmonization work and the need for dictionaries that document the derived values or metrics.

  • Lou Delzompo described the Data Lake as the place to collect data regardless of quality, with quality to be addressed before data moves to the Data Warehouse or Data Marts, where data need to be high quality and usable. Lou also spoke of different technologies that may be leveraged, each addressing a different technical need as needs are identified, including the potential for AI to help address data quality.

  • Barney Gomez discussed additional resources being brought on board to support MIS work, which may provide additional bandwidth to support this work. Barney may discuss these resources at the next meeting.

Proposed agenda for September

  1. Prioritization of proposed data sources

  2. Discussion on how prioritized data sources are to be used

  3. Discuss Change Management

 

Issues/Questions Resolved

Issue/Question

Resolution/Answer

Date Resolved/Answered

Owner

1

Barney Gomez asked a clarifying question about the intent for the prioritization rubric.

The prioritization rubric was developed to define which data sources are a priority for the members of the Data Warehouse Advisory Group. Those sources would then be subject to CO approval and data governance before being brought into the CCC Data Warehouse.

Aug 10, 2020

Steve Klein, Mark Cohen

2

Barney Gomez asked a clarifying question about the intent for the proposed data sources spreadsheet.

Clarification that the sources on the spreadsheet are not currently available in the data lake and reflect data that the advisory group would like to make available.

Aug 10, 2020

Steve Klein, Mark Cohen

 

Action Items/Next Steps

Item

Notes

Owner

1

Identify and prioritize the data sources that should be brought into the DL and DW. This is the exercise we have been working on through the data sources spreadsheet and prioritization rubric.

Mark Cohen is working on a survey format to collect ranking of proposed data sources using the prioritization rubric.

DW Advisory Group

2

Identify how these data would be used, in order to address issues of data quality and identify the appropriate applications for accessing these data.

This will be an ongoing conversation to identify how these data will be used and whether there are data issues to be resolved prior to making data available in the data warehouse.

DW Advisory Group

3

Request to address Change Management as an agenda item on the next call.

Added to the Data Warehouse Advisory Group: 09-14-20 agenda.

Mark Cohen, Crystal Hernandez