CCC Data Warehouse - Direct Access User Guide

Updated: July 14, 2021

This guide provides an implementation configuration overview for establishing a direct access connection (ODBC/JDBC) to the CCC Data Warehouse for authorized CCC colleges and districts.

Secure access to CCC Data is provided to institutional researchers, college and district administrators, and other decision-makers across the California Community Colleges, district offices, and the Chancellor’s Office, where these critical data may be used to support instructional and institutional decision-making aligned with the Chancellor's Vision for Success.




About the CCC Data Warehouse

CCC Data Warehouse is developed by the CCC Technology Center in coordination with, and at the direction of, the CCC Chancellor's Office. Part of the Data Services Program initiative from the California Community Colleges Chancellor's Office, CCC Data Warehouse provides the infrastructure the California Community College System needs to aggregate data from disparate systems into an enterprise data warehouse.

 

Direct Access Connection

For authorized Researchers at CCC institutions who want to connect local applications to data sources within CCC Data, the CCC Tech Center (CCCTC) now supports direct access connection to the CCC Data Warehouse as an alternate or additional option to the Data Warehouse (Jaspersoft) Report Server.  Local applications may include an analytics or business intelligence application such as Tableau or Power BI. 


Once connected, your college will be able to:

  • Connect to available data sources through a site-to-site VPN using service account login credentials (provided).

  • Run SQL queries against available data sources.

  • Connect a local analytics/BI application through ODBC or JDBC access.

  • Connect CCC Data to your local data warehouse by setting up these data as external tables within your local data warehouse.

IMPORTANT: The CCC Data Warehouse is not to be used as a staging ground for data distribution to your local data warehouse. This type of usage will not be permitted as it places an undue load and cost on the CCC Data infrastructure.

Reminder: Direct access is an alternate or additional option to the CCC Data Warehouse Report Server for authorized CCC institutional researchers.

 

Available Data Sources

Direct access connection is supported for the following data sources within the CCC Data Warehouse:

  • CCCApply: Application (Credit and Non-Credit App)

  • CCCApply: International Application

  • California College Promise Grant Application

  • Multiple Measures Placement Service (MMPS)

  • Canvas Data (Opt-in required)

  • Chancellor's Office Curriculum Inventory (COCI)

  • Course Identification Numbering System (C-ID)

  • NOVA (CCCCO Only)

  • Launchboard (CCCCO Only)

  • MIS (CCCCO Only)

Table A: Data Availability Chart

Source Data Set | CCCCO Access: DW Direct Connection | CCCCO Access: DW Report Server | College Access: DW Direct Connection | College Access: DW Report Server | Requires Opt-In for Inclusion | Introduced in Release Version
CCCApply Standard Application |  |  |  |  |  | CCC Data 2.0
CCCApply Noncredit Application |  |  |  |  |  | CCC Data 2.0
CCCApply International Application |  |  |  |  |  | CCC Data 2.0
California College Promise Grant Application |  |  |  |  |  | CCC Data 2.0
Decrypted LGBTQ Data (and Report) |  |  |  |  |  | CCC Data 2.0
Multiple Measures Placement Service (MMPS) |  |  |  |  |  | CCC Data 2.0
COCI: Chancellor's Office Curriculum Inventory |  |  |  |  |  | CCC Data 2.1
C-ID: Course ID Numbering System |  |  |  |  |  | CCC Data 2.1
CANVAS (Opt-in) |  |  |  |  |  | CCC Data 2.1
MIS |  |  |  |  |  | CCC Data 2.1
NOVA |  |  |  |  |  | CCC Data 2.1
Launchboard |  |  |  |  |  | CCC Data 2.1

Data dictionaries for the above data sources are available on the CCC Data Warehouse documentation site.

 


Implementation Process

Upon request from an authorized individual at a college or district office (often the Dean overseeing Institutional Research), one or more members of the CCCTC Enabling Services team will work directly with your college IT group to configure a custom site-to-site VPN based on your system and establish a secure direct access connection to the CCC Data Warehouse from your college (or district). Following successful implementation and query validation, the ES Support team will be able to provide post-implementation (live) support through the CCCTechnology.info support channel.

 

Process Overview

The process to implement a direct access connection to the CCC Data Warehouse (Redshift), including a summary of the roles and responsibilities for participants, is listed below.

  1. Requesting Access

  2. Establishing the VPN Configuration

  3. Establishing the Account Credentials

  4. Connecting to the (Redshift) Database Using SQL Workbench

  5. Data Access Validation & Testing Queries

 

Participation

Participant

Is responsible for…


College / District Researcher

  • sends request for direct access connection to their ES CRM.

  • identifies their primary IT and Researcher contacts.

  • vets the user(s) and confirms authorization of the requested scope.

Enabling Services (ES) College Relationship Manager (CRM)

  • handles the incoming request for direct access and initiates the ES processes.

ES Implementation Configuration Engineer (ICE)

  • communicates directly with college primary IT/Network contact and provides access to online documentation.

  • oversees the implementation project, tracks tasks, and communicates status updates.

College / District IT/Network Admin

  • ensures ICE engineer has required system configuration information to support project (see details below).

  • works directly with ICE engineer to implement site-to-site VPN, connection to the database, and account login.

  • provides support to local Researchers working remotely to connect to VPN.

ES Implementation Configuration Engineer (ICE)

  • facilitates configuration of custom VPN and tunnel based on college system information.

  • creates service account and provides secure login credentials to authorized user(s).

  • supports IT contact with connection and initial query testing.

  • provides implementation status updates and upon completion, hands off “live” account to ES Support as needed.

ES Support Services

  • facilitates issue resolution for post-implementation (live) support needs from college primary contacts.

Colleges and districts using their own analytics/BI applications are responsible for their own support in the use of these tools.

 


First Step: An authorized individual sends a request to your ES College Relationship Manager (CRM) to implement a direct access connection (ODBC/JDBC) to the CCC Data Warehouse.

Requesting Access

The first step in establishing a direct access connection is for an authorized individual at the college or district (typically the Vice Chancellor or Dean overseeing Institutional Research) to make a request directly to the CRM assigned to your college. Upon approval, your CRM will initiate the implementation process by passing the required information to the ES implementation team to get started on your custom VPN and account credentials.

Please identify and provide the following information to your CRM:

  • Primary Researcher Contact: Name and email address of the Researcher who will be responsible for the account login credentials.

  • Primary IT/Network Admin Contact: Name and email address of the IT/Network Admin contact who will work with ES to establish the site-to-site VPN connection.

  • System Configuration Information: Firewall make, model, and version, and IP information.

Start with Your CRM: The initial request should be made to your College Relationship Manager. If you do not know who your CRM is, send an email to crms@ccctechcenter.org.

Next Step: ES ICE engineer works with your primary IT contact(s) to facilitate the configuration of the VPN and establish the account credentials.


Establishing the VPN Configuration

A secure direct access (ODBC/JDBC) connection between your college or district (MIS scope) and the CCC Data Warehouse Redshift cluster is enabled using a custom site-to-site VPN tunnel and a service account with authorized user credentials. The data accessible by the college or district is based on the scope of access authorized to the organization.

Setting up the VPN and the service account can proceed in parallel; however, the VPN configuration should be started first, as it often requires more time to implement (approvals, IT resources).

Reminder: Ensure your ES ICE engineer has all necessary system configuration information, including your firewall make, model, and version, and your IP information.

Next Step: ES ICE engineer establishes the service account and provides the account login credentials to primary IT contact.


Establishing the Account Credentials

In order to access the (Redshift) databases, a service account will be established based on the scope of your authorized access (MIS scope). Your ES ICE engineer will provide the account login credentials (username and password), as well as the database endpoint (DNS) and schema roles (also provided in this document), to your primary IT contact through secure transmission.

Note that the login credentials are for a service account: although they are issued to an individual, they are intended to be used to connect applications (such as Tableau) to the CCC Data Warehouse. Protecting these credentials is the responsibility of the college.

 

Scope of Access

Authorized access to the CCC Data Warehouse for colleges and districts using direct access connection (ODBC/JDBC) is permitted through secure login credentials to your local network VPN.

Configuration of the authorized user’s account credentials is based on the scope of their access, where the scope is defined by their 3-digit MIS code. In this document, that scope is written as <misScope>. For example, authorized district users will have a scope of access that is represented by their district MIS code (example: 210). College users will have a scope represented by their college MIS code (example: 211).

The <misScope> will appear in the user’s credentials (username) and also in the database schema formats used in SQL queries during the Connecting Using SQL Workbench validations.

 

 

Next Step: College or district establishes their connection to the Data Warehouse (Redshift) databases using SQL Workbench.


Connecting to the Data Warehouse (Redshift Cluster)

This section is designed for authorized colleges looking to gain access to the CCC Data Warehouse Redshift cluster databases. AWS Redshift is a secure, cloud-based data warehouse service used for collecting and storing large-scale data sets; it enables users to analyze data using BI tools.

Connection Requirements

In order to complete the connection process (SQL Workbench install) and access the CCC Data Warehouse (Redshift cluster), you will need to meet the requirements below.

  1. Complete VPN Configuration & Log In from Internal Network: Log in and connection to the Redshift cluster must originate from your local network.

  2. Account Login Credentials: Obtain your account credentials (username and password) and the database endpoint (DNS), provided by your ES implementation team.

  3. Database Connection Strings & Schemas: Ensure the database names, connection strings, and schema roles are correct to access specific databases (see Table B: Application Database Names below.)

Remote Connection: If you are working remotely and the district has authorized it, you will need to establish a connection to the district’s internal network using a separate VPN client the district has provided for this purpose.

Next Step: Follow the instructions to install SQL Workbench and configure database access connection.


Notice: Below are instructions for implementing SQL Workbench to connect to the CCC Data Warehouse. SQL Workbench is supported by the CCC Technology Center, and is recommended to assist with initial data validation but is not a requirement.

Connect Using SQL Workbench

SQL Workbench is a database GUI used for accessing many different databases. The instructions below are similar to those found on the AWS website for SQL Workbench, but more in depth regarding the actual installation steps.


Process Overview:

  1. Prerequisite: Install Java (11 or higher recommended)

  2. Download & Install SQL Workbench

  3. Download Redshift Drivers & Test Connection

  4. Configure Database Connection String

  5. Table B: Application Database Names & Schema Formats

Reference: Visit the AWS website for Connecting to Your Cluster Using SQL Workbench

 

Prerequisite: Install Java

SQL Workbench requires a Java 8 (or higher) runtime environment. You can either use a JRE ("Runtime") or a JDK ("Development Kit").

Build 126 is the last version to support Java 8; later versions require Java 11 or higher.

See Section 4.1: Installing & Starting SQL Workbench of the SQL Workbench Manual for additional information.

Strongly Recommended: SQL Workbench/J requires Java 8 or later; however, Java 11 or later is highly recommended, especially with High-DPI screens.

 

Download & Install SQL Workbench

Download SQL Workbench/J here: https://www.sql-workbench.eu/downloads.html

Select the generic package for all systems with the optional libraries. We recommend saving this somewhere with an informative name that is easy to access on your local machine.

NOTE: The generic package contains the jar file, the manual (HTML and PDF), shell scripts for Linux/Unix based systems (including MacOS) to start the application as well as a Windows® launcher and sample XSLT scripts.

Screenshot: Download link for the Generic Package with Optional Libraries


Unzip the downloaded folder and select the sqlworkbench.jar (JAR) file inside to open SQL Workbench.

Screenshot: SQL Workbench JAR file being selected

Mac Users: Follow these instructions to generate an SQL Workbench executable file: https://www.sql-workbench.eu/macos-binary.html

More details about installing and configuring the application can be found in the manual.

You will need to configure the necessary JDBC driver(s) for your database before you can connect to a database. Please refer to the chapter JDBC Drivers for details on how to make the JDBC driver available to SQL Workbench/J.

 

The application is now installed and the Select Connection Profile screen appears.

 

Next Step: At this point, you need to download Redshift drivers and test the connection.



Download Redshift Drivers and Test Connection

You will need to download the Amazon Redshift JDBC driver version 2.0 to enable SQL Workbench to communicate with Redshift. The file you need to download is below.

Zip File: JDBC 4.2–compatible driver (without the AWS SDK) and driver dependent libraries for AWS SDK files version 2.0.

 

Once you have downloaded and unzipped the Redshift drivers to a directory of your choice on your machine, go back to your SQL Workbench install and select `Amazon Redshift` from the Driver dropdown.

You should see a prompt asking you to configure the driver. Select ‘Yes’.

 

The ‘Manage Drivers’ menu will come up. Click the Folder icon and navigate to the directory that contains the driver package that you recently unzipped. 

 

Select the Redshift driver, which will have a name like ‘Redshift*.jar’, then click ‘Open’. See the example in the screenshot below.

Once your driver file is selected, click ‘OK’ on the Manage drivers screen.

 

 

Next Step: Configure the Connection Profile screen with account credentials (username and password) and connection string.

Configure the Database Connection String

With the driver selected, a sample endpoint will autofill in the URL field. This endpoint will need to be modified, and several other fields will need to be filled out.

Example of new Select Connection Profile screen.

The important fields to configure are:

  • Name: The name for this connector that you will reference later when you want to access this database again. This is for your use only, so make it descriptive enough that you clearly understand which DB you are accessing. For example, if you are going to query the COCI DB, something like `DWH-COCI` might be appropriate. Signifying the environment in this name is highly recommended.

  • URL: Where to reach the database. Edit the autofilled string here with your custom endpoint (example shown below).

Environment | Endpoint
Production | jdbc:redshift://dwh-prod.ccctechcenter.org:5439/{{database}}

Example: jdbc:redshift://dwh-prod.ccctechcenter.org:5439/canvas

Replace {{database}} with the name of the database you are connecting to, using the specific Database Name shown in Table B below.

  • Username: the PostgreSQL username you are using to log in.

  • Password: the password for the username above. 
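As an illustration of the URL field, the short Python sketch below assembles the production endpoint from a database name. The host, port, and database names are copied from this guide; the function name build_jdbc_url and the KNOWN_DATABASES set are hypothetical helpers introduced only for this example, not part of any CCCTC tooling.

```python
# Database names as listed in Table B of this guide.
KNOWN_DATABASES = {
    "application", "bogfw_application", "intl_application",
    "mmps", "coci", "cid", "canvas",
}

# Production endpoint host and port as shown in this guide.
HOST = "dwh-prod.ccctechcenter.org"
PORT = 5439


def build_jdbc_url(database: str) -> str:
    """Return the JDBC URL for one of the databases named in Table B."""
    if database not in KNOWN_DATABASES:
        raise ValueError(f"unknown database name: {database!r}")
    return f"jdbc:redshift://{HOST}:{PORT}/{database}"
```

Calling build_jdbc_url("canvas") produces the same string as the example endpoint above. Rejecting names that are not in Table B is a design choice of this sketch; it catches typos before they reach the connection profile.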

 

Place a checkmark in the Autocommit box (as shown in the example below.)

Click `Test` in the lower right to verify you have connected successfully. Then click OK.

Example of Configuring Your Driver Connection

 

Table B: Application Database Names

The value of <misScope> can be derived from the prefix of your Data Warehouse Direct username, which has the format <misScope>_<firstInitial><lastName>. For example, in 000_jdoe, 000 is the MIS scope value, and the resulting schema for accessing tables within the Application database is dw_apply_read_000.

Application (Data Source) | Database Name | Query Schema Format
CCCApply Application | application | dw_apply_read_<misScope>
CC College Promise Grant | bogfw_application | dw_apply_read_<misScope>
CCC International Application | intl_application | dw_apply_read_<misScope>
Multiple Measures Placement Service (MMPS) | mmps | dw_apply_read_<misScope>
Chancellor's Office Curriculum Inventory (COCI) | coci | dw_coci_read_<misScope>
Course Identification Numbering System (C-ID) | cid | dw_cid_read_<misScope>
Canvas | canvas | canvas_rs_<misScope>

Note: Data dictionaries for the above data sources are available on the CCC Data Warehouse documentation site.
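The relationship between a username, its <misScope>, and the query schema can be sketched in Python as follows. The schema formats are copied from Table B; the function names and the assumption that the username prefix is always exactly three digits followed by an underscore are illustrative only.

```python
import re

# Query schema formats from Table B, keyed by database name.
SCHEMA_FORMATS = {
    "application": "dw_apply_read_{mis}",
    "bogfw_application": "dw_apply_read_{mis}",
    "intl_application": "dw_apply_read_{mis}",
    "mmps": "dw_apply_read_{mis}",
    "coci": "dw_coci_read_{mis}",
    "cid": "dw_cid_read_{mis}",
    "canvas": "canvas_rs_{mis}",
}


def mis_scope_from_username(username: str) -> str:
    """Extract the 3-digit MIS scope prefix, e.g. '000' from '000_jdoe'."""
    match = re.fullmatch(r"([0-9]{3})_.+", username)
    if match is None:
        raise ValueError(f"unexpected username format: {username!r}")
    return match.group(1)


def schema_for(database: str, username: str) -> str:
    """Return the query schema for a Table B database and a username."""
    if database not in SCHEMA_FORMATS:
        raise ValueError(f"unknown database name: {database!r}")
    return SCHEMA_FORMATS[database].format(mis=mis_scope_from_username(username))
```

For the example user 000_jdoe, schema_for("application", "000_jdoe") yields dw_apply_read_000, matching the example given above Table B.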

Troubleshooting: If the test fails, the two most common fixes are:

  1. Verify you have entered the URL, your username, and password correctly.

  2. Make sure you are connected to your college’s VPN.
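To separate a VPN or network problem from a credentials problem, one option is to check whether the endpoint's port accepts a TCP connection from your machine at all. The Python sketch below uses only the standard library; the function name can_reach is hypothetical, and the host and port you pass in are assumed to be the ones from your connection string (this guide shows dwh-prod.ccctechcenter.org and 5439).

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

A result of False from can_reach("dwh-prod.ccctechcenter.org", 5439) suggests checking the VPN first. A result of True only shows that the port is reachable; it does not validate the username, password, or database name.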

 

Next Step: Validate your connection and data access.


Data Access Validation & Testing

Once connected to the CCC Data Warehouse with their own tool, colleges are encouraged to conduct a series of data access tests and activities, which may include:

  • Test Query Data Sources

Through local resources, the participant is able to connect to, and run at least one query against each of the available datasets (CCCApply application, CCCApply international application, CCCApply Promise Grant, and Multiple Measures Placement Service).

  • Connect Data to Local Data Warehouse

If a local data warehouse is available, configure a CCC Data Warehouse table as an “external table” in the local database source. This supports data within the CCC Data Warehouse to be connected to your local data warehouse without the need to copy data to the local machine or district server.
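For the Test Query Data Sources activity, one possible approach is to prepare a table-listing query for each application data set before opening the connections. The Python sketch below builds those queries; the database names and the dw_apply_read_<misScope> format come from Table B, while the function names are hypothetical and the use of information_schema.tables is an assumption about what your account is permitted to read on the Redshift cluster.

```python
import re
from typing import List, Tuple

# The four application data sets named in this section, by database name (Table B).
APPLICATION_DATABASES = ["application", "intl_application", "bogfw_application", "mmps"]


def list_tables_query(schema: str) -> str:
    """Build a query listing the tables visible in one schema."""
    return (
        "select table_name from information_schema.tables "
        f"where table_schema = '{schema}' order by table_name;"
    )


def validation_plan(mis_scope: str) -> List[Tuple[str, str]]:
    """Return (database, query) pairs, one per application data set."""
    if not re.fullmatch(r"[0-9]{3}", mis_scope):
        raise ValueError(f"MIS scope must be three digits: {mis_scope!r}")
    schema = f"dw_apply_read_{mis_scope}"
    return [(db, list_tables_query(schema)) for db in APPLICATION_DATABASES]
```

Each pair names the database to connect to and the query to run in that connection. Because every database has its own connection string, the queries are run one connection at a time.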

CCCTC Enabling Services Support is available to assist with issues connecting to the CCC Data Warehouse.  

 

ES works with college to validate initial data queries.

  • Using provided user documentation (this user guide), college configures BI tool to connect to the Data Warehouse.

  • College will confirm local connection and successful login, as well as perform optional data validation activities.

  • Online documentation is available to support data access validation activities for Researchers.

 

Next Step: Work with your ES ICE engineer to confirm access validation and query tests. Discuss any unfinished implementation steps before hand-off to ES Support.


Canvas DW Direct Connect Service

Colleges may request to access their Canvas data through the CCC Data Warehouse via direct access connection (ODBC/JDBC). The ability to access Canvas data will require the college Canvas Administrator to generate and pass their Canvas API credentials to the CCCTC Enabling Services Implementation Engineer as part of the configuration process. Once received and implemented, the data pipeline will be configured to pass your Canvas data to the CCC Data Warehouse. Following that initial pass, the data will be updated nightly. 

To get started, please send an email requesting Canvas DW Direct Connect access for your college to your College Relationship Manager (CRM) at CCCTC Enabling Services.

Required: Colleges must be “live” with the basic DW Direct Connect service in order to implement Canvas DW Direct Connect access. If your college has not yet configured site-to-site VPN access with the CCC Data Warehouse, please contact Enabling Services to get started. 

Learn more about the Canvas DW Direct Connect service, including the college preparatory requirements before and during the implementation process: Obtaining Your Canvas Data API Access Credentials (for the CCC Data Warehouse).

 


Making Queries

After the database connection is validated, you will be given the following prompt, which allows you to enter SQL queries.

A tab labeled Database Explorer displays more information about the database that your user has access to. Table names can be found here, as well as some basic additional information about the DB.

Database Explorer Tab Provides More DB Information

 

Finally, you can run SQL against objects your DB user has access to.

SQL Allowed by Your User Can be Run in the Statement Tab

 

Example Query to run on Data Warehouse Data

To get a row count of a table, you can run the following query:

select count(*) from dw_apply_read_523.contact;

You will need to replace dw_apply_read_523 with the appropriate schema your user has access to. You can also use the Database Explorer in SQL Workbench to navigate the data.
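If you generate this query from a script rather than typing it, a small helper can substitute the schema and table names. The Python sketch below is a minimal example; the function name row_count_query is hypothetical, and the restriction to simple identifiers (letters, digits, underscores) is an assumption made here so that only plain names are interpolated into the SQL text.

```python
import re

# Accept only plain identifiers so nothing else is interpolated into the SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def row_count_query(schema: str, table: str) -> str:
    """Build a row-count query for schema.table."""
    for name in (schema, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a simple SQL identifier: {name!r}")
    return f"select count(*) from {schema}.{table};"
```

For example, row_count_query("dw_apply_read_523", "contact") returns the example query shown above.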

 


CCC Data Warehouse Direct Access FAQs

Q: Can I have more than one VPN connection for my district?

A: Not at this time. If this is needed, please discuss with your CRM.

 

Q: My IT department does not want to create a site-to-site VPN, can I just have an individual VPN connection? 

A: No, the CCC Tech Center is not staffed to handle individual user VPN requests.

 

Q: Can I have more than one individual at the college (or district office) connect to the VPN at a time?

A: While the login credentials are intended to serve as a Service Account for the connection of local applications to the CCC Data Warehouse, individual accounts may be requested for authorized users where this is needed.

More: See the CCC Data Warehouse Frequently Asked Questions for more information.