This document provides information regarding establishing direct access (ODBC/JDBC) connection to the CCC Data Warehouse for authorized CCC colleges and districts.
Contents
Overview
CCC Data Warehouse is developed by the CCC Technology Center in coordination with and at the direction of the CCC Chancellor's Office.
A part of the Data Services Program initiative from the California Community Colleges Chancellor's Office, CCC Data Warehouse provides the necessary infrastructure to the California Community College System to aggregate data across disparate systems to an enterprise data warehouse.
Secure access to the data is provided to institutional researchers, college and district administrators, and other decision makers across the California Community Colleges, district offices, and the Chancellor’s Office, where these critical data may be used to support instructional and institutional decision-making aligned with the Chancellor's Vision for Success.
CCC Data Warehouse Direct Access
For authorized Researchers at CCC institutions who want to connect their preferred analytic/business intelligence tools, the CCC (Redshift) Data Warehouse now supports direct connection as an alternate or additional option to the Data Warehouse Report Server. Once connected to the CCC Data Warehouse via direct access connection, your college will be able to:
Connect to the CCC Data Warehouse to accesses available data sources:
Connection to occur through a site-to-site VPN.
Access through service account login credentials provided.
Run SQL queries against available data sources.
Connect local analytics/BI application through ODBC or JDBC access.
Connect CCC Data to a local data warehouse by setting up these data as external tables within your local data warehouse.
Local applications may include an analytics or business intelligence application such as Tableau or Power BI.
Note that the CCC Data Warehouse is not to be used as a staging ground for data distribution to your local data warehouse; this usage will not be permitted as it places an undue load and cost to the CCC Data infrastructure
Available Data Sources
Direct connection supports access to the following data sources within the CCC Data Warehouse:
CCCApply: Application (Credit and Non-Credit App)
CCCApply LGBTQ (AB620)
CCCApply: International Application
CCCApply: College Promise
Multiple Measures Placement Service (MMPS)
Canvas Data (Opt-in required) - with 2.1 release
Chancellors Office Curriculum Inventory (COCI) - with 2.1 release
Course Identification Numbering System (C-ID) - with 2.1 release
NOVA (CCCCO Only) - with 2.1 release
Launchboard (CCCCO Only) - with 2.1 release
MIS (CCCCO Only) - with 2.1 release
Data dictionaries for the above data sources are available on the CCC Data Warehouse documentation site.
Learn more about the CCC Data Warehouse Report Server for authorized CCC institutional researchers.
Source Data Set | CCCCO Access | College Access | Requires Opt-In for inclusion | Introduced in Release Version | ||
---|---|---|---|---|---|---|
DW Direct | DW Report Server | DW Direct | DW Report Server | |||
CCCApply Standard Application |
|
|
|
| CCC Data 2.0 | |
CCCApply Noncredit Application |
|
|
|
| CCC Data 2.0 | |
CCCApply International Application |
|
|
|
| CCC Data 2.0 | |
California College Promise Grant Application |
|
|
|
| CCC Data 2.0 | |
LGBTQ Data (and Report in DW Report Server) |
|
|
|
| CCC Data 2.0 | |
Multiple Measures Placement Service (MMPS) |
|
|
|
| CCC Data 2.0 | |
COCI: Chancellor's Office Curriculum Inventory |
|
|
|
| CCC Data 2.1 | |
C-ID: Course ID Numbering System |
|
|
|
| CCC Data 2.1 | |
CANVAS (Opt-in) |
|
|
|
|
| CCC Data 2.1 |
MIS |
|
|
|
| CCC Data 2.1 | |
NOVA |
|
|
|
| CCC Data 2.1 | |
Launchboard |
|
|
|
| CCC Data 2.1 |
Implementation Process
Upon the request of an authorized individual at a college or district office (often the Dean overseeing Institutional Research), the CCC Technology Center (CCCTC) Enabling Services team will facilitate the process of working with the college IT group to establish and confirm a site-to-site VPN connection between the CCC Data Warehouse and the college (or district office).
Requesting Direct Access
The request is to be made of the colleges College Relationship Manager, if you do not know who your CRM is, send an email to crms@ccctechcenter.org.
Process Overview
The authorized college or district Researcher will contact their College Relationship Manager (CRM) to request direct access to the CCC Data Warehouse, and
College will identify a primary Research contact who will be responsible for the log in credentials
College will identify a primary IT/Network Admin contact who will work with Enabling Services to establish the site-to-site connection
Enabling Services will facilitate an implementation project
Enabling Services will work with the identified IT/Network contact at the college to configure a site-to-site VPN and configure the custom VPN tunnel and provide detailed instructions based on college’s system configuration
College will provide required system configuration information, including firewall make, model and version, and IP information
Primary contacts will receive account credentials and VPN configuration information via secure process.
Enabling Services will provide login credentials to authorized user in secure manner
Enabling Services will schedule an Implementation Checklist meeting via Zoom for college IT and CCCTC to confirm successful setup and connection
College will confirm successful VPN setup and connection tasks
Primary contacts will participate in an implementation checklist meeting with ES and CCCTC (InterVision)
College will confirm local connection and successful login, as well as perform optional data validation activities
CCCTC will provide online documentation to support data access validation activities for Researchers
Colleges and districts using their own analytics/BI applications are responsible for their own support in the use of these tools.
Account & VPN Configuration Requirements
To facilitate setup of a CCC Data Warehouse Direct Access account and VPN tunnel, the following information must be provided to your Enabling Services engineer during the implementation process:
Identify a local IT/Network Admin resource to serve as the primary contact
Work with the CCCTC to configure a site-to-site VPN connection
Support your local Researchers working remotely to connect to the VPN
Data Warehouse Account Credentials
CCCTC Enabling Services will configure the login credentials to the CCC Data Warehouse, which will be provided to the primary IT contact at your college through a secure transmission.
Note that the login credentials are a Service Account, while they are issued to an individual they are intended to be used to connect applications (such as Tableau) to the CCC Data Warehouse. Protecting these credentials is the responsibility of the college.
Data Access Validation & Testing
Once connected to the CCC Data Warehouse with your own tool, colleges are encouraged to conduct a series of data access tests and activities, which may include:
Query Data Sources
Through local resources, the participant is able to connect to, and run at least one query against each of the available datasets (CCCApply application, CCCApply international application, CCCApply Promise Grant, and Multiple Measures Placement Service).
Connect Data to Local Data Warehouse
If a local data warehouse is available, configure a CCC Data Warehouse table as an “external table” in the local database source. This supports data within the CCC Data Warehouse to be connected to your local data warehouse without the need to copy data to the local machine or district server.
CCCTC Enabling Services Support is available to assist with issues connecting to the CCC Data Warehouse.
Canvas DW Direct Connect Service
Beginning spring 2021, colleges may request to access their Canvas data through the CCC Data Warehouse via direct access connection (ODBC/JDBC). The ability to access Canvas data will require the college Canvas Administrator to generate the Canvas API credentials and securely pass the values to the CCCTC Enabling Services Implementation Engineer as part of the configuration process. Once received and implemented, the data pipeline will be configured to pass your Canvas data to the CCC Data Warehouse. Following that initial pass, the data will be updated nightly.
To get started, please send an email requesting Canvas DW Direct Connect access for your college to your College Relationship Manager (CRM) at the CCCTC Enabling Services.
Required: Colleges must be “live” with the basic DW Direct Connect service in order to implement Canvas DW Direct Connect access. If your college has not yet configured site-to-site VPN access with the CCC Data Warehouse, please contact Enabling Services to help you get started.
Learn more about the Canvas DW Direct Connect service, including the college preparatory requirements before and during the implementation process: How to Obtain & Submit Your Canvas Data API Access Credentials for the CCC Data Warehouse.
Connecting to Redshift Cluster with SQL Workbench
Overview
This document is designed for those looking to gain access to the CCCTC Datawarehouse Redshift cluster databases.
SQL Workbench is a database GUI for logging into many different DBs. This guide will cover logging into the Redshift Database cluster using SQL Workbench.
Prerequisites
You will need a username and password to login to the database itself, as well as the DB Endpoint (DNS) which will be provided in this document. For you to be able to connect to the RedShift DB, you first need to be logged on into the Colleges’ VPN.
Installing SQL Workbench
We will be following similar instructions to the ones on AWS's website to install SQL Workbench, so feel free to refer to those.
https://docs.aws.amazon.com/redshift/latest/mgmt/connecting-using-workbench.html
The following is a bit more in depth on the actual installation steps.
Install Java
Java install documentation can easily be found for all major platforms simply by looking online. Java 8 is compatible with the currently build of SQL Workbench (Build 127), but future builds will require Java 11, so it might be a good idea to install the latter.
Installers for the latest Java can be found at the link below:
https://www.sql-workbench.eu/downloads.html
Download the installer for your platform and run it to initiate the setup/installation wizard.
Download SQL Workbench
You can find the download page for SQL Workbench below:
https://www.sql-workbench.eu/downloads.html
Select the generic package for all systems with the optional libraries. It is probably a good idea to put this somewhere with an informative name that is easy to access on your machine.
Figure 1 Download with Optional Libraries
Unzip the download folder and open the `.jar` file that is inside to open SQL Workbench. If you are on Mac, follow the instructions below to generate an SQL Workbench executable:
https://www.sql-workbench.eu/macos-binary.html
Figure 2 - Select the Jar
Figure 3 - First Time Logging In
Figure 4 - Make sure Autocommit is checked.
Download Redshift Drivers and Test Connection
You will need to download Redshift DB Drivers to enable SQL Workbench to communicate with Redshift. You can find the drivers for this here:
https://docs.aws.amazon.com/redshift/latest/mgmt/configure-jdbc-connection.html#download-jdbc-driver
If you did not install Java 1.11.118 or later above (Java 11), then you will need to download the drivers that included the dependent libraries for AWS SDK. It doesn’t not hurt to have these, so it might be a good idea to just download that version if you are unsure.
Once you have downloaded and unzipped the Redshift drivers on your machine, go back to your SQL Workbench install and select `Amazon Redshift` from the DB Drivers dropdown. You should see a prompt asking you to configure the driver. Select ‘Yes’.
The ‘Manage Drivers’ menu will come up. Click the folder icon and navigate to the drivers you installed earlier.
Select the Redshift driver, which will be named like ‘Redshift*.jar’.
With the drive selected, a sample URL will autofill and several connection fields will need to be filled out. The important fields are:
Name: The name for this connector that you will reference later when you want to access this database again. This is for your use only, so make it descriptive enough to where you understand clearly which DB you are accessing. For example, if you are going to query the COCI DB in Pilot, something like `DWH-COCI-Pilot` might be appropriate. Signifying the environment in this name is highly recommended.
URL: Where to reach the database. Edit the auto filled string here to match this:
Environment | Endpoint |
Pilot | jdbc:redshift://dwh-pilot.ccctechcenter.org:5439/DATABASE |
Production | jdbc:redshift://dwh-prod.ccctechcenter.org:5439/DATABASE |
* Where DATABASE is the name of the database to which you are trying to connect.
Username: the PostgreSQL username you are using to login.
Password: the password for the username above.
Click `Test` in the lower right to verify you have connected successfully. Then click OK.
Application (Data Source) | DB Name |
---|---|
CCCApply Application | application |
CC College Promise Grant | bogfw_application |
CCC International Application | intl_application |
Multiple Measures | mmps |
Troubleshooting: If the test fails, the two most common fixes are,
1.Verify you have entered the URL, your username, and password correctly.
2. Make sure you are connected to the Colleges’ VPN.
Making Queries
After the database connection is validated, you will be given the following prompt that will allow you to entry SQL Queries.
A tab labeled “Database Explorer” will allow you view more information about the database that your user has access to. Table names can be found here, as well as some basic additional information about the DB.
Figure 4 - Database Explorer Tab Provides More DB Information
Finally, you can run SQL against objects your DB user has access to.
Figure 5 - SQL Allowed by Your User Can be Run in the Statement Tab
Example Query to run on Data Warehouse Data
To get a row count of a table you can run the following query
select count(*) from dw_apply_read_523.contact;
You will need to replace the dw_apply_read_523
with the appropriate schema your user has access do. You can also use the database explorer in SQL workbench to navigate the data.
To Run Queries on Data Warehouse Data
To view the application database tables, run the following query once logged in to the (Redshift) Data Warehouse:
select schemaname,tablename from SVV_EXTERNAL_TABLES order by tablename;
All of the read-only table names for your selected application will display, for example:
CCC Data Warehouse Direct Access FAQs
Q: Can I have more than one VPN connection for my district?
A. Not at this time. If this is needed, please discuss with your CRM.
Q: My IT department does not want to create a site-to-site VPN, can I just have an individual VPN connection?
A. No, the CCC Tech Center is not staffed to handle individual user VPN requests.
Q: Can I have more than one individual at the college (or district office) connect to the VPN at a time?
A: The login credentials are provided as a Service Account and intended to be used by authorized users to connect local resources to the CCC Data Warehouse.
More: See the CCC Data Warehouse Frequently Asked Questions for more information and frequently asked questions.