2013-24: Develop Web Service to Link Statewide Student Identifier (SSID) System with CCCApply
Problem Statement or Business Need
Intersegmental need to build a module into CCCApply that interfaces with the California Department of Education's SSID database to pull SSID data from the K-12 world and populate it into our world (SB1298 says we need to do this). SSID data would be stored in the Apply database, and various data from the application would be passed back to CDE.
Proposed Solution
Scope of Data Sharing
The data sharing exchange contemplated in this Scope of Data Sharing shall be referred to as the “SSID Data Lookup.” The CCCCO will use its online system “CCCApply” to connect to the CDE’s master look-up service in the CDE’s statewide data management system, the California Longitudinal Pupil Achievement Data System (CALPADS). When students input certain personally identifiable data elements (detailed below) into CCCApply, CALPADS will then attempt to match the information with the K-12 data and, if a match is found, the CDE will share back with the CCCCO the student’s K-12 Statewide Student Identification Number (SSID). Once the student completes enrollment with CCCApply, the CCCCO will share back with the CDE verification of the student’s completed enrollment.
This exchange of data will allow the CDE and the CCCCO to link information on students who are or were enrolled in the CDE’s CALPADS system with the information on students enrolled in the California Community College system. While the actual data elements to be shared for now are only those data elements set forth in Section III, it is contemplated that the parties will amend this Scope of Data Sharing in the future to allow the CDE and the CCCCO to share additional data elements between the agencies.
Justification for Data Sharing
The exchange of the data is the first step in being able to conduct cross-sector data sharing between the CDE and the CCCCO. This will allow both agencies to be able to audit or evaluate particular federal and/or state-supported education programs and/or to enforce or comply with federal legal requirements related to those programs.
Specifically, the CDE requires the linkage between the two data systems in order to prepare and conduct the following work:
- to meet legislative reporting expectations governing career pathway programs, such as the 2006 Federal Perkins Act requirement to report data on the placement of CTE Concentrators in postsecondary education or training, and the California Career Pathways Trust (Education Code sections 53010 through 53016); and
- to evaluate and improve various career and college preparatory programs by monitoring the progress of high school students as they graduate from high school and matriculate to a community college.
Similarly, the CCCCO requires the linkage between the two data systems in order to prepare and conduct the following work: (set forth the specific reporting requirements that it needs to meet pursuant to federal legal requirements, or the audits or evaluations that it needs to conduct, citing specific statutory requirements where applicable[JI1])
Requirements Summary
# | Title | Importance | Notes |
---|---|---|---|
1 | Interface with State API | | |
2 | New SSID field will be stored in Apply Submitted apps database | | |
Data Elements to be Shared
For now, the CCCCO will share the following information with the CDE, as provided by students applying through the CCCCO’s common electronic application, CCCApply:
- First Name <firstname>
- Last Name <lastname>
- Date of Birth <birthdate>
- Full High School CDS Code of Last HS Attendance. CDE lists this field as <HighSchoolCDScode>. (We will create the field with a name in line with our fields: <hs_cds_full>.)
- CCCID <CCCID>
- Collected SSID (optional) <SSID>
The CDE will then share back with the CCCCO the following matched information, if available:
- The Student’s SSID
- CCCID
The CCCCO will then share the following information with the CDE.
- Verification of Student Enrollment[JI1]
[JI1]To CCCCO: please provide a definition of your version of student enrollment data. Our joint technical solution will be forthcoming soon.
Change Specifications
- Technical Specifications Of Data Sharing
A CDE Representational State Transfer (REST) application program interface (API) will be made available to the CCCCO’s CCCApply system to allow the sharing of the CCCCO and CDE data identified in Section III. The following is a short description of the API.
- The RESTful API will use the OAuth 2.0 standard to control access to the API.
- Data in transit will be encrypted through an HTTPS protocol.
- The RESTful API will be limited to a maximum of 5,000 connections per hour.
- The use of the RESTful API will be limited to the exchange of data defined in this agreement.
- Input parameters are defined in Table 1.
- Output parameters are defined in Table 2.
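The 5,000-connections-per-hour cap suggests that CCCApply may want to throttle its own calls rather than rely on CDE to reject them. A minimal sketch of a client-side sliding-window limiter follows. The class name `HourlyRateLimiter` and its methods are hypothetical, and how CDE actually counts or enforces connections is not specified here.

```python
import time
from collections import deque


class HourlyRateLimiter:
    """Client-side sliding-window limiter (hypothetical helper, not part of the CDE API)."""

    def __init__(self, max_calls=5000, window_seconds=3600, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._clock = clock      # injectable so the limiter can be tested
        self._calls = deque()    # timestamps of calls inside the current window

    def try_acquire(self):
        """Return True and record the call if it fits in the window, else False."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True
```

A caller would check `try_acquire()` before each lookup and queue or defer the request when it returns `False`.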
Table 1. API Input Parameters
These are the fields that CCCApply will pass to the "SSID" API.
Data Field | Description | Data Type | Length | Format | Required |
SSID | SSID | String | 10 | | |
CCCID | CCC Identification Number | String | 7 | | X |
First Name | Student’s First Name | String | 30 | | X |
Last Name | Student’s Last Name | String | 50 | | X |
Birth Date | Student’s Birthdate | String | 10 | YYYY-MM-DD | X |
CDS Code | High School CDS Code of Attendance | String | 14 | | X |
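The constraints in Table 1 can be checked before a request is sent. The sketch below is one hedged way to do that. The function name `validate_lookup_input` and the dictionary keys are assumptions for illustration, and the Length column is treated as a maximum, which the table does not state explicitly.

```python
from datetime import datetime

# (max length, required) per Table 1; treating Length as a maximum is an assumption.
FIELD_RULES = {
    "SSID": (10, False),
    "CCCID": (7, True),
    "FirstName": (30, True),
    "LastName": (50, True),
    "BirthDate": (10, True),
    "CDSCode": (14, True),
}


def validate_lookup_input(fields):
    """Return a list of problems; an empty list means the input passes these checks."""
    problems = []
    for name, (max_len, required) in FIELD_RULES.items():
        value = fields.get(name)
        if value is None or value == "":
            if required:
                problems.append(f"{name} is required")
            continue
        if not isinstance(value, str):
            problems.append(f"{name} must be a string")
            continue
        if len(value) > max_len:
            problems.append(f"{name} exceeds {max_len} characters")

    birth = fields.get("BirthDate")
    if isinstance(birth, str) and birth:
        try:
            parsed = datetime.strptime(birth, "%Y-%m-%d")
            # Round-trip so that unpadded dates such as 1998-4-9 are rejected.
            well_formed = parsed.strftime("%Y-%m-%d") == birth
        except ValueError:
            well_formed = False
        if not well_formed:
            problems.append("BirthDate must be formatted YYYY-MM-DD")
    return problems
```

Returning a list of problems rather than raising on the first one lets the caller log every issue with an application in a single pass.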
Requirement 1: Develop two new downloadable data fields to store in the Apply submitted applications database:
<SSID> - Statewide Student Identifier
<hs_cds_full> - Full HS CDS Code
Requirement 2: Develop data specifications for new data fields. Add to Standard Application Data Dictionary.
Requirement 3: Create JIRAs for adding new fields to Administrator & Report Center.
Requirement 4: Add fields to Standard Download Client
Table 2. API Output Parameters
These are the fields that will be passed back from the "SSID" API and stored in Apply database.
Data Field | Description | Data Type | Length | Format |
SSID | SSID | String | 10 | |
CCCID | CCC Identification Number | String | 7 | |
Requirement 5: Ensure <SSID> can be stored in Submitted Applications database.
Business Rules for "SSID Data Lookup"
Linking SSID and CCCApply Through Web Look-up Service
Business Rules for Technical Solution
1. Student submits an application for admission.
a) Application is implied consent to permit CDE to share student data with CCCCO in order to evaluate the efficacy of educational programs.
2. CCCApply calls a CDE web service (RESTful web service), passing FirstName, LastName, BirthDate, HighSchoolCDScode, CCCID, and SSID (optional). NOTE: The process uses an exact match on all required fields.
Input Definitions
Data Field | Description | Data Type | Length | Format |
SSID | Statewide Student Identification Number | String | 10 | |
CCCID | California Community College Identification Number | String | 7 | |
FirstName | Student’s First Name | String | 30 | |
LastName | Student’s Last Name | String | 50 | |
BirthDate | Student’s Birthdate | String (see note) | 10 | yyyy-mm-dd |
CDSCode | Student’s County-District-School Code | String | 14 | |
Note: BirthDate is a SQL Server Date datatype, but for the purposes of an HTTP input it is defined as a string.
3. CDE uses the data to match the student.
a) CDE stores the CCCID with their student record (if match is found).
b) If a match is found, CDE returns the SSID and CCCID to CCCApply via the web service. Otherwise, an HTTP 400 response is returned with a “No student found” message.
Output Definitions
Data Field | Description | Data Type | Length | Format |
SSID | Statewide Student Identification Number | String | 10 | |
CCCID | California Community College Identifier | String | 7 | |
4. CCCApply stores the SSID along with the CCCApply Application data as a field that is downloadable by the colleges.
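The lookup in steps 2 and 3 can be sketched as follows. This is only an illustration under stated assumptions: the `transport` callable stands in for the real HTTPS call to the CDE service (whose URL, OAuth token handling, and response body shape are not given here), HTTP 200 is assumed to be the success status, and `lookup_ssid` and `fake_cde` are hypothetical names.

```python
def lookup_ssid(application, transport):
    """Send the match fields and return the SSID, or None when CDE finds no match.

    `transport(payload)` is a stand-in for the real HTTPS request; it is assumed
    to return a (status_code, body_dict) tuple.
    """
    payload = {
        "FirstName": application["FirstName"],
        "LastName": application["LastName"],
        "BirthDate": application["BirthDate"],  # yyyy-mm-dd string
        "CDSCode": application["CDSCode"],
        "CCCID": application["CCCID"],
    }
    if application.get("SSID"):  # SSID is optional on input
        payload["SSID"] = application["SSID"]

    status, body = transport(payload)
    if status == 200:  # assumed success status; not stated in the business rules
        # Guard against a response that belongs to a different student.
        if body.get("CCCID") != application["CCCID"]:
            raise ValueError("CCCID in response does not match the request")
        return body["SSID"]
    if status == 400:
        return None  # "No student found"
    raise RuntimeError(f"Unexpected status from lookup service: {status}")


def fake_cde(payload):
    """Toy exact-match service used only to exercise the sketch."""
    known = {("Ana", "Lopez", "1998-04-09", "01234567890123"): "1234567890"}
    key = (payload["FirstName"], payload["LastName"],
           payload["BirthDate"], payload["CDSCode"])
    if key in known:
        return 200, {"SSID": known[key], "CCCID": payload["CCCID"]}
    return 400, {"message": "No student found"}
```

With the toy service, a one-letter difference in the last name (for example "Lopes" instead of "Lopez") yields `None`, which mirrors the exact-match rule in step 2.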
FAQs
- QUESTION: With the attached business rules, what happens when the data sent by CCC Apply is not a perfect match?
ANSWER: In our initial discussions with Tim and his staff, we decided that we will do an exact match using the data provided. After we have gone through a cycle, we would take a look at the match rates and see if anything should be modified. FYI, I had one of my staff do a quick analysis of CALPADS data. We found that with the 12M+ student records that currently exist in CALPADS, 98.081% of the records have a unique combination of first name, last name and DOB. If you change the combination to first initial, last name and DOB, 93.789% of the records are unique. Of course, this does not take into account slight variations in spelling of names or misspellings, which leads us to the next question.
- QUESTION: Does CDE have an algorithm to look at non-perfect matches that are highly likely to be the same student, or will this always be a perfect match or bust lookup?
ANSWER: As of now, we will only be looking at a perfect match. We haven’t had any discussion with Tim and his staff on looking for partial matches or variations in names. With our first cut, we are not using any built-in or third-party phonetic algorithms (e.g. Soundex), but we can implement one in the solution. In our initial discussions, the plan was to go through an exact match process, check the accuracy, and make adjustments. Making modifications to the matching process will be very straightforward.
- QUESTION: If we use first initial, lastname, birthdate, and CDS code, would that get us a better match?
ANSWER: I don’t know if we would get a better match, especially if we get multiple matches. Below is the result for the first initial match. If we want to look at increasing the potential match rate, CALPADS has alias fields that can be used to match against. Even with exact matches, this should increase the percentage. We should probably have a discussion on the details of the match and what changes we want to introduce. If we use Soundex (SQL Server), then we will be able to introduce phonetic matching to the names, which should increase the match rate. Keep in mind that as we get a little “fuzzier” on our criteria, we will be increasing the odds of getting false positives.
First Initial + Last Name + Birthdate
Result Count | Name Count | % of total | Running % |
1 | 11080627 | 93.789% | 93.789% |
2 | 603303 | 5.106% | 98.896% |
3 | 92121 | 0.780% | 99.675% |
4 | 25139 | 0.213% | 99.888% |
5 | 8504 | 0.072% | 99.960% |
6 | 3072 | 0.026% | 99.986% |
7 | 1112 | 0.009% | 99.995% |
8 | 363 | 0.003% | 99.999% |
9 | 119 | 0.001% | 100.000% |
10 | 44 | 0.000% | 100.000% |
11 | 6 | 0.000% | 100.000% |
12 | 5 | 0.000% | 100.000% |
13 | 1 | 0.000% | 100.000% |
14 | 1 | 0.000% | 100.000% |
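To make the phonetic-matching trade-off raised in the answers above concrete, here is a minimal sketch of the classic American Soundex code in Python. It is an illustration only: SQL Server's `SOUNDEX` function may differ in edge cases, and the `soundex` function below is a local helper, not part of any agreed solution.

```python
# Consonant groups of the classic American Soundex code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code for a name ('' if it has no A-Z letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate consonants with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))    # same code: spelling variants match
print(soundex("Robert"), soundex("Rupert"))  # same code: different names also match
```

"Smith" and "Smyth" share a code, which is the intended benefit, but "Robert" and "Rupert" do as well, which is the kind of false positive the answer warns about as the criteria get fuzzier.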
Changes to Data Download File
The new fields being created for this work are:
<ssid>
<hs_cds_full>
<col[0]_cds_full>
The <ssid> is a RESTRICTED field and will NOT be added to the College Download Client; the other two CDS fields will be downloadable.
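One way to picture this rule is a filter applied when building the record that a college can download. The sketch below is hypothetical: `RESTRICTED_FIELDS` and `college_download_view` are illustrative names, not part of the Download Client.

```python
# Fields withheld from the College Download Client (per the rule above).
RESTRICTED_FIELDS = {"ssid"}


def college_download_view(record):
    """Return a copy of the record without restricted fields."""
    return {k: v for k, v in record.items() if k not in RESTRICTED_FIELDS}
```

Keeping the restricted names in one set means a later change to the field's status touches a single line.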
Development/Requirements Action Items
- Legal issues referenced in documentation (from Patty, based on conversation with Tim): have those been resolved?
- What is being passed to CCCApply from CDE?
- Where do we want to store this? Will this be downloadable to colleges?
- Do they have the API set up and ready to go?
- Do they have a sandbox environment that we can test in? How do we test this?
- The specs received from CDE discuss exact matching. Does their API require exact match data from us?
- Is there a technical contact we can work with on this?
Supporting Documentation
Notes from Emails and Conversations
2/25/15: Santa Rosa JC has inquired about the status of this via email. Per Tim, add to the next Steering Meeting (April 2015) agenda to revisit.
<< Per Tim's email response to SRJC: This has come up. The SSID has been assigned to students who attended HS since about 2006.
CSU requests this field as an option in CSU Mentor. The problem they experience is bad or missing data, because there is no verification mechanism and students don't know what the number is or where to find it.
CCCCO has been in negotiations with CDE for them to stand up a web service that we could use to match students against their CALPADS database and return the SSID to us, but so far they have not been able to get this to be a priority.
Since bad data may be worse than no data, the last time it came up to CCCApply Steering, it was tabled.>>