Development: Spam Filter Web Service

Of the 2.1 million applications submitted through CCCApply each year, the vast majority are valid: they are submitted by legitimate applicants who want to attend a California community college. These applications contain personally identifiable data and other critical information that needs to reach the college as quickly and safely as possible. However, for the remaining applications, those submitted through CCCApply with the intent to commit fraud, we've developed a spam filter web service and user interface that analyze, flag, suspend, and ultimately block the fraud attempt.

Development of the spam filter web service and user interface began in early 2017 to assist colleges in making accurate and informed decisions about whether an application is fraudulent. The tool consists of three main components: the post-submission web service, the machine learning model and prediction service, and the user interface used to review and confirm identified fraud.

This page talks about the development project, what it includes, and how it operates.

Post-Submission Web Service Process

At the end of the CCCApply application process, after the applicant has entered all the application data and confirmed, under penalty of perjury, that the data being submitted is valid and correct, the applicant clicks the "Submit" button to push the application data to the college being applied to. Everything that happens after that point is considered the post-submission process. This is when the application is routed to the college via the Download Client or through the College Adapter (Project Glue) for real-time integration with the college student information system.

With the development of the Spam Filter Web Service, every application is now intercepted after submission and routed to the spam filter machine learning model and prediction service to see whether the data meets the criteria for spam or fraud.

The applications that are legitimate and do not meet the criteria for spam are quickly passed through to the college.

For applications that may be fraudulent, however, the model extracts the data and looks for "identifiers", which are then fed into the machine learning algorithm for full analysis. The prediction service then calculates the probability that the application is bad; in other words, it "suggests a level of confidence" between 1 and 100. The closer the number is to 100, the more likely the application is fraudulent. This is called the Confidence Threshold.
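
As a rough illustration of how a confidence value between 1 and 100 might be turned into a flagging decision, the sketch below compares the value to a cutoff. The function name and the cutoff of 90 are assumptions made for this example; this page does not state the value actually used by the prediction service.

```python
def is_suspected_fraud(confidence: float, cutoff: float = 90.0) -> bool:
    """Return True when the confidence value meets or exceeds the cutoff.

    `cutoff` is a hypothetical value chosen for illustration only.
    """
    if not 1 <= confidence <= 100:
        raise ValueError("confidence must be between 1 and 100")
    return confidence >= cutoff
```

For example, is_suspected_fraud(97.5) would flag the application under this assumed cutoff, while is_suspected_fraud(12.0) would let it pass through to the college.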

At the heart of the web service is a continuously trained machine learning model that does NOT make any decisions. It only predicts whether an application matches the "identifiers" that the model has collected from thousands of applications already confirmed as fraud by the colleges.

Read more about the Machine Learning Model and Prediction Service here.

Workflow Process

The post-submission workflow looks like this:

Application is submitted to CCCApply
Application is stored with a fraud status flag set to PENDING
Application is posted to the prediction service, where the model is applied
Prediction service returns the probability rating that the application is fraudulent
Based on the probability rating, the fraud status flag is updated with “Checked Fraud” or “Not Checked Fraud”
Applications set with “Checked Fraud” are sent to the Suspension folder (User Interface) awaiting confirmation by A&R Staff
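
The steps above can be sketched as a small routing function. This is a minimal sketch under stated assumptions, not the actual service code: the Application class, the route_submission function, the destination labels, the CHECKED_FRAUD status string, and the cutoff of 90 are all hypothetical. PENDING and CHECKED_NOT_FRAUD are taken from this page, and CHECKED_NOT_FRAUD is assumed to correspond to the "Not Checked Fraud" flag.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Application:
    app_id: str
    fraud_status: str = "PENDING"  # stored as PENDING on submission


def route_submission(app: Application,
                     predict: Callable[[Application], float],
                     cutoff: float = 90.0) -> str:
    """Run one application through the post-submission steps.

    `predict` stands in for the prediction service and returns a
    probability rating between 1 and 100. The return value is a
    hypothetical destination label.
    """
    app.fraud_status = "PENDING"
    rating = predict(app)
    if rating >= cutoff:
        app.fraud_status = "CHECKED_FRAUD"
        return "suspension_folder"  # awaits confirmation by A&R staff
    app.fraud_status = "CHECKED_NOT_FRAUD"
    return "college"
```

Passing the predictor in as a function keeps the sketch independent of how the real prediction service is called.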

Spam Filter User Interface

The Spam Filter User Interface is the hands-on utility that allows college staff to manually confirm that applications predicted to be fraud should be removed from the submission pipeline before they reach the college's Download Client or the College Adapter (Project Glue). The user interface is built into the new CCCApply Administrator 2.0, a full feature-parity upgrade to the legacy CCCApply Administrator (1.0) for the CCCApply Standard Application only.

As described above, every application that is submitted through CCCApply is analyzed by the machine learning model in the prediction service. If the prediction service determines that the application meets the criteria for fraud, it updates the fraud status flag from "Pending" to "Checked Fraud" and sends the application to the suspension folder (User Interface) to be confirmed by the college's Admissions & Records staff.

Applications that do not meet the criteria in the prediction service are flagged as "Not Checked Fraud" and moved forward to the college for download.

Spam Filter Summary Table

Each application flagged as "Checked Fraud" is displayed in the "Spam Filter" summary table in the CCCApply Administrator 2.0. Each row (application) has a checkbox, so the user can select applications individually or in bulk. Once one or more applications are selected, two buttons appear, labeled "Confirm Spam" and "Mark as Valid". These give the college full control: applications confirmed as spam are moved to the continuous training model to grow the machine learning algorithm, while legitimate applications that were flagged in error are removed from the suspension folder and placed back in the post-submission pipeline to be downloaded by the college.

The Spam Filter User Interface tool has been built into the new CCCApply Administrator 2.0 which is being released to Pilot for testing on June 28 and will be live in Production on July 27, 2018. Read more about the new CCCApply Administrator 2.0 system release here.

User Interface Workflow

The workflow process for the user interface looks like this:

College staff monitor suspension folder via user interface in CCCApply Administrator
Suspended applications are reviewed by college staff for confirmation
College staff make the final determination: Fraud or Not Fraud
If “Fraud” - Then fraud status flag changed to “Confirmed Fraud”
If “Not Fraud” - Then fraud status flag changed to “Confirmed NOT Fraud”
“Confirmed Fraud” flag calls Apply Spam API
Applications that are NOT fraud are sent immediately to the Download Client
Confirmed Fraud/NOT Fraud applications are passed back to the ML model for continuous learning
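
The staff decision in the workflow above can be sketched as a single status transition. The function name, the status strings CHECKED_FRAUD and CONFIRMED_FRAUD, and the action labels are assumptions made for illustration; only CONFIRMED_NOT_FRAUD appears as an identifier on this page.

```python
from typing import List, Tuple


def apply_staff_decision(fraud_status: str,
                         staff_says_fraud: bool) -> Tuple[str, List[str]]:
    """Return the new fraud status and the follow-up actions it triggers.

    The action labels are hypothetical names for the steps in the workflow.
    """
    if fraud_status != "CHECKED_FRAUD":
        raise ValueError("only suspended applications can be reviewed")
    if staff_says_fraud:
        # Confirmed fraud calls the Apply Spam API.
        return "CONFIRMED_FRAUD", ["call_apply_spam_api", "send_to_training"]
    # Valid applications go straight to the Download Client.
    return "CONFIRMED_NOT_FRAUD", ["send_to_download_client", "send_to_training"]
```

In both branches the application is passed back for continuous learning, which matches the last step of the workflow.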

Spam Email Notifications

As part of the User Interface workflow process, service monitoring has been implemented to notify the college that one or more applications have been flagged as fraudulent and are sitting in the User Interface awaiting confirmation (processing). If even one application has been predicted to be fraud and moved to the suspension folder, the college will receive an email notification.

It is the responsibility of each college to monitor incoming email notifications and to process its suspension folder (User Interface) on a regular basis. Though the prediction service is calculating the probability ratings at a 98.99% accuracy rate, there is still a possibility that a legitimate application may get caught in the spam user interface, just as with our own spam email filters.

IMPORTANT: Spam email notifications are sent out once per day if one or more applications are awaiting confirmation in the Spam Filter User Interface. Notifications are sent to the address in the "Admissions Office Email" field in the "College Information" module in the CCCApply Administrator 2.0, which is accessible in the header from any application screen in the Administrator. Colleges should either update this contact field with the email address of the Admissions Office staff member responsible for monitoring the spam filter, or ensure that forwarding is applied from that address to a more appropriate email contact.
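
The once-per-day notification rule can be sketched as follows. This is an illustrative sketch only: the function name, the dictionary inputs, and the message text are assumptions, and the real service monitoring may work differently.

```python
from typing import Dict, List, Tuple


def build_daily_notifications(suspended_counts: Dict[str, int],
                              admissions_emails: Dict[str, str]
                              ) -> List[Tuple[str, str]]:
    """Build (recipient, message) pairs for one daily notification run.

    `suspended_counts` maps a college to its number of applications awaiting
    confirmation; `admissions_emails` maps a college to the address in its
    "Admissions Office Email" field. Both structures are hypothetical.
    """
    notifications = []
    for college, count in sorted(suspended_counts.items()):
        if count < 1:
            continue  # nothing awaiting confirmation, so no email
        recipient = admissions_emails.get(college)
        if recipient is None:
            continue  # no address on file; skipped in this sketch
        message = f"{count} application(s) awaiting confirmation in the Spam Filter."
        notifications.append((recipient, message))
    return notifications
```

A college with an empty suspension folder gets no email in this sketch, which mirrors the "one or more applications" condition above.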

Post-submission Development

Download client:
The major change to the download client is that applications will not be available to download unless they have a fraud_status of LEGACY, NOT_CHECKED, CONFIRMED_NOT_FRAUD, or CHECKED_NOT_FRAUD.
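
A minimal sketch of this eligibility check is shown below. The four status values come from the paragraph above; the function name and the use of dictionaries with a "fraud_status" key to represent applications are assumptions for illustration.

```python
# Status values that allow download, as listed above.
DOWNLOADABLE_STATUSES = frozenset(
    {"LEGACY", "NOT_CHECKED", "CONFIRMED_NOT_FRAUD", "CHECKED_NOT_FRAUD"}
)


def downloadable(applications):
    """Return only the applications whose fraud_status permits download."""
    return [a for a in applications
            if a["fraud_status"] in DOWNLOADABLE_STATUSES]
```

Any other status, such as PENDING, keeps the application out of the download in this sketch.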

Export for training:
The Apply team will develop a new tool that can be used to export applications. This tool will dump applications into a CSV file, PGP-encrypt the file, and copy it to an S3 bucket for Infiniti. The file will contain the application data and the fraud status for each application. Infiniti will use this file to perform ongoing training of their prediction model.
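
The CSV step of this export could look roughly like the sketch below, which uses only the Python standard library. The function name and the column names are assumptions, since the actual export format is not described here. PGP encryption and the copy to S3 are deliberately left out of the sketch.

```python
import csv
import io


def export_for_training(applications, fieldnames):
    """Serialize applications (dicts) to CSV text, one row per application.

    `fieldnames` sets the column order and should include the fraud status
    column. Encryption and upload would happen after this step.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(applications)
    return buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch free of file paths; a real tool would presumably write to a file before encrypting it.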