Of the 2.1 million applications submitted through CCCApply each year, the vast majority are valid: they are submitted by legitimate applicants who want to attend a California community college. These applications contain personally identifiable information and other critical data that needs to reach the college as quickly and safely as possible. However, for the percentage of applications that are submitted through CCCApply with the intent to commit fraud, we've developed a system that analyzes, flags, suspends, and ultimately blocks the fraud attempt through a spam filter web service and user interface.


...

Development of the spam filter web service and user interface began in early 2017 to address the rise in fraudulent applications coming in through CCCApply, in a way that would assist colleges in making accurate and informed decisions on whether an application is fraudulent. The tool consists of three main components: the post-submission spam filter web service, which intercepts each application in the post-submission pipeline; the machine-learning model and prediction service, through which each intercepted application is routed; and the user interface, used to review and confirm identified fraud.

This page describes the development project, what it includes, and how it operates.

...

Post-Submission Web Service Process

At the heart of the web service is a machine learning model with continuous retraining. The model does NOT make any decisions; it only predicts whether an application matches the "identifiers" that the model has collected from thousands of applications already confirmed as fraud by the colleges.

The Spam Filter User Interface gives colleges the ability to review each application that has been identified as fraud and, through continuous learning, allows the model to grow and learn based on the colleges' determinations.

User Interface

Post-submission -

The model performs the data extraction, which looks for the identifiers. That data is then fed into the machine learning model. This is done in real time, and the data is copied to the database to be put through the algorithm for analysis.

The prediction service will tell us whether the application is fraud or not. In the CCCApply application process, after all the application data has been entered and the applicant has confirmed, under penalty of perjury, that the data being submitted is valid and correct, the applicant clicks the "Submit" button to push the application data to the college they are applying to. Everything that happens after that point is considered the post-submission process. This is the point at which the application is routed to the college via the Download Client, or through the College Adapter (Project Glue) for real-time integration with the college student information system.

With the development of the Spam Filter Web Service, every application will now be intercepted after submission and routed to the spam filter machine learning model and prediction service to see whether the data meets the criteria for spam or fraud.

The applications that are legitimate and do not meet the criteria for spam are quickly passed through to the college.

For the applications that are frauds, however, the model extracts the data and looks for "identifiers", which are then fed into the machine learning algorithm for full analysis. The prediction service then calculates how confident it is that the application is bad; in other words, it "suggests a level of confidence" between 1 and 100. The closer the number is to 100, the more likely the application is fraudulent. Originally the service answered Yes or No, but we want to put that determination into the colleges' hands. The cutoff applied to this number is called the Confidence Threshold.

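After the prediction results are returned:

If the confidence level reaches the threshold (X), the application goes to the Suspend process. This threshold is the key value that determines whether an application is suspended or not.
If the service says the application is clean (its confidence level falls on the other side of X), it goes back to Apply.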

The prediction service calls the API once the confidence level is known, whichever way the result goes.
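
The threshold routing described above can be sketched as follows. This is a minimal illustration, not the production logic: the threshold value of 75, the function name `route_application`, and the route labels are all hypothetical, and the sketch assumes that a confidence level at or above the threshold means "suspend" (the notes above leave the exact comparison open).

```python
# Hypothetical value: the notes above only refer to the threshold as "X".
CONFIDENCE_THRESHOLD = 75


def route_application(confidence: int, threshold: int = CONFIDENCE_THRESHOLD) -> str:
    """Return where an application goes once its confidence level is known.

    Assumption: a confidence level at or above the threshold suspends the
    application; anything below it is treated as clean.
    """
    if not 1 <= confidence <= 100:
        raise ValueError("confidence level must be between 1 and 100")
    if confidence >= threshold:
        return "SUSPEND"          # goes to the Suspend process for college review
    return "RETURN_TO_APPLY"      # clean: goes back to Apply


if __name__ == "__main__":
    for level in (12, 75, 98):
        print(level, route_application(level))
```

Keeping the threshold as a parameter rather than a constant baked into the comparison reflects the stated goal of putting the determination into the colleges' hands.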

Application Submission Process

To solve this, we will need to update the way that applications are submitted. This functionality will be enabled per college. The workflow will look like this:

...

Info

At the heart of the web service is the machine learning model with continuous training. The model does NOT make any decisions; it only predicts whether an application matches the "identifiers" that the model has collected from thousands of applications already confirmed as fraud by the colleges.

Read more about the Machine Learning Model and Prediction Service here.

Submission Process

The post-submission workflow looks like this: 

  1. Application is submitted to CCCApply
  2. Application is stored with a fraud status flag set to PENDING
  3. Application is posted to the prediction service, where the model is applied
  4. Prediction service returns the probability rating that the application is fraudulent
  5. Based on the probability rating, the fraud status flag is updated with “Checked Fraud” or “Not Checked Fraud”
  6. Applications set with “Checked Fraud” are sent to the Suspension folder (User Interface) awaiting confirmation by A&R Staff
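
The six steps above can be sketched as a small status-flag lifecycle. This is an illustrative sketch under stated assumptions, not the actual CCCApply implementation: `FraudStatus`, `process_submission`, the `predict` callable, the `threshold` parameter, and the destination labels are hypothetical names, and the comparison `rating >= threshold` is assumed.

```python
from enum import Enum
from typing import Callable, Dict


class FraudStatus(Enum):
    PENDING = "PENDING"
    CHECKED_FRAUD = "Checked Fraud"
    NOT_CHECKED_FRAUD = "Not Checked Fraud"


def process_submission(
    application: Dict[str, str],
    predict: Callable[[Dict[str, str]], float],
    threshold: float,
) -> Dict[str, object]:
    """Walk one submitted application through the post-submission steps."""
    # Steps 1-2: the application is stored with its flag set to PENDING.
    record: Dict[str, object] = {
        "application": application,
        "fraud_status": FraudStatus.PENDING,
    }
    # Steps 3-4: post to the prediction service and read back the rating.
    rating = predict(application)
    record["probability"] = rating
    # Step 5: update the flag based on the rating (comparison is an assumption).
    if rating >= threshold:
        record["fraud_status"] = FraudStatus.CHECKED_FRAUD
        # Step 6: flagged applications wait in the suspension folder.
        record["destination"] = "suspension_folder"
    else:
        record["fraud_status"] = FraudStatus.NOT_CHECKED_FRAUD
        record["destination"] = "submission_pipeline"
    return record


if __name__ == "__main__":
    # Stand-in predictor for demonstration; the real service applies the model.
    result = process_submission({"id": "demo"}, lambda app: 92.0, threshold=80.0)
    print(result["fraud_status"].value, result["destination"])
```

Passing the predictor in as a callable keeps the sketch independent of how the prediction service is actually reached.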

Prediction Service

...

...


Spam Filter User Interface

...

If no (the application does NOT meet the criteria for fraud), it continues through the submission pipeline to the CCCApply Download Client or to the Glue API gateway.

Confidence Threshold

Suspension Process 

Spam Filter User Interface

Once an application is marked as “Checked Fraud”, the review proceeds as follows:

  1. College staff monitor the suspension folder via the user interface in CCCApply Administrator
  2. Suspended applications are reviewed by college staff for confirmation
  3. College staff make the final determination: Fraud or Not Fraud
  4. If “Fraud”, the fraud status flag is changed to “Confirmed Fraud”
  5. If “Not Fraud”, the fraud status flag is changed to “Confirmed NOT Fraud”
  6. The “Confirmed Fraud” flag triggers a call to the Apply Spam API
  7. Applications that are NOT fraud are sent immediately to the Download Client
  8. Confirmed Fraud/NOT Fraud applications are passed back to the ML model for continuous learning
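
The review steps above amount to a small decision function: a staff determination produces a final status flag plus a set of follow-up actions. The sketch below is only an illustration of that mapping; the function name `apply_review` and the action labels are hypothetical and do not correspond to real CCCApply API names.

```python
from typing import List, Tuple


def apply_review(staff_says_fraud: bool) -> Tuple[str, List[str]]:
    """Map a college staff determination to a status flag and follow-up actions."""
    if staff_says_fraud:
        status = "Confirmed Fraud"
        # Step 6: the "Confirmed Fraud" flag calls the Apply Spam API.
        actions = ["call_apply_spam_api"]
    else:
        status = "Confirmed NOT Fraud"
        # Step 7: applications that are not fraud go to the Download Client.
        actions = ["send_to_download_client"]
    # Step 8: both outcomes are passed back to the model for continuous learning.
    actions.append("send_to_ml_model")
    return status, actions


if __name__ == "__main__":
    print(apply_review(True))
    print(apply_review(False))
```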

...