Development of the spam filter web service began in early 2017 to address the rise in fraudulent applications coming in through CCCApply, in a way that helps colleges make accurate and informed decisions on whether an application is fraud or not. The goal was to build a post-submission spam filter web service and user interface that intercepts each application in the post-submission pipeline and routes it through the machine learning model and prediction service. This page describes the development project, what it includes, and how it operates.

Web Service: Post-Submission Process

At the heart of the web service is a continuously retrained machine learning model that does NOT make any decisions. It only predicts whether an application matches the "identifiers" that the model has collected from thousands of applications already confirmed as fraud by the colleges.

The Spam Filter User Interface lets the colleges review each application that has been flagged as fraud and, through continuous learning, allows the model to grow and learn from the colleges' determinations.


User Interface

Post-submission -

The model performs the data extraction, which looks for the identifiers. That data is then fed into the machine learning model; this is done in real time, and the data is copied to the database to be put through the algorithm for analysis.

The prediction service tells us whether the application is likely to be fraud; in other words, it "suggests a level of confidence" between 1 and 100. The closer the score is to 100, the more likely the application is fraudulent.

Originally we were saying Yes or No, but we want to put that determination into the college’s hands.  

After the prediction results -

If the confidence level meets threshold X, the application goes to the Suspend process (a key threshold determines whether it is suspended or not).
If the service says the application is clean (on the other side of X), it goes back to Apply.

Prediction service calls the API after confidence level is known (either way)
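
The routing rule above can be sketched as a small function. This is only an illustration: the threshold value of 80, the function name, and the "SUSPEND" / "APPLY" return labels are all assumptions, since the real threshold ("X") is not specified on this page. It also assumes that a score at or above the threshold counts as suspected fraud, in line with "closer to 100 is more likely fraudulent".

```python
# Hypothetical threshold; the page leaves the real value unspecified ("X").
SUSPEND_THRESHOLD = 80


def route_application(confidence: int, threshold: int = SUSPEND_THRESHOLD) -> str:
    """Return the next stop for an application given its confidence level (1-100)."""
    if not 1 <= confidence <= 100:
        raise ValueError("confidence must be between 1 and 100")
    # Assumption: scores at or above the threshold are treated as suspected fraud.
    if confidence >= threshold:
        return "SUSPEND"  # goes to the Suspend process for college review
    return "APPLY"        # considered clean, goes back to Apply
```

Keeping the threshold as a parameter means the cut-off could be tuned without touching the routing logic.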

Application Submission Process

To solve this, we will need to update the way that applications are submitted. This functionality will be enabled per college. The workflow will look like this:

  1. Application is submitted to Apply
  2. Application is stored with the fraud status flag set to PENDING
  3. Application is posted to a prediction service where the model is applied
  4. Prediction service returns a probability rating for how likely the application is to be fraudulent
  5. Based on the probability rating, the fraud status flag is updated to “Checked Fraud” or “Checked Not Fraud”
  6. Applications set with “Checked Fraud” are sent to the Suspension folder awaiting confirmation by A&R Staff
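
A minimal sketch of the workflow above, written as a status-flag transition. The `Application` class, the `process_submission` function, the `predict` callable, and the 0.5 threshold are all hypothetical stand-ins; the flag spelling `CHECKED_FRAUD` is assumed by analogy with the `CHECKED_NOT_FRAUD` value used by the download client.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Application:
    app_id: str
    fraud_status: str = "PENDING"  # step 2: stored as PENDING on submission
    suspended: bool = False


def process_submission(
    app: Application,
    predict: Callable[[Application], float],  # stand-in for the prediction service
    threshold: float = 0.5,                   # assumed cut-off, not from the source
) -> Application:
    """Run steps 3-6: score the application and update its fraud status flag."""
    probability = predict(app)  # steps 3-4: post to the service, get a rating back
    if probability >= threshold:
        app.fraud_status = "CHECKED_FRAUD"      # step 5
        app.suspended = True                    # step 6: held for A&R staff review
    else:
        app.fraud_status = "CHECKED_NOT_FRAUD"  # step 5
    return app
```

For example, `process_submission(Application("A1"), lambda a: 0.9)` would leave the application suspended with the flag set to `CHECKED_FRAUD`.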

Prediction Service

Each submitted application passes through the prediction service, which uses the continuously retrained machine learning model to check whether the application meets the criteria for fraud.

If yes - the application meets the criteria by XYZ percentage - it is moved to the suspension folder, which feeds into the Spam Filter User Interface in the new CCCApply Administrator.

If no - the application does NOT meet the criteria for fraud - it continues through the submission pipeline to the CCCApply download client or to the Glue API gateway.


Confidence Threshold

Suspension Process 


Spam Filter User Interface

  1. College staff monitor suspension folder via user interface in CCCApply Administrator
  2. Suspended applications are reviewed by college staff for confirmation
  3. College staff make the final determination: Fraud or Not Fraud
  4. If “Fraud” - Then fraud status flag changed to “Confirmed Fraud”
  5. If “Not Fraud” - Then fraud status flag changed to “Confirmed NOT Fraud”
  6. “Confirmed Fraud” flag calls Apply Spam API
  7. Applications that are NOT fraud are sent immediately to the Download Client
  8. Confirmed Fraud/NOT Fraud applications are passed back to the ML model for continuous learning
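
The review steps above could be modeled roughly as follows. This is a sketch under assumptions: the function name and the action labels are hypothetical placeholders for the real integrations (the Apply Spam API, the Download Client, and the model retraining feed), and `CONFIRMED_FRAUD` is assumed as the flag spelling for “Confirmed Fraud”.

```python
from typing import Dict, List


def record_determination(app: Dict[str, str], is_fraud: bool) -> List[str]:
    """Apply the college's final determination and list the follow-up actions."""
    actions: List[str] = []
    if is_fraud:
        app["fraud_status"] = "CONFIRMED_FRAUD"
        actions.append("call_apply_spam_api")      # step 6
    else:
        app["fraud_status"] = "CONFIRMED_NOT_FRAUD"
        actions.append("send_to_download_client")  # step 7
    # Step 8: both outcomes are passed back to the model for continuous learning.
    actions.append("send_to_model_for_retraining")
    return actions
```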



Post-submission Development

Download client:
The major change to the download client is that applications will not be available to download unless they have a fraud_status of LEGACY, NOT_CHECKED, CONFIRMED_NOT_FRAUD, or CHECKED_NOT_FRAUD.
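
As an illustration of that rule, a filter over in-memory records might look like the sketch below. The four status values come from this page; the function name and the dictionary shape of an application record are assumptions.

```python
from typing import Dict, Iterable, List

# The four fraud_status values that allow download, as listed above.
DOWNLOADABLE_STATUSES = frozenset(
    {"LEGACY", "NOT_CHECKED", "CONFIRMED_NOT_FRAUD", "CHECKED_NOT_FRAUD"}
)


def downloadable(applications: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only applications whose fraud_status allows download."""
    return [
        app for app in applications
        if app.get("fraud_status") in DOWNLOADABLE_STATUSES
    ]
```

Because the check is an allow-list, any status not named above (including a missing one) keeps the application out of the download.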

Export for training:
The Apply team will develop a new tool that can be used to export applications. This tool will dump applications into a CSV file, PGP-encrypt the file, and copy it to an S3 bucket for Infiniti. The file will contain application data and the fraud status for each application. Infiniti will use this file to perform ongoing training of their prediction model.
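
The CSV step of such an export could be sketched with the Python standard library as below. The function name and the field names are hypothetical, and the PGP encryption and S3 upload steps are deliberately left out because they depend on tooling this page does not describe.

```python
import csv
import io
from typing import Dict, Iterable, Sequence


def export_applications_csv(
    applications: Iterable[Dict[str, str]],
    fieldnames: Sequence[str],  # hypothetical application data columns
) -> str:
    """Serialize applications to CSV text, always including fraud_status."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(fieldnames) + ["fraud_status"],
        extrasaction="ignore",  # skip keys that are not exported columns
    )
    writer.writeheader()
    for app in applications:
        writer.writerow(app)
    # Encryption and upload to the S3 bucket would happen after this point.
    return buffer.getvalue()
```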
