
...

After the first wave of fraud was reported in late 2016, CCCApply contracted with a machine learning data research team to conduct an extensive analysis of the fraud applications collected from the colleges. This work builds on the steps we have already taken to strengthen the security of the CCCApply system, including additional firewall protections, blocking TOR traffic and other known bad actors, and pre-submission configuration changes that prevent probable fraud applications from being submitted when they meet certain criteria.

Research Objectives

Infiniti commenced a multi-phase research project

The research team initiated their review with the following objectives: 

  • To compile the data and do exploratory data analysis
  • To identify trends and patterns in the incoming data 
  • To identify tools and techniques used by spammers
  • To better understand the motivations of spammers

Early Research

To understand why we were seeing an influx of fraud, and to identify trends and commonalities in the data being submitted through these applications, the machine learning data team collected and analyzed several thousand example applications submitted by the colleges reporting the spam.

Based on that initial review, we initiated a three-part data analysis (without using any student personal information). The first data pull focused on one college that provided a large number of bad applications between June 1, 2016 and August 15, 2017; the second looked at all other colleges that provided examples of bad applications in the same time frame; and the third looked at all remaining colleges and submitted application data. Comparing the bad applications to good applications was essential to start detecting trends and patterns in the fraudulent "formula".

After reviewing all three data pulls, even without including personally identifiable information, we learned a great deal.

The majority of bad applications identified were submitted in under 3 minutes, with most of those submitted in under 2.5 minutes. This alone suggested that bots were likely submitting applications using scripted keystrokes.

Of the applications identified as frauds, other patterns were prevalent:

  • Time to completion:  2.25 minutes (average)
  • Permanent Address State: NOT California
  • Current Mailing Address State:  NOT California
  • Gender: Male
  • Race: White
  • HS Ed Level:  No high school completion
  • Interest in Financial Aid:  NO
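The patterns above can be combined into a simple rule-based screen. The sketch below is illustrative only: the field names, thresholds, and exact string values are assumptions for demonstration, not the actual CCCApply data model or filter logic, which would weight and tune these signals rather than count them equally.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """Minimal application record; field names are illustrative."""
    completion_minutes: float
    permanent_state: str
    mailing_state: str
    gender: str
    race: str
    hs_ed_level: str
    wants_financial_aid: bool


def fraud_score(app: Application) -> int:
    """Count how many of the observed fraud patterns an application matches.

    Each check mirrors one trend from the research (sub-3-minute completion,
    non-California addresses, and so on). A higher count means the
    application looks more like the fraudulent profile.
    """
    score = 0
    if app.completion_minutes < 3.0:
        score += 1
    if app.permanent_state != "CA":
        score += 1
    if app.mailing_state != "CA":
        score += 1
    if app.gender == "Male":
        score += 1
    if app.race == "White":
        score += 1
    if app.hs_ed_level == "No high school completion":
        score += 1
    if not app.wants_financial_aid:
        score += 1
    return score
```

An application matching every pattern scores 7, while a typical legitimate application scores near 0; a production filter would pick a cutoff by validating against known-good and known-bad examples.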

Research Objectives & Outcomes

By identifying characteristics common to the fake applications collected by colleges, such as volume, average submission time, patterns in the submitted data, and user profiling, and comparing that information to non-fraud applications, we are able to take steps to prevent this threat through enhanced security, short-term stopgap fixes as needed, and the development of a spam filter web service. These are not the only solutions, but as we continue to better understand the motivations behind these attacks, they can serve as part of an overall enhanced security strategy.








Info

One outcome of the machine learning research study was the decision to build a spam filter web service, with a user interface, to prevent bad data from reaching the colleges and to continuously re-train the prediction service model.
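A prediction service that is continuously re-trained on flagged examples can be sketched as below. This is a stand-in, not the actual CCCApply service: the class name, the single completion-time feature, and the threshold-midpoint "model" are all assumptions chosen to keep the retraining loop visible in a few lines.

```python
class SpamFilterService:
    """Sketch of a spam filter that re-trains as colleges flag new examples.

    The "model" is deliberately simple: predict fraud when completion time
    falls below a learned threshold. A real service would use a proper
    classifier over many features, but the feedback-and-retrain loop is
    the same shape.
    """

    def __init__(self, threshold_minutes: float = 3.0):
        self.threshold = threshold_minutes
        self._labeled = []  # (completion_minutes, is_fraud) feedback pairs

    def predict(self, completion_minutes: float) -> bool:
        """Return True if the application looks fraudulent."""
        return completion_minutes < self.threshold

    def add_feedback(self, completion_minutes: float, is_fraud: bool) -> None:
        """Record a college-confirmed label for later retraining."""
        self._labeled.append((completion_minutes, is_fraud))

    def retrain(self) -> None:
        """Re-fit the threshold from feedback: the midpoint between the
        mean completion time of confirmed fraud and of confirmed good
        applications."""
        fraud = [t for t, bad in self._labeled if bad]
        good = [t for t, bad in self._labeled if not bad]
        if fraud and good:
            self.threshold = (sum(fraud) / len(fraud) + sum(good) / len(good)) / 2
```

In use, the service starts from the research-derived 3-minute cutoff, then shifts its threshold as confirmed fraud and good examples accumulate, so the filter adapts as spammers change tactics.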

...

We are also working with the CCCApply Steering Committee to better understand the motivations of these spammers. What are they after? 


...




Research Outcomes: What We've Learned

...