Soon after the first wave of fraudulent applications was identified in June 2016, the CCC Technology Center took immediate steps to strengthen the security of the CCCApply system and protect our students' personally identifiable information (read more about all the ways we are addressing fraud in CCCApply). Meanwhile, we contracted with a machine learning research team to analyze several thousand examples of fraudulent applications collected from the colleges that initially reported the spam.

Research Objectives

The objectives for the research project were simple:
Additional objectives were added based on the research team's recommendations and findings. These included launching a small pilot with four colleges to gather feedback and better understand their workflow processes, and developing a process for collecting data throughout the design and development phases of the project.
Based on what they learned in the initial review, the research team conducted a multi-part analysis of all submitted applications (without using any student personal information). The first review focused on one college that provided a large number of bad applications submitted between June 1, 2016, and August 15, 2017. The second review looked at all other colleges that provided examples of bad applications from the same period, and the third review looked at the submitted application data from all remaining colleges. Comparing the bad applications with good ones was essential for detecting the trends and patterns in the fraudulent "formula."
After reviewing all three data pulls, even without personally identifiable information, we learned a great deal.
Most of the bad applications identified were submitted in under 3 minutes, and most of those in under 2.5 minutes. This timing alone suggested that bots are likely submitting these applications with automated keystrokes.
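To make the timing finding concrete, here is a minimal sketch that compares how often bad and good applications fall under a submission-time cutoff. It is an illustration only, not the research team's analysis: the 180-second cutoff mirrors the 3-minute figure above, while the function names and the sample durations are hypothetical and made up for the example.

```python
from statistics import median

# Hypothetical cutoff based on the finding above: most bad applications
# were submitted in under 3 minutes (180 seconds).
FAST_SUBMISSION_SECONDS = 180


def share_under(durations_s, threshold_s=FAST_SUBMISSION_SECONDS):
    """Fraction of applications submitted in less than threshold_s seconds."""
    if not durations_s:
        return 0.0
    fast = sum(1 for d in durations_s if d < threshold_s)
    return fast / len(durations_s)


def compare_timing(bad_durations_s, good_durations_s,
                   threshold_s=FAST_SUBMISSION_SECONDS):
    """Summarize how submission time differs between bad and good applications.

    Both lists must be non-empty (statistics.median raises on empty input).
    """
    return {
        "bad_share_fast": share_under(bad_durations_s, threshold_s),
        "good_share_fast": share_under(good_durations_s, threshold_s),
        "bad_median_s": median(bad_durations_s),
        "good_median_s": median(good_durations_s),
    }


if __name__ == "__main__":
    # Made-up durations in seconds for illustration; not data from the study.
    bad = [95, 120, 140, 150, 170, 600]
    good = [900, 1500, 1200, 170, 2400]
    print(compare_timing(bad, good))
```

A large gap between `bad_share_fast` and `good_share_fast` is what makes submission time useful as a signal. On its own, though, a cutoff like this would also catch some fast legitimate applicants, which is one reason to weigh it alongside other patterns.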
Among the applications identified as fraudulent, other patterns were also prevalent:
By identifying commonalities across all the fraudulent applications submitted by colleges (such as volume, average submission time, patterns in the submitted data, and user profiling) and then comparing that information with non-fraudulent applications, the research team was able to make several high-level recommendations, including short-term fixes and long-term solutions, that we could start implementing immediately. The recommendations included:
These recommendations were all approved as part of an overall enhanced security strategy for 2018-2019.
Spam Pilot Project

One of the recommendations from the research study was to organize a small pilot of colleges that could work with our support engineers and provide feedback throughout the research and development efforts. The pilot colleges will also collaborate on best practices and other workflow changes that can be shared with the other colleges.
Spam Filter Web Service

One of the outcomes of the research study was a recommendation to develop a spam filter web service that would stop bad applications from reaching the colleges through their download system, and whose prediction service model would be continuously retrained.
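The post does not describe how the spam filter web service is built. As a hedged sketch of the idea, the Python below assumes a service that scores each application before it is released for download, holds those scoring at or above a cutoff, and swaps in a retrained scorer as colleges report new fraud examples. Every name here (`SpamFilter`, `submission_seconds`, `train_time_cutoff`, the 0.5 threshold) is hypothetical, and the one-feature toy trainer only stands in for whatever model the machine learning team actually uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch; not the CCCApply implementation.
# An application is a plain dict of non-PII features.
App = Dict[str, float]
Scorer = Callable[[App], float]


@dataclass
class SpamFilter:
    """Hold likely spam before download; release everything else."""
    score: Scorer              # estimated spam probability in [0, 1]
    threshold: float = 0.5     # assumed cutoff; would be tuned in practice
    reported: List[Tuple[App, bool]] = field(default_factory=list)

    def filter_for_download(self, apps: List[App]) -> Tuple[List[App], List[App]]:
        """Split apps into (released to colleges, held for review)."""
        released, held = [], []
        for app in apps:
            (held if self.score(app) >= self.threshold else released).append(app)
        return released, held

    def report(self, app: App, is_spam: bool) -> None:
        """Record a labeled example, e.g. a fraud application a college reports."""
        self.reported.append((app, is_spam))

    def retrain(self, train: Callable[[List[Tuple[App, bool]]], Scorer]) -> None:
        """Replace the scorer with one trained on all labeled examples so far."""
        if self.reported:
            self.score = train(self.reported)


def train_time_cutoff(examples: List[Tuple[App, bool]]) -> Scorer:
    """Toy trainer: cutoff halfway between mean spam and mean legit submission time."""
    spam = [a["submission_seconds"] for a, is_spam in examples if is_spam]
    legit = [a["submission_seconds"] for a, is_spam in examples if not is_spam]
    if not spam or not legit:
        raise ValueError("need at least one spam and one legitimate example")
    cutoff = (sum(spam) / len(spam) + sum(legit) / len(legit)) / 2
    return lambda app: 1.0 if app["submission_seconds"] < cutoff else 0.0


if __name__ == "__main__":
    f = SpamFilter(score=lambda app: 1.0 if app["submission_seconds"] < 180 else 0.0)
    released, held = f.filter_for_download(
        [{"submission_seconds": 90}, {"submission_seconds": 1200}])
    print("released:", released, "held:", held)
    f.report({"submission_seconds": 100}, True)
    f.report({"submission_seconds": 1500}, False)
    f.retrain(train_time_cutoff)
```

Keeping the scorer pluggable means the filtering step stays the same while the model behind it changes with each retraining cycle.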
Meanwhile, we continue to work with the machine learning team and several colleges in a pilot project to build the algorithm and train it on bad applications submitted by colleges. Tomorrow's email will also explain how colleges can submit their fraudulent applications to the Tech Center for this purpose (we need them in a specific format, and colleges must not include any student personally identifiable information).
We are also working with the CCCApply Steering Committee to better understand the motivations of these spammers. What are they after?