Some of these methods include knowledge discovery in databases (KDD), data mining, machine learning and statistics.
[1] In general, the primary reason to use data analytics techniques is to tackle fraud, since many internal control systems have serious weaknesses.
For example, the currently prevailing approach employed by many law enforcement agencies to detect companies involved in potential cases of fraud consists in receiving circumstantial evidence or complaints from whistleblowers.
Data fraud as defined by the Office of Research Integrity (ORI) includes fabrication, falsification and plagiarism.
As a result, effective collaboration between machine learning models and human analysts is vital to the success of fraud detection applications.
[9] In supervised learning, a random sub-sample of all records is taken and manually classified as either 'fraudulent' or 'non-fraudulent' (the task can be decomposed into more classes to meet algorithm requirements).
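The supervised workflow described above (draw a random sub-sample, label it by hand, then fit a model on the labelled records) can be sketched as follows. This is a minimal illustration under stated assumptions, not a method taken from the cited sources: the records, the two features (amount and transactions in the last hour), the `manual_label` stand-in for a human reviewer, and the choice of a nearest-centroid classifier are all hypothetical.

```python
import random

def sample_for_labelling(records, k, seed=0):
    """Draw a random sub-sample of records to hand to human reviewers."""
    rng = random.Random(seed)
    return rng.sample(records, k)

def manual_label(record):
    """Hypothetical stand-in for a human reviewer's judgement (assumption)."""
    return "fraudulent" if record[0] > 500 else "non-fraudulent"

def fit_centroids(labelled):
    """Compute the mean feature vector of each class (nearest-centroid model)."""
    sums, counts = {}, {}
    for features, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical records: (amount, transactions in the last hour).
records = [
    (20.0, 1.0), (35.0, 2.0), (15.0, 1.0), (40.0, 3.0),
    (900.0, 9.0), (1200.0, 12.0), (800.0, 8.0), (1300.0, 11.0),
]

# Sampling 5 of 8 records guarantees both classes appear in this toy data set.
sample = sample_for_labelling(records, k=5)
labelled = [(record, manual_label(record)) for record in sample]
centroids = fit_centroids(labelled)

print(predict(centroids, (1000.0, 10.0)))
print(predict(centroids, (25.0, 1.0)))
```

In practice the labels would come from analysts rather than a rule, and the classifier would be chosen and validated against the real class imbalance; the sketch only shows how the sampling, labelling and fitting steps connect.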
[13] Hybrid knowledge/statistical-based systems, where expert knowledge is integrated with statistical power, use a series of data mining techniques to detect cellular clone fraud.
Specifically, a rule-learning program is implemented to uncover indicators of fraudulent behaviour from a large database of customer transactions.
A mismatch – an order placed from the US on an account number from Tokyo, for example – is a strong indicator of potential fraud.
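The mismatch check described above can be sketched in a few lines. This is only an illustration: the `Order` record, its field names, and the use of two-letter country codes are assumptions, and how the origin country is resolved from the order's network location is outside the scope of the sketch.

```python
from dataclasses import dataclass

@dataclass
class Order:
    account_country: str  # country registered on the account (assumed field)
    origin_country: str   # country resolved from where the order was placed (assumed field)

def location_mismatch(order: Order) -> bool:
    """Return True when the order's origin differs from the account's country."""
    account = order.account_country.strip().upper()
    origin = order.origin_country.strip().upper()
    return account != origin

# An order placed from the US on an account registered in Japan is flagged.
print(location_mismatch(Order(account_country="JP", origin_country="US")))
# Matching countries (ignoring case and surrounding whitespace) are not flagged.
print(location_mismatch(Order(account_country="jp", origin_country="JP ")))
```

A mismatch is treated here as a signal for further review rather than proof of fraud, since legitimate customers also travel or use remote networks.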
[19] Banks can prevent "phishing" attacks, money laundering and other security breaches by determining the user's location as part of the authentication process.
[20] A major limitation for the validation of existing fraud detection methods is the lack of public datasets.
[21] One of the few examples is the Credit Card Fraud Detection dataset[22] made available by the ULB Machine Learning Group.