Traditionally, fraud discovery has been a tedious manual process. Common methods of discovering and preventing fraud consist of investigative work coupled with computer support. Computers can help in the alert using very simple means, such as flagging all claims that exceed a pre-specified threshold. The obvious goal is to avoid large losses by paying close attention to the larger claims.
As a drawback, the details of such practices, in particular the thresholds used, will become apparent to fraudster, who may then adjust to it and file claims just below the threshold. More sophisticated monitoring means may employ additional thresholds such as claim rates, as well as other measures that are particularly designed to capture fraudulent practices. As a benefit of this "computer monitoring" approach, fraud may be detected before payment is made which is much more effective than trying to retrieve the payments later.
The strengths and weaknesses of the 2 approaches are complementary - the automated computer monitoring approach lacks the knowledge of what actually happened but has the consistency and impartiality of automatic monitoring which is lacking in the manual monitoring. Consequently, both approaches may be needed. Both approaches suffer, however, because most of the actual investigative effort is done by manual exploration of cases. If computer monitoring could be enhanced, so that suspicious claims are not just simply flagged but can identify fraudulent cases with some reliability then this would help to expedite the fraud discovery process.
Big Data analytics and Data Mining offers a range of techniques that can go well beyond computer monitoring and identify suspicious cases based on patterns that are suggestive of fraud. These patterns fall into three categories.
- Unusual data. For example, unusual medical treatment combinations, unusually high sales prices with respect to comparatives, or unusual number of accident claims for a person.
- Unexplained relationships between otherwise seemingly unrelated cases. For example, a real estate sales involving the same group of people, or different organizations with the same address.
- Generalizing characteristics of fraudulent cases. For example, intense shopping or calling behavior of items/locations that have not happened in the past, or a doctor’s billing for treatments and procedures that he rarely billed for in the past. These patterns and their discovery are detailed in the following sections. It should be mentioned, that most of these approaches attempt to deal with "non-revealing fraud". This is the common case. Only in cases of "self-revealing" fraud (such as stolen credit cards) will it become known at some time in the future that certain transactions had been fraudulent. At that point only a reactive approach is possible, the damage has already occurred; this may, however also set the basis for attempting to generalize from these cases and help detect fraud when it re-occurs in similar settings (see Generalizing characteristics of fraud, below).
The unusual data refer to three different situations: unusual combinations of other quite acceptable entries, a value that is unusual with respect to a comparison group, and an unusual value of and by itself. The latter case is probably the easiest to deal with and is an example of "outlier analysis". We are interested here only in outliers that are unusual, but are still acceptable values; an entry of a negative number for the number of staplers that a procurement clerk purchased would simply be a data error, and presumably bear no relationship to fraud. An unusual high value could be detected simply by employing descriptive statistics tools, such as measures of mean and standard deviation, or a box plot; for categorical values the same measures for the frequency of occurrence would be a good indicator.
Somewhat more difficult is the detection of values that are unusual only with respect to a reference group. In a case of real estate sales price, the price as such may not be very high, but it may be high for dwellings of the same size and type in a given location and economic market. The judgment that the value is unusual would only become apparent through specific data analysis techniques..
The unexplained relationships may refer to two or more seemingly unrelated records having unexpectedly the same values for some of the fields. As an example, in a money laundering scheme funds may be transferred between two or more companies; it would be unexpected if some of the companies in question have the same mailing address. Assuming that the stored transactions consist of hundreds of variables and that there may be a large number of transactions, the detection of such commonalities is unlikely if left uninvestigated. When applying this technique to many variables and/or variable combinations, the presence of an automated tool is indispensable. Again positive findings do not necessarily indicate fraud but are suggestive for further investigation.
Generalizing Characteristics of Fraud
Once specific cases of fraud have been identified we can use them in order to find features and commonalities that may help predict which other transactions are likely to be fraudulent. These other transactions may already have happened and been processed, or they may occur in the future. In both cases, this type of analysis is called "predictive data mining".
The potential advantage of this method over all alternatives previously discussed is that its reliability can be statistically assessed and verified. If the reliability is high, then most of the investigative efforts can be concentrated on handling the actual fraud cases, rather than screening many cases, which may or may not be fraudulent.