
Fraud Detection in the Financial Services Industry

A SAS White Paper

Table of Contents

Introduction: The Cost of Fraud
Fraud in Financial Industries
Fraud: Characteristics, Types and Examples
    False Insurance Claims
    Healthcare Fraud
    Non-insurance Fraud
        Credit Card Fraud
        Mortgage Fraud
        Stock Market Fraud
        Money Laundering
Traditional Techniques of Fraud Discovery
Fraud Detection with Data Mining
    Approach 1: Unusual Data
    Approach 2: Unexplained Relationships
    Approach 3: Generalizing Characteristics of Fraud
Summary of Data Mining Fraud Detection Techniques
Case Study: Detecting Mortgage Fraud
    Exploratory Analysis
    Predictive Modelling
    Association and Sequence Analysis
    Link Analysis
    Conclusions
Case Study 2: Credit Card Fraud
    Data Model
    Analysis
    Conclusion
Summary
Appendix: Enterprise Miner
References and Further Reading
    Topical
    SAS Institute White Papers

Fraud Detection in the Financial Services Industry was written by Julian Kulkarni and Ed Walker, based on a SAS Best Practices paper by Bernd Drewes.


Introduction: The Cost of Fraud

Nobody can put an exact figure on the full cost of financial fraud, which is defined as the use of criminal deception for financial advantage. This definition admits a wide variety of interpretations, so estimates differ widely. All we know for sure is that the costs are high and probably getting higher.

The US Coalition Against Insurance Fraud estimates the annual cost to be $85 billion, which it describes as “a hidden tax of more than US $1,000 per family each year on the costs of goods and services.” [1] A 1998 report, Taking Fraud Seriously, by the Australian Institute of Chartered Accountants estimated the annual cost of fraud in Australia at US $3.5 billion. [2] A 1997 report by the Association of British Insurers estimated UK insurance claims fraud at £595 million, [3] while a US insurance association estimated global insurance fraud costs at US $17 billion. [4] US mortgage industry professionals and the FBI estimate that “between ten and 15 percent of loan applications contain material misrepresentations”. [5]

In 1996 one in every ten Americans had been the victim of a credit card fraud, and half feared they would become victims. [6] Already by 1992 the cost of credit card fraud in the US alone was US $864 million, [7] and while the percentage of transactions that are fraudulent is falling, the rapid growth in the volume of transactions (now including Internet-based transactions) puts the likely current global figure upwards of several billion dollars.

In this white paper we consider some specific examples of fraudulent practice that affect the financial services industry, and look at some tools and techniques for anticipating and uncovering fraud. In particular we look at Fraud Detection solutions from SAS Institute, part of the SAS Solution for Customer Relationship Management. Readers seeking a more detailed description of the methods and case studies are advised to contact SAS Institute and enquire about the data mining Best Practices Papers.

Fraud in Financial Industries

The costs of insurance fraud alone are probably now in the region of US $100 billion per year in the United States, and there is little reason to suppose that it is less of a concern in other regions, including Europe. Typical fraudulent practices include fake accidents, fraudulent disability and property claims (including arson), false medical billing and theft of insurance premiums (for example, by unauthorized insurance companies). Detection of fraud is generally hampered by the need for highly skilled investigators to plough their way through backlogs of computer data, with successive findings triggering new questions and requiring painstaking searches; in the meantime fraud and abuse continue. Here are some examples that have recently been presented in news stories:

[1] Coalition Against Insurance Fraud, July 1999
[2] Melbourne Age, 17 November 1998
[3] BBC News, 7 December 1997
[4] Pennsylvania Association of Mutual Insurance Companies, quoted by Lititz Mutual Insurance Company, January 1998
[5] Robert J. Sadler, The Mortgage Mart, July 1999
[6] The Detroit News, January 31 1996
[7] Federal Trade Commission


  • In one area, vehicle accidents were staged and the fraudsters collected on property and medical claims. Investigators conducted database searches in order to find common denominators of organized fraud groups. They found that people involved in seemingly unrelated accidents had received medical treatment from the same provider. Further investigations revealed that many of the claims were part of an organized scheme in which people were hired to participate as victims in an accident. Forty-two such cases were identified, which had generated $700,000 in fraudulent claims.

  • Fraudulent medical claim. A slip-and-fall injury claimant demanded payment of US $200,000 and hired an attorney, but refused to disclose her identity, her medical records, or the complete nature of her injury. Using the claimant’s Social Security number to check databases for past claims revealed that the claimant had filed 12 bodily injury claims using the same Social Security number but different names. Further data mining revealed a relationship among the claimant’s treating physicians (chiropractors) and attorneys. Outcome: the insurer, which had offered a settlement before the investigation, withdrew its offer, and the claimant’s attorney withdrew from the case.

  • Employees’ compensation. A claimant sought US $220,000, alleging that multiple injuries from an auto accident had left him unable to work. Database checks revealed that the claimant had filed a compensation claim for the same injuries a few months after the accident. A separate check of employment records established that the claimant held a position with a new employer similar to the position he held when the auto accident occurred. When confronted with these facts, the attorney representing the claimant immediately withdrew the lost-wage claim.

Fraud: Characteristics, Types and Examples

As the foregoing cases suggest, there is a list of typical scenarios that can be found in fraud cases in the financial industry. The main areas are as follows.

False Insurance Claims

The fraud rate for such claims is estimated at ten percent, and the fraud uses a variety of schemes, some of which are listed below. Most of these cases are difficult to detect because they represent “non-revealing” fraud; in other words, there is no explicit event (such as a stolen credit card) that will eventually identify the claims as fraudulent. Investigative effort, such as uncovering repeat behaviour or connections between conspiring individuals, is usually needed. Here are some typical scenarios:

  • A man uses his wife’s jewellery as the basis for a policy taken out on behalf of another lady friend. The friend then claims that “her” jewellery has been stolen.

  • A person insures a vehicle and arranges to have it disappear. He then reports it stolen and collects the insurance, perhaps also selling parts of the vehicle (airbags are very popular currently).

  • Garage repairers install used parts rather than new parts, but bill for the latter.

  • A car causes an accident by stopping suddenly in front of another car. The passengers then make medical and property claims.

  • A head injury is claimed by someone who has not actually had an accident; injuries are certified and treated by one of several doctors who were part of the scam.

Healthcare Fraud

Today’s claims payment system must deal with a wide range of medical procedures and practices and a growing number of public and private health care programmes. There is the typical problem of too much data and not enough information. As a result it has been said that healthcare fraud works best when automated billing works perfectly. The line between thorough care and ordering more tests than are medically necessary is also often thin, which makes system abuses difficult for non-expert investigators to detect. Common scenarios include:

  • Charging for unnecessary services, such as performing $400,000 worth of heart and lung tests on people suffering from no more than a common cold.

  • Administering more expensive blanket screening tests, rather than tests for specific symptoms.

  • Billing for services not rendered, such as diagnostic interventions that never took place. An extreme case involved a claim for performing a bronchoscopy once a week on a patient, when once per lifetime is the norm.

  • Billing for treatment by a senior doctor when it was actually performed by trainees.

  • Unbundling group tests (such as a standard collection of blood tests), and billing the tests at the higher individual rates.

  • Upgrading tests by billing for a related but more complex and expensive procedure.

Another type of healthcare fraud is wholly unauthorized billing. This often involves setting up a billing company and submitting fictitious bills for treatments supposedly given by unsuspecting doctors to unsuspecting patients. After the reimbursement cycle the fraudulent billing company moves on elsewhere. Affected patients and doctors may suffer serious consequences: patients’ records may now show serious diseases, and doctors may be audited for suspected tax fraud.

Non-insurance Fraud

In the non-insurance sectors of the financial industry there are also a number of common frauds. The best known involve stolen credit cards.

Credit Card Fraud

The methods used in credit card fraud are likely to be quite simple: steal the credit card or the credit card information, then use this to purchase big ticket items. The actual purchase is often preceded by small test purchases (for example at a filling station) in order to verify that the card is operational, without drawing much attention in case it is not. In the case of a stolen card, the fraud is likely to be revealed after a short time, so the purchase pattern tends to start immediately and is intense until the card limit is exhausted.

When only the credit card information is stolen, the purchase pattern may be less intense and stretch over much of the billing cycle. The lack of a physically presentable card requires that the purchasing be done remotely, by telephone or electronically. Ingenious schemes may be invented in order to come up with a non-incriminating delivery address, such as staking out a currently non-occupied house.

Mortgage Fraud

Mortgage fraud involves the misrepresentation of a real-estate value and its use for fraudulent purposes. In a common scheme houses are bought by a group of people and are then resold among one another at inflated prices; the resulting (virtual) equity is used to take out mortgages and/or is resold to unsuspecting buyers.

Stock Market Fraud

Various schemes qualify as stock market fraud:

  • Insider trading involves the actual trading of stock based on illegally used knowledge. The knowledge may contain information affecting the stock market price or consist of impending customer orders. In the latter case a person will execute a similar order for himself before executing the customer order (front-running a trade).

  • A fraudulent advertisement attempts to procure customers by providing misleading information, such as guaranteed rates of return from a stock investment or falsified titles of an investment object (such as “prime bank certificates”) that appear to signal a secure investment.

  • Driving up stock prices, for example by selling between brokers.

Money Laundering

Money obtained through illegal means (such as drugs) is processed by direct deposit into various (often foreign) accounts or by converting it into legal revenue. For the latter purpose a fictitious or real business may be created, with the deposited money appearing to be real revenue coming from this business.

Traditional Techniques of Fraud Discovery

Common methods of discovering and preventing fraud consist of investigative work coupled with computer support and client education. Basically, both computers and clients help to alert insurance investigators to suspicious cases, which are then screened by the investigators one after the other. The alerting phase is an important component in detecting fraud; many cases would otherwise go unnoticed, in particular single-time, single-person offences.

Computers can help with alerting, for example by flagging all claims that exceed a pre-specified threshold. The obvious goal is to avoid large losses by paying close attention to the larger claims. The drawback is that the details of such practices (in particular the thresholds) will become apparent to fraudsters, who may then adjust and file claims just below the threshold. More sophisticated monitoring may employ additional thresholds (such as claim rates) as well as other measures designed to capture fraudulent practices (for example age constraints, which would identify a case of chiropractic treatment for a six-month-old baby). As a benefit of this “computer monitoring” approach, fraud may be detected before payment is made, which is much more effective than trying to retrieve the payments later.

Education of consumers or clients is the other means of alerting investigators to suspicious cases. In health insurance, a method as simple as sending the beneficiary the claim statement from the doctor may be effective. It should, for example, detect those cases of fraud where health service numbers or health insurance cards are stolen and used for fraudulent billing. On the other hand, even in this clear setting there are problems. People who report apparently inflated claims run the risk of antagonizing their doctors. They may also misunderstand the claim and fail to make reports, or make mistaken reports. Lastly, providers in many countries may have a long billing cycle, which makes it difficult for patients to recall the specifics of a situation.

The strengths and weaknesses of the two approaches are complementary. The automated computer monitoring approach lacks knowledge of what actually happened but offers a degree of objectivity and certainty which is sometimes lacking in monitoring by patients. Consequently, both approaches may be required.

Both approaches suffer, however, because most of the actual investigative effort consists of manual exploration of cases. This applies to some of the common triggers for suspecting irregularities, such as the absence of witnesses to an accident, lengthy recovery periods, unusual medical treatments, improperly issued insurance policies, cash transactions, lack of cooperation, excessive demands by claimants, and large numbers of patients seen on weekends. Some of the triggers (such as claim forms heavily altered with corrective fluid) are even more difficult to automate. Investigation often takes months, if not years. If computer monitoring could be enhanced, so that suspicious claims were not simply flagged but fraudulent cases could be identified with some reliability, this would help to expedite the fraud discovery process.

Fraud detection and prevention can be greatly facilitated by legal reporting requirements that help to make it clear to potential fraudsters that their schemes are unlikely to succeed, and make it easier to detect fraud when it occurs. For example, to counter money laundering, companies dealing in high-value items such as jewellery and vehicles may be required to report more details of the identity of customers and the method of payment. Doctors can be required to provide specific details on patient case histories.
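The “computer monitoring” described in this section amounts to a small set of fixed rules applied to each claim. The following is a minimal sketch in Python, not a description of any particular product. The record fields, the amount threshold of 10,000 and the minimum age of 24 months for chiropractic treatment are hypothetical values chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical values chosen for illustration only; they do not come from the paper.
AMOUNT_THRESHOLD = 10_000.0
MIN_AGE_MONTHS = {"chiropractic": 24}  # assumed minimum plausible patient age per treatment


@dataclass
class Claim:
    claim_id: str
    amount: float
    treatment: str
    patient_age_months: int


def flag_claim(claim):
    """Return the list of reasons (possibly empty) for referring a claim to an investigator."""
    reasons = []
    if claim.amount > AMOUNT_THRESHOLD:
        reasons.append("amount above threshold")
    min_age = MIN_AGE_MONTHS.get(claim.treatment)
    if min_age is not None and claim.patient_age_months < min_age:
        reasons.append("patient too young for treatment")
    return reasons


claims = [
    Claim("A", 12_500.0, "physiotherapy", 480),
    Claim("B", 9_990.0, "physiotherapy", 480),  # just below the threshold: not flagged
    Claim("C", 300.0, "chiropractic", 6),       # treatment for a six-month-old baby
]
flagged = {c.claim_id: flag_claim(c) for c in claims if flag_claim(c)}
```

Claim B illustrates the weakness noted above: once the threshold is known, a claim filed just below it passes unnoticed.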


Fraud Detection with Data Mining

Data mining offers a range of techniques that go well beyond computer monitoring and can identify suspicious cases based on patterns that are suggestive of fraud. These patterns fall into the following categories:

1. Cases that are unusual in some sense, for example unusual combinations of clinical treatment, unusually high sales prices, or an unusually high number of accident claims.

2. Unexplained relationships between otherwise apparently unrelated cases, such as a high proportion of accident victims being treated by the same doctor, real estate sales involving the same group of people, or different organizations with the same address.

3. Common characteristics of fraudulent cases, such as intense shopping or calling behaviour involving items or locations that have not appeared in the past, or billing for treatments and procedures that a doctor rarely billed for in the past.

These patterns and their discovery are detailed in the following sections. Most of these approaches attempt to deal with “non-revealing” fraud. Only in cases of “self-revealing” fraud (such as stolen credit cards) will it become known at some time in the future that certain transactions were fraudulent. At that point only a reactive approach is possible, since the damage has already occurred; the known cases may, however, provide a basis for generalizing and so help to detect fraud when it recurs in similar circumstances (Approach 3, below). In order to facilitate fraud discovery in other situations, governments and their agencies are enforcing special reporting requirements in a variety of areas (such as the logging of bank transactions in order to detect money laundering). These requirements generate additional data that can be exploited with data mining.

Approach 1: Unusual Data

In the first set of examples above the data is unusual in some respect: unusual combinations of otherwise quite acceptable entries; a value that is unusual with respect to a comparison group; or a value that is unusual in and of itself. The last case is probably the easiest to deal with and is an example of “outlier analysis”. We are interested here only in outliers that are unusual but still acceptable values, for example a value of 12 for the number of accidents reported by a policyholder. The entry of a negative number would simply be a data error in this case, and would presumably bear no relationship to fraud. A “legitimate” but unusually high value could be detected by outlier analysis or simply by employing descriptive statistics tools, such as measures of mean and standard deviation, or a box plot; for categorical values the same measures applied to the frequency of occurrence would be a good indicator.

Somewhat more difficult is the detection of values that are unusual only with respect to a reference group. The significance of the number of accidents in the foregoing example was either based on common experience or its reference group was irrelevant to the problem at hand. However, in the case of a real estate sales price, the reference group is highly significant: the price as such may not be very high for houses in general, but it may be high for dwellings of the same size and type in a given location and economic market. These reference groups are implicitly specified, at best, when the data are recorded. The judgement that the value is unusual would only become apparent through data analysis techniques, such as considering neighbourhood data or comparing the sales price to the prices of similar houses. The technique for this situation might therefore involve the determination of a reference value by means of cluster analysis, followed by an outlier analysis to detect cases deviating from this reference value.

The detection of unusual combinations of values (such as unusual combinations of diagnostic or therapeutic interventions) may become involved, requiring substantial programming in addition to the data mining routines. Data mining algorithms are much better at the opposite, namely determining common combinations, including combinations over time (such as treatments given to a patient over a certain period). These are called “associations” and “sequences”. However, there may be a huge number of combinations that are quite ordinary, even though they may not be quite as common as others. When the technique is applied to the problem of identifying potentially fraudulent medical bills, many of the bills may represent rare treatment combinations, perhaps because of the length of the billing cycle rather than the treatment itself. It is more difficult to find combinations that are automatically indicative of fraud. In practice, therefore, you need to rely on medical knowledge in order to focus on particular combinations that should not occur. This requires augmenting the general-purpose association algorithms with post-processing: sifting through associations and sequences and testing for the presence or absence of items in a rule-like fashion. Depending on the rules satisfied, each such bill can then be scored for the probability of fraud.

Unusual data is not in itself a reliable indication of fraud. Using data mining in this way will extend the concept of computer monitoring, feeding suspicious cases for manual investigation. In some of these cases the investigation may succeed in proactively avoiding an impending fraudulent event.
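The reference-group comparison described in this section can be sketched in a few lines of Python. This is an illustrative assumption-laden sketch, not the method used by any SAS tool: the reference group is taken as already known (a hypothetical `segment` field) rather than derived by cluster analysis, and distance is measured in units of the group's median absolute deviation (MAD) with an arbitrary cutoff of 5.

```python
from collections import defaultdict
from statistics import median


def reference_group_outliers(records, key, value, cutoff=5.0):
    """Flag records whose value lies far from the median of their reference group.

    Distance is measured in units of the group's median absolute deviation (MAD).
    The default cutoff of 5.0 is an arbitrary illustrative choice.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    flagged = []
    for members in groups.values():
        values = [m[value] for m in members]
        centre = median(values)
        mad = median(abs(v - centre) for v in values)
        if mad == 0:
            continue  # no spread in this group, so this simple score is undefined
        for m in members:
            if abs(m[value] - centre) / mad > cutoff:
                flagged.append(m)
    return flagged


# Hypothetical sales prices (in thousands) for two made-up market segments.
suburb = [200, 210, 190, 205, 195, 400]
centre_flats = [400, 410, 390, 405, 395, 420]
sales = [{"id": i + 1, "segment": "suburb", "price": p} for i, p in enumerate(suburb)]
sales += [{"id": i + 7, "segment": "centre", "price": p} for i, p in enumerate(centre_flats)]

suspicious = reference_group_outliers(sales, key="segment", value="price")
```

Only the suburban sale at 400 is flagged. The same price is entirely ordinary in the other segment, so a single global threshold would either miss it or flag every city-centre sale.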

Approach 2: Unexplained Relationships

In the three examples in (2) above, the unexplained relationships may occur because apparently unrelated records have the same values for some of the fields (such as the same doctor or the same address), or turn out in fact to be related (for example, a group of people exchanging property).

The first possibility is the simpler of the two. The coincidence of values must be genuinely unexpected, discarding such obvious similarities as the same sex or nationality. For example, in a suspected case of money laundering, funds may be transferred between two or more companies. It would be unusual if some of the companies in question had the same mailing address (“mail box” companies). Given that the stored transactions may consist of hundreds of variables and that there may be a large number of transactions, such similarities are unlikely to be noticed unless they are specifically looked for. From a computational perspective, the problem is in principle simple. It is merely necessary to count the frequency of occurrence of values in the variables selected. Typically in a table of companies, each address should be unique, occurring once. In practice, it may not be so simple to carry this out for large amounts of data and variables with a huge number of values, such as account numbers or addresses. When this technique is applied to many variables and/or variable combinations, an automated tool is indispensable. Again, positive findings do not necessarily indicate fraud but suggest that further investigation is necessary.

A variant of this technique involves finding “almost duplicate” records. In the case above this might apply to companies with slightly different addresses, but otherwise pretty much identical information. In this situation it may be useful to perform a cluster analysis with a high number of clusters on the variables in question. The clusters obtained can be inspected for suspicious groupings (for instance doctors submitting claims under different names, such as a maiden name) and investigated in greater detail.

Other scenarios require the identification of connections between records. In a simple case, the connection may be through a single record, for example cases where supposed victims of car accidents seek treatment by the same doctor. Such a connection can be uncovered by the “filter outlier” technique discussed above, focusing on the “treated by” field of all accident claims and discarding all records where the doctors listed in the “treated by” field treated only a single or very few accident patients. The doctors remaining on the list are scrutinized further. This is an iterative process and may result in some number of false leads; but accident schemes such as this are usually geographically localized, and therefore the size of the database and the complexity of result management are limited.

A more complex case exists when the connection between records occurs through multiple intervening records. An example is a situation where a group of people keeps selling houses to one another with increasingly inflated prices at each transaction, as a preparatory step for mortgage fraud. Identifying this collection of records would require navigating from seller to buyer to buyer and eventually closing the circle. As most buyers of a property will have sold their previous property, this can be a data-intensive search process. It can be programmed explicitly, for example using SQL to construct a database query. There are also special-purpose tools for this sort of “link analysis”.
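The “closing the circle” search can be illustrated with a small explicit program. The sketch below is written in Python rather than SQL and is not a special-purpose link analysis tool. It assumes transactions reduced to hypothetical (seller, buyer) pairs and ignores prices, dates and property identifiers, all of which a real investigation would need. People are grouped into a ring when a chain of sales leads from each of them to the other.

```python
def sale_rings(transactions):
    """Group people who can reach one another through chains of sales.

    `transactions` is an iterable of (seller, buyer) pairs. Returns a list of
    rings, each a sorted list of at least two names.
    """
    graph = {}
    for seller, buyer in transactions:
        graph.setdefault(seller, set()).add(buyer)
        graph.setdefault(buyer, set())  # pure buyers are nodes too

    def reachable(start):
        # Everyone who can be reached from `start` by following sales forward.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {person: reachable(person) for person in graph}
    rings, assigned = [], set()
    for person in sorted(reach):
        if person in assigned or person not in reach[person]:
            continue  # already grouped, or not on any closed chain of sales
        ring = sorted(p for p in reach[person] if person in reach[p])
        assigned.update(ring)
        if len(ring) > 1:
            rings.append(ring)
    return rings


# Made-up names: Ann, Bob and Cid sell in a circle; the other sales are ordinary.
sales = [("Ann", "Bob"), ("Bob", "Cid"), ("Cid", "Ann"), ("Cid", "Dee"), ("Eve", "Fay")]
rings = sale_rings(sales)
```

The reachability search is repeated for every person, so the cost grows roughly quadratically with the number of people. That is acceptable for a localized investigation but is one reason dedicated tools exist.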

Approach 3: Generalizing Characteristics of Fraud

Once specific cases of fraud have been identified you can use them to help predict which other transactions are likely to be fraudulent. These transactions may have already happened and been processed, or they may occur in the future. In both cases, this type of analysis is called “predictive data mining”. Applying this technique requires a sufficiently large set of transactions to allow generalization and the building of predictors, usually employing one of three data mining tools: regression, decision trees and neural networks. Depending on the tool used, the predictors may be more numerically oriented (regression, neural nets) or may be similar to ordinary rules of thumb, expressed, of course, in domain-specific detail.

As useful predictors begin to emerge, they can be applied to historical databases and help identify fraudulent transactions that have so far gone unnoticed. As the collection of identified fraud cases grows over time, the quality and reliability of the predictors is likely to improve and eventually stabilize. The potential advantage of this method over all the alternatives previously discussed is that its reliability can be statistically assessed and verified. If the reliability is high, then most of the investigative effort can be concentrated on handling the actual fraud cases, rather than on screening many cases which may or may not be fraudulent.
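To make the idea of a learned “rule of thumb” concrete, the sketch below fits the simplest possible decision tree, a single split (often called a decision stump), to a handful of labelled transactions. It is an illustration only: the feature names and data are invented, only rules of the form “feature above threshold means fraud” are considered, and a real model would be validated on data not used for fitting.

```python
def fit_stump(rows, labels):
    """Learn a one-split rule "predict fraud when feature > threshold".

    rows: list of dicts mapping feature name to a number
    labels: list of booleans, True for known fraud cases
    Returns (feature, threshold, errors) for the split with the fewest
    misclassified training rows.
    """
    best = None
    for feature in rows[0]:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds: midpoints between neighbouring distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            errors = sum(
                (row[feature] > threshold) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[2]:
                best = (feature, threshold, errors)
    return best


def predict(stump, row):
    """Apply a fitted stump to one transaction."""
    feature, threshold, _ = stump
    return row[feature] > threshold


# Invented history: card usage summaries with a known outcome.
history = [
    {"purchases_per_day": 1, "amount": 40},
    {"purchases_per_day": 2, "amount": 900},  # large but legitimate purchase
    {"purchases_per_day": 1, "amount": 60},
    {"purchases_per_day": 9, "amount": 850},
    {"purchases_per_day": 12, "amount": 30},  # many small test purchases
    {"purchases_per_day": 8, "amount": 700},
]
was_fraud = [False, False, False, True, True, True]

stump = fit_stump(history, was_fraud)
```

On this toy data the fitted rule is “more than five purchases per day”, which separates the known cases without error. The error count on the fitting data is optimistic, which is why the statistical assessment mentioned above should use held-out cases.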

Summary of Data Mining Fraud Detection Techniques In the foregoing section, a number of fraud detection goals have been discussed. Table 1 maps these goals to appropriate data mining techniques. Task

Goal

Data Mining Technique

Find Unusual Data

Detect records with globally abnormal values

Outlier analysis

Detect multiple occurrences of values Detect direct link between records Identify Unexplained Relationships

Detect records with abnormal reference values

Cluster analysis and outlier analysis

Determine profile of suspects (e.g. claim hoarders)

Cluster analysis

Detect duplicate and almost duplicate records

Generalizing Characteristics of Fraud

Detect indirect links between records

Social network or link analysis

Detect records with abnormal value combinations

Associations and/or sequences

Find criteria, such as rules, for detecting fraud, based on historical data

Predictive modelling

Score transactions for likelihood of fraud

Table 1: Data mining fraud detection techniques

In Table 2, some common fraud scenarios are mapped to particular combinations of data mining techniques that can be used to discover fraud in each scenario:

Fraud Scenario | Data Mining Techniques Used
--- | ---
Detect over-ordering of medical tests. | Uncover associations between tests that are commonly performed together; this technique will identify doctors who habitually prescribe unnecessary test or treatment combinations. Perform doctor clustering with the above results, identifying groups of doctors with similar prescription or treatment behaviour. Target them for cost-saving education.
Detect fictitious bills: billing by fraudulent agency, unknown to doctors. | Use associations to generate common billing patterns for each doctor; for each new bill issued by a doctor, compare each patient’s treatments against the billing pattern and flag, count, and score unusual combinations. Use outlier analysis to detect direct links between billing companies, such as multiple occurrences of company address or executive names.
Detect fictitious bills: billing for impossible claims (such as sex or age-specific treatments). | Outlier analysis for globally impossible values in different reference groups.
Detect fictitious bills: billing oneself as a patient (e.g. using maiden name). | Outlier analysis, finding a link between patient’s and doctor’s record by detecting multiple occurrences of values for presumed unique entries, such as telephone numbers and addresses.

Table 2: Data mining techniques used in different fraud scenarios
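The last scenario in Table 2 depends on finding values that should be unique but occur more than once. The following Python sketch illustrates that check in a minimal form. The record layout, the field names (`role`, `name`, `phone`) and the sample data are assumptions made for illustration; they are not taken from the analysis described in this paper.

```python
from collections import defaultdict

# Hypothetical records: each has a role, a name and a phone number.
records = [
    {"role": "doctor", "name": "A. Smith", "phone": "555-0101"},
    {"role": "patient", "name": "A. Jones", "phone": "555-0101"},
    {"role": "patient", "name": "B. Brown", "phone": "555-0199"},
    {"role": "doctor", "name": "C. Green", "phone": "555-0142"},
]


def shared_values(rows, field):
    """Group records by a presumed-unique field; keep values seen more than once."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    return {value: group for value, group in groups.items() if len(group) > 1}


def doctor_patient_links(rows, field):
    """Return values of `field` shared by at least one doctor and one patient."""
    links = {}
    for value, group in shared_values(rows, field).items():
        roles = {row["role"] for row in group}
        if {"doctor", "patient"} <= roles:
            links[value] = [row["name"] for row in group]
    return links


suspicious = doctor_patient_links(records, "phone")
print(suspicious)
```

In practice the same check would be run on several presumed-unique fields (addresses, bank account numbers) and would need some normalization of the values before comparison.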


Case Study: Detecting Mortgage Fraud

Misrepresentation of real-estate values can be used in property insurance fraud or mortgage fraud. In a common scheme, houses are bought by a group of people and then resold among one another at inflated prices; the resulting (virtual) equity is claimed in property losses, used to take out mortgages, or resold to unsuspecting buyers. A sample analysis discovering this kind of fraud is presented in this section.

The first stage is to build a data set consisting of transaction records showing seller, buyer, price, date and a real estate object ID, which can be mapped to or derived from an address. In this example, the concentration of fraudulent transactions revealed at each stage is typical of real findings using Enterprise Miner. We start by supposing that five percent of the original set of transactions are actually fraudulent.

There are several approaches that can be taken. An obvious exploratory analysis would focus on properties whose value has risen well above the average increase and/or which have been subject to multiple sales.

Figure 1: Exploratory analysis and modelling with and without over-sampling

Exploratory Analysis

Multiple sales can easily be identified with the Filter Outlier tool of SAS Enterprise Miner (see Figure 1). Discarding all objects with fewer than three transactions results in a data set that contains 20 percent fraudulent transactions, a substantial increase over the initial five percent. In a large transaction database, however, it would still be difficult to identify the fraudulent records. Adding a time constraint on multiple sales increases the proportion of fraudulent records even more.

To identify objects with strong increases in value, it is convenient to construct a condensed data set with one record per object, showing starting and ending values over certain time periods or transaction histories. Using the Transform Variables node, additional variables such as value increase per time period and number of sales in the time period considered can be constructed. In this data set, fraudulent records make up approximately two percent of the records. Using the Filter Outlier tool to exclude objects that occur in only one transaction, and allowing only high value increases and transaction frequencies per time period, results in a concentration of approximately 25 percent fraud cases. In a large data set, this may still be too low for manual investigation; however, a smaller subset can be created and manually analysed in this way. Identified fraud cases can then be generalized in a larger data set using predictive modelling.
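The condensing and filtering steps of the exploratory analysis can be illustrated outside Enterprise Miner with a short Python sketch. It builds one summary record per real estate object and keeps objects with several sales and a large relative price increase. The tuple layout, the sample data and the 50 percent increase threshold are assumptions for illustration only; the minimum of three sales mirrors the filter mentioned in the text.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction records: (object_id, seller, buyer, price, sale_date).
transactions = [
    ("H1", "ann", "bob", 100_000, date(2001, 1, 10)),
    ("H1", "bob", "cat", 150_000, date(2001, 4, 10)),
    ("H1", "cat", "ann", 220_000, date(2001, 7, 10)),
    ("H2", "dan", "eve", 120_000, date(1999, 3, 1)),
    ("H2", "eve", "fay", 126_000, date(2001, 3, 1)),
    ("H3", "gus", "hal", 90_000, date(2000, 6, 1)),
]


def condense(rows):
    """Build one summary record per object from its transaction history."""
    by_object = defaultdict(list)
    for object_id, _seller, _buyer, price, sale_date in rows:
        by_object[object_id].append((sale_date, price))

    summary = {}
    for object_id, sales in by_object.items():
        sales.sort()  # chronological order
        (first_date, first_price), (last_date, last_price) = sales[0], sales[-1]
        # Use at least one day so a single sale does not divide by zero.
        years = max((last_date - first_date).days, 1) / 365.25
        summary[object_id] = {
            "n_sales": len(sales),
            "increase": (last_price - first_price) / first_price,
            "sales_per_year": len(sales) / years,
        }
    return summary


def flag(summary, min_sales=3, min_increase=0.5):
    """Keep objects with many sales and a large relative value increase.

    min_increase is an illustrative assumption, not a value from the paper.
    """
    return sorted(
        object_id
        for object_id, s in summary.items()
        if s["n_sales"] >= min_sales and s["increase"] >= min_increase
    )


summary = condense(transactions)
flagged = flag(summary)
print(flagged)
```

The flagged objects form the smaller subset that can be inspected manually before any predictive model is trained.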

Predictive Modelling

Predictive modelling attempts to generalize from training data. It requires historical or preclassified data that has already been analysed, with the fraudulent cases identified. The consolidated data (one record per real estate object) is enhanced by additional variables, such as the ratio of time period to the number of transactions in which an object was involved. The flow then splits into two parts, one working with the entire data set and the other with a sample of it. Since fraudulent cases are rare, sampling provides a means of concentrating their occurrence to the point where a generalization of their properties may become successful. Such is the case here, where a stratified sample using 20 percent of the data increases the concentration of fraudulent cases from two percent to ten percent.

The data are then partitioned into training and validation data sets and fed into a Variable Selection node, which assesses the relative importance of the variables with respect to the target. The model could be built using regression, decision tree or neural network tools. Overall, correct classification is at 97 percent; more importantly, all of the fraudulent cases are correctly classified.

This analysis shows how exploratory data mining can be used to initiate predictive mining, which in turn leads to the identification of the unknown fraudulent cases and the associated real estate objects. Knowing the objects then allows the implicated sellers and buyers to be retrieved from the original transaction file. It is important to realize that the transaction-based data set does not reveal any useful predictors: in this type of fraud, any single transaction looks perfectly legitimate.
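The effect of the stratified sampling step on the fraud concentration can be checked with a small Python sketch. It assumes that every fraudulent record is kept and that the remainder of the 20 percent sample is filled with randomly drawn non-fraudulent records. The paper does not state the exact sampling scheme, so this is one plausible reading; the data are synthetic.

```python
import random


def stratified_sample(records, is_fraud, sample_fraction, seed=0):
    """Keep every fraud record and fill the rest of the sample with
    randomly drawn non-fraud records.

    Assumption: all fraud cases are retained; the paper does not say so.
    """
    fraud = [r for r in records if is_fraud(r)]
    legit = [r for r in records if not is_fraud(r)]
    target_size = round(len(records) * sample_fraction)
    n_legit = max(target_size - len(fraud), 0)
    rng = random.Random(seed)
    return fraud + rng.sample(legit, min(n_legit, len(legit)))


# Synthetic data: 10,000 objects, 2 percent of them fraudulent.
data = [{"id": i, "fraud": i < 200} for i in range(10_000)]
sample = stratified_sample(data, lambda r: r["fraud"], sample_fraction=0.20)

share = sum(r["fraud"] for r in sample) / len(sample)
print(len(sample), share)
```

Under this assumption the arithmetic matches the figures in the text: 200 fraudulent records in a sample of 2,000 give a concentration of ten percent.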

Association and Sequence Analysis

The data set also contains groups and subgroups of people who are repeatedly involved in ring selling and buying of different properties. Association analysis is able to identify these groups.
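A much simplified form of this association analysis is to count how often two people appear together in the sales history of different properties. The Python sketch below does this for pairs only. The data and the support threshold of two properties are assumptions for illustration, and a full association analysis would also consider larger groups and confidence measures.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sales: (object_id, seller, buyer).
sales = [
    ("H1", "ann", "bob"), ("H1", "bob", "cat"), ("H1", "cat", "ann"),
    ("H2", "bob", "ann"), ("H2", "ann", "cat"),
    ("H3", "dan", "eve"),
    ("H4", "eve", "fay"),
]


def frequent_pairs(rows, min_support=2):
    """Return pairs of people who both appear in the sales history of
    at least `min_support` distinct objects."""
    people_per_object = defaultdict(set)
    for object_id, seller, buyer in rows:
        people_per_object[object_id].update((seller, buyer))

    support = defaultdict(int)
    for people in people_per_object.values():
        for pair in combinations(sorted(people), 2):
            support[pair] += 1
    return {pair: n for pair, n in support.items() if n >= min_support}


pairs = frequent_pairs(sales)
print(pairs)
```

Pairs with high support across several properties point to the groups of repeat participants described above.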


Link Analysis

For the type of fraud discussed here, link analysis would be the most direct way of identifying fraudulent chains. These types of graphs can be created with a combination of data manipulation and plotting using SAS/OR® software (see Figure 2).

Figure 2: Link plot sub-setted to circular activities

Note, however, that this particular code will only discover fully circular selling transactions, not other types of fraudulent chains, which are better uncovered with exploratory analysis. While link analysis is a useful technique, other techniques, such as associations, often achieve the same results.
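The SAS code behind Figure 2 is not reproduced in this paper. As an illustration of what detecting a fully circular chain involves, the following Python sketch follows the ownership history of each object and reports the first point at which a property returns to an earlier owner. The data layout and sample records are hypothetical, and the sales are assumed to be listed in chronological order.

```python
from collections import defaultdict

# Hypothetical sales in chronological order: (object_id, seller, buyer).
sales = [
    ("H1", "ann", "bob"), ("H1", "bob", "cat"), ("H1", "cat", "ann"),
    ("H2", "dan", "eve"), ("H2", "eve", "fay"),
]


def circular_chains(rows):
    """Return, per object, the first ownership chain that loops back
    to an earlier owner."""
    history = defaultdict(list)
    for object_id, seller, buyer in rows:
        history[object_id].append((seller, buyer))

    found = {}
    for object_id, chain in history.items():
        owners = [chain[0][0]]  # start with the first seller
        for _seller, buyer in chain:
            if buyer in owners:
                # Ownership has returned to an earlier owner: a closed ring.
                start = owners.index(buyer)
                found[object_id] = owners[start:] + [buyer]
                break
            owners.append(buyer)
    return found


rings = circular_chains(sales)
print(rings)
```

Like the approach described in the text, this sketch only finds closed rings; chains that inflate prices without returning to an earlier owner would go unnoticed.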

Conclusions

Data mining can bring different technologies to bear on complex issues, such as the identification of fraudulent transactions. It is usually necessary to use several of these technologies in order to solve the problem. The exact choice and mix of these technologies depends to a large extent on the specific application as well as on the characteristics of the available data.

In the examples presented above, predictive analysis was employed as well as several types of exploratory analysis. The predictive analysis results in SAS code that can be applied to new data to predict the likelihood that they are fraudulent. The various exploratory techniques attempt to directly identify the fraudulent transactions in the data at hand. Most successful in this regard were the association and link analysis techniques, which identified all fraudulent transactions.


Case Study 2: Credit Card Fraud

Data Model

This investigation begins with 4736 credit card transactions, containing approximately 20 percent fraudulent cases.

Analysis

A standard predictive mining flow is shown in Figure 3. It uses the Data Replacement node to impute missing values, the Data Partition node to generate training and validation data sets, and the Variable Selection node to bin variable values, determine predictive interactions and reject unneeded variables. Two modelling tools are used, namely the decision tree and the neural net; their results are compared in an Assessment node.
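The first two steps of this flow, replacing missing values and partitioning the data, can be sketched in a few lines of Python. Mean imputation and a 70/30 split are assumptions chosen for illustration; the paper does not state which replacement method or partition sizes were used, and the data are synthetic.

```python
import random
from statistics import mean


def impute_mean(rows, field):
    """Replace missing (None) values of `field` with the mean of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def partition(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Synthetic transactions; every fifth amount is missing.
rows = [{"id": i, "amount": None if i % 5 == 0 else float(i)} for i in range(20)]
clean = impute_mean(rows, "amount")
train, valid = partition(clean)
print(len(train), len(valid))
```

Keeping a separate validation set is what makes the later model comparison meaningful, since both models are assessed on records they were not trained on.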

Figure 3: Predictive modelling

The comparison of the results in the Assessment node can be visualized with a variety of charts and tables, the most popular of which is the “lift chart”, shown in Figure 4. In this chart, the data set records are sorted by their likelihood of being fraudulent, as predicted by the two models: the decision tree (middle curve) and the neural net (upper curve). The curves show that, of the ten percent of records that the neural network ranked as most likely to be fraudulent, approximately 78 percent are actually fraudulent. The corresponding figure for the decision tree is approximately 66 percent. The figures for the top 20 percent are approximately 61 percent and 50 percent respectively. This is much better than the baseline (lower curve), which represents the likelihood that a randomly drawn record is fraudulent. The results also show that the neural network performs better than the decision tree.
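The quantity plotted in such a chart can be computed directly from model scores and known outcomes. The Python sketch below sorts records by score, takes the top fraction and reports the share of fraudulent records in it, together with the lift relative to the overall fraud rate. The function names and the ten synthetic records are illustrative assumptions; this is not the Assessment node's implementation.

```python
def response_rate(scores, labels, top_fraction):
    """Share of true fraud cases among the `top_fraction` highest-scored records."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(label for _score, label in ranked[:n_top]) / n_top


def lift(scores, labels, top_fraction):
    """Response rate in the top slice divided by the overall fraud rate."""
    baseline = sum(labels) / len(labels)
    return response_rate(scores, labels, top_fraction) / baseline


# Synthetic example: 10 records, 2 of them fraudulent (20 percent baseline).
scores = [0.95, 0.90, 0.80, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

top20_rate = response_rate(scores, labels, 0.2)
top20_lift = lift(scores, labels, 0.2)
print(top20_rate, top20_lift)
```

If the baseline in the assessed data equals the roughly 20 percent fraud rate of the full data set, the figures quoted above correspond to a lift of about 0.78 / 0.20 = 3.9 for the neural network and 0.66 / 0.20 = 3.3 for the decision tree in the top ten percent.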


Figure 4: Lift chart comparing the results

A decision tree has the advantage, however, of being much easier to interpret than a neural network. This can be confirmed by looking at the results for the decision tree in Figure 5, which shows the first three splitting levels. For reasons of space, the decision tree has been partitioned at the root, and the left and right halves are shown in separate figures.

Figure 5a: First three levels of the left half of the decision tree


Figure 5b: First three levels of the right half of the decision tree

The most important variables are the credit card usage count, the amount spent during a day, and the ratio of that amount to the cumulated time intervals between transactions (in other words, a focus on high amounts spent during a short time). Most of the high fraud cases correspond to transactions with high usage and high amounts during a day.

In general, the more important tests for the target investigated (here fraud) occur higher up, near the root of the tree. In our case, the most important criterion differentiating fraudulent from non-fraudulent data is the daily usage count, that is to say, how often a card is used per day. The next most important test concerns the daily amount charged, followed by the ratio of amount charged to time passed since the last usage.
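The three variables named above can be derived from raw card transactions along the following lines. This Python sketch is one possible reading of those definitions: the paper does not give exact formulas, so the choice of hours as the time unit and the handling of a single transaction per day are assumptions, as are the tuple layout and sample data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical card transactions: (card_id, timestamp, amount).
transactions = [
    ("C1", datetime(2002, 3, 1, 9, 0), 40.0),
    ("C1", datetime(2002, 3, 1, 9, 30), 900.0),
    ("C1", datetime(2002, 3, 1, 10, 0), 1200.0),
    ("C2", datetime(2002, 3, 1, 8, 0), 25.0),
    ("C2", datetime(2002, 3, 1, 20, 0), 60.0),
    ("C3", datetime(2002, 3, 2, 12, 0), 15.0),
]


def daily_features(rows):
    """Compute usage count, total amount and amount per elapsed hour
    for each card and day."""
    groups = defaultdict(list)
    for card_id, ts, amount in rows:
        groups[(card_id, ts.date())].append((ts, amount))

    features = {}
    for key, items in groups.items():
        items.sort()
        total = sum(amount for _ts, amount in items)
        # The intervals between consecutive transactions add up to
        # the time between the first and the last one.
        hours = (items[-1][0] - items[0][0]).total_seconds() / 3600
        features[key] = {
            "usage_count": len(items),
            "daily_amount": total,
            # A single transaction has no interval; report None
            # instead of dividing by zero.
            "amount_per_hour": total / hours if hours > 0 else None,
        }
    return features


features = daily_features(transactions)
for key, values in sorted(features.items()):
    print(key, values)
```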

Conclusion

Predictive data mining allows the construction of a model that can be used to predict the likelihood that a transaction is fraudulent. In the case of decision trees, this model can easily be visualized as a sequence of tests or questions. These can be verified by business specialists and implemented in a program or in a manual early-warning system. Results from a neural net cannot be easily interpreted, but may outperform decision trees, as happened in this case. The comparison between the modelling tools takes place in the Assessment node and allows the selection of the most efficient model for a task. Each modelling tool will generate program code that can be applied to new data, thereby predicting the likelihood of fraud for future transactions, which is the purpose of this investigation.
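To show what "a sequence of tests implemented in a program" might look like, the following Python sketch applies three tests in the order of importance described for the decision tree: daily usage count first, then daily amount, then amount per unit of time. The thresholds, the risk labels and the exact nesting of the tests are hypothetical; the actual splits of the tree in Figure 5 are not given in the text.

```python
def fraud_risk(usage_count, daily_amount, amount_per_hour,
               max_count=5, max_amount=1000.0, max_rate=500.0):
    """Classify one card-day with a short cascade of tests.

    All thresholds are hypothetical placeholders, not values from the
    decision tree described in this paper.
    """
    if usage_count > max_count:
        # Heavy usage: the daily amount decides between high and medium risk.
        return "high" if daily_amount > max_amount else "medium"
    if (daily_amount > max_amount
            and amount_per_hour is not None
            and amount_per_hour > max_rate):
        # Moderate usage but a large amount spent in a short time.
        return "medium"
    return "low"


print(fraud_risk(8, 2500.0, 900.0))
print(fraud_risk(2, 2000.0, 800.0))
print(fraud_risk(1, 50.0, None))
```

Rules in this form are easy for business specialists to review, which is the interpretability advantage of decision trees noted above.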


Summary

Fraud is broad ranging and in many cases difficult to detect. For financial services companies it represents a serious threat to company profitability and public confidence. As the volumes of data held by financial services companies increase, traditional techniques of fraud detection are time-consuming and will not uncover many types of fraud. Therefore many companies are looking to their IT departments to assist them. However, most IT departments do not have the technology required to support the more complex investigations, which is a source of frustration for senior managers and IT professionals alike.

Fraud Detection is an integral part of the SAS Solution for CRM. It provides financial services companies with ways of identifying or predicting fraudulent transactions. In this white paper we have given two examples of how the solution works in practice. The solution is based on SAS software, the world’s leading software for predictive analysis. The data mining capabilities of SAS software have been packaged in Enterprise Miner, a GUI-based solution that supports joint projects between IT, analytical and other professionals in the financial services sector.


Appendix: Enterprise Miner

Data mining involves an interactive, iterative procedure in order to generate new information from data. SAS Institute defines data mining as “the process of selecting, exploring, and modelling large amounts of data to uncover previously unknown patterns of data for business advantage.” What is required to structure the data mining process is a framework of data mining tasks and the sequence of these tasks. SAS Institute defines this framework as the SEMMA Methodology. SEMMA (Sample, Explore, Modify, Model and Assess) describes a sequence of steps that may be followed during a data mining analysis. This logical superstructure provides users with a scientific, structured way of conceptualizing, creating, and evaluating data mining projects.

The graphical user interface (GUI) and functionality of Enterprise Miner are constructed to support this methodology. As the following figure (Figure 6) shows, the Tools window on the left contains all the analysis options, organized according to the SEMMA process. Users can choose the tool nodes either from the Tools window or by customizing their own Tool Bar at the top of the window. By dragging and dropping the tool nodes onto the diagram editor, users can construct a process flow diagram (PFD) of their own data mining project.

Figure 6: Enterprise Miner’s GUI, supporting the SEMMA Methodology

For a detailed description of the SEMMA Methodology, please refer to the SAS Institute White Paper, From Data to Business Advantage: Data Mining, SEMMA Methodology, and the SAS System, Cary, NC: SAS Institute Inc. (1997).


References and Further Reading

Topical

“Unwitting Doctors and Patients Exploited in a Vast Billing Fraud”, New York Times, February 6, 1998.
“Fraud Scheme Involving 91 Is Broken Up, Officials Say”, New York Times, October 14, 1998.
“Health Care’s Giant: Artful Accounting – A special report. Hospital Chain Cheated U.S. On Expenses, Documents Show”, New York Times, December 18, 1997.
“U.S. Auditing Five Hospitals In New York”, New York Times, April 5, 1998.
Hoffman, Thomas. “Empire strikes back against legacy system”, ComputerWorld, Vol. 30, No. 43 (October 1996), 12.
Hoffman, Thomas and Nash, Kim S. “Data mining unearths customers”, ComputerWorld, Vol. 29, No. 28 (July 1995), 1, 28.
Sparrow, Malcolm K. “License to Steal: Why Fraud Plagues America’s Health Care System”, Westview Press, 1996.
Way, Paul. “Decision time for decision support”, Insurance & Technology, Vol. 21, No. 8 (August 1996), 30-34.
Way, Paul. “Managing knowledge: the CIO’s next challenge”, Insurance & Technology, Vol. 21, No. 8 (August 1996), 52.
Williams, Nia. “Data mining with neural networks”, Insurance Systems Bulletin, Vol. 9, No. 7 (March 1994), 3-4.


SAS Institute White Papers

SAS Institute Inc., (1996), SAS Institute White Paper, SAS Institute’s Rapid Warehousing Methodology, Cary, NC: SAS Institute Inc.
SAS Institute Inc., (1999), SAS Institute White Paper, Finding the Solution to Data Mining – A map of the features of SAS® Enterprise Miner™ Software, Version 3, Cary, NC: SAS Institute Inc.
SAS Institute Inc., (1999), SAS Institute Best Practice Paper, Data Mining and the Case for Sampling: Solving Business Problems Using SAS® Enterprise Miner™ Software, Cary, NC: SAS Institute Inc.
SAS Institute Inc., (1999), SAS Institute Solution Overview, The SAS® Solution for Customer Relationship Management, Cary, NC: SAS Institute Inc.
SAS Institute Inc., (1997), SAS Institute White Paper, From Data to Business Advantage: Data Mining, SEMMA Methodology, and the SAS System, Cary, NC: SAS Institute Inc.
SAS Institute Inc., (1997), SAS Institute White Paper, Business Intelligence Systems and Data Mining, Cary, NC: SAS Institute Inc.


SAS UK & Ireland Wittington House Henley Road, Medmenham Marlow, Bucks SL7 2EB Tel: +44 (0)1628 486933 Fax: +44 (0)1628 483203 www.sas.com/uk

SAS International PO Box 10 53 40 Neuenheimer Landstr. 28030 D-69043 Heidelberg, Germany Tel: (49) 6221 4160 Fax: (40) 6221 474850

SAS World Headquarters SAS Campus Drive Cary, NC 27513 USA Tel: (919) 677 8000 Fax: (919) 677 4444 www.sas.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright ©2002, SAS Institute Inc. All rights reserved.


0166UK1001
