Machine learning for fraud detection: fighting crime with algorithms

August 30, 2022

Andrea Di Stefano

Technology Research Analyst

Some of you may associate the futuristic concept of crime prediction with Tom Cruise's methodical hand movements in the 2002 sci-fi blockbuster Minority Report. Needless to say, our current high-tech arsenal is far from utilizing the interactive projections of impending crimes portrayed in that movie.

What we can do right now, however, is use machine learning algorithms to identify the hidden clues that scammers and cybercriminals leave while preparing or carrying out their fraud attempts, and stop them before it's too late.

Let's unveil the secrets of this technology and find out why using machine learning advisory may be the key to countering ever-evolving fraud threats.

Machine learning in fraud detection: an overview

Machine learning-based fraud detection and fraud prevention systems rely on ML algorithms that can be trained with historical data about previous examples of fraud and autonomously understand the characteristic patterns of these events to recognize them once they recur.

Therefore, such tools don't need to be manually instructed with hundreds of rules to detect fraud cases, unlike traditional rule-based solutions which follow a pretty straightforward "if/then" logic and trigger a response when one of the rules is violated.
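
To make the contrast concrete, here is a minimal sketch of what a rule-based check might look like. The `Transaction` fields, the `SPENDING_LIMIT` value, and the rule names are all hypothetical, chosen only for illustration and not taken from any specific product.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float       # purchase amount in the card's currency
    country: str        # country where the card was used
    home_country: str   # cardholder's usual country


# Hypothetical hand-written threshold: every rule is a plain "if/then" check.
SPENDING_LIMIT = 1_000.0


def rule_based_flags(tx: Transaction) -> list[str]:
    """Return the name of every rule the transaction violates."""
    flags = []
    if tx.amount > SPENDING_LIMIT:
        flags.append("over_spending_limit")
    if tx.country != tx.home_country:
        flags.append("unusual_location")
    return flags


dinner_abroad = Transaction(amount=1_250.0, country="FR", home_country="US")
print(rule_based_flags(dinner_abroad))
```

Each new fraud pattern would require another hand-written `if` branch, which is exactly the maintenance burden that ML-based systems aim to reduce.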

ML systems can even predict imminent criminal actions by identifying anomalies, namely suspicious and unconventional behavioral patterns that deviate from the norm, which could be clues to upcoming fraud.
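
As a rough illustration of anomaly detection, the sketch below scores a new transaction by how far it deviates from a customer's past spending. It uses a simple z-score as a stand-in for a real model; the sample amounts and the `Z_THRESHOLD` cut-off are assumptions made for the example.

```python
from statistics import mean, stdev


def anomaly_score(history: list[float], new_amount: float) -> float:
    """Number of standard deviations between new_amount and the historical mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least 2 values
    if sigma == 0:
        return 0.0 if new_amount == mu else float("inf")
    return abs(new_amount - mu) / sigma


past_amounts = [42.0, 38.5, 51.0, 45.2, 40.3, 47.9]  # made-up spending history
Z_THRESHOLD = 3.0  # assumed cut-off; a real system would tune this value

for amount in (44.0, 900.0):
    score = anomaly_score(past_amounts, amount)
    print(amount, "anomalous" if score > Z_THRESHOLD else "normal")
```

Production systems usually combine many behavioral signals rather than a single amount, but the underlying idea of measuring deviation from the norm is the same.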

Another significant feature of machine learning algorithms is their ability to improve with experience, i.e. to refine their data models (the mathematical representations of previously identified recurring patterns and anomalies) over time, as they are "fed" with new data. This means that, as soon as they encounter a new fraud scenario, machine learning-based anomaly detection systems will update these models with fresh information to quickly adapt to such threats.

This adaptability and grasp of context also prove useful in the opposite scenario, namely when it comes to distinguishing between an actual crime and a perfectly legitimate event. For example, traditional anti-fraud software may flag users who exceed a certain spending threshold with their credit cards or use such cards in unusual locations.

A machine learning algorithm, on the other hand, will also take into account that the owner used the same credit card to purchase a flight to that specific location a few weeks earlier. Therefore, our guy is likely to be a lucky tourist enjoying his holidays in an expensive restaurant instead of a dangerous scammer.

Rule-based vs ML-based fraud detection and prevention

How it works: ML types and algorithms

Machine learning is a vast domain encompassing several algorithms which may prove more or less accurate depending on the field of application. Before describing some of the most common algorithms leveraged in fraud detection systems, however, we should define the major subcategories into which they can be classified based on their underlying architecture and training approach.

Machine learning vs deep learning

As for the first point, experts distinguish between machine learning as a whole and its more advanced branch, known as deep learning. While we've already mentioned the basic functioning of "traditional" machine learning, it's worth highlighting the peculiar nature of deep learning and its intimate connection with extremely complex sets of algorithms known as deep neural networks.

Such structures, comprising interconnected layers of artificial neurons loosely modeled on the human brain's architecture, can process data flowing from one layer to the next and spot even the most subtle patterns and features hidden within an immense amount of data points. This particular process makes neural networks extremely effective in detecting fraud, but it also requires enormous computing power and large data sets for proper training. And this brings us to the second point, which is training.

Supervised learning vs unsupervised learning

Do machines need a teacher to learn, or can they just observe the surrounding reality? The two main approaches to training and using machine learning algorithms, called supervised and unsupervised learning, answer these questions quite differently:

  • Supervised learning: An ML-based fraud detection system is trained with large amounts of labeled data, that is, data previously annotated to describe its key features. In our scenario, this may be data from legitimate and fraudulent transactions labeled ‘non-fraud’ and ‘fraud’, respectively. These labeled datasets, which require a rather time-consuming manual tagging procedure, provide our system with both the input (transaction data) and the desired output (groups of classified examples), allowing algorithms to identify which patterns and relationships connect them and apply such findings to classify future cases.
  • Unsupervised learning: No clues and no human intervention this time. We feed algorithms with unlabeled transaction data and let them autonomously group such transactions into different clusters based on their similarities (shared behavioral patterns) and differences (typical vs unusual patterns, which can correspond to fraudulent operations). This approach, often paired with deep learning, can be computationally demanding but may be the only choice when facing fraud attempts that have never been seen before and are therefore unlabeled.

Supervised vs unsupervised learning
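
The unsupervised idea of grouping unlabeled transactions can be sketched with a tiny one-dimensional k-means using two clusters. This is a simplified stand-in for real clustering algorithms; the function name `two_means` and the sample amounts are hypothetical, and real systems would cluster on many features at once.

```python
def two_means(values: list[float], iterations: int = 20) -> tuple[float, float]:
    """Tiny 1-D k-means with k=2; returns the two cluster centers (low, high)."""
    low, high = min(values), max(values)  # simple deterministic initialization
    for _ in range(iterations):
        # Assign each value to the closer of the two centers.
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the mean of its group.
        low = sum(low_group) / len(low_group)
        if high_group:
            high = sum(high_group) / len(high_group)
    return low, high


# Unlabeled, made-up transaction amounts: nobody told the algorithm which are odd.
amounts = [12.0, 15.5, 9.9, 14.2, 11.0, 980.0, 1020.0]
typical, unusual = two_means(amounts)
labels = [
    "unusual" if abs(a - unusual) < abs(a - typical) else "typical"
    for a in amounts
]
print(labels)
```

No ‘fraud’ or ‘non-fraud’ labels are supplied here: the two groups emerge from the data alone, and an analyst would still need to decide whether the smaller cluster actually corresponds to fraud.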

Fraud detection algorithms

Identifying the top-performing algorithm or a "cocktail" of algorithms to perform fraud-related data analysis is not a simple task, as it may depend on the scenario in which we operate. Here are some of the options on the table:

  • Logistic regression: A supervised learning algorithm which calculates the probability of one event out of two alternatives, such as "fraud" and "non-fraud", based on a set of relevant parameters.
  • Decision tree: Another algorithm of the supervised learning subset, adopting a tree-like decision-making model in which every bifurcation represents the analysis of a certain metric or condition (spending threshold, location, etc.) to determine whether an operation is fraudulent.
  • Random forest: A combination of several decision trees to further expand the amount of data types and conditions examined and identify non-linear relations among multiple variables.
  • Support vector machine: Several systems rely on this supervised learning algorithm for credit card fraud detection because of its strong performance on high-dimensional data, although training becomes computationally demanding as datasets grow.
  • K-nearest neighbors: This supervised learning algorithm, which has proved quite accurate but can be difficult to interpret when it makes a given decision, can frame the nature of an event (be it fraud or non-fraud) by comparing it with similar occurrences recorded in the past.
  • Neural networks: Thanks to their complex, multi-layered architecture and superior data analysis capabilities, neural networks are the go-to algorithms when it comes to spotting non-linear relations and dealing with unprecedented fraud scenarios through unsupervised learning.
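
As an example of how one of these algorithms works in practice, here is a minimal k-nearest neighbors sketch in plain Python. The two features (amount and hour of day), the toy labeled history, and the choice of `k=3` are assumptions made for illustration only.

```python
import math
from collections import Counter

# Toy labeled history: (amount, hour of day) -> label. Values are made up.
labeled_history = [
    ((25.0, 13.0), "non-fraud"),
    ((40.0, 18.0), "non-fraud"),
    ((32.0, 12.0), "non-fraud"),
    ((900.0, 3.0), "fraud"),
    ((1200.0, 4.0), "fraud"),
    ((950.0, 2.0), "fraud"),
]


def knn_predict(point, history, k=3):
    """Label a new event by majority vote among its k closest past events."""
    nearest = sorted(history, key=lambda item: math.dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_predict((1000.0, 3.0), labeled_history))  # large night-time purchase
print(knn_predict((30.0, 14.0), labeled_history))   # small daytime purchase
```

Note that the features are left unscaled here, so the amount dominates the distance; a real pipeline would normally normalize features before computing distances.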

The benefits of ML in fraud detection

The distinct mechanisms driving ML-based fraud detection and fraud prevention systems make them superior to rule-based solutions in several respects.

1. Higher flexibility and reactivity

The rule-based approach is not flexible enough to deal with rapidly evolving fraud patterns. After all, the rule sets are manually coded and built on previous fraud scenarios, so they must be continuously adjusted as new types of events emerge.

Thanks to their self-learning abilities and sheer processing speed, ML-driven systems can handle this process far more quickly than humans by adjusting machine learning models on their own based on new kinds of threats.

Example: Capgemini

Capgemini reported that customers adopting its ML-based CPP Fraud Analytics software for credit card fraud detection and prevention have benefited from a 50% to 90% increase in detection rate and a reduction of up to 70% in investigation time per fraud case.

2. Lowering the rate of false positives

The traditional approach tends to follow a "black or white" mindset. This results in a massive number of false positives that don't represent a real threat but still need to be manually double-checked with expensive and time-consuming procedures.

Machine learning-based systems, on the other hand, are far more accurate and cost-efficient, as they can take into account a larger number of variables and figure out the general context. This helps to grasp some nuances, understand the logic behind a suspicious case and avoid false alarms.

Example: Danske Bank

Danske Bank saw similar improvements after implementing an ML-based anti-money laundering solution, which led to a 60% reduction in false positives and a 50% increase in true positives.

3. Wider data pool and scalability

Statistical and rule-based fraud detection methods can process structured data, such as transaction figures, but may struggle when handling unstructured data, for example, written reports, insurance claims, and pictures from IDs and other documents.

By contrast, machine learning and its related technologies, such as natural language processing and computer vision, can deal fairly well with any kind of data, greatly widening the pool of information from which to draw useful insights. Furthermore, ML systems are highly scalable since they get more accurate as they process new data and refine their models. This implies that they actually benefit from wider datasets, unlike traditional methods, which only risk being overwhelmed.

Example: Chola MS

As reported by Accenture, the Indian insurance firm Chola MS provided its surveyors with Samsung tablets to collect survey data (including audio descriptions, written notes, and images) and store it in corporate databases. An ML-powered fraud detection solution autonomously compares this data with customers' emails and photos to identify potential inconsistencies, thereby speeding up the claims survey process.

4. Superior compliance

The relatively low reactivity and flexibility of the rule-based approach, combined with the need for constant human intervention, clash with increasingly strict trading, fiscal, and data-management regulations.

Here too, machine learning can lend a hand by ensuring greater speed and accuracy in fraud detection procedures, while minimizing the possibility of human error that may result in investigations or penalties.

Example: Nasdaq

Nasdaq adopted an ML-based solution to recognize fraudulent equity orders, report them to the necessary authorities, and thus ensure transparent markets. Such a system also allows the probing of unfamiliar trading patterns, shedding light on suspicious events which may end up being previously unknown types of fraud.

It [points] out what we call an ‘interesting event’. It’s not necessarily a prohibited activity, but it’s what the model has deemed to be interesting because it’s not normal market behavior.

Mike O'Rourke

Senior Vice President, Head of Artificial Intelligence and Investment Intelligence Technology, Nasdaq

The current fraud detection landscape

In recent years, the general shift towards a fully digitalized economy seems to have been more of a catalyst than a brake on criminals’ inventiveness and initiative, making machine learning’s aforementioned strengths a potential game changer in the fight against fraud.

According to PwC's 2022 Global Economic Crime and Fraud Survey, for example, 46% of the respondents experienced at least one type of fraud over the previous 24 months. The study also reported a rising threat from external perpetrators, who typically leverage the new technologies adopted by modern businesses, such as ecommerce platforms and social media, as a hidden door to access users’ assets and personal data.

Sadly, the adoption of ML-based countermeasures to address this massive issue has been rather limited. According to ACFE's 2022 Anti-Fraud Technology Benchmarking Report, for example, only 17% of organizations surveyed used AI and machine learning to detect and deter fraud.

Such stats portray a fraud detection arsenal that desperately needs some kind of updating, in view of the major changes that are making traditional anti-fraud systems no longer as effective as they used to be. At the same time, the report predicts a positive future trend, with the use of AI and ML expected to more than double over the next couple of years.

Adoption rate of anti-fraud tools

Machine learning in top fraud scenarios

After clarifying the advantages of machine learning over traditional approaches and the current state of ML adoption, let's take a look at some of the main fraud scenarios where this technology can be deployed to keep criminals at bay.

1. Market manipulation

Needless to say, financial institutions have come to understand the synergistic potential between the stock market and machine learning and the benefits of predictive analytics in finance, considering the large sums involved and the required compliance with increasingly strict regulations.

Machine learning-driven systems are commonly adopted to prevent financial fraud by spotting anomalies in stock traders' activity and cross-checking transactions and brokers' data, with the aim of detecting inconsistencies in the information provided.

2. Money laundering

Another type of fraud closely linked to the financial sector concerns money laundering. Again, it proves useful to apply machine learning in banking to track anomalous transactions that could be a clue to criminal activity.

Indeed, machine learning models can be trained with data relating to three possible scenarios: lawful transactions, money transfers flagged as suspicious by bank alert systems, and potential money laundering cases reported to the authorities. For each of these contexts, machine learning systems will analyze information such as the senders’ and receivers' background or their previous transaction histories. This way, they may spot the same patterns once they recur in future scenarios and therefore distinguish between legitimate activities and criminal actions.

3. Credit card fraud

The same logic mentioned above can be followed to recognize electronic payment fraud, a widespread crime taking many forms and encompassing almost any industry, from banking to ecommerce. Unsurprisingly, this phenomenon generated gross global losses of $28.65 billion in 2019 alone, according to the 2020 Nilson Report.

Global card fraud losses

However, machine learning in retail has a good chance of containing this escalation by detecting abnormal account behavior that could be a sign of fraud. Some of the potential factors considered are a growing transaction frequency (especially to purchase premium goods), payment on a card carried out significantly before the due date, associations with different high-risk accounts, multiple payment methods added in a short time, and so on.

Machine learning-driven systems for payment fraud detection can also update card users' behavioral profiles after each transaction, making future predictions more precise and avoiding false positives.
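
One simple way to picture such an incremental profile update is a running mean and standard deviation of a cardholder's spending, refreshed after every transaction with Welford's online algorithm. The `SpendingProfile` class and the sample amounts below are hypothetical; real systems track far richer behavioral profiles.

```python
import math
from dataclasses import dataclass


@dataclass
class SpendingProfile:
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, amount: float) -> None:
        """Fold one new transaction into the profile (Welford's algorithm)."""
        self.count += 1
        delta = amount - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (amount - self.mean)

    @property
    def std(self) -> float:
        """Sample standard deviation of the amounts seen so far."""
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


profile = SpendingProfile()
for amount in (20.0, 30.0, 40.0):  # made-up transaction amounts
    profile.update(amount)
print(profile.mean, profile.std)
```

Because the profile is updated in constant time and without storing past transactions, this kind of bookkeeping fits naturally into real-time payment flows.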

4. Identity theft

Another type of crime strictly connected to the previous one and remarkably widespread in many scenarios, including fraudulent loan applications and ecommerce scams, is identity theft. To deal with this kind of fraud, we can count once again on machine learning-driven analysis of users' habits and transaction data.

Furthermore, we can leverage machine learning-powered computer vision to analyze identity documents or to add further verification mechanisms such as face recognition and biometrics.

5. Fraudulent insurance claims

Machine learning solutions can also be implemented to strengthen fraud detection in the insurance sector, especially in healthcare. Algorithms can easily identify false and duplicate claims: for example, we may be dealing with a customer who reported an incorrect diagnosis or exaggerated medical coverage costs.

One of the most valuable machine learning techniques in healthcare is natural language processing, which ensures an in-depth analysis of unstructured data such as medical reports. These solutions leverage machine learning to scan documents written by doctors, insurers, or clients, searching for suspicious inconsistencies.

6. Tax fraud

The last point of our roundup concerns tax fraud. As you may expect, machine learning's skill at identifying unusual patterns can be easily applied to enhance audit and tax compliance, for example by examining the general ledger in search of anomalous entries which could be signs of attempted fraud.

Algorithms can spot a wide range of clues more easily and quickly than human auditors, taking into account various parameters: the monthly variations in companies' gross sales, the relations among different taxpayers, the inconsistencies among purchases or the itemized deductions in an income peer group, and so on.

ML for fraud detection: adoption guidelines

Despite their proven efficiency, ML-based fraud detection systems can be quite demanding in terms of adoption requirements. Here are a few tips to streamline their implementation while overcoming the potential drawbacks of this technology.

1. Feature selection

Sometimes, less is more. A common data preparation practice for streamlining the training phase and model construction is to reduce the number of input variables, which means selecting a smaller subset of key features to train our algorithms while leaving aside redundant or irrelevant attributes. This results in shorter training times and easier model interpretation while mitigating machine learning’s gargantuan computational costs.

Among the features to consider for fraud detection, we may count the customer's IP and email address, preferred payment methods, age of their account, number of failed transactions, average order value, the fraud rate of the issuing bank, and many more.
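
A very simple filter-style take on feature selection is sketched below: it drops features that never vary and features that are almost perfectly correlated with one already kept. The column names, sample values, and both cut-offs are assumptions for illustration; this is only one of many selection strategies and it ignores the fraud label entirely.

```python
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = (sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)) ** 0.5
    return cov / norm


def select_features(columns, min_variance=1e-9, max_correlation=0.95):
    """Keep features that vary and are not near-duplicates of a kept feature."""
    kept = []
    for name, values in columns.items():
        if pvariance(values) <= min_variance:
            continue  # constant column: carries no information
        if any(abs(pearson(values, columns[k])) > max_correlation for k in kept):
            continue  # redundant with a feature we already kept
        kept.append(name)
    return kept


# Made-up values for five customers.
columns = {
    "account_age_days": [10, 400, 35, 900, 5],
    "account_age_weeks": [10 / 7, 400 / 7, 35 / 7, 900 / 7, 5 / 7],  # redundant
    "failed_transactions": [4, 0, 2, 0, 6],
    "currency_is_usd": [1, 1, 1, 1, 1],  # constant
    "avg_order_value": [250.0, 40.0, 60.0, 300.0, 80.0],
}
print(select_features(columns))
```

Here the duplicate account-age column and the constant currency flag are discarded, leaving three features to train on.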

2. Setting a threshold

Finding a reasonable compromise between security and a hassle-free user experience can be challenging. Indeed, the former requires fraud detection software to be particularly strict and block even vaguely suspicious transactions, which may end up being completely legitimate, while the latter implies a wider tolerance range when assessing potential anomalies, which may let some fraud slip through. Machine learning's low rate of false positives certainly mitigates this issue, but algorithms are still far from being perfect.

When creating a fraud detection model, it's important to set a threshold, which determines the acceptance/rejection rate and minimum requirements to trigger a response, and therefore represents a tradeoff between true positives (fraudsters blocked), false positives (genuine users blocked), and false negatives (fraudsters not blocked). The right balance depends on the level of risk your business can afford. For example, a fraud detection system might be more tolerant when scanning large volumes of low-value transactions and be more rigorous when probing premium product purchases.
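
The tradeoff can be made tangible by counting outcomes at different thresholds. In the sketch below, the fraud scores and ground-truth labels are made up, and the two thresholds are arbitrary examples of a strict and a lenient setting.

```python
def confusion_counts(scores, labels, threshold):
    """Count outcomes when every score at or above the threshold is blocked."""
    tp = fp = fn = tn = 0
    for score, is_fraud in zip(scores, labels):
        blocked = score >= threshold
        if blocked and is_fraud:
            tp += 1  # fraudster blocked
        elif blocked:
            fp += 1  # genuine user blocked
        elif is_fraud:
            fn += 1  # fraudster not blocked
        else:
            tn += 1  # genuine user let through
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Hypothetical model scores and whether each transaction was really fraud.
scores = [0.05, 0.20, 0.35, 0.55, 0.60, 0.80, 0.95]
labels = [False, False, True, False, True, True, True]

strict = confusion_counts(scores, labels, threshold=0.3)
lenient = confusion_counts(scores, labels, threshold=0.7)
print("strict:", strict)
print("lenient:", lenient)
```

With this toy data, the strict threshold catches every fraudster at the cost of blocking one genuine user, while the lenient one blocks no genuine users but lets two fraudsters through.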

3. Regulatory compliance

Another balance to strike is between ML systems' hunger for data and compliance with major data protection standards and applicable legislation, especially in highly regulated fields like finance and accounting. That's why any fraud detection solution should be designed in strict accordance with such regulations, including PCI DSS, GDPR, and IFRS.

4. Off-the-shelf solutions

While businesses interested in fully personalized fraud detection software should opt for a bespoke solution built from the ground up, any company looking to reduce upfront costs and implementation time may consider adopting one of the ready-made machine learning-powered systems available on the market. These include Amazon Fraud Detector, IBM Security Trusteer, Sift's Digital Trust & Safety, and Signifyd’s Authorization Rate Optimization.

Amazon Fraud Detector’s operating scheme

A reasonable balance

The more our society moves towards full digitalization, the more cyberattacks grow in impact and frequency, with fraudsters progressively increasing the complexity of their criminal plans.

Nowadays, scammers and cybercriminals can count on the same tools that legitimate institutions deploy to counter them, fighting some sort of twisted symmetrical war in which AI and ML are used on the one hand to crack passwords, power adaptive bots, and manipulate datasets, and on the other hand to fuel highly effective fraud detection systems.

The latter have certainly lived up to the hype, thanks to their adaptability to new threats, smart context-based data analysis, and real-time identification capabilities. At the same time, ML-based fraud detection is a reliable tool but not a silver bullet, and its notorious appetite for sensitive data, along with its substantial training and computing requirements, should be approached with proper expertise and compliance in mind.