Machine learning in fraud detection and how it fights cybercrime

June 21, 2021

Andrea Di Stefano

Technology Research Analyst

Way back in 1452, a year before the Ottomans besieged and conquered Constantinople, a talented Transylvanian engineer named Orban offered his services to the Byzantine Emperor Constantine XI. His proposal to build an innovative type of large-caliber artillery was rejected because the emperor was unable (or unwilling) to cover the production costs.

Thus, Orban approached Constantinople's archenemy, the Ottoman sultan Mehmed II, who decided to hire him and leverage his expertise for the upcoming siege. That choice proved wise, as Orban's bombard greatly contributed to the crumbling of the Theodosian Walls and the ruinous Byzantine defeat.

The obvious lesson we can draw from this event is that any technological innovation should be seized in time because it may end up favoring our enemies instead of us. Even when the innovation we’re talking about is machine learning and our rivals are modern fraudsters ready to exploit the power of AI.

Let's see how mass digitization is paving the way for new, ever-evolving kinds of crime and how machine learning solutions may enhance fraud detection and save us from these threats.

A never-ending battle

In recent years, the general shift towards a fully digitalized economy seems to have been more of a catalyst than a brake on criminals’ inventiveness and initiative. PwC's 2020 Global Economic Crime and Fraud Survey highlighted how fraud is literally everywhere.

According to this survey, nearly half of the 5,000 respondents across 99 territories experienced at least one fraud incident over the previous 24 months, with an average of six incidents per company. The criminal acts reported were split almost equally between those committed by internal and external agents, and the most common types were customer fraud, cybercrime, and asset misappropriation.

Frequency of fraud events by type

PwC's researchers also pointed out how different types of fraud events had a divergent impact on each sector. Financial services, for example, were the most severely disrupted by customer fraud, while the tech/media/telecom industry appeared to be particularly affected by cybercrime.

Most disruptive fraud events by industry

Anti-fraud countermeasures: cannon shots... or fireworks?

The total losses suffered by the surveyed organizations due to the aforementioned categories of events amounted to $42 billion, with peaks of over $50 million lost by each of the most “unfortunate” companies. Sadly, the countermeasures deployed to address this massive issue have been rather mild, as only 56% of such organizations conducted an investigation into their worst fraud episode, and just one-third of them reported it to their board.

However, it would be unfair to blame the victims. Rather, these trends represent the symptoms of a fraud detection arsenal that desperately needs some kind of updating, in view of the major changes that are making traditional anti-fraud systems no longer as effective as they used to be.

Not to mention the fact that fraud perpetrators can easily exploit the general fragmentation of key fraud detection resources and teams across different organizations or even within single companies. A flaw that makes corporations unable to achieve proper data sharing and coordination against common threats.

Why machine learning, and why now

Nowadays, the most common fraud detection approach relies on traditional rule-based analytical models, which follow a pretty straightforward "if / then" logic. For example, a typical fraud detection system may flag users who exceed a certain spending threshold with their credit cards or use such cards in unusual locations. Unfortunately, this strategy is quite limited in a world where exceptions are common, and it struggles to keep up with the latest fraud scenarios and the overall spread of cybercrime.
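To make the "if / then" logic concrete, here is a minimal sketch of a rule-based check in Python. The threshold value, the field names, and the two rules are hypothetical and chosen only for illustration, not taken from any real system.

```python
from dataclasses import dataclass

# Hypothetical threshold, picked only for this illustration.
SPENDING_THRESHOLD = 2_000.00


@dataclass
class Transaction:
    card_id: str
    amount: float
    country: str


def rule_based_flags(tx: Transaction, home_country: str) -> list:
    """Return the names of the hard-coded rules that this transaction trips."""
    flags = []
    # Rule 1: the amount exceeds a fixed spending threshold.
    if tx.amount > SPENDING_THRESHOLD:
        flags.append("amount_over_threshold")
    # Rule 2: the card is used outside the cardholder's home country.
    if tx.country != home_country:
        flags.append("unusual_location")
    return flags


print(rule_based_flags(Transaction("card-1", 2500.0, "IT"), home_country="IT"))
print(rule_based_flags(Transaction("card-1", 40.0, "FR"), home_country="IT"))
```

Every rule here is written by hand and fires regardless of context: a cardholder on holiday abroad trips the second rule just like a scammer would.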

That's why many industries, starting with telecommunications and BFSI sectors, are increasing their investments in fraud detection measures and gearing up with new AI-powered tools that should fix the typical problems affecting traditional approaches.

According to P&S Intelligence's 2019 Fraud Detection and Prevention Market Research Report, the global fraud detection market is set to grow at a CAGR of 15.1% and reach $47.9 billion by 2024, mainly driven by AI and machine learning implementation.

Finding the bad guys with self-learning machines

Machine learning, in particular, may be the key to countering ever-changing fraud threats. This branch of artificial intelligence draws its strength from algorithms that can process huge datasets, spot patterns and correlations in them, and autonomously build models describing such relationships. These models can then be used to get valuable insights into current events, forecast future trends, and enhance corporate data-driven decision making.

Another significant feature of machine learning algorithms is their ability to improve with experience, i.e. to refine and update such models over time as they are "fed" with new data. In practical terms, machine learning-powered systems don't need to be manually instructed with hundreds of rules to detect fraud cases. Rather, they can be trained with information about previous examples of fraud and autonomously understand the characteristic patterns of these events to recognize them once they recur.
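As a rough illustration of learning from labeled examples instead of hand-written rules, the sketch below trains a tiny logistic regression with plain gradient descent. The two features (a scaled amount and a "foreign location" flag), the toy data, and the hyperparameters are all assumptions made up for this example. A production system would rely on far richer data and an established library.

```python
import math

# Toy, made-up training set. Each example is (scaled_amount, is_foreign);
# the label is 1 for a past fraud case and 0 for a legitimate payment.
EXAMPLES = [
    (0.05, 0), (0.10, 0), (0.20, 0), (0.15, 1), (0.08, 0),  # legitimate
    (0.90, 1), (0.80, 1), (0.95, 0), (0.70, 1),             # fraud
]
LABELS = [0, 0, 0, 0, 0, 1, 1, 1, 1]


def predict_proba(weights, bias, features):
    """Estimated probability that a transaction is fraudulent."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, labels, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(examples[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(examples, labels):
            error = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


weights, bias = train(EXAMPLES, LABELS)
print("large foreign payment:", round(predict_proba(weights, bias, (0.85, 1)), 3))
print("small domestic payment:", round(predict_proba(weights, bias, (0.07, 0)), 3))
```

No rule such as "flag amounts above X" is written anywhere: the decision boundary is derived from the examples, so retraining on newer cases moves it without manual recoding.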

Machine learning-powered systems can even predict upcoming criminal actions, including new types of fraud, by spotting suspicious and unconventional behavioral patterns that deviate from the norm. Furthermore, as soon as they encounter a new fraud scenario, machine learning-based anomaly detection systems will update their models with fresh information to quickly adapt to such threats.
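A very simple stand-in for this idea is a per-customer profile that keeps a running mean and standard deviation and flags values far from the norm. The sketch below uses Welford's online algorithm for the running statistics. The class name, the threshold of three standard deviations, and the sample amounts are assumptions for illustration only.

```python
import math


class RunningProfile:
    """Running mean and variance of a customer's amounts (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, value):
        """Fold one new observation into the profile."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def std(self):
        """Sample standard deviation, or 0.0 with fewer than two observations."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def is_anomalous(self, value, z_threshold=3.0):
        """True if the value lies more than z_threshold deviations from the mean."""
        spread = self.std()
        if spread == 0.0:
            return False  # not enough history to judge
        return abs(value - self.mean) / spread > z_threshold


profile = RunningProfile()
for amount in [20, 25, 22, 19, 24, 21, 23]:  # made-up purchase history
    profile.update(amount)

print(profile.is_anomalous(500))  # far outside the usual range
print(profile.is_anomalous(23))   # ordinary purchase
```

Calling `update` after each confirmed legitimate transaction keeps the profile current, which is a small-scale version of a model adapting to fresh information.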

This degree of adaptability also proves useful in the opposite context, namely when it comes to distinguishing between an actual crime and a perfectly safe event. Going back to the previous example of a credit card used more than usual in some unexpected place, the machine learning algorithm can take into account that the owner used the same card to purchase a flight to that specific location a few weeks earlier. Therefore, our guy is likely to be a lucky tourist enjoying his holidays in some expensive restaurant, rather than a dangerous scammer.

Machine learning vs rule-based approach

How do these features impact fraud detection capabilities and how do they make machine learning different from the rule-based approach in terms of performance?

  • Flexibility: the rule-based approach is not flexible enough to deal with rapidly evolving fraud patterns. After all, the rule sets are manually coded and built on previous fraud scenarios, so they should be continuously adjusted according to new types of events.
    As we've previously seen, that's not the case with machine learning-driven platforms thanks to their ability to adjust models on their own based on new kinds of threat.
  • Rate of false positives: the traditional approach tends to follow a "black or white" mindset. As a result, it produces a large number of false positives that don’t represent a real threat but still need to be double-checked.
    Machine learning-based systems, on the other hand, can take into account a wider range of variables and figure out the general context. This helps them grasp certain nuances, understand the logic behind a suspicious case, and avoid false alarms.
  • Reactivity: the growth of electronic payments and digital transactions is putting pressure on banking and financial institutions, which must continually identify new types of crimes and update the rules to flag suspicious anomalies.
    Machine learning-driven systems can deal with this process far quicker than humans, thanks to their self-learning abilities and sheer processing speed.
  • Data pool: statistical and rule-based fraud detection methods can process structured data, such as transaction figures, but may struggle when handling unstructured data, for example, written reports and pictures.
    Machine learning, on the other hand, can deal fairly well with many kinds of data, greatly expanding the pool of information from which to draw useful insights.
  • Compliance: the relatively low reactivity and flexibility of the rule-based approach, combined with the need for constant human intervention, clashes with the increasingly strict trading, fiscal, and data-management regulations.
    Also in this regard, machine learning can lend a hand by ensuring greater speed and accuracy in fraud detection procedures, while minimizing the possibility of human error that may result in investigations or penalties.

Rule-based vs machine learning fraud detection

| | Flexibility | False positives | Reactivity | Data pool | Compliance |
| --- | --- | --- | --- | --- | --- |
| Rule-based approach | Low adaptability to new threats | High rate of false positives | Longer reaction time | Mainly structured datasets | Possible human errors and violations |
| Machine learning | Flexible, adjustable models | Smarter, context-based analysis | Real-time identification | Structured and unstructured data | Enhanced compliance |

Machine learning in top fraud scenarios

After clarifying the advantages of machine learning over traditional approaches, let's take a look at some of the main fraud scenarios where this technology can be deployed to keep criminals at bay.

1. Market manipulation

Needless to say, financial institutions were quick to understand the synergy between the stock market and machine learning and the benefits of predictive analytics in finance, considering the large sums involved and the required compliance with increasingly strict regulations. Machine learning-driven systems are commonly adopted to spot anomalies in stock traders' activity but also to cross-check transactions and brokers' data with the aim of detecting inconsistencies in the information provided.

Even Nasdaq adopted a similar solution to recognize fraudulent equity orders and report them to the competent authorities. Such a system also allows probing unfamiliar trading patterns, and therefore shedding light on suspicious events which may end up being previously unknown types of fraud.

2. Money laundering

Another type of fraud closely linked to the financial sector concerns money laundering. Again, it proves useful to apply machine learning in banking to track anomalous transactions that could be a clue to criminal activity.

Indeed, machine learning models can be trained with historic data relating to three possible scenarios: lawful transactions, money transfers flagged as suspicious by bank alert systems, and potential money laundering cases reported to the authorities. For each of these contexts, machine learning systems will analyze information such as the senders’ and receivers' background or their previous transaction histories. This way, they may spot the same patterns once they recur in future scenarios and therefore distinguish between legitimate activities and criminal actions.
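The three-scenario training setup can be sketched with a nearest-centroid classifier, one of the simplest multi-class methods. The two features used here (a scaled transfer amount and the share of a sender's transfers going to new counterparties), the sample values, and the label names are hypothetical and serve only to show the mechanics.

```python
import math
from collections import defaultdict

# Made-up historic data: ((scaled_amount, share_of_new_counterparties), label)
HISTORY = [
    ((0.10, 0.10), "lawful"), ((0.20, 0.05), "lawful"), ((0.15, 0.20), "lawful"),
    ((0.50, 0.50), "flagged"), ((0.60, 0.40), "flagged"), ((0.55, 0.60), "flagged"),
    ((0.90, 0.90), "reported"), ((0.95, 0.80), "reported"), ((0.85, 0.95), "reported"),
]


def fit_centroids(samples):
    """Average the feature vectors of each class into one centroid per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for j, value in enumerate(features):
            sums[label][j] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = fit_centroids(HISTORY)
print(classify(centroids, (0.12, 0.10)))
print(classify(centroids, (0.58, 0.45)))
print(classify(centroids, (0.92, 0.90)))
```

A real anti-money-laundering model would use many more signals and a stronger learner, but the workflow is the same: labeled history in, a decision for each new transfer out.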

3. Electronic payment fraud

The same logic mentioned above can be followed to recognize electronic payment fraud, a widespread crime taking many forms and encompassing almost any industry, from banking to ecommerce. Unsurprisingly, this phenomenon generated gross global losses of $28.65 billion in 2019, according to the 2020 Nilson report.

Global card fraud losses, 2013-2027

Machine learning algorithms have a good chance of containing this escalation by detecting abnormal account behavior that could be a sign of fraud. Some of the potential factors taken into account are a growing transaction frequency, payment on a card carried out significantly before the due date, associations with different high-risk accounts, and so on.

Machine learning-driven systems can also update card users' behavioral profiles after each transaction, making future predictions more precise and avoiding false positives.
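Signals such as a growing transaction frequency or links to high-risk accounts first have to be turned into numbers a model can consume. Below is a minimal feature-extraction sketch. The function names, the seven-day window, and the sample data are assumptions for illustration.

```python
from datetime import datetime, timedelta


def frequency_growth(timestamps, now, window_days=7):
    """Ratio of transactions in the latest window to the window before it."""
    window = timedelta(days=window_days)
    recent = sum(1 for t in timestamps if now - window < t <= now)
    previous = sum(1 for t in timestamps if now - 2 * window < t <= now - window)
    # Guard against division by zero when the earlier window is empty.
    return recent / max(previous, 1)


def high_risk_links(counterparties, high_risk_accounts):
    """Number of distinct counterparties that appear on a high-risk list."""
    return len(set(counterparties) & set(high_risk_accounts))


now = datetime(2021, 6, 21)
timestamps = [
    datetime(2021, 6, 8), datetime(2021, 6, 12),                            # earlier week
    datetime(2021, 6, 15), datetime(2021, 6, 16), datetime(2021, 6, 17),
    datetime(2021, 6, 18), datetime(2021, 6, 19), datetime(2021, 6, 20),    # latest week
]

print(frequency_growth(timestamps, now))
print(high_risk_links(["acc_a", "acc_b", "acc_c", "acc_b"], {"acc_b", "acc_z"}))
```

Recomputing these values after each transaction is one simple way to keep a card user's behavioral profile up to date before it is scored.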

4. Identity theft

Another type of crime closely connected to the previous one and remarkably widespread in many scenarios, including fraudulent loan applications and ecommerce scams, is identity theft. To deal with this kind of fraud, we can count once again on machine learning-driven analysis of users' habits and transaction data.

Furthermore, we can leverage machine learning-powered computer vision to analyze identity documents or to add additional verification mechanisms such as face recognition and biometrics.

5. Fraudulent insurance claims

Machine learning solutions can also be implemented to strengthen fraud detection in the insurance sector, especially in healthcare. Indeed, this technology is one of the catalysts responsible for the massive growth of the insurance fraud detection market, which could reach $7.9 billion globally by 2024 according to MarketsandMarkets' estimates.

Insurance fraud detection market’s expected growth

Algorithms can easily identify false and duplicate claims: for example, we may be dealing with a customer who reported a wrong diagnosis or exaggerated medical coverage costs. Regarding this aspect of machine learning in healthcare, a valuable tool to counter criminal attempts is natural language processing, which ensures an in-depth analysis of unstructured data such as medical reports. These solutions leverage machine learning to scan documents written by doctors, insurers, or clients, searching for suspicious inconsistencies.
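As a hedged illustration of duplicate-claim screening, the sketch below compares claim descriptions by the overlap of their words (Jaccard similarity). This is a plain lexical technique standing in for the far richer natural language processing models mentioned above. The claim texts and the 0.8 threshold are invented for the example.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def likely_duplicates(claims, threshold=0.8):
    """Return pairs of claim ids whose descriptions are nearly identical."""
    return [
        (id_a, id_b)
        for (id_a, text_a), (id_b, text_b) in combinations(claims.items(), 2)
        if jaccard(text_a, text_b) >= threshold
    ]


# Made-up claim descriptions.
claims = {
    "c1": "Knee surgery on 12 March, hospital stay of 3 days",
    "c2": "Hospital stay of 3 days after knee surgery on 12 March",
    "c3": "Dental cleaning and two fillings in April",
}

print(likely_duplicates(claims))
```

Pairs returned by `likely_duplicates` would still go to a human reviewer: similar wording is a reason to look closer, not proof of fraud.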

6. Tax fraud

The last point of our roundup concerns tax fraud. As you may expect, machine learning skills in identifying unusual patterns can be easily applied to enhance audit and tax compliance, for example by examining the general ledger in search of anomalous entries which could be the signs of attempted fraud.

Algorithms can spot a wide range of clues more easily and quickly than human auditors, taking into account various parameters: the monthly variations in companies' gross sales, the relations among different taxpayers, the inconsistencies among the purchases or the itemized deductions in an income peer group, and so on.
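One of these checks, comparing itemized deductions within an income peer group, can be sketched with a robust outlier score based on the median and the median absolute deviation (the "modified z-score", with its conventional cutoff of 3.5). The taxpayer ids and amounts are made up, and a real audit tool would combine many such signals.

```python
from statistics import median


def peer_group_outliers(deductions, cutoff=3.5):
    """Return ids whose deductions deviate strongly from the peer-group median.

    Uses the modified z-score: 0.6745 * (x - median) / MAD.
    """
    values = list(deductions.values())
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # all peers are (almost) identical, nothing to compare against
    return [
        taxpayer
        for taxpayer, value in deductions.items()
        if abs(0.6745 * (value - center) / mad) > cutoff
    ]


# Made-up itemized deductions for taxpayers in the same income bracket.
deductions = {
    "tp1": 4200, "tp2": 3900, "tp3": 4500,
    "tp4": 4100, "tp5": 15800, "tp6": 4300,
}

print(peer_group_outliers(deductions))
```

The median-based score is used here because a single extreme value barely shifts the median, whereas it would inflate an ordinary mean and standard deviation and could hide itself.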

Ultimate weapon or double-edged sword?

The more our society moves towards full digitalization, the more cyberattacks grow in impact and frequency, with fraudsters steadily increasing the complexity of their schemes. Nowadays, scammers and cybercriminals can count on the same tools that legitimate institutions deploy to counter them, fighting a twisted symmetrical war with no holds barred.

Artificial intelligence and machine learning can be exploited to crack passwords or captchas, power ever more adaptive bots, manipulate datasets, and more. Are we ready to face this threat? Only partly. According to ACFE's 2019 Anti-Fraud Technology Benchmarking Report, only 13% of organizations surveyed used AI and machine learning to detect and deter fraud.

On the other hand, research predicts a positive future trend. Based on MarketsandMarkets' 2020 Fraud Detection and Prevention Market Global Forecast, the global anti-fraud business may grow from $20.9 billion in 2020 to $38.2 billion by 2025, with a major boost given by AI and machine learning.

Fraud detection and prevention (FDP) market growth forecast
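For readers who want to translate the two endpoints of this forecast into a yearly growth rate, the compound annual growth rate (CAGR) is (end / start) raised to the power 1 / years, minus 1. The short sketch below applies that standard formula to the $20.9 billion and $38.2 billion figures quoted above. The function name is our own.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above, in billions of dollars: 2020 to 2025 is five years.
rate = cagr(20.9, 38.2, 2025 - 2020)
print(f"Implied CAGR: {rate:.1%}")
```

With these inputs the implied rate works out to roughly 12.8% per year.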

However, the technology itself will not be enough. If they really want to defend the city walls from fraudsters' artillery, organizations will have to implement machine learning correctly, setting up an adoption plan, collecting high-quality datasets, and developing the proper expertise.