Machine learning for stock market prediction: a tech overview

August 23, 2022

Andrea Di Stefano

Technology Research Analyst

In ancient Rome, well-respected priests skilled in the art of divination and known as haruspices examined the entrails of sacrificed animals to predict the future. Nowadays, after realizing that these old-fashioned methods are rather ineffective (as well as quite creepy), investors rely on modern, AI-powered oracles, capable of peering into huge datasets via computer algorithms to forecast the stock market's upcoming trends.

Although artificial intelligence as a whole plays a key role, the real protagonist of this new approach to stock prediction and selection, which may reshape the way we trade, is one of its sub-branches: machine learning (ML).

Let's discover the secrets of this technology and try to understand if traders' trust in algorithm-driven stock picking and machine learning advisory is well-placed.

Machine learning in stock prediction: the essentials

Machine learning for stock market prediction involves the adoption of self-improving algorithms to forecast the future value of a stock or another financial instrument and provide insights into stock trading and investment opportunities:


Combining data mining and ML algorithms, it's possible to create stock trading software that forecasts stock price fluctuations, volatility, and risks to recommend the most promising stock selection strategies. Such price predictions come from the analysis of numerous factors, including global financial trends, corporate earnings, and investors' sentiment in AI-powered social media.

Portfolio management:

The same algorithm-based approach is a turning point in choosing the best investment options. ML-powered wealth management platforms and tools can process gargantuan amounts of information, evaluate potential asset allocations, and help investors build a well-balanced portfolio that is more likely to increase in value (machine learning plays a similar role in real estate).


How machine learning in stock prediction works

Machine learning focuses on creating computer algorithms that can automatically improve their performance through experience. Specifically, ML algorithms can recognize patterns and relations among the data they are trained with, build mathematical models concerning such patterns, and use these models to make predictions or decisions without being explicitly programmed to do so.

Generally speaking, the more relevant, good-quality data an ML-based system processes, the more patterns it can detect and the more refined its models become, improving its analytical and forecasting performance.

Such capabilities are invaluable to financial firms. By delving into the depths of big data (including stock trends, corporate performance, financial news, investor behavior, and social media information), these modern “crystal balls” can detect subtle, non-linear relationships among all these variables. Based on such findings, they aim to produce realistic stock price predictions and to provide market players with useful insights and recommendations on future economic trends.
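
To make the idea of "learning from data" concrete, here is a minimal Python sketch, assuming a toy list of hypothetical daily returns. Instead of hard-coding a rule, it estimates by ordinary least squares how tomorrow's return relates to today's, then applies the learned parameters to make a forecast. Real systems use far richer data and models; the function name `fit_line` is our own, chosen for illustration.

```python
# Minimal sketch: the model's parameters are learned from (hypothetical)
# data instead of being written by hand as fixed rules.


def fit_line(xs, ys):
    """Fit y = a + b * x by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b


# Hypothetical daily returns (0.01 means +1%).
returns = [0.010, -0.004, 0.006, 0.002, -0.008, 0.004, 0.001, -0.002]

# Training examples: today's return (input) paired with tomorrow's (output).
xs, ys = returns[:-1], returns[1:]
a, b = fit_line(xs, ys)

# Apply the learned model to the latest observation.
prediction = a + b * returns[-1]
print(f"intercept={a:.5f}, slope={b:.3f}, next-day forecast={prediction:.5f}")
```

Feeding the same function more (and more relevant) examples changes the learned parameters without any change to the code, which is the sense in which such a model "improves through experience".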

The best ML algorithms for stock price prediction

Assessing machine learning algorithms for stock market forecasting should be approached with caution, for two good reasons. First, research is still ongoing and far from reaching universally accepted conclusions: the range of suitable algorithms is quite wide, and evaluating their accuracy across many different scenarios is tricky.

Second, FinTech corporations and investment firms are typically reluctant to reveal their trump cards in order to maintain a competitive edge, as pointed out by OECD's 2021 Artificial Intelligence, Machine Learning and Big Data in Finance report. This means that most performance data on different ML-based stock price forecasting methodologies, along with information regarding their actual deployment maturity among self-proclaimed AI-driven private companies, is kept off the radar of independent researchers.

That said, we can still get a general idea of the progress in algorithm development and implementation from academic studies and learned societies' reports. The 2022 article Machine Learning Approaches in Stock Price Prediction, published by the UK-based Institute of Physics (IOP), for example, reviewed several research works focused on different stock prediction techniques:

  • Traditional machine learning, encompassing algorithms such as random forest, naive Bayes, support vector machine, and k-nearest neighbor, along with time series analysis via the ARIMA technique.
  • Deep learning (DL) and neural networks, including recurrent neural networks, long short-term memory, and graph neural networks.

Following this classification, let’s explore such approaches and related algorithms along with their potential pros and cons.

Traditional machine learning

In this case, "traditional" simply refers to all those algorithms not belonging to a subset of machine learning known as deep learning, which we'll describe shortly.

Still, being traditional is not necessarily a flaw: these algorithms have shown fairly high accuracy, especially on large datasets, and even more so when combined into hybrid models. Combining different ML algorithms can amplify their potential, since some are better at handling historical price data while others perform best on sentiment data. At the same time, these algorithms may be hypersensitive to outliers and fail to properly detect anomalies and exceptional scenarios.

Among the machine learning techniques and algorithms tested by researchers, we can mention:

  • Random forest: An ensemble of decision trees whose individual predictions are averaged (or voted on). It tends to perform well on large datasets and is widely used in stock prediction for regression analysis, i.e., estimating a continuous value, such as a future price, from the relationships among multiple input variables.
  • Naive Bayesian classifier: An efficient and relatively simple option for investigating smaller financial datasets. It estimates the probability of an outcome (for example, a price rise or fall) given the observed evidence, under the simplifying assumption that the input features are independent of one another.
  • Support vector machine: An algorithm based on supervised learning (trained on actual examples of inputs and outputs) that looks for the boundary that best separates different classes of data. It can be quite accurate, but it is less suited to complex and dynamic scenarios, and training kernel-based SVMs becomes computationally expensive on very large datasets.
  • K-nearest neighbor: This algorithm uses a rather time-consuming, distance-based technique to forecast the outcome of a certain event based on the records of its most similar historical situations, called "neighbors".
  • ARIMA: A statistical time series technique (autoregressive integrated moving average) that models each value as a linear combination of past values and past forecast errors. It works best on data with linear dynamics, forecasting short-term stock price fluctuations from past trends (seasonal variants such as SARIMA also capture seasonality), but it cannot capture non-linear relationships and is unreliable for long-term predictions.
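
The k-nearest neighbor idea can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a single price series and hypothetical parameters (`window_size`, `n_neighbors`): each past window of values is treated as a "situation", and the forecast is the average of what happened right after the most similar past situations.

```python
import math


def make_windows(series, window_size):
    """Turn a series into (window, next_value) pairs drawn from the past."""
    return [
        (series[i:i + window_size], series[i + window_size])
        for i in range(len(series) - window_size)
    ]


def knn_forecast(series, window_size=3, n_neighbors=2):
    """Forecast the next value from the most similar historical windows."""
    history = make_windows(series, window_size)
    query = series[-window_size:]  # the most recent situation
    # Rank past windows by Euclidean distance to the current one.
    ranked = sorted(history, key=lambda pair: math.dist(pair[0], query))
    neighbors = ranked[:n_neighbors]
    # The forecast is the average of what followed each neighbor.
    return sum(next_value for _, next_value in neighbors) / len(neighbors)


# Hypothetical daily closing prices.
closes = [100.0, 101.5, 102.0, 100.8, 101.6, 102.1, 100.9, 101.4]
print(f"next-close forecast: {knn_forecast(closes):.2f}")
```

Every forecast compares the current window with the entire history, which is why KNN becomes time-consuming as datasets grow.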

Deep learning

We may consider deep learning (DL) a natural evolution within machine learning: it relies on artificial neural networks (ANNs), models loosely inspired by the mechanisms of the human brain, to reach a deeper level of analysis and context understanding than traditional ML systems.

Neural networks are sprawling structures of interconnected units, called artificial neurons, that pass data to one another. These nodes are organized in successive layers: the first and last are known as the input and output layers, while those in between are called hidden layers.
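
The layer structure can be illustrated with a tiny forward pass in plain Python: three input values flow through a hidden layer of two neurons into a single output neuron. The weights below are hypothetical and hand-picked; in a real network they are learned during training, and deep learning libraries perform the same computation with matrix operations.

```python
def dense_layer(inputs, weights, biases, activation):
    """One layer: each neuron applies an activation to a weighted sum of its inputs plus a bias."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


# Hypothetical weights: 2 hidden neurons with 3 inputs each, 1 output neuron with 2 inputs.
hidden_w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
hidden_b = [0.0, 0.1]
output_w = [[1.0, -1.0]]
output_b = [0.05]

x = [0.2, 0.4, 0.1]                                # input layer: three normalized features
h = dense_layer(x, hidden_w, hidden_b, relu)       # hidden layer
y = dense_layer(h, output_w, output_b, identity)   # output layer
print(h, y)
```

Stacking more `dense_layer` calls between input and output is, conceptually, what makes a network "deeper".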

The simplest neural networks comprise just one or a few hidden layers, while deep neural networks (guess why it's called deep learning) stack many hidden layers, in some cases hundreds, traversed by massive flows of data. Each layer learns to recognize specific patterns or features in the output of the previous one, adding a further level of abstraction.

Deep neural network

That's why many researchers show growing interest in deep learning algorithms for stock price prediction, with particular emphasis on what currently seems to be the top performer, long short-term memory (LSTM). However, other DL algorithms have also proved quite effective. Here’s a brief round-up:

  • Recurrent neural networks: A type of ANN with recurrent (looping) connections: the hidden state computed at one time step is fed back into the network at the next step, acting as a "memory" of previous inputs. This makes RNNs a natural fit for sequential data such as price series.
  • Long short-term memory: At the moment, several experts consider LSTM the most promising stock prediction algorithm. It's a type of RNN that adds a cell state controlled by forget, input, and output gates, which lets it retain relevant information over much longer sequences than a standard RNN, whose memory of distant inputs tends to fade (the so-called vanishing gradient problem). This makes it well equipped to handle non-linear time series data and, in several studies, to predict highly volatile price fluctuations with good accuracy.
  • Graph neural networks: They process data structured as graphs, with each entity (such as a pixel, a word, or, in finance, a company) represented as a node and the relationships between entities as edges. Converting data into graphs can be challenging and may reduce accuracy, but it allows financial analysts to better visualize and frame relationships between data points.
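
To show what sets an LSTM apart from a plain RNN, here is a minimal sketch of one LSTM time step following the standard gate equations, simplified to a single input value and a single hidden unit. The weights in `params` are hypothetical; real models use vectors and weight matrices learned during training, typically inside a deep learning framework.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step with a scalar input and a single hidden unit.

    params maps each gate name to a (w_x, w_h, bias) tuple (hypothetical values).
    """
    def pre_activation(gate):
        w_x, w_h, bias = params[gate]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre_activation("forget"))       # share of old memory to keep
    i = sigmoid(pre_activation("input"))        # share of new information to write
    o = sigmoid(pre_activation("output"))       # share of memory to expose
    g = math.tanh(pre_activation("candidate"))  # candidate new information
    c = f * c_prev + i * g                      # updated cell state (long-term memory)
    h = o * math.tanh(c)                        # updated hidden state (output)
    return h, c


# Hypothetical weights; a real model learns them from data.
params = {
    "forget": (0.5, 0.5, 1.0),
    "input": (1.0, 0.5, 0.0),
    "output": (1.0, 0.5, 0.0),
    "candidate": (2.0, 0.5, 0.0),
}

h, c = 0.0, 0.0
for r in [0.01, -0.02, 0.015, 0.03]:  # hypothetical normalized returns
    h, c = lstm_step(r, h, c, params)
print(f"hidden state={h:.4f}, cell state={c:.4f}")
```

The cell state `c` is updated additively and scaled by the forget gate, which is what helps information survive across many time steps.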

Be it long short-term memory, recurrent neural networks, or graph neural networks, deep learning algorithms have generally outperformed traditional ML algorithms in the stock price prediction studies reviewed. However, DL systems are extremely data-hungry during training and generally require significant data storage capacity and computing power.

ML algorithm implementation guidelines

Machine learning algorithms can be the beating heart (or brain, if you're not that sentimental) of price forecasting for stock selection. However, predictive analytics is a complex ensemble of processes, and algorithms represent just a cog in the engine. Here are some other elements you should consider to properly implement machine learning in the analytical pipeline, starting from data:

1. Data types in fundamental and technical analysis

As mentioned earlier, the datasets used to train ML and DL algorithms are typically very large in terms of both volume and variety of data types. Regarding variety, there are two major analysis methods that prioritize completely different categories of data. Fundamental analysis tries to determine the intrinsic value of a stock and its future fluctuations by monitoring market and industry parameters and company-related metrics, such as market capitalization, dividends, deliverable volume, net profit and loss, P/E ratio, and total debt.

Technical analysis, on the other hand, does not focus on a stock's intrinsic value and the factors driving its variations, but on price and volume trends over time, in order to spot recurring patterns and predict future movements, especially in the short term. These patterns include the well-known head and shoulders, triangles, cup and handle, and so on.

Examples of stock price patterns

An efficient ML system for stock prediction should take advantage of both methods and be fueled with a full spectrum of data types, encompassing corporate data and stock price patterns, to better frame the financial scenario under consideration.
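
As a minimal sketch of combining both views, the function below builds one feature row per day from two technical signals (daily return and distance from a simple moving average) and one fundamental ratio (price-to-earnings). It assumes a constant, hypothetical trailing earnings-per-share figure; a real pipeline would update fundamentals as new financial reports are published.

```python
def simple_moving_average(values, window):
    """Trailing average of the last `window` values (None until enough data)."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
        for i in range(len(values))
    ]


def build_features(closes, eps, window=3):
    """One feature row per day, mixing technical and fundamental signals.

    closes: daily closing prices; eps: trailing earnings per share, held
    constant here for simplicity (a hypothetical assumption).
    """
    sma = simple_moving_average(closes, window)
    rows = []
    for i, price in enumerate(closes):
        if i == 0 or sma[i] is None:
            continue  # not enough history yet
        rows.append({
            "return_1d": price / closes[i - 1] - 1,  # technical: daily return
            "price_vs_sma": price / sma[i] - 1,      # technical: distance from trend
            "pe_ratio": price / eps,                 # fundamental: P/E ratio
        })
    return rows


for row in build_features([100.0, 102.0, 101.0, 104.0, 106.0], eps=5.0):
    print(row)
```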

2. Data source selection

Since data is the real fuel of ML-based stock prediction, finding rich and reliable data sources is an essential prerequisite for algorithm training. Fortunately, data scientists can count on a vast selection of financial databases and market intelligence platforms, which can be connected directly to a data analytics solution via API-based integrations to create a continuous flow of data.

Among the most prestigious financial data sources, we may mention Bloomberg, Reuters, Nasdaq, S&P Global, MarketWatch, Alpha Vantage, Financial Times, and Marketstack.
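
As an illustration of API-based ingestion, the sketch below requests daily prices from Alpha Vantage and extracts closing prices. It assumes the `TIME_SERIES_DAILY` function and the JSON field names `"Time Series (Daily)"` and `"4. close"` as we understand Alpha Vantage's documentation; verify them against the current docs, and note that a valid API key is required. The parser is kept separate from the network call so it can be tested offline.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://www.alphavantage.co/query"


def fetch_daily_json(symbol, api_key):
    """Download daily prices for one symbol (needs network access and an API key)."""
    query = urllib.parse.urlencode(
        {"function": "TIME_SERIES_DAILY", "symbol": symbol, "apikey": api_key}
    )
    with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def parse_daily_closes(payload):
    """Return (date, close) pairs in chronological order.

    Assumes the field names described above; if the time series key is
    missing (for example, an error or usage-limit message), fail loudly.
    """
    series = payload.get("Time Series (Daily)")
    if series is None:
        raise ValueError(f"unexpected payload keys: {sorted(payload)}")
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    return [(day, float(bar["4. close"])) for day, bar in sorted(series.items())]
```

For example, `parse_daily_closes(fetch_daily_json("IBM", "YOUR_API_KEY"))` would yield a list of (date, close) pairs ready for feature engineering.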

3. ML-based sentiment analysis

A particularly fascinating trend in ML-powered stock prediction is so-called sentiment analysis. The assumption behind this increasingly widespread approach is that feeding machine learning systems with purely economic data is not enough to forecast stock trends.

Instead, financial experts should leverage machine learning, combined with text analysis and natural language processing, to identify the sentiment of sources such as social media posts or financial news articles, i.e., to determine whether such texts express a positive or negative opinion on specific financial matters.

These techniques have already been adopted by various financial giants. J.P. Morgan Research has created an ML solution, tested on 100,000 news articles covering global equity markets, to assist experts with future equity investment decisions. BlackRock, for its part, has used text analysis techniques to predict future changes in company earnings guidance.

Example of BlackRock's data mining for sentiment analysis
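
To give a flavor of how ML can learn sentiment from text, here is a minimal multinomial naive Bayes classifier (a traditional algorithm from the list above) trained on a handful of hypothetical labeled headlines. It is an illustration only, not the method used by J.P. Morgan or BlackRock; production sentiment models rely on far larger labeled corpora and more sophisticated NLP.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(examples):
    """examples: (text, label) pairs. Returns word counts per label, label counts, vocabulary."""
    word_counts, label_counts, vocab = {}, Counter(), set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior score."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_docs)  # log prior
        for token in tokenize(text):
            # Log likelihood with Laplace (add-one) smoothing for unseen words.
            score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labeled headlines.
training_data = [
    ("profits beat expectations strong growth", "positive"),
    ("record revenue and strong demand", "positive"),
    ("shares fall on weak guidance", "negative"),
    ("profit warning and weak sales", "negative"),
]
model = train_naive_bayes(training_data)
print(classify("strong quarterly growth", model))
```

Working in log space avoids numerical underflow when many word probabilities are multiplied together.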

4. Overcoming training and modeling complications

The training and data modeling process can be even trickier than data collection. First of all, massive datasets also imply an enormous number of variables and very long training times. This issue is generally mitigated through feature selection, a procedure that keeps only the most relevant variables, shortening the training phase and making the resulting models easier to interpret.
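
One simple way to perform feature selection is a filter method: rank candidate features by the strength of their correlation with the target and keep the top k. The sketch below uses hypothetical feature names and toy data. Note that Pearson correlation captures only linear relationships, and the ranking should be computed on training data only, to avoid leaking information from the test period.

```python
def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(features, target, k):
    """Keep the k features most strongly correlated (positively or negatively) with the target."""
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical training-period data: candidate features and next-day returns.
features = {
    "momentum_5d": [0.02, -0.01, 0.03, 0.00, -0.02, 0.01],
    "volume_change": [0.10, 0.05, -0.20, 0.15, 0.00, -0.05],
    "pe_ratio": [15.0, 15.2, 15.1, 15.3, 15.2, 15.4],
}
next_day_returns = [0.010, -0.004, 0.012, 0.001, -0.009, 0.005]
print(select_features(features, next_day_returns, k=2))
```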

Another problem is overfitting, which occurs when a model fits its training data too closely (for example, because it is too complex or was trained for too long): it performs very well on that specific financial dataset but fails to generalize to new data samples. To address overfitting, in stock prediction as in any other ML use case, data is typically divided into training, validation, and test sets: the model is fitted on the training set, fine-tuned on the validation set, and finally assessed on the test set, which it has never seen.
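
For price data, a common choice is to split chronologically rather than at random, so that the model is never trained on observations that come after the ones it is evaluated on (which would leak future information into training). A minimal sketch, with hypothetical 70/15/15 proportions:

```python
def chronological_split(rows, train_frac=0.70, val_frac=0.15):
    """Split time-ordered rows into train/validation/test sets without shuffling."""
    n = len(rows)
    train_end = round(n * train_frac)
    val_end = round(n * (train_frac + val_frac))
    # Earliest data for training, the next slice for tuning, the latest for testing.
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


# Hypothetical: 100 consecutive trading days, identified by their index.
days = list(range(100))
train, val, test = chronological_split(days)
print(len(train), len(val), len(test))
```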

This monitoring and validation process shouldn't end with model deployment but should continue in production, to ensure that the trained model suits its intended business use and adapts to ever-evolving financial conditions.
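
Post-deployment monitoring can start with something as simple as tracking how often the model calls the direction of the market correctly. The sketch below keeps a rolling hit rate over the latest predictions and flags the model for review when it drops below a threshold; the class name, window size, and 50% threshold are hypothetical choices, not recommendations.

```python
from collections import deque


class HitRateMonitor:
    """Rolling share of correct up/down calls over the most recent predictions."""

    def __init__(self, window=20, alert_below=0.5):
        self.outcomes = deque(maxlen=window)  # oldest outcome drops out automatically
        self.alert_below = alert_below

    def record(self, predicted_return, actual_return):
        # A "hit" means the predicted direction (up or not up) matched reality.
        self.outcomes.append((predicted_return > 0) == (actual_return > 0))

    def hit_rate(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else None

    def needs_attention(self):
        """True once the window is full and the hit rate is below the threshold."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.hit_rate() < self.alert_below


# Hypothetical usage with a short window.
monitor = HitRateMonitor(window=4, alert_below=0.5)
for predicted, actual in [(0.01, 0.02), (0.01, -0.01), (-0.01, -0.02), (0.02, -0.01)]:
    monitor.record(predicted, actual)
print(monitor.hit_rate(), monitor.needs_attention())
```

A flagged model would then be re-validated or retrained on more recent data, keeping the chronological split in mind.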

The positive impact of ML on stock prediction

Financial firms are hardly new to combining brokers’ gut feelings with the massive use of computers and statistics. In recent years, however, the notoriously volatile nature of the stock market, further destabilized by global-impact events such as the COVID-19 pandemic, has prompted several institutions to explore the opportunities of AI, ML, and predictive analytics in finance, with promising results.

In its Innovations in Finance with Machine Learning report, for example, J.P. Morgan described a 2017 initiative aimed at suggesting the timing and sizing of trades. An ML-powered system based on the random forest algorithm was fed with a wide variety of data collected from 2000 to 2016, including international interest rates and the calendar of Federal Reserve meetings.

The difference in terms of returns between buying bonds through conventional methods and following an ML-based approach was pretty impressive, as you may see from the graph below. The third and sixth bars show the performance of standard operations without ML, which served as a control. The first and fourth bars, on the other hand, indicate returns from short selling, and the second and fifth bars from both buying and selling with ML guidance.

Performance of ML-enhanced trading vs conventional methods

Other encouraging data comes from a study in the August 2020 Cerulli Edge Global edition, which showed that the cumulative return of ML-driven hedge fund trading from 2016 to 2019 was almost three times that of traditional hedge fund investments over the same period (33.9% vs 12.1%).

Speaking of hedge funds, the aforementioned OECD report also pointed to the advantage of ML-driven trade execution over standard stock trading techniques, noting that the AI-powered hedge fund indices reported by the private sector outperformed the conventional ones provided by the same sources. Here's an example comparing AI-based and traditional hedge funds from the Eurekahedge Hedge Fund Index:

Eurekahedge AI vs Conventional Hedge Fund Index

Considering such outcomes, we can expect a growing reliance on artificial intelligence and machine learning in this sector. In this regard, it’s worth mentioning that, according to Gartner’s estimates, three-quarters of venture capitalists globally will take advantage of AI-based tools to make their decisions by 2025.

ML-based stock prediction challenges

Despite the massive potential of ML-based stock price prediction, this technology is far from perfect. Once implemented in a real-world scenario, it may give rise to some unexpected distortions, as Harvard Business Review has clearly explained.

  • Machines suffer from bias too: The effectiveness of ML-based systems depends on the quality of the information they are trained with. Therefore, insufficiently representative datasets could lead to bias. That’s why it is still essential to rely on data scientists and other qualified professionals to select the right data sources and integrate the judgment of machines with that of humans.
  • The stock market might be too chaotic: Algorithms build their models on historical data to predict financial trends but may struggle to face idiosyncratic, unprecedented scenarios, such as a global pandemic and its tremendous impact on the markets. Moreover, stock exchange data is rather limited, as the modern financial market is relatively young and a good portion of human history is unknown to algorithms.
  • A winning strategy cannot last if everyone follows it: If all traders followed identical ML-prompted investment strategies, they’d end up buying and selling the same equity securities, wiping out any potential gain. A similar episode, known as the Quant Meltdown, happened in 2007, when a number of prominent hedge funds guided by similar quantitative models sold their stock simultaneously and suffered massive losses.

An ongoing, uneven adoption

Despite the growing adoption of machine learning in banking and the stock market, a closer look at this trend reveals an uneven picture. On the one hand, major asset managers are leading the way, investing heavily in ML implementation and financial software development and securing top talent. On the other hand, smaller firms may have a hard time keeping up.

Indeed, according to the CFA Institute, only 10% of the portfolio managers surveyed said they had used machine learning techniques in the previous 12 months.

Investment strategies of portfolio managers

Statista reported similar results on the use of artificial intelligence for portfolio management in 2020: only 6% of the asset owners and investment managers surveyed said they were applying the technology. Nevertheless, 76% of respondents expressed interest in adopting AI for portfolio management in the future.

No matter how things go, the race is on, and the most forward-looking financial players may derive major benefits from ML-based stock prediction. However, these advantages come with challenges. After all, ML is an amazing technology, but it's not magic, despite our earlier comparisons with clairvoyants and crystal balls.

And even if we really wanted to see it as a crystal ball, we'd still need a wizard to look into it, just as we'll still need human financial experts to properly leverage ML solutions in the years to come.