Itransition delivered a BI platform for predictive analytics and helped the customer increase their buyer conversion rate by 8 percent, as well as reduce infrastructure costs by 50 percent.
Our customer is an established ecommerce company providing best-selling brands of clothing, accessories, and home goods. The company ships these products worldwide and has served over 20 million registered customers.
With more than 200,000 people using their website and mobile app daily, the retailer collects and processes large amounts of data to understand their customers’ needs. To automate and simplify these processes and make informed decisions, the company decided to build a single platform that would collect user behavior data from their website and mobile app, then sort and analyze it.
The retailer planned to use the platform to build predictive user behavior models to forecast buyer conversion rates, product interest, and future sales. Apart from that, the company wanted to increase their conversion rate while reducing their spend on infrastructure management.
One of our past customers recommended Itransition as a software vendor with vast expertise in ecommerce and, in particular, retail BI. Given our track record of delivered business intelligence services, the retailer chose to approach our team for their BI platform development.
Having delivered several successful retail BI solutions operating around the globe, our team used their expertise to build a centralized BI platform. The solution gathers and analyzes data in a near-real-time mode and provides accurate customer data for further website and mobile app personalization. The delivered solution processes clickstream data, mobile data, server events, and data on email campaign engagement.
Platform Load

- 10 TB of data processed
- 8 million trackable events on the website and in the mobile application
- 3.5 million emails in the system
- 30,000 events per minute on the website (on average)
* Disclaimer: Under the Non-Disclosure Agreement, we cannot reveal screenshots of the real system. The screenshots provided here are similar ones created to give an idea of the solution developed by Itransition.
Architecture Outlines

| Layer | Description |
|---|---|
| Event Tracking Layer | Tracking events from different sources (web, mobile, server, etc.) |
| Event Collecting Layer | Collecting both tracked events and operational data from the backend systems (ecommerce, CRM, etc.) |
| Event Processing Layer | Loading, normalizing, filtering, validating, and transforming collected data |
| Data Storage Layer | Storing data in a form optimized for statistical analysis and machine learning |
| Data Consumption Layer | Building data marts with view and integration APIs to access the data |
| Integration Layer | Gathering data from third-party sources via connectors, adaptors, and ETL jobs |
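As a purely illustrative example of how these layers fit together, the sketch below shows a raw web event being normalized into a common shape by the processing layer before it reaches the storage layer; all field names and values are assumptions rather than the platform's actual schema.

```python
# Purely illustrative sketch: a raw web event and its normalized form.
# Field names and values are assumptions, not the platform's actual schema.
raw_web_event = {
    "evt": "product_view",
    "uid": "8f2c-demo-user",
    "sku": "HG-10482",
    "ts": "2019-06-14T09:31:22Z",
    "src": "web",
}

# The processing layer maps events from web, mobile, and server sources
# onto one common shape so the storage layer holds a uniform record format.
normalized_event = {
    "event_type": raw_web_event["evt"],
    "user_id": raw_web_event["uid"],
    "item_id": raw_web_event["sku"],
    "timestamp": raw_web_event["ts"],
    "source": raw_web_event["src"],
}
```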
The platform supports two major user roles: the retailer’s marketing team and subscribed members.
The marketing team can:
Subscribed members can:
To help our customer provide a personalized experience to their online visitors and improve the accuracy of recommendations, we developed a recommendation engine based on collaborative filtering. The collaborative filtering algorithm operates on implicit user feedback such as purchases, views, clicks, and other signals coming from the website, mobile app, and emails.
We chose this approach because it scales easily to terabytes of data, allowing us to run it on more than ten machines at a time.
Our team opted for the alternating least squares (ALS) algorithm, popularized during the Netflix Prize challenge, as it met the project’s scalability and performance criteria. We also used a random forest regressor to predict product scores, calculated as a combination of item clicks, add-to-cart clicks, and purchases.
With more than 20 million platform users and 9 million SKUs to handle, we selected Apache Spark as the main ETL platform and the ALS implementation in Spark MLlib to run ML pipelines in production.
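As a rough illustration of this setup, the sketch below shows how an implicit-feedback ALS model can be trained with Spark MLlib; the column names, storage paths, and hyperparameters are assumptions made for the example, not the production configuration.

```python
# Minimal sketch of implicit-feedback ALS with Spark MLlib.
# Column names, paths, and hyperparameters are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS

spark = SparkSession.builder.appName("recommendations").getOrCreate()

# Implicit feedback: each row aggregates a user's interactions with an item
# (views, clicks, add-to-cart events, purchases) into a single strength score.
interactions = spark.read.parquet("s3://example-bucket/interactions/")  # userId, itemId, strength

als = ALS(
    userCol="userId",
    itemCol="itemId",
    ratingCol="strength",
    implicitPrefs=True,       # treat the score as confidence in implicit feedback
    rank=64,                  # number of latent factors
    regParam=0.05,
    alpha=40.0,               # confidence scaling for implicit feedback
    coldStartStrategy="drop",
)
model = als.fit(interactions)

# Top-10 product recommendations per user, e.g. for downstream personalization.
recommendations = model.recommendForAllUsers(10)
recommendations.write.mode("overwrite").parquet("s3://example-bucket/recommendations/")
```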
Our team also applied computer vision for a number of internal tasks:
Whenever we develop retail BI solutions, we prioritize quality assurance so that the final solution meets the quality and performance requirements. Our dedicated QA engineers performed ongoing performance testing of the deliverables throughout this two-year project.
Performance testing allowed the team to detect multiple stability issues and several critical defects in the components built on top of Apache Hive and Apache Storm, such as a memory leak causing out-of-memory (OOM) errors. Itransition eliminated these issues and defects by moving the platform to a new technology stack, which also improved the overall performance of the solution.
To ensure a stable, predictable, and timely delivery process during the retail BI development, our team applied continuous integration and delivery (CI/CD) practices with continuous code review and quality assurance.
One of the project’s key goals was to make the solution painlessly scalable while reducing the overall infrastructure costs. Initially, the solution was based on the Apache stack, including Kafka, Storm, and Hive Streaming. Together with the customer, we decided to host the solution on Amazon Web Services (AWS) and build it with a serverless architecture.
This approach made the solution easily scalable and fault-tolerant, ensured auto-scaling of resources, minimized their idle time, and, as a result, reduced infrastructure costs.
Itransition developed the data collection layer based on the Hortonworks Data Platform (HDP).
We also optimized costs associated with Amazon DynamoDB management. With the application’s traffic fluctuating substantially during the day, it was difficult to forecast and control capacity well enough to use the provisioned capacity mode effectively. Therefore, we switched to Amazon DynamoDB on-demand pricing with no planned capacity boundaries, which allowed the customer to avoid under- or over-provisioning throughput capacity. The pay-per-request pricing helped the customer cut database management costs by almost 50 percent.
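For reference, switching an existing table to on-demand billing is a single UpdateTable call; the sketch below uses boto3, and the table name and region are placeholders rather than the customer’s actual configuration.

```python
# Hedged sketch: switching an existing DynamoDB table from provisioned
# capacity to on-demand (pay-per-request) billing.
# The table name and region are placeholders.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.update_table(
    TableName="user-events",        # hypothetical table name
    BillingMode="PAY_PER_REQUEST",  # on-demand: no read/write capacity to forecast
)
```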
Another challenge our team faced during this retail BI project was reducing the cost of training dockerized deep learning (DL) models. Training DL models on graphics processing unit (GPU) instances with the AWS ML platform SageMaker is expensive, and at the time it did not support Amazon EC2 Spot Instances, which offer spare AWS compute capacity at a significant discount compared to On-Demand Instances rented for only as long as needed. To overcome this limitation, our team developed a custom framework for building TensorFlow-based models, dockerizing them, and deploying them to EC2 Spot Instances. As a result, we cut training costs by around 50 percent.
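The sketch below outlines, in simplified form, how a dockerized training job can be launched on an EC2 Spot Instance with boto3; the AMI ID, instance type, and container image are placeholders, and the real framework also handles checkpointing and Spot interruption notices, which are omitted here.

```python
# Hedged sketch: launching a dockerized TensorFlow training job on an
# EC2 Spot Instance. AMI ID, instance type, and image name are placeholders;
# checkpointing and Spot interruption handling are not shown.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Start the training container when the instance boots.
user_data = """#!/bin/bash
docker run --gpus all example/tf-training:latest \\
    python train.py --checkpoint-dir s3://example-bucket/checkpoints/
"""

ec2.run_instances(
    ImageId="ami-placeholder",      # GPU-enabled AMI with Docker and NVIDIA drivers
    InstanceType="p3.2xlarge",
    MinCount=1,
    MaxCount=1,
    UserData=user_data,
    InstanceMarketOptions={
        "MarketType": "spot",
        "SpotOptions": {"SpotInstanceType": "one-time"},
    },
)
```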
Itransition also integrated the platform with third-party solutions and tools, including:
Itransition delivered a retail-specific BI platform for data collection and analysis, helping the customer understand online user behavior better and increase sales through AI-powered personalization.