BI Platform for a Large Fashion Retailer

Itransition delivered a BI platform allowing for predictive analytics as well as website and mobile app personalization

Problem

Customer

Founded in 2008, our Customer is an established online fashion retail company providing best-selling brands of clothing, accessories and home goods for members only. The company’s website and mobile application publish new offers daily. The offers are available to website and application users and remain valid for anywhere from several hours to several days.

The company ships goods worldwide and has a user base of 18 million people. Around 200 thousand people visit its website daily. Because the Customer works with subscription members only and must process a large amount of data to understand its clients’ needs, the company’s executives and marketing managers came to the conclusion that they needed a single platform to collect, sort and analyze user behavior information from the website and mobile application. Such a platform would make it possible to build predictive user behavior models (buyer conversion, product interest, etc.) and make informed business decisions.

One of our past clients, who operates in the same field as the Customer, recommended Itransition as a trusted vendor due to our solid expertise in the eCommerce and retail industry as well as our proven track record in developing web and mobile BI solutions. Thus, our full-time dedicated software development team started working on a brand-new BI platform for the fashion retailer.

Solution

The solution is a single BI platform used by the company’s marketing team and executives, and it also serves website and mobile application users through content personalization and recommendations. It gathers and analyzes data in near real time, allowing for predictive analytics as well as website and mobile app personalization.

Platform Load

It processes around 10 TB of data excluding history

There are around 8 million trackable events on the website and mobile application

There are about 3.5 million emails in the system

30 thousand events per minute occur on the website

Types of information processed by the solution:

  • clickstream data;
  • mobile data;
  • server events;
  • email campaign engagement.

* Disclaimer: Under the Non-Disclosure Agreement signed with our Customer, we cannot publish screenshots of the real system. The similar screenshots provided here are intended to give the reader an idea of the solution developed by Itransition.

Custom Dashboards: Client Activity
Custom Dashboards: Sales and Margin
A Rough Example of a Personal Recommendations Email Generated by the System

Process

Overall, the team working on the project included 4 Itransition developers plus 8 developers, a system architect and a manager on the Customer’s side. Itransition specialists developed a proprietary solution for data collection based on Hortonworks Data Platform and implemented it.

We held daily calls and weekly team meetings and visited the Customer’s site to ensure the seamless communication necessary for the project’s success. Throughout the project, our team was involved in the following stages:

  • requirements analysis and specification;
  • design;
  • development;
  • integration;
  • testing;
  • deployment;
  • maintenance and support.
Solution Architecture

Marketing and Executive Team

  • Get data on user actions through various channels (website, mobile application, emails, questionnaires).
  • Review more than 100 types of custom reports, built for the company’s executives and marketing team, on website usage, orders, various types of views, etc.
  • Perform ad-hoc queries on the collected data as needed.

Subscription Members

  • Get a personalized version of the website, web application and email/push notifications: each person sees banners, site categories and types of goods selected according to their preferences (based on items viewed and purchases made).
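
One simple way such preference-based personalization could work is to score product categories from a member's viewing and purchase history and show the highest-scoring ones first. The sketch below is purely illustrative: the weights, the category names and the `top_categories` helper are assumptions and do not describe the models actually used in the platform.

```python
from collections import Counter

VIEW_WEIGHT = 1       # assumed weight of a single product view
PURCHASE_WEIGHT = 5   # assumed weight of a single purchase


def top_categories(views, purchases, n=3):
    """Rank categories by a weighted count of views and purchases."""
    scores = Counter()
    for category in views:
        scores[category] += VIEW_WEIGHT
    for category in purchases:
        scores[category] += PURCHASE_WEIGHT
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [category for category, _ in ranked[:n]]


# Hypothetical history of one member.
views = ["shoes", "shoes", "bags", "home", "home", "home"]
purchases = ["bags"]
banner_order = top_categories(views, purchases)
```

Here a purchase counts for more than a view, so a category the member has bought from outranks one that was only browsed.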

Solution Architecture

Architecture Outlines

Event Tracking Layer

Tracking events from different sources (web, mobile, server, etc.)

Event Collecting Layer

Collecting both tracked events and operational data from backend systems (e-Commerce, CRM, etc.)

Event Processing Layer

Loading, normalization, filtering, validation and transformation of collected data

Data Storage Layer

Storing the data in a form optimized for statistical analysis and machine learning

Data Consumption Layer

Data marts, views and integration APIs for accessing the data

Integration Layer

Connectors, adaptors and ETL jobs for gathering data from third parties
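
To make the Event Processing Layer more concrete, here is a minimal sketch of the normalization, validation, filtering and transformation steps applied to one raw tracked event. The field names (`type`, `user_id`, `ts`, `source`), the set of allowed sources and the `process_event` helper are hypothetical; the case study does not describe the actual event schema.

```python
from datetime import datetime, timezone
from typing import Optional

ALLOWED_SOURCES = {"web", "mobile", "server", "email"}  # assumed source names
REQUIRED_FIELDS = ("type", "user_id", "ts", "source")   # assumed schema


def process_event(raw: dict) -> Optional[dict]:
    """Return a cleaned event, or None if the event should be dropped."""
    # Normalization: make field names case- and whitespace-insensitive.
    event = {str(k).strip().lower(): v for k, v in raw.items()}

    # Validation: every required field must be present and non-empty.
    if any(event.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None

    # Filtering: keep only events coming from known sources.
    source = str(event["source"]).strip().lower()
    if source not in ALLOWED_SOURCES:
        return None

    # Transformation: epoch seconds become an ISO 8601 UTC timestamp,
    # plus a date field that is convenient for partitioned storage.
    try:
        ts = datetime.fromtimestamp(float(event["ts"]), tz=timezone.utc)
    except (TypeError, ValueError, OverflowError, OSError):
        return None

    return {
        "type": str(event["type"]).strip().lower(),
        "user_id": str(event["user_id"]),
        "source": source,
        "ts": ts.isoformat(),
        "date": ts.date().isoformat(),
    }


clean = process_event(
    {"Type": " Click ", "user_id": 42, "ts": 1_500_000_000, "source": "WEB"}
)
```

Events that fail validation or filtering are simply dropped in this sketch; a production pipeline would more likely route them to a separate location for inspection.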

General Processing Flow

External systems the solution is integrated with, and the integration protocol used for each:

  • Salesforce Marketing Cloud (used to manage email campaigns): FTP;
  • Attune: REST;
  • LiveIntent: FTP/REST;
  • SurveyGizmo: REST;
  • Evergage: REST.

Cloud Migration

The solution was initially based on the Kafka/Storm/Hive Streaming stack, which turned out to have reliability and maintenance issues. Our team therefore joined forces with the Customer’s BI/DWH team to bring the solution to the required quality. Together we transformed the platform into a 100% cloud solution: it was migrated to the AWS stack (Kinesis Streams/Firehose/Analytics, Lambda, S3), and the system architecture was simplified to become 100% serverless, which made the system easier to maintain and scale.
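
For readers unfamiliar with the serverless pattern, the following is a minimal sketch of a Lambda function consuming a batch of Kinesis records. It is not the project's actual code: the JSON payload and its `type` field are assumptions, and `make_record` is a hypothetical helper for local testing. Only the envelope (`Records[].kinesis.data`, base64-encoded) follows the documented Kinesis-to-Lambda event format.

```python
import base64
import json
from collections import Counter


def handler(event, context):
    """Decode a Kinesis batch and count events per (assumed) `type` field."""
    counts = Counter()
    skipped = 0
    for record in event.get("Records", []):
        try:
            # Kinesis delivers each record payload base64-encoded.
            payload = base64.b64decode(record["kinesis"]["data"])
            body = json.loads(payload)
            counts[body["type"]] += 1
        except (KeyError, TypeError, ValueError):
            # Malformed or unexpected records are counted, not fatal.
            skipped += 1
    return {"counts": dict(counts), "skipped": skipped}


def make_record(payload: bytes) -> dict:
    """Hypothetical helper: wrap raw bytes the way Kinesis would."""
    return {"kinesis": {"data": base64.b64encode(payload).decode("ascii")}}


# Local invocation with a hand-built batch (no AWS account needed).
batch = {
    "Records": [
        make_record(json.dumps({"type": "click"}).encode("utf-8")),
        make_record(json.dumps({"type": "click"}).encode("utf-8")),
        make_record(json.dumps({"type": "view"}).encode("utf-8")),
        make_record(b"not json"),
    ]
}
result = handler(batch, None)
```

Because the function holds no state between invocations, throughput is raised by adding shards or concurrency rather than by managing servers, which is the maintainability benefit the migration aimed for.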

Migration goals:

  • better scalability;
  • easier maintainability;
  • reasonable infrastructure costs.

Performance targets:

  • data volume: 10+ TB;
  • average update rate: 8 000 000+ events/day;
  • peak throughput: 30 000 events/min.
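
As a rough sanity check on these targets, the figures can be converted to per-second rates. The sketch below is only illustrative arithmetic: the 1 KB average event size is an assumption (the case study gives no event size), and the per-shard write limits are the published Kinesis Data Streams quotas for provisioned mode (1,000 records per second and about 1 MB per second per shard).

```python
import math

EVENTS_PER_DAY = 8_000_000       # average update rate from the targets
PEAK_EVENTS_PER_MIN = 30_000     # peak throughput from the targets
ASSUMED_EVENT_BYTES = 1_024      # assumption: not stated in the case study

SHARD_RECORDS_PER_SEC = 1_000    # Kinesis per-shard write limit (records)
SHARD_BYTES_PER_SEC = 1_000_000  # Kinesis per-shard write limit (~1 MB/s)


def capacity_estimate(events_per_day, peak_events_per_min, event_bytes):
    """Convert the targets to per-second rates and a minimum shard count."""
    avg_per_sec = events_per_day / 86_400   # seconds in a day
    peak_per_sec = peak_events_per_min / 60
    shards = max(
        math.ceil(peak_per_sec / SHARD_RECORDS_PER_SEC),
        math.ceil(peak_per_sec * event_bytes / SHARD_BYTES_PER_SEC),
    )
    return {
        "avg_per_sec": avg_per_sec,
        "peak_per_sec": peak_per_sec,
        "peak_to_avg": peak_per_sec / avg_per_sec,
        "min_shards_at_peak": shards,
    }


estimate = capacity_estimate(
    EVENTS_PER_DAY, PEAK_EVENTS_PER_MIN, ASSUMED_EVENT_BYTES
)
```

Under these assumptions the average load is roughly 93 events per second and the peak is 500 events per second, about 5.4 times the average. A gap of that size between peak and average is one reason fixed-capacity infrastructure sits idle much of the time.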

Before Migration

  • Scaling was performed in a semi-automatic manner
  • Highly skilled multi-platform engineers were required even for basic development and support
  • Some computing power was wasted because of idle time
  • Infrastructure costs were higher than with an on-premises deployment

After Migration

  • Improved overall scalability and fault tolerance
  • Resource auto-scaling in use
  • Simplified development and maintenance
  • Minimized resource idle time
  • Reduced infrastructure costs

QA and Testing

Throughout the development process, special emphasis was placed on quality assurance and testing to ensure that the final solution meets the Customer’s specified quality and performance requirements and operates reliably. A dedicated Itransition QA engineer performed ongoing data-driven and performance testing for 24 months.

The most significant result of performance testing the near real-time event-collecting pipeline was the discovery of several critical defects in some products of the Apache Hive/Storm stacks, e.g. a memory leak leading to OOM errors, as well as multiple stability issues. All the defects and issues were fixed, so that the platform functions as the Customer requires.

Results

The Customer was fully satisfied with the results delivered by Itransition specialists. The BI platform for data collection and analysis is now in place and used by the company’s employees on a daily basis. It provides a better understanding of user behavior, which leads to increased sales and overall company success.

Conversion into buyers via personalized emails increased by 8% (as of the past quarter).

The amount of collected user data required for building predictive user behavior models increased by 15%.

Monthly infrastructure costs dropped by a factor of more than 30.