Big Data Analytical Processing

Cloud-based Software-as-a-Service Solution

  • Machine learning, predictive analysis, statistical modeling and model testing
  • Powerful and versatile API, multiple data sources, and applications integration
  • Designed to work on Amazon Elastic Compute Cloud

Context

Striving for cost-effective BI

Business Intelligence remains a key investment area for many companies. To survive in a highly competitive environment, organizations now have to deal with large volumes of business-critical data that previously was not even recognized as important.

The nature, volume and dynamics of that data often make traditional BI tools ineffective or altogether useless. That is why businesses expand their BI facilities with data mining techniques to identify valuable information, support decision making and enhance business intelligence in general.

Such smart BI solutions require expensive investments in both software implementation and hardware infrastructure, which is hard to justify even for enterprises and nearly out of reach for smaller businesses. This is where cloud-based SaaS becomes an attractive option.

Big data analytics on demand

Our customer — a pioneer in SaaS BI — was inspired by the idea of providing big data analytics on demand.

The company planned to launch a brand-new software product for predictive and sentiment analysis that would overcome the major challenges of big data processing. It was intended to help retail and logistics companies test statistical models, reveal dependencies between various business parameters, and forecast the impact of decisions.

Time to market was vital for our client to stay ahead of the competition in the rapidly changing BI domain and in cutting-edge cloud technologies. The project had a tight time frame and therefore required intensive development, agile project management and precise coordination.

The company engaged Itransition as a mature technology partner to deliver a SaaS product that would enable analytical processing of bulk data uploaded online. Because the application deals with huge data arrays in on-demand mode, high performance and scalability were crucial requirements. These requirements were to be met by a carefully designed solution architecture optimized for cloud development.

Solution

Functionality overview

Keeping in mind the best OPD methodologies and practices, Itransition developed a software product that serves as an analytical platform, giving users multiple options for processing bulk data and obtaining predictive analysis results.

The solution is designed to work on Amazon EC2 and comprises three major blocks: a data uploading module, a processing kernel and a visualization module.

Uploading data

Users can upload multiple data files via a web-based interface with a drag-and-drop feature. The software supports various file formats such as SVM, CSV and ARFF. Service subscribers can manage the cloud infrastructure and specify which folders or areas to use, down to every single file. To enable automatic data uploading, Itransition built a simple and versatile API that allows integration with various data sources and applications.
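To illustrate what reading one of the supported formats involves, here is a minimal sketch of parsing a single record. It assumes that "SVM" refers to the common sparse LIBSVM text layout (`label index:value index:value ...`); the class and method names are hypothetical and are not part of the platform's actual API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, assuming the sparse LIBSVM text layout.
class SvmLineParser {
    /** One parsed record: a numeric label plus sparse features keyed by index. */
    static class Row {
        final double label;
        final Map<Integer, Double> features;

        Row(double label, Map<Integer, Double> features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Parses a line such as "+1 3:0.5 7:2" into a label and its features. */
    static Row parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("empty line");
        }
        double label = Double.parseDouble(tokens[0]);
        Map<Integer, Double> features = new TreeMap<>();
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 1) {
                throw new IllegalArgumentException("bad feature: " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            features.put(index, value);
        }
        return new Row(label, features);
    }
}
```

Only the features present on a line are stored, which keeps memory use proportional to the non-zero values rather than to the full feature count.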

Analytical processing

The software allows users to process data using Classification and Regression Trees (C&RT) methods. The platform provides a set of tools to build, train and test appropriate statistical models. Users can also configure which data files participate in model building and choose testing methods for different datasets and files.
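The core step of a classification tree is choosing the split that makes the resulting groups as pure as possible. The sketch below shows that step for one numeric feature and binary labels, using Gini impurity as the criterion. It is an illustration of the general technique under these simplifying assumptions, not the platform's implementation, and all names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of a single C&RT-style split search.
class GiniSplit {
    /** Gini impurity of a node holding pos positive and neg negative samples. */
    static double gini(int pos, int neg) {
        int n = pos + neg;
        if (n == 0) return 0.0;
        double p = (double) pos / n;
        return 2.0 * p * (1.0 - p);
    }

    /**
     * Returns the threshold t that minimizes the weighted Gini impurity of the
     * split "x <= t" versus "x > t", or NaN if all feature values are equal.
     */
    static double bestThreshold(double[] x, boolean[] y) {
        Integer[] order = new Integer[x.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(x[a], x[b]));

        int totalPos = 0;
        for (boolean label : y) if (label) totalPos++;

        double best = Double.NaN;
        double bestScore = Double.POSITIVE_INFINITY;
        int leftPos = 0;
        for (int k = 0; k < order.length - 1; k++) {
            if (y[order[k]]) leftPos++;
            double lo = x[order[k]];
            double hi = x[order[k + 1]];
            if (lo == hi) continue; // no threshold fits between equal values
            int leftN = k + 1;
            int rightN = order.length - leftN;
            int rightPos = totalPos - leftPos;
            double score = (leftN * gini(leftPos, leftN - leftPos)
                    + rightN * gini(rightPos, rightN - rightPos)) / order.length;
            if (score < bestScore) {
                bestScore = score;
                best = (lo + hi) / 2.0; // midpoint between neighbouring values
            }
        }
        return best;
    }
}
```

A full tree builder would repeat this search over every feature and recurse into the two resulting subsets until a stopping rule is met.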

Visualization

The analytical output is accompanied by comprehensive data visualization that helps users recognize and interpret the results.

Given the specifics of big data processing and modeling, Itransition used a specialized plotting library to enable fast and accurate data visualization.

Technology Highlights

Cloud Computing Based on Amazon Web Services

Amazon Cloud is used as the deployment foundation to provide high on-demand scalability. This objective required special cloud development techniques and a dedicated architectural approach from Itransition. Amazon EC2 enables the hosted application to scale up and down within minutes according to the volume of uploaded datasets. A user of the system can run hundreds of server instances simultaneously.

Java

Java was identified as the most suitable technology to deliver the required functionality scope and the proper level of scalability and performance. Leveraging its strong Java skills, Itransition delivered a modular application that is easy to maintain and extend and fully compatible with Amazon EC2.

Hadoop MapReduce Framework

Hadoop MapReduce is a programming model and software framework that gives the solution the ability to rapidly process vast arrays of data in parallel on large clusters of compute nodes. HDFS (Hadoop Distributed File System) was selected for its ability to store extremely large data sets and to stream data at high bandwidth to user applications.
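The MapReduce model itself can be shown in a few lines. The sketch below deliberately uses plain Java parallel streams rather than the Hadoop API, so it runs on a single machine: each input line is mapped to a key/value pair, pairs are grouped by key, and each group is reduced to a sum. The "store,amount" input layout and all names are assumptions made for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Single-machine illustration of map, group-by-key and reduce (not Hadoop code).
class MapReduceSketch {
    /** Map phase: turns one "key,amount" line into a (key, amount) pair. */
    static Map.Entry<String, Double> map(String line) {
        String[] parts = line.split(",");
        return new SimpleEntry<>(parts[0].trim(), Double.parseDouble(parts[1].trim()));
    }

    /** Groups the mapped pairs by key and reduces each group to its sum. */
    static Map<String, Double> run(List<String> lines) {
        return lines.parallelStream()
                .map(MapReduceSketch::map)
                .collect(Collectors.groupingBy(
                        Map.Entry::getKey,
                        TreeMap::new,
                        Collectors.summingDouble(Map.Entry::getValue)));
    }
}
```

In a real Hadoop job the same two functions would be distributed across cluster nodes, with the framework handling the grouping step and reading input from HDFS.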

GlassFish High-Performance Application Server

A modular architecture minimizes overhead by starting only the GlassFish Server modules required to serve the running application. The server also supports clustering for high availability and scalability.

Intelligent Customer Billing Technology

An Akka-based billing mechanism tracks and summarizes all operations performed per session and generates a custom bill based on the resources consumed, which in turn depend on the volume of processed data and the complexity of the models.
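The idea of usage-based billing per session can be sketched as follows. This is a simplified, hypothetical model in plain Java, not the Akka-based mechanism itself: the rate, the complexity factor and all names are assumptions chosen only to show how data volume and model complexity could combine into a bill.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.DoubleAdder;

// Hypothetical per-session usage meter; rates and factors are illustrative only.
class SessionBilling {
    // Assumed price per gigabyte processed at complexity factor 1.0.
    private final double ratePerGb;
    private final Map<String, DoubleAdder> unitsBySession = new ConcurrentHashMap<>();

    SessionBilling(double ratePerGb) {
        this.ratePerGb = ratePerGb;
    }

    /** Records one operation: data volume scaled by a model complexity factor. */
    void record(String sessionId, double gigabytes, double complexityFactor) {
        unitsBySession.computeIfAbsent(sessionId, id -> new DoubleAdder())
                .add(gigabytes * complexityFactor);
    }

    /** Bill for a session: accumulated billable units times the rate. */
    double bill(String sessionId) {
        DoubleAdder units = unitsBySession.get(sessionId);
        return units == null ? 0.0 : units.sum() * ratePerGb;
    }
}
```

The concurrent map and adder allow many operations to report usage at the same time, which is the same concern an actor-based design addresses through message passing.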

Development Process and Results

The project release was delivered following the Agile methodology, which enabled rapid, incremental and efficient application development. The major part of the application was assembled in less than a year, as the customer intended to launch the platform at the earliest possible date to speed up the return on investment.

Right after the platform's production release, the client requested ongoing active development support to refine the application and make its functionality more attractive from a business standpoint. The productive collaboration has lasted 3.5 years and is still in progress.

Business users are often frustrated by the deployment cycles, costs, complicated upgrade processes and IT infrastructures demanded by on-premises BI solutions. SaaS- and cloud-based BI is perceived as offering a quicker, potentially lower-cost and easier-to-deploy alternative, though this has yet to be proven.

James Richardson, Gartner Research Director

Highlights

  • Solutions

    Business Intelligence

  • Industries

    ISVs

  • Technology

    Java
