Data Analytics Solution: From Raw Web Data to Actionable Information

  • Collecting relevant information from heterogeneous online sources
  • Storing, normalizing, processing and searching the extracted data
  • Relation extraction and representation

Context

Our Customer is one of the leading companies in the CIS engaged in the research, development and manufacturing of high-tech products, as well as in providing information-focused services. The export-oriented, government-owned organization partners with companies from over 30 countries and is looking for ways to expand its areas of activity.

The company turned to Itransition to develop a web monitoring and natural language processing (NLP) analytics system that would enable the company’s managers to make informed decisions based on actionable information.

The Customer aimed to deploy the web-based application in their corporate network first, with a view to offering the product to their clients.

Solution

The company had previously partnered with Itransition to develop a system based on proprietary technologies. The outcome of that project prompted the Customer to entrust Itransition specialists with the current project, this time built on the Java stack.

A Robust Web Data Mining and Semantic Analysis Solution

The system’s users, analysts who monitor the information space and prepare reports for the company’s top executives, can access all application features from a single entry point: a Liferay portal.

The system monitors the web and extracts data from heterogeneous online sources, both closed and publicly accessible (online media, research agency websites, etc.), stripping out ads, markup and other irrelevant content before further processing. Analytical reports and graphs are then generated to give decision-makers a visual overview of the key extracted objects and the relations among them.

The system provides the following functionality:

  • collecting relevant online information from heterogeneous web sources (web pages) according to predefined criteria and storing it in an enterprise content management (ECM) system;
  • data normalization (conversion of text and graphics file formats);
  • processing the stored data: categorizing documents by topic (rubrication), rendering, and extracting objects (facts, key figures, etc.) and relations from unstructured text using a natural language processing tool;
  • searching the storage for documents and objects, with support for fuzzy string search;
  • generating graphs and analytical reports to reflect and visualize relations among objects.
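The fuzzy string search listed above can be illustrated with a minimal sketch, assuming matching is based on Levenshtein edit distance against a list of indexed terms. The case study does not say which algorithm or search engine the system actually uses, and the class and method names (`FuzzySearchSketch`, `levenshtein`, `fuzzyMatches`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: fuzzy term lookup via Levenshtein edit distance. */
public class FuzzySearchSketch {

    /** Dynamic-programming edit distance that keeps only two rows in memory. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of building b's prefix from an empty string
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting a's first i characters
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,   // insertion
                                 prev[j] + 1),      // deletion
                        prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; // the row just computed becomes the previous row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the indexed terms within maxDistance edits of the query, ignoring case. */
    static List<String> fuzzyMatches(String query, List<String> index, int maxDistance) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        for (String term : index) {
            if (levenshtein(q, term.toLowerCase(Locale.ROOT)) <= maxDistance) {
                result.add(term);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> index = List.of("Itransition", "Liferay", "analytics", "analysis");
        // A misspelled query still finds the intended term.
        System.out.println(fuzzyMatches("analitics", index, 2)); // prints [analytics]
    }
}
```

A linear scan like this is only practical for small vocabularies. With millions of stored documents, a search engine with built-in fuzzy queries would be the more likely choice.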

The system also allows for extracting objects and relations manually.
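The case study does not describe how extracted objects and relations are stored. Here is a minimal sketch, assuming relations are modeled as subject-predicate-object triples in an in-memory adjacency list, with a flag recording whether each relation came from the NLP tool or was entered manually by an analyst. All names (`RelationGraph`, `Relation`, `Source`) are hypothetical. The sketch requires Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: an in-memory graph of extracted relations. */
public class RelationGraph {

    /** Origin of a relation: the NLP pipeline or an analyst's manual input. */
    enum Source { AUTOMATIC, MANUAL }

    /** A directed, labeled edge between two extracted objects. */
    record Relation(String subject, String predicate, String object, Source source) {}

    // Adjacency list: object name -> relations in which that object is the subject.
    private final Map<String, List<Relation>> outgoing = new LinkedHashMap<>();

    void add(Relation r) {
        outgoing.computeIfAbsent(r.subject(), k -> new ArrayList<>()).add(r);
        // Register the target object as a node even if it has no outgoing edges yet.
        outgoing.computeIfAbsent(r.object(), k -> new ArrayList<>());
    }

    /** Relations whose subject is the given object (empty if unknown). */
    List<Relation> relationsOf(String subject) {
        return outgoing.getOrDefault(subject, List.of());
    }

    /** Number of distinct objects (nodes) in the graph. */
    int nodeCount() {
        return outgoing.size();
    }

    public static void main(String[] args) {
        RelationGraph g = new RelationGraph();
        g.add(new Relation("Company A", "partners with", "Company B", Source.AUTOMATIC));
        g.add(new Relation("Company A", "supplies", "Agency C", Source.MANUAL));
        for (Relation r : g.relationsOf("Company A")) {
            System.out.println(r.subject() + " --" + r.predicate() + "--> "
                    + r.object() + " [" + r.source() + "]");
        }
    }
}
```

Keeping the source flag lets reports and graphs distinguish machine-extracted relations from analyst-entered ones. A production system would more likely persist such triples in a database or graph store than keep them in memory.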

Ensured Security and Flawless Operation

The system complies with a strict security policy. Application security is ensured by distributing the infrastructure across separate security perimeters and by controlling user access to system features and data. The web-based application is accessed over HTTPS. A supervisor monitors the application to keep the system running stably.

Results

The Customer was genuinely impressed by Itransition’s proven expertise in the Java technology stack and by the project outcome.

Despite frequent changes of project managers on the Customer’s side and the resulting shifts in priorities, the Itransition team achieved all the goals set by the Customer and delivered a sustainable software product. Currently, the system is used by about a dozen of the company’s employees, providing key decision-makers with the right information at the right time, in the right way. Thousands of documents are downloaded daily, and 5 to 7 million documents are stored in total.

Highlights

  • Solutions

    Business Intelligence

    Document & Content Management

  • Industries

    ISVs

    Public Sector

  • Technology

    Java
