ETL System for Big Data Migration

Itransition developed a custom ETL utility that automates the extraction and indexing of heterogeneous files from the IBM FileNet P8 data system, maps and transforms the data, and saves it in formats suitable for the target ERP system.

Customer

PEC (Pedernales Electric Cooperative) is the largest distribution electric co-op in the USA, delivering safe and reliable energy services to 270,000+ accounts across the country.

Problem

The company’s everyday work depends heavily on an ever-growing body of technical and regulatory documentation stored in the corporate ECM system built on IBM FileNet P8. When the repository’s size exceeded 1 terabyte, the system’s performance declined considerably, resulting in prolonged downtime that hindered the staff’s productivity.

Seeking to scale up its business processes, the company decided to cut over to a more scalable ERP system and looked for a vendor to migrate its massive document repository.

Project Objectives

The Customer’s database migration process was expected to follow the standard ETL flow of three major steps:

  • Extraction. All data must be securely and accurately retrieved from the source ECM system using as few resources as possible.
  • Transformation. This step includes data aggregation, systematization and conversion according to the rules of the target environment.
  • Loading. The data, now in a format the target system accepts, are uploaded to a new database, where they are then attributed and indexed.
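The three steps above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the utility's actual code: in-memory lists stand in for the source ECM system and the target database, and the class name, method names and field mapping are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical three-step ETL flow; every name here is illustrative. */
class EtlFlowSketch {

    /** Extraction: read raw records from the source (a list stands in for the ECM system). */
    static List<Map<String, String>> extract(List<Map<String, String>> source) {
        return new ArrayList<>(source); // work on a copy so the source list is never modified
    }

    /** Transformation: rename source fields to the assumed target field names. */
    static Map<String, String> transform(Map<String, String> record,
                                         Map<String, String> fieldMapping) {
        Map<String, String> converted = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : record.entrySet()) {
            String targetName = fieldMapping.get(field.getKey());
            if (targetName != null) { // fields without a mapping are dropped
                converted.put(targetName, field.getValue().trim());
            }
        }
        return converted;
    }

    /** Loading: append a converted record to the target (a list stands in for the new database). */
    static void load(List<Map<String, String>> target, Map<String, String> record) {
        target.add(record);
    }

    /** Runs extraction, transformation and loading in order and returns the loaded records. */
    static List<Map<String, String>> run(List<Map<String, String>> source,
                                         Map<String, String> fieldMapping) {
        List<Map<String, String>> target = new ArrayList<>();
        for (Map<String, String> record : extract(source)) {
            load(target, transform(record, fieldMapping));
        }
        return target;
    }
}
```

In a real migration each step would talk to an external system, which is why the steps are kept as separate methods that can be replaced independently.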

Having found a contractor for the third migration step, the Customer was looking for a software vendor to entrust with the legacy data export and conversion. Considering Itransition’s time-proven expertise in big data software development and its experience in handling custom-tailored projects, PEC selected Itransition for this task.

Solution

Itransition marshaled its finest resources to build a robust utility for document base migration. The process automation mechanism enables seamless and accurate export and conversion of millions of documents with minimal dependence on manual work. Our team has been converting the database while retaining the original data structure, with little manual effort and at high speed: the data are processed 10 times faster than specified in the project documentation, to the Customer’s immense satisfaction.

To prepare 24+ million files of diverse formats for migration, the custom ETL utility automatically extracts and indexes heterogeneous files from the IBM FileNet P8 data system, maps and transforms the data, and saves it in formats suitable for the target ERP system.

According to the Customer’s requirements, the solution was designed for migration of the following source data:

  • Images stored in IBM FileNet P8 4.5;
  • 1,024 gigabytes of text documentation;
  • XML load files;
  • Gap/cutover data.

Drawing on extensive experience in building high-load web systems and on ETL best practices, including incremental loads, scheduling, monitoring and logging, Itransition developed high-performance software for document base retrieval and transformation.
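Of the practices listed above, incremental loading is the easiest to show in code. The sketch below assumes a watermark-based approach, in which each run exports only the documents modified after the newest one already exported. The source does not say which incremental technique the utility uses, so the technique, the class and the field names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical watermark-based incremental load; not taken from the actual utility. */
class IncrementalLoadSketch {

    /** Minimal stand-in for a source document with a last-modified timestamp. */
    static final class Doc {
        final String id;
        final long modifiedAt; // epoch milliseconds

        Doc(String id, long modifiedAt) {
            this.id = id;
            this.modifiedAt = modifiedAt;
        }
    }

    private long watermark; // newest modification time exported so far

    IncrementalLoadSketch(long initialWatermark) {
        this.watermark = initialWatermark;
    }

    /** Returns only the documents changed since the previous run, then advances the watermark. */
    List<Doc> nextBatch(List<Doc> source) {
        List<Doc> batch = new ArrayList<>();
        long newest = watermark;
        for (Doc doc : source) {
            if (doc.modifiedAt > watermark) {
                batch.add(doc);
                newest = Math.max(newest, doc.modifiedAt);
            }
        }
        watermark = newest;
        return batch;
    }

    long watermark() {
        return watermark;
    }
}
```

Calling `nextBatch` repeatedly on a growing source returns each document once, which is how gap or cutover data created after the first export could be picked up in a later run.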

Process

The intricate algorithm rests on open source technologies and avoids resource-consuming XML configuration. Our specialists selected Spring Framework as the core technology, Apache Cassandra for storing indexes and statistical data, and the RabbitMQ platform for implementing queues. The ETL business logic is implemented in Java.
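The role of the queues can be illustrated with a small producer and consumer pair. To keep the example self-contained, the sketch below swaps RabbitMQ for an in-process `java.util.concurrent.BlockingQueue`, so it shows only the decoupling of export from conversion and says nothing about the broker configuration actually used. The class name, the sentinel value and the "converted:" prefix are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical queue-decoupled stage; an in-process queue stands in for RabbitMQ. */
class QueueStageSketch {

    // Sentinel telling the consumer to stop; assumes no real document id equals this value.
    private static final String END = "__END__";

    /** Publishes document ids to a bounded queue and converts them on a separate thread. */
    static List<String> process(List<String> documentIds) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100); // bounded: producer waits when full
        List<String> converted = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                String id;
                while (!(id = queue.take()).equals(END)) {
                    converted.add("converted:" + id); // placeholder for the real conversion step
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String id : documentIds) {
                queue.put(id); // blocks if the consumer falls behind
            }
            queue.put(END);
            consumer.join(); // after join, the consumer's writes are visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while processing the queue", e);
        }
        return converted;
    }
}
```

The bounded queue provides back-pressure: when conversion is slower than export, the producer pauses instead of accumulating unprocessed documents in memory.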

To ensure data accuracy and consistency, we implemented a comprehensive logging mechanism that records the errors arising during the export process.
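One common way to build such a mechanism is to catch failures per document, log them and carry on, so that a single corrupt file does not stop the whole run. The sketch below uses the standard `java.util.logging` API. The source does not name the logging library used in the project, and the class name and the `exporter` callback are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical per-document error logging during export. */
class ExportLoggingSketch {

    private static final Logger LOG = Logger.getLogger(ExportLoggingSketch.class.getName());

    /**
     * Exports every document, logging failures instead of aborting.
     * Returns the ids of the documents that failed so they can be retried or reviewed.
     */
    static List<String> exportAll(List<String> documentIds, Consumer<String> exporter) {
        List<String> failed = new ArrayList<>();
        for (String id : documentIds) {
            try {
                exporter.accept(id);
            } catch (RuntimeException e) {
                // Record the document id together with the stack trace, then continue
                LOG.log(Level.SEVERE, "Export failed for document " + id, e);
                failed.add(id);
            }
        }
        LOG.info("Export finished: " + (documentIds.size() - failed.size())
                + " succeeded, " + failed.size() + " failed");
        return failed;
    }
}
```

Returning the failed ids alongside the log entries makes it possible to check that the number of exported documents matches the source once the failures have been retried.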

The solution was designed so that it did not negatively affect the source system in terms of performance, response time or any kind of locking, and therefore did not disrupt the company’s working process.

With a robust tool in place, our team is currently performing a smooth export of the legacy data.