April 22, 2021
A data migration strategy in six steps
Data Intelligence Researcher
A few decades earlier, data migration was a simple and straightforward task—you needed to move folders from your old cabinet to a new one, with more drawers. Maybe you needed to think of a more sophisticated filing system, but that was it.
Data migration today still sounds straightforward: it involves transferring data from one system or repository to another. In most cases, however, data is so diverse that migrating it becomes a much more intricate process. Poorly managed data migration can lead to serious data loss and performance bottlenecks, particularly when the data migration strategy is neglected or oversimplified.
Let’s talk about situations when businesses may require data migration and look at an example of an effective data migration strategy.
The simple answer to why businesses need data migration is to boost performance and gain a competitive advantage. In practice, data migration is usually on the table when businesses introduce new systems.
However, data migration becomes a truly pressing matter when there’s a need to overhaul the entire infrastructure by moving from a legacy solution that has become slow, heavy, and inadequate to the cloud, or between different cloud environments, as often happens with EHR migration. In most cases, cloud migration covers database, storage, and app migration all at once.
To understand in what specific cases businesses require data migration, we asked business intelligence consultants from Itransition to share a few examples from their experience.
A company operating in the email management sector partnered with us to deliver an email archive migration solution and make it compatible with multiple platforms. Within the project, we delivered a tool that supported smooth migration of legacy email archives from/to 25+ archive systems, both on-premises and in the cloud, without compromising data integrity.
As a result, the tool created a competitive advantage for the company by processing large datasets and delivering 10x faster performance than equivalent products.
A US-based multinational company delivering analytics, technology and research services for the pharmaceutical industry approached us to help them keep up with the growing user base of their pharmaceutical data analytics platform and maintain its high performance. The customer’s data assets comprised over 500 million patient records of more than 50 thousand patents. We helped them with a cloud migration strategy and moved their solution from an on-premises server to the cloud.
As a result, the customer grew the user base and decreased infrastructure costs while ensuring high availability, scalability, and enterprise cybersecurity.
Because businesses usually pour generous investment and effort into new platforms, it is tempting to treat data migration as just another task. In practice, data migration is a complex undertaking in which low-quality data can affect core system operations and lead to subpar experiences. The process calls for attention and a strategy of its own.
Naturally, each strategy is designed according to business specifics and needs, but it usually follows a common pattern:
This stage is one of the most crucial in your data migration journey. It sets your expectations for the entire project, provides a clear route, and lays out the metrics for success.
First of all, you need to perform an exhaustive comparison of the source and target systems. Together with the IT team and the data users who will be affected by the migration, locate the points in the new system that need adaptation and define the minimum amount of data the new system needs to start running efficiently. It’s rarely necessary to move the entire dataset at once: you can move data in sequences or archive some parts altogether.
Data can be migrated following the big bang approach (doing it in one go within a limited period of time) or the trickle approach (migrating in phases while the source and target systems run side by side). When you need to move large datasets, the former can seem attractive because it promises to get the migration over with quickly. In reality, though, it is not a popular choice for most companies (except very small ones), as it can cause massive downtime of mission-critical systems, and any errors can incapacitate the business for even longer.
So most companies migrate data in iterations, even though it’s a much more complicated process that requires a well-thought-out design. It also requires more effort from data users, who have to switch between the two systems during the migration, and from data engineers, who have to track what data has been migrated and trigger migration updates whenever data changes. On the bright side, this approach causes no serious downtime or operational interruptions if done right and makes it easier to spot and resolve errors early in the process.
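To illustrate the bookkeeping that a trickle migration puts on data engineers, here is a minimal Python sketch of watermark-based batch selection. It assumes every source record carries an `updated_at` value that increases with each change; the `select_batch` helper and the field names are hypothetical and not taken from any specific tool.

```python
def select_batch(records, last_synced_at, batch_size):
    """Pick the next batch of records changed since the last sync.

    Returns the batch and the new watermark to store for the next run.
    Records sharing the cutoff timestamp are kept together, so a batch
    boundary never skips a change (the batch may exceed batch_size).
    """
    changed = sorted(
        (r for r in records if r["updated_at"] > last_synced_at),
        key=lambda r: r["updated_at"],
    )
    if not changed or batch_size <= 0:
        return [], last_synced_at
    cutoff = changed[:batch_size][-1]["updated_at"]
    batch = [r for r in changed if r["updated_at"] <= cutoff]
    return batch, cutoff
```

Each run migrates the returned batch and saves the new watermark, so records edited in the source after they were moved are picked up again on a later pass.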
Timeline and budget estimations will help you understand whether your project is feasible.
The project timeline is calculated based on a few variables. When you understand the amount of data to migrate, you can estimate the time needed to move that data considering the network bandwidth.
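The transfer-time part of that estimate is simple arithmetic. The Python sketch below shows one way to do it; the `estimate_transfer_hours` helper is hypothetical, and the `efficiency` factor (the share of nominal bandwidth that is actually usable) is an assumption you would need to measure for your own network.

```python
def estimate_transfer_hours(data_gb, bandwidth_mbps, efficiency=0.7):
    """Estimate the hours needed to move data_gb gigabytes over a link.

    bandwidth_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction (0..1] of that speed usable in practice.
    """
    if data_gb < 0 or bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid data size, bandwidth, or efficiency")
    megabits = data_gb * 8 * 1000  # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


# Example: 10 TB over a 1 Gbps link at the assumed 70% efficiency
print(round(estimate_transfer_hours(10_000, 1000), 1))
```

The result covers raw transfer only; extraction, transformation, and validation time have to be added on top.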
It’s important to understand whether executing the migration will cause any downtime and how to make it as painless as possible (for instance, by scheduling it after hours or on weekends to avoid interrupting business continuity), and then add this time to your project timeline.
Consult with your data migration team to estimate the time for data migration tool development or customization, data migration execution, testing, post-execution audit, and maintenance. Include a safety period for possible contingencies.
To draft your budget, include data migration solution costs, payments to an in-house or outsourced team throughout the project stages, compliance validation, and possible downtime. Keep in mind that the more data you migrate, the higher the project costs. For this reason, it doesn’t make sense to migrate everything at once. Instead, review the data to understand what can wait and what should be moved first.
If you don’t plan to use the two systems simultaneously after the move and want to migrate fully from your old environment, you should plan its retirement in order to save financial and human resources.
If you move petabytes of data of different formats and sensitivity levels, you will need a tool for automating data migration. To get an instrument tailored exactly to your needs, you will likely have to develop it from scratch. Make sure to include this development period in your project timeline. If you plan to use a platform-based solution, check whether it has the necessary features, like easy data mapping with drag-and-drop tools, extensive ETL capabilities, and workflow orchestration.
Decide who you’re going to bring to your project. Ideally, you should invite both business users and IT consultants to cooperate. Data users who engage in data-driven decision making and understand the structure and meaning of the source data should share their expertise with the IT team: which data types to prioritize for migration, what formatting can or can’t be changed, and which workflows are needed and which can be scrapped. At the same time, developers can explain the specifics of the new system and what data it needs to get going.
If there’s a need for a custom data migration tool, you can cooperate with your in-house IT team if they have the necessary competencies and are available for the stretch of the project, or invite software development consultants to design and build this tool.
Once you’ve assessed the project scope, set up your data migration team, and put the necessary infrastructure in place, it’s time to prepare your data for transfer.
The data selected for migration should be pulled into a single repository and examined at a granular level for conflicts, inconsistencies, duplicates, missing chunks, and quality issues. This stage is extremely labor-intensive, so it’s usually covered by automation tools. All the discovered issues need to be resolved prior to migration.
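As an illustration of what such automation can look like at its simplest, the Python sketch below flags duplicate keys and records with missing required fields. The `audit_records` helper and the field names in the usage example are hypothetical, and a real audit would cover many more checks.

```python
from collections import Counter


def audit_records(records, key_field, required_fields):
    """Report duplicate keys and records with missing required fields."""
    key_counts = Counter(r.get(key_field) for r in records)
    duplicate_keys = sorted(
        key for key, count in key_counts.items() if key is not None and count > 1
    )
    incomplete = []
    for index, record in enumerate(records):
        # A field counts as missing if it is absent, None, or an empty string
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            incomplete.append((index, missing))
    return {"duplicate_keys": duplicate_keys, "incomplete": incomplete}


sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "b@example.com"},
    {"id": 3},
]
report = audit_records(sample, key_field="id", required_fields=["email"])
print(report)
```

The report lists which keys occur more than once and, for each incomplete record, its position and the fields that need attention before migration.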
You can’t foresee all possible problems that can result in data loss or corruption during migration execution. For this reason, create an additional protection layer and back up all your data, particularly those files you’re going to migrate. If anything goes wrong, you’ll be able to restore necessary data.
Find out who has the right to access, edit, and remove data, and map these roles and access levels to the new system, thus creating the structure for big data governance. This way, your team will know their roles and responsibilities during and after data migration, which will help avoid any misunderstanding, delays, and security gaps.
Prior to ETL operations, ETL developers and data users should come together for a crucial task—data mapping. It requires preparing detailed mapping rules that will match the fields of the source system to those of the target one.
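Mapping rules of this kind can be captured as plain data. The Python sketch below is one possible shape, assuming each rule names a target field and a transformation; the source and target field names, the date format, and the `apply_mapping` helper are all invented for illustration.

```python
from datetime import datetime

# Hypothetical rules: source field -> (target field, transformation)
MAPPING_RULES = {
    "cust_name": ("full_name", str.strip),
    "signup_dt": (
        "signup_date",
        lambda v: datetime.strptime(v, "%d/%m/%Y").date().isoformat(),
    ),
    "email_addr": ("email", str.lower),
}


def apply_mapping(source_row, rules):
    """Map one source row to the target schema.

    Returns the target row and the list of source fields with no rule,
    so that unmapped data is surfaced instead of silently dropped.
    """
    target_row, unmapped = {}, []
    for field, value in source_row.items():
        if field not in rules:
            unmapped.append(field)
            continue
        target_field, transform = rules[field]
        target_row[target_field] = transform(value)
    return target_row, unmapped
```

Keeping the rules in one reviewable structure gives ETL developers and data users a shared artifact to agree on before any data is moved.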
Data mapping gets tricky when you need to decide whether migration should be metadata- or content-driven. Metadata usually describes the location of source data, like a file/field/column/table name, and characteristics of each location type, like character, numeric, or date. Content is stored within those fields and columns.
The metadata-driven approach assumes that content reflects its description, but that’s not always the case. A column described as ‘Email’ can contain several emails, while emails can also be stored in another column under a different name, like ‘Contact Information’. Relying on metadata alone leads to general assumptions and, in turn, to mistakes when creating mapping rules and specifications. To avoid mapping errors and multiple iterations, it’s recommended to run content analysis at the data audit and profiling stage.
At this stage, you need to extract data from the source system, ensure the data is of high quality, transform it into the right format, and finally load it into the target system. For this stage you will need ETL engineers and an ETL solution to automate the process. Let’s illustrate this stage with Itransition’s project where we developed a custom ETL system for big data migration.
We partnered with the largest distribution co-op in the US that delivered energy services to over 270 thousand accounts across the country. The company had an ever-growing database of technical and regulatory documentation stored in a legacy ECM system. At some point, the overloaded database started to affect system performance, which resulted in serious downtime and low productivity. Itransition was chosen to help the company migrate its massive document repository to a more scalable ERP platform.
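A basic form of content analysis is to measure what share of the values in each column actually look like emails, regardless of the column name. The Python sketch below does this with a deliberately loose regular expression; the pattern, the `email_share_by_column` helper, and the sample rows are assumptions made for illustration, not a full email validator.

```python
import re

# Loose pattern: something@something.something, with no whitespace
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def email_share_by_column(rows):
    """Return, per column, the fraction of non-empty values containing an email."""
    totals, hits = {}, {}
    for row in rows:
        for column, value in row.items():
            if value in (None, ""):
                continue
            totals[column] = totals.get(column, 0) + 1
            if EMAIL_PATTERN.search(str(value)):
                hits[column] = hits.get(column, 0) + 1
    return {column: hits.get(column, 0) / totals[column] for column in totals}


rows = [
    {"Email": "a@example.com", "Contact Information": "call 555-0100"},
    {"Email": "", "Contact Information": "b@example.org"},
]
shares = email_share_by_column(rows)
print(shares)
```

A non-zero share in a column such as ‘Contact Information’ signals that the mapping rules cannot rely on the column name alone.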
We participated in the two stages:
To prepare 24+ million files of multiple formats for migration, we built a robust ETL solution that enabled seamless and accurate export and indexing of millions of heterogeneous documents as well as data mapping, transformation and saving in suitable formats for the target system. We also implemented an efficient logging mechanism that recorded errors during the export, thus assuring data accuracy and consistency.
We were able to convert the database in record time, 10 times faster than specified in the project documentation, without disrupting any business processes during the export and conversion.
Actually, testing should start from the moment you begin manipulating data and continue throughout all further stages, be it design, execution, or post-migration audit. If you pursue a phased migration approach, you need to test each batch of migrated data to spot any issues early and ensure data quality. Once the errors are fixed, test again. When the migration is complete, verify the migrated data by running unit, system, full-volume and batch-application tests prior to going live.
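One simple check for each migrated batch is to compare row counts and content fingerprints between source and target. The Python sketch below assumes both sides can be read as lists of dictionaries in the same schema (that is, after mapping has been applied); the `row_fingerprint` and `verify_batch` helpers are hypothetical.

```python
import hashlib
import json


def row_fingerprint(row):
    """Hash a row in a canonical form so that key order does not matter."""
    canonical = json.dumps(row, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_batch(source_rows, target_rows):
    """Compare a migrated batch with its source, ignoring row order."""
    source_prints = sorted(row_fingerprint(r) for r in source_rows)
    target_prints = sorted(row_fingerprint(r) for r in target_rows)
    return {
        "row_count_matches": len(source_rows) == len(target_rows),
        "content_matches": source_prints == target_prints,
    }
```

A batch whose counts match but whose content does not points to values altered in transit, which is exactly the kind of issue worth catching before the next batch is moved.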
When you test the migrated system in its entirety, you can expose a number of unexpected problems never seen during isolated tests. When everything is fixed and tested again and all stakeholders are satisfied with the results, it’s time to finally go live.
Although it is the climax of data migration, going live is not the final destination of a data migration project. When the new system starts working, it’s important to validate the project results and monitor system performance in the long run. Conduct a comprehensive audit of the system to make sure everything is correct, data quality is high, and no data is missing. Then run regular and ad hoc audits to make sure the system still covers the data scope, the data remains of high quality with no signs of degradation, and the users are satisfied.
Considering big data and its business impact, data will continue to be treated as an asset that companies, even small ones, accumulate and carry with them when moving to different technological environments. With each move, they will need to make sure they transfer only valuable, high-quality data. This is where a proven data migration strategy, coupled with an experienced team and advanced tools, helps companies take a step forward rather than create more challenges.