BI consulting for a CX company

Itransition provided consulting services for a global CX company, helping them optimize their Power BI solution and data model and save costs.

Context

5CA is a remote-first global CX company headquartered in the Netherlands. Founded in 1998, the company provides tech solutions and actionable data insights for the gaming and entertainment industries. Its global reach spans 95 countries, and its fan-based agent team provides services in 23+ languages and handles 1.5M monthly interactions.

5CA uses Power BI to generate reports for their clients. They collect large data slices from their clients' heterogeneous sources, including helpdesk tickets, emails, chats, course enrollments, evaluations, and satisfaction surveys. Using this data, 5CA data scientists generate complex insights that evaluate the quality of the support provided, help drive customer satisfaction, identify clients' pain points, and improve CX.

Initially, the customer's entire data warehouse (DWH) environment was hosted on-premises, with an ETL setup consisting of the DWH, OLAP cubes, and Power BI. As the number of clients kept growing, the on-premises infrastructure was getting more expensive and harder to maintain. To speed up client onboarding and improve cost efficiency, 5CA decided to migrate to Microsoft Azure and leverage Azure Data Lake integrated with Azure Databricks.

The customer's data engineers built an ELT data lake based on the Delta Lake architecture, orchestrating data ingestion via Azure Data Factory and transforming data with Azure Databricks. The Power BI team, in turn, aggregated data from Azure Databricks. After the migration to the cloud, the amount of data contained in 5CA's data cubes required the higher Power BI Premium P3-P4 subscription plans instead of the P1 plan they originally used. They were also unable to upload their dataset to Power BI in full, as the data volume caused system timeouts, freezes, and crashes. On top of that, the incremental data load process took several hours instead of the one hour required by their SLA.
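
For context, a Delta Lake pipeline of the kind described above typically moves data through bronze (raw), silver (cleansed), and gold (consumption-ready) layers before Power BI picks it up. Below is a minimal PySpark sketch of such a flow on Databricks; the table and column names are hypothetical illustrations, not 5CA's actual schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # ambient session in Databricks notebooks

# Bronze: raw helpdesk tickets, landed as-is (e.g. by an Azure Data Factory pipeline)
bronze = spark.table("bronze.helpdesk_tickets")

# Silver: cleanse, type, and deduplicate
silver = (
    bronze
    .filter(F.col("ticket_id").isNotNull())
    .dropDuplicates(["ticket_id"])
    .withColumn("created_at", F.to_timestamp("created_at"))
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.helpdesk_tickets")

# Gold: aggregate into a consumption-ready table for Power BI
gold = (
    silver
    .groupBy(F.to_date("created_at").alias("ticket_date"), "client_id")
    .agg(
        F.count("ticket_id").alias("tickets"),
        F.avg("resolution_minutes").alias("avg_resolution_minutes"),
    )
)
gold.write.format("delta").mode("overwrite").saveAsTable("gold.ticket_daily_summary")
```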

5CA had a large in-house data team working on this migration initiative. Still, they needed advice from a data engineering consultant who could help them solve these issues and optimize their Power BI solution and data model. They were also pressed for time, as they needed to migrate the data of several of their clients within the upcoming months.

After thorough vendor due diligence, 5CA decided to collaborate with Itransition, considering our broad expertise in implementing data management solutions, the numerous certified Power BI specialists on board, and our ability to jumpstart projects of any scale.

Solution

Knowledge transfer and initial review

Considering how time-sensitive and business-critical the project was, we assigned Itransition’s head of BI practice to conduct the consulting activities. The customer involved their technical experts, including the CTO/DWH architect and the Power BI team lead, who took part in initial catch-ups and then spent a couple of hours per day on discussions with our expert.

These were the project’s goals:

  • Conduct an independent review of the existing technical implementation of the customer’s BI solution
  • Provide solutions to the following issues:
    • fit the Power BI data model within the current Premium P1 license scope so that the client wouldn't need to upgrade to a more expensive Power BI subscription plan
    • enable faster incremental data loads to multiple tables with a 30-60-minute refresh frequency to meet the SLA

As part of the knowledge transfer process, the customer's BI team described their ELT process and data sources, demoed one of the data cubes across which 5CA's data volume was distributed, shared the cubes' documentation, provided Power BI report examples, and explained their usage patterns. The total size of the dataset exceeded 21 GB, with a full data load taking about 5 hours and an incremental refresh taking 1.5-2 hours. The customer's Power BI solution had 200+ unique users per month, and the data volume was expected to grow by around 25% per year.

During the initial analysis, Itransition gained an understanding of the system's general structure, its components, and its integrations. We also put forward hypotheses about the causes of the issues. Considering 5CA's tight timeline, we estimated that we would need approximately a month to assess the system and come up with solutions to the encountered issues.

Solution assessment

To let us investigate the solution's current state, the customer provided access to Azure Databricks and the Power BI service. Using a set of code analysis tools, we evaluated the following assets:

  • Power BI data model
  • Table structure on the Azure Databricks gold layer (the final, consumption-ready presentation layer)
  • The Databricks service delivery platform (SDP) where data transformation occurs, along with the notebooks that implement those transformations
  • Project documentation in the customer’s Azure DevOps wiki

Itransition connected to 5CA's dataset, inspected the data ingestion process, ran the analysis tools, and manually reviewed the application's source code. We evaluated the technical quality of the Power BI solution, estimated its extensibility, and reviewed the data model, data storage, and technical artifacts provided by the customer's data team. We also assessed the quality of the development process and the established deployment environments.

Working in close collaboration with 5CA's technical experts, Itransition presented the first evaluation results in just a week. The solution scored well on maintainability, technology choice, security, and scalability, but its reliability and performance needed improvement. The issues were mainly caused by the large size of the data model:

  • Insufficient RAM for the incremental refresh on 5CA’s current Power BI Premium P1 capacity
  • Poor data refresh performance in Power BI, requiring immediate changes to the data model
  • Suboptimal Databricks gold layer performance: partitioning oriented toward data consumption (e.g. organizing data by date or subject) rather than data ingestion would eliminate the need to process large arrays of unstructured data (see the sketch below)
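
To illustrate the consumption-oriented partitioning point above, here is a minimal PySpark sketch, assuming a hypothetical gold-layer fact table with a `created_at` timestamp column (the names are illustrative, not 5CA's schema). Partitioning by a date-derived column means refreshes and report queries read only the partitions they actually filter on.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # ambient session in Databricks notebooks

# Hypothetical gold table, originally laid out by ingestion batch
tickets = spark.table("gold.ticket_facts")

# Rewrite it partitioned by month (daily partitions can get too granular),
# matching how reports filter the data
(
    tickets
    .withColumn("ticket_month", F.date_format("created_at", "yyyy-MM"))
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("ticket_month")
    .saveAsTable("gold.ticket_facts_by_month")
)
```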

We strongly recommended addressing the most challenging technical problems and grouped all the issues into the following categories:

  • Critical issues that we recommended resolving before running the system in production
  • Non-critical issues that didn't have to be addressed immediately but were worth considering when implementing new functionality or further refactoring the system
  • Other issues that could be added to the technical debt backlog until there is enough development time to resolve them

Recommendations

Itransition put together comprehensive documentation with a list of recommendations for the identified issues to help the customer decrease the size of the Power BI data model and improve data refresh time. We also advised 5CA on the optimal team composition to implement these recommendations.

Resolution of critical issues 

  • Removing large columns in the customer’s dataset that were not used for reporting to free up database space. We highlighted such columns, dividing them according to tables and arranging them by size.
  • Specifying Hash or Value encoding available in the tabular model on a case-by-case basis, as they use different compression algorithms which could affect the model size and performance. We provided instructions for using encoding hints and suggested columns for encoding change.
  • Improving incremental refresh settings:
    • Changing the default incremental refresh period from 30 days to 1 day for the suggested entities that are not updated retrospectively.
    • Rounding the numbers in the incremental refresh column to reduce the number of unique (high-cardinality) values, which improves Power BI performance without affecting the correctness of the values (see the sketch after this list).
  • Moving the time and date columns for different time zones into standalone tables. This helps reduce the dataset size while keeping the existing functionality.
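
As a concrete illustration of the cardinality point above, the hypothetical PySpark snippet below truncates a millisecond-precision timestamp used as the incremental refresh key; the table and column names are assumptions for the sketch, not 5CA's schema.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # ambient session in Databricks notebooks

events = spark.table("gold.event_facts")

# A millisecond-precision timestamp is nearly unique per row, which bloats the
# column dictionary in Power BI's in-memory model. Truncating to whole seconds
# (or minutes) cuts cardinality sharply without changing what reports show.
events = events.withColumn("refreshed_at", F.date_trunc("second", F.col("refreshed_at")))

# Delta's snapshot isolation allows overwriting the table being read from
events.write.format("delta").mode("overwrite").saveAsTable("gold.event_facts")
```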

Resolution of non-critical issues 

  • Disabling MDX, a query language used via the Power BI service for searching in columns when connecting Excel files to data cubes, for defined columns. This would significantly reduce column size without creating inconveniences for the customer. We provided a list of columns for which MDX queries should be disabled.
  • Changing partitioning on large tables with facts in line with Power BI partitioning (based on the date first) to improve refresh time.
  • Splitting large single values in duration columns into several parts (hours, minutes, seconds), since numeric columns with high cardinality take up a lot of space. We also prepared a list of such columns (see the sketch after this list).
  • Improving work with tags:
    • Keeping only frequently used tags in the Tag table and replacing single-ticket tags with a search by ticket ID or with categories instead of exact values. This would significantly decrease the table size and improve report performance and refresh time.
    • Avoiding the import of the large tag table, used only for specific filters, into Power BI and accessing it via Direct Query instead. This would free up space in the database without affecting data refresh.
  • Clarifying and implementing security requirements. We applied row-level security (RLS) to the dataset to restrict access to clients' data before reports are distributed to end users.
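
To make the duration-splitting recommendation concrete, here is a minimal PySpark sketch; the `handle_seconds` column and table name are hypothetical. In reports, the full duration can still be reassembled as hours*3600 + minutes*60 + seconds, so no analytical capability is lost.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # ambient session in Databricks notebooks

tickets = spark.table("gold.ticket_facts")

# A single seconds-valued duration column has very high cardinality. Splitting
# it into hours/minutes/seconds yields three much lower-cardinality integer
# columns that compress far better in Power BI.
tickets = (
    tickets
    .withColumn("handle_hours", (F.col("handle_seconds") / 3600).cast("int"))
    .withColumn("handle_minutes", ((F.col("handle_seconds") % 3600) / 60).cast("int"))
    .withColumn("handle_secs", (F.col("handle_seconds") % 60).cast("int"))
    .drop("handle_seconds")
)
tickets.write.format("delta").mode("overwrite").saveAsTable("gold.ticket_facts")
```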

Resolution of other issues 

  • Removing large columns with feedback messages containing long HTML strings, which were of little use in Power BI reports, or moving such comments into separate tables accessed via Direct Query. We compiled a list of these columns to be removed or switched to Direct Query.
  • Assigning compact unique IDs to some key columns in Delta tables instead of the auto-generated values, eliminating extremely large values from small tables and freeing up a lot of space in the Power BI model (see the sketch after this list).
  • Separating the single large model into several smaller ones to decrease its size and improve refresh time, since some facts were less dependent on each other and might require different refresh schedules. We identified several candidates for a separate model but pointed out that splitting could add maintenance overhead.
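
A minimal, hypothetical PySpark sketch of the compact-ID idea above, assuming a `client_guid` column whose long auto-generated values inflate the model; all names are illustrative.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()  # ambient session in Databricks notebooks

facts = spark.table("gold.ticket_facts")

# Map each long auto-generated key to a compact integer surrogate key.
# The unpartitioned window is acceptable here because the distinct key set is small.
dim = (
    facts.select("client_guid").distinct()
    .withColumn("client_id", F.row_number().over(Window.orderBy("client_guid")))
)
dim.write.format("delta").mode("overwrite").saveAsTable("gold.dim_client")

# Replace the bulky GUID in the fact table with the integer key
(
    facts.join(dim, "client_guid")
    .drop("client_guid")
    .write.format("delta")
    .mode("overwrite")
    .saveAsTable("gold.ticket_facts_slim")
)
```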

Training

Upon audit completion, Itransition held several training sessions for the customer's Power BI team. We explained how to optimize Power BI's performance and cost efficiency in light of the issues the customer had encountered. During the training, we also described the third-party tools we used to perform system error checks, demonstrated how to get dataset metrics, set and change certain system parameters, and fix errors and avoid them in the future, and shared some useful tips.

Results

Following Itransition's recommendations, 5CA cut incremental data load time approximately fivefold, to under 30 minutes, thus meeting their SLA requirements. Power BI optimization also helped them speed up the full data load by eliminating system timeouts.

Moreover, Itransition helped the customer considerably reduce the volume of data uploaded to Power BI in their data model without sacrificing analytics quality. The customer also fit the data model within the scope of the existing Power BI Premium P1 subscription by a large margin, achieving three-fold cost savings and gaining room to grow their client base further without the need to upgrade.