How to tackle data warehouse migration challengesAndika Pratama
We often hear that migration is an uphill battle because the migration strategy was not deliberately considered. Migrating to a modern data warehouse from a legacy environment can require a massive up-front investment in time and resources. There’s a lot to think about before and during the process, so your organisation has to take a strategic approach to streamline the process.
Here are some of the things to consider when migrating a data warehouse to the cloud:
- How to minimise any migration risks or security challenges?
- How much will it cost?
- How do we migrate our data to the target data warehouse?
- How quickly will we see an equal or better performance?
How to minimise any migration risks or security challenges?
It’s easy to consider an on-premises data warehouse secure because it’s on-site and you can manage its data protection. However, if scaling up an on-prem data warehouse is difficult, so is securing it as your business scales.
Google has built-in multiple features to secure BigQuery. For enterprise users, Cloud Identity and Access Management (Cloud IAM) is key to setting appropriate role-based user access to data. You can also take advantage of SQL’s security views within BigQuery and all BigQuery data is encrypted at rest and in transit. You can add the protection of customer-managed encryption keys to establish even stronger security measures. Using virtual private cloud (VPC) security controls can secure your migration path since it helps reduce data exfiltration risks.
How much will it cost?
The cost of a cloud data warehouse has a different structure from what you’re likely used to with a legacy data warehouse. An on-prem system like Teradata may depend on your IT team paying every three years for the hardware, then paying for licenses for users who need to access the system. Capacity increases come at an additional cost outside of that hardware budget.
With the cloud, you’ve got a lot more options for cost and scale. Instead of a fixed set of costs, you’re now working on a price-utility gradient, where if you want to get more out of your data warehouse, you can spend more to do so immediately or vice versa. While cloud data warehouses help reduce or eliminate capital and fixed costs, they are not all the same. You’ll find varying levels of simplicity and cost savings across vendors, so it’s important to check out the operational costs of each data warehouse in relation to its performance.
With a cloud data warehouse like BigQuery, the total cost of ownership (TCO) becomes an important metric for customers when they’ve migrated to BigQuery and Google Cloud’s flexibility makes it easy to optimise costs.
How to migrate all of the data to the target data warehouse?
This question encompasses both migrating your extract, transform, load (ETL) jobs and SAS/BI application workloads to the target data warehouse, as well as migrating all your queries, stored procedures and other extract, load, transform (ELT) jobs.
Actually getting all of a company’s data into the cloud can seem daunting at the outset of the migration journey. Most businesses have a lot of siloed data. That might be multiple data lakes set up over the years for various teams or systems acquired through acquisition that handle just one or two crucial applications. You may be moving data from an on-prem or cloud data warehouse to BigQuery and type systems or representations don’t match up.
One big step you can take to prepare for a successful migration is to do some workload and use case discovery. That might involve auditing which use cases exist today and whether those use cases are part of a bigger workload, as well as identifying which datasets, tables and schemas underpin each use case. Use cases will vary by industry and by job role. So, for example, a retail pricing analyst may want to analyse past product price changes to calculate future pricing. Use cases may include the need to ingest data from a transactional database, transforming data into a single time series per product, storing the results in a data warehouse table and more.
After the preparation and discovery phase, you should assess the current state of your legacy environment to plan for your migration. This includes cataloguing and prioritising your use cases, auditing data to decide what will be moved and what won’t and evaluating data formats across your organisation to decide what you’ll need to convert or rewrite. Once that’s decided, choose your ingest and pipeline methods. All of these tasks take both technology and people management and require some organisational consensus on what success will look like once the migration is complete.
How quickly will we see an equal or better performance?
Managing a legacy data warehouse isn’t usually synonymous with speed. Performance often comes at the cost of capacity, so users can’t do the analysis they need until other queries have finished running. Reporting and other analytics functions may take hours or days which is especially true for running large reports with a lot of data, like an end-of-quarter sales calculation. As the amount of data and number of users rapidly grows, performance begins to meltdown and organisations often face disruptive outages.
However, with a modern cloud data warehouse like BigQuery, compute and storage are decoupled, so you can scale immediately without facing capital infrastructure constraints. BigQuery helps you modernise because it uses a familiar SQL interface, so users can run queries in seconds and share insights right away.
Moving to the cloud may seem daunting, especially when you’re migrating an entrenched legacy system. However, it brings the benefits of adopting technology that lets the business grow, rather than simply adopting a tool. It’s likely you’ve already seen that the business demand exists. Now it’s time to stop standing in the way of that demand and instead make way for growth.