With reports hitting the transactional databases, we were encountering performance issues. We needed to choose a path forward that would lighten the load on the prod database servers and be supportable and modern.
Options We Explored
Here’s a quick description of the options we explored:
- Azure Data Factory – Orchestrates and automates data movement and transformation. You can create workflows, pipelines, and ETL (Extract, Transform, Load) processes using it.
- Databricks – A unified data science, engineering, and analytics platform based on Apache Spark. It simplifies data exploration, preparation, and machine learning workflows, allowing teams to collaborate efficiently. Interactive notebooks make Databricks a versatile tool for scalable data analysis and processing.
- Synapse – Integration of big data and data warehousing in the cloud. It facilitates collaborative analytics and AI-driven insights using serverless and provisioned resources across various data sources. Integrated analytics, warehousing, and data integration are part of Synapse’s unified experience.
- Fabric – An all-in-one analytics solution for enterprises that offers data movement, data lakes, data engineering, data integration, data science, and real-time analytics.
These were my then-current pros/cons. For option 1, we combined ADF with Databricks.
|Pros||1. Terraform/git integration|
2. Flexibility of options
|1. Terraform/git integration|
2. Fully integrated platform
|1. Power BI is already part of|
|Cons||1. Can be complicated|
2. Hard to estimate costs
|1. Can be complicated |
2. Must use integrated platform
3. Hard to estimate costs
4. May get merged into Fabric?
|1. No calculator for costs|
2. Still in preview (at the time)
3. TF/git integration?
For all three options, we assumed:
- Similar costs to build/maintain. Approximately $1000-5000/month/environment.
- Similar time to implement. We weren’t sure of the exact time to build, but thankfully, we were hiring consultants to assist with this. They estimated 150 hours to build out the concept for one Power BI report that accessed three tables.
- Similar time to maintain. The exact time is still to be determined, but I did a rough estimate of 5-10 hours/month, maybe more if we need new development.
We considered using Synapse, but it might cost more in the end, and we wondered if it would soon need to be integrated into Fabric. Alternatively, we briefly considered Fabric but decided it was too soon for us to jump on that bandwagon.
We chose Option 1 – ADF/Databricks because:
- Allowed us the most flexibility instead of using a unified platform
- Pros outweighed cons
- Similar costs and implementation/maintenance timeframes
- ADF/Databricks are both mature options
I’ll post soon about the next parts of setting up your data warehouse using ADF/Databricks.