Data engineers and data scientists should apply software development best practices to their work, and Databricks offers integrations that make this practical. The key areas are version control, automated testing, and a structured development lifecycle. Teams that adopt these practices can improve the quality and reliability of their data projects while delivering features faster.
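As a rough illustration of the automated-testing point, here is a minimal sketch that keeps transformation logic in a plain Python function so it can be unit tested locally or in CI without a cluster. The function `normalize_flight_record`, its field names, and the "missing delay means 0" rule are hypothetical assumptions for this example, not code from the example repository.

```python
import unittest


def normalize_flight_record(record: dict) -> dict:
    """Hypothetical transformation: clean airport codes and coerce the delay.

    Airport codes are trimmed and upper-cased; delay_minutes is converted
    to int, and a missing (None) delay is treated as 0 (an assumption).
    """
    delay = record.get("delay_minutes")
    return {
        "origin": record["origin"].strip().upper(),
        "dest": record["dest"].strip().upper(),
        "delay_minutes": int(delay) if delay is not None else 0,
    }


class NormalizeFlightRecordTest(unittest.TestCase):
    def test_codes_are_trimmed_and_uppercased(self):
        out = normalize_flight_record(
            {"origin": " sfo", "dest": "sea ", "delay_minutes": "12"}
        )
        self.assertEqual(out["origin"], "SFO")
        self.assertEqual(out["dest"], "SEA")
        self.assertEqual(out["delay_minutes"], 12)

    def test_missing_delay_defaults_to_zero(self):
        out = normalize_flight_record(
            {"origin": "JFK", "dest": "LAX", "delay_minutes": None}
        )
        self.assertEqual(out["delay_minutes"], 0)
```

Running `python -m unittest` (or pytest, which also collects `unittest.TestCase` classes) in a CI job gives fast feedback before anything is deployed to a workspace.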
PASS 2024 – Databricks Resources for DevX and CICD
Slides: PASS 2024 – Best Practices for Development on Azure Databricks, from Dustin Vannoy
Example code repository: https://github.com/datakickstart/flights-e2e-azure/tree/pass-summit-2024
Resource links:
youtube.com/DustinVannoy – CICD Playlist
Develop and Deploy Code Easily With IDEs
How to Get the Most Out of Databricks Notebooks
Databricks Asset Bundles: A Unifying Tool for Deployment on Databricks
Best Practices for Unit Testing…
Databricks Asset Bundles: Advanced Examples
This post and video cover specific examples people have raised when defining their Databricks Asset Bundles. The video includes a short review, but for a fuller introduction see my first post on Databricks Asset Bundles. The GitHub repository I use will probably be the first place updated with new examples; however, I…
Databricks CI/CD: Intro to Asset Bundles (DABs)
Databricks Asset Bundles provide a way to version and deploy Databricks assets such as notebooks, workflows, and Delta Live Tables pipelines. This makes them a good option for data teams setting up CI/CD (Continuous Integration / Continuous Deployment). Common approaches in the past have included Terraform, the REST API, the Databricks command line interface (CLI), or…
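To make the CI/CD idea a little more concrete, here is a minimal sketch of a Python helper a CI job might call. It assumes the Databricks CLI (a version that includes the `bundle` command group) is installed and authenticated, and that the repository root holds a `databricks.yml` defining targets; the target names `dev`/`prod` and the helper names `bundle_commands` and `deploy_bundle` are hypothetical. It validates the bundle first and only deploys if validation succeeds.

```python
import shutil
import subprocess
from typing import List


def bundle_commands(target: str) -> List[List[str]]:
    """Build the CLI calls for one deployment: validate, then deploy."""
    return [
        ["databricks", "bundle", "validate", "--target", target],
        ["databricks", "bundle", "deploy", "--target", target],
    ]


def deploy_bundle(target: str, dry_run: bool = False) -> List[List[str]]:
    """Run validate + deploy for `target`, stopping at the first failure.

    With dry_run=True the commands are only returned, not executed,
    which lets you check the pipeline logic without the CLI installed.
    """
    commands = bundle_commands(target)
    if dry_run:
        return commands
    if shutil.which("databricks") is None:
        raise RuntimeError("Databricks CLI not found on PATH")
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed validation prevents the deploy step from running.
        subprocess.run(cmd, check=True)
    return commands
```

A pipeline might call `deploy_bundle("dev")` on every merge and `deploy_bundle("prod")` only after approval; keeping the target as a parameter means the same bundle definition is promoted through environments rather than rewritten per environment.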
