Essential Best Practices for Data Engineers on Databricks

Data engineers and data scientists should apply software development best practices to their work, particularly on Databricks, which offers integrations that support them. The key practices are version control, automated testing, and a structured development lifecycle. Adopting them helps teams improve the quality and reliability of data projects and deliver features faster.


PASS 2024 – Databricks Resources for DevX and CICD

Slides: PASS 2024 – Best Practices for Development on Azure Databricks, from Dustin Vannoy

Example code repository: https://github.com/datakickstart/flights-e2e-azure/tree/pass-summit-2024

Resource links:
youtube.com/DustinVannoy – CICD Playlist
Develop and Deploy Code Easily With IDEs
How to Get the Most Out of Databricks Notebooks
Databricks Asset Bundles: A Unifying Tool for Deployment on Databricks
Best Practices for Unit Testing… Continue Reading


Databricks Asset Bundles: Advanced Examples

This post and video cover some specific examples people have brought up when defining their Databricks Asset Bundles. The video includes a bit of review, but for more of an introduction please see my first post on Databricks Asset Bundles. The GitHub repository I use will probably be the first to update with new examples, however I… Continue Reading


Databricks CI/CD: Intro to Asset Bundles (DABs)

Databricks Asset Bundles provide a way to version and deploy Databricks assets – notebooks, workflows, Delta Live Tables pipelines, etc. This is a great option for letting data teams set up CI/CD (Continuous Integration / Continuous Deployment). Some of the common approaches in the past have been Terraform, the REST API, the Databricks command line interface (CLI), or… Continue Reading