databricks | DUSTIN VANNOY

Claude Code Essentials for Data Professionals

By Dustin Vannoy, Jan 8, 2026

I believe AI coding is a big part of the future for data professionals, including data engineers, data scientists, and analytics engineers. That makes adopting AI-assisted development critical for career success. Since publishing the Cursor article and video, I’ve been digging deeper into the AI coding space and using Claude Code as well,…

Cursor with Databricks: AI Enhanced Development

By Dustin Vannoy, Sep 29, 2025

The tech industry is evolving rapidly, and AI coding tools are changing how we develop software. For Databricks developers, tools like the Cursor IDE offer significant productivity gains when used correctly. The difference between frustration and success comes down to providing the proper context. In this article and video, I share my recommendations for using Cursor with Databricks.…

Essential Best Practices for Data Engineers on Databricks

By Dustin Vannoy, Jan 5, 2025

Data engineers and data scientists should apply software development best practices to their work, particularly on Databricks, which offers useful integrations for doing so. Key areas include version control, automated testing, and a structured development lifecycle. By adopting these practices, teams can improve the quality and reliability of their data projects while delivering features faster.

Databricks Asset Bundles: Advanced Examples

By Dustin Vannoy, Jun 25, 2024

This post and video cover specific examples people have raised when defining their Databricks Asset Bundles. The video includes a bit of review, but for a fuller introduction, see my first post on Databricks Asset Bundles. The GitHub repository I use will likely be updated with new examples first; however, I…

Databricks Monitoring with System Tables

By Dustin Vannoy, Feb 22, 2024

Monitoring is important, so I’ve covered the topic several times before, including how to collect Spark application logs and Spark metrics. These are a good way to track what is happening, and what is going wrong, as your code runs. In the video accompanying this post, I focus on a…

Azure Databricks with Log Analytics – Updated for DBR 11.3+

By Dustin Vannoy, Jan 7, 2024

This is an updated video and write-up on setting up and using Log Analytics with your Azure Databricks logs. Some of the content overlaps with what I’ve shared before, but these instructions apply to Databricks Runtime 11.3 and later. Log Analytics provides a way to collect and query logs in Azure. For teams that…

Incremental Data Loading with Azure Databricks

By Dustin Vannoy, Nov 15, 2023

My PASS Summit 2023 talk covers how to load data incrementally, for example from a Change Data Feed or from a stream of log events. Below are some additional thoughts and links to resources for easy reference. Presentation description: There has been an increasing push to load data incrementally throughout the day or even within…

Databricks CI/CD: Intro to Asset Bundles (DABs)

By Dustin Vannoy, Oct 3, 2023

Databricks Asset Bundles (DABs) provide a way to version and deploy Databricks assets such as notebooks, workflows, and Delta Live Tables pipelines. They are a great option for data teams setting up CI/CD (Continuous Integration / Continuous Deployment). Common approaches in the past have included Terraform, the REST API, the Databricks command line interface (CLI), or…

Data + AI Summit 2023 – Data Engineer key takeaways

By Dustin Vannoy, Jun 30, 2023

Data + AI Summit 2023 has just wrapped up, with many announcements and deep dives. I attended virtually this year but was just as excited as the in-person attendees about some of the new capabilities that were shared. After watching the keynotes and following additional posts about new features, I want to summarize the top…

Apache Spark DataKickstart: Read and Write with PySpark

By Dustin Vannoy, Jun 21, 2023

Every Spark pipeline involves reading data from a data source or table, and for data engineers, pipelines usually end by writing the transformed data. In this tutorial, we walk through some of the most common formats and cloud storage locations for reading and writing with Spark. We’ll save some of the advanced Delta Lake…

