Claude Code Essentials for Data Professionals

I believe AI coding is a big part of the future for data professionals, including data engineering, data science, and analytics engineering. That means adopting AI for development will be critical for career success. Since the Cursor article and video, I have been digging deeper into the AI coding space and using Claude Code as well… Continue Reading


Cursor with Databricks: AI Enhanced Development

The tech industry is evolving rapidly, and AI coding tools are changing how we develop. For Databricks developers, tools like the Cursor IDE offer significant productivity gains when used correctly. The difference between frustration and success comes down to providing the proper context. In this article and video, I share my recommendations for using Cursor with Databricks… Continue Reading


Essential Best Practices for Data Engineers on Databricks

Data engineers and data scientists should apply software development best practices to enhance their processes, particularly on Databricks, which offers valuable integrations. Key focus areas include version control, automated testing, and a structured development lifecycle. By adopting these practices, teams can improve the quality and reliability of data projects while delivering features faster.
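
To illustrate the automated testing piece, here is a minimal sketch of a pytest unit test for a small PySpark transformation. The function `add_full_name`, the sample data, and the file name are hypothetical examples for this summary, not code from the article.

```python
# test_transformations.py -- a minimal sketch of unit testing a PySpark transformation.
# Assumes pytest and pyspark are installed; names and data here are illustrative only.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F


def add_full_name(df):
    """Example transformation: combine first and last name into a full_name column."""
    return df.withColumn("full_name", F.concat_ws(" ", "first_name", "last_name"))


@pytest.fixture(scope="session")
def spark():
    # Small local SparkSession so the test can run on a laptop or in CI.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_add_full_name(spark):
    source = spark.createDataFrame([("Ada", "Lovelace")], ["first_name", "last_name"])
    result = add_full_name(source).collect()
    assert result[0]["full_name"] == "Ada Lovelace"
```

A test like this can run locally or in CI on every commit, which is the point of pairing version control with automated testing.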


Databricks CI/CD: Intro to Asset Bundles (DABs)

Databricks Asset Bundles provide a way to version and deploy Databricks assets – notebooks, workflows, Delta Live Tables pipelines, etc. They are a great option for letting data teams set up CI/CD (Continuous Integration / Continuous Deployment). Some of the common approaches in the past have been Terraform, the REST API, the Databricks command line interface (CLI), or… Continue Reading


Data Engineer Question and Answer

An aspiring data engineer recently reached out to me for some guidance on pivoting into the field from a software development background. The questions they asked are similar to what others have asked me in the past, so I decided to capture my responses here. I link to prior posts and other resources when possible… Continue Reading


Ingest tables in parallel with an Apache Spark notebook using multithreading

If we want to kick off a single Apache Spark notebook to process a list of tables, the code is easy to write. A simple loop over the list of tables ends up processing one table after another (sequentially). If none of these tables are very big, it is quicker to have Spark load the tables concurrently (in parallel) using threads. There are a few different ways to do this, but I am sharing the easiest approach I have found when working with a notebook in Databricks, Azure Synapse Spark, Jupyter, or Zeppelin.
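
To make the idea concrete, here is a minimal sketch of the threading approach, not the article's exact code. It assumes a Databricks-style notebook where a SparkSession named `spark` already exists; the schema names `source_db` and `target_db` and the table list are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder list of tables to ingest; replace with your own.
tables = ["customers", "orders", "products", "invoices"]


def ingest_table(table_name: str) -> str:
    # Read one source table and write it to the target schema.
    # `spark` is the SparkSession provided by the notebook environment.
    df = spark.read.table(f"source_db.{table_name}")
    df.write.mode("overwrite").saveAsTable(f"target_db.{table_name}")
    return table_name


# Run the ingestion function on multiple driver threads so Spark can have
# several jobs in flight at once instead of running them one after another.
with ThreadPoolExecutor(max_workers=4) as executor:
    for finished in executor.map(ingest_table, tables):
        print(f"Finished loading {finished}")
```

Each call to `ingest_table` submits its own Spark job, and the driver-side thread pool keeps several of those jobs running concurrently, which is where the speedup for many small tables comes from.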