Monitoring is important, so I’ve covered the topic a few times in the past. I’ve talked about collecting your Spark application logs and Spark metrics. These are a good way to track what is happening and what is going wrong as your code runs. In the video related to this post I focus on a… Continue Reading
Azure Databricks with Log Analytics – Updated for DBR 11.3+
This is an updated video and writeup on setting up and using Log Analytics with your Azure Databricks logs. Some of the content overlaps with what I shared in the past, but these instructions are valid for Databricks Runtimes 11.3+. Log Analytics provides a way to collect and query logs in Azure. For teams that… Continue Reading
Incremental Data Loading with Azure Databricks
My talk for PASS Summit 2023 is about how to load data incrementally, such as from Change Data Feed or streaming a log of events. Below are some additional thoughts and links to resources for easy reference. Presentation description: There has been an increasing push to load data incrementally throughout the day or even within… Continue Reading
Databricks CI/CD: Intro to Asset Bundles (DABs)
Databricks Asset Bundles provides a way to version and deploy Databricks assets – notebooks, workflows, Delta Live Tables pipelines, etc. This is a great option to let data teams set up CI/CD (Continuous Integration / Continuous Deployment). Some of the common approaches in the past have been Terraform, REST API, Databricks command line interface (CLI), or… Continue Reading
Azure Data Platform Overview slides
I had the privilege to present for Creating Coding Careers, a great organization in the San Diego area that helps people get established in tech careers via apprenticeships and other programs. Above are the slides used in that presentation. Recommended resources to learn Azure Data Platform. Databricks Training: https://www.databricks.com/learn. Microsoft Learn Training: https://learn.microsoft.com/en-us/training/paths/data-engineer-azure-databricks/ https://learn.microsoft.com/en-us/training/paths/get-started-data-engineering/ https://learn.microsoft.com/en-us/training/paths/get-started-fabric/… Continue Reading
Data + AI Summit 2023 – Data Engineer key takeaways
Data + AI Summit 2023 has just completed with many announcements and deep dives. I attended virtually this year but was just as excited as the in-person attendees for some of the new capabilities that were shared. After watching the keynote presentations and tracking additional posts about new features, I want to summarize the top… Continue Reading
Apache Spark DataKickstart: Read and Write with PySpark
Every Spark pipeline involves reading data from a data source or table. As data engineers, we usually end our pipelines by writing the transformed data. In this tutorial we walk through some of the most common formats and cloud storage locations for reading and writing with Spark. We’ll save some of the advanced Delta Lake… Continue Reading
Apache Spark DataKickstart: First Spark SQL Application
Get hands-on with Spark SQL (no Python or Scala) to build your first data pipeline. In this video I walk you through how to read, transform, and write the NYC Taxi dataset with Spark SQL. This dataset can be found on Databricks, Azure Synapse, or downloaded from the web to wherever you run Apache… Continue Reading
Apache Spark DataKickstart: First PySpark Application
Get hands-on with Python and PySpark to build your first data pipeline. In this video I walk you through how to read, transform, and write the NYC Taxi dataset, which can be found on Databricks, Azure Synapse, or downloaded from the web to wherever you run Apache Spark. Once you have watched and followed… Continue Reading
Apache Spark DataKickstart – Introduction
In this video I provide an introduction to Apache Spark as part of my YouTube course Apache Spark DataKickstart. This video covers why Spark is popular, what it really is, and a bit about ways to run Apache Spark. Please check out other videos in this series by selecting the relevant playlist or subscribe and turn… Continue Reading