Spark Monitoring video series

In this series I share about monitoring Apache Spark with Azure Databricks. Most of the content is relevant even if using open source Apache Spark or any other managed Spark service. I will be adding to this playlist and would love suggestions on what questions you still have about monitoring your Apache Spark workloads.


Apache Spark with Python

Azure Synapse Spark with Python

In this video, I share with you about Apache Spark using the Python language, often referred to as PySpark. We’ll walk through a quick demo on Azure Synapse Analytics, an integrated platform for analytics within Microsoft Azure cloud. This short demo is meant for those who are curious about PySpark or just want to get… Continue Reading


Apache Spark with Scala

Azure Synapse Spark with Scala

In this video, I share with you about Apache Spark using the Scala language. We’ll walk through a quick demo on Azure Synapse Analytics, an integrated platform for analytics within Microsoft Azure cloud. This short demo is meant for those who are curious about Spark with Scala or just want to get a peek at… Continue Reading


Why Apache Kafka?

As a data engineer, you should not be trying to convince your colleagues that everything can be a scheduled batch job. It's time to learn how to building streaming data pipelines. For many data engineers, Apache Kafka is the go to platform for enabling real-time data pipelines. Let's quickly cover why and how to get started.


Data Lake Introduction

Hearing a lot of mention of Data Lakes but still not sure what that means or why anyone cares? This video will cover a brief introduction to what a Data Lake is and why so many organizations are adding them to their analytics ecosystem. To show what interacting with a data lake may look like for a typical data analyst, I included a demo of how you would use Spark SQL to query the data lake from Azure Databricks.


Apache Spark Introduction

This video we will quickly cover Apache Spark.  The goal is to cover why use Spark and where it fits in the data ecosystem.  If you want to just get hands on with Spark, check out one of my next videos on Spark and Databricks. Watch the video to get my overview of Spark and… Continue Reading


Data Pipelines: ETL Tool vs Custom Code

I hear questions quite frequently about what options are best for data pipelines? Should we write code using Pandas or Spark? Should we use AWS Glue or Azure Data Factory? Or maybe SSIS? Where do Airflow and Luigi fit? I plan to dive into these technologies and provide more clarity into the options we have… Continue Reading


Big Data Kickstart

Managing big data is critical for many organizations. Analytics can improve products and inform critical business decisions. Using data can provide distinct advantages, and it’s likely that an organization’s competitors are already leveraging their data.