Data Kickstart | DUSTIN VANNOY

Spark Monitoring video series

By Dustin Vannoy Jul 26, 2021 / 1 Comment

In this series I share how to monitor Apache Spark with Azure Databricks. Most of the content is relevant even if you use open source Apache Spark or another managed Spark service. I will be adding to this playlist and would love to hear what questions you still have about monitoring your Apache Spark workloads.

Azure Synapse Spark with Python

By Dustin Vannoy Feb 17, 2021 / 1 Comment

In this video, I introduce Apache Spark using the Python language, often referred to as PySpark. We’ll walk through a quick demo on Azure Synapse Analytics, an integrated platform for analytics within the Microsoft Azure cloud. This short demo is meant for those who are curious about PySpark or just want to get… Continue Reading

Azure Synapse Spark with Scala

By Dustin Vannoy Feb 3, 2021 / 1 Comment

In this video, I introduce Apache Spark using the Scala language. We’ll walk through a quick demo on Azure Synapse Analytics, an integrated platform for analytics within the Microsoft Azure cloud. This short demo is meant for those who are curious about Spark with Scala or just want to get a peek at… Continue Reading

Why Apache Kafka?

By Dustin Vannoy Nov 10, 2020 / Leave a comment

As a data engineer, you should not be trying to convince your colleagues that everything can be a scheduled batch job. It's time to learn how to build streaming data pipelines. For many data engineers, Apache Kafka is the go-to platform for enabling real-time data pipelines. Let's quickly cover why and how to get started.

Top Traits of a Data Engineer

By Dustin Vannoy Jul 21, 2020 / 1 Comment

Data engineer roles vary but some core traits stand out for any data engineer. If you missed it, check out my first posts in this series on What is a Data Engineer? and Data Engineer Skills for Success. Let's finish off this series with the traits I see as most critical for success as a data engineer.

Data Engineer Skills for Success

By Dustin Vannoy May 20, 2020 / 3 Comments

Data engineer job descriptions vary significantly because data engineers are asked to work on many different projects. Yet there are categories of skills that are consistently desired in a data engineer and serve as a foundation for learning new technologies. Here are the skills I see as most critical for success as a data engineer.

Data Lake Introduction

By Dustin Vannoy Mar 5, 2020 / Leave a comment

Hearing a lot about data lakes but still not sure what that means or why anyone cares? This video gives a brief introduction to what a data lake is and why so many organizations are adding one to their analytics ecosystem. To show what interacting with a data lake may look like for a typical data analyst, I included a demo of how you would use Spark SQL to query the data lake from Azure Databricks.

Apache Spark Introduction

By Dustin Vannoy Aug 12, 2019 / Leave a comment

In this video we will quickly cover Apache Spark. The goal is to explain why you would use Spark and where it fits in the data ecosystem. If you just want to get hands-on with Spark, check out one of my next videos on Spark and Databricks. Watch the video to get my overview of Spark and… Continue Reading

Data Pipelines: ETL Tool vs Custom Code

By Dustin Vannoy Jul 9, 2019 / 4 Comments

I frequently hear questions about which options are best for data pipelines. Should we write code using Pandas or Spark? Should we use AWS Glue or Azure Data Factory? Or maybe SSIS? Where do Airflow and Luigi fit? I plan to dive into these technologies and provide more clarity into the options we have… Continue Reading

Big Data Kickstart

By Dustin Vannoy Jul 5, 2019 / Leave a comment

Managing big data is critical for many organizations. Analytics can improve products and inform critical business decisions. Using data can provide distinct advantages, and it’s likely that an organization’s competitors are already leveraging their data.

