Data Lake Introduction

Hearing a lot about data lakes but still not sure what the term means or why anyone cares? This video gives a brief introduction to what a data lake is and why so many organizations are adding one to their analytics ecosystem. To show what working with a data lake may look like for a typical data analyst, I included a demo of using Spark SQL to query the data lake from Azure Databricks.


Apache Spark Introduction

In this video we will quickly cover Apache Spark. The goal is to explain why you would use Spark and where it fits in the data ecosystem. If you just want to get hands-on with Spark, check out one of my next videos on Spark and Databricks. Watch the video to get my overview of Spark and…


Data Pipelines: ETL Tool vs Custom Code

I frequently hear questions about which options are best for data pipelines. Should we write code using Pandas or Spark? Should we use AWS Glue or Azure Data Factory? Or maybe SSIS? Where do Airflow and Luigi fit? I plan to dive into these technologies and provide more clarity into the options we have…


Big Data Kickstart

Managing big data is critical for many organizations. Analytics can improve products and inform critical business decisions. Using data well can provide a distinct competitive advantage, and it’s likely that an organization’s competitors are already leveraging their own data.