Slides from my PASS Summit presentation: https://www.slideshare.net/DustinVannoy/passsummit2019azurestorageoptionsforanalytics
I am pleased to share with you a new, improved way of developing for Azure Databricks from your IDE: Databricks Connect! Databricks Connect is a client library that runs large-scale Spark jobs on your Databricks cluster from anywhere you can import the library (Python, R, Scala, Java). It lets you develop on your own computer with your normal IDE features, such as autocomplete, linting, and debugging. You can work in an IDE you are familiar with while the Spark actions are sent out to the cluster, with no need to install Spark locally.
In this video, we will quickly cover Apache Spark. The goal is to explain why you would use Spark and where it fits in the data ecosystem. If you just want to get hands-on with Spark, check out one of my next videos on Spark and Databricks. Watch the video to get my overview of Spark and…
I frequently hear questions about which options are best for data pipelines. Should we write code using Pandas or Spark? Should we use AWS Glue or Azure Data Factory? Or maybe SSIS? Where do Airflow and Luigi fit? I plan to dive into these technologies and provide more clarity about the options we have…
Managing big data is critical for many organizations. Analytics can improve products and inform key business decisions. Using data can provide distinct advantages, and it's likely that an organization's competitors are already leveraging their data.