Azure | DUSTIN VANNOY

Azure Synapse Spark: Add Scala/Java Libraries

By Dustin Vannoy Jan 5, 2022 / Leave a comment

When working with an Apache Spark environment you may need to install third party libraries or custom packages. In this post I share the steps for installing Java or Scala libraries to Azure Synapse serverless Apache Spark pools. For Java or Scala code the libraries are packaged as JAR files that you add to the… Continue Reading

Intro to Azure Stream Analytics

By Dustin Vannoy Nov 16, 2021 / 2 Comments

Real-time data processing is becoming more common in companies of all sizes. The use cases range from simple stream ingestion to complex machine learning pipelines. If you need to get started with streaming in Azure, Stream Analytics gives you a simple way to get up and running. Most of my streaming projects involve Apache Kafka and Spark, which can take a lot of setup (or at least involve additional vendors to simplify the experience). Those technologies are great, especially for challenging streaming pipelines, but if your data platform is within Azure you should consider whether Stream Analytics will meet your needs.

Querying Log Analytics using KQL

By Dustin Vannoy Sep 14, 2021 / Leave a comment

Let’s walk through the fundamentals of using Kusto Query Language (KQL) to query your logs in Azure Log Analytics. Check out the video to see it in action and keep reading for more code examples and written steps to run queries. This covers a few basics as well as a complex query used to… Continue Reading

Monitoring Azure Databricks with Log Analytics

By Dustin Vannoy Aug 9, 2021 / 2 Comments

Log Analytics provides a way to easily query Spark logs and set up alerts in Azure, which is a huge help when monitoring Apache Spark. In this video I walk through the setup steps and a quick demo of this capability for the Azure Databricks log4j output and the Spark metrics. I include… Continue Reading

Azure Synapse Spark with Python

By Dustin Vannoy Feb 17, 2021 / 1 Comment

In this video, I introduce Apache Spark using the Python language, often referred to as PySpark. We’ll walk through a quick demo on Azure Synapse Analytics, an integrated platform for analytics within the Microsoft Azure cloud. This short demo is meant for those who are curious about PySpark or just want to get… Continue Reading

Azure Synapse Spark with Scala

By Dustin Vannoy Feb 3, 2021 / 1 Comment

In this video, I introduce Apache Spark using the Scala language. We’ll walk through a quick demo on Azure Synapse Analytics, an integrated platform for analytics within the Microsoft Azure cloud. This short demo is meant for those who are curious about Spark with Scala or just want to get a peek at… Continue Reading

Azure Synapse Spark .NET (C#)

By Dustin Vannoy Jan 27, 2021 / 2 Comments

Spark .NET is the C# API for Apache Spark, a popular platform for big data processing. This demo is for you if you are curious to see a sample Spark .NET program in action or are interested in seeing Azure Synapse serverless Apache Spark notebooks. It includes guidance on how you can follow along to build a Spark .NET data load that reads linked sample data, transforms the data, joins it to a lookup table, and saves the result in Delta Lake format to your Azure Data Lake Storage Gen2 account.

Data Lake Introduction

By Dustin Vannoy Mar 5, 2020 / Leave a comment

Hearing a lot of mention of Data Lakes but still not sure what that means or why anyone cares? This video will cover a brief introduction to what a Data Lake is and why so many organizations are adding them to their analytics ecosystem. To show what interacting with a data lake may look like for a typical data analyst, I included a demo of how you would use Spark SQL to query the data lake from Azure Databricks.

Create Service Principal in Azure Portal

By Dustin Vannoy Feb 28, 2020 / Leave a comment

If you are working with Azure Databricks (or many other Azure resources), you may come across the need for a Service Principal in order to configure access to different resources. The steps are fairly straightforward, but the terminology is not consistent, so this video will walk through the steps and describe where to find the values to use when you authenticate.

Create Azure Databricks Cluster from the Portal

By Dustin Vannoy Feb 21, 2020 / 3 Comments

When getting started with Azure Databricks for data processing and analytics, you need to create at least one cluster. Check out the video for a quick overview of how to do this from the Azure Portal. I include a quick description of the options you have and an overview of what cluster… Continue Reading

