In this post I introduce some of the core capabilities of Azure Synapse Analytics and when they are used. I present from the perspective of a data engineer, but it should be easy to see what is most useful for analysts and data scientists as well. Please continue reading for a quick walkthrough of the capabilities and…
Azure Synapse CI/CD
For production uses of Azure Synapse, there are benefits to implementing Continuous Integration (CI) and Continuous Deployment (CD). Implementing CI/CD also means deploying the Azure infrastructure in an automated way. In this post, I share things I learned that may be helpful for you. I also include a few links to other content that helped me get an environment set up.
Azure Synapse Spark: External Python Packages
When working with an Apache Spark environment, you may need to install external libraries or custom packages. In this post I share the steps for installing Python packages to Azure Synapse serverless Apache Spark pools. For Python code, the libraries are packaged as wheel (.whl) files. You can also install Python packages that are available…
Azure Synapse Spark: Add Scala/Java Libraries
When working with an Apache Spark environment, you may need to install third-party libraries or custom packages. In this post I share the steps for installing Java or Scala libraries to Azure Synapse serverless Apache Spark pools. For Java or Scala code, the libraries are packaged as JAR files that you add to the…
Intro to Azure Stream Analytics
Real-time data processing is becoming more common in companies of all sizes. The use cases range from simple stream ingestion to complex machine learning pipelines. If you need to get started with streaming in Azure, Stream Analytics gives you a simple way to get up and running. Most of my streaming projects involve Apache Kafka and Spark, which can take a lot of setup (or at least involve additional vendors to simplify the experience). Those technologies are great, especially for challenging streaming pipelines, but if your data platform is within Azure you should consider whether Stream Analytics will meet your needs.
Learn Python – Resource List
I get asked about getting started with Python a lot, since it’s the language I recommend for someone wanting to break into data engineering (unless they already know Scala or Java, since those are also heavily used). In this post I share some Python resources that I think will help you learn, whether you are brand new to development or a seasoned developer who just wants to pick it up as an additional language.
Stream Processing Frameworks – User group discussion
I recently led a discussion on stream processing frameworks at my user group, Data Engineering San Diego. Check out the video if you are interested in a high-level overview of some of the frameworks used by data engineers. I didn’t heavily research the frameworks, so if you have more to add on a particular one…
Querying Log Analytics using KQL
Let’s walk through the fundamentals of using Kusto Query Language (KQL) to query your logs in Azure Log Analytics. Check out the video to see it in action, and keep reading for more code examples and written steps to run queries. This covers a few basics as well as a complex query used to…
Monitoring Azure Databricks with Log Analytics
Log Analytics provides a way to easily query Spark logs and set up alerts in Azure. This is a huge help when monitoring Apache Spark. In this video I walk through the setup steps and a quick demo of this capability for the Azure Databricks log4j output and the Spark metrics. I include written instructions and troubleshooting…
Spark Monitoring video series
In this series I cover monitoring Apache Spark with Azure Databricks. Most of the content is relevant even if you use open source Apache Spark or any other managed Spark service. I will be adding to this playlist and would love to hear what questions you still have about monitoring your Apache Spark workloads.