If you choose to use Snowflake along with Azure for your data platform, you will have to make choices on how to load the data. Landing processed data into your data lake on Azure Data Lake Storage Gen2 (ADLS) is the first step that I recommend in most environments. I like this pattern because then… Continue Reading
Log Analytics provides a way to easily query Spark logs and setup alerts in Azure. This provides a huge help when monitoring Apache Spark. In this video I walk through the setup steps and quick demo of this capability for the Azure Synapse Spark log4j output. I include written instructions and troubleshooting guidance in this… Continue Reading
If we want to kick off a single Apache Spark notebook to process a list of tables we can write the code easily. The simple code to loop through the list of tables ends up running one table after another (sequentially). If none of these tables are very big, it is quicker to have Spark load tables concurrently (in parallel) using threads. There are some different options of how to do this, but I am sharing the easiest way I have found when working with a notebook in Databricks, Azure Synapse Spark, Jupyter, or Zeppelin.
In this post I introduce some of the core capabilities of Azure Synapse Analytics and when they are used. I present from the perspective of data engineer but it should be easy to translate what is most useful for analysts and data scientists also. Please continue reading for a quick walkthrough of the capabilities and… Continue Reading
For production uses of Azure Synapse there are benefits to implementing Continuous Integration (CI) and Continuous Deployment (CD). Implementing CI/CD includes the need to deploy the Azure infrastructure in an automated way. In this post, I share things I learned that may be helpful for you. I also have a few links to other content that was helpful for me to get an environment setup.
When working with an Apache Spark environment you may need to install external libraries or custom packages. In this post I share the steps for installing Python packages to Azure Synapse serverless Apache Spark pools. For Python code the libraries are packages as wheel (.whl) files. You can also install Python packages that are available… Continue Reading
When working with an Apache Spark environment you may need to install third party libraries or custom packages. In this post I share the steps for installing Java or Scala libraries to Azure Synapse serverless Apache Spark pools. For Java or Scala code the libraries are packaged as JAR files that you add to the… Continue Reading
In this video, I share with you about Apache Spark using the Python language, often referred to as PySpark. We’ll walk through a quick demo on Azure Synapse Analytics, an integrated platform for analytics within Microsoft Azure cloud. This short demo is meant for those who are curious about PySpark or just want to get… Continue Reading
In this video, I share with you about Apache Spark using the Scala language. We’ll walk through a quick demo on Azure Synapse Analytics, an integrated platform for analytics within Microsoft Azure cloud. This short demo is meant for those who are curious about Spark with Scala or just want to get a peek at… Continue Reading
Spark .NET is the C# API for Apache Spark - a popular platform for big data processing. This demo is for you if you are curious to see a sample Spark .NET program in action or are interested in seeing Azure Synapse serverless Apache Spark notebooks. This demo includes guidance of how you can follow along to build a Spark .NET data load that reads linked sample data, transforms data, joins to a lookup table, and saves as a Delta Lake file to your Azure Data Lake Storage Gen2 account.