Azure Data Platform Overview slides

I had the privilege of presenting for Creating Coding Careers, a great organization in the San Diego area that helps people get established in tech careers through apprenticeships and other programs. Above are the slides used in that presentation. Recommended resources to learn Azure Data Platform: Databricks Training, Microsoft Learn Training…

Snowflake on Azure – Create External Stage

Snowflake, like similar analytic databases, has a fast way to load data from files. The COPY command can quickly read files and append the records to a table. It does this by reading from an external stage, which points to a cloud storage location. External stages currently support Azure Storage, Amazon S3, and Google Cloud Storage.…

Ingest tables in parallel with an Apache Spark notebook using multithreading

If we want to kick off a single Apache Spark notebook to process a list of tables, the code is easy to write. A simple loop over the list of tables ends up running one table after another (sequentially). If none of these tables are very big, it is quicker to have Spark load tables concurrently (in parallel) using threads. There are several ways to do this, but I am sharing the easiest way I have found when working with a notebook in Databricks, Azure Synapse Spark, Jupyter, or Zeppelin.
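As a minimal sketch of the idea, assuming Python's standard `concurrent.futures` thread pool is an acceptable way to run the threads: the names `load_table` and `load_tables_in_parallel` and the table list are hypothetical, and `load_table` is only a stand-in for whatever Spark read and write a real notebook would do for each table.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def load_table(table_name):
    """Hypothetical stand-in for the per-table work.

    In a real notebook this is where the shared Spark session would
    read the source table and write it to its destination.
    """
    return f"loaded {table_name}"


def load_tables_in_parallel(table_names, loader=load_table, max_workers=4):
    """Run `loader` once per table on a thread pool.

    Returns two dicts keyed by table name: results of successful loads
    and exceptions raised by failed loads, so one bad table does not
    stop the others.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every table up front; each call runs on its own thread.
        futures = {pool.submit(loader, name): name for name in table_names}
        # Collect outcomes as each table finishes, in completion order.
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                errors[name] = exc
    return results, errors


# Hypothetical table list for illustration only.
tables = ["customers", "orders", "products"]
results, errors = load_tables_in_parallel(tables)
```

Threads fit here because each thread spends most of its time waiting on Spark jobs rather than doing work in Python. The `max_workers` value caps how many tables are in flight at once, and a sensible number depends on the size of the cluster and the tables.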

Azure Synapse Analytics Kickstart

In this post I introduce some of the core capabilities of Azure Synapse Analytics and when they are used. I present from the perspective of a data engineer, but it should be easy to translate what is most useful for analysts and data scientists as well. Please continue reading for a quick walkthrough of the capabilities and…

Azure Synapse CI/CD

For production uses of Azure Synapse there are benefits to implementing Continuous Integration (CI) and Continuous Deployment (CD). Implementing CI/CD includes the need to deploy the Azure infrastructure in an automated way. In this post, I share things I learned that may be helpful for you. I also have a few links to other content that helped me get an environment set up.

Azure Synapse Spark: External Python Packages

When working with an Apache Spark environment you may need to install external libraries or custom packages. In this post I share the steps for installing Python packages to Azure Synapse serverless Apache Spark pools. For Python code the libraries are packaged as wheel (.whl) files. You can also install Python packages that are available…

Intro to Azure Stream Analytics

Real-time data processing is becoming more common in companies of all sizes. The use cases range from simple stream ingestion to complex machine learning pipelines. If you need to get started with streaming in Azure, Stream Analytics gives you a simple way to get up and running. Most of my streaming projects involve Apache Kafka and Spark, which can take a lot of setup (or at least involve additional vendors to simplify the experience). Those technologies are great, especially for challenging streaming pipelines, but if your data platform is within Azure you should consider whether Stream Analytics will meet your needs.