Monitoring Azure Databricks with Log Analytics

Original video

Updated video

Log Analytics provides an easy way to query Spark logs and set up alerts in Azure, which is a big help when monitoring Apache Spark. In this video I walk through the setup steps and a quick demo of this capability for the Azure Databricks log4j output and the Spark metrics. This post includes written instructions and troubleshooting guidance to help you set it up yourself.

Setup steps

  1. Clone the repository (or just download the jars and skip to step 3): https://github.com/datakickstart/spark-monitoring
  2. Build the jars
  3. Run the commands in upload_with_dbfs.sh to upload the files:
    dbfs mkdirs dbfs:/databricks/spark-monitoring
    dbfs cp --overwrite src/target/spark-listeners_3.1.1_2.12-1.0.0.jar dbfs:/databricks/spark-monitoring/
    dbfs cp --overwrite src/target/spark-listeners-loganalytics_3.1.1_2.12-1.0.0.jar dbfs:/databricks/spark-monitoring/
    dbfs cp --overwrite src/spark-listeners/scripts/spark-monitoring.sh dbfs:/databricks/spark-monitoring/
  4. Create a Log Analytics workspace (if one doesn’t already exist)
  5. Get the Log Analytics workspace ID and key (from the “Agents management” pane)
  6. Add the Log Analytics workspace ID and key to a Databricks secret scope
  7. Add the Log Analytics environment configs to the cluster environment variables (example entries follow this list)
    Screenshot showing cluster advanced options has environment variables for log analytics workspace id and key
  8. Add the spark-monitoring.sh init script in the cluster advanced options
    Screenshot of cluster advanced options init script section showing spark monitoring file was added
  9. Start the cluster and confirm the event log shows a successful cluster init
    Screenshot of JSON detail for INIT_SCRIPTS_FINISHED step in the cluster event log
  10. Confirm the custom logs are created in Log Analytics and that messages are flowing to them (a notebook snippet to generate a test log message follows this list)
    Screenshot of Logs pane in Log Analytics showing Spark custom logs and results from querying SparkLoggingEvent_CL log
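
For steps 6 and 7, the environment variable entries can pull their values straight from the secret scope using Databricks secret references. The entries look roughly like the sketch below; the scope name demo and the secret names are placeholders from my setup, and LOG_ANALYTICS_WORKSPACE_ID / LOG_ANALYTICS_WORKSPACE_KEY are the variable names the spark-monitoring.sh script reads:

    LOG_ANALYTICS_WORKSPACE_ID={{secrets/demo/log-analytics-workspace-id}}
    LOG_ANALYTICS_WORKSPACE_KEY={{secrets/demo/log-analytics-workspace-key}}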
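
For step 10, a quick way to confirm messages are flowing is to write a test message through log4j from a notebook attached to the cluster, then query the SparkLoggingEvent_CL table a few minutes later and filter on the message text. A minimal sketch (the logger name is arbitrary):

    # Run in a Python notebook attached to the monitored cluster.
    # The message goes through the driver's log4j, which the monitoring
    # setup forwards to Log Analytics as SparkLoggingEvent_CL records.
    log4j = sc._jvm.org.apache.log4j
    logger = log4j.LogManager.getLogger("datakickstart-monitoring-test")
    logger.info("Test message to verify Log Analytics forwarding")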

Troubleshooting

What if custom logs do not show up in Azure Log Analytics?

There are a few things to check to see what has gone wrong.

  1. Start the cluster and watch the event log to confirm you see an INIT_SCRIPTS_FINISHED message for spark-monitoring.sh (setup step 9).
  2. Confirm the cluster environment variables were set (setup step 7) and that they reference secret names in a Databricks secret scope. To check what is in your Databricks secret scope, replace demo with your scope name and run the following from a notebook: dbutils.secrets.list(scope="demo")
  3. Confirm the init script was added properly (setup step 8). To confirm the script exists in the location you configured, run the following from a notebook: dbutils.fs.ls("dbfs:/databricks/spark-monitoring/spark-monitoring.sh"). A combined version of checks 2 and 3 is sketched after this list.
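
If you want to run checks 2 and 3 together, here is a minimal sketch for a notebook attached to the cluster. It assumes the secret scope is named demo and that the cluster environment variables from setup step 7 are named LOG_ANALYTICS_WORKSPACE_ID and LOG_ANALYTICS_WORKSPACE_KEY; adjust the names to match your setup:

    # Run from a notebook on the cluster you are troubleshooting.
    import os

    # Secrets the cluster can see (replace demo with your scope name)
    print(dbutils.secrets.list(scope="demo"))

    # Environment variables the init script expects (only presence is shown)
    print("LOG_ANALYTICS_WORKSPACE_ID set:", "LOG_ANALYTICS_WORKSPACE_ID" in os.environ)
    print("LOG_ANALYTICS_WORKSPACE_KEY set:", "LOG_ANALYTICS_WORKSPACE_KEY" in os.environ)

    # Init script exists at the configured DBFS path
    print(dbutils.fs.ls("dbfs:/databricks/spark-monitoring/spark-monitoring.sh"))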
1 Comment
  1. Nice summary of how to set up spark-monitoring with a Log Analytics workspace.
    I'm now facing an issue compiling the sample files to test it
    https://github.com/mspnp/spark-monitoring#run-the-sample-job-optional
    with this message
    [ERROR] Failed to execute goal on project spark-monitoring-sample: Could not resolve dependencies for project com.microsoft.pnp:spark-monitoring-sample:jar:1.0.0: com.microsoft.pnp:spark-listeners:jar:1.0.0 was not found in https://repo.maven.apache.org/maven2 during a previous attempt. This failure was cached in the local repository and resolution is not reattempted until the update interval of central has elapsed or updates are forced

    Do you know what it could be?
    Many thanks
