Log Analytics provides a way to easily query Spark logs and set up alerts in Azure, which is a huge help when monitoring Apache Spark. In this video I walk through the setup steps and a quick demo of this capability for the Azure Databricks log4j output and the Spark metrics. I've included written instructions and troubleshooting guidance in this post to help you set this up yourself.
- Clone repository (or just download jars and skip to step 3): https://github.com/datakickstart/spark-monitoring
- Build jars
- Run the commands in upload_with_dbfs.sh to upload the files:
dbfs mkdirs dbfs:/databricks/spark-monitoring
dbfs cp --overwrite src/target/spark-listeners_3.1.1_2.12-1.0.0.jar dbfs:/databricks/spark-monitoring/
dbfs cp --overwrite src/target/spark-listeners-loganalytics_3.1.1_2.12-1.0.0.jar dbfs:/databricks/spark-monitoring/
dbfs cp --overwrite src/spark-listeners/scripts/spark-monitoring.sh dbfs:/databricks/spark-monitoring/
- Create a Log Analytics workspace (if one doesn't already exist)
- Get the Log Analytics workspace ID and key (from the “Agents management” pane)
- Add the Log Analytics workspace ID and key to a Databricks secret scope (see the secret scope sketch after this list)
- Add environment configs to the cluster environment variables (see the configuration sketch after this list)
- Add the spark-monitoring.sh init script in the cluster advanced options
- Start the cluster and confirm the event log shows successful cluster init
- Confirm custom logs are created in Log Analytics and messages are flowing to it (see the query sketch after this list)
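For the secret scope step, here is a minimal sketch using the (legacy) Databricks CLI. The scope name demo and the key names log-analytics-workspace-id and log-analytics-workspace-key are just the examples I use below; any names work as long as the cluster environment variables reference the same ones.

# Create a scope and store the workspace ID and key in it (values are placeholders)
databricks secrets create-scope --scope demo
databricks secrets put --scope demo --key log-analytics-workspace-id --string-value "<your-workspace-id>"
databricks secrets put --scope demo --key log-analytics-workspace-key --string-value "<your-workspace-key>"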
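For the environment variable step, Databricks can resolve secrets referenced with the {{secrets/<scope>/<key>}} syntax. A sketch of the cluster configuration is below, assuming the scope and key names from the previous example; LOG_ANALYTICS_WORKSPACE_ID and LOG_ANALYTICS_WORKSPACE_KEY are the variable names the spark-monitoring.sh script expects, so double-check them against the version of the script you built.

# Cluster > Advanced options > Spark > Environment variables
LOG_ANALYTICS_WORKSPACE_ID={{secrets/demo/log-analytics-workspace-id}}
LOG_ANALYTICS_WORKSPACE_KEY={{secrets/demo/log-analytics-workspace-key}}

# Cluster > Advanced options > Init Scripts > DBFS path
dbfs:/databricks/spark-monitoring/spark-monitoring.sh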
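To confirm data is flowing, you can query the custom log tables in the Log Analytics workspace (Logs blade) or from the Azure CLI. The sketch below assumes the az log-analytics extension is installed and that the library writes to its default custom tables, SparkLoggingEvent_CL and SparkMetric_CL; adjust the table names if yours differ.

# Look for recent log4j messages and Spark metrics (give the cluster a few minutes after startup)
az monitor log-analytics query -w <workspace-id> --analytics-query "SparkLoggingEvent_CL | take 10"
az monitor log-analytics query -w <workspace-id> --analytics-query "SparkMetric_CL | summarize count() by bin(TimeGenerated, 5m)"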
What if custom logs do not show up in Azure Log Analytics?
There are a few things to look at to see what has gone wrong.
- Start the cluster and watch the event log to confirm you see an INIT_SCRIPTS_FINISHED message for spark-monitoring.sh (setup step 9).
- Confirm the cluster environment variables were set (setup step 7) and that they reference secret names in a Databricks secret scope. To check what is in your Databricks secret scope, replace demo with your scope name and run the following from a notebook: dbutils.secrets.list(scope="demo"). You can also verify the variables on the driver itself, as sketched after this list.
- Confirm the init script was added properly (setup step 8). To confirm the script exists in the location you configured, run the following from a notebook: dbutils.fs.ls("dbfs:/databricks/spark-monitoring/spark-monitoring.sh")
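If the secrets and init script both look right, it can also help to confirm the environment variables actually resolved on the driver. A minimal check from a notebook cell is below; the grep pattern assumes the LOG_ANALYTICS_* variable names used earlier, so adjust it if you named yours differently.

%sh
# List the monitoring variables that were injected into the driver environment
env | grep LOG_ANALYTICS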