Monitoring Azure Databricks with Log Analytics

Log Analytics makes it easy to query Spark logs and set up alerts in Azure, which is a big help when monitoring Apache Spark. In this video I walk through the setup steps and give a quick demo of this capability for the Azure Databricks log4j output and the Spark metrics. This post includes written instructions and troubleshooting guidance to help you set it up yourself.

Setup steps

  1. Clone repository (or just download jars and skip to step 3): https://github.com/datakickstart/spark-monitoring
  2. Build jars
  3. Run commands in upload_with_dbfs.sh to upload files
    dbfs mkdirs dbfs:/databricks/spark-monitoring
    dbfs cp --overwrite src/target/spark-listeners_3.1.1_2.12-1.0.0.jar dbfs:/databricks/spark-monitoring/
    dbfs cp --overwrite src/target/spark-listeners-loganalytics_3.1.1_2.12-1.0.0.jar dbfs:/databricks/spark-monitoring/
    dbfs cp --overwrite src/spark-listeners/scripts/spark-monitoring.sh dbfs:/databricks/spark-monitoring/
  4. Create a Log Analytics workspace (if one doesn’t exist)
  5. Get the Log Analytics workspace ID and key (from the “Agents management” pane)
  6. Add the Log Analytics workspace ID and key to a Databricks secret scope
  7. Add the workspace ID and key to the cluster environment variables, referencing the secrets from step 6
    Screenshot showing cluster advanced options has environment variables for log analytics workspace id and key
  8. Add the spark-monitoring.sh init script in the cluster advanced options
    Screenshot of cluster advanced options init script section showing spark monitoring file was added
  9. Start cluster and confirm Event Log shows successful cluster init
    Screenshot of JSON detail for INIT_SCRIPTS_FINISHED step in the cluster event log
  10. Confirm custom logs are created in Log Analytics and messages are flowing to it
    Screenshot of Logs pane in Log Analytics showing Spark custom logs and results from querying SparkLoggingEvent_CL log
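
For steps 6 and 7, the environment variables on the cluster usually point at the secret scope rather than holding the raw values. Below is a minimal sketch of what those lines might look like. It assumes the init script reads LOG_ANALYTICS_WORKSPACE_ID and LOG_ANALYTICS_WORKSPACE_KEY (the names used by the upstream spark-monitoring library; check your copy of spark-monitoring.sh) and that the Databricks {{secrets/<scope>/<name>}} reference syntax applies. The scope name demo and both secret names are hypothetical placeholders.

```python
from typing import List


def secret_ref(scope: str, secret_name: str) -> str:
    """Build a Databricks secret reference for a cluster environment variable."""
    return "{{secrets/" + scope + "/" + secret_name + "}}"


def monitoring_env_lines(scope: str, id_secret: str, key_secret: str) -> List[str]:
    """Return the two environment variable lines to paste into the cluster's
    advanced options. The variable names are an assumption; confirm them
    against the spark-monitoring.sh script you uploaded."""
    return [
        f"LOG_ANALYTICS_WORKSPACE_ID={secret_ref(scope, id_secret)}",
        f"LOG_ANALYTICS_WORKSPACE_KEY={secret_ref(scope, key_secret)}",
    ]


if __name__ == "__main__":
    # Hypothetical scope and secret names; replace with your own.
    for line in monitoring_env_lines(
        "demo", "log-analytics-workspace-id", "log-analytics-workspace-key"
    ):
        print(line)
```

Referencing secrets this way keeps the workspace key out of the cluster configuration itself, so anyone viewing the cluster settings sees only the reference.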

Troubleshooting

What if custom logs do not show up in Azure Log Analytics?

There are a few things to check to find out what has gone wrong.

  1. Start the cluster and watch the event log to confirm you see an INIT_SCRIPTS_FINISHED message for spark-monitoring.sh (setup step 9).
  2. Confirm the cluster environment variables were set (setup step 7) and that they reference secret names in a Databricks secret scope. To check what is in your Databricks secret scope, replace demo with your secret scope name and run the following command from a notebook: dbutils.secrets.list(scope="demo")
  3. Confirm the init script was added properly (setup step 8). To confirm the script exists in the location you configured, run the following command from a notebook: dbutils.fs.ls("dbfs:/databricks/spark-monitoring/spark-monitoring.sh")
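
As a complement to check 2, a notebook cell can confirm that the variables actually reached the driver process. This is a rough sketch, not part of the spark-monitoring project. It assumes the variable names LOG_ANALYTICS_WORKSPACE_ID and LOG_ANALYTICS_WORKSPACE_KEY (verify them in your spark-monitoring.sh) and that cluster environment variables are visible through os.environ on the driver. It reports only which names are missing and never prints the values.

```python
import os
from typing import List, Mapping

# Assumed variable names; confirm against your copy of spark-monitoring.sh.
REQUIRED_VARS = ("LOG_ANALYTICS_WORKSPACE_ID", "LOG_ANALYTICS_WORKSPACE_KEY")


def missing_monitoring_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_monitoring_vars()
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("Both Log Analytics variables are set.")
```

If a name shows up as missing, revisit setup step 7 and restart the cluster, since environment variable changes only take effect on a fresh start.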