Monitor Synapse Spark with Log Analytics

Log Analytics makes it easy to query Spark logs and set up alerts in Azure, which is a big help when monitoring Apache Spark. In this video I walk through the setup steps and give a quick demo of this capability for the Azure Synapse Spark log4j output. I include written instructions and troubleshooting guidance in this post to help you set this up yourself.

Setup steps

Before you start, make sure you have the following resources:


  1. Azure Synapse Analytics workspace
  2. Log Analytics workspace
  3. Azure Key Vault
  4. Linked service for Azure Key Vault (in the Synapse Workspace)

To have an Azure Synapse Apache Spark pool send its log4j output to Log Analytics, you create a file with the required Spark configurations and upload it to the pool.

First, create a text file with the following settings. Replace any of the values inside angle brackets <> with your own values. For example, I replace <keyvault-name> with dvtrainingkv. You may use the same secret name log-analytics-secret-key or provide your own secret name.

spark.synapse.logAnalytics.enabled true
spark.synapse.logAnalytics.workspaceId <workspace-guid>
spark.synapse.logAnalytics.keyVault.name <keyvault-name>
spark.synapse.logAnalytics.keyVault.linkedServiceName <keyvault-linkedservice-name>
spark.synapse.logAnalytics.keyVault.key.secret log-analytics-secret-key

To get the workspaceId and secretKey, open the Agents management section of the Log Analytics workspace. The workspace ID replaces <workspace-guid> in the configuration file. Add the secret key to Azure Key Vault with the secret name log-analytics-secret-key (or choose your own name and modify the last item in the above configuration to match).

Save the file with a name you choose, for example synapse-spark-defaults.txt.
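
If you set this up for more than one pool, it can help to generate the file instead of editing it by hand. The following is a minimal sketch, not part of the official setup: render_spark_log_config is a hypothetical helper name, the GUID and linked service name are placeholders, and the Key Vault name is assumed to go in the spark.synapse.logAnalytics.keyVault.name setting. It refuses to produce a file while any angle-bracket placeholder is still in place.

```python
def render_spark_log_config(workspace_id, keyvault_name, linked_service_name,
                            secret_name="log-analytics-secret-key"):
    """Return the text of a Spark configuration file for Log Analytics logging."""
    settings = [
        ("spark.synapse.logAnalytics.enabled", "true"),
        ("spark.synapse.logAnalytics.workspaceId", workspace_id),
        ("spark.synapse.logAnalytics.keyVault.name", keyvault_name),
        ("spark.synapse.logAnalytics.keyVault.linkedServiceName", linked_service_name),
        ("spark.synapse.logAnalytics.keyVault.key.secret", secret_name),
    ]
    for key, value in settings:
        # Catch empty values and placeholders such as <workspace-guid> left unreplaced.
        if not value or "<" in value or ">" in value:
            raise ValueError(f"Replace the placeholder value for {key}")
    # One "key value" pair per line, the same layout as the file shown above.
    return "\n".join(f"{key} {value}" for key, value in settings) + "\n"


# Example with placeholder values; swap in your own before uploading.
config_text = render_spark_log_config(
    workspace_id="00000000-0000-0000-0000-000000000000",  # placeholder GUID
    keyvault_name="dvtrainingkv",
    linked_service_name="my-keyvault-linked-service",  # hypothetical name
)
print(config_text)
```

Write config_text to a file such as synapse-spark-defaults.txt and upload it as described in the next step.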

Next, upload the file to a Synapse Apache Spark pool. If you do not have a pool yet, you can add the file as you create it. To add it to an existing pool, find the pool in Manage -> Apache Spark pools, open the action menu, and choose Apache Spark Configuration.

From the configuration screen, select Upload.

Next, browse to the configuration file you saved earlier and then choose Upload.

Finally, select Apply.

Now, the next time you run a Spark notebook or job with the pool, the Spark logs (log4j stderr) will go to the Azure Log Analytics workspace.

PySpark Logging

PySpark notebooks always generate log messages, but you may want to set up your own logger and add messages. The following code can be run from a PySpark notebook to get a log4j logger and write messages.

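# Get the JVM log4j package through the notebook's SparkSession (spark)
spark_log4j = spark.sparkContext._jvm.org.apache.log4j
logger = spark_log4j.LogManager.getLogger("datakickstart")
logger.info("Test simple log message")

import json
# Send structured details as a JSON string so they can be parsed in a query
msg = {
    "message": "Test simple log message",
    "notebook": "pyspark_logging",
    "source_data": "StackOverflow",
    "destination_data": "raw_stackoverflow"
}
logger.info(json.dumps(msg))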
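
If you log JSON messages from several notebooks, a small wrapper keeps the field names consistent. This is a sketch under my own assumptions: build_log_message and log_json are hypothetical helper names, and the logger only needs an info method that accepts a string, which the log4j logger from the notebook code does.

```python
import json


def build_log_message(message, **context):
    """Serialize a message plus optional context fields as one JSON string."""
    payload = {"message": message}
    payload.update(context)
    return json.dumps(payload)


def log_json(logger, message, **context):
    """Send a JSON-formatted message to any logger that exposes info(str)."""
    logger.info(build_log_message(message, **context))


class ListLogger:
    """Stand-in logger for trying the helpers outside a Spark notebook."""

    def __init__(self):
        self.messages = []

    def info(self, text):
        self.messages.append(text)


demo_logger = ListLogger()
log_json(
    demo_logger,
    "Test simple log message",
    notebook="pyspark_logging",
    source_data="StackOverflow",
    destination_data="raw_stackoverflow",
)
print(demo_logger.messages[0])
```

In a notebook, pass the log4j logger in place of demo_logger.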

Log Analytics Query (KQL)

It takes a few minutes for the custom logs to show up the first time the pool sends log messages to the workspace. Once the custom logs show up, you can see results with a simple query.

SparkLoggingEvent_CL
| project TimeGenerated, logger_name_s, Message
| where logger_name_s contains "datakickstart"

Or, if you send messages as a JSON string, you can parse out the values.

SparkLoggingEvent_CL
| project TimeGenerated, logger_name_s, j=parse_json(Message)
| extend source=j.source_data, notebook=j.notebook, msg=j.message
| where logger_name_s contains "datakickstart"
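
To check locally that a message has the shape the query expects before you send it, you can mimic the parse and extend steps in Python. This is only a rough analogue that I sketched for testing, not how Log Analytics evaluates the query; extract_fields is a hypothetical name, and it returns None for any field that is missing or when the message is not a JSON object.

```python
import json


def extract_fields(message, fields=("source_data", "notebook", "message")):
    """Parse a JSON log message and pick out the requested fields."""
    try:
        parsed = json.loads(message)
    except (TypeError, ValueError):
        # Not valid JSON (for example a plain-text log line).
        parsed = None
    if not isinstance(parsed, dict):
        return {name: None for name in fields}
    return {name: parsed.get(name) for name in fields}


sample = json.dumps({
    "message": "Test simple log message",
    "notebook": "pyspark_logging",
    "source_data": "StackOverflow",
    "destination_data": "raw_stackoverflow",
})
print(extract_fields(sample))
print(extract_fields("Test simple log message"))
```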
