Create Azure Databricks Cluster from the Portal

By Dustin Vannoy / Feb 21, 2020 / 3 Comments

When getting started with Azure Databricks for data processing and analytics, you need to create at least one cluster. Check out the video for a quick overview of how to do this from the Azure Portal. I include a short description of the available options and an overview of the cluster management tabs that are available once the cluster is created.

The requirements to follow along in your own Azure account are:

An Azure account
An Azure Databricks Workspace (14-day trial will work)

Here are the basic settings I recommend for a test cluster (see the video for explanations of all the UI options).

Cluster Mode = Standard
Pool = None
Databricks Runtime Version = 6.3 (or latest)
Enable Autoscaling = No
Terminate After = 120 minutes (default)
Worker Type = Standard_DS3_v2 (default)
Workers = 2
Driver Type = Same as worker
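For readers who later want to script this instead of clicking through the portal, the same settings can be expressed as a cluster creation payload. This is a minimal sketch, assuming the field names of the Databricks Clusters API `clusters/create` request (`cluster_name`, `spark_version`, `node_type_id`, `driver_node_type_id`, `num_workers`, `autotermination_minutes`); the runtime key `6.3.x-scala2.11` and the cluster name `test-cluster` are assumptions, not values from the video.

```python
import json


def build_test_cluster_spec(cluster_name: str = "test-cluster") -> dict:
    """Return a clusters/create-style payload mirroring the portal settings above.

    Assumptions: field names follow the Databricks Clusters API, and the
    runtime key "6.3.x-scala2.11" corresponds to Databricks Runtime 6.3.
    """
    worker_type = "Standard_DS3_v2"  # default Worker Type in the portal
    return {
        "cluster_name": cluster_name,         # hypothetical name
        "spark_version": "6.3.x-scala2.11",   # Runtime 6.3 (assumed key)
        "node_type_id": worker_type,          # Worker Type
        "driver_node_type_id": worker_type,   # Driver Type = same as worker
        "num_workers": 2,                     # fixed size: autoscaling is off
        "autotermination_minutes": 120,       # Terminate After = 120 minutes
        # Cluster Mode = Standard and Pool = None need no extra fields,
        # so no instance_pool_id and no single-node profile are set here.
    }


if __name__ == "__main__":
    print(json.dumps(build_test_cluster_spec(), indent=2))
```

Because autoscaling is disabled, the payload uses a fixed `num_workers` rather than an `autoscale` range. Check the field names and runtime key against the Clusters API reference for your workspace before sending this anywhere.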

In future posts I’ll share how to create clusters from the command line or using a Python script and show a few more options that are not included in this video.

3 Comments

Annie W.

February 12, 2021 at 10:56 am

It’s very common to encounter an error like the one below:
Databricks execution failed with error state: InternalError, error message: Unexpected failure while waiting for the cluster (0208-202419-zinc966) to be ready.Cause Unexpected state for cluster (0208-202419-zinc966): CLOUD_PROVIDER_LAUNCH_FAILURE(CLOUD_FAILURE): azure_error_code:OperationNotAllowed,azure_error_message:Operation could not be completed as it results in exceeding approved Total Regional Cores quota. Additional details – Deployment Model: Resource Manager, Location: eastus, Current Limit: 10, Current Usage: 10, Additional Required: 8, (Minimum) New Limit Required: 18.
This happens when you have a free tier Azure subscription (which only allows 4 cores), since the settings above require 8 cores to be available. To have enough cores on a free tier Azure subscription, I would have to fall back to the following settings. Would you agree?

Cluster Mode = single node
Pool = None
Databricks Runtime Version = 7.4 LTE (or latest)
Enable Autoscaling = No
Terminate After = 120 minutes (default)
Worker Type = Standard_DS3_v2 (default)
Driver Type = Same as worker
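The quota arithmetic behind an error like the one above can be checked with a few lines of Python. This is a minimal sketch, assuming every node (driver included) runs on its own VM and counts its full vCPU count against the regional cores quota; the `VCPUS` lookup table and the helper names are hypothetical, and Standard_DS3_v2 is the only size filled in (it has 4 vCPUs).

```python
from typing import Optional

# Hypothetical lookup of vCPUs per Azure VM size; Standard_DS3_v2 has 4 vCPUs.
VCPUS = {"Standard_DS3_v2": 4}


def cluster_cores(node_type: str, num_workers: int,
                  driver_type: Optional[str] = None) -> int:
    """Cores a cluster consumes: all workers plus one driver node."""
    driver = driver_type or node_type  # "Same as worker" by default
    return VCPUS[node_type] * num_workers + VCPUS[driver]


def min_new_limit(usage: int, required: int) -> int:
    """Smallest quota that fits current usage plus the new cluster."""
    return usage + required


def fits_quota(limit: int, usage: int, required: int) -> bool:
    """True if the new cluster fits under the current limit."""
    return min_new_limit(usage, required) <= limit
```

By this count, two Standard_DS3_v2 workers plus a driver of the same size come to 12 cores, while a single node cluster (zero workers) needs only the driver's 4. The error's "(Minimum) New Limit Required: 18" is the same sum that `min_new_limit(10, 8)` computes.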

- dustinvannoy
  
  February 12, 2021 at 11:04 am
  
  Yes, that is a good point. Thanks for adding that. With a pay-as-you-go subscription it is fairly easy to request a quota increase, but given the free tier limits you mention, your configuration seems good. It’s also worth mentioning that the “Current Usage” number may include clusters that terminated recently (if I recall correctly). I have definitely seen this with a similar quota message about the number of public IP addresses that can be used on the subscription. So if you get one of these quota messages and it doesn’t look like you are actually using the resources it reports, give it a few minutes and try again.
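The "give it a few minutes and try again" advice can be automated with a small retry loop. This is a minimal sketch: `QuotaExceededError` and the `create` callable are hypothetical stand-ins for whatever your own cluster creation call does and raises, and the 180-second default wait is an assumed value for "a few minutes".

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceededError(Exception):
    """Hypothetical error raised when cluster creation fails a quota check."""


def retry_on_quota(create: Callable[[], T], attempts: int = 3,
                   wait_seconds: float = 180.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Call create(), waiting and retrying only on quota errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return create()
        except QuotaExceededError:
            if attempt == attempts:
                raise  # out of attempts: surface the last quota error
            sleep(wait_seconds)  # give terminated clusters time to release cores
    raise AssertionError("unreachable")
```

Other errors propagate immediately, so only the transient quota case is retried. The `sleep` parameter exists so the wait can be swapped out in tests.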
  
  - Annie Wong
    
    February 12, 2021 at 12:35 pm
    
    Thanks, Dustin!
    
