Snowflake, like similar analytic databases, has a fast way to load data from files. The COPY command can quickly read files and append the records to a table. It does this by reading from an external stage, which points to a cloud storage location; Azure Storage, Amazon S3, and Google Cloud Storage are currently supported. Alternatively, you can set up an internal stage, which is a file location within Snowflake, and PUT files there via an API call. When possible, I recommend using external storage, since the data in a single cloud storage location can often serve a variety of purposes.
In this tutorial we set up a storage integration and an external stage on Azure. This is meant to be a quick view of the steps needed to set this up for the first time, assuming Azure is already mostly configured. The official documentation for this setup is good and contains some additional caveats and considerations, but hopefully this post serves as a good starting point for understanding the main steps. In other posts I will cover more about using the COPY statement.
Configure a Storage Integration for Azure
The preferred way to set up access to your Azure Storage location is to create a Storage Integration within Snowflake. This requires the storage path(s) and an Azure Active Directory Tenant ID.
1. Find your Azure Active Directory Tenant ID and save it to use in the script in step 3. Within the Azure Portal, browse to the correct Azure Active Directory instance (if not sure then use Default Directory). In the Overview pane for that AAD, copy the value from Tenant ID.
2. If you do not know your storage URL, browse to the Azure Storage Account. On the left pane select Containers. Next select the action menu (three dots), then choose Container properties.
From the Container properties screen, copy the URL. You will need to replace https: with azure: in the next step.
3. Open a Worksheet in your Snowflake account. Select your database from the dropdown, then populate and run the following script. Replace the azure_tenant_id with your AAD Tenant ID and the storage_allowed_locations with your storage container URL (or a comma-separated list of storage URLs). Optionally, you can replace azure_int with a custom name, but be sure to replace it in all other commands in this tutorial.
create storage integration azure_int
  type = external_stage
  storage_provider = azure
  enabled = true
  azure_tenant_id = '9fb11111-11ae-111a-babb-b1111c11aaba'
  storage_allowed_locations = ('azure://datakickstartadls.blob.core.windows.net/demo/');
Run the describe command below to find the AZURE_CONSENT_URL and AZURE_MULTI_TENANT_APP_NAME values.
desc storage integration azure_int;
Paste the AZURE_CONSENT_URL value into your browser then accept the conditions it displays on behalf of your organization.
4. To add permissions, from the storage account's Access Control (IAM) pane, choose +Add -> Add role assignment -> Storage Blob Data Contributor. Then click +Select members, and search for and select your Snowflake service principal (the AZURE_MULTI_TENANT_APP_NAME from the previous step).
According to the Snowflake documentation:
“It can take an hour or longer for Azure to create the Snowflake service principal requested through the Microsoft request page in this section. If the service principal is not available immediately, we recommend waiting an hour or two and then searching again.”
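If you prefer the command line over the portal, the same role assignment can be sketched with the Azure CLI. All angle-bracketed values below are placeholders for your own Azure resources; this is an illustration, not a required step:

```shell
# Assign the Storage Blob Data Contributor role to the Snowflake service
# principal at the scope of your storage account.
# <snowflake-app-client-id>, <subscription-id>, <resource-group>, and
# <storage-account> are placeholders for your own values.
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee "<snowflake-app-client-id>" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>"
```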
Create External Stage
1. Use the ACCOUNTADMIN role, or another role with appropriate permissions, to create a stage. In this example I will create a Parquet format stage. Compression is detected automatically, which works fine for my SNAPPY-compressed data. Alternatively, you could set the type to csv, json, avro, orc, or xml to load other file types.
CREATE STAGE nyctaxi_yellow
storage_integration = azure_int
url = 'azure://datakickstartadls.blob.core.windows.net/demo/nyctaxi/tripdata/yellow_parquet'
file_format = (type = parquet);
2. Run the LIST command to confirm the stage was created and to view its files.
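For the stage created above, the LIST command looks like this:

```sql
-- List the files visible through the new stage. Appending a path suffix
-- (e.g. @nyctaxi_yellow/2021/) narrows the listing to a subfolder.
list @nyctaxi_yellow;
```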
Now you have created the Storage Integration with an embedded credential, as well as one example of an external stage. These are the first setup steps for reading data from Azure Storage into Snowflake. Once the stage is configured, the COPY command works as documented, and it is a good start for setting up automatic loads with Snowpipe. Keep an eye out for other posts about the steps that follow.
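As a quick preview of what comes next, a minimal COPY sketch against the stage above might look like the following. The target table name trips_yellow is a hypothetical example, and the table must already exist with a schema compatible with the Parquet files:

```sql
-- Load Parquet files from the stage into a hypothetical table.
-- MATCH_BY_COLUMN_NAME maps Parquet columns to table columns by name
-- rather than by position, which is convenient for Parquet sources.
copy into trips_yellow
  from @nyctaxi_yellow
  match_by_column_name = case_insensitive;
```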