
Part 5 - Unlock the Power of Azure Data Factory: A Guide to Boosting Your Data Ingestion Process


Introduction

This article is Part 5 of a series on Azure Data Factory CI/CD. As a recap, the previous parts covered:

 

 

Part 1

 

  1. Architecture and Scenario
  2. Creating resources in Azure
  3. Create Azure Storage Containers
  4. Create Azure Key Vaults
  5. Create Azure Data Factory: With Key Vault Access

 

Part 2

 

  1. Configure Azure Data Factory Source Control
  2. Construct Azure Data Factory Data Pipeline
  3. Publishing Concept for Azure Data Factory
  4. Configure Deployed Azure Resources.

 

Part 3

 

  1. The YAML Pipeline Structure
  2. The Publish Process
  3. ARM Template Parameterization
  4. ADF ARM Template Deployment

 

Part 4

 

  1. How to use Azure DevOps Pipeline Templates

 

This part builds on that foundation and covers how to build and deploy larger Data Factories, which leverage Azure Resource Manager linked templates for deployment.

 

 

 

When Linked Templates Are Required

 

 

Typically, users will not want to use linked templates; however, a single ARM template has explicit limitations that may force you to deploy via linked templates. The most common case is when the generated ARM template.json file exceeds 4 MB.

 

 

 

Regardless of the Data Factory's size, the Data Factory build process will generate the necessary files and file structure to accommodate both a single-template deployment and a linked ARM template deployment. The linked templates are only required if your main ARM template file is greater than 4 MB, i.e. if you have a larger Data Factory.
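For orientation, the build output typically looks like the sketch below. The file names follow the ADF publish tooling's conventions and the paths used later in this article; the number of split ArmTemplate_N.json files varies with factory size, and the parameters folder layout is an assumption based on the parameter file path referenced in the deployment job:

```text
ADFTemplates/
├── ARMTemplateForFactory.json            # single-file deployment template
├── ARMTemplateParametersForFactory.json
├── PrePostDeploymentScript.ps1           # trigger stop/start script
├── parameters/
│   └── dev.eus.parameters.json           # environment parameter files
└── linkedTemplates/
    ├── ArmTemplate_master.json           # entry point for linked deployments
    ├── ArmTemplateParameters_master.json
    ├── ArmTemplate_0.json                # factory resources, split < 4 MB each
    └── ArmTemplate_1.json
```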

 

 

 

What is a Linked Template

 

 

A linked template deployment is a process by which the main deployment template calls additional ('linked') template files. These files need to be stored in an Azure Storage Account. This works around the 4 MB single-file limitation of an ARM template, since the deployment calls multiple files that are each under 4 MB. To be clear, you are still bound by the other limits of an ARM template. As a reminder, those are:

 

  • 256 parameters
  • 256 variables
  • 800 resources (including copy count)
  • 64 output values
  • 10 unique locations per subscription/tenant/management group scope
  • 24,576 characters in a template expression

 

 

 

As mentioned, all ARM template and parameter files associated with the deployment must be hosted in an Azure Storage Account. The Azure Resource Manager will call the main ARM template.json file directly and then loop through all the associated template files.
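To illustrate, the master template references each linked file through a nested `Microsoft.Resources/deployments` resource. This is a minimal sketch, not the exact generated template: the `containerUri`/`containerSasToken` parameter names match the override parameters used later in this article, while the linked file name and API version are illustrative:

```json
{
  "type": "Microsoft.Resources/deployments",
  "apiVersion": "2021-04-01",
  "name": "ArmTemplate_0",
  "properties": {
    "mode": "Incremental",
    "templateLink": {
      "uri": "[concat(parameters('containerUri'), '/ArmTemplate_0.json', parameters('containerSasToken'))]",
      "contentVersion": "1.0.0.0"
    }
  }
}
```

Because the `uri` is resolved by Azure Resource Manager at deployment time, each linked file must be reachable over HTTPS, which is why the SAS token is appended to every file URL.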

 

 

 

This is an important concept, as it drives some of the requirements for our Azure Data Factory linked ARM template deployments. Since the Azure Resource Manager will be deploying our templates, it needs access to the individual files; unfortunately, this is achieved via a Shared Access Signature (SAS) token. I say unfortunately because Microsoft's Well-Architected Framework specifically calls out limiting the use of SAS tokens. We will do our best to limit their usage and duration.

 

 

 

Outlining the steps

 

 

Since these linked template files are already being generated by our Data Factory build, we do not need to make any changes to our Data Factory build template. The updates will be on the deployment process to accommodate the ability to deploy linked templates. I am going to take a security-focused approach and let the deployment pipeline create the necessary storage account and SAS token, and then destroy the storage account.

 

 

 

The rationale is that the storage account is a temporary resource tied to the specific stage of the pipeline instance; it does not need to live longer than the stage execution. Removing it as part of the stage execution cleans up after itself and eliminates any potential security concern over a lingering storage account. Any concerns or auditing on deployment artifacts should be traced back to the published artifact of the pipeline.

 

 

 

Given the nature of SAS tokens, these are the steps required to deploy the linked Data Factory ARM template:

 

  1. Azure CLI to create the storage account
  2. Azure CLI to create the storage account container
  3. Azure CLI to create a SAS expiration date and write it back to an ADO variable
  4. Azure CLI to generate the SAS token and write it to an ADO secret
  5. Azure CLI to copy the linked ARM templates to the storage account container
  6. Stop Azure Data Factory Triggers – This is a PowerShell script created by the deploy process which will ensure our triggers are not executing during deployment.
  7. Azure Resource Manager (ARM) Template Deployment – The ARM template published as part of the build process will now be deployed to an environment. The location of the linked template files will be required, as well as the SAS token generated above as a way to access them. We will also need the opportunity to include a parameter file, as well as override parameters if needed.
  8. Start Azure Data Factory Triggers – After a successful deployment we will want to start the Azure Data Factory triggers with the same script we used to stop the ADF triggers.
  9. Azure CLI to delete the storage account

 

Now, we could combine all of our CLI tasks into one; however, I prefer to break it up task by task to ensure task isolation.

 

 

 

Jobs

 

 

Since there is overlap with the regular Azure Data Factory deployment, and there are multiple CLI steps, I am going to skip the task-by-task breakdown and provide the job in an expanded format to talk through.

 

 

 

```yaml
jobs:
- deployment: adfdemo_infrastructure_dev_eus
  environment:
    name: dev
  variables:
  - name: azureServiceConnectionName
    value: 'AzureDevServiceConnection'
  - name: azureSubscriptionID
    value: '#######-####-####-####-#############'
  - name: dataFactoryAbrv
    value: 'adf'
  - name: storageAccountAbrv
    value: 'sa'
  - name: deploymentName
    value: adfdemo_infrastructure_dev_eus
  - name: resourceGroupName
    value: rg-adfdemo-dev-eus
  - name: dataFactoryName
    value: adf-adfdemo-dev-eus
  - name: powerShellScriptPath
    value: ../ADFTemplates/PrePostDeploymentScript.ps1
  - name: ARMTemplatePath
    value: ADFTemplates/linkedTemplates/ArmTemplate_master.json
  - name: linkedServiceStorageAccountName
    value: saadf$(Build.BuildId)deveus
  - name: linkedServiceStorageAccountContainerName
    value: 'templates'
  - name: linkedServiceStorageAccountURL
    value: https://saadf$(Build.BuildId)deveus.blob.core.windows.net/templates
  strategy:
    runOnce:
      deploy:
        steps:
        - task: AzureCLI@2
          displayName: Create Storage Account for Linked Templates
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: 'az storage account create --name saadf$(Build.BuildId)deveus --resource-group rg-adfdemo-dev-eus'
        - task: AzureCLI@2
          displayName: Create Container for Linked Templates
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: 'az storage container create --account-name saadf$(Build.BuildId)deveus --name templates'
        - task: AzureCLI@2
          displayName: Get SAS Expiration Date
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: |
              $date = $(Get-Date).AddDays(1)
              $formattedDate = $date.ToString("yyyy-MM-dd")
              echo "##vso[task.setvariable variable=sasExpirationDate;]$formattedDate"
        - task: AzureCLI@2
          displayName: Get SAS Token for Storage Account
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: |
              $token = az storage container generate-sas --account-name saadf$(Build.BuildId)deveus --name templates --permissions r --expiry $(sasExpirationDate) --output tsv
              echo "##vso[task.setvariable variable=sasToken;issecret=true]?$token"
        - task: AzureCLI@2
          displayName: Copy Linked Templates to Azure Storage
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: 'az storage blob upload-batch --account-name saadf$(Build.BuildId)deveus --destination templates --source ../ADFTemplates'
        - task: AzurePowerShell@5
          displayName: Stop ADF Triggers
          inputs:
            scriptType: 'FilePath'
            ConnectedServiceNameARM: AzureDevServiceConnection
            scriptPath: ../ADFTemplates/PrePostDeploymentScript.ps1
            ScriptArguments: -armTemplate "ADFTemplates/linkedTemplates/ArmTemplate_master.json" -ResourceGroupName rg-adfdemo-dev-eus -DataFactoryName adf-adfdemo-dev-eus -predeployment $true -deleteDeployment $false
            errorActionPreference: stop
            FailOnStandardError: False
            azurePowerShellVersion: azurePowerShellVersion
            preferredAzurePowerShellVersion: 3.1.0
            pwsh: False
            workingDirectory: ../
        - task: AzureResourceManagerTemplateDeployment@3
          inputs:
            deploymentScope: Resource Group
            azureResourceManagerConnection: AzureDevServiceConnection
            action: Create Or Update Resource Group
            resourceGroupName: rg-adfdemo-dev-eus
            location: eastus
            csmFileLink: https://saadf$(Build.BuildId)deveus.blob.core.windows.net/templates/linkedTemplates/ArmTemplate_master.json$(sasToken)
            csmParametersFileLink: https://saadf$(Build.BuildId)deveus.blob.core.windows.net/templates/parameters/dev.eus.parameters.json$(sasToken)
            overrideParameters: '-containerUri https://saadf$(Build.BuildId)deveus.blob.core.windows.net/templates/linkedTemplates -containerSasToken $(sasToken)'
            deploymentMode: Incremental
            templateLocation: 'URL of the file'
        - task: AzurePowerShell@5
          displayName: Start ADF Triggers
          inputs:
            scriptType: 'FilePath'
            ConnectedServiceNameARM: AzureDevServiceConnection
            scriptPath: ../ADFTemplates/PrePostDeploymentScript.ps1
            ScriptArguments: -armTemplate "ADFTemplates/linkedTemplates/ArmTemplate_master.json" -ResourceGroupName rg-adfdemo-dev-eus -DataFactoryName adf-adfdemo-dev-eus -predeployment $false -deleteDeployment $true
            errorActionPreference: stop
            FailOnStandardError: False
            azurePowerShellVersion: azurePowerShellVersion
            preferredAzurePowerShellVersion: 3.1.0
            pwsh: False
            workingDirectory: ../
        - task: AzureCLI@2
          displayName: Delete Storage Account
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: 'az storage account delete --name saadf$(Build.BuildId)deveus --resource-group rg-adfdemo-dev-eus --yes'
```

 

 

 

To go into more specifics at the task level, refer to my personal blog on Deploying Linked ARM Templates via YAML Pipelines.

 

Combining Templates

 

 

So that’s great: we have a job for linked templates and a job for single-file ARM deployments. Since these are both Data Factory deployments, we probably don’t want to maintain two versions that overlap. So what if we leverage our templating strategy from Part 4 and have one template determine which job to use?

 

 

 

Well, we can:

 

 

 

```yaml
jobs:
- ${{ if eq(parameters.linkedTemplates, false) }}:
  - template: ../jobs/adf_deploy_env_job.yml
    parameters:
      environmentName: ${{ environmentObject.environmentName }}
      templateFile: ${{ variables.templateFile }}
      templateParametersFile: ${{ parameters.templateParametersFile }}
      serviceName: ${{ parameters.serviceName }}
      regionAbrv: ${{ regionAbrv }}
- ${{ else }}:
  - template: ../jobs/adf_linked_template_deploy_env_job.yml
    parameters:
      environmentName: ${{ environmentObject.environmentName }}
      templateFile: ${{ variables.templateFile }}
      templateParametersFile: ${{ parameters.templateParametersFile }}
      serviceName: ${{ parameters.serviceName }}
      regionAbrv: ${{ regionAbrv }}
```

 

 

 

If we pass in a parameter to our adf_deploy_stage.yml, we can then dynamically decide which job template to load. For more on how to leverage this technique, you can read this blog on leveraging if expressions in your pipelines.
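As a sketch, the stage template would declare a boolean switch and the consuming pipeline would flip it for larger factories. The `linkedTemplates` parameter name comes from the if expression shown earlier; the stage template path and `serviceName` value in the caller are illustrative:

```yaml
# In adf_deploy_stage.yml: declare the switch, defaulting to the
# single-file ARM deployment path.
parameters:
- name: linkedTemplates
  type: boolean
  default: false

# In the consuming pipeline (path and serviceName are hypothetical):
# stages:
# - template: templates/stages/adf_deploy_stage.yml
#   parameters:
#     serviceName: adfdemo
#     linkedTemplates: true
```

Because `${{ if }}` expressions are evaluated at compile time, the unused job template is never even loaded into the expanded pipeline.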

 

 

 

Conclusion

 

 

Azure Data Factory natively provides templates to accommodate linked ARM template deployments; however, it is up to the individual developer to fill in the gaps on how to deploy them. A linked template deployment requires specific accommodations, such as a storage account and a SAS token.

 

This article provided one such approach and associated pipeline and templates to help you achieve this. For the complete source code be sure to visit TheYAMLPipelineOne repository on GitHub.

 
