Part 5 - Unlock the Power of Azure Data Factory: A Guide to Boosting Your Data Ingestion Process

Introduction


Part 1

  1. Architecture and Scenario
  2. Creating resources in Azure
  3. Create Azure Storage Containers
  4. Create Azure Key Vaults
  5. Create Azure Data Factory: With Key Vault Access

Part 2

  1. Configure Azure Data Factory Source Control
  2. Construct Azure Data Factory Data Pipeline
  3. Publishing Concept for Azure Data Factory
  4. Configure Deployed Azure Resources

Part 3

  1. The YAML Pipeline Structure
  2. The Publish Process
  3. ARM Template Parameterization
  4. ADF ARM Template Deployment

Part 4

  1. How to use Azure DevOps Pipeline Templates

This section builds on those and covers how to build and deploy larger Data Factories, which leverage Azure Resource Manager Linked Templates for deployment.



When Linked Templates Are Required


Typically, users will not want to use linked templates; however, a single ARM template has explicit limitations that may require deploying via Linked Templates. One such instance is when the ARM template.json file is larger than 4 MB.



Regardless of the size of the Data Factory, the build process will generate the necessary files and file structure to accommodate both a single-template deployment and a linked ARM template deployment. The linked templates are only required if your main ARM template file is greater than 4 MB, i.e. if you have a larger Data Factory.
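For reference, the published artifact will contain both flavors side by side. Here is a sketch of the structure, assuming the ADFTemplates artifact name used later in this post (exact file counts vary with the size of your factory):

ADFTemplates/
├── ARMTemplateForFactory.json             (single-file template)
├── ARMTemplateParametersForFactory.json
├── PrePostDeploymentScript.ps1            (trigger stop/start script)
└── linkedTemplates/
    ├── ArmTemplate_master.json            (main template that calls the linked files)
    ├── ArmTemplateParameters_master.json
    ├── ArmTemplate_0.json
    └── ArmTemplate_1.json                 (additional ArmTemplate_N.json files as needed)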



What is a Linked Template


A linked template deployment is a process by which the main deployment template calls additional ('linked') template files. These files need to be stored in an Azure Storage Account. This works around the 4 MB single-file limitation of an ARM template, since the deployment calls multiple files that are each under 4 MB. To be clear, you are still bound by the other limitations of an ARM template. As a reminder, those are:

  • 256 parameters
  • 256 variables
  • 800 resources (including copy count)
  • 64 output values
  • 10 unique locations per subscription/tenant/management group scope
  • 24,576 characters in a template expression



As mentioned, all ARM template and parameter files associated with the deployment are required to be hosted in an Azure Storage Account. The Azure Resource Manager will directly call the main ARM template.json file and then loop through all the associated template files.
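To make that mechanism concrete, below is a heavily simplified sketch of what a main template does: it declares each linked file as a Microsoft.Resources/deployments resource whose templateLink URI is built from a container URI plus a SAS token. The containerUri and containerSasToken parameter names mirror the ones the generated ArmTemplate_master.json expects (and which we override later in this post); the rest is illustrative:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "containerUri": { "type": "string" },
    "containerSasToken": { "type": "securestring" }
  },
  "resources": [
    {
      "type": "Microsoft.Resources/deployments",
      "apiVersion": "2021-04-01",
      "name": "ArmTemplate_0",
      "properties": {
        "mode": "Incremental",
        "templateLink": {
          "uri": "[concat(parameters('containerUri'), '/ArmTemplate_0.json', parameters('containerSasToken'))]",
          "contentVersion": "1.0.0.0"
        }
      }
    }
  ]
}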



This is an important concept as it will drive some of the requirements for our Azure Data Factory Linked ARM Template deployments. Since the Azure Resource Manager will be deploying our templates, it will need access to the individual files; unfortunately, this is achieved via a Shared Access Signature (SAS) token. I say unfortunately because Microsoft's Well-Architected Framework specifically calls out limiting the use of SAS tokens. We will do our best to limit their usage and duration.



Outlining the steps


Since these linked template files are already being generated by our Data Factory build, we do not need to make any changes to our Data Factory build template. The updates will be on the deployment process to accommodate the ability to deploy linked templates. I am going to take a security-focused approach and let the deployment pipeline create the necessary storage account and SAS token, and then destroy the storage account.



The rationale behind this is that the storage account is a temporary resource associated with the specific stage of the pipeline instance. It does not need to live longer than the stage execution. Removing it as part of the stage execution cleans up after ourselves and eliminates any potential security concern over the storage account. Any concerns or auditing on deployment artifacts should be traced back to the published artifact of the pipeline.



Based on the nature of SAS tokens these will be the steps required to deploy the linked Data Factory ARM template:

  1. Azure CLI to create the storage account
  2. Azure CLI to create the storage account container
  3. Azure CLI to create a SAS expiration date and write it back to an ADO variable
  4. Azure CLI to generate the SAS token and write it to an ADO secret
  5. Azure CLI to copy the linked ARM templates to the storage account container
  6. Stop Azure Data Factory Triggers – This is a PowerShell script created by the deploy process which will ensure our triggers are not executing during deployment.
  7. Azure Resource Manager (ARM) Template Deployment – The ARM template published as part of the build process will now be deployed to an environment. The location of the linked template files will be required, as well as the SAS token generated above as a way to access them. We will need to provide the opportunity to include a parameter file as well as override parameters if needed.
  8. Start Azure Data Factory Triggers – After a successful deployment we will want to start the Azure Data Factory triggers with the same script we used to stop the ADF triggers.
  9. Azure CLI to delete the storage account

Now we could combine all of our CLI tasks into one; however, I prefer to break it up task by task to ensure task isolation.



Jobs


Since there is overlap with the regular Azure Data Factory deployment across multiple CLI steps, I am going to skip the task-by-task breakdown and provide the job in an expanded format to talk through.



jobs:
- deployment: adfdemo_infrastructure_dev_eus
  environment:
    name: dev
  variables:
  - name: azureServiceConnectionName
    value: 'AzureDevServiceConnection'
  - name: azureSubscriptionID
    value: '#######-####-####-####-#############'
  - name: dataFactoryAbrv
    value: 'adf'
  - name: storageAccountAbrv
    value: 'sa'
  - name: deploymentName
    value: adfdemo_infrastructure_dev_eus
  - name: resourceGroupName
    value: rg-adfdemo-dev-eus
  - name: dataFactoryName
    value: adf-adfdemo-dev-eus
  - name: powerShellScriptPath
    value: ../ADFTemplates/PrePostDeploymentScript.ps1
  - name: ARMTemplatePath
    value: ADFTemplates/linkedTemplates/ArmTemplate_master.json
  - name: linkedServiceStorageAccountName
    value: saadf$(Build.BuildId)deveus
  - name: linkedServiceStorageAccountContainerName
    value: 'templates'
  - name: linkedServiceStorageAccountURL
    value: https://saadf$(Build.BuildId)deveus.blob.core.windows.net/templates
  strategy:
    runOnce:
      deploy:
        steps:
        # 1. Create the temporary storage account that will host the linked templates
        - task: AzureCLI@2
          displayName: Create Storage Account for Linked Templates
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: 'az storage account create --name saadf$(Build.BuildId)deveus --resource-group rg-adfdemo-dev-eus'
        # 2. Create the container the templates will be uploaded to
        - task: AzureCLI@2
          displayName: Create Container for Linked Templates
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: 'az storage container create --account-name saadf$(Build.BuildId)deveus --name templates'
        # 3. Calculate a short-lived (one day) SAS expiration date and write it to a pipeline variable
        - task: AzureCLI@2
          displayName: Get SAS Expiration Date
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: |
              $date = $(Get-Date).AddDays(1)
              $formattedDate = $date.ToString("yyyy-MM-dd")
              echo "##vso[task.setvariable variable=sasExpirationDate;]$formattedDate"
        # 4. Generate a read-only SAS token and write it to a secret pipeline variable
        #    (the leading '?' lets the token be appended directly to URLs)
        - task: AzureCLI@2
          displayName: Get SAS Token for Storage Account
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: |
              $token = az storage container generate-sas --account-name saadf$(Build.BuildId)deveus --name templates --permissions r --expiry $(sasExpirationDate) --output tsv
              echo "##vso[task.setvariable variable=sasToken;issecret=true]?$token"
        # 5. Upload the linked templates to the storage account container
        - task: AzureCLI@2
          displayName: Copy Linked Templates to Azure Storage
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: 'az storage blob upload-batch --account-name saadf$(Build.BuildId)deveus --destination templates --source ../ADFTemplates'
        # 6. Stop the Data Factory triggers so they do not execute mid-deployment
        - task: AzurePowerShell@5
          displayName: Stop ADF Triggers
          inputs:
            scriptType: 'FilePath'
            ConnectedServiceNameARM: AzureDevServiceConnection
            scriptPath: ../ADFTemplates/PrePostDeploymentScript.ps1
            ScriptArguments: -armTemplate "ADFTemplates/linkedTemplates/ArmTemplate_master.json" -ResourceGroupName rg-adfdemo-dev-eus -DataFactoryName adf-adfdemo-dev-eus -predeployment $true -deleteDeployment $false
            errorActionPreference: stop
            FailOnStandardError: False
            azurePowerShellVersion: 'OtherVersion'
            preferredAzurePowerShellVersion: 3.1.0
            pwsh: False
            workingDirectory: ../
        # 7. Deploy the master template, pointing it at the linked templates via the SAS token
        - task: AzureResourceManagerTemplateDeployment@3
          displayName: Deploy Linked ARM Template
          inputs:
            deploymentScope: Resource Group
            azureResourceManagerConnection: AzureDevServiceConnection
            action: Create Or Update Resource Group
            resourceGroupName: rg-adfdemo-dev-eus
            location: eastus
            csmFileLink: https://saadf$(Build.BuildId)deveus.blob.core.windows.net/templates/linkedTemplates/ArmTemplate_master.json$(sasToken)
            csmParametersFileLink: https://saadf$(Build.BuildId)deveus.blob.core.windows.net/templates/parameters/dev.eus.parameters.json$(sasToken)
            overrideParameters: '-containerUri https://saadf$(Build.BuildId)deveus.blob.core.windows.net/templates/linkedTemplates -containerSasToken $(sasToken)'
            deploymentMode: Incremental
            templateLocation: 'URL of the file'
        # 8. Restart the Data Factory triggers after a successful deployment
        - task: AzurePowerShell@5
          displayName: Start ADF Triggers
          inputs:
            scriptType: 'FilePath'
            ConnectedServiceNameARM: AzureDevServiceConnection
            scriptPath: ../ADFTemplates/PrePostDeploymentScript.ps1
            ScriptArguments: -armTemplate "ADFTemplates/linkedTemplates/ArmTemplate_master.json" -ResourceGroupName rg-adfdemo-dev-eus -DataFactoryName adf-adfdemo-dev-eus -predeployment $false -deleteDeployment $true
            errorActionPreference: stop
            FailOnStandardError: False
            azurePowerShellVersion: 'OtherVersion'
            preferredAzurePowerShellVersion: 3.1.0
            pwsh: False
            workingDirectory: ../
        # 9. Tear down the temporary storage account now that deployment is complete
        - task: AzureCLI@2
          displayName: Delete Storage Account
          inputs:
            azureSubscription: AzureDevServiceConnection
            scriptType: 'pscore'
            scriptLocation: 'inlineScript'
            inlineScript: 'az storage account delete --name saadf$(Build.BuildId)deveus --resource-group rg-adfdemo-dev-eus --yes'



To go into more specifics at the task level, refer to my personal blog post on Deploying Linked ARM Templates via YAML Pipelines.

Combining Templates


So that’s great: we have a job for Linked Templates and a job for single-file ARM deployments. Since these are both Data Factory deployments, we probably don’t want to maintain two versions that overlap, so what if we leverage our templating strategy from Part 4 and have one template determine which job to use?



Well we can:



jobs:
- ${{ if eq(parameters.linkedTemplates, false) }}:
  - template: ../jobs/adf_deploy_env_job.yml
    parameters:
      environmentName: ${{ environmentObject.environmentName }}
      templateFile: ${{ variables.templateFile }}
      templateParametersFile: ${{ parameters.templateParametersFile }}
      serviceName: ${{ parameters.serviceName }}
      regionAbrv: ${{ regionAbrv }}
- ${{ else }}:
  - template: ../jobs/adf_linked_template_deploy_env_job.yml
    parameters:
      environmentName: ${{ environmentObject.environmentName }}
      templateFile: ${{ variables.templateFile }}
      templateParametersFile: ${{ parameters.templateParametersFile }}
      serviceName: ${{ parameters.serviceName }}
      regionAbrv: ${{ regionAbrv }}



If we pass a parameter into our adf_deploy_stage.yml, we can then dynamically decide which job template to load. For more on this technique, you can read this blog post on leveraging if expressions in your pipelines.
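For illustration, here is a minimal sketch of how that parameter might be declared and passed in. The linkedTemplates parameter name matches the condition above; the file path and the consuming pipeline are assumptions:

# adf_deploy_stage.yml: declare the toggle with a safe default
parameters:
- name: linkedTemplates
  type: boolean
  default: false

# consuming pipeline (illustrative): opt in to linked templates for a larger Data Factory
stages:
- template: stages/adf_deploy_stage.yml
  parameters:
    linkedTemplates: true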



Conclusion


Azure Data Factory natively provides templates to accommodate linked ARM template deployments; however, it is up to the individual developer to fill in the gaps on how to deploy them. A linked template deployment requires specific accommodations, such as a storage account and a SAS token.

This article provided one such approach and associated pipeline and templates to help you achieve this. For the complete source code be sure to visit TheYAMLPipelineOne repository on GitHub.
