Azure Data Factory CI/CD with GitHub Actions


Guest OlgaMolocenco

Azure Data Factory allows connecting to a Git repository for source control, partial saves, better collaboration among data engineers and better CI/CD. As of this writing, Azure Repos and GitHub are supported. To enable automated CI/CD, we can use Azure Pipelines or GitHub Actions.

 

 

In this blog post, we will implement CI/CD with GitHub Actions. This will be done using workflows. A workflow is defined by a YAML (.yml) file that contains the various steps and parameters that make up the workflow.

 

 

The workflow will leverage the automated publishing capability of ADF, as well as the Azure Data Factory Deploy Action from the GitHub Marketplace, which uses the pre- and post-deployment script under the hood.

 

 

We will perform the following steps:

- Generate deployment credentials
- Configure the GitHub secrets
- Create the workflow
- Monitor the workflow execution

 

 

 

Requirements:

 

- Azure Subscription. If you don't have one, create a free Azure account before you begin.

 

- Azure Data Factory instance. If you don't have an existing Data Factory, follow this tutorial to create one.

 

- GitHub repository integration set up. If you don't yet have a GitHub repository connected to your development Data Factory, follow the steps here to set it up.

 

 

Generate deployment credentials

 

 

You will need credentials that will authenticate and authorize GitHub Actions to deploy your ARM template to the target Data Factory. We will leverage workload identity federation. Using workload identity federation allows you to access Azure Active Directory (Azure AD) protected resources without needing to manage secrets.

 

We will create an Azure Active Directory application and service principal with Azure CLI. If you prefer to do it in Azure Portal or with Azure PowerShell, see the instructions here.

 

1. Create the Azure Active Directory application.

az ad app create --display-name myApp

This command outputs JSON with an appId, which is your client-id. The objectId in the output (shown as id in recent Azure CLI versions) is the APPLICATION-OBJECT-ID, which is used later to create federated credentials with Graph API calls.

 

 

2. Create a service principal. Replace $appId with the appId from your JSON output. This command generates JSON output with a different objectId, which will be used in the next step. The new objectId is the assignee-object-id.

az ad sp create --id $appId

 

 

 

 

 

 

3. Create a new role assignment by subscription and object. By default, the role assignment is tied to your default subscription. Replace $subscriptionId with your subscription ID, $resourceGroupName with your resource group name, and $assigneeObjectId with the generated assignee-object-id (the object ID of the newly created service principal).

 

az role assignment create --role contributor --subscription $subscriptionId --assignee-object-id $assigneeObjectId --assignee-principal-type ServicePrincipal --scope /subscriptions/$subscriptionId/resourceGroups/$resourceGroupName

 

 

4. Copy the values for clientId, subscriptionId, and tenantId to use later in your GitHub Actions workflow.

 

 

5. Next, we will add federated credentials. You can add federated credentials in the Azure portal or with the Microsoft Graph REST API. Follow the steps here.
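Steps 1, 2 and 5 can also be scripted so that the IDs are captured in variables instead of being copied from the JSON output. The following is a minimal sketch, not the only way to do it. It assumes a recent Azure CLI in which the object ID field is named id and the az ad app federated-credential command is available. The function names, the credential name and the owner/repo value are hypothetical, and the subject shown covers runs triggered from a single branch only.

```shell
# Sketch only: assumes you are already signed in with `az login`.

# Hypothetical helper: create the application and service principal and keep
# the IDs in shell variables. $1 = display name of the application.
create_identity() {
  APP_ID=$(az ad app create --display-name "$1" --query appId --output tsv)
  # Assumption: recent CLI versions expose the object ID as "id".
  APP_OBJECT_ID=$(az ad app show --id "$APP_ID" --query id --output tsv)
  SP_OBJECT_ID=$(az ad sp create --id "$APP_ID" --query id --output tsv)
}

# Hypothetical helper: write a federated credential definition for one branch.
# $1 = GitHub "owner/repo", $2 = branch name, $3 = output file.
write_federated_credential() {
  cat > "$3" <<EOF
{
  "name": "github-actions-$2",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:$1:ref:refs/heads/$2",
  "description": "GitHub Actions deployments from branch $2",
  "audiences": ["api://AzureADTokenExchange"]
}
EOF
}

# Hypothetical helper: register the credential on the application.
# $1 = path of the JSON file written above.
add_federated_credential() {
  az ad app federated-credential create --id "$APP_OBJECT_ID" --parameters "$1"
}

# Example usage (placeholder values):
#   create_identity myApp
#   write_federated_credential my-org/my-adf-repo main credential.json
#   add_federated_credential credential.json
```

After create_identity runs, $APP_ID is the client-id and $SP_OBJECT_ID is the assignee-object-id used in the role assignment in step 3.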

 

Configure the GitHub secrets

 

 

You need to provide your application's Client ID, Tenant ID and Subscription ID to the login action. These values can either be provided directly in the workflow or can be stored in GitHub secrets and referenced in your workflow. Saving the values as GitHub secrets is the more secure option.

 

 

1. Open your GitHub repository and go to Settings.

 

[Screenshot: the Settings tab of the GitHub repository]

 

 

 

2. Select Security -> Secrets and variables -> Actions.

 

[Screenshot: Secrets and variables -> Actions under Security]

 

3. Create secrets for AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_SUBSCRIPTION_ID. Use these values from your Azure Active Directory application for your GitHub secrets:

 


GitHub Secret            Azure Active Directory Application
AZURE_CLIENT_ID          Application (client) ID
AZURE_TENANT_ID          Directory (tenant) ID
AZURE_SUBSCRIPTION_ID    Subscription ID

 

 

4. Save each secret by selecting Add secret.
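If you prefer the command line to the web UI, the same three secrets can be created with the GitHub CLI. This is a sketch under the assumption that gh is installed and authenticated against the repository. The helper name and the owner/repo value are hypothetical.

```shell
# Hypothetical helper: store the three values as repository secrets.
# $1 = client ID, $2 = tenant ID, $3 = subscription ID, $4 = "owner/repo"
set_azure_secrets() {
  gh secret set AZURE_CLIENT_ID --body "$1" --repo "$4" &&
    gh secret set AZURE_TENANT_ID --body "$2" --repo "$4" &&
    gh secret set AZURE_SUBSCRIPTION_ID --body "$3" --repo "$4"
}

# Example usage (placeholder values; the az calls read the tenant and
# subscription of the account you are currently signed in to):
#   tenant_id=$(az account show --query tenantId --output tsv)
#   subscription_id=$(az account show --query id --output tsv)
#   set_azure_secrets "<client-id>" "$tenant_id" "$subscription_id" my-org/my-adf-repo
```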

 

 

Create the workflow that deploys the ADF ARM template

 

 

At this point, you should already have a Data Factory instance with Git integration set up. If this is not the case, please follow the links in the Requirements section.

 

The workflow is composed of two jobs:

 

  • A build job, which uses the npm package @microsoft/azure-data-factory-utilities to (1) validate all the Data Factory resources in the repository, returning the same validation errors as "Validate All" in ADF Studio, and (2) export the ARM template that will later be used to deploy to the QA or Staging environment.
  • A release job which takes the exported ARM template artifact and deploys it to the higher environment ADF instance.
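The validate and export commands behind the build job can also be tried locally before they are wired into a workflow. The sketch below assumes Node.js and npm are installed and that the build folder with package.json from step 1 below already exists. The helper names are hypothetical, and the root folder must be passed as an absolute path because the commands run from inside the build folder.

```shell
# Hypothetical helper: build the resource ID of the development factory.
# $1 = subscription ID, $2 = resource group name, $3 = factory name
factory_resource_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.DataFactory/factories/%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: validate the factory resources, then export the ARM template.
# $1 = absolute path of the ADF root folder (it contains build/package.json)
# $2 = factory resource ID
# $3 = name of the output folder for the exported template
validate_and_export() {
  (
    cd "$1/build" &&
      npm install &&
      npm run build validate "$1" "$2" &&
      npm run build export "$1" "$2" "$3"
  )
}

# Example usage (placeholder values):
#   id=$(factory_resource_id "<subscription-id>" "<resource-group>" "<dev-factory-name>")
#   validate_and_export "$PWD/ADFroot" "$id" ExportedArmTemplate
```

The exported template is written to the output folder inside build, which is the folder the build job later uploads as an artifact.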

 

1. Navigate to the repository connected to your ADF. Under your root folder (ADFroot in the example below), create a build folder and store the following package.json file in it:

{
  "scripts": {
    "build": "node node_modules/@microsoft/azure-data-factory-utilities/lib/index"
  },
  "dependencies": {
    "@microsoft/azure-data-factory-utilities": "^1.0.0"
  }
}

Here is how this should look:

 

[Screenshot: the build folder and package.json in the repository]

 

 

 

And here is the Git repository setup from the ADF Studio for reference:

 

[Screenshot: Git repository settings in ADF Studio]

 

2. Navigate to the Actions tab -> New workflow

 

[Screenshot: New workflow on the Actions tab]

 

 

 

3. Paste the workflow YAML attached to this blog.

 

 

4. Let’s walk together through the parameters you need to supply. These are highlighted, and comments describe what each one expects. For the build job:

 

[Screenshot: build job parameters in the workflow YAML]

 

 

 

Tip: Use the same artifact name in the Export, Upload and Download actions.

 

More details about the validate and export commands can be found here.

 

For the release job:

 

[Screenshot: release job parameters in the workflow YAML]

 

 

 

For more details about the Azure Data Factory Deploy Action, please check the GitHub Marketplace listing.
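Since the attached workflow and the screenshots are not reproduced here, the following sketch shows what a two-job workflow of this shape could look like, written to disk by a small shell helper. It is an illustration under assumptions, not the exact file attached to the blog. The helper name, the workflow file name, the action versions, the Node.js version and the push trigger on main are all assumptions, and every value in angle brackets is a placeholder you must replace. With a push trigger, the federated credential created earlier has to cover the main branch.

```shell
# Hypothetical helper: write a two-job ADF workflow. $1 = path of the file to create.
write_adf_workflow() {
  mkdir -p "$(dirname "$1")"
  # The quoted 'EOF' keeps ${{ ... }} and $FACTORY_ID literal in the output.
  cat > "$1" <<'EOF'
name: ADF CI/CD
on:
  push:
    branches: [main]
permissions:
  id-token: write   # required so azure/login can request an OIDC token
  contents: read
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      FACTORY_ID: /subscriptions/<subscription-id>/resourceGroups/<dev-resource-group>/providers/Microsoft.DataFactory/factories/<dev-factory-name>
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 18.x   # assumption: use a version the utilities package supports
      - name: Install ADF utilities
        run: npm install
        working-directory: ADFroot/build
      - name: Validate
        run: npm run build validate "${{ github.workspace }}/ADFroot" "$FACTORY_ID"
        working-directory: ADFroot/build
      - name: Export ARM template
        run: npm run build export "${{ github.workspace }}/ADFroot" "$FACTORY_ID" "ExportedArmTemplate"
        working-directory: ADFroot/build
      - uses: actions/upload-artifact@v4
        with:
          name: ExportedArmTemplate
          path: ADFroot/build/ExportedArmTemplate
  release:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
        with:
          name: ExportedArmTemplate
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          enable-AzPSSession: true
      - uses: Azure/data-factory-deploy-action@v1.2.0
        with:
          resourceGroupName: <target-resource-group>
          dataFactoryName: <target-factory-name>
          armTemplateFile: ARMTemplateForFactory.json
          armTemplateParametersFile: ARMTemplateParametersForFactory.json
EOF
}

# Example usage: write_adf_workflow .github/workflows/adf-cicd.yml
```

The artifact name ExportedArmTemplate is the same in the export, upload and download steps, in line with the tip above.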

 

 

Monitor the workflow execution

 

 

Now, let’s test the setup by making some changes in the development ADF instance. Create a feature branch where you make the changes, and then make a pull request to main. This should trigger the workflow to execute.

 

1. To check it, browse to the repository -> Actions and identify your workflow.

[Screenshot: workflow runs on the Actions tab]

 

2. You can further drill down into each run, see the jobs composing it and their statuses and duration, as well as the Artifact created by the run. In our scenario, this is the ARM template created in the build job.

[Screenshot: run summary with jobs, statuses, durations and the artifact]

 

3. You can further drill down by navigating to a job and its steps.

[Screenshot: steps of a job]
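The same checks can be made from a terminal with the GitHub CLI. The helpers below are a sketch that assumes gh is installed and authenticated and that it is run from inside a clone of the repository. The function names, the workflow file name and the artifact name in the usage example are hypothetical.

```shell
# Hypothetical helper: list the most recent runs of one workflow.
# $1 = workflow name or workflow file name
list_runs() {
  gh run list --workflow "$1" --limit 5
}

# Hypothetical helper: show the jobs of a run with their status.
# $1 = run ID taken from the list above
show_run() {
  gh run view "$1"
}

# Hypothetical helper: download the ARM template artifact produced by a run.
# $1 = run ID, $2 = artifact name, $3 = target directory
download_arm_template() {
  gh run download "$1" --name "$2" --dir "$3"
}

# Example usage (placeholder values):
#   list_runs adf-cicd.yml
#   show_run "<run-id>"
#   download_arm_template "<run-id>" ExportedArmTemplate ./arm
```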

 

 

 

Stay tuned for more tutorials.

 
