Submit Apache Spark and Apache Flink jobs with Azure Logic Apps on HDInsight on AKS

Author(s): Arun Sethia is a Program Manager in the Azure HDInsight Customer Success Engineering (CSE) team.

Co-Author: Sairam is a Product Manager for Azure HDInsight on AKS.




Introduction


Azure Logic Apps allows you to create and run automated workflows with little to no code. Each workflow starts with a single trigger, after which you add one or more actions. A trigger specifies the condition for running any further steps in the workflow, for example when a blob is added or updated, when an HTTP request is received, or when new data appears in a SQL database table. An action specifies a task to perform. Workflows can be stateful or stateless, depending on your Azure Logic Apps plan (Standard or Consumption).



Using workflows, you can orchestrate complex processes with multiple processing steps, triggers, and interdependencies. These steps can include Apache Spark and Apache Flink jobs as well as integration with other Azure services.

This blog focuses on how you can add an action to a workflow that triggers an Apache Spark or Apache Flink job on HDInsight on AKS.



Azure Logic App - Orchestrate Apache Spark Job on HDInsight on AKS


In our previous blog, we discussed different options for submitting Apache Spark jobs to an HDInsight on AKS cluster. The Azure Logic Apps workflow uses the Livy Batch Job API to submit the Apache Spark job.

The following diagram shows the interaction between Azure Logic Apps, an Apache Spark cluster on HDInsight on AKS, Azure Active Directory, and Azure Key Vault. You can use the same approach with other cluster shapes, such as Apache Flink or Trino, through the Azure management endpoints.

[Diagram: Azure Logic Apps workflow interacting with an Apache Spark cluster on HDInsight on AKS, Azure Active Directory, and Azure Key Vault]

HDInsight on AKS allows you to access the Apache Spark Livy REST APIs using an OAuth token. This requires a Microsoft Entra service principal that has been granted access to the HDInsight on AKS cluster (RBAC support is coming soon). The client ID (appId) and secret (password) of this principal can be stored in Azure Key Vault (you can use various design patterns to rotate secrets).
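
As a minimal sketch, the service principal can be created and its credentials stored in Key Vault with the Azure CLI. The vault and secret names below are placeholders, not values from this walkthrough:

# Create a Microsoft Entra service principal.
az ad sp create-for-rbac -n <sp-name>

# Store the returned appId and password so the workflow can read them at run time.
az keyvault secret set --vault-name <kv-name> --name spark-sp-client-id --value <appId>
az keyvault secret set --vault-name <kv-name> --name spark-sp-secret --value <password>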



Based on your business scenario, you can start (trigger) your workflow; in this example we use the “When an HTTP request is received” trigger. The workflow connects to Key Vault using a system-assigned managed identity (or a user-assigned managed identity) to retrieve the client ID and secret of the service principal created to access the HDInsight on AKS cluster. The workflow then retrieves an OAuth token using the client credentials flow (client ID, secret, and scope https://hilo.azurehdinsight.net/.default).
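
For reference, the token request is a standard Microsoft Entra client credentials call. The following curl sketch shows the shape of the request; <tenant-id>, <client-id>, and <client-secret> are placeholders for the values retrieved from Key Vault:

# Request an OAuth token for the HDInsight on AKS data plane.
curl -s -X POST "https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "grant_type=client_credentials" \
  -d "client_id=<client-id>" \
  -d "client_secret=<client-secret>" \
  -d "scope=https://hilo.azurehdinsight.net/.default"
# The JSON response includes an access_token field, used below as the Bearer token.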



The invocation of the Apache Spark Livy REST API on HDInsight on AKS is done with the Bearer token and a Livy Batch (POST /batches) payload.
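
The equivalent HTTP call looks roughly like the following. The Livy base URL, JAR path, and class name are placeholders; see the sample payload on GitHub for the exact fields used by the workflow:

# Submit a Spark batch job through the Livy Batch API.
curl -s -X POST "<livy-base-url>/batches" \
  -H "Authorization: Bearer <access-token>" \
  -H "Content-Type: application/json" \
  -d '{
        "file": "abfs://<container>@<storage-account>.dfs.core.windows.net/jars/spark-job.jar",
        "className": "com.example.SparkApp",
        "args": []
      }'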

The final workflow is as follows; the source code and sample payload are available on GitHub.

[Screenshot: the final Azure Logic Apps workflow for submitting an Apache Spark job]

Azure Logic App - Orchestrate Apache Flink Job on HDInsight on AKS


HDInsight on AKS provides user-friendly ARM REST APIs to submit and manage Apache Flink jobs. Using these REST APIs, you can submit Apache Flink jobs from any Azure service: for example, you can orchestrate a data pipeline with Azure Data Factory Managed Airflow or, as shown here, use an Azure Logic Apps workflow to manage a complex business workflow.



The following diagram shows the interaction between Azure Logic Apps, an Apache Flink cluster on HDInsight on AKS, Azure Active Directory, and Azure Key Vault.

[Diagram: Azure Logic Apps workflow interacting with an Apache Flink cluster on HDInsight on AKS, Azure Active Directory, and Azure Key Vault]

To invoke the ARM REST APIs, we need a Microsoft Entra service principal with the Contributor role on the specific Apache Flink cluster on HDInsight on AKS. (The resource ID can be retrieved from the portal: go to the cluster page, click JSON View; the value of “id” is the resource ID.)
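
If you prefer the CLI, a sketch like the following can look up the resource ID; the Microsoft.HDInsight/clusterPools/clusters resource type is an assumption based on the HDInsight on AKS resource provider:

# Look up the Flink cluster's resource ID by name.
az resource list --resource-type Microsoft.HDInsight/clusterPools/clusters \
  --query "[?name=='<flink-cluster-name>'].id" -o tsv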



az ad sp create-for-rbac -n <sp name> --role Contributor --scopes <Flink Cluster Resource ID>



The client ID (appId) and secret (password) of this principal can be stored in Azure Key Vault (you can use various design patterns to rotate secrets).



The workflow connects to Key Vault using a system-assigned managed identity (or a user-assigned managed identity) to retrieve the client ID and secret of the service principal created to access the HDInsight on AKS cluster. The workflow then retrieves an OAuth token using the client credentials flow (client ID, secret, and scope https://management.azure.com/.default) and calls the cluster's ARM REST API with the Bearer token to submit the Flink job.
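
Putting the two steps together, the ARM call looks roughly like this. The runJob action, api-version, and payload fields below are assumptions based on the HDInsight on AKS Flink job REST API; check the current documentation and the sample payload on GitHub for the exact schema:

# Acquire an ARM-scoped token (requires jq to parse the JSON response).
TOKEN=$(curl -s -X POST "https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token" \
  -d "grant_type=client_credentials" \
  -d "client_id=<client-id>" \
  -d "client_secret=<client-secret>" \
  -d "scope=https://management.azure.com/.default" | jq -r .access_token)

# Submit the Flink job through the cluster's ARM REST API.
curl -s -X POST \
  "https://management.azure.com<flink-cluster-resource-id>/runJob?api-version=<api-version>" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "properties": {
          "jobType": "FlinkJob",
          "jobName": "demo-job",
          "action": "NEW",
          "jobJarDirectory": "abfs://<container>@<storage-account>.dfs.core.windows.net/jars",
          "jarName": "flink-job.jar",
          "entryClass": "com.example.FlinkApp"
        }
      }'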



The final workflow is as follows; the source code and sample payload are available on GitHub.

[Screenshot: the final Azure Logic Apps workflow for submitting an Apache Flink job]

Summary


The HDInsight on AKS REST APIs let you automate, orchestrate, schedule, and monitor workflows with your choice of framework. Such automation reduces complexity, shortens development cycles, and completes tasks with fewer errors.



Choose what works best for your organization, and let us know your feedback, or tell us about other Azure service integrations you would like to use to automate and orchestrate your workloads on HDInsight on AKS.

References


We are super excited to get you started:

  • Join our community, share an idea or share your success story - Sign Up | LinkedIn
  • Have a question on how to migrate or want to discuss a use case - Microsoft Forms
