Introduction

Azure Data Factory (ADF) is well suited to data transformation. In this blog we will discuss how to convert a CSV file into JSON and explain the aggregate activity.

 

 

Main Idea

In ADF, JSON is a complex data type. We want to build an array of JSON objects.

The idea is to create a data flow, add a key "children" to the data, and use the aggregate activity to combine the JSON objects into a single array.

We will add a dummy column with a constant value of 1 and group by it, so that every row falls into the same group and is collected into one array.

 

 

Pre-requisites

 

 

We will require:

 

  • Basic knowledge of ADF, including how to create a new pipeline and add activities and data flows to it.
  • Knowing how to save data to blob storage.

 

 

Prepare your data:

Input CSV file:

 

 

[Image: input CSV file]

Expected Output:

 

{"children": [

{"key1":"a1", "key2":"b1", "key3":"c1", "key4":"d1"},

{"key1":"a2", "key2":"b2", "key3":"c2", "key4":"d2"},

...

]}
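
For readers who want to check the target shape outside ADF, here is a minimal Python sketch of the same CSV-to-JSON conversion. It is not part of the ADF solution. The sample data and the header names (col1 to col4) are assumptions, since the input file is only shown as an image; only the output keys key1 to key4 come from the expected output above.

```python
import csv
import io
import json

# Hypothetical sample input; the header names are assumptions.
SAMPLE_CSV = """col1,col2,col3,col4
a1,b1,c1,d1
a2,b2,c2,d2
"""


def csv_to_children(csv_text: str) -> dict:
    """Wrap every CSV row in an object and collect the objects under "children"."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    children = [
        # Positional mapping: 1st column -> key1, 2nd column -> key2, ...
        {f"key{i}": value for i, value in enumerate(row, start=1)}
        for row in reader
        if row  # ignore blank lines
    ]
    return {"children": children}


result = csv_to_children(SAMPLE_CSV)
print(json.dumps(result, indent=2))
```

The result is a single object whose "children" value is an array with one entry per CSV row, which is the shape the data flow below is meant to produce.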

 

Services

 

We will need:

 

 

 

ADF DataFlow:

 

[Image: the data flow in ADF]

 

 

 

The settings for the activities in the dataflow:

 

  • Source:
    Blob storage account. Load the CSV data and select the first row as a header.
  • Map Drifted Columns:
    This maps the drifted columns to named columns, which lets us perform actions on them in the following activities.
  • Derived Columns:
    Here we add the dummy column with a constant value of 1, and a children column that will later hold the array of JSON objects.
    To build the children column, go to Expressions -> Expression Builder -> click on children -> add 4 subcolumns named key1, key2, key3, key4.
    [Image: children column with its subcolumns in the expression builder]
    Click on each key and pass the corresponding input column as the expression for that key (see the snip below).
     
    [Image: input column passed as the expression for a key]
    Click on Save and finish.
     
     
     
  • Aggregate By Dummy:
    In this activity we group the data by the dummy column that we added and collect all values under children. Because every row has the same dummy value, this builds one array of JSON objects instead of a JSON object of JSON objects.
    Click on the activity -> Group by: dummy -> Aggregates -> children -> collect(children)
    [Image: aggregate settings with collect(children)]
  • Drop Dummy Column:
    Select only the children array.
  • Sink:
    Blob storage account. The resulting JSON is written to the sink.
     
    Output:
     
    [Image: output JSON file]
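
To make the role of the dummy column easier to see, the steps above can be sketched in plain Python. This only mimics the logic of the Derived Columns, Aggregate By Dummy and Drop Dummy Column steps; it is not how ADF executes the data flow. The input rows and the column names col1 to col4 are assumptions.

```python
import json

# Hypothetical rows as they might look after the Source step.
rows = [
    {"col1": "a1", "col2": "b1", "col3": "c1", "col4": "d1"},
    {"col1": "a2", "col2": "b2", "col3": "c2", "col4": "d2"},
]

# Derived Columns: add a constant dummy column and a nested children object.
derived = [
    {
        "dummy": 1,
        "children": {
            "key1": r["col1"],
            "key2": r["col2"],
            "key3": r["col3"],
            "key4": r["col4"],
        },
    }
    for r in rows
]

# Aggregate By Dummy: group by dummy and collect the children objects.
# Every row has dummy == 1, so there is exactly one group.
groups = {}
for r in derived:
    groups.setdefault(r["dummy"], []).append(r["children"])
aggregated = [{"dummy": key, "children": items} for key, items in groups.items()]

# Drop Dummy Column: keep only the children array.
output = [{"children": g["children"]} for g in aggregated]

print(json.dumps(output[0], indent=2))
```

Grouping by a constant is what turns many rows into a single row: with one group, the collect step gathers every children object into the same array.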
     
     

 
