Posted September 30, 2022

The Database Templates feature in Azure Synapse Analytics helps organizations create data models based on existing, industry-focused templates. After creating the Lake Database (using a chosen database template), the Map Data tool can be used to map attributes between source and destination. This creates the data flows and pipelines needed to complete the data ingestion process.

When data must be ingested from multiple sources into a single destination entity, those sources need to be joined while performing the mapping. The "Additional Source" button in the Map Data tool helps achieve these joins. The documentation does not provide details or examples on how to use the UI to create these joins, so this blog covers that process.

Consider the below sources and destination:

Source CSV files:

Sales.csv (the highlighted column will be used for ingestion):

    SalesAmount           decimal
    TotalProductCost      decimal
    ProductStandardCost   decimal
    UnitPriceDiscountPct  decimal
    ExtendedAmount        decimal
    UnitPrice             decimal
    OrderQuantity         integer
    SalesTerritoryKey     integer
    ShipDateKey           integer
    DueDateKey            integer
    OrderDateKey          integer
    ProductKey            integer
    CustomerKey           integer
    ResellerKey           integer
    SalesOrderLineKey     integer

SalesOrder.csv (the highlighted column will be used for ingestion):

    SalesOrderLine        string
    SalesOrder            string
    SalesOrderLineKey     integer
    Channel               string

Destination Entity:

The destination is the TransactionLineItem table in the Lake Database. The two CSV files above should be joined on a common column (SalesOrderLineKey) to ingest the data into the TransactionLineItem entity in the Lake Database.

Mapping Process:

1. Start by adding both source CSV files in the Map Data tool, then click Continue.
2. In the next dialog, provide a name for the mapping and click Continue.
3. In the following dialog, in the left navigation under Target Tables, select the correct destination table (in this case, TransactionLineItem) and select Sales as the Primary Source Table.
4. Next, click Additional Sources, which opens the join dialog. Notice that the left source defaults to the primary source table.
5. Select SalesOrderLineKey as the Left Source Column.
6. Choose "==" as the join condition from the dropdown.
7. For the right source, select SalesOrder, and select the same column, SalesOrderLineKey, as the Right Source Column. (A conceptually equivalent T-SQL join is sketched at the end of this post.)
8. Repeat the steps above to add more sources if there are more tables to join, then click OK to return to the Map Data tool.
9. Using New Mapping, set up the mapping of columns from the source to the destination and review the preview of the data.
10. Click Create Pipeline to create the pipeline and its underlying data flow.
11. If everything went well, publish and run the pipeline; it should execute successfully and ingest the data into the Lake Database.
12. Finally, running a select against the table (see the verification query at the end of this post) should return the results successfully.

The CSV files used in the above example:

Data: adfdataflowdocs/AdventureWorks Data.zip at master · kromerm/adfdataflowdocs (github.com)
Schema: adfdataflowdocs/AdventureWorksSchemas.xlsx at master · kromerm/adfdataflowdocs (github.com)
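For readers who want to sanity-check the join configured in the Additional Sources dialog (steps 4-7 above), here is a minimal T-SQL sketch of the equivalent inner join. It assumes the two CSV files have been loaded as tables named Sales and SalesOrder; the column names come from the schemas listed earlier. This is only an illustration of the join semantics, not the script the tool generates.

```sql
-- Conceptual equivalent of the Additional Sources join:
-- Sales is the primary (left) source, SalesOrder is the additional
-- (right) source, matched with "==" on SalesOrderLineKey.
SELECT
    s.SalesOrderLineKey,
    s.SalesAmount,
    s.OrderQuantity,
    so.SalesOrder,
    so.SalesOrderLine,
    so.Channel
FROM Sales AS s
INNER JOIN SalesOrder AS so
    ON s.SalesOrderLineKey = so.SalesOrderLineKey;
```

The data flow created by the Map Data tool performs this same matching before mapping the joined columns onto TransactionLineItem.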
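And a minimal verification query for step 12, run from the Synapse serverless SQL pool after the pipeline completes. LakeDB is a placeholder; substitute the name you gave your Lake Database.

```sql
-- Verify the ingested rows in the Lake Database.
-- "LakeDB" is a placeholder for your Lake Database name.
SELECT TOP 10 *
FROM [LakeDB].[dbo].[TransactionLineItem];
```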