Jump to content

HDInsight Environment Cloning


Recommended Posts

Guest sairamyeturi
Posted

Overview

 

 

 

Testing environments play a critical role in the development and deployment of any product or service. As a best practice, HDInsight customers are recommended to start the deployments in lower environments to assess the performance and functional capabilities before moving the workloads to production. All migrations, upgrades, and solutions being tested in a lower end environment ensure minimal to no issues in the production environment.

 

 

 

To maintain two different environments requires copying of the cluster configurations and underlying resources. HDInsight Team provides the following two solutions for safe cloning of the storage account and metadata associated with it.

 

 

 

Export and Import Scripts

 

 

 

The script will re-create the tables to the new cluster's default filesystem. Native tables will also copy their data in storage. Non-native tables will copy only by definition. The path of external tables not in the Hive warehouse directory will be preserved. Other tables will be copied to the target cluster's default Hive path. 

 

This approach is suitable if only tables copy is required, and the environment cannot be put under maintenance for some period. This approach always provides consistency between metadata and data without stopping data ingestion. This requires minimal interference.

 

This is not suitable if there are different external paths used for the tables.

 

 

 

Complete Guideline: Hive workload migration to new account in Azure Storage | Microsoft Learn

 

 

 

AzCopy and Hive Metatool

 

 

 

This approach copies both the data and metadata separately. To ensure consistency between the metadata and data, there should not be any data ingestion/data load or metadata changes during the migration.

 

This approach requires the following steps to be taken:

 

  1. Copy storage account data via Az-Copy. Guide: azcopy copy | Microsoft Learn
  2. Create a copy of the databases. (Hive, Oozie). Guide: Copy a database - Azure SQL Database | Microsoft Learn
  3. Update Locations in the hive metastore database via any of the following approaches.

    1. Use Hive Metatool to update the storage locations:

 

 

 

/usr/bin/hive --service metatool -updateLocation <new location> <old location> [-serdePropKey <serde-prop-key>] [-tablePropKey <table-prop-key>]

 

 

 

 

 

Example:

 

 

 

export HADOOP_HEAPSIZE=3g

 

/usr/bin/hive --service metatool -updateLocation 'abfs://container@newaccount.dfs.core.windows.net/hive'

'abfs://container@oldaccount.dfs.core.windows.net/hive' -serdePropKey 'path'

 

 

 

 

 

 

OR

 

2. Use the below SQL to update the locations in the database directly.

 

 

 

DECLARE @OldLocation AS VARCHAR(100)='<old_location>'

 

DECLARE @NewLocation AS VARCHAR(100)='<new_location>'

 

 

 

-- Update DBS table

 

update DBS

 

set DB_LOCATION_URI = replace(DB_LOCATION_URI, @OldLocation, @NewLocation);

 

 

 

-- Update SDS table

 

update SDS

 

set LOCATION = replace(LOCATION, @OldLocation, @NewLocation);

 

 

 

-- Update TABLE_PARAMS table

 

update TABLE_PARAMS

 

set PARAM_VALUE = replace(PARAM_VALUE, @OldLocation, @NewLocation)

 

where PARAM_KEY='<table_properties_key_to_update>';

 

 

 

-- Update SERDE_PARAMS table

 

update SERDE_PARAMS

 

set PARAM_VALUE = replace(PARAM_VALUE, @OldLocation, @NewLocation)

 

where PARAM_KEY='<serde_key_to_update>';

 

 

 

 

 

This approach ensures copying of all table locations and provides a perfect copy in case no data/metadata changes happen during the cloning process.

 

 

 

Note: The hive metatool takes complete locations (like abfs://container@newaccount.dfs.core.windows.net/, wasb://container@newaccount.blob.core.windows.net/) and not strings (like container1, container3). While with SQL any strings can be used for replacement purposes. Using complete paths/locations ensure low risk of replacing any unnecessary substring that damages the metadata)

 

Continue reading...

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...