
Guest Gabi Lehner
Posted

Azure CosmosDB data connection - detailed description

 

 

This breaking change is relevant to anyone using an Azure Cosmos DB data connection.

 

The managed identity of your Azure Data Explorer cluster that is associated with the connection now requires an additional role, “Cosmos DB Account Reader Role”. This is a control plane permission; see Azure built-in roles - Azure RBAC | Microsoft Learn.

 

 

 

This role allows Azure Data Explorer to map the Cosmos DB account resource ID (passed through the ARM provider) to its endpoint URL. It is also used to validate the Cosmos DB database and container input parameters. During the public preview this mapping was inferred. Now, at General Availability (GA) of the feature, the mapping is acquired by reading the Cosmos DB account properties.

 

 

Your active data connections will continue to work without any changes, but without this role you will not be able to create new data connections or update existing ones.

 

 

 

When data connections are provisioned or updated through the Azure portal, the portal makes this role assignment on behalf of the signed-in user, provided that user has sufficient privileges over the Cosmos DB account. Otherwise, the role assignment must be made by a principal (user or service principal) that has sufficient privileges.

 

 

 

Cosmos DB Built-in Data Reader (a data plane permission) is still required for reading the data from the Cosmos DB account.

 

 

 

Required change

 

 

Add “Cosmos DB Account Reader Role” to the managed identity of the Cosmos DB data connections.
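For principals that assign the role outside the portal, the following is a minimal sketch of how the assignment could be scripted. It only builds an Azure CLI command line and prints it; nothing is executed. The use of `az role assignment create` with `--assignee-object-id` is an assumption about how you choose to assign the role (the post does not prescribe a tool), and every identifier in angle brackets is a hypothetical placeholder.

```python
import shlex

ROLE_NAME = "Cosmos DB Account Reader Role"  # role name as given in this post


def cosmos_account_scope(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Build the ARM resource ID of a Cosmos DB account, used as the role scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DocumentDB/databaseAccounts/{account_name}"
    )


def role_assignment_command(principal_object_id: str, scope: str) -> list[str]:
    """Return the Azure CLI argument list for the assignment (not executed here)."""
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_object_id,
        # Managed identities are assigned as service principals.
        "--assignee-principal-type", "ServicePrincipal",
        "--role", ROLE_NAME,
        "--scope", scope,
    ]


if __name__ == "__main__":
    # All values below are placeholders; substitute your own.
    scope = cosmos_account_scope("<subscription-id>", "<resource-group>", "<cosmos-account>")
    print(shlex.join(role_assignment_command("<managed-identity-object-id>", scope)))
```

The printed command can be reviewed and then run by a principal that has sufficient privileges over the Cosmos DB account.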

 

 

Schedule & plan

 

 

 

 

 

 

Export to Azure storage in Parquet format - detailed description

 

 

This breaking change is relevant to anyone exporting data from Kusto to Azure Storage in Parquet format, whether through one-time export or continuous export.

 

The generated Parquet files will start using new encodings that are not supported by Spark versions below 3.3.0. If you use a Spark version earlier than 3.3.0 to read Parquet files exported from Azure Data Explorer, you will be affected. In many cases the error message will include the following: “Unsupported encoding: DELTA_BYTE_ARRAY”.

 

The purpose of this change is to increase performance and security.

 

Required change

 

 

Update the Spark version you use to read Parquet files exported from your ADX cluster to version 3.3.0 or newer.
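To check whether a given reader is affected, a small version guard may help. The sketch below compares a Spark version string against the 3.3.0 threshold stated above. The function names are hypothetical, and the parsing assumes version strings that start with numeric `major.minor[.patch]` components (vendor suffixes are ignored).

```python
import re

MIN_SPARK = (3, 3, 0)  # first Spark version this post says can read the new encodings


def parse_version(version: str) -> tuple[int, ...]:
    """Extract the leading numeric components, e.g. '3.2.1-amzn-0' -> (3, 2, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part else 0 for part in match.groups())


def can_read_new_parquet(spark_version: str) -> bool:
    """True if the version meets the 3.3.0 minimum stated in the announcement."""
    return parse_version(spark_version) >= MIN_SPARK


if __name__ == "__main__":
    # In a PySpark session, the string could come from `spark.version`.
    for candidate in ("3.2.4", "3.3.0", "3.4.1-amzn-0"):
        print(candidate, can_read_new_parquet(candidate))
```

A job could call `can_read_new_parquet` at startup and fail early with a clear message instead of surfacing the encoding error mid-read.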

 

Schedule & plan

 

 

Update your Spark version by July 31st.

 

The change will be deployed and applicable starting on August 1st.

 

 

 

Extent-level commands - detailed description

 

 

Today it is possible to run certain extent-level commands without specifying the name of the table that contains the source extents, and/or without specifying a creation time range that scopes the lookup of the source extents to operate on.

 

Specifically, this refers to the following commands:

 

.alter[-merge] extent tags

 

.drop extent tags

 

.move extents

 

.replace extents

 

For example, the following commands still work today:

 

.drop extent tags from table <tableName> <tagsSpecificationString>

 

.drop extent tags <| .show table T extents

 

.alter extent tags ('t1', 't2') <| .show table T extents

 

.move extents from table T1 to table T2 (extentId1, extentId2, ...)

 

.move extents to table T2 <| .show table T1 extents

 

.replace extents in table T1 <| {.show table T1 extents},{.show table T2 extents}

 

 

 

To make these operations more efficient, we’re planning to block commands that omit this scoping on all existing clusters that are not using them today.

 

The new behaviour will fail such commands with any of the following error messages:

 

Admin command cannot be executed due to an invalid argument; argument: TableName, reason: The name of the table must be specified

 

Admin command cannot be executed due to an invalid argument; argument: ExtentCreatedOnRange, reason: Both 'ExtentCreatedOnFrom' and 'ExtentCreatedOnTo' must be specified

 

The current behaviour will continue to be supported on clusters that currently use it.

 

We would like to ask you to modify the tools you own, and the extent-level commands they run, according to the specification below, so that we can block this behaviour on clusters that currently use this inefficient form of the commands.

 

 

 

Required change

 

 

Change extent-level commands according to the specification below, so that they include:

 

The name of the table that contains the source extents.

 

The shortest-possible creation time range that scopes the lookup of the source extents to operate on.
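One way to arrive at a short creation time range is to take the earliest and latest creation times of the extents you intend to operate on. The sketch below illustrates that, assuming you have already collected those creation times (for example from the output of a `.show table <tableName> extents` command). The function name and the optional `margin` parameter are hypothetical; the post does not say whether the range bounds are inclusive, so a small margin may be prudent.

```python
from datetime import datetime, timedelta


def creation_time_range(
    created_on: list[datetime],
    margin: timedelta = timedelta(0),
) -> tuple[datetime, datetime]:
    """Return the tightest (from, to) range covering all given creation times.

    `margin` widens the range on both sides; it is an assumption to guard
    against boundary handling that this post does not specify.
    """
    if not created_on:
        raise ValueError("at least one extent creation time is required")
    return min(created_on) - margin, max(created_on) + margin


if __name__ == "__main__":
    # Hypothetical creation times of three source extents.
    times = [
        datetime(2023, 7, 1, 10, 0),
        datetime(2023, 7, 1, 12, 30),
        datetime(2023, 7, 2, 8, 15),
    ]
    start, end = creation_time_range(times, margin=timedelta(minutes=1))
    print(start.isoformat(), end.isoformat())
```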

 

 

 


Command type: Drop extent tags

Existing syntax:

.drop [async] extent tags from table <tableName> <tagsSpecificationString>

.drop [async] extent tags <| <innerQuery>

New syntax:

.drop [async] table <tableName> extent tags <tagsSpecificationString> with(extentCreatedOnFrom='...', extentCreatedOnTo='...')

.drop [async] table <tableName> extent tags with(extentCreatedOnFrom='...', extentCreatedOnTo='...') <| <innerQuery>

Purpose of changes:

- Scoping the command to a specific table

- Scoping the command to a specific, narrowed time range

Command type: Alter(-merge) extent tags

Existing syntax:

.alter[-merge] [async] extent tags <tagsSpecificationString> <| <innerQuery>

New syntax:

.alter[-merge] [async] table <tableName> extent tags <tagsSpecificationString> with(extentCreatedOnFrom='...', extentCreatedOnTo='...') <| <innerQuery>

Purpose of changes:

- Scoping the command to a specific table

- Scoping the command to a specific, narrowed time range

Command type: Move extents

Existing syntax:

.move [async] extents from table <tableName> to table <tableName> <extentIdsSpecification>

.move [async] extents to table <tableName> <| <innerQuery>

New syntax:

.move [async] extents from table <tableName> to table <tableName> <extentIdsSpecification> with(extentCreatedOnFrom='...', extentCreatedOnTo='...')

.move [async] extents to table <tableName> with(extentCreatedOnFrom='...', extentCreatedOnTo='...') <| <innerQuery>

Purpose of changes:

- Scoping the command to a specific, narrowed time range

Command type: Replace extents

Existing syntax:

.replace [async] extents in table <tableName> <| {query for extents to be dropped from table},{query for extents to be moved to table}

New syntax:

.replace [async] extents in table <tableName> with(extentCreatedOnFrom='...', extentCreatedOnTo='...') <| {query for extents to be dropped from table},{query for extents to be moved to table}

Purpose of changes:

- Scoping the command to a specific, narrowed time range
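For tools that generate these commands, the new syntax can be produced by a small helper. The sketch below builds the query-based forms of the move and drop-tags commands as plain strings, following the templates above. The function names are hypothetical. The post elides the values inside the quotes ('...'), so the ISO-style dates in the usage example are placeholders only; check the command reference for the accepted value format.

```python
def with_created_on(created_from: str, created_to: str) -> str:
    """Render the creation-time scoping clause as written in the templates above.

    The value format is an assumption; the post shows it only as '...'.
    """
    return f"with(extentCreatedOnFrom='{created_from}', extentCreatedOnTo='{created_to}')"


def move_extents_command(
    target_table: str,
    inner_query: str,
    created_from: str,
    created_to: str,
    run_async: bool = False,
) -> str:
    """Build `.move [async] extents to table <tableName> with(...) <| <innerQuery>`."""
    async_part = "async " if run_async else ""
    scope = with_created_on(created_from, created_to)
    return f".move {async_part}extents to table {target_table} {scope} <| {inner_query}"


def drop_extent_tags_command(
    table: str,
    inner_query: str,
    created_from: str,
    created_to: str,
    run_async: bool = False,
) -> str:
    """Build `.drop [async] table <tableName> extent tags with(...) <| <innerQuery>`."""
    async_part = "async " if run_async else ""
    scope = with_created_on(created_from, created_to)
    return f".drop {async_part}table {table} extent tags {scope} <| {inner_query}"


if __name__ == "__main__":
    # Table names and dates are placeholders for illustration.
    print(move_extents_command("T2", ".show table T1 extents", "2023-07-01", "2023-07-02"))
    print(drop_extent_tags_command("T", ".show table T extents", "2023-07-01", "2023-07-02"))
```

Keeping the scoping clause in one function means that a correction to the value format needs to be made in only one place.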

Schedule & plan

 

 

The existing experience is blocked on all new clusters and all existing clusters that do not currently use it.

 

For clusters that still use the existing, soon-to-be-deprecated pattern, we will block it by July 31st.

 

The change will be deployed and applicable starting on August 1st.

 

 

 

 

 

For more details or help please contact us,

 

Kusto team

 
