Guest KapilSamant
Posted August 11, 2023

Introduction

Azure SQL Database, SQL Managed Instance (MI), and SQL Hyperscale all have a hard transaction log throughput limit, which defines the maximum rate at which log records can be written to disk. When log throughput reaches this limit, overall database performance suffers and transaction processing is delayed. To avoid hitting the cap, it is essential to actively monitor and optimize the workload and to choose the most appropriate service tier, so that the database can sustain the log throughput your application needs. But if you have already done everything possible to reduce log generation, selected the best available tier, and are still hitting the limit, what other options are worth considering? In this blog we look at some unconventional ways to reduce logging. They may require changes to your data processing jobs, but the outcome should be the same.

Customer story

Let's delve into a real-life scenario we faced that illustrates the log throttling issue. Data is extracted from a non-SQL source, the required business logic is applied, and the data is transformed and aggregated before being loaded into the target database, in our case a SQL DB Hyperscale. Data is first loaded into a staging layer that is truncated before every run; the second stage of the pipeline applies business logic to the extracted data, transforms it, and loads it into a partitioned target table. The process runs once a day and loads data into an empty target partition. Running this pipeline on its own took around 20 minutes, but when it was triggered alongside other similar pipelines the processing time increased to 40 minutes. Investigation showed that log throughput was hitting the limit of 105 MB/s when the pipelines ran in parallel. To resolve the problem we evaluated many solutions and narrowed them down to the following three, which gave the best results without major code changes.

Recreate the problem!

Let's recreate the problem with a sample setup. The database is an Azure SQL DB provisioned with 2 vCores on the General Purpose tier. Both the staging table and the target table are heaps with 30 columns of different data types (the specific table structure isn't relevant here), and the dataset contains approximately 3 million rows. The test inserts these 3 million rows into the staging table and then transfers them from the staging table to the target table. The log IO utilization graph shows two peaks, coinciding with the two inserts: first into the staging table and then into the target table. These spikes highlight the moments of heightened log IO consumption during the data transfer operations.

Solutions

Now let's look at the solutions.

Option 1 - Relocating the staging table to tempdb. Read more about temp tables here.

A temporary table is minimally logged, which makes it a perfect fit when data validity is short-lived and there is no requirement for data recovery. As the graph below shows, log IO consumption dropped below 10%, leaving the remaining capacity available for other DML operations.
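To make this concrete, here is a minimal sketch of the pattern rather than the exact pipeline from the customer story. All table and column names (#StageOrders, dbo.TargetOrders, and so on) are illustrative, and the transformation step is simplified to a single aggregate.

-- Stage into a temp table in tempdb instead of a permanent staging table
-- in the user database (illustrative names).
DROP TABLE IF EXISTS #StageOrders;

CREATE TABLE #StageOrders
(
    OrderId   bigint        NOT NULL,
    OrderDate date          NOT NULL,
    Amount    decimal(18,2) NULL
    -- ...remaining staging columns
);

-- Load the extracted data into the temp table here (INSERT ... SELECT,
-- BULK INSERT, etc.). These writes land in tempdb's log rather than the
-- user database's log, so they do not consume its log throughput.

-- Transform and load the result into the target table as before.
INSERT INTO dbo.TargetOrders (OrderId, OrderDate, Amount)
SELECT OrderId, OrderDate, SUM(Amount)
FROM #StageOrders
GROUP BY OrderId, OrderDate;

The final insert into the target table still generates log in the user database; options 2 and 3 below address that part of the load.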
Option 2 - Use a clustered columnstore index instead of a heap table. Read more about CCI here.

A clustered columnstore index stores data column by column and can achieve up to 10x compression over the uncompressed data size, which reduces both storage and IO requirements. This approach is only effective for batch/bulk load processes, not for OLTP tables with many small transactions. The graph below shows the log IO consumed while loading data from staging to target: only about 5%, compared with 100% for the heap table. Note that the savings depend on data distribution and compression and will vary from case to case; in this test the data contained many duplicate values, so it compressed particularly well. (A minimal sketch appears after option 3 below.)

Option 3 - Using partition switching.

This option is not suited to every case because it requires meeting all the prerequisites described in Transferring Data Efficiently by Using Partition Switching. But if the prerequisites are met, it is the best option of all: it consumes virtually no log IO and the data movement is instantaneous. If you have a massive amount of data to move from one table to another, a partition switch is the fastest way to do it, with minimal resource usage. The data is not physically moved; the operation only updates metadata pointers to reference the blocks where the data is already stored.
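Going back to option 2, here is a minimal sketch of the columnstore approach. Table and index names are illustrative; the inline INDEX clause and the separate CREATE INDEX statement are simply two ways of arriving at the same clustered columnstore layout.

-- Define the table with a clustered columnstore index up front
-- (illustrative names)...
CREATE TABLE dbo.TargetOrders_CCI
(
    OrderId   bigint        NOT NULL,
    OrderDate date          NOT NULL,
    Amount    decimal(18,2) NULL,
    -- ...remaining columns
    INDEX cci_TargetOrders CLUSTERED COLUMNSTORE
);

-- ...or convert an existing heap in place.
CREATE CLUSTERED COLUMNSTORE INDEX cci_TargetOrders ON dbo.TargetOrders;

-- Because the data is stored compressed, bulk loading into the columnstore
-- writes far fewer bytes (and therefore far less log) than loading the same
-- rows into a heap; the savings depend on how well the data compresses.
INSERT INTO dbo.TargetOrders_CCI (OrderId, OrderDate, Amount)
SELECT OrderId, OrderDate, Amount
FROM dbo.StageOrders;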
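And a minimal sketch of the partition switch from option 3. It assumes the staging table exactly matches the target table's structure and filegroup, carries a check constraint that bounds its rows to the target partition, and that the target partition is empty; the partition function name pf_OrderDate and the boundary date are illustrative.

-- Work out which target partition the staged data belongs to
-- ($PARTITION and pf_OrderDate are illustrative; use your own partition function).
DECLARE @p int = $PARTITION.pf_OrderDate('20230811');

-- Metadata-only operation: the staged rows become the target partition
-- without any data movement and with effectively no log IO.
ALTER TABLE dbo.StageOrders
    SWITCH TO dbo.TargetOrders PARTITION @p;

Because the switch only changes metadata, it completes almost instantly regardless of how many rows are involved.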
Final thoughts

For the use case above, a combination of temporary tables and partition switching cut data loading from 40 minutes to just 12 minutes while virtually eliminating log IO consumption. This change not only accelerated data ingestion but also freed capacity for additional data pipelines without scaling up the database, highlighting the impact of strategic resource usage on both performance and cost. Though not tested for this scenario, in-memory (memory-optimized) tables can also be used where lower log IO consumption is required. Read more here.

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please contact the Azure Databases SQL Customer Success Engineering Team. Thanks for your support!