Benchmarking the NVIDIA Clara Parabricks for Secondary Genomics Analysis on Microsoft Azure


Guest Erdal Cosgun
This blog was co-authored by Alexander Spiridonov, Senior Solutions Architect at NVIDIA.

 

 

 

Introduction

 

 

 

Next-generation sequencing (NGS) is a massively parallel sequencing technology that offers ultra-high throughput, scalability, and speed. The technology is used to determine the order of nucleotides in entire genomes or targeted regions of DNA or RNA. NGS has revolutionized the biological sciences, allowing labs to perform a wide variety of applications and study biological systems at a level never before possible.

 

 

 

NGS makes large-scale whole-genome sequencing (WGS) accessible and practical for the average researcher. It enables scientists to analyze the entire human genome in a single sequencing experiment, and scale to tens of thousands of genomes per year.

 

 

 

As the throughput of sequencing instruments increases and the cost per sample decreases, data volumes are increasing exponentially. As a result, data storage, management, and analysis are becoming a major bottleneck in the overall workflow and increasing compute costs. As state-of-the-art methods allow users to extract more information from their data and the analytical pipelines become more computationally intensive, this bottleneck is growing worse.

 

 

 

NVIDIA Clara Parabricks addresses these computational challenges. Clara Parabricks is the only GPU-accelerated and optimized secondary analysis software that includes industry-standard tools plus deep learning-based tools such as DeepVariant for variant calling. As sequencer throughput rises and the cost of sequencing falls, the bottleneck shifts to the computational analysis of the sequenced samples.

 

 

 

NVIDIA Clara Parabricks

 

 

 

NVIDIA introduced the Clara Parabricks software suite for performing analysis of NGS DNA and RNA data. It delivers results at blazing fast speeds and low cost. Clara Parabricks can analyze 30x WGS data in under 25 minutes on a single 8-GPU server, instead of 30 hours for traditional CPU-based methods. Its output matches commonly used software, making it simple to verify the accuracy of the results.

 

 

 

Clara Parabricks provides at least an order of magnitude acceleration in compute time while generating identical outputs and reducing analysis costs. It is available for free on NVIDIA GPU Cloud (NGC) and can be easily deployed on Azure GPU-based virtual machines (VMs).

 

 

 

Clara Parabricks provides optimal performance for multiple Azure instance types and can be used out of the box for essential bioinformatics needs. Currently, the Clara Parabricks accelerated analysis tools start from FASTQ files and perform alignment through variant calling and expression analysis, including QC tools for both types of outputs. The suite of tools can be used to support end-to-end workflows for germline, somatic and RNA-Seq pipelines, providing the flexibility to meet the individual needs of most projects. The tools can also be used individually, as drop-in replacements for steps in existing workflows.

 

 

 

Figure 1 below shows most of the accelerated tools within the Clara Parabricks software package. Because the pipelines are accelerated, users can run multiple variant callers to extract the most information from their data and still generate results in less time and at lower cost than with standard baseline software. A standard 30x WGS sample can be processed in 25 minutes on an ND96asr_A100_v4 Azure VM.

 

 

 

 

Figure 1: The NVIDIA Clara Parabricks 4.0 Toolset-Ref.

 

 

 

Running Parabricks 4.0 on Microsoft Azure

 

 

 

The prerequisites for running Parabricks 4.0 on Microsoft Azure are:

 

  • A Microsoft Azure subscription with enough Compute-VM (cores-vCPUs) quota to create GPU-based VMs (preferably NCas_T4_v3 and ND96asr_A100_v4)
  • An NVIDIA driver greater than version 465.32.*
  • Any Linux operating system that supports nvidia-docker2, with Docker version 20.10 (or higher)

 

Note: Clara Parabricks requires at least two GPUs per sample to run efficiently.

 

 

 

To create a Microsoft Azure VM for Clara Parabricks, you can use the Azure CLI or the Azure portal.

 

 

 

The fastest way to run the application is to use a predefined Ubuntu Data Science Virtual Machine image instead of standard Ubuntu. In that case, the required NVIDIA driver is already installed; otherwise, you will need to install the relevant driver yourself. We recommend SSH public key authentication as a fast, simple, and secure way to connect to your VM.
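
As a rough illustration of the Azure CLI route, the sketch below only assembles and prints an `az vm create` command so it can be reviewed before being run. The resource group, VM name, admin user, and image placeholder are hypothetical. The size names `Standard_NC64as_T4_v3` (4 T4 GPUs) and `Standard_ND96asr_v4` (8 A100 GPUs) are assumed to be the concrete sizes behind the VM families named above. The 1024 GB OS disk is likewise an assumption, chosen to leave room for a 30x WGS dataset.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) an `az vm create` command for a GPU VM.
# Every value below is a placeholder; replace before use.

RESOURCE_GROUP="parabricks-rg"     # hypothetical; the resource group must already exist
VM_NAME="parabricks-vm"            # hypothetical
ADMIN_USER="azureuser"             # hypothetical
IMAGE="UBUNTU_DSVM_IMAGE_URN"      # placeholder: look up the Ubuntu Data Science VM image URN

# Print the command for the given VM size so it can be inspected first.
vm_create_cmd() {
  local size="$1"
  echo az vm create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --image "$IMAGE" \
    --size "$size" \
    --admin-username "$ADMIN_USER" \
    --generate-ssh-keys \
    --os-disk-size-gb 1024
}

vm_create_cmd "Standard_NC64as_T4_v3"
```

The `--generate-ssh-keys` flag creates an SSH key pair if none exists, which matches the SSH public key recommendation above. Candidate image URNs can be listed with `az vm image list --publisher microsoft-dsvm --all --output table`.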

 

Once the VM is created, connect to it using SSH:

 

 

 

 

 

$ ssh -i private-key.pem user-id@vm-public-DNS

 

 

 

 

 

If the NVIDIA driver is already installed, check your NVIDIA hardware and driver version using the nvidia-smi command:

 

 

 

[Image: nvidia-smi output listing the GPUs and the driver version]
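
The prerequisite check can also be scripted rather than read off the screen. The following is a minimal sketch, assuming GNU `sort -V` is available. `check_gpu_prereqs` and `version_gt` are hypothetical helper names, and the thresholds are taken from the prerequisites listed earlier (a driver newer than 465.32 and at least two GPUs per sample).

```shell
#!/usr/bin/env bash
# Sketch: verify the GPU prerequisites before running Parabricks.

MIN_DRIVER="465.32"  # driver must be newer than this (from the prerequisites)
MIN_GPUS=2           # at least two GPUs per sample (from the note)

# Succeeds when version $1 is strictly greater than version $2.
# Relies on GNU `sort -V` for version-aware ordering.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical helper: prints a one-line verdict, returns non-zero on failure.
check_gpu_prereqs() {
  command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found"; return 1; }
  local driver gpus
  # Compare on major.minor only, e.g. 470.57.02 -> 470.57
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1 | cut -d. -f1,2)"
  # One output line per GPU
  gpus="$(nvidia-smi --query-gpu=name --format=csv,noheader | wc -l)"
  version_gt "$driver" "$MIN_DRIVER" || { echo "driver $driver is not newer than $MIN_DRIVER"; return 1; }
  [ "$gpus" -ge "$MIN_GPUS" ] || { echo "found $gpus GPU(s), need $MIN_GPUS"; return 1; }
  echo "OK: driver $driver, $gpus GPUs"
}
```

Calling `check_gpu_prereqs` on the VM prints either a short reason for failure or an OK line. Comparing on major.minor only means any 465.32.x driver is treated as not new enough, which is one reading of the "greater than 465.32.*" requirement.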

 

 

 

To make sure you have nvidia-docker2 installed, run this command:

 

 

 

 

 

$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

 

 

 

 

 

When it finishes downloading the container, it will run the nvidia-smi command and show you the same output as above. The Clara Parabricks Docker image can be obtained from NGC by running the following command (please check https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/clara-parabricks for the latest version):

 

 

 

 

 

$ docker pull nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1

 

 

 

 

 

At this point the software is ready to use. For example, to run the Clara Parabricks fq2bam tool using the Docker container, use the following command:

 

 

 

 

 

$ docker run \
    --gpus all \
    --rm \
    --volume /host/data:/input_data \
    --volume /host/results:/outputdir \
    --workdir /input_data \
    nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
    pbrun fq2bam \
    --ref /input_data/Homo_sapiens_assembly38.fasta \
    --in-fq /input_data/fastq1.gz /input_data/fastq2.gz \
    --out-bam /outputdir/fq2bam_output.bam

 

 

 

 

 

 

 

Parabricks 4.0 benchmarking

 

 

 

Let’s discuss the details of the benchmarking on Microsoft Azure:

 

 

 

Step 1: Download data

 

The original source of the data can be found at this link. For the tests below, Microsoft Azure Blob Storage was used. To benchmark against WGS data, extend the VM disk to at least 1 TB of local space using the following steps. Then start your VM and download the 30x WGS dataset:

 

 

 

Sample fastq paired-end data:

 

 

Here are the commands to download the reference file:

 

 

 

 

 

 

 

$ wget -O parabricks_sample.tar.gz https://storeshare.blob.core.windows.net/publicdata/parabricks_sample.tar.gz

$ tar xzvf parabricks_sample.tar.gz
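
Before starting a large download, it can help to confirm that the target filesystem really has the space called for in this step. The helper below is a small sketch built on POSIX `df -P`. The name `has_free_gib` is hypothetical, and `/data` in the usage note is assumed to be the download directory because it is the host path mounted into the container in the benchmark command.

```shell
#!/usr/bin/env bash
# Sketch: check free space on the filesystem that holds a directory.

# Succeeds when the filesystem containing $1 has at least $2 GiB available.
has_free_gib() {
  local dir="$1" need_gib="$2" avail_kb
  # `df -Pk` prints one line per filesystem; column 4 is available space in KiB.
  avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  [ "$avail_kb" -ge $(( need_gib * 1024 * 1024 )) ]
}
```

For example, `has_free_gib /data 1024 || echo "less than 1 TiB free under /data"` reports when the disk still needs to be extended.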

 

 

 

 

 

 

 

 

Step 2: Run Parabricks benchmark

 

To run a simple benchmark, start the Parabricks Docker container as before and measure the runtime with the time command:

 

 

 

 

 

 

 

$ sudo time -v docker run --gpus all -v /data:/parabricks \
    nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 pbrun germline \
    --ref /parabricks/parabricks_sample/Ref/Homo_sapiens_assembly38.fasta \
    --in-fq /parabricks/HG002-NA24385-pFDA_S2_L002_R1_001-30x.fastq.gz /parabricks/HG002-NA24385-pFDA_S2_L002_R2_001-30x.fastq.gz \
    --knownSites /parabricks/parabricks_sample/Ref/Homo_sapiens_assembly38.known_indels.vcf.gz \
    --out-bam /parabricks/output.bam \
    --out-variants /parabricks/output.vcf \
    --out-recal-file /parabricks/report.txt \
    --run-partition --no-alt-contigs |& tee germline_30x_4gpu.txt
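
The log captured by `tee` includes the verbose report from GNU `time`, whose "Elapsed (wall clock) time" line gives the runtime as `h:mm:ss` or `m:ss`. The sketch below converts that value to minutes so runs can be compared directly. Both function names are hypothetical, and the sample value `1:09:18` in the usage note is illustrative rather than taken from an actual log.

```shell
#!/usr/bin/env bash
# Sketch: extract the wall-clock time from a `time -v` log and convert it to minutes.

# Print the raw elapsed value (last field of the "Elapsed (wall clock) time" line).
elapsed_from_log() {
  awk '/Elapsed \(wall clock\) time/ { print $NF }' "$1"
}

# Convert "h:mm:ss" or "m:ss" (seconds may be fractional) to minutes, one decimal place.
elapsed_to_minutes() {
  awk -v t="$1" 'BEGIN {
    n = split(t, p, ":")
    if (n == 3)      m = p[1] * 60 + p[2] + p[3] / 60
    else if (n == 2) m = p[1] + p[2] / 60
    else             exit 1
    printf "%.1f\n", m
  }'
}
```

For example, `elapsed_to_minutes "$(elapsed_from_log germline_30x_4gpu.txt)"` prints the runtime of the benchmark above in minutes, and `elapsed_to_minutes 1:09:18` prints `69.3`.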

 

 

 

 

 

 

 

Benchmark analysis

 

 

 

As described previously, the benchmark uses the standard 30x HG002 dataset downloaded from public sources.

 

 

 

The results below were obtained with Parabricks 4.0 on the following Microsoft Azure VM sizes: NCas_T4_v3 and ND96asr_A100_v4.

 

 

 

A sample Jupyter notebook can be found in the Microsoft Genomics Notebook GitHub project.

 

 

 

Results

 

 

 

The germline pipeline execution time on NCas_T4_v3 with 4 NVIDIA T4 GPUs is 69.3 minutes, at a pay-as-you-go cost of $5.02, or $2.19 on Spot instances.

Execution time on ND96asr_A100_v4 with 8 NVIDIA A100 GPUs is 25 minutes, at a pay-as-you-go cost of $9.18, or $3.17 on Spot instances.

 

All prices include Azure Hybrid Benefit.

 

 

 

VM costs were calculated on 1/23/2023. For up-to-date VM prices, please visit the Azure pricing calculator.
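
Because prices change, it can be useful to redo the cost arithmetic with current rates. The sketch below shows the calculation: run cost is the hourly rate multiplied by runtime in hours. The function names are hypothetical. Any hourly rate derived with `implied_hourly_rate` is back-calculated from the costs and runtimes reported above and is not an official Azure price.

```shell
#!/usr/bin/env bash
# Sketch: per-run cost arithmetic for the benchmark results.

# Hourly rate implied by a reported run cost. Args: cost_usd minutes
implied_hourly_rate() {
  awk -v c="$1" -v m="$2" 'BEGIN { printf "%.2f\n", c * 60 / m }'
}

# Cost of one run at a given hourly rate. Args: hourly_rate_usd minutes
run_cost() {
  awk -v r="$1" -v m="$2" 'BEGIN { printf "%.2f\n", r * m / 60 }'
}

# Runtime ratio between two configurations. Args: slower_minutes faster_minutes
speedup() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}
```

With the reported figures, `speedup 69.3 25` prints `2.77` and `implied_hourly_rate 9.18 25` prints `22.03`. To re-estimate a run at today's prices, pass the current hourly rate from the pricing calculator to `run_cost` together with the measured runtime.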

 

 

 
