Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who use NVIDIA Omniverse to accelerate their 3D workflows and create virtual worlds. 3D artist Rafi Nizam has worn many hats since starting his career as a web designer more than two decades ago, back when Read article >
Many organizations are using Clara Parabricks for fast human genome and exome analysis for large population projects, critically-ill patients, clinical workflows, and cancer genomics projects. Their work aims to accurately and quickly identify disease-causing variants, keeping pace with accelerated next-generation sequencing as well as accelerated genomic analyses.
Most recently, two peer-reviewed scientific publications in August and September highlight the speed, accuracy, and cost savings of Clara Parabricks for de novo and pathogen workflows.
Genome variant identification to track malaria transmission
Dr. Giovanna Carpi, lead researcher at Purdue University, and her team set out to compare the performance of Clara Parabricks with the methods the malaria community currently uses for variant identification. Using 1,000 malaria genomes, they evaluated how well each approach supports tracking malaria transmission and monitoring antimalarial drug resistance.
Dr. Carpi, who has been researching pathogen genomics for many years, demonstrated a 27x increase in analysis speed and a 5x decrease in cost compared to the conventional CPU pipeline, while delivering 99.9% accuracy. The malaria genome is relatively large (24 Mb) and AT-rich, which makes it quite challenging to analyze. Dr. Carpi used publicly available data from the MalariaGEN consortium, consisting of raw Illumina reads. The research is presented in A GPU-Accelerated Compute Framework for Pathogen Genomic Variant Identification to Aid Genomic Epidemiology of Infectious Disease: A Malaria Case Study, published in Briefings in Bioinformatics.
The ability to sequence and analyze whole-genome pathogens quickly helps public health officials understand the spread of a disease, drug resistance, and also new variants’ transmissibility and severity. The World Health Organization (WHO) reported 241 million cases of malaria in 2020 compared to 227 million cases in 2019, and an estimated 627,000 deaths in 2020—an increase of 69,000 deaths over the previous year.
Malaria is caused by Plasmodium parasites that are transmitted to people through the bites of infected female Anopheles mosquitoes. Africa carries a disproportionately high share of the global malaria burden, with children under five years of age accounting for 80% of the total deaths in the region.
Dr. Carpi noted, “The ability to generate analysis-ready variant outputs in less than five minutes with greater than 99.9% accuracy for large-scale whole-genome Plasmodium studies at lower costs, remarkably reduces the computational bottleneck that most malaria genomics programs currently face, and facilitates decentralized bioinformatics analyses in endemic countries.” Visit malaria-parabricks-pipeline on GitHub to download this Clara Parabricks workflow for malaria and to learn more.
Discovering de novo variants in autism patients
Separately, Dr. Tychele Turner and her team from Washington University in St. Louis developed a fast genomics workflow for discovering de novo variants (DNVs) in autism patients using GPU-accelerated Clara Parabricks. Dr. Turner is a geneticist/genomicist with a deep interest in understanding the genetic architecture of human disease. Her lab is focused on the genomics of neurodevelopmental disorders, optimization of genomic workflows, and application of novel genomic technologies to understand disease. The research is presented in De Novo Variant Calling Identifies Cancer Mutation Signatures in the 1000 Genomes Project, published in Human Mutation.
Dr. Turner worked closely with the NVIDIA genomics team to integrate her trio analysis into NVIDIA Clara Parabricks. Dr. Turner was astonished to see a roughly 100x speedup in turnaround time for a trio analysis using NVIDIA Clara Parabricks. The initial analysis to generate DNVs took 8.5 hours on a server with just four GPUs, versus 800 hours on CPUs. When the team further parallelized the workflow on GPUs, the run time dropped to less than one hour.
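The runtimes quoted above can be checked with a few lines of arithmetic. This is a minimal sketch: the `speedup` helper is a hypothetical name introduced here, and the one-hour figure is only the upper bound stated in the post, so the second result is a lower bound rather than a measured value.

```python
def speedup(baseline_hours: float, accelerated_hours: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated_hours <= 0:
        raise ValueError("accelerated_hours must be positive")
    return baseline_hours / accelerated_hours


# Runtimes quoted in the post for the initial trio analysis.
cpu_hours = 800.0
gpu_hours = 8.5

initial = speedup(cpu_hours, gpu_hours)
print(f"4-GPU server: about {initial:.0f}x faster than CPU")

# The post only says the parallelized run took "less than one hour".
# Using 1.0 hour as an upper bound gives a lower bound on the speedup.
parallel_lower_bound = speedup(cpu_hours, 1.0)
print(f"Parallelized workflow: at least {parallel_lower_bound:.0f}x faster")
```

The first ratio comes out slightly under 100, which is why the speedup is best read as an approximate figure.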
Dr. Turner has focused most of her career on DNVs, which are novel variants present in a child’s DNA but not in either parent’s DNA. DNVs can be assessed by sequencing the DNA from a child and both parents and then comparing the three, which is called a trio analysis. In the general population, each individual has around 40 to 100 DNVs, and most DNVs do not affect genes.
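The core idea of a trio analysis can be sketched as a set difference: keep the variants called in the child that appear in neither parent. This is an illustration only, not the workflow from the paper. The `Variant` type, the `candidate_dnvs` function, and the toy calls below are all hypothetical, and a real pipeline would typically also filter on read depth, genotype quality, and similar evidence before reporting a DNV.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """A simplified variant call (hypothetical structure for illustration)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_dnvs(
    child: Iterable[Variant],
    mother: Iterable[Variant],
    father: Iterable[Variant],
) -> set[Variant]:
    """Return variants seen in the child but in neither parent."""
    return set(child) - set(mother) - set(father)


# Toy data, made up for this example.
mother_calls = [Variant("chr1", 100, "A", "G")]
father_calls = [Variant("chr1", 200, "T", "C")]
child_calls = [
    Variant("chr1", 100, "A", "G"),  # inherited from the mother
    Variant("chr1", 200, "T", "C"),  # inherited from the father
    Variant("chr1", 300, "C", "T"),  # in neither parent: candidate DNV
]

dnvs = candidate_dnvs(child_calls, mother_calls, father_calls)
print(sorted(dnvs))
```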
However, a genetic disease often results when a single nucleotide variant (SNV) in a base pair (A, T, C, G), a small insertion/deletion (indel), or a structural variant (SV) alters a gene and affects the resulting protein production or function. This is the case with some neurodevelopmental disorders, where enrichment of protein-coding DNVs in patients has been identified in phenotypes including autism, epilepsy, intellectual disability, and congenital heart defects.
These fast results held promise not only for scientific discovery but also for Dr. Turner’s vision of same-day clinical results. To confirm the accuracy of the de novo variant calls from the new GPU-based workflow, the team leveraged NVIDIA Clara Parabricks to study a family with monozygotic twins, also known as identical twins, who have the same DNA.
The results showed the same number of DNVs in both the GPU-based and the previous CPU-based workflows, and in both cases about 20% of the DNVs were at CpG sites, indicating that the NVIDIA Clara Parabricks workflow produced equivalent results, but roughly 100x faster. This meant that their autism genomic research could be completed faster, variants could be discovered faster, and, hopefully, insights for patients could be reached faster.
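As a rough illustration of the CpG check mentioned above, the sketch below computes the fraction of variant positions that fall on a CG dinucleotide in a reference sequence. The function names, the toy reference string, and the positions are hypothetical and not taken from the paper; it only looks at the forward strand of a short string and ignores everything a real annotation step would handle.

```python
def is_cpg_site(reference: str, index: int) -> bool:
    """True if the base at 0-based `index` is the C or the G of a CG dinucleotide."""
    seq = reference.upper()
    base = seq[index]
    if base == "C":
        return index + 1 < len(seq) and seq[index + 1] == "G"
    if base == "G":
        return index > 0 and seq[index - 1] == "C"
    return False


def cpg_fraction(reference: str, positions: list[int]) -> float:
    """Fraction of the given positions that sit on a CpG site."""
    if not positions:
        return 0.0
    hits = sum(is_cpg_site(reference, i) for i in positions)
    return hits / len(positions)


# Toy reference and variant positions, made up for this example.
reference = "ACGTTACGGA"
variant_positions = [1, 3, 4, 7, 9]

fraction = cpg_fraction(reference, variant_positions)
print(f"{fraction:.0%} of variants are at CpG sites")
```

Comparing this fraction between two workflows is one simple sanity check that they are calling the same kind of variants.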
Dr. Turner remarked, “Utilization of GPUs is enabling rapid bioinformatic analyses to move forward to a one-hour genomic workup.”
With the new GPU-based DNV genomic analysis workflow, the team proceeded to study sequence data from the 1000 Genomes Project, an international research consortium that has sequenced representative cohorts from African, East Asian, South Asian, and European populations. The 1000 Genomes Project aims to describe and characterize the variations found in human genomes as a basis for investigating the relationship between genetic polymorphisms and phenotypes by sequencing 2,600 individuals from 26 populations from around the world.
Recently, the New York Genome Center sequenced these individuals at high depth and made the data publicly available. The population included 602 trios from families with no autism. This was the first opportunity to look at DNVs in individuals with no known phenotypes, as a control for understanding the level of DNVs in the general population and comparing it with the autism cohort.
The DNV analysis of the 1000 Genomes Project individuals surprised Dr. Turner’s team. They saw a bimodal distribution of the number of DNVs, with one peak at 200, a little larger than expected, and another at 2,000, much larger than expected. Dr. Turner looked at the various cohorts in the 1000 Genomes Project data and noticed that the CEU population, a cohort of European individuals, has been studied for longer, so its cell lines have also been cultured more, potentially leading to more cell line artifacts.
One individual, identified as NA12878 in the cohort, was sequenced multiple times: in 2012, 2013, twice in 2018, and in 2020. Dr. Turner showed that the number of DNVs had increased over time. The 2020 sample had the most DNVs, supporting the conclusion that it contained more cell line artifacts than the 2012 sample. The team concluded that although the 1000 Genomes Project is an excellent source of data for genomic study, it may not be ideal as a filtering dataset for patient controls, due to the prevalence of cell line artifacts.
Though the 1000 Genomes Project provides critical biological and practical insights, only 20% of the children have the expected number of DNVs and considerable evidence indicates that excessive DNVs are cell line artifacts. The excess DNVs match mutation signatures of B-cell lymphoma cancers, demonstrating that cell line artifacts are not accumulating in a random manner.
Protein-coding DNVs are identified in DNA repair genes and may contribute to excess DNVs. The cohort of 602 individuals is significant for protein-coding DNVs in IGLL5, a gene known to have excess mutations in B-cell lymphomas, and the individuals with these DNVs all have more than 100 DNVs. Protein-coding DNVs are also identified at clinically relevant variant sites, which warrants caution in using this data as a binary filtering set for patients. Future studies performing genome sequencing should focus on family-based approaches or use DNA derived directly from blood to build good controls and reference databases.
Dr. Turner commented, “My lab was excited to develop a de novo variant calling workflow that utilizes GPUs which enabled us to quickly analyze nearly 4,800 whole-genome sequenced parent-child trios to gain important biological insights.”
An accelerated suite of tools to power genomic research
Clara Parabricks v4.0 is a more focused genomic analysis toolset than previous versions, with rapid alignment, gold standard processing, and high accuracy variant calling. It offers the flexibility to freely and seamlessly intertwine GPU and CPU tasks and prioritize the GPU-acceleration of the most popular and bottlenecked tools in the genomics workflow. Clara Parabricks can also integrate cutting-edge deep learning approaches in genomics.
You can register to download Clara Parabricks for free. You can also request a free Clara Parabricks NVIDIA LaunchPad Lab demo to experience accelerated industry-standard tools for germline and somatic analysis for an exome and whole genome dataset.
For more information about Clara Parabricks, including technical details on the tools available, check out the Clara Parabricks documentation.
Skycatch, a San Francisco-based startup, has been helping companies mine both data and minerals for nearly a decade. The software-maker is now digging into the creation of digital twins, with an initial focus on the mining and construction industry, using the NVIDIA Omniverse platform for connecting and building custom 3D pipelines. SkyVerse, which is a Read article >
The numbers are in, and they paint a picture of data centers going a deeper shade of green, thanks to energy-efficient networks accelerated with data processing units (DPUs). A suite of tests run with help from Ericsson, RedHat and VMware show power reductions up to 24% on servers using NVIDIA BlueField-2 DPUs. In one case, Read article >
It’s a brand new month, which means this GFN Thursday is all about the new games streaming from the cloud. In November, 26 titles will join the GeForce NOW library. Kick off with 11 additions this week, like Total War: THREE KINGDOMS and new content updates for Genshin Impact and Apex Legends. Plus, leading 5G Read article >
CFO Commentary to Be Provided in Writing Ahead of Call
SANTA CLARA, Calif., Nov. 02, 2022 (GLOBE NEWSWIRE) - NVIDIA will host a conference call on Wednesday, November 16, at 2 p.m. PT (5 p.m. …
Today, NVIDIA announced the long-term support (LTS) release of NVIDIA DOCA 1.5.
NVIDIA DOCA is the open cloud SDK and acceleration framework for NVIDIA BlueField DPUs. It unlocks data center innovation by enabling you to rapidly create applications and services for BlueField DPUs by using industry-standard APIs.
The new NVIDIA DOCA 1.5 release includes several important platform updates, making this an LTS release due to the stability and robustness of the code base. In addition, NVIDIA DOCA now supports NVIDIA ConnectX SmartNICs to simplify the transition from NIC/SmartNIC to the NVIDIA BlueField DPU.
New NVIDIA DOCA 1.5 features focus on adding advanced programmability, security, and functionality to support new storage use cases.
NVIDIA DOCA 1.5 software
Highlights of the NVIDIA DOCA 1.5 release include the following:
Platform updates
Long-term support (LTS) version for BlueField-2
Support for ConnectX SmartNICs (ConnectX-6/7 family)
Advanced programmability
NVIDIA DOCA FLOW, which is a superset of functionality compared to DPDK
New storage use cases:
SHA2 library for hash operations and crypto acceleration
Compression and decompression library
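For readers unfamiliar with SHA-2 hashing, the sketch below shows what a hash operation produces, using Python's standard `hashlib` on the CPU. It does not use the NVIDIA DOCA SHA2 library, whose API is not described in this post; the `sha256_hex` helper is a hypothetical name introduced for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


block_a = b"storage block contents"
block_b = b"storage block contents!"

digest_a = sha256_hex(block_a)
digest_b = sha256_hex(block_b)

# A SHA-256 digest is always 256 bits (64 hex characters), and even a
# one-byte change in the input yields a different digest.
print(digest_a)
print(digest_b)
print("same digest:", digest_a == digest_b)
```

Computing digests like these over large volumes of storage data is the kind of repetitive work that a hardware offload targets.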
The NVIDIA commitment to forward and backward compatibility ensures that applications developed with NVIDIA DOCA will run seamlessly on future versions of the BlueField DPU. They can take advantage of future hardware upgrades for higher performance and increased scale.
Adoption of NVIDIA DOCA has been driven by substantial performance gains and by a dual development approach that lets you build on either NVIDIA DOCA drivers or NVIDIA DOCA libraries.
NVIDIA DOCA drivers provide customization for experienced developers.
NVIDIA DOCA libraries give the best per-use-case performance and scale for those looking for lower coding complexity.
NVIDIA DOCA services and performance enhancements
This release adds live migration support for virtio-blk and support for transitional virtio-blk emulation devices, which makes it possible to support a mix of virtio 0.95 and virtio 1.0 devices simultaneously.
Platform security and cryptography acceleration
Other additions include AppShield for ransomware inspection and a regular expression (regex) library with reference applications to enable the security matching of repeated code and text patterns.
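To make the idea of regex-based security matching concrete, here is a minimal CPU-side sketch using Python's standard `re` module. It is not the NVIDIA DOCA regex library, whose API is not shown in this post, and the signature names and patterns are hypothetical examples made up for illustration.

```python
import re

# Hypothetical signatures, made up for illustration only.
SIGNATURES = {
    "path_traversal": re.compile(r"(?:\.\./){2,}"),
    "sql_injection": re.compile(r"(?i)\bunion\s+select\b"),
}


def match_signatures(payload: str) -> list[str]:
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]


print(match_signatures("GET /../../etc/passwd"))
print(match_signatures("id=1 UNION SELECT password FROM users"))
print(match_signatures("GET /index.html"))
```

Running many such patterns against every payload is expensive on a CPU, which is the motivation for moving the matching onto dedicated hardware.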
A TPM firmware trusted application is designed to support the deployment of sensitive applications on the Arm TrustZone. This adds a level of security that enables the use of hardware keys to authenticate and encrypt data on the DPU Arm cores.
Telemetry aggregator and logging
NVIDIA DOCA now exposes collected telemetry data for logging and metrics. BlueField can be used to sample data on demand and log metrics for later querying by implementing Prometheus, a free software application for event monitoring.
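Prometheus scrapes metrics in a simple text exposition format, with `# HELP` and `# TYPE` lines followed by one sample per line. The sketch below renders a few gauge samples in that format. The `to_prometheus_text` helper, the metric name, and the label values are hypothetical and are not actual DOCA telemetry counters; label values are assumed to need no escaping.

```python
def to_prometheus_text(
    name: str,
    help_text: str,
    samples: list[tuple[dict[str, str], float]],
) -> str:
    """Render gauge samples in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable from run to run.
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric and values, made up for this example.
text = to_prometheus_text(
    "dpu_port_rx_bytes",
    "Bytes received per port (hypothetical metric).",
    [({"port": "p0"}, 1024), ({"port": "p1"}, 2048)],
)
print(text, end="")
```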
NVIDIA DOCA FireFly: A synchronized data center
Precision timing is at the heart of the data center. NVIDIA DOCA FireFly is a timing service for the data center that supports all timing needs in one place. Nanosecond-level clock synchronization enables a new spectrum of timing- and delay-critical applications.
With FireFly, data center timing accuracy improves from milliseconds to nanoseconds, a gain of several orders of magnitude.
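The post does not describe how FireFly synchronizes clocks. As general background, the sketch below shows the two-way timestamp exchange that time protocols such as PTP and NTP use to estimate a clock offset, under the assumption that the network delay is the same in both directions. The function name and the timestamps are hypothetical.

```python
def offset_and_delay(t1: int, t2: int, t3: int, t4: int) -> tuple[int, int]:
    """Estimate clock offset and one-way delay from a two-way exchange.

    t1: leader sends a message (leader clock, ns)
    t2: follower receives it (follower clock, ns)
    t3: follower sends a reply (follower clock, ns)
    t4: leader receives the reply (leader clock, ns)
    Assumes the path delay is symmetric.
    """
    offset = ((t2 - t1) - (t4 - t3)) // 2
    delay = ((t2 - t1) + (t4 - t3)) // 2
    return offset, delay


# Toy timestamps: the follower clock runs 500 ns ahead and the
# one-way path delay is 2,000 ns in each direction.
t1 = 1_000_000
t2 = t1 + 2_000 + 500  # arrival time as read on the follower clock
t3 = 1_010_000
t4 = t3 - 500 + 2_000  # arrival time as read on the leader clock

offset_ns, delay_ns = offset_and_delay(t1, t2, t3, t4)
print(f"offset: {offset_ns} ns, one-way delay: {delay_ns} ns")
```

Any asymmetry between the two directions shows up directly as offset error, which is one reason nanosecond-level accuracy generally depends on hardware timestamping rather than software timestamps.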
With a synchronized data center, you can accelerate globally synchronized data centers, AI, high-performance computing, professional media production, telco virtual network functions, and precise monitoring. All the servers in the data center can be harmonized to provide something that is bigger than a compute node.
Storage acceleration
Storage data compression and decompression are CPU-intensive operations. The NVIDIA DOCA Compression library moves them onto the BlueField DPU. This offloads storage operations from the CPU, freeing up cycles for other compute functions and lowering server TCO.
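The sketch below shows the kind of work being offloaded, using Python's standard `zlib` module on the CPU. It does not use the NVIDIA DOCA Compression library, whose API and algorithm are not described in this post; the helper name and the sample data are hypothetical.

```python
import time
import zlib


def compress_roundtrip(data: bytes, level: int = 6) -> tuple[int, int]:
    """Compress and decompress `data`, returning (original, compressed) sizes."""
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    if restored != data:
        raise RuntimeError("round trip did not restore the original data")
    return len(data), len(compressed)


# Highly repetitive toy data, made up for this example.
block = b"sensor_reading=42;" * 4096

start = time.perf_counter()
original_size, compressed_size = compress_roundtrip(block)
elapsed = time.perf_counter() - start

print(f"{original_size} bytes -> {compressed_size} bytes")
print(f"CPU time spent on the round trip: {elapsed * 1e3:.3f} ms")
```

Every millisecond measured here is CPU time that an offload engine would hand back to the application.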
ConnectX SmartNICs
With added support for NVIDIA ConnectX SmartNICs, the open NVIDIA DOCA software unlocks the benefits of the most comprehensive software APIs, libraries, and services. Developers and IT leaders can foster data center innovation on the most widely deployed, high-performance SmartNICs.
This introduces a broad range of networking, storage, and security capabilities and enhancements to deliver breakthrough performance for software partners, server and storage vendors, end users, and global system integrators. NVIDIA DOCA support for ConnectX SmartNICs helps to speed and simplify the transition from ConnectX SmartNICs to the BlueField DPU.
Open data center innovation
NVIDIA DOCA is built on open APIs such as DPDK for networking, OFED for RDMA, and SPDK for storage. It’s fully compatible with all major operating systems and hypervisors. Applications written with NVIDIA DOCA run on BlueField-2 and future versions of the BlueField DPU.
DOCA Hackathon in China
The recent hackathon in China focused on BlueField DPU innovations that use the NVIDIA DOCA software framework to streamline the development process. Participants continue to find new ways to use the DPU for offloading, accelerating, and isolating a broad range of services. There were 13 teams competing over 24 hours, with four winners announced:
First place: SDIC, Research on RDMA Virtualization based on the BlueField DPU
Second place: Zhindex-Numa, Distributed intelligent key-value storage engine
Third place: Network Needs, Key-value storage acceleration based on DPU cache
Congratulations to the winners and thanks to all the teams that participated in making this NVIDIA DOCA Hackathon such a success!
For more information about how leading companies are using NVIDIA software-defined, hardware-accelerated data center solutions to change the world, see the Modernize Your Data Center with Accelerated Networking free ebook.
Spearheading research in very high-speed silicon nanophotonics/plasmonics, the European plaCMOS project has reached a successful conclusion. The 51-month project explored ferroelectric materials to improve performance and reliability. The team achieved world-leading advancements related to key components used in optical links: modulators, photodiodes, and optical switches.
Modulators using barium-titanate integrated on silicon were demonstrated, and monolithically integrated modulators with BiCMOS drivers were tested up to 187 GBaud. Germanium photodiode designs were generated achieving 3 dB bandwidths up to 265 GHz. Ferroelectric nonvolatile optical BTO switches were demonstrated with 100 states in a closed loop control scheme.
These groundbreaking results have been published in the article A Ferroelectric Multilevel Nonvolatile Photonic Phase Shifter in the journal Nature Photonics. High-profile articles have also appeared in Nature Electronics, Nature Materials, and IEEE/OSA journals.
plaCMOS project overview
The consortium brought together eight partners from industry and academia, all renowned experts in their fields: NVIDIA Mellanox, MICRAM Microelectronic GmbH (now Keysight Technologies), ETH Zurich, IHP Leibniz-institut für innovative mikroelektronik, Aristotle University of Thessaloniki, IBM Research GmbH, Universität des Saarlandes, and Lumiphase AG.
Funding was provided by the European Commission’s Horizon 2020 program for research and innovation and the project was coordinated by Elad Mentovich, Head of the Advanced Development Group at NVIDIA Mellanox.
The innovative technologies developed in plaCMOS provide the foundation for the evolution of optical interconnects in data center networks for the second half of the decade. The team has furthered numerous research fields, including materials engineering and nanofabrication, plasmonic-photonic devices, high-speed analog electronics, and transceiver design.
Related projects
Research on the leading-edge technologies established in plaCMOS continues in the spin-off projects, NEBULA and plasmoniAC. These new projects aim to extend the plaCMOS material platform and investigate new applications of the technology in co-packaged optics, inter-data center coherent links, and optical neuromorphic computing.
Additional resources
For more information, see the articles listed below.