Meet The Spaghetti Detective, an AI-based failure-detection tool for 3D printer remote management and monitoring.
3D printing can be a quick and convenient way to prototype ideas and build useful everyday objects. But it can also be messy—and stressful—when a print job encounters an error that leaves your masterpiece buried in piles of plastic filament. Those tangles of twisted goop are known as “spaghetti monsters,” and they have the power to kill your project and raise your blood pressure.
Thankfully, there is a way to tame these monsters. Meet The Spaghetti Detective (TSD), an AI-based (deep learning) failure-detection tool for 3D printer remote management and monitoring. In other words, with TSD you can detect spaghetti monsters before they get out of hand. It issues an early warning that could save days of work and pounds and pounds of filament.
In fact, according to the team behind TSD, the tool has caught more than 560,000 failed prints by watching more than 47 million hours of 3D project printing time, saving more than 27,500 pounds of filament.
Kenneth Jiang, founder of TSD, reported being “stunned” at just how outdated most 3D-printing software can be. So he and his team created TSD to bring new technologies to the world of 3D printing.
Every part of TSD is open source, including the plug-in, the backend, and the algorithm.
According to a post by Jiang in the NVIDIA Developer Forum, TSD is “based on a Convolutional Neural Network architecture called YOLO. It is essentially a super-fast object-detection model.”
The Spaghetti Detective also communicates with OctoPrint, an open-source web interface for your 3D printer. The private TSD server has an array of advanced settings for all requirements, including enabling NVIDIA GPU acceleration, reverse proxies, NGINX settings, and more.
With more than 600 stars on GitHub, TSD is being used by hundreds of NVIDIA Jetson Nano fans who are also 3D printing enthusiasts. Inspired by their success, Jiang took it upon himself to set up TSD with Jetson Nano, and created a demo to show other users how to set it up.
The project requires an NVIDIA Jetson Nano with 4GB of memory (the team advises against trying this with the 2GB model), an Ethernet cable to connect to your network router, an HDMI cable, a keyboard, and a mouse. TSD is installed using Docker and Docker Compose, and the server delivers notifications by email through SMTP. The web interface is written in Django, and you can log in and create a password-secured account. Notifications from TSD can also be sent by SMS.
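For those who want to self-host, bringing up the TSD server follows the standard Docker Compose pattern. A minimal sketch, assuming the open source server repository under the TheSpaghettiDetective GitHub organization (consult the project's documentation for the authoritative steps):

$ git clone https://github.com/TheSpaghettiDetective/TheSpaghettiDetective.git
$ cd TheSpaghettiDetective
# build and start the web app and supporting services in the background
$ docker-compose up -d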
TSD is available as a free service for occasional 3D-print monitoring. If you expect to be printing daily, there is also a paid option starting at $4 per month.
The team working on TSD also plans to add event-based recording for fluid video capture, improvements to model accuracy and capability, and functionality to enable local hosting for increased data privacy. At the same time, they are clearly having fun figuring it all out, as you can see from their very popular videos on TikTok.
If you are interested in learning more about how Jetson Nano can be used to run The Spaghetti Detective, check out the code in GitHub.
The latest NVIDIA HPC SDK includes a variety of tools to maximize developer productivity, as well as the performance and portability of HPC applications.
Today, NVIDIA announced the upcoming HPC SDK 21.11 release with new library enhancements. This software will be available free of charge in the coming weeks.
The NVIDIA HPC SDK is a comprehensive suite of compilers and libraries for high-performance computing development. It includes a wide variety of tools proven to maximize developer productivity, as well as the performance and portability of HPC applications.
The HPC SDK and its components are updated numerous times per year with new features, performance advancements, and other enhancements.
This 21.11 release will include updates to HPC C++/Fortran compiler support and the developer environment, as well as new multinode, multi-GPU library capabilities.
Introduced last year with version 20.11, the NVFORTRAN compiler automatically parallelizes code written using the DO CONCURRENT standard language feature as described in this post.
New in version 21.11, the programmer can use the REDUCE clause as described in the current working draft of the ISO Fortran Standard to perform reduction operations, a requirement of many scientific algorithms.
Starting with the 21.11 release, the HPC Compilers support the --gcc-toolchain option, similar to the clang-based compilers. This is provided in addition to the existing rc-file method of specifying nondefault GNU Compiler Collection (GCC) versions. The HPC Compilers leverage open source GCC libraries for things like common system operations and C++ standard library support.
Sometimes a developer needs a different version of the GCC toolchain than the system default, and 21.11 provides both command-line and file-based ways of making that specification. In addition to --gcc-toolchain, the 21.11 HPC Compilers add several GCC-compatible command-line flags for specifying x86-64 target architecture details.
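As an illustration, a hypothetical invocation that points the compiler at a nondefault GCC installation might look like the following (the toolchain path is invented for the example):

# compile with the GCC toolchain under /opt/gcc/11.2.0 instead of the system default
$ nvc++ --gcc-toolchain=/opt/gcc/11.2.0 -O3 -o myapp myapp.cpp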
The 21.11 release also includes two new Fortran modules that integrate with NVIDIA libraries, helping Fortran applications maximize the benefit of NVIDIA platforms and Fortran developers be as productive as possible. HPC applications written in Fortran can directly use cuFFTXt, a highly optimized multi-GPU FFT library from NVIDIA. The second module enables easier use of the NVIDIA Tools Extension library (NVTX) for performance and profiling studies with Nsight.
Version 21.11 will ship with CMake config files that define CMake targets for the various components of the HPC SDK. This offers application packagers and developers a more seamless code integration with the NVIDIA HPC SDK.
HPC SDK version 21.11 will include the first of our upcoming multinode, multi-GPU Math Library functionality: cuSOLVERMp. Initial functionality will include Cholesky and LU decomposition, with and without pivoting. Future releases will include LU solves with multiple right-hand sides (RHS).
A partial differential equation is “the most powerful tool humanity has ever created,” Cornell University mathematician Steven Strogatz wrote in a 2009 New York Times opinion piece. This quote opened last week’s GTC talk AI4Science: The Convergence of AI and Scientific Computing, presented by Anima Anandkumar, director of machine learning research at NVIDIA and professor at Caltech.
The post A Revolution in the Making: How AI and Science Can Mitigate Climate Change appeared first on The Official NVIDIA Blog.
Atos and NVIDIA today announced the Excellence AI Lab (EXAIL), which brings together scientists and researchers to help advance European computing technologies, education and research.
NVIDIA cuSPARSELt v0.2 now supports ReLU and GeLU activation functions, bias vectors, and batched Sparse GEMM.
Today, NVIDIA is announcing the availability of cuSPARSELt version 0.2.0, which adds support for activation functions, bias vectors, and batched Sparse GEMM. This software can be downloaded now free of charge.
Download the cuSPARSELt software.
New features include:

- ReLU and GeLU activation functions
- Bias vectors
- Batched Sparse GEMM
- INT8 I/O with INT32 Tensor Core compute kernels

For more technical information, see the cuSPARSELt Release Notes.
NVIDIA cuSPARSELt is a high-performance CUDA library dedicated to general matrix-matrix operations in which at least one operand is a sparse matrix:
D = α·op(A)·op(B) + β·C

In this equation, op(A) and op(B) refer to in-place operations such as transpose and non-transpose.
The cuSPARSELt APIs provide flexibility in the algorithm/operation selection, epilogue, and matrix characteristics, including memory layout, alignment, and data types.
Supported data type combinations:

- FP16 I/O, FP32 Tensor Core accumulate
- BFLOAT16 I/O, FP32 Tensor Core accumulate
- INT8 I/O, INT32 Tensor Core compute
- FP32 I/O, TF32 Tensor Core compute
- TF32 I/O, TF32 Tensor Core compute

New nvCOMP v2.1.0 Library with Redesigned Batch API and Performance Optimizations
Today, NVIDIA is announcing the availability of nvCOMP, version 2.1.0. This software can be downloaded now free of charge.
See the nvCOMP Release Notes for more information, including the list of supported compression algorithms.
nvCOMP is a CUDA library that features generic compression interfaces to enable developers to use high-performance GPU compressors in their applications.
NVIDIA and partners have been working hard to get the NVIDIA Arm HPC Developer Kit units into the hands of developers and enhance the software stack.
In July 2021, NVIDIA announced the availability of the NVIDIA Arm HPC Developer Kit for preordering, along with the NVIDIA HPC SDK. Since then, NVIDIA and its partners have been working hard to get units into the hands of developers, to increase global availability, and to enhance the software stack.
The NVIDIA Arm HPC Developer Kit is based on the GIGABYTE G242-P32 2U server. It includes an Arm CPU, two A100 GPUs, two NVIDIA BlueField-2 data processing units (DPUs), and the NVIDIA HPC SDK suite of tools.
The kit supports both single-node and multinode configurations. Units are available to order for global delivery through GIGABYTE.
The first units are already being used at sites, including Los Alamos National Laboratory (LANL), the University of Leicester, Oak Ridge National Laboratory (ORNL), and the National Center for High-performance Computing (NCHC) in Taiwan. They have successfully deployed multinode configurations and opened the systems to users to run HPC codes.
“Los Alamos National Laboratory has a broad set of requirements related to our national security mission spaces. With this as a backdrop, we evaluate, deploy, and integrate many advanced technologies into our ecosystem. The consistent goal of these technologies is to improve our responses to mission requirements.
“As part of our 2023 HPE/NVIDIA system, which will utilize NVIDIA’s Grace Arm-based CPU, Los Alamos has been working with the Arm ecosystem software and hardware. With that in mind, we have already deployed early development test systems where we see good success migrating and developing new codes. One such code, which we are actively codesigning both HW and SW, is an astrophysics code-named Phoebus.” – Steve Poole, Chief Architect at LANL.
“The University of Leicester, thanks to the contribution of the ExCALIBUR Hardware and Enabling Software Programme and the STFC DiRAC HPC facility, has recently completed the deployment of 4x NVIDIA Arm HPC Developer Kits, accessible to all UK developers interested in testing, porting, and optimizing strategic UK applications on the Ampere Computing Altra CPU and NVIDIA A100 GPU.
“The UK remains at the forefront of computing thanks to initiatives like ExCALIBUR. The addition of this accelerated Arm-based system opens new opportunities to evaluate the role of accelerators in a fast-growing and diversified Arm HPC ecosystem. We welcome the close partnership of NVIDIA in pushing the ecosystem forward into the next era of accelerated computing.” – Mark Wilkinson, Professor of Theoretical Astrophysics and Director at DiRAC HPC Facility.
“Here at ORNL, we are looking forward to working with NVIDIA to explore the deployment of a wide array of applications on the NVIDIA Arm HPC developer kit as performance portability continues to gain prominence in HPC.” – Ross Miller, Systems Integration Programmer in the National Center for Computational Sciences at ORNL.
NVIDIA continues to make rapid progress on enhancing the HPC SDK and supporting its full stack of ML tools on Arm. Separate from the HPC SDK, NVIDIA is announcing support for two of the most popular deep learning frameworks: PyTorch and TensorFlow.
In addition, the RAPIDS suite of software libraries and the NVIDIA Triton Inference Server will be available on Arm by the end of the year.
The NVIDIA Arm HPC Developer Kit is the first step in enabling an Arm HPC ecosystem with GPU acceleration. NVIDIA is committed to full support for Arm for HPC and AI applications.
Deep learning has successfully been applied to a wide range of important challenges, such as cancer prevention and increasing accessibility. The application of deep learning models to weather forecasts can be relevant to people on a day-to-day basis, from helping people plan their day to managing food production, transportation systems, or the energy grid. Weather forecasts typically rely on traditional physics-based techniques powered by the world’s largest supercomputers. Such methods are constrained by high computational requirements and are sensitive to approximations of the physical laws on which they are based.
Deep learning offers a new approach to computing forecasts. Rather than incorporating explicit physical laws, deep learning models learn to predict weather patterns directly from observed data and are able to compute predictions faster than physics-based techniques. These approaches also have the potential to increase the frequency, scope, and accuracy of the predicted forecasts.
Within weather forecasting, deep learning techniques have shown particular promise for nowcasting — i.e., predicting weather up to 2-6 hours ahead. Previous work has focused on using direct neural network models for weather data, extending neural forecasts from 0 to 8 hours with the MetNet architecture, generating continuations of radar data for up to 90 minutes ahead, and interpreting the weather information learned by these neural networks. Still, there is an opportunity for deep learning to extend improvements to longer-range forecasts.
To that end, in “Skillful Twelve Hour Precipitation Forecasts Using Large Context Neural Networks”, we push the forecasting boundaries of our neural precipitation model to 12-hour predictions while keeping a spatial resolution of 1 km and a time resolution of 2 minutes. By quadrupling the input context, adopting a richer weather input state, and extending the architecture to capture longer-range spatial dependencies, MetNet-2 substantially improves on the performance of its predecessor, MetNet. Compared to physics-based models, MetNet-2 outperforms the state-of-the-art HREF ensemble model for weather forecasts up to 12 hours ahead.
MetNet-2 Features and Architecture
Neural weather models like MetNet-2 map observations of the Earth to the probability of weather events, such as the likelihood of rain over a city in the afternoon, of wind gusts reaching 20 knots, or of a sunny day ahead. End-to-end deep learning has the potential to both streamline and increase quality by directly connecting a system’s inputs and outputs. With this in mind, MetNet-2 aims to minimize both the complexity and the total number of steps involved in creating a forecast.
The inputs to MetNet-2 include the radar and satellite images also used in MetNet. To capture a more comprehensive snapshot of the atmosphere with information such as temperature, humidity, and wind direction — critical for longer forecasts of up to 12 hours — MetNet-2 also uses the pre-processed starting state used in physical models as a proxy for this additional weather information. The radar-based measures of precipitation (MRMS) serve as the ground truth (i.e., what we are trying to predict) that we use in training to optimize MetNet-2’s parameters.
Example ground truth image: Instantaneous precipitation (mm/hr) based on radar (MRMS), capturing a 12-hour-long progression.
MetNet-2’s probabilistic forecasts can be viewed as averaging all possible future weather conditions weighted by how likely they are. Due to its probabilistic nature, MetNet-2 can be likened to physics-based ensemble models, which average some number of future weather conditions predicted by a variety of physics-based models. One notable difference between these two approaches is the duration of the core part of the computation: ensemble models take ~1 hour, whereas MetNet-2 takes ~1 second.
Steps in a MetNet-2 forecast and in a physics-based ensemble.
One of the main challenges that MetNet-2 must overcome to make 12-hour-long forecasts is capturing a sufficient amount of spatial context in the input images. For each additional forecast hour, we include 64 km of context in every direction at the input. This results in an input context of size 2048 km × 2048 km — four times that used in MetNet. In order to process such a large context, MetNet-2 employs model parallelism, whereby the model is distributed across the 128 cores of a Cloud TPU v3-128. Due to the size of the input context, MetNet-2 replaces the attentional layers of MetNet with computationally more efficient convolutional layers. But standard convolutional layers have local receptive fields that may fail to capture large spatial contexts, so MetNet-2 uses dilated receptive fields, whose size doubles layer after layer, in order to connect points in the input that are far apart from one another.
Example of input spatial context and target area for MetNet-2.
Results
Because MetNet-2’s predictions are probabilistic, the model’s output is naturally compared with the output of similarly probabilistic ensemble or post-processing models. HREF is one such state-of-the-art ensemble model for precipitation in the United States, which aggregates ten predictions from five different models, twice a day. We evaluate the forecasts using established metrics, such as the Continuous Ranked Probability Score, which captures the magnitude of the probabilistic error of a model’s forecasts relative to the ground truth observations. Despite not performing any physics-based calculations, MetNet-2 is able to outperform HREF up to 12 hours into the future for both low and high levels of precipitation.
Continuous Ranked Probability Score (CRPS; lower is better) for MetNet-2 vs HREF, aggregated over a large number of test patches randomly located in the Continental United States.
Examples of Forecasts
The following figures provide a selection of forecasts from MetNet-2 compared with the physics-based ensemble HREF and the ground truth MRMS.
Comparison of 0.2 mm/hr precipitation on March 30, 2020 over Denver, Colorado. Left: Ground truth, source MRMS. Center: Probability map as predicted by MetNet-2. Right: Probability map as predicted by HREF. MetNet-2 is able to predict the onset of the storm (called convective initiation) earlier in the forecast than HREF, as well as the storm's starting location, whereas HREF misses the initiation location but captures its growth phase well.
Interpreting What MetNet-2 Learns About Weather
Because MetNet-2 does not use hand-crafted physical equations, its performance inspires a natural question: What kind of physical relations about the weather does it learn from the data during training? Using advanced interpretability tools, we further trace the impact of various input features on MetNet-2’s performance at different forecast timelines. Perhaps the most surprising finding is that MetNet-2 appears to emulate the physics described by Quasi-Geostrophic Theory, which is used as an effective approximation of large-scale weather phenomena. MetNet-2 was able to pick up on changes in the atmospheric forces, at the scale of a typical high- or low-pressure system (i.e., the synoptic scale), that bring about favorable conditions for precipitation, a key tenet of the theory.
Conclusion
MetNet-2 represents a step toward enabling a new modeling paradigm for weather forecasting that does not rely on hand-coding the physics of weather phenomena, but rather embraces end-to-end learning from observations to weather targets and parallel forecasting on low-precision hardware. Yet many challenges remain on the path to fully achieving this goal, including incorporating more raw data about the atmosphere directly (rather than using the pre-processed starting state from physical models), broadening the set of weather phenomena, increasing the lead time horizon to days and weeks, and widening the geographic coverage beyond the United States.
Acknowledgements
Shreya Agrawal, Casper Sønderby, Manoj Kumar, Jonathan Heek, Carla Bromberg, Cenk Gazen, Jason Hickey, Aaron Bell, Marcin Andrychowicz, Amy McGovern, Rob Carver, Stephan Hoyer, Zack Ontiveros, Lak Lakshmanan, David McPeek, Ian Gonzalez, Claudio Martella, Samier Merchant, Fred Zyda, Daniel Furrer and Tom Small.
Deep Potential, an AI neural network force field, combines the speed of classical MD simulation with the accuracy of DFT calculation.
Molecular simulation communities have faced the accuracy-versus-efficiency dilemma in modeling the potential energy surface and interatomic forces for decades. Deep Potential, the artificial neural network force field, solves this problem by combining the speed of classical molecular dynamics (MD) simulation with the accuracy of density functional theory (DFT) calculation [1]. This is achieved by using the GPU-optimized package DeePMD-kit, a deep learning package for many-body potential energy representation and MD simulation [2].
This post provides an end-to-end demonstration of training a neural network potential for the 2D material graphene and using it to drive MD simulation in the open-source platform Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) [3]. Training data can be obtained either from the Vienna Ab initio Simulation Package (VASP) [4] or Quantum ESPRESSO (QE) [5].
A seamless integration of molecular modeling, machine learning, and high-performance computing (HPC) is demonstrated, combining the efficiency of molecular dynamics with ab initio accuracy in a workflow driven entirely through containers. By using AI techniques to fit the interatomic forces generated by DFT, the accessible time and size scales can be boosted several orders of magnitude with linear scaling.
Deep potential is essentially a combination of machine learning and physical principles, which together start a new computing paradigm, as shown in Figure 1.
The entire workflow is shown in Figure 2. The data generation step is done with VASP and QE. The data preparation, model training, testing, and compression steps are done using DeePMD-kit. The model deployment is in LAMMPS.
A container is a portable unit of software that combines the application and all its dependencies into a single package that is agnostic to the underlying host OS.
The workflow in this post involves AIMD, DP training, and LAMMPS MD simulation. It is nontrivial and time-consuming to install each software package from source with the correct setup of compiler, MPI, GPU library, and optimization flags.
Containers solve this problem by providing a highly optimized, GPU-enabled computing environment for each step, and they eliminate the time needed to install and test software.
The NGC catalog, a hub of GPU-optimized HPC and AI software, carries a wide range of HPC and AI containers that can be readily deployed on any GPU system. The HPC and AI containers from the NGC catalog are updated frequently and are tested for reliability and performance — necessary to speed up the time to solution.
These containers are also scanned for Common Vulnerabilities and Exposure (CVEs), ensuring that they are devoid of any open ports and malware. Additionally, the HPC containers support both Docker and Singularity runtimes, and can be deployed on multi-GPU and multinode systems running in the cloud or on-premises.
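For example, the DeePMD-kit container used throughout the walkthrough below can be pulled once from the NGC catalog and reused. This is a standard Singularity command, with the image tag matching the commands that follow:

# fetch the image once; later commands may also reference the docker:// URI directly
$ singularity pull deepmd-kit.sif docker://nvcr.io/hpc/deepmd-kit:v2.0.3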
The first step in the simulation is data generation. We will show you how you can use VASP and Quantum ESPRESSO to run AIMD simulations and generate training datasets for DeePMD. All input files can be downloaded from the GitHub repository using the following command:
git clone https://github.com/deepmodeling/SC21_DP_Tutorial.git
A two-dimensional graphene system with 98 atoms is used, as shown in Figure 3 [6]. To generate the training datasets, a 0.5 ps NVT AIMD simulation at 300 K is performed with a time step of 0.5 fs. The DP model is created using the 1000 time steps of this 0.5 ps MD trajectory at a fixed temperature.
Due to the short simulation time, the training dataset contains consecutive system snapshots, which are highly correlated. Generally, the training dataset should be sampled from uncorrelated snapshots with various system conditions and configurations. For this example, we used a simplified training data scheme. For production DP training, using DP-GEN is recommended to utilize the concurrent learning scheme and efficiently explore more combinations of conditions [7].
The projector augmented-wave pseudopotentials are employed to describe the interactions between the valence electrons and frozen cores, together with the generalized gradient approximation exchange-correlation functional of Perdew-Burke-Ernzerhof. Only the Γ-point was used for k-space sampling in all systems.
The AIMD simulation can also be carried out using Quantum ESPRESSO, available as a container from the NGC Catalog. Quantum ESPRESSO is an integrated suite of open-source computer codes for electronic-structure calculations and materials modeling at the nanoscale based on density-functional theory, plane waves, and pseudopotentials. The same graphene structure is used in the QE calculations. The following command can be used to start the AIMD simulation:
$ singularity exec --nv docker://nvcr.io/hpc/quantum_espresso:qe-6.8 cp.x
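In practice, cp.x reads its settings from an input deck. Assuming the input file in the cloned tutorial repository is named graphene.in (the name here is illustrative), a complete invocation would look like:

$ cd SC21_DP_Tutorial/AIMD/QE/
# -in selects the input deck; stdout carries the AIMD log
$ singularity exec --nv docker://nvcr.io/hpc/quantum_espresso:qe-6.8 cp.x -in graphene.in > graphene.out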
Once the training data is obtained from the AIMD simulation, we want to convert its format using dpdata so that it can be used as input to the deep neural network. The dpdata package is a format conversion toolkit between AIMD, classical MD, and DeePMD-kit.
You can use this convenient tool to convert data directly from the output of first-principles packages to the DeePMD-kit format. For deep potential training, the following information about a physical system has to be provided: atom type, box boundary, coordinates, forces, virial, and system energy.
A snapshot, or a frame of the system, contains all these data points for all atoms at one time step, and can be stored in two formats: raw and npy.
The first format, raw, is plain text with all information in one file, where each line of the file represents a snapshot. Different system information is stored in different files named box.raw, coord.raw, force.raw, energy.raw, and virial.raw. We recommend following these naming conventions when preparing the training files.
An example of force.raw:
$ cat force.raw
-0.724 2.039 -0.951 0.841 -0.464 0.363
6.737 1.554 -5.587 -2.803 0.062 2.222
-1.968 -0.163 1.020 -0.225 -0.789 0.343
This force.raw file contains three frames, with each frame holding the forces of two atoms, resulting in three lines and six columns. Each line provides all three force components of two atoms in one frame: the first three numbers are the force components of the first atom, and the next three numbers are those of the second atom.
The coordinate file coord.raw is organized similarly. In box.raw, the nine components of the box vectors should be provided on each line, and in virial.raw, the nine components of the virial tensor should be provided on each line in the order XX XY XZ YX YY YZ ZX ZY ZZ. The number of lines of all raw files should be identical. We assume that the atom types do not change across frames; they are provided by type.raw, which has one line with the types of the atoms written one by one. The atom types should be integers. For example, here is the type.raw of a system that has two atoms with types zero and one:
$ cat type.raw
0 1
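Because every raw file must contain exactly one line per frame, a quick sanity check is to compare line counts. A hypothetical session for the three-frame example above:

$ wc -l box.raw coord.raw energy.raw force.raw virial.raw
  3 box.raw
  3 coord.raw
  3 energy.raw
  3 force.raw
  3 virial.raw
 15 total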
It is not a requirement to convert the data format to raw, but this process should give a sense of the types of data that can be used as inputs to DeePMD-kit for training.
The easiest way to convert the first-principles results to the training data is to save them as numpy binary data.
For VASP output, we have prepared an outcartodata.py script to process the VASP OUTCAR file. Run the following commands:
$ cd SC21_DP_Tutorial/AIMD/VASP/
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 python outcartodata.py
$ mv deepmd_data ../../DP/
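Under the hood, a conversion script like outcartodata.py can be little more than a thin wrapper around dpdata. Here is a minimal sketch, assuming dpdata's LabeledSystem API; this simplified version writes all five sets under a single system directory, whereas the tutorial script distributes them into the 0/ through 4/ folders:

$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 python - <<'EOF'
import dpdata
# parse every frame of the AIMD trajectory from the VASP OUTCAR file
frames = dpdata.LabeledSystem('OUTCAR', fmt='vasp/outcar')
# write DeePMD-kit npy sets with 200 frames each: set.000 ... set.004
frames.to_deepmd_npy('deepmd_data', set_size=200)
EOF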
For QE output:
$ cd SC21_DP_Tutorial/AIMD/QE/
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 python logtodata.py
$ mv deepmd_data ../../DP/
A folder called deepmd_data is generated and moved to the training directory. It contains five sets, 0/set.000 through 4/set.000, with each set containing 200 frames. You do not need to handle the binary data files inside each set.* directory yourself. The path containing the set.* folder and the type.raw file is called a system. If you want to train a nonperiodic system, an empty nopbc file should be placed under the system directory; box.raw is not necessary for a nonperiodic system.
We are going to use three of the five sets for training, one for validating, and the remaining one for testing.
The input of the deep potential model is a descriptor vector containing the system information mentioned previously. The neural network contains several hidden layers composed of linear and nonlinear transformations. In this post, a three-layer neural network with 25, 50, and 100 neurons in its successive layers is used. The target value, or label, for the neural network to learn is the atomic energies. The training process optimizes the weights and the bias vectors by minimizing the loss function.
The training is initiated by the following command, where input.json contains the training parameters:
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp train input.json
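For orientation, a heavily abridged input.json is sketched below. The keys follow DeePMD-kit's documented schema, but the specific values (cutoff radius, fitting-net width, step count) are illustrative assumptions rather than the tutorial's exact settings; the 25/50/100 neurons appear in the descriptor's embedding network:

$ cat input.json
{
  "model": {
    "type_map": ["C"],
    "descriptor": {
      "type": "se_e2_a",
      "sel": "auto",
      "rcut": 6.0,
      "neuron": [25, 50, 100]
    },
    "fitting_net": { "neuron": [240, 240, 240] }
  },
  "learning_rate": { "type": "exp", "start_lr": 0.001 },
  "loss": { "type": "ener" },
  "training": {
    "training_data": { "systems": ["deepmd_data/0", "deepmd_data/1", "deepmd_data/2"], "batch_size": "auto" },
    "validation_data": { "systems": ["deepmd_data/3"], "batch_size": 1 },
    "numb_steps": 500000,
    "disp_freq": 100,
    "disp_file": "lcurve.out"
  }
}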
The DeePMD-kit prints detailed information on the training and validation data sets. The data sets are determined by training_data and validation_data as defined in the training section of the input script. The training data set is composed of three data systems, while the validation data set is composed of one data system. The number of atoms, batch size, number of batches in the system, and the probability of using the system are all shown in Figure 4. The last column indicates whether the periodic boundary condition is assumed for the system.
During the training, the error of the model is tested every disp_freq training steps with the batch used to train the model and with numb_btch batches from the validation data. The training error and validation error are printed correspondingly in the file disp_file (the default is lcurve.out). The batch size can be set in the input script by the key batch_size in the corresponding sections for the training and validation data sets.
An example of the output:
# step rmse_val rmse_trn rmse_e_val rmse_e_trn rmse_f_val rmse_f_trn lr
0 3.33e+01 3.41e+01 1.03e+01 1.03e+01 8.39e-01 8.72e-01 1.0e-03
100 2.57e+01 2.56e+01 1.87e+00 1.88e+00 8.03e-01 8.02e-01 1.0e-03
200 2.45e+01 2.56e+01 2.26e-01 2.21e-01 7.73e-01 8.10e-01 1.0e-03
300 1.62e+01 1.66e+01 5.01e-02 4.46e-02 5.11e-01 5.26e-01 1.0e-03
400 1.36e+01 1.32e+01 1.07e-02 2.07e-03 4.29e-01 4.19e-01 1.0e-03
500 1.07e+01 1.05e+01 2.45e-03 4.11e-03 3.38e-01 3.31e-01 1.0e-03
The training error reduces monotonically with training steps as shown in Figure 5. The trained model is tested on the test dataset and compared with the AIMD simulation results. The test command is:
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp test -m frozen_model.pb -s deepmd_data/4/ -n 200 -d detail.out
The results are shown in Figure 6.
After the model has been trained, a frozen model is generated for inference in MD simulation. The process of saving the neural network from a checkpoint is called “freezing” the model:
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp freeze -o graphene.pb
After the frozen model is generated, the model can be compressed without sacrificing its accuracy while greatly speeding up inference performance in MD. Depending on the simulation and training setup, model compression can boost performance by 10X and reduce memory consumption by 20X when running on GPUs.
The frozen model can be compressed using the following command, where -i refers to the frozen model and -o points to the output name of the compressed model:
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 dp compress -i graphene.pb -o graphene-compress.pb
A new pair style has been implemented in LAMMPS to deploy the neural network trained in the prior steps. For users familiar with the LAMMPS workflow, only minimal changes are needed to switch to deep potential. For instance, a traditional LAMMPS input with the Tersoff potential has the following setting for potential setup:
pair_style tersoff
pair_coeff * * BNC.tersoff C
To use deep potential, replace the previous lines with:
pair_style deepmd graphene-compress.pb
pair_coeff * *
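With the input prepared, the run can be launched from the same NGC container used for training. A hedged sketch, assuming the container's LAMMPS build is exposed as lmp and an input file named graphene.in (both names are illustrative):

# -in selects the LAMMPS input script
$ singularity exec --nv docker://nvcr.io/hpc/deepmd-kit:v2.0.3 lmp -in graphene.in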
Note the following about the input file:

- The pair_style command uses the DeePMD model to describe the atomic interactions in the graphene system.
- The graphene-compress.pb file represents the frozen and compressed model for inference.
- Periodic boundary conditions are applied in the x- and y-directions, and a free boundary is applied in the z-direction.

The system configuration after NVT relaxation is shown in Figure 7. It can be observed that the deep potential can describe the atomic structures with small ripples in the cross-plane direction. After 10 ps of NVT relaxation, the system is placed under the NVE ensemble to check system stability.
The system temperature is shown in Figure 8.
To validate the accuracy of the trained DP model, the radial distribution functions (RDF) calculated from AIMD, DP, and Tersoff are plotted in Figure 9. The DP model-generated RDF is very close to that of AIMD, which indicates that the crystalline structure of graphene is well represented by the DP model.
This post demonstrates a simple case study of graphene under the given conditions. The DeePMD-kit package streamlines the workflow from AIMD to classical MD with deep potential, combining ab initio accuracy with the efficiency of classical molecular dynamics.
Furthermore, the use of GPU-optimized containers from the NGC catalog simplifies and accelerates the overall workflow by eliminating the steps needed to install and configure software. To train a comprehensive model for other applications, download the DeePMD-kit container from the NGC catalog.
We thank Dr. Chunyi Zhang from Temple University; Dan Han and Dr. Xinyu Wang from Shandong University; and Dr. Linfeng Zhang, Yuzhi Zhang, Jinzhe Zeng, Duo Zhang, and Fengbo Yuan from the DeepModeling community for helpful discussions.
[1] Jia W, Wang H, Chen M, Lu D, Lin L, Car R, E W and Zhang L 2020 Pushing the limit of molecular dynamics with ab initio accuracy to 100 million atoms with machine learning IEEE Press 5 1-14
[2] Wang H, Zhang L, Han J and E W 2018 DeePMD-kit: A deep learning package for many-body potential energy representation and molecular dynamics Computer Physics Communications 228 178-84
[3] Plimpton S 1995 Fast Parallel Algorithms for Short-Range Molecular Dynamics Journal of Computational Physics 117 1-19
[4] Kresse G and Hafner J 1993 Ab initio molecular dynamics for liquid metals Physical Review B 47 558-61
[5] Giannozzi P, Baroni S, Bonini N, Calandra M, Car R, Cavazzoni C, Ceresoli D, Chiarotti G L, Cococcioni M, Dabo I, Dal Corso A, de Gironcoli S, Fabris S, Fratesi G, Gebauer R, Gerstmann U, Gougoussis C, Kokalj A, Lazzeri M, Martin-Samos L, Marzari N, Mauri F, Mazzarello R, Paolini S, Pasquarello A, Paulatto L, Sbraccia C, Scandolo S, Sclauzero G, Seitsonen A P, Smogunov A, Umari P and Wentzcovitch R M 2009 QUANTUM ESPRESSO: a modular and open-source software project for quantum simulations of materials Journal of Physics: Condensed Matter 21 395502
[6] Humphrey W, Dalke A and Schulten K 1996 VMD: Visual molecular dynamics Journal of Molecular Graphics 14 33-8
[7] Yuzhi Zhang, Haidi Wang, Weijie Chen, Jinzhe Zeng, Linfeng Zhang, Han Wang, and Weinan E, DP-GEN: A concurrent learning platform for the generation of reliable deep learning based potential energy models, Computer Physics Communications, 2020, 107206.