NVIDIA today reported record revenue for the third quarter ended October 31, 2021, of $7.10 billion, up 50 percent from a year earlier and up 9 percent from the previous quarter, with record revenue from the company’s Gaming, Data Center and Professional Visualization market platforms.
The most important soft skill for ML practitioners and Data Scientists
Editor’s Note: If you’re interested in sharing your data science and AI expertise, you can apply to write for our blog here.
Data Science as a discipline and profession demands that its practitioners possess various skills, ranging from soft skills such as communication and leadership to hard skills such as deductive reasoning, algorithmic thinking, programming, and so on. But there’s a crucial skill that should be attained by Data Scientists, irrespective of their experience, and that is writing.
Even Data Scientists working in technical fields such as quantum computing or healthcare research need to write. It takes time to develop strong writing ability, and there are challenges that Data Scientists confront that might prevent them from expressing their thoughts easily. That’s why this article contains a variety of writing strategies and explanations of how they benefit Data Science and Machine Learning professionals.
1. Short-form writing
Let’s start with the most common and accessible styles of writing we encounter. Writing in short form is typically low effort and doesn’t take up too much time. Machine Learning and Data Science content written on Twitter, LinkedIn, Facebook, Quora, and StackOverflow all falls into this category.

Long-form content, such as books, articles, and essays, is usually the most valuable material in the ML field. All of it requires time to write, read, and analyze. Short-form content on social media platforms, on the other hand, can provide information while taking far less effort and time than long-form content.
Currently, we have the privilege of witnessing discourse and ideas shared between AI pioneers and reputable machine learning practitioners, without having to wait for them to write and publish a research paper or an essay. Writing short-form posts on social media platforms provides insight into opinions and views that are not easily expressed verbally, and it lets you participate and share your own opinions.
For those who want to experiment with connecting with other ML experts through social media postings, I recommend following some people who post genuine and relevant information about Machine learning and Data Science. Take some time to read the tone of the discussions and contributions on posts, and if you have anything valuable to contribute, speak up.
To get you started, here is a list of individuals that post AI-related content (among other interesting things): Andrew Ng, Geoffrey Hinton, Allie K. Miller, Andrej Karpathy, Jeremy Howard, Francois Chollet, Aurélien Geron, Lex Fridman. There are plenty more individuals to follow, but content from these individuals should keep you busy for a while.
Question/Answer platforms
Questions/Answers as a form of writing has the lowest entry barrier and does not consume as much time, depending on your ability to answer proposed questions.
Given your profession, I’m sure you’ve heard of StackOverflow, the internet’s most popular resource for engineers. When it comes to asking questions on StackOverflow, things aren’t as simple; clarity and transparency are required. Writing queries properly is such an important component of StackOverflow that they’ve published a comprehensive guide on the subject.
Here’s the key takeaway in this section: asking and answering questions on StackOverflow helps you become concise and clear when posing queries, as well as thorough when responding.
2. Emails and Messages

Writing emails and messages is nothing specific to machine learning, but Data Scientists and Machine-Learning practitioners who practice the art of composing effective messages tend to flourish within corporations and teams, for obvious reasons: it helps them contribute, network, and get things done.
Composing well-written messages and emails can land you a new role, get your project funded, or get you into an academic institution. Purvanshi Mehta wrote an article that explores effective methods of cold messaging individuals on LinkedIn to build networks. Purvanshi’s article is a step-by-step guide to adoptable cold-messaging etiquette.
3. Blogs and Articles
Many experts believe that blogs and articles have a unique role in the machine learning community. Articles are how professionals stay up to date on software releases, learn new methods, and communicate ideas.
Technical and non-technical ML articles are the two most frequent sorts of articles you’ll encounter. Technical articles are composed of descriptive text coupled with code snippets or gists that describe the implementation of particular features. Non-technical articles include more descriptive language and pictures to illustrate ideas and concepts.
4. Newsletters
Starting and maintaining a newsletter might not be for every Data Scientist, but this sort of writing has been shown to provide professional and financial advantages to those who are willing to put in the effort.
A newsletter is a key strategic play for DS/ML professionals to increase awareness and presence in the AI sector. A newsletter’s writing style is not defined, so you may write it however you choose. You might start a formal, lengthy, and serious newsletter or a short, informative, and funny one.
The lesson to be drawn from this is that creating a newsletter may help you develop a personal brand in your field, business, or organization. Those who like what you do will continue to consume and promote your material.
There are a thousand reasons you could give for not starting a newsletter today, but to spark some inspiration, below are some ideas you can base your newsletter on, and I’ve also included some AI newsletters you should subscribe to.
Newsletter Ideas related to AI:
- A collection of AI/ML videos to watch, with your input on each video.
- A collection of AI/ML articles to read.
- Job postings in your areas that job seekers might be interested in.
- Up-to-date relevant AI news for ML practitioners interested in the more practical application of AI.
Remember that the frequency, length, and content of your newsletter are all defined by you. You could start a monthly newsletter if you feel you don’t have much time or a daily newsletter to churn out content like a machine.
Machine Learning and Data Science newsletters to subscribe to:
- The Batch by Andrew Ng
- Data Dribble by Ken Jee
- O’Reilly AI Newsletter
- Daniel Bourke’s Newsletter
- Data Science Weekly
- Data Elixir
5. Documentation
Documentation, both technical and non-technical, is a common activity among software engineering occupations. Data Scientists are not exempt from the norm, and documentation that explains software code or individual features is recommended and considered best practice.
When is a project successful? Some might say it’s when your model achieves an acceptable accuracy on a test dataset.
Experienced Data Scientists understand that project success is influenced by a number of variables, including software maintainability, longevity, and knowledge transfer. Software documentation is a task that can improve the prospects of a project beyond the capabilities of a single team member; not to mention, it provides an extra layer of software quality and maintainability.
One of the main advantages of documentation that Data Scientists should be aware of is its role in reducing queries concerning source code from new project members or novice Data Analysts. The majority of questions about source code are concerned with file locations, coding standards and best practices. This data can all be recorded once and referenced by many individuals.
Here are some ideas for items you could document:
- Code Documentation: It’s critical to standardize implementation style and format in order to guarantee uniformity across applications. This conformity makes the transition for new developers into a codebase easier, since coding standards are conveyed through code documentation (a minimal example follows this list).
- Research and Analysis: Given the importance of software product features, successful development always depends on thorough study and analysis. Any ML expert who has worked on a project from the start will have handled a plethora of feature requests from stakeholders. Documenting information surrounding feature requests enables other parties involved in the project to get a more straightforward overview of the requirement and usefulness of the proposed feature. It also pushes the feature requester to conduct better research and analysis.
- Database Configurations / Application Information: Documenting information particular to applications, such as configuration parameters and environment variables, is critical for any software team, especially if you move to a new job or company.
- How-tos: Installing software libraries and packages can be difficult, and there are often different installation processes for different operating systems or even versions. It’s not uncommon to discover missing dependencies in official library documentation and quirks you must work around to install the program.
- API Documentation: When teams develop internal and external APIs (Application Programming Interfaces), they should document the components of methods, functions, and data resources needed by those APIs. There’s nothing more annoying than working with a non-documented API; the whole process becomes a guessing game, and you’ll spend time researching the parameters, internal workings, and outputs of an undocumented API. Save your team and clients time by creating a smooth experience when consuming the technical resources you make.
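To make the code documentation item above concrete, here is a minimal, hypothetical sketch (the function, its parameters, and its behavior are invented for illustration) of the kind of docstring that records purpose, inputs, and outputs so that new project members don’t have to reverse-engineer the code:

```python
def normalize_features(df, columns, method="z-score"):
    """Scale the selected columns of a feature table.

    Parameters
    ----------
    df : pandas.DataFrame
        Raw feature table produced by the ingestion step.
    columns : list of str
        Names of the numeric columns to scale.
    method : {"z-score", "min-max"}, optional
        Scaling strategy; defaults to z-score standardization.

    Returns
    -------
    pandas.DataFrame
        Copy of ``df`` with the selected columns scaled.
    """
    out = df.copy()
    for col in columns:
        if method == "z-score":
            out[col] = (out[col] - out[col].mean()) / out[col].std()
        else:  # "min-max"
            out[col] = (out[col] - out[col].min()) / (out[col].max() - out[col].min())
    return out
```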
There’s no question that extensive resources allow organizations to produce many types of documentation, and some even hire technical writers. Although those are all viable options, machine learning experts who wish to take software completeness seriously should still practice documenting the programs and software they develop, to show that they can provide thorough explanations.
A quick Google search on “how to write good software documentation” provided good resources that all shared the same messages and best practices on documentation.
6. Research Papers
In 2020, I published an article on how to read research papers, which became a huge hit. When it comes to utilizing ML algorithms and models, we have to optimize the way we read these papers in much the same way that seasoned machine-learning experts do.
Writing machine-learning research papers is the other side of the coin. I’ve never written a research paper, and I don’t intend to start now. However, some Machine-learning specialties are very concerned with writing and publishing research studies. As a metric of career success, research institutions and firms use the number of papers published by an individual or group.
There’s an art to writing research papers; researchers and scientists must think about the structure and content of the data to ensure that a message, breakthrough, or idea is delivered effectively. Most of us are probably not writing research papers anytime soon, but there’s value in adopting the practice of writing good research papers. For example, having an abstract, introduction, and conclusion is a writing structure transferable to other writing pieces.
Go ahead and read some research papers; take note of the language, structure, and the authors’ use of visuals. Try to adopt any good practices you identify in your next written piece.
7. Books and E-books

There’s no doubt that ML/DS books are the most authoritative texts on machine learning theory and hands-on expertise. I’m not suggesting that all data scientists and ML engineers should write a book. But bear with me.
I looked through several of the AI/ML books on my shelf, and the authors all have extensive experience in their fields.
Writing non-fiction, technical books about machine learning is very difficult. It requires a high level of theoretical and practical industry knowledge that can only be attained through total immersion in study, research, and implementation. To educate hundreds of ML Engineers and Data Scientists, your reputation must rest on solid academic, commercial, or research credentials. Writers also need creativity: they have to master the art of conveying sophisticated topics on the page.
My argument is that to create a timeless machine learning book, you must go down the road of expertise. This may not sound inviting, but I’d like you to consider that setting a long-term objective of writing a book will push you to delve deeper into machine intelligence or your chosen field, which will enhance your general understanding of AI.
Books for Data Scientists and Machine Learning practitioners:
- Superintelligence by Nick Bostrom
- AI 2041 by Chen Qiufan and Kai-Fu Lee
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow By Aurélien Géron
- Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig
You will find that most of the authors listed above have produced most, if not all, of the forms of writing covered in this article, regardless of their domain specialty, which is why I consider writing a vital skill for Machine Learning practitioners and Data Scientists to master.
Conclusion
Whenever I’m asked what life decision provided me with the most benefit, whether financial, academic, or career-related, I usually answer with my decision to write.
Throughout this post, you’ve seen several advantages Data Scientists and Machine Learning experts may obtain if they write AI-related material on a regular basis. This section centralizes all the benefits listed throughout this article to make sure it all hits home.
- ML professionals employ writing to communicate complicated subjects in a simple way. By reading a well-written blog post by Andrej Karpathy, I was able to acquire a greater appreciation for the practical application of convolutional neural networks.
- Various types of writing can help you improve your creativity and critical thinking. I recently read AI 2041 by Kai-Fu Lee and Chen Qiufan, in which the authors examine AI technologies and their effects on human lives through well-written fictional stories and thorough explanations of AI technologies. Both writers have written for many years and have authored other books. It’s reasonable to conclude that their writing abilities allowed the writers to express future situations involving AI technology and explore the unknown societal impact of AI integration through critical and logical predictions based on current AI development.
- Writing in the form of storytelling gives life to projects. Good stories are spoken, but great stories are written. The retelling of machine-learning projects to stakeholders such as customers, investors, or project managers takes a positive and exciting turn when coupled with the art of storytelling. A Data Scientist explaining to stakeholders why a new state-of-the-art cancer detection deep-learning model should be leveraged across federal hospitals becomes more impactful and relatable when coupled with the story of an early diagnosis of a patient.
- Within the machine learning community, writing is a successful method of knowledge transfer. Most of the information you’ll get in the DS/ML world will be through written content. Articles, essays, and research papers are all repositories of years worth of knowledge organized into succinct chapters with clear explanations and digestible formats. Writing is an efficient way to condense years of knowledge and experience.
Did you know that AI pioneers and experts we admire and learn from also publish regularly? In this article, I compile a shortlist of individuals in the AI field and provide samples of their work, emphasizing the value and consequence of their work.
Thanks for reading.
AI Pioneers Write, So Should Data Scientists
Data Scientists’ role in producing AI-related written content to be consumed by the public
Editor’s Note: If you’re interested in sharing your data science and AI expertise, you can apply to write for our blog here.
The primary, dual purpose of writing has always been to preserve and transfer knowledge across communities, organizations, and so on. Writing within the machine-learning domain serves those same purposes. There are prominent individuals who have placed immense time and effort into advancing the frontier of machine learning and AI as a field, and a good number of these AI pioneers and experts write a lot.
This article highlights individuals who have contributed to the wider field of AI in different shapes and forms, with an emphasis on their written work and the contribution each has made through the practice of writing AI-related content.
The essential takeaway from this article is that as Data Scientists it’s a requirement that we develop soft skills such as creative and critical thinking, alongside communication. Writing is an activity that cultivates the critical soft skills for Data Scientists.
AI Experts That Write
Andrej Karpathy
At the time of writing, Andrej Karpathy is Senior Director of AI at Tesla, overseeing engineering and research efforts to bring commercial autonomous vehicles to market using massive artificial neural networks trained on millions of image and video samples.
Andrej is a prominent writer. His work has been featured in top publications such as Forbes, MIT Technology Review, Fast Company, and Business Insider. Specifically, I’ve been following Andrej’s writing through his Medium profile and his blog.
In my time as a Computer Vision student exploring the fundamentals of convolutional neural networks, Andrej’s deep learning course at Stanford proved instrumental in gaining an understanding and intuition of the internal structure of a convolutional neural network. Specifically, the written content of the course explored details such as the distribution of parameters across the CNN, the operations of the different layers within the CNN architecture, and the convolution operation that occurs between the CNN’s filter parameters and the values of an input image. Andrej uses his writing to present new ideas, explore the state of deep learning, and educate others.
Data Scientists are intermediaries between the world of numerical representations of data and project stakeholders, therefore the ability to interpret and convey derived understanding from datasets is essential to Data Scientists. Writing is one means of communication that equips Data Scientists with the capability to convey and present ideas, patterns, and learnings from data. Andrej’s writing is a clear example of how this is done: he provides clear and concisely written explanations of neural network architectures, data preparation processes, and much more.
Kai-Fu Lee
Kai-Fu Lee is an AI and Data Science Expert. He has contributed significantly to AI through his work at Google, Microsoft, Apple, and other organizations.
He’s currently CEO of Sinovation Ventures. Kai-Fu has made significant contributions to AI research by applying Artificial Intelligence to video analysis, computer vision, pattern recognition, and so on. Furthermore, Kai-Fu Lee has written books exploring the global players of AI and the future utilization and impact of AI: AI Superpowers and AI 2041.
Through his writing, Kai-Fu Lee dissects the strategies of nations and entities that operate abundantly within the AI domain. The communication of decisions, mindset, and national efforts that drive the AI superpowers of today is crucial to the developing nations seeking to fast-track the development of AI technologies.
However, Kai-Fu Lee also conveys through his writing the potential disadvantages that the advancement of AI technologies can have on societies and individuals. By reading Kai-Fu Lee’s written content, I’ve been able to understand how deep learning and predictive models can affect daily human lives when their usability is projected into imaginative future scenarios that touch on societal issues such as bias, poverty, discrimination, inequality, and so on.
The “dangers of AI” is a discourse that’s held more frequently as the adoption of AI technology and data-fueled algorithms becomes commonplace within our mobile devices, appliances, and processes. Data Scientists are ushering in the future one model at a time, and it’s our responsibility to ensure that we can communicate the fact that we conduct an in-depth cost-benefit analysis of technologies before they are integrated into society. These considerations put consumers’ minds at ease by ensuring that the positive and negative impacts of AI technology are not just afterthoughts to Data Scientists.
An effective method of communicating the previously mentioned considerations for Data Scientists is through writing. There’s effectiveness in writing a post or two explaining the data source, network architectures, algorithms, and extrapolated future utilization of AI applications or predictive models based on current utilization. A Data Scientist that covers these steps as part of their process establishes a sense of accountability and trust within product consumers and largely, the community.
Francois Chollet
TensorFlow and Keras are two primary libraries that are used extensively within data science and machine-learning projects. If you use any of these libraries, then Francois Chollet is probably an individual within AI you’ve come across.
Francois Chollet is an AI researcher that currently works as a Software Engineer at Google. He’s recognized as the creator of the deep-learning library Keras and also a prominent contributor to the TensorFlow library. And no surprise here, he writes.
Through his writing, Francois has expressed his thoughts on concerns, predictions, and limitations of AI. The impact of Francois’ writing on me as a machine-learning practitioner comes from his essays on software engineering topics, more specifically API design and software development processes. Through his writing, Francois has also educated hundreds of thousands on practical deep learning and on using the Python programming language for machine-learning tasks, through his well-known book Deep Learning With Python.
Through writing, Data Scientists have the opportunity to promote best practices in software development and data science processes among team members or organizations.
Conclusion
Academic institutions covering Data Science should have writing within the course curriculum. The cultivation of writing as a habit through the years in academia proves beneficial in professional roles.
Professional Data Scientists should expand their craft by adopting writing as an integral aspect of communication of ideas, techniques, and concepts. As pointed out through the work of the AI experts mentioned in this article, written work produced can be in the form of essays, blogs, articles, and so on. Even interacting with peers and engaging in discourse on platforms such as LinkedIn or Twitter can be beneficial for Data Science professionals.
Novice Data Scientists often ask what methods can be adopted to improve skills, knowledge, and confidence; unsurprisingly, the answer to that is also writing. Writing enables the expression of ideas in a structured manner that is difficult to convey through other communicative methods. Writing also serves as a method to reinforce learning.
This post is a fantastic resource of inspiration for Data Scientists looking for ideas, and if you’re feeling inspired, read this article about different sorts of writing in the field of machine learning.
How to use tensorflow with an AMD GPU
Error when installing tensorflow…
- Hey guys, I get this error message when I try to run:
- import tensorflow as tf
- print(tf.__version__)
- 2021-11-17 19:57:46.733325: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
- 2021-11-17 19:57:46.739099: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
- 2.7.0
- Can anybody help me figure out what the problem is?
submitted by /u/Davidescu-Vlad
Reinforcement learning (RL) is an area of machine learning that focuses on learning from experiences to solve decision making tasks. While the field of RL has made great progress, resulting in impressive empirical results on complex tasks, such as playing video games, flying stratospheric balloons and designing hardware chips, it is becoming increasingly apparent that the current standards for empirical evaluation might give a false sense of fast scientific progress while slowing it down.
To that end, in “Deep RL at the Edge of the Statistical Precipice”, accepted as an oral presentation at NeurIPS 2021, we discuss how statistical uncertainty of results needs to be considered, especially when using only a few training runs, in order for evaluation in deep RL to be reliable. Specifically, the predominant practice of reporting point estimates ignores this uncertainty and hinders reproducibility of results. Related to this, tables with per-task scores, as are commonly reported, can be overwhelming beyond a few tasks and often omit standard deviations. Furthermore, simple performance metrics like the mean can be dominated by a few outlier tasks, while the median score would remain unaffected even if up to half of the tasks had performance scores of zero. Thus, to increase the field’s confidence in reported results with a handful of runs, we propose various statistical tools, including stratified bootstrap confidence intervals, performance profiles, and better metrics, such as interquartile mean and probability of improvement. To help researchers incorporate these tools, we also release an easy-to-use Python library RLiable with a quickstart colab.
Statistical Uncertainty in RL Evaluation
Empirical research in RL relies on evaluating performance on a diverse suite of tasks, such as Atari 2600 video games, to assess progress. Published results on deep RL benchmarks typically compare point estimates of the mean and median scores aggregated across tasks. These scores are typically relative to some defined baseline and optimal performance (e.g., random agent and “average” human performance on Atari games, respectively) so as to make scores comparable across different tasks.
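As a rough illustration of this normalization (the exact baseline and reference values differ per benchmark and are assumptions here), a human normalized score can be computed along these lines:

```python
def human_normalized_score(raw_score, random_score, human_score):
    """Map a raw task score so that 0 corresponds to the random-agent baseline
    and 1 corresponds to the human reference, making tasks comparable."""
    return (raw_score - random_score) / (human_score - random_score)
```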
In most RL experiments, there is randomness in the scores obtained from different training runs, so reporting only point estimates does not reveal whether similar results would be obtained with new independent runs. A small number of training runs, coupled with the high variability in performance of deep RL algorithms, often leads to large statistical uncertainty in such point estimates.
The distribution of median human normalized scores on the Atari 100k benchmark, which contains 26 games, for five recently published algorithms, DER, OTR, CURL, two variants of DrQ, and SPR. The reported point estimates of median scores based on a few runs in publications, as shown by dashed lines, do not provide information about the variability in median scores and typically overestimate (e.g., CURL, SPR, DrQ) or underestimate (e.g., DER) the expected median, which can result in erroneous conclusions.
As benchmarks become increasingly more complex, evaluating more than a few runs will be increasingly demanding due to the increased compute and data needed to solve such tasks. For example, five runs on 50 Atari games for 200 million frames takes 1000+ GPU days. Thus, evaluating more runs is not a feasible solution for reducing statistical uncertainty on computationally demanding benchmarks. While prior work has recommended statistical significance tests as a solution, such tests are dichotomous in nature (either “significant” or “not significant”), so they often lack the granularity needed to yield meaningful insights and are widely misinterpreted.
Number of runs in RL papers over the years. Beginning with the Arcade Learning Environment (ALE), the shift toward computationally-demanding benchmarks has led to the practice of evaluating only a handful of runs per task, increasing the statistical uncertainty in point estimates.
Tools for Reliable Evaluation
Any aggregate metric based on a finite number of runs is a random variable, so to take this into account, we advocate for reporting stratified bootstrap confidence intervals (CIs), which predict the likely values of aggregate metrics if the same experiment were repeated with different runs. These CIs allow us to understand the statistical uncertainty and reproducibility of results. Such CIs use the scores on combined runs across tasks. For example, evaluating 3 runs each on Atari 100k, which contains 26 tasks, results in 78 sample scores for uncertainty estimation.
In each task, colored balls denote scores on different runs. To compute stratified bootstrap CIs using the percentile method, bootstrap samples are created by randomly sampling scores with replacement proportionately from each task. Then, the distribution of aggregate scores on these samples is the bootstrapping distribution, whose spread around the center gives us the confidence interval.
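Below is a minimal NumPy sketch of the stratified percentile bootstrap described above; the array shapes, the aggregate function, and the number of bootstrap repetitions are illustrative assumptions rather than the RLiable implementation:

```python
import numpy as np

def stratified_bootstrap_ci(scores, aggregate=np.mean, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for an aggregate metric.

    scores: array of shape (num_tasks, num_runs), e.g. (26, 3) for Atari 100k.
    Resampling is done with replacement *within each task* (stratified),
    so every bootstrap sample keeps the same per-task run counts.
    """
    rng = np.random.default_rng(seed)
    num_tasks, num_runs = scores.shape
    stats = np.empty(reps)
    for i in range(reps):
        idx = rng.integers(0, num_runs, size=(num_tasks, num_runs))
        resampled = np.take_along_axis(scores, idx, axis=1)
        stats[i] = aggregate(resampled)  # aggregate over all runs and tasks
    return np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Example: 26 tasks x 3 runs of synthetic normalized scores (78 sample scores).
scores = np.random.default_rng(1).uniform(0.0, 1.5, size=(26, 3))
low, high = stratified_bootstrap_ci(scores)
```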
Most deep RL algorithms often perform better on some tasks and training runs, but aggregate performance metrics can conceal this variability, as shown below.
Data with varied appearance but identical aggregate statistics. Source: Same Stats, Different Graphs.
Instead, we recommend performance profiles, which are typically used for comparing solve times of optimization software. These profiles plot the score distribution across all runs and tasks with uncertainty estimates using stratified bootstrap confidence bands. These plots show the total runs across all tasks that obtain a score above a threshold (𝝉) as a function of the threshold.
Performance profiles correspond to the empirical tail distribution of scores on runs combined across all tasks. Shaded regions show 95% stratified bootstrap confidence bands.
Such profiles allow for qualitative comparisons at a glance. For example, the curve for one algorithm above another means that one algorithm is better than the other. We can also read any score percentile, e.g., the profiles intersect y = 0.5 (dotted line above) at the median score. Furthermore, the area under the profile corresponds to the mean score.
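A performance profile can be sketched in a few lines; this simplified version (synthetic scores, no confidence bands) just computes the fraction of run scores above each threshold 𝝉:

```python
import numpy as np

def performance_profile(scores, taus):
    """Fraction of (task, run) scores exceeding each threshold tau.

    scores: array of shape (num_tasks, num_runs) of normalized scores.
    Plotting the result against taus gives the empirical tail distribution
    used in performance profiles.
    """
    flat = scores.reshape(-1)
    return np.array([(flat > tau).mean() for tau in taus])

scores = np.random.default_rng(2).uniform(0.0, 1.5, size=(26, 3))  # synthetic
taus = np.linspace(0.0, 2.0, 101)
profile = performance_profile(scores, taus)
```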
While performance profiles are useful for qualitative comparisons, algorithms rarely outperform other algorithms on all tasks and thus their profiles often intersect, so finer quantitative comparisons require aggregate performance metrics. However, existing metrics have limitations: (1) a single high performing task may dominate the task mean score, while (2) the task median is unaffected by zero scores on nearly half of the tasks and requires a large number of training runs for small statistical uncertainty. To address the above limitations, we recommend two alternatives based on robust statistics: the interquartile mean (IQM) and the optimality gap, both of which can be read as areas under the performance profile, below.
As an alternative to median and mean, IQM corresponds to the mean score of the middle 50% of the runs combined across all tasks. It is more robust to outliers than mean, a better indicator of overall performance than median, and results in smaller CIs, and so, requires fewer runs to claim improvements. Another alternative to mean, optimality gap measures how far an algorithm is from optimal performance.
IQM discards the lowest 25% and highest 25% of the combined scores (colored balls) and computes the mean of the remaining 50% scores.
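The two aggregate metrics can be sketched as follows; this is a simplified reading of their definitions (middle 50% of pooled run scores for IQM, average shortfall from a target score for the optimality gap), not the library’s exact code:

```python
import numpy as np
from scipy import stats

def interquartile_mean(scores):
    """Mean of the middle 50% of run scores pooled across all tasks."""
    return stats.trim_mean(scores.reshape(-1), proportiontocut=0.25)

def optimality_gap(scores, gamma=1.0):
    """Average amount by which runs fall short of a target score gamma;
    scores above gamma are clipped, so exceeding the target is not rewarded."""
    return (gamma - np.minimum(scores.reshape(-1), gamma)).mean()

scores = np.random.default_rng(3).uniform(0.0, 1.5, size=(26, 3))  # synthetic
iqm, gap = interquartile_mean(scores), optimality_gap(scores)
```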
For directly comparing two algorithms, another metric to consider is the average probability of improvement, which describes how likely an improvement over baseline is, regardless of its size. This metric is computed using the Mann-Whitney U-statistic, averaged across tasks.
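A hedged sketch of that computation, assuming per-task run scores for the two algorithms and using SciPy’s Mann-Whitney U test (ties are counted as half-wins by this estimate):

```python
import numpy as np
from scipy.stats import mannwhitneyu

def average_probability_of_improvement(scores_x, scores_y):
    """Average over tasks of P(X > Y): the chance that a random run of
    algorithm X beats a random run of algorithm Y on the same task.

    scores_x, scores_y: arrays of shape (num_tasks, num_runs).
    The per-task probability is the Mann-Whitney U statistic divided by
    the number of run pairs.
    """
    probs = []
    for x, y in zip(scores_x, scores_y):
        u, _ = mannwhitneyu(x, y, alternative="two-sided")
        probs.append(u / (len(x) * len(y)))
    return float(np.mean(probs))

# Example with 26 tasks and 3 runs per algorithm (synthetic scores):
rng = np.random.default_rng(0)
p = average_probability_of_improvement(rng.uniform(size=(26, 3)),
                                        rng.uniform(size=(26, 3)))
```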
Re-evaluating Evaluation
Using the above tools for evaluation, we revisit performance evaluations of existing algorithms on widely used RL benchmarks, revealing inconsistencies in prior evaluation. For example, in the Arcade Learning Environment (ALE), a widely recognized RL benchmark, the performance ranking of algorithms changes depending on the choice of aggregate metric. Since performance profiles capture the full picture, they often illustrate why such inconsistencies exist.
Median (left) and IQM (right) human normalized scores on the ALE as a function of the number of environment frames seen during training. IQM results in significantly smaller CIs than median scores.
On DM Control, a popular continuous control benchmark, there are large overlaps in 95% CIs of mean normalized scores for most algorithms.
Finally, on Procgen, a benchmark for evaluating generalization in RL, the average probability of improvement shows that some claimed improvements are only 50-70% likely, suggesting that some reported improvements could be spurious.
Each row shows the probability that the algorithm X on the left outperforms algorithm Y on the right, given that X was claimed to be better than Y. Shaded region denotes 95% stratified bootstrap CIs.
Conclusion
Our findings on widely-used deep RL benchmarks show that statistical issues can have a large influence on previously reported results. In this work, we take a fresh look at evaluation to improve the interpretation of reported results and standardize experimental reporting. We’d like to emphasize the importance of published papers providing results for all runs to allow for future statistical analyses. To build confidence in your results, please check out our open-source library RLiable and the quickstart colab.
Acknowledgments
This work was done in collaboration with Max Schwarzer, Aaron Courville and Marc G. Bellemare. We’d like to thank Tom Small for an animated figure used in this post. We are also grateful for feedback by several members of the Google Research, Brain Team and DeepMind.
NVIDIA-powered systems won four of five tests in MLPerf HPC 1.0, an industry benchmark for AI performance on scientific applications in high performance computing. They’re the latest results from MLPerf, a set of industry benchmarks for deep learning first released in May 2018. MLPerf HPC addresses a style of computing that speeds and augments simulations.
Learn about the optimizations and techniques used across the full stack in the NVIDIA AI platform that led to a record-setting performance in MLPerf HPC v1.0.
In MLPerf HPC v1.0, NVIDIA-powered systems won four of five new industry metrics focused on AI performance in HPC. As an industry-wide AI consortium, MLPerf HPC evaluates a suite of performance benchmarks covering a range of widely used AI workloads.
In this round, NVIDIA delivered 5x better results for CosmoFlow, and 7x more performance on DeepCAM, compared to strong scaling results from MLPerf 0.7. The strong showing is the result of a mature NVIDIA AI platform with a full stack of software.
With such a rich and diverse set of libraries, SDKs, tools, compilers, and profilers, it can be difficult to know when and where to apply the right asset in the right situation. This post details the tools, techniques, and benefits for various scenarios, and outlines the results achieved for the CosmoFlow and DeepCAM benchmarks.
We have published similar guides for MLPerf Training v1.0 and MLPerf Inference v1.1, which are recommended for other benchmark-oriented cases.
The tuning plan
We tuned our code with tools including NVIDIA DALI to accelerate data processing, and CUDA Graphs to reduce small-batch latency for efficiently scaling out to 1,024 or more GPUs. We also applied NVIDIA SHARP to accelerate communications by offloading some operations to the network switch.
The software used in our submissions is available from the MLPerf repository. We regularly add new tools along with new versions to the NGC catalog—our software hub for pretrained AI models, industry application frameworks, GPU applications, and other software resources.
Major performance optimizations
In this section, we dive into the selected optimizations that are implemented for MLPerf HPC 1.0.
Using NVIDIA DALI library for data preprocessing
Data is fetched from disk and preprocessed before each iteration. We moved from the default dataloader to the NVIDIA DALI library, which provides optimized data loading and preprocessing functions for GPUs.
Instead of performing data loading and preprocessing on CPU and moving the result to GPU, DALI library uses a combination of CPU and GPU. This leads to more efficient preprocessing of the data for the upcoming iteration. The optimization results in significant speedup for both CosmoFlow and DeepCAM. DeepCAM achieved over a 50% end-to-end performance gain.
In addition, DALI also provides asynchronous data loading for the upcoming iteration to eliminate I/O overhead from the critical path. With this mode enabled, we saw an additional 70% gain on DeepCAM.
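As a rough illustration of what moving preprocessing into DALI looks like (the file layout, operators, and parameters below are assumptions, not the actual MLPerf submission pipeline):

```python
from nvidia.dali import pipeline_def, fn, types

@pipeline_def
def training_pipeline(data_dir):
    # Read raw .npy samples on the CPU, shuffling for training.
    data = fn.readers.numpy(file_root=data_dir, file_filter="*.npy",
                            random_shuffle=True, name="reader")
    # Hand the samples to the GPU; subsequent operators run on the device.
    data = data.gpu()
    # Normalize on the GPU instead of in the CPU-side dataloader.
    data = fn.normalize(data, dtype=types.FLOAT)
    return data

pipe = training_pipeline(data_dir="/data/train",  # hypothetical path
                         batch_size=4, num_threads=4, device_id=0)
pipe.build()
out, = pipe.run()
```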
Applying the channels-last NHWC layout
By default, the DeepCAM benchmark uses the NCHW layout for the activation tensors. We used PyTorch’s channels-last (NHWC layout) support to avoid extra transpose kernels. Most convolution kernels in cuDNN are optimized for the NHWC layout.
As a result, using NCHW layout in the framework requires additional transpose kernels to convert from NCHW to NHWC for efficient convolution operation. Using NHWC layout in-framework avoids these redundant copies, and delivered about 10% performance gains on the DeepCAM model. NHWC support is available in the PyTorch framework in beta mode.
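A minimal sketch of the conversion in PyTorch; the model is a stand-in, not the DeepCAM network. Note that tensors keep their logical NCHW shape, only the memory layout changes:

```python
import torch
import torchvision

model = torchvision.models.resnet50().cuda()
# Store weights in channels-last (NHWC) memory format so cuDNN can pick
# NHWC-optimized convolution kernels without extra transposes.
model = model.to(memory_format=torch.channels_last)

x = torch.randn(8, 3, 224, 224, device="cuda")
x = x.to(memory_format=torch.channels_last)  # logical shape stays NCHW

with torch.cuda.amp.autocast():
    out = model(x)
```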
CUDA Graphs
CUDA Graphs allow launching a single graph that consists of a sequence of kernels, instead of individually launching each of the kernels from CPU to GPU. This feature minimizes CPU involvement in each iteration, substantially improving performance by minimizing latencies—especially for strong scaling scenarios.
MXNet previously added CUDA Graphs support, and CUDA Graphs support was also recently added to PyTorch. CUDA Graphs support in PyTorch resulted in around a 15% end-to-end performance gain in DeepCAM for the strong scaling scenario, which is most sensitive to latency and jitter.
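The following sketch follows the general capture-and-replay pattern of PyTorch’s CUDA Graphs support with a toy model; it is not the benchmark’s training loop:

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
static_input = torch.randn(64, 1024, device="cuda")

# Warm up on a side stream so lazy initializations are not captured.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input).sum().backward()
torch.cuda.current_stream().wait_stream(s)

# Capture one forward/backward pass into a graph.
graph = torch.cuda.CUDAGraph()
model.zero_grad(set_to_none=True)  # grads get allocated from the graph's pool
with torch.cuda.graph(graph):
    static_loss = model(static_input).sum()
    static_loss.backward()

# Each iteration: copy fresh data into the captured buffer and replay,
# avoiding per-kernel launch overhead from the CPU.
for _ in range(10):
    static_input.copy_(torch.randn(64, 1024, device="cuda"))
    graph.replay()
```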
Efficient data staging with MPI
For the case of weak scaling, the performance of the distributed file system cannot sustain the demand from GPUs. To increase the aggregate total storage bandwidth, we stage the dataset into node-local NVME memory for DeepCAM.
Since the individual instances are small, we can shard the data statically, and thus only need to stage a fraction of the full dataset per node. This solution is depicted in Figure 1. Here we denote the number of instances with M and the number of ranks per instance with N.

Note that across instances, each rank with the same rank ID uses the same shard of data. This means that natively, each data shard is read M times. To reduce pressure on the file system, we created subshards of the data orthogonal to the instances, depicted in Figure 2.

This way, each file is read only once from the global file system. Finally, each instance needs to receive all the data. For this purpose, we created new MPI communicators orthogonal to the intra-instance communicator; that is, we combine all instance ranks with the same rank ID into the same inter-instance communicator. Then we can use MPI allgather to combine the individual subshards into M copies of the original shard.

Instead of performing these steps sequentially, we use batching to create a pipeline that overlaps data reading and distribution of the subshards. In order to improve the read and write performance, we further implemented a small helper tool that uses O_DIRECT to improve I/O bandwidth.
The optimization resulted in more than 2x end-to-end speedup for the DeepCAM benchmark. This is available in the submission repository.
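As a rough illustration of the communicator layout and allgather step described above (the number of ranks per instance and the sub-shard contents are placeholders):

```python
from mpi4py import MPI

world = MPI.COMM_WORLD
ranks_per_instance = 8                             # N, an assumed value
instance_id = world.rank // ranks_per_instance     # which training instance
local_rank = world.rank % ranks_per_instance       # rank ID inside the instance

# Intra-instance communicator: all ranks belonging to the same instance.
intra_comm = world.Split(color=instance_id, key=local_rank)
# Inter-instance communicator: ranks that share the same rank ID across the
# M instances; data distribution happens along this axis.
inter_comm = world.Split(color=local_rank, key=instance_id)

# Each rank reads only its own sub-shard from the global file system once;
# the list below is a stand-in for that read.
subshard = [f"sample_{local_rank}_{instance_id}"]
# An allgather over the inter-instance communicator rebuilds the full shard
# for this rank ID on every instance (M copies total).
full_shard = inter_comm.allgather(subshard)
```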
Loss hybridization
An imperative approach to model definition and execution is a flexible solution for defining an ML model like a standard Python program. Symbolic programming, on the other hand, declares the computation upfront, before execution. This approach allows the engine to perform various optimizations, but loses the flexibility of the imperative approach.
Hybridization is a way of combining those two approaches in the MXNet framework. An imperatively defined calculation can be compiled into symbolic form and optimized when possible. CosmoFlow extends the model hybridization with loss.

This allows fusing element-wise operations in loss calculation with scaled activation output from CosmoFlow model, reducing overall iteration latency. The optimization resulted in close to a 5% end-to-end performance gain for CosmoFlow.
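A minimal Gluon sketch of hybridizing the model together with its loss; the layers and the L2 loss are placeholders rather than the CosmoFlow network and its loss:

```python
import mxnet as mx
from mxnet.gluon import nn, loss as gloss

class ModelWithLoss(nn.HybridBlock):
    """Wraps a network and its loss so both are compiled into one symbolic graph."""
    def __init__(self, net, **kwargs):
        super().__init__(**kwargs)
        self.net = net
        self.loss = gloss.L2Loss()  # placeholder loss

    def hybrid_forward(self, F, x, label):
        # Computing the loss inside the hybridized block lets element-wise ops
        # in the loss be fused with the network's output computation.
        return self.loss(self.net(x), label)

net = nn.HybridSequential()
net.add(nn.Dense(128, activation="relu"), nn.Dense(4))

model = ModelWithLoss(net)
model.initialize()
model.hybridize(static_alloc=True, static_shape=True)

x = mx.nd.random.uniform(shape=(8, 64))
y = mx.nd.random.uniform(shape=(8, 4))
with mx.autograd.record():
    l = model(x, y)
l.backward()
```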
Employing SHARP for internode all-reduce collective
SHARP allows offloading collective operations from CPU to the switches in internode network fabric. This effectively doubles the internode bandwidth of InfiniBand network for the allreduce operation. This optimization results in up to 5% performance gain for MLPerf HPC benchmarks, especially for strong scaling scenarios.
Moving forward with MLPerf HPC
Scientists are making breakthroughs at an accelerated pace, in part because AI and HPC are combining to deliver insight faster and more accurately than could be done using traditional methods.
MLPerf HPC v1.0 reflects the supercomputing industry’s need for an objective, peer-reviewed method to measure and compare AI training performance for use cases relevant to HPC. In this round, the NVIDIA compute platform demonstrated clear leadership by winning all three benchmarks for performance, and also demonstrated the highest efficiency for both throughput measurements.
NVIDIA has also worked with several supercomputing centers around the world for their submissions with NVIDIA GPUs. One of them, the Jülich Supercomputing Centre, has the fastest submissions from Europe.
Read more stories of 2021 Gordon Bell finalists, as well as a discussion of how HPC and AI are making new types of science possible.
Learn more about the MLPerf benchmarks and results from NVIDIA.
Featured image of the JUWELS Booster powered by NVIDIA A100, courtesy of Forschungszentrum Jülich / Sascha Kreklau.
Disclaimer:
MLPerf v1.0 HPC Closed Strong & Weak Scaling – Result retrieved from https://mlcommons.org/en/training-hpc-10 on Nov. 16, 2021.
The MLPerf name and logo are trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use is strictly prohibited. See www.mlcommons.org for more information.
Learn more about the many ways scientists are applying advancements in Million-X computing and solving global challenges.
At NVIDIA GTC last week, Jensen Huang laid out the vision for realizing multi-Million-X speedups in computational performance. The breakthrough could solve the challenge of computational requirements faced in data-intensive research, helping scientists further their work.
Solving challenges with Million-X computing speedups
Million-X unlocks new worlds of potential and the applications are vast. Current examples from NVIDIA include accelerating drug discovery, accurately simulating climate change, and driving the future of manufacturing.
Drug discovery
Researchers at NVIDIA, Caltech, and the startup Entos blended machine learning and physics to create OrbNet, speeding up molecular simulations by many orders of magnitude. As a result, Entos can accelerate its drug discovery simulations by 1,000x, finishing in 3 hours what would have taken more than 3 months.
Climate change
Last week, Jensen Huang announced plans to build Earth-2, a digital twin of the Earth in Omniverse. The world’s most powerful AI supercomputer will be dedicated to simulating climate models that predict the impacts of global warming in different places across the globe. Understanding these changes over time can help humanity plan for and mitigate them at a regional level.
Future manufacturing
The Earth is not the first digital twin project enabled by NVIDIA. Researchers are already building physically accurate digital twins of cities and factories. The simulation frontier is still young and full of potential, waiting for the catalyst that massive increases in computing will provide.
Share your Million-X challenge
Share how you are using Million-X computing on Facebook, LinkedIn, or Twitter using #MyMillionX and tagging @NVIDIAHPCDev.
The NVIDIA developer community is already changing the world, using technology to solve difficult challenges. Join the community.
Below are a handful of notable examples.
The community that is changing the world
Smart waterways, safer public transit, and eco-monitoring in Antarctica

The work of Johan Barthelemy is interdisciplinary and covers a variety of industries. As the head of the University of Wollongong’s Digital Living Lab, he aims to deliver innovative AIoT solutions that champion ethical and privacy-compliant AI.
Currently, Barthelemy is working on an assortment of projects including a smart waterways computer vision application that detects stormwater blockage in real-time, helping cities prevent city-wide issues.
Another project, currently being deployed in multiple cities, is AI camera software that detects and reports violence on Sydney trains through aggressive stance modeling.
An AIoT platform for remotely monitoring Antarctica’s terrestrial environment is also in the works. Built around an NVIDIA Jetson Xavier NX edge computer, the platform will be used to monitor the evolution of moss beds—their health being an early indicator of the impact of climate change. The data collected will also inform a variety of models developed by the Securing Antarctica’s Environmental Future community of researchers, in particular hydrology and microclimate models.
Connect: LinkedIn | Twitter | Digital Living Lab
Never-before-seen views of SARS-CoV-2

NVIDIA researchers and 14 partners successfully developed a platform to explore the composition, structure, and dynamics of aerosols and aerosolized viruses at the atomic level.
This work surmounts the previously limited ability to examine aerosols at the atomic and molecular level, a limitation that had obscured our understanding of airborne transmission. Leveraging the platform, the team produced a series of novel discoveries regarding the SARS-CoV-2 Delta variant.
These breakthroughs dramatically extend the capabilities of multiscale computational microscopy in experimental methods. The full impact of the project has yet to be realized.
Species recognition, environmental monitoring, and adaptive streaming

Dr. Albert Bifet is the Director of Te Ipu o te Mahara, the Artificial Intelligence Institute at the University of Waikato, and Professor of Big Data at Télécom Paris.
Bifet also leads the TAIAO project, a data science program using an NVIDIA DGX A100 to build deep learning models for species recognition. He is codeveloping a new machine-learning library in Python called River for online/streaming machine learning, and building a new data repository to improve reproducibility in environmental data science.
Additionally, researchers at TAIAO are building new approaches to compute GPU-based SHAP values for XGBoost, and developing a new adaptive streaming XGBoost.
Connect: Website | LinkedIn | Twitter
Medical imaging, therapy robots, and NLP depression detection

The current interests of Dr. Ekapol Chuangsuwanich fall within the medical imaging domain, including chest x-ray and histopathology technology. Over the past few years, however, his work has spanned many industries, including NLP, ASR, and medical imaging.
Last year, Chuangsuwanich and his team developed the PYLON architecture, which can learn precise pixel-level object location with only image-level annotation. This is deployed across hospitals in Thailand to provide rapid COVID-19 severity assessments and to facilitate screening of tuberculosis in high-risk communities.
Additionally, he is working on NLP and ASR robots for medical use, including a speech therapy helper and call center robot with depression detection functionality. His startup, Gowajee, is also providing state-of-the-art ASR and TTS for the Thai language. These projects have been created using the NVIDIA NeMo framework and deployed on NVIDIA Jetson Nano devices.
Connect: Website | Org | Facebook
Trillion atom quantum-accurate molecular dynamics simulations

Researchers from the University of South Florida, NVIDIA, Sandia National Labs, NERSC, and the Royal Institute of Technology collaborated to produce a machine-learned interatomic potential for LAMMPS named SNAP (Spectral Neighbor Analysis Potential).
SNAP was found to be accurate across a huge pressure-temperature range, from 0 to 50 Mbar and 300 to 20,000 Kelvin. The peak Molecular Dynamics performance was greater than 22x the previous record, achieved on a 20-billion-atom system simulated on Summit for 1 ns in a day.
The project qualified as a Gordon Bell Prize finalist, and the near perfect weak scaling of SNAP MD highlights the potential to launch quantum-accurate MD to trillion atom simulations on upcoming exascale platforms. This dramatically expands the scientific return of X-ray free electron laser diffraction experiments.
BioInformatics, smart cities, and translational research

Dr. Ng See-Kion is constantly in search of big data. A practicing data scientist, See-Kion is also a Professor of Practice and Director of Translational Research at the National University of Singapore.
Current projects on his desk leverage the NVIDIA NeMo framework, covering NLP for indigenous and vernacular languages across Singapore and New Zealand. See-Kion is also working on intelligent COVID-19 contact tracing and outbreak, intelligent social event sensing, and assessing the credibility of information in new media.
Connect: Website
