Categories
Misc

eCommerce and Open Ethernet: Criteo Clicks with SONiC


When you see a browser ad for a new restaurant, or the perfect gift for that hard-to-please family member, you probably aren’t thinking about the infrastructure used to deliver that ad. However, that infrastructure is what allows advertising companies like Criteo to provide these insights. The NVIDIA networking portfolio is essential to Criteo’s technology stack.

Criteo is an online advertising platform, the tier between digital advertisers and publishers. This business requires Criteo to solve problems at “web scale.” Criteo processes hundreds of billions of dollars in sales, driven by billions of ads a day over tens of thousands of servers, thousands of networking devices, and terabits of east-west traffic per second. The communication within and between Criteo’s 10 data centers (across three continents) is of paramount importance, with the network taking center stage.

Moving away from lock-in

Starting in 2014, Criteo embarked on an initiative to completely overhaul their networking strategy, modernizing infrastructure, and reducing costs. By multisourcing hardware from different vendors, Criteo would be able to reduce costs, gain more flexibility in the procurement process, and become less dependent on individual vendor supply chains.

Criteo's networking journey including monolithic era, multi-vendor, and network agility.
Figure 1. Criteo’s journey to change their networking approach started in 2014 and continues today

 

With a new hardware approach, software came next. Criteo needed their OS to be compatible with their networking automation stack, consisting of in-house, hardware-agnostic tooling built mostly in Python. However, each new OS added to the mix would require unique updates to the rest of the stack to support it. Additionally, while vendor hardware was often affordable, the proprietary software attached ballooned the budget. 

Picking one OS for all the platforms solved both problems. Enter SONiC: after attending the Open Compute Project (OCP) Global Summit, Criteo began to evaluate the NOS in early 2018. As an open-source OS conceived by Microsoft and the OCP to meet the needs of the hyperscalers, SONiC had the design and functionality to meet Criteo’s needs. Moreover, SONiC’s openness meshed perfectly with Criteo’s flexible hardware sourcing strategy and would fully unlock their networking stack.

Turning over a New Leaf with NVIDIA

Criteo views NVIDIA as more than just a vendor: the two companies partner on SONiC, with NVIDIA maintaining and developing SONiC’s feature set and Criteo providing input. This comes from the way NVIDIA offers SONiC to customers. Rather than build a proprietary branch from the community releases, NVIDIA supports the community release of the OS as “pure SONiC,” without any add-ons. As one of the leading contributors to the SONiC codebase, NVIDIA is uniquely positioned to influence SONiC’s roadmap and make Criteo’s visions a reality.

Additionally, with NVIDIA providing ASIC-to-Protocol (A2P) support, the network team can fully rely on NVIDIA to offload and triage networking issues at any level with minimal interruption. Criteo also benefits from the reach of NVIDIA in the space. NVIDIA develops the features and uploads them into the community main branch, maintaining a pure SONiC commitment and allowing Criteo freedom of choice.

Timeline of Criteo evaluation and achievements with SONiC.
Figure 2. Criteo was an early adopter of SONiC in 2018, working through early challenges on the way to full data center rollout

Summary

Measured against its goals, Criteo’s 2014 project is on target: costs are being brought under control, deployment flexibility is growing, and the network team has picked up some handy DevOps and CI/CD skills. But the goal remains a work in progress; Criteo sees a day when all infrastructure, including their management network, is running SONiC, with truly one NOS to rule them all. So next time you see that killer ad, maybe you’ll also think about the network fabric that makes it possible.


Categories
Misc

NVIDIA’s Liila Torabi Talks the New Era of Robotics Through Isaac Sim

Robots are not just limited to the assembly line. At NVIDIA, Liila Torabi works on making the next generation of robotics possible. Torabi is the senior product manager for Isaac Sim, a robotics and AI simulation platform powered by NVIDIA Omniverse. Torabi spoke with NVIDIA AI Podcast host Noah Kravitz about the new era of …

The post NVIDIA’s Liila Torabi Talks the New Era of Robotics Through Isaac Sim appeared first on The Official NVIDIA Blog.

Categories
Misc

Tensorflow working in python2 not python3

Hey All, I’m facing a very weird issue with tensorflow on my remote GPU machine.

When I do import tensorflow in python2, it seems to work. However, it doesn’t work in python3.

How is this happening? Has someone faced this before and knows the fix to this?

submitted by /u/Conanobrain

Categories
Misc

Strange training results: why is a batch size of 1 more efficient than larger batch sizes, despite using a GPU/TPU?

Hey all!

I’m currently doing some tests in preparation for my first real bit of training. I’m using Google Cloud AI Platform to train, and am trying to find the optimal machine setup. It’s a work in progress, but here’s a table I’m putting together to get a sense of the efficiency of each setup. On the left you’ll see the accelerator type, ordered from least to most expensive. Here you’ll also find the number of accelerators used, the cost per hour, and the batch size. To the right are the average time it took to complete an entire training iteration and how long it took to complete the minimization step. You’ll notice that the values are almost identical for each setup; I’m using Google Research’s SEED RL, so I thought to record both values since I’m not sure exactly what happens between iterations. Turns out it’s not much. There’s also a calculation of the time it takes to complete a single “step” (aka, a single observation from a single environment), as well as the average cost per step.

So, the problem. I was under the assumption that batching with a GPU or TPU would increase the efficiency of the training. Instead, it turns out that a batch size of 1 is the most efficient both in time per step and cost per step. I’m still fairly new to ML so maybe it’s just a matter of me being uninformed, but this goes against everything I thought I knew about ML training. Since I’m using Google’s own SEED codebase I would assume that it’s not a problem with the code, but I can’t be sure about that.

Is this just a matter of me misunderstanding how training works, or am I right in thinking something is really off?

submitted by /u/EdvardDashD

Categories
Misc

What’s New in Optical Flow SDK 3.0


The NVIDIA Turing architecture introduced a new hardware functionality for computing optical flow between a pair of images with very high performance. NVIDIA Optical Flow SDK exposes the APIs to use this Optical Flow hardware (also referred to as NVOFA) to accelerate applications. We are excited to announce the availability of Optical Flow SDK 3.0 with the following new features: 

  1. DirectX 12 Optical Flow API 
  2. Forward-Backward Optical Flow via a single API 
  3. Global Flow vector 

DirectX 12 Optical Flow API

DirectX 12 is a low-level programming API from Microsoft that reduces driver overhead in comparison to its predecessor, DirectX 11. DirectX 12 provides more flexibility and fine-grained control to the developer. Developers can take advantage of the low-level programming APIs in DirectX 12 and optimize their applications for better performance than earlier DirectX versions allowed. At the same time, the client application must itself take care of resource management, synchronization, and so on. DirectX 12 adoption has grown rapidly among game titles and other graphics applications.

Optical Flow SDK 3.0 enables DirectX 12 applications to use the NVIDIA Optical Flow engine. The computed optical flow can be used to increase frame rate in games and videos for smoother experience or in object tracking. To increase the frame rate, Frame Rate Up Conversion (FRUC) techniques are used by inserting interpolated frames between original frames. Interpolation algorithms use the flow between frame pair(s) to generate the intermediate frame.
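The interpolation idea described above can be sketched in a few lines. This is a minimal illustration, not the FRUC algorithm used by any NVIDIA product: it assumes a single-channel image stored row-major, a dense per-pixel flow already converted to pixel units, and nearest-neighbor sampling. The names FlowVec, sampleClamped, and interpolateMidFrame are hypothetical and do not come from the SDK.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flow vector in pixel units (not the SDK's own flow type).
struct FlowVec { float dx; float dy; };

// Nearest-neighbor fetch with coordinates clamped to the image borders.
inline float sampleClamped(const std::vector<float>& img, int width, int height,
                           float x, float y) {
    int xi = static_cast<int>(std::lround(x));
    int yi = static_cast<int>(std::lround(y));
    xi = std::min(std::max(xi, 0), width - 1);
    yi = std::min(std::max(yi, 0), height - 1);
    return img[static_cast<std::size_t>(yi) * width + xi];
}

// Builds the frame halfway between frame0 and frame1.
// Simplifying assumption: the flow at an intermediate-frame pixel equals the
// frame0 -> frame1 flow stored at the same coordinates.
std::vector<float> interpolateMidFrame(const std::vector<float>& frame0,
                                       const std::vector<float>& frame1,
                                       const std::vector<FlowVec>& flow,
                                       int width, int height) {
    std::vector<float> mid(frame0.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            const FlowVec& f = flow[idx];
            // Step half the motion backward into frame0 and forward into frame1.
            const float a = sampleClamped(frame0, width, height,
                                          x - 0.5f * f.dx, y - 0.5f * f.dy);
            const float b = sampleClamped(frame1, width, height,
                                          x + 0.5f * f.dx, y + 0.5f * f.dy);
            mid[idx] = 0.5f * (a + b);
        }
    }
    return mid;
}
```

A production interpolator would also need to handle occlusions, holes, and colliding pixels, which this sketch ignores.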

All generations of Optical Flow hardware support the DirectX 12 Optical Flow interface. The Optical Flow SDK package contains headers, sample applications that demonstrate usage, C++ wrapper classes that can be reused or modified as required, and documentation. All the other components for accessing the Optical Flow hardware are included in the NVIDIA display driver. The DirectX 12 Optical Flow API is supported on Windows 10 20H1 or later.

Barring the explicit synchronization, DirectX 12 Optical Flow API is designed to be close to the other interfaces that are already available in the SDK (CUDA and DirectX 11). The DirectX 12 Optical Flow API consists of three core functions: initialization, flow estimation and destruction.

typedef NV_OF_STATUS(NVOFAPI* PFNNVOFINIT) (NvOFHandle hOf, const NV_OF_INIT_PARAMS* initParams);

typedef NV_OF_STATUS(NVOFAPI* PFNNVOFEXECUTED3D12) (NvOFHandle hOf, const NV_OF_EXECUTE_INPUT_PARAMS_D3D12* executeInParams, NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12* executeOutParams);

typedef NV_OF_STATUS(NVOFAPI* PFNNVOFDESTROY) (NvOFHandle hOf);

The initialization and destroy APIs are the same across all interfaces, but the Execute API differs between DirectX 12 and the other interfaces (DirectX 11 and CUDA). Even though most of the parameters passed to the Execute API in DirectX 12 are the same as those in the other two interfaces, there are some functional differences. Synchronization in the DirectX 11 and CUDA interfaces is handled automatically by the OS runtime and driver. In DirectX 12, however, fence objects and fence values must be supplied as input parameters to the Execute API. These fence objects are used to synchronize CPU↔GPU and GPU↔GPU operations. For more details, refer to the programming guide included with the Optical Flow SDK.

The buffer management API in DirectX 12 also needs fence objects for synchronization.

The Optical Flow output quality is the same across all interfaces. Performance in DirectX 12 should be very close to that of the other two interfaces.

Forward-Backward Optical Flow (FB flow)

No optical flow algorithm can give 100% accurate flow. The flow is typically distorted in occluded regions. Sometimes, the cost provided by the NVOFA may also not represent the true confidence of the flow. One simple check that is commonly employed is to compare the forward and backward flow. If the Euclidean distance between the forward flow and the backward flow exceeds a threshold, the flow can be marked as invalid.

To estimate flow in both directions, the client must call the Execute API twice: once with the input and reference images, and a second time with the input and reference images swapped. Calling the Execute API twice like this can result in suboptimal performance due to overheads such as context switching and thread switching. Optical Flow SDK 3.0 exposes a new API to generate flow in both directions in a single Execute call. This feature is enabled by setting NV_OF_INIT_PARAMS::predDirection to NV_OF_PRED_DIRECTION_BOTH during initialization and providing the buffers that receive the backward flow and/or cost in NV_OF_EXECUTE_OUTPUT_PARAMS/NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12::bwdOutputBuffer and NV_OF_EXECUTE_OUTPUT_PARAMS/NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12::bwdOutputCostBuffer.

Figure 1. Source Images (http://ultravideo.fi/#testsequences)
NVIDIA Optical Flow
Figure 2. (b) Forward Flow, (c) Backward Flow, (d) Consistency Check. Black pixels in the image show inconsistent flow, (e) Nearest Neighbor Infill

Once the flow is generated in both directions, the client application can compare the flow vectors of both directions, discard the inaccurate ones based on a suitable criterion (for example, the Euclidean distance between forward and backward flow vectors), and use hole-filling algorithms to fill in the discarded flow vectors.

Note that the output quality of FB flow could be different from unidirectional flow due to some optimizations.

Sample code that demonstrates FB flow API programming and consistency check:

// Initialization of API
NV_OF_INIT_PARAMS initParams = { 0 };
...
initParams.predDirection = NV_OF_PRED_DIRECTION_BOTH;
...
NvOFAPI->nvOFInit(hNvOF, &initParams);
// Estimation of forward and backward flow
NV_OF_EXECUTE_INPUT_PARAMS executeInParams = { 0 };
...
NV_OF_EXECUTE_OUTPUT_PARAMS executeOutParams = { 0 };
...
executeOutParams.outputBuffer = forwardFlowBuffer;
executeOutParams.outputCostBuffer = forwardFlowCostBuffer;
executeOutParams.bwdOutputBuffer = backwardFlowBuffer;
executeOutParams.bwdOutputCostBuffer = backwardFlowCostBuffer;

NvOFAPI->nvOFExecute(hNvOF, &executeInParams, &executeOutParams);


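// Invalidating flow vectors
// NOTE: loop reconstructed from garbled text; GetFlow() and Distance() are hypothetical helpers, not SDK functions.
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        auto fwd = GetFlow(forwardFlowBuffer, x, y);  // forward vector at (x, y), in pixels
        int x2 = x + fwd.x, y2 = y + fwd.y;           // pixel the forward vector points to
        if (x2 < 0 || x2 > width - 1 || y2 < 0 || y2 > height - 1 ||
            Distance(fwd, GetFlow(backwardFlowBuffer, x2, y2)) > thresh) {
            SetFlowInvalid(forwardFlowBuffer, x, y);
        }
    }
}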
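
For readers who want to experiment with the consistency check outside the SDK, here is a self-contained sketch that works on plain arrays. It is an illustration under stated assumptions, not SDK code: flow vectors are assumed to be in pixel units, and a consistent backward vector is assumed to point in the opposite direction of the forward vector, so the sum of the two is compared against the threshold. FlowVec and checkConsistency are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flow vector in pixel units (not the SDK's own flow type).
struct FlowVec { float dx; float dy; };

// Returns a per-pixel mask: true where forward and backward flow agree.
// fwd maps frame A -> frame B, bwd maps frame B -> frame A, both row-major.
std::vector<bool> checkConsistency(const std::vector<FlowVec>& fwd,
                                   const std::vector<FlowVec>& bwd,
                                   int width, int height, float thresh) {
    std::vector<bool> valid(fwd.size(), false);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y) * width + x;
            const FlowVec& f = fwd[idx];
            // Pixel in frame B that the forward vector points to.
            const int x2 = static_cast<int>(std::lround(x + f.dx));
            const int y2 = static_cast<int>(std::lround(y + f.dy));
            if (x2 < 0 || x2 > width - 1 || y2 < 0 || y2 > height - 1) {
                continue;  // points outside the frame: leave marked invalid
            }
            const FlowVec& b = bwd[static_cast<std::size_t>(y2) * width + x2];
            // Assumption: a consistent pair cancels out, so f + b is near zero.
            const float ex = f.dx + b.dx;
            const float ey = f.dy + b.dy;
            valid[idx] = std::sqrt(ex * ex + ey * ey) <= thresh;
        }
    }
    return valid;
}
```

Pixels left as false in the mask are the candidates for hole filling, such as the nearest-neighbor infill shown in Figure 2(e).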

Global Flow Estimation

Global flow in a video sequence or game is caused by camera panning motion. Global flow estimation is an important tool widely used in image segmentation, video stitching or motion-based video analysis applications.

Global Flow vector can also be heuristically used in calculating background motion. Once background motion is estimated, this can be used to fill the flow vectors in occluded regions.  It can also be used to handle collisions of warped pixels in interpolated frames.

Global flow is calculated on the forward flow vectors, based on frequency of the occurrence and a few other heuristics.

To enable generation of global flow, the client needs to set the flag NV_OF_INIT_PARAMS::enableGlobalFlow during initialization and provide the additional buffer NV_OF_EXECUTE_OUTPUT_PARAMS/NV_OF_EXECUTE_OUTPUT_PARAMS_D3D12::globalFlowBuffer in the Execute API.
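
To give a feel for what "based on frequency of the occurrence" can mean, the sketch below picks the most frequent forward flow vector after rounding to whole pixels. This is a simplified stand-in and not the heuristic the hardware or driver actually uses, which is not documented here; FlowVec and estimateGlobalFlow are hypothetical names.

```cpp
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// Hypothetical flow vector in pixel units (not the SDK's own flow type).
struct FlowVec { float dx; float dy; };

// Returns the most frequent flow vector, binned to whole pixels.
// Ties go to the bin that reaches the winning count first; empty input gives (0, 0).
FlowVec estimateGlobalFlow(const std::vector<FlowVec>& flow) {
    std::map<std::pair<int, int>, int> counts;
    std::pair<int, int> best(0, 0);
    int bestCount = 0;
    for (const FlowVec& f : flow) {
        const std::pair<int, int> key(static_cast<int>(std::lround(f.dx)),
                                      static_cast<int>(std::lround(f.dy)));
        const int c = ++counts[key];
        if (c > bestCount) {
            bestCount = c;
            best = key;
        }
    }
    return FlowVec{static_cast<float>(best.first), static_cast<float>(best.second)};
}
```

With a camera pan, most background pixels share roughly the same vector, so the dominant bin approximates the background motion even when foreground objects move differently.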

References

  1. NVIDIA Optical Flow SDK
  2. Developer Blog: An Introduction to the NVIDIA Optical Flow SDK
  3. Developer Blog: Accelerate OpenCV: Optical Flow Algorithms with NVIDIA Turing GPUs
Categories
Misc

NVIDIA Inception Partners Won Veterans Affairs AI Tech Sprint Awards with Latest AI Technologies


Five NVIDIA Inception partners were named finalists at the 2020-2021 Artificial Intelligence Tech Sprint, a competition aimed at improving healthcare for veterans using the latest AI technology. Hosted by the Department of Veterans Affairs (VA), the sprint is designed to foster collaboration with industry and academic partners on AI-enabled tools that leverage federal data to address a need for veterans. See the official news release for more details about the competition. 

Participating teams gave presentations and demonstrations judged by panels of Veterans and other experts. Forty-four teams from industry and universities participated, addressing a range of healthcare challenges such as chronic conditions management, cancer screening, rehabilitation, patient experiences, and more.

The majority of the solutions from NVIDIA Inception partners were powered by NVIDIA Clara Guardian, a smart hospital solution for patient monitoring, mental health, and patient care.

  • JumpStartCSR created ‘Holmz’, a data-source-agnostic explainable AI (XAI) that provides digital physical therapy to prevent and treat overexertion and repetitive stress injuries. Holmz can predict plantar fasciitis and identify its root causes with 97% accuracy two weeks before onset and with 99% accuracy one week before onset. Holmz can also predict fatigue-related falls and identify their root causes 15 minutes in advance with 99% accuracy. The trained solutions are 4 to 5x faster using TensorFlow with NVIDIA T4 GPUs on AWS. JumpStartCSR is piloting its XAI with the VA’s Physical Medicine and Rehabilitation group and expanding its digital physical therapy solution to include performance improvement and prediction for the US Army.
  • Ouva is an autonomous remote monitoring and ambient intelligence platform designed to improve patient safety and operational efficiency. Their solution uses cuDNN, TensorRT, and the Transfer Learning Toolkit in NVIDIA Clara Guardian and runs on an NVIDIA RTX 6000 GPU. During the VA tech sprint, Ouva was able to predict sepsis infection 24 hours ahead of clinical diagnosis by looking at nine biomarkers from Electronic Medical Records (EMR) collected from 40,000 patients. By processing data from telehealth cameras with AI, Ouva allows nurses to keep an eye on more patients in remote or isolation settings. They have started deploying their remote care monitoring solution in Europe and piloting it in the US. Their near-term goal is to combine Ouva’s unique patient activity data with EMR to predict more conditions like sepsis with increased accuracy.
  • PATH Decision Support Software is a GPU-accelerated expert system that recommends drug regimens for patients with type 2 diabetes. Their solution was developed on CUDA using NVIDIA GPUs on AWS and achieved a 20x speedup. The software’s recommendations have shown a 2.1-point reduction in HbA1c and a $770/patient/year reduction in unplanned diabetes-related Medicaid claims. The team is currently working on EMR integration and expanding the list of medicines it can evaluate.
  • Dialisa is a digital nutritional intervention platform that detects the onset and monitors the progression of chronic kidney disease (CKD) and delivers digital nutritional intervention to delay the progress of the disease. They use TensorFlow and CUDA optimized on NVIDIA GPUs on AWS, achieving an accuracy of 93.2% in detecting the onset of CKD using a generic longitudinal dataset provided by the National Artificial Intelligence Institute (NAII) at the Veterans Administration. Dialisa looks to demonstrate the feasibility of their algorithms and remote monitoring platform in the veterans community, and welcomes additional clinical partners to conduct a pilot study in the general population.
  • KELLS, an AI-powered dental diagnostic platform, used TensorFlow and PyTorch optimized on NVIDIA T4 GPUs on AWS P3 instances and achieved 3x higher efficiency in training than previously used AWS G4 instances. KELLS’ technology has been shown to improve the accuracy of detecting common dental pathologies by up to 30% for average dentists, especially for early signs of oral disease, enabling more timely and effective preventative care. They continue to expand their platform to support more clinical findings across different data modalities and increase the quality and accessibility of dental care for patients.

“We were overwhelmed with the overall quality of proposals in this very competitive cycle, it’s a great tribute to our mission to serve our veterans,” said Artificial Intelligence Tech Sprint lead Rafael Fricks. “Very sophisticated AI capabilities are more accessible than ever; a few years ago these proposals wouldn’t have been possible outside the realm of very specialized high performance computing.” 

NVIDIA looks forward to providing continued support to these winning Inception Partners in the coming Pilot Implementation program phase, and to the contributions their AI solutions will make to the important VA/VHA healthcare mission serving our Nation’s veterans. 


Categories
Misc

Fighting Disease-Carrying Mosquitoes with Neural Networks


Targeting areas populated with disease-carrying mosquitoes just got easier thanks to a new study. The research, recently published in IEEE Xplore, uses deep learning to recognize tiger mosquitoes from images taken by citizen scientists with near-perfect accuracy.

“Identifying the mosquitoes is fundamental, as the diseases they transmit continue to be a major public health issue,” said lead author Gereziher Adhane.

The study, from researchers at the Scene understanding and artificial intelligence (SUNAI) research group, of the Universitat Oberta de Catalunya’s (UOC) Faculty of Computer Science, Multimedia and Telecommunications and of the eHealth Center, uses images from the Mosquito Alert app. Developed in Spain and currently expanding globally, the platform brings together citizens, entomologists, public health authorities, and mosquito control services to reduce mosquito-borne diseases. 

Anyone in the world can upload geo-tagged images of mosquitoes to the app. Once submitted, three expert entomologists inspect and validate the images before they are added to the database, classified, and mapped. 

Travel and migration, along with climate change and urbanization, have broadened the range and habitat of mosquitoes. Quick identification of species such as the tiger mosquito (known to transmit dengue, Zika, chikungunya, and yellow fever) remains a key step in assisting relevant authorities to curb their spread.

A comparison of tiger mosquito and other mosquito images.
Sample of tiger [first row] and non-tiger [second row] mosquitoes from the Mosquito Alert data set. Credit: G. Adhane et al./IEEE Xplore

“This type of analysis depends largely on human expertise and requires the collaboration of professionals, is typically time-consuming, and is not cost-effective because of the possible rapid propagation of invasive species,” said Adhane. “This is where neural networks can play a role as a practical solution for controlling the spread of mosquitoes.”

The research team developed a deep convolutional neural network (CNN) that distinguishes between mosquito species. Starting with a pre-trained model, they fine-tuned it using the hand-labeled Mosquito Alert dataset. Using NVIDIA GPUs and the cuDNN-accelerated PyTorch deep learning framework, the classification models were taught to pinpoint tiger mosquitoes based on identifiable morphological features such as white stripes on the legs, abdominal patches, head, and thorax shape.

Deep learning models typically rely on millions of samples. However, using only 6,378 images of both tiger and non-tiger mosquitoes from Mosquito Alert, the researchers were able to train the model to about 94% accuracy.

“The neural network we have developed can perform as well or nearly as well as a human expert and the algorithm is sufficiently powerful to process massive amounts of images,” said Adhane.

According to the researchers, as Mosquito Alert scales up, the study can be expanded to classify multiple species of mosquitoes and their breeding sites across the globe. 

“The model we have developed could be used in practical applications with small modifications to work with mobile apps. Using this trained network it is possible to make predictions about images of mosquitoes taken using smartphones efficiently and in real time,” Adhane said. 

The GPU used in the research was a donation provided by the NVIDIA Academic Hardware Grant Program.

Read the full article in IEEE Xplore >>


Categories
Misc

Object Detection Dependencies take hours to install

I’m kind of a noob at tensorflow, built a few classifiers but that’s it. I’m now trying to use the tensorflow object detection API, but the command to install the dependencies takes hours to install everything.

This would be fine if it weren’t for the fact that I’m in a research setting that requires me to run in a docker container. I could create a custom docker image and solve it that way, but does anyone know how to make this process go faster?

submitted by /u/prinse4515

Categories
Misc

QuickDraw – an online game developed by Google, combined with AirGesture – a simple gesture recognition application

submitted by /u/1991viet
Categories
Offsites

Reducing the Computational Cost of Deep Reinforcement Learning Research

It is widely accepted that the enormous growth of deep reinforcement learning research, which combines traditional reinforcement learning with deep neural networks, began with the publication of the seminal DQN algorithm. This paper demonstrated the potential of this combination, showing that it could produce agents that could play a number of Atari 2600 games very effectively. Since then, there have been several approaches that have built on and improved the original DQN. The popular Rainbow algorithm combined a number of these recent advances to achieve state-of-the-art performance on the ALE benchmark. This advance, however, came at a very high computational cost, which has the unfortunate side effect of widening the gap between those with ample access to computational resources and those without.

In “Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research”, to be presented at ICML 2021, we revisit this algorithm on a set of small- and medium-sized tasks. We first discuss the computational cost associated with the Rainbow algorithm. We explore how the same conclusions regarding the benefits of combining the various algorithmic components can be reached with smaller-scale experiments, and further generalize that idea to how research done on a smaller computational budget can provide valuable scientific insights.

The Cost of Rainbow
A major reason for the computational cost of Rainbow is that the standards in academic publishing often require evaluating new algorithms on large benchmarks like ALE, which consists of 57 Atari 2600 games that reinforcement learning agents may learn to play. For a typical game, it takes roughly five days to train a model using a Tesla P100 GPU. Furthermore, if one wants to establish meaningful confidence bounds, it is common to perform at least five independent runs. Thus, to train Rainbow on the full suite of 57 games required around 34,200 GPU hours (or 1425 days) in order to provide convincing empirical performance statistics. In other words, such experiments are only feasible if one is able to train on multiple GPUs in parallel, which can be prohibitive for smaller research groups.
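
The figures quoted above follow directly from multiplying the three quantities. The short sketch below reproduces the arithmetic; the function names are illustrative, and the wall-clock helper assumes the runs can be spread perfectly evenly across GPUs, which is an idealization.

```cpp
// Total GPU-days: every game is trained once per independent run.
constexpr long gpuDays(long games, long daysPerRun, long runs) {
    return games * daysPerRun * runs;
}

// The same budget expressed in GPU-hours.
constexpr long gpuHours(long games, long daysPerRun, long runs) {
    return gpuDays(games, daysPerRun, runs) * 24;
}

// Wall-clock days when the work is split evenly over numGpus (rounded up).
constexpr long wallClockDays(long totalGpuDays, long numGpus) {
    return (totalGpuDays + numGpus - 1) / numGpus;
}

// 57 games x 5 days per run x 5 runs, as stated in the text.
static_assert(gpuDays(57, 5, 5) == 1425, "57 * 5 * 5 GPU-days");
static_assert(gpuHours(57, 5, 5) == 34200, "1425 days * 24 hours");
```

Under the even-split assumption, wallClockDays(1425, 57) comes to 25 days, which shows why a single-GPU lab cannot realistically run the full benchmark.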

Revisiting Rainbow
As in the original Rainbow paper, we evaluate the effect of adding the following components to the original DQN algorithm: double Q-learning, prioritized experience replay, dueling networks, multi-step learning, distributional RL, and noisy nets.

We evaluate on a set of four classic control environments, which can be fully trained in 10-20 minutes (compared to five days for ALE games):

Upper left: In CartPole, the task is to balance a pole on a cart that the agent can move left and right. Upper right: In Acrobot, there are two arms and two joints, where the agent applies force to the joint between the two arms in order to raise the lower arm above a threshold. Lower left: In LunarLander, the agent is meant to land the spaceship between the two flags. Lower right: In MountainCar, the agent must build up momentum between two hills to drive to the top of the rightmost hill.

We investigated the effect of both independently adding each of the components to DQN, as well as removing each from the full Rainbow algorithm. As in the original Rainbow paper, we find that, in aggregate, the addition of each of these algorithms does improve learning over the base DQN. However, we also found some important differences, such as the fact that distributional RL — commonly thought to be a positive addition on its own — does not always yield improvements on its own. Indeed, in contrast to the ALE results in the Rainbow paper, in the classic control environments, distributional RL only yields an improvement when combined with another component.

Each plot shows the training progress when adding the various components to DQN. The x-axis is training steps, the y-axis is performance (higher is better).
Each plot shows the training progress when removing the various components from Rainbow. The x-axis is training steps, the y-axis is performance (higher is better).

We also re-ran the Rainbow experiments on the MinAtar environment, which consists of a set of five miniaturized Atari games, and found qualitatively similar results. The MinAtar games are roughly 10 times faster to train than the regular Atari 2600 games on which the original Rainbow algorithm was evaluated, but still share some interesting aspects, such as game dynamics and having pixel-based inputs to the agent. As such, they provide a challenging mid-level environment, in between the classic control and the full Atari 2600 games.

When viewed in aggregate, we find our results to be consistent with those of the original Rainbow paper — the impact resulting from each algorithmic component can vary from environment to environment. If we were to suggest a single agent that balances the tradeoffs of the different algorithmic components, our version of Rainbow would likely be consistent with the original, in that combining all components produces a better overall agent. However, there are important details in the variations of the different algorithmic components that merit a more thorough investigation.

Beyond the Rainbow
When DQN was introduced, it made use of the Huber loss and the RMSProp optimizer. It has been common practice for researchers to use these same choices when building on DQN, as most of their effort is spent on other algorithmic design decisions. In the spirit of reassessing these assumptions, we revisited the loss function and optimizer used by DQN on the lower-cost, small-scale classic control and MinAtar environments. We ran some initial experiments using the Adam optimizer, which has lately been the most popular optimizer choice, combined with a simpler loss function, the mean-squared error loss (MSE). Since the selection of optimizer and loss function is often overlooked when developing a new algorithm, we were surprised to observe a dramatic improvement on all the classic control and MinAtar environments.
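
To make the difference between the two losses concrete, here is a minimal sketch that computes both over a batch of temporal-difference errors. The function names are illustrative, and the Huber threshold of 1.0 is an assumed default rather than a value taken from the paper.

```cpp
#include <cmath>
#include <vector>

// Mean-squared error over a batch of TD errors (0 for an empty batch).
double meanSquaredError(const std::vector<double>& tdErrors) {
    if (tdErrors.empty()) return 0.0;
    double sum = 0.0;
    for (double e : tdErrors) sum += e * e;
    return sum / static_cast<double>(tdErrors.size());
}

// Mean Huber loss: quadratic for |e| <= delta, linear beyond it.
// delta = 1.0 is an assumption for illustration.
double meanHuber(const std::vector<double>& tdErrors, double delta = 1.0) {
    if (tdErrors.empty()) return 0.0;
    double sum = 0.0;
    for (double e : tdErrors) {
        const double a = std::fabs(e);
        sum += (a <= delta) ? 0.5 * a * a : delta * (a - 0.5 * delta);
    }
    return sum / static_cast<double>(tdErrors.size());
}
```

For the errors {0.5, -3.0}, the squared-error terms are 0.25 and 9, while the Huber terms are 0.125 and 2.5, so the large error dominates MSE far more than it dominates Huber. That difference in how outliers are weighted is the design choice being revisited here.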

We thus decided to evaluate the different ways of combining the two optimizers (RMSProp and Adam) with the two losses (Huber and MSE) on the full ALE suite (60 Atari 2600 games). We found that Adam+MSE is a better combination than RMSProp+Huber.

Measuring the improvement Adam+MSE gives over the default DQN settings (RMSProp + Huber); higher is better.

Additionally, when comparing the various optimizer-loss combinations, we find that when using RMSProp, the Huber loss tends to perform better than MSE (illustrated by the gap between the solid and dotted orange lines).

Normalized scores aggregated over all 60 Atari 2600 games, comparing the different optimizer-loss combinations.

Conclusion
On a limited computational budget we were able to reproduce, at a high level, the findings of the Rainbow paper and uncover new and interesting phenomena. Evidently it is much easier to revisit something than to discover it in the first place. Our intent with this work, however, was to argue for the relevance and significance of empirical research on small- and medium-scale environments. We believe that these less computationally intensive environments lend themselves well to a more critical and thorough analysis of the performance, behaviors, and intricacies of new algorithms.

We are by no means calling for less emphasis to be placed on large-scale benchmarks. We are simply urging researchers to consider smaller-scale environments as a valuable tool in their investigations, and reviewers to avoid dismissing empirical work that focuses on smaller-scale environments. By doing so, in addition to reducing the environmental impact of our experiments, we will both get a clearer picture of the research landscape and reduce the barriers for researchers from diverse and often underresourced communities, which can only help make our community and scientific advances stronger.

Acknowledgments
Thank you to Johan, the first author of this paper, for his hard work and persistence in seeing this through! We would also like to thank Marlos C. Machado, Sara Hooker, Matthieu Geist, Nino Vieillard, Hado van Hasselt, Eleni Triantafillou, and Brian Tanner for their insightful comments on this work.