A team of Uber AI researchers has achieved record high scores and beaten previously unsolved Atari games with algorithms that remember and build off their past successes.
Highlighted this week in Nature, the Go-Explore family of algorithms is designed to address limitations of traditional reinforcement learning algorithms, which struggle with complex games that provide sparse or deceptive feedback.
Performance on Atari games is a popular benchmark for reinforcement learning algorithms. But many algorithms fail to thoroughly explore promising avenues, instead going off track to find potential new solutions.
In this paper, the researchers applied a simple principle, “first return, then explore,” creating algorithms that remember promising states from past games, return to those states, and then intentionally explore from that point to further maximize reward.
The researchers used a variety of NVIDIA GPUs at OpenAI and Uber data centers to develop the algorithms.
The software determines which plays to revisit by storing screen grabs of past games and grouping together similar-looking images to find starting points it should return to in future rounds.
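To make the "first return, then explore" loop concrete, here is a minimal sketch on a toy grid world. It is not the researchers' implementation: the environment, the `cell_of` grouping (coarse grid position standing in for grouping similar-looking screen grabs), the visit-count selection weights, and all parameter values are assumptions chosen for illustration.

```python
import random

ACTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
GRID = 32  # hypothetical toy world size
CELL = 4   # hypothetical coarseness used to group similar states


def step(state, action):
    """Toy deterministic environment (assumption): reward only in the far corner."""
    x = min(max(state[0] + action[0], 0), GRID - 1)
    y = min(max(state[1] + action[1], 0), GRID - 1)
    reward = 1.0 if (x, y) == (GRID - 1, GRID - 1) else 0.0
    return (x, y), reward


def cell_of(state):
    """Group similar states into one archive cell (stand-in for grouping similar images)."""
    return (state[0] // CELL, state[1] // CELL)


def go_explore(iterations=2000, explore_steps=20, seed=0):
    rng = random.Random(seed)
    start = (0, 0)
    # Archive: one remembered state per cell, with the score and step count that reached it.
    archive = {cell_of(start): {"state": start, "score": 0.0, "steps": 0, "visits": 0}}
    for _ in range(iterations):
        # Select: favor cells that have been chosen less often (simplified heuristic).
        cells = list(archive)
        weights = [1.0 / (1 + archive[c]["visits"]) ** 0.5 for c in cells]
        entry = archive[rng.choices(cells, weights=weights, k=1)[0]]
        entry["visits"] += 1
        # Return: restore the remembered state directly (the toy world is deterministic).
        state, score, steps = entry["state"], entry["score"], entry["steps"]
        # Explore: take random actions from the restored state.
        for _ in range(explore_steps):
            state, reward = step(state, rng.choice(ACTIONS))
            score += reward
            steps += 1
            cell = cell_of(state)
            known = archive.get(cell)
            # Keep a state if its cell is new, or it scored higher, or it tied in fewer steps.
            if known is None or (score, -steps) > (known["score"], -known["steps"]):
                archive[cell] = {"state": state, "score": score, "steps": steps,
                                 "visits": known["visits"] if known else 0}
    return archive


archive = go_explore()
print(f"cells discovered: {len(archive)}")
```

The point of the sketch is the split into two phases: exploration always starts from a state the archive already knows how to reach, instead of from the beginning of the game.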
“The reason our approach hadn’t been considered before is that it differs strongly from the dominant approach that has historically been used for addressing these problems in the reinforcement learning community, called ‘intrinsic motivation,’” said researchers Adrien Ecoffet, Joost Huizinga, and Jeff Clune. “In intrinsic motivation, instead of dividing exploration into returning and exploring like we do, the agent is simply rewarded for discovering new areas.”
With the return-and-explore technique, Go-Explore achieved massive improvements on a collection of 55 Atari games, beating state-of-the-art algorithms 85.5 percent of the time. The algorithm set a record — beating both the human world record and past reinforcement learning records — on the complex Montezuma’s Revenge game.
The paper also demonstrated how Go-Explore could be applied to real-world challenges including robotics, drug design, and language processing.
Posted by Alejandro Luebs, Software Engineer and Jamieson Brettle, Product Manager, Chrome
Connecting to others online via voice and video calls is something that is increasingly a part of everyday life. The real-time communication frameworks, like WebRTC, that make this possible depend on efficient compression techniques, codecs, to encode (or decode) signals for transmission or storage. A vital part of media applications for decades, codecs allow bandwidth-hungry applications to efficiently transmit data, and have led to an expectation of high-quality communication anywhere at any time.
As such, a continuing challenge in developing codecs, both for video and audio, is to provide increasing quality, using less data, and to minimize latency for real-time communication. Even though video might seem much more bandwidth hungry than audio, modern video codecs can reach lower bitrates than some high-quality speech codecs used today. Combining low-bitrate video and speech codecs can deliver a high-quality video call experience even in low-bandwidth networks. Yet historically, the lower the bitrate for an audio codec, the less intelligible and more robotic the voice signal becomes. Furthermore, while some people have access to a consistent high-quality, high-speed network, this level of connectivity isn’t universal, and even those in well connected areas at times experience poor quality, low bandwidth, and congested network connections.
To solve this problem, we have created Lyra, a high-quality, very low-bitrate speech codec that makes voice communication available even on the slowest networks. To do this, we’ve applied traditional codec techniques while leveraging advances in machine learning (ML) with models trained on thousands of hours of data to create a novel method for compressing and transmitting voice signals.
Lyra Overview
The basic architecture of the Lyra codec is quite simple. Features, or distinctive speech attributes, are extracted from speech every 40ms and are then compressed for transmission. The features themselves are log mel spectrograms, a list of numbers representing the speech energy in different frequency bands, which have traditionally been used for their perceptual relevance because they are modeled after human auditory response. On the other end, a generative model uses those features to recreate the speech signal. In this sense, Lyra is very similar to other traditional parametric codecs, such as MELP.
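As a rough illustration of what "log mel features every 40ms" means, the sketch below computes one short list of log mel energies per 40ms frame using only the Python standard library. The 16kHz sample rate, the 8 mel bands, the Hann window, and the non-overlapping frames are assumptions made for the example; the post does not specify how Lyra configures its feature extractor.

```python
import cmath
import math

SAMPLE_RATE = 16000  # assumed sample rate in Hz
FRAME_MS = 40        # frame length stated in the post
N_MELS = 8           # assumed number of mel bands, kept small for illustration


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hann-window the frame and return |DFT|^2 for the non-negative frequencies."""
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:
                row.append((f - left) / (center - left))
            elif center < f < right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_mel_features(samples):
    """Return one list of N_MELS log energies per non-overlapping 40ms frame."""
    frame_len = SAMPLE_RATE * FRAME_MS // 1000
    bank = mel_filterbank(N_MELS, frame_len, SAMPLE_RATE)
    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        power = power_spectrum(samples[start:start + frame_len])
        features.append([math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                         for row in bank])
    return features


# 80ms of a 440Hz tone yields two frames of N_MELS values each.
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(1280)]
features = log_mel_features(tone)
print(len(features), len(features[0]))
```

A naive DFT is used to keep the example dependency-free. A practical extractor would use an FFT and typically many more mel bands.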
However, traditional parametric codecs, which simply extract from speech the critical parameters needed to recreate the signal at the receiving end, achieve low bitrates but often sound robotic and unnatural. These shortcomings have led to the development of a new generation of high-quality audio generative models that have revolutionized the field by being able to not only differentiate between signals, but also generate completely new ones. DeepMind’s WaveNet was the first of these generative models that paved the way for many to come. Additionally, WaveNetEQ, the generative model-based packet-loss-concealment system currently used in Duo, has demonstrated how this technology can be used in real-world scenarios.
A New Approach to Compression with Lyra
Using these models as a baseline, we’ve developed a new model capable of reconstructing speech using minimal amounts of data. Lyra harnesses the power of these new natural-sounding generative models to maintain the low bitrate of parametric codecs while achieving high quality, on par with state-of-the-art waveform codecs used in most streaming and communication platforms today. The drawback of waveform codecs is that they achieve this high quality by compressing and sending over the signal sample-by-sample, which requires a higher bitrate and, in most cases, isn’t necessary to achieve natural sounding speech.
One concern with generative models is their computational complexity. Lyra avoids this issue by using a cheaper recurrent generative model, a WaveRNN variation, that works at a lower rate, but generates in parallel multiple signals in different frequency ranges that it later combines into a single output signal at the desired sample rate. This trick enables Lyra to not only run on cloud servers, but also on-device on mid-range phones in real time (with a processing latency of 90ms, which is in line with other traditional speech codecs). This generative model is then trained on thousands of hours of speech data and optimized, similarly to WaveNet, to accurately recreate the input audio.
Comparison with Existing Codecs
Since the inception of Lyra, our mission has been to provide the best quality audio using a fraction of the bitrate of alternatives. Currently, the royalty-free, open-source codec Opus is the most widely used codec for WebRTC-based VOIP applications and, with audio at 32kbps, typically obtains transparent speech quality, i.e., indistinguishable from the original. However, while Opus can be used in more bandwidth-constrained environments down to 6kbps, it starts to demonstrate degraded audio quality. Other codecs are capable of operating at comparable bitrates to Lyra (Speex, MELP, AMR), but each suffers from increased artifacts and results in a robotic-sounding voice.
Lyra is currently designed to operate at 3kbps, and listening tests show that Lyra outperforms any other codec at that bitrate and compares favorably to Opus at 8kbps, thus achieving more than a 60% reduction in bandwidth. Lyra can be used wherever the bandwidth conditions are insufficient for higher bitrates and existing low-bitrate codecs do not provide adequate quality.
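The arithmetic behind these figures is easy to check. The short sketch below computes the relative bandwidth saving of 3kbps against 8kbps, and the bit budget per 40ms frame at 3kbps. The per-frame figure assumes the whole 3kbps budget is spent on one packet of features per frame, which the post does not state; the function names are illustrative only.

```python
def bandwidth_reduction(new_bps, old_bps):
    """Fraction of bandwidth saved by moving from old_bps to new_bps."""
    return 1.0 - new_bps / old_bps


def bits_per_frame(bitrate_bps, frame_ms):
    """Bits available per frame, assuming one packet per frame (an assumption)."""
    return bitrate_bps * frame_ms // 1000


LYRA_BPS = 3000  # 3kbps, from the post
OPUS_BPS = 8000  # 8kbps comparison point, from the post

print(bandwidth_reduction(LYRA_BPS, OPUS_BPS))  # 0.625, i.e. a 62.5% reduction
print(bits_per_frame(LYRA_BPS, 40))             # 120 bits per 40ms frame
```

Under that assumption, each 40ms of speech has to be described in about 15 bytes, which is why a generative model is needed to fill in the rest at the receiving end.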
[Audio samples: clean speech and a noisy environment, each comparing the original with Opus at 6kbps, Lyra at 3kbps, and Speex at 3kbps, plus a reference sample compared with Opus at 6kbps and Lyra at 3kbps.]
Ensuring Fairness
As with any ML-based system, the model must be trained to make sure that it works for everyone. We’ve trained Lyra with thousands of hours of audio with speakers in over 70 languages using open-source audio libraries and then verified the audio quality with expert and crowdsourced listeners. One of the design goals of Lyra is to ensure universally accessible high-quality audio experiences. Lyra trains on a wide dataset, including speakers in a myriad of languages, to make sure the codec is robust to any situation it might encounter.
Societal Impact and Where We Go From Here
The implications of technologies like Lyra are far reaching, both in the short and long term. With Lyra, billions of users in emerging markets can have access to an efficient low-bitrate codec that allows them to have higher quality audio than ever before. Additionally, Lyra can be used in cloud environments enabling users with various network and device capabilities to chat seamlessly with each other. Pairing Lyra with new video compression technologies, like AV1, will allow video chats to take place, even for users connecting to the internet via a 56kbps dial-in modem.
Duo already uses ML to reduce audio interruptions, and is currently rolling out Lyra to improve audio call quality and reliability on very low bandwidth connections. We will continue to optimize Lyra’s performance and quality to ensure maximum availability of the technology, with investigations into acceleration via GPUs and TPUs. We are also beginning to research how these technologies can lead to a low-bitrate general-purpose audio codec (i.e., music and other non-speech use cases).
Acknowledgements
Thanks to everyone who made Lyra possible, including Jan Skoglund, Felicia Lim, Michael Chinen, Bastiaan Kleijn, Tom Denton, Andrew Storus, Yero Yeh (Chrome Media), Henrik Lundin, Niklas Blum, Karl Wiberg (Google Duo), Chenjie Gu, Zach Gleicher, Norman Casagrande, Erich Elsen (DeepMind).
Learn how to create a cluster of GPU machines and use Apache Spark with Deep Java Library (DJL) on Amazon EMR to leverage large-scale image classification in Scala.
In this post, NVIDIA engineers show you how to use production-quality AI models such as License Plate Detection (LPD) and License Plate Recognition (LPR) models in conjunction with the NVIDIA Transfer Learning Toolkit (TLT).
Step-by-step tutorial to develop a voice-based virtual assistant and learn what it takes to integrate Jarvis ASR and TTS with Rasa NLP and Dialog Management (DM).