VMware Inc. and NVIDIA today announced the expansion of their strategic partnership to ready the hundreds of thousands of enterprises that run on VMware’s cloud infrastructure for the era of generative AI.
Servers Featuring NVIDIA L40S GPUs and NVIDIA BlueField Coming Soon From Dell Technologies, Hewlett Packard Enterprise and Lenovo to Support VMware Private AI Foundation with NVIDIA. LAS VEGAS, …
The latest advancements in AI for gaming are in the spotlight today at Gamescom, the world’s largest gaming conference, as NVIDIA introduced a host of technologies, starting with DLSS 3.5, the next step forward in its breakthrough AI neural rendering technology. DLSS 3.5, NVIDIA’s latest innovation in AI-powered graphics, is an image quality upgrade.
On the eve of Gamescom, NVIDIA announced NVIDIA DLSS 3.5 featuring Ray Reconstruction — a new neural rendering AI model that creates more beautiful and realistic ray-traced visuals than traditional rendering methods — for real-time 3D creative apps and games.
Bill Dally — one of the world’s foremost computer scientists and head of NVIDIA’s research efforts — will describe the forces driving accelerated computing and AI in his keynote address at Hot Chips, an annual gathering of leading processor and system architects. Dally will detail advances in GPU silicon, systems and software.
Event: Speech AI Day
On Sept. 20, join experts from leading companies at NVIDIA-hosted Speech AI Day.
Google at Interspeech 2023
This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland. One of the world’s most extensive conferences on the research and technology of spoken language understanding and processing, it gathers experts in speech-related research fields to take part in oral presentations and poster sessions and to build collaborations across the globe.
We are excited to be a Platinum Sponsor of INTERSPEECH 2023, where we will be showcasing more than 20 research publications and supporting a number of workshops and special sessions. We welcome in-person attendees to drop by the Google Research booth to meet our researchers and participate in Q&As and demonstrations of some of our latest speech technologies, which help to improve accessibility and provide convenience in communication for billions of users. In addition, online attendees are encouraged to visit our virtual booth in Topia where you can get up-to-date information on research and opportunities at Google. Visit the @GoogleAI Twitter account to find out about Google booth activities (e.g., demos and Q&A sessions). You can also learn more about the Google research being presented at INTERSPEECH 2023 below (Google affiliations in bold).
Board and Organizing Committee
ISCA Board, Technical Committee Chair: Bhuvana Ramabhadran
Area Chairs include:
Analysis of Speech and Audio Signals: Richard Rose
Speech Synthesis and Spoken Language Generation: Rob Clark
Special Areas: Tara Sainath
Satellite events
VoxCeleb Speaker Recognition Challenge 2023 (VoxSRC-23)
Organizers include: Arsha Nagrani
ISCA Speech Synthesis Workshop (SSW12)
Speakers include: Rob Clark
Keynote talk – ISCA Medalist
Bridging Speech Science and Technology — Now and Into the Future
Speaker: Shrikanth Narayanan
Survey Talk
Speech Compression in the AI Era
Speaker: Jan Skoglund
Special session papers
Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech
Richard Rose, Oscar Chang, Olivier Siohan
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Hakan Erdogan, Scott Wisdom, Xuankai Chang*, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey
Papers
DeePMOS: Deep Posterior Mean-Opinion-Score of Speech
Xinyu Liang, Fredrik Cumlin, Christian Schüldt, Saikat Chatterjee
O-1: Self-Training with Oracle and 1-Best Hypothesis
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi
Re-investigating the Efficient Transfer Learning of Speech Foundation Model Using Feature Fusion Methods
Zhouyuan Huo, Khe Chai Sim, Dongseong Hwang, Tsendsuren Munkhdalai, Tara N. Sainath, Pedro Moreno
MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors
Joshua Camp, Tom Kenter, Lev Finkelstein, Rob Clark
LanSER: Language-Model Supported Speech Emotion Recognition
Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou
Modular Domain Adaptation for Conformer-Based Streaming ASR
Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar
On Training a Neural Residual Acoustic Echo Suppressor for Improved ASR
Sankaran Panchapagesan, Turaj Zakizadeh Shabestary, Arun Narayanan
MD3: The Multi-dialect Dataset of Dialogues
Jacob Eisenstein, Vinodkumar Prabhakaran, Clara Rivera, Dorottya Demszky, Devyani Sharma
Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR
Zelin Wu, Tsendsuren Munkhdalai, Pat Rondon, Golan Pundak, Khe Chai Sim, Christopher Li
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran
How to Estimate Model Transferability of Pre-trained Speech Models?
Zih-Ching Chen, Chao-Han Huck Yang*, Bo Li, Yu Zhang, Nanxin Chen, Shuo-yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath
Improving Joint Speech-Text Representations Without Alignment
Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath
Streaming Parrotron for On-Device Speech-to-Speech Conversion
Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal
Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR
W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath
Universal Automatic Phonetic Transcription into the International Phonetic Alphabet
Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang
Mixture-of-Expert Conformer for Streaming Multilingual ASR
Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays
Real Time Spectrogram Inversion on Mobile Phone
Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy
2-Bit Conformer Quantization for Automatic Speech Recognition
Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He
LibriTTS-R: A Restored Multi-speaker Text-to-Speech Corpus
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna
PronScribe: Highly Accurate Multimodal Phonemic Transcription from Speech and Text
Yang Yu, Matthew Perez*, Ankur Bapna, Fadi Haik, Siamak Tazari, Yu Zhang
Label Aware Speech Representation Learning for Language Identification
Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar
* Work done while at Google
The verdict is in: A GeForce NOW Ultimate membership raises the bar on gaming. Members have been tackling the Ultimate KovaaK’s challenge head-on and seeing for themselves how the power of Ultimate improves their gaming with 240 frames-per-second streaming. The popular training title that helps gamers improve their aim is also fully launching on GeForce NOW.