Categories
Misc

NVIDIA Merlin's Latest Enhancements Streamline Recommender Workflows with the 0.5 Release

The latest Merlin 0.5 update includes a data generator for training, a multi-GPU dataloader, and initial support for session-based recommenders.

Billions of people in the world are online. Many discrete moments online are spent browsing, shopping, streaming entertainment, or engaging with social media. Each discrete moment, or session, online is an opportunity for recommenders to make informed decisions a bit easier, faster, and more personalized for an individual person. Yet, when considering scale, this translates into recommenders potentially supporting billions of people interacting with trillions of things online.

At GTC Spring 2021, NVIDIA shared how retail, entertainment, on-demand, and social companies are building and utilizing recommenders at scale, including early adopters of NVIDIA Merlin. Merlin open source components include NVTabular for ETL, HugeCTR for training, and Triton for inference. The NVIDIA Merlin team continues to ingest feedback from early adopters to streamline recommender workflows for machine learning engineers. The latest Merlin 0.5 update includes a data generator for training, a multi-GPU dataloader, and initial support for session-based recommenders. The update also reaffirms NVIDIA's commitment to democratizing and streamlining recommender workflows.

Supporting Experimentation and Streamlining Recommender Workflows 

Ongoing experimentation is vital for fine-tuning recommender model performance before models are deployed to production. A configurable data generator, using synthetic data, lets machine learning engineers set the probability distribution of categorical features to be uniform or power-law without modifying the configuration file. Merlin HugeCTR's new data generator considers categorical data and is particularly helpful for benchmarking and research purposes.

Merlin 0.5's inclusion of a multi-GPU dataloader was based on feedback from Merlin early adopters and also helps streamline workflows. Machine learning engineers are able to use the Merlin NVTabular TensorFlow (TF) dataloader for multi-GPU training on a single node using TF Distributed. Merlin NVTabular utilizes Dask and Dask-cuDF to scale easily to multi-GPU and multi-node, as well as to provide a high-performance, recommender-specific ETL pipeline.
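
To make the dataloader step concrete, here is a minimal sketch, not taken from the Merlin documentation, of constructing the NVTabular TensorFlow dataloader; the parquet path, column names, and batch size are illustrative assumptions.

import os

# Reserve part of the GPU memory for TensorFlow so NVTabular can use the rest
# (set before the loader, and therefore TensorFlow, is imported)
os.environ["TF_MEMORY_ALLOCATION"] = "0.5"

from nvtabular.loader.tensorflow import KerasSequenceLoader

# Hypothetical NVTabular output path and column names, for illustration only
train_loader = KerasSequenceLoader(
    "train/*.parquet",
    batch_size=65536,
    label_names=["click"],
    cat_names=["user_id", "item_id"],
    cont_names=["price"],
    shuffle=True,
)

# The loader can be passed directly to a Keras model:
# model.fit(train_loader, epochs=1)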

Merlin Session-Based Recommenders Support: Just A Beginning 

Data scientists and machine learning engineers at the forefront of e-commerce, news, and social media recommender work have added, or are considering adding, session-based recommenders. While collaborative filtering and content-based filtering are established recommender methods, session-based recommenders are gaining attention due to the potential for more accurate predictions when users' interests are dynamic and specific to a shorter time frame (i.e., within a session). With Merlin 0.5, NVTabular provides new preprocessing functionality needed to transform and group data for session-based recommenders.

 Download and Try Merlin’s Latest Update

The latest preprocessing and training enhancements to NVIDIA Merlin reaffirm NVIDIA's commitment to democratizing and accelerating recommender workflows. As machine learning engineers and data scientists use a hybrid of libraries, packages, tools, and techniques to create effective and impactful recommenders, Merlin components are designed to be easy to use and interoperable with existing recommender workflows.

To discover hands-on how Merlin components streamline recommender workflows, download and try Merlin NVTabular for ETL, HugeCTR for training, and Triton for inference.

Categories
Misc

Creating Tensorflow Dataset for Object Recognition in Keras

Hi,

I was wondering if someone could aid me in solving this problem. I have been following this tutorial, which uses the COCO dataset from tfds.

I am interested in using a different dataset, but I am having trouble adapting the code.

My dataset consists of a number of images, with corresponding bounding box annotations in a .csv. It can be summarized as this: [filename, xmin, ymin, xmax, ymax, class].

The code in this tutorial uses (to my understanding) a tensor dataset with the format [image_array, xmin, ymin, xmax, ymax, class].

How can I load this data in this format? I have been having a great deal of trouble finding any resources. Any help is greatly appreciated! I will mention how I have been approaching this below.

Summary of things applied:

Essentially, I have been able to load everything into a pandas dataframe with 6 columns, consisting of [filename, xmin, ymin, xmax, ymax, class]. However, I feel this is inefficient, and I cannot get past the last step (conversion to a tensor).

I try: d = tf.data.Dataset.from_tensors((df.values))

and get: ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
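
Not from the tutorial, but one hedged way to approach this: split the dataframe's columns by dtype and use tf.data.Dataset.from_tensor_slices instead of from_tensors, then load the images inside a map call. The column names and toy values below are assumptions standing in for the dataframe described above.

import pandas as pd
import tensorflow as tf

# Toy stand-in for the dataframe described in the post
df = pd.DataFrame({
    "filename": ["img0.jpg", "img1.jpg"],
    "xmin": [10, 5], "ymin": [20, 5], "xmax": [50, 40], "ymax": [60, 45],
    "class": [0, 1],
})

# from_tensors fails on the mixed-type df.values array, so pass each group
# of same-typed columns separately
ds = tf.data.Dataset.from_tensor_slices((
    df["filename"].values,
    df[["xmin", "ymin", "xmax", "ymax"]].values.astype("float32"),
    df["class"].values,
))

def load_example(filename, box, label):
    # Iterating the dataset requires the image files to actually exist
    image = tf.io.decode_jpeg(tf.io.read_file(filename), channels=3)
    return image, box, label

ds = ds.map(load_example).batch(8)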

submitted by /u/Puzzled_Supports
[visit reddit] [comments]

Categories
Misc

How can I do grid to grid transitions with Tensorflow?

This is a complete newbie question. I’ve used Tensorflow before but only with Keras for classifying. I’m not all that knowledgeable about Tensorflow’s capabilities beyond that.

I have a problem where I want to compute the next step in a sequence. The input and output data are both in the form of 2D grids.

eg:

0 1 0 0 0 0 0 1 0 -> 1 1 1 0 1 0 0 0 0 

The problem is a bit more complex than that, but I think that's the simplest example. The actual domain is hydrodynamics: how waves and currents interact with objects, land, and ships. This can be done with a simulation, but that is extremely computationally expensive. If I can get a "near enough" result in a much shorter time span, that is acceptable.
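
As a rough illustration of one way this is often framed in TensorFlow (a sketch with assumed 32x32 grids and random data, not a solution to the hydrodynamics problem itself), a small convolutional network can map one grid directly to a same-shaped grid for the next step:

import numpy as np
import tensorflow as tf

# Random placeholder data: 100 input grids and their "next step" grids
grids_in = np.random.rand(100, 32, 32, 1).astype("float32")
grids_out = np.random.rand(100, 32, 32, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 1)),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(1, 3, padding="same"),  # same-shape grid output
])
model.compile(optimizer="adam", loss="mse")
model.fit(grids_in, grids_out, epochs=2, batch_size=16)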

There are many sub problems within this domain, and practically any simulation solution is a compromise.

I have been thinking about this problem since seeing this video https://www.youtube.com/watch?v=2Bw5f4vYL98& – but I'm no AI researcher, so the paper is beyond my level.

Do you think this is a problem that Tensorflow can be used to tackle?

submitted by /u/fgyoysgaxt
[visit reddit] [comments]

Categories
Offsites

Google at ICLR 2021

The 9th International Conference on Learning Representations (ICLR 2021), a virtual conference focused on deep learning, kicked off this week, offering conference and workshop tracks that present some of the latest research in deep learning and its applications to areas such as computer vision, computational biology, speech recognition, text understanding, and more.

As a Platinum Sponsor of ICLR 2021, Google will have a strong presence with over 100 accepted publications and participation on organizing committees and in workshops. If you have registered for ICLR 2021, we hope you’ll watch our talks and learn about the work at Google that goes into solving interesting problems for billions of people. Learn more about our research being presented in the list below (Googlers in bold).

Officers and Board Members
Includes: Hugo Larochelle, Tara Sainath

Organizing Committee
Includes: Sanmi Koyejo, Chelsea Finn

Area Chairs
Includes: Abhishek Kumar, Aditya Menon, Aleksandra Faust, Alexey Dosovitskiy, Andrew Cotter, Andrew Dai, Augustus Odena, Been Kim, Behnam Neyshabur, Ben Poole, Bo Dai, Bo Li, Branislav Kveton, Ce Liu, Claudio Gentile, Colin Raffel, Danny Tarlow, David Ha, Dengyong Zhou, Dumitru Erhan, Dustin Tran, Felix Hill, George Tucker, Hanie Sedghi, Heinrich Jiang, Hossein Mobahi, Izhak Shafran, Jascha Sohl-Dickstein, Jasper Snoek, Jean-Philippe Vert, Jeffrey Pennington, Justin Gilmer, Kevin Swersky, Marco Cuturi, Mario Lucic, Marlos C. Machado, Mathieu Blondel, Matt Johnson, Matthieu Geist, Mohammad Norouzi, Naman Agarwal, Navdeep Jaitly, Nicolas Le Roux, Niki Parmar, Olivier Bachem, Olivier Pietquin, Philip Long, Quentin Berthet, Razvan Pascanu, Rodolphe Jenatton, Samy Bengio*, Sebastian Nowozin, Silvio Lattanzi, Slav Petrov, Srinadh Bhojanapalli, Suman Ravuri, Tim Salimans, Vitaly Kuznetsov, William Cohen, Yann Dauphin, Yujia Li

Publications
Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale (see the blog post)
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
Biao Zhang*, Ankur Bapna, Rico Sennrich, Orhan Firat

Evolving Reinforcement Learning Algorithms (see the blog post)
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust

Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song*, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

When Do Curricula Work?
Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur

Sharpness-aware Minimization for Efficiently Improving Generalization
Pierre Foret*, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
Zirui Wang*, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Mathematical Reasoning via Self-supervised Skip-tree Training
Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Long-Tail Learning via Logit Adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Are Neural Rankers Still Outperformed by Gradient Boosted Decision Trees?
Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

LambdaNetworks: Modeling Long-Range Interactions without Attention
Irwan Bello

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare

BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

Practical Real Time Recurrent Learning with a Sparse Approximation
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

LEAF: A Learnable Frontend for Audio Classification (see the blog post)
Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Batch Reinforcement Learning Through Continuation Method
Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, Minmin Chen

Scalable Transfer Learning with Expert Models
Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa, Cedric Renggli*, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby

Scaling Symbolic Methods Using Gradients for Neural Model Explanation
Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh, Patrick Riley

Primal Wasserstein Imitation Learning (see the blog post)
Robert Dadashi, Leonard Hussenot, Matthieu Geist, Olivier Pietquin

Reset-Free Lifelong Learning with Skill-Space Planning
Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Teaching Temporal Logics to Neural Networks
Christopher Hahn, Frederik Schmitt, Jens U. Kreber, Markus Norman Rabe, Bernd Finkbeiner

Shape-Texture Debiased Neural Network Training
Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

Rethinking Embedding Coupling in Pre-trained Language Models
Hyung Won Chung, Thibault Fevry*, Henry Tsai, Melvin Johnson, Sebastian Ruder

Overparameterisation and Worst-Case Generalisation: Friend or Foe?
Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Single-Photon Image Classification
Thomas Fischbacher, Luciano Sbaiz

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis*, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

Adaptive Federated Optimization
Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, Hugh Brendan McMahan

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Benjamin Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov

Open Question Answering over Tables and Text
Wenhu Chen*, Ming-Wei Chang, Eva Schlinger, William Yang Wang, William W. Cohen

IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression
Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans

A Universal Representation Transformer Layer for Few-Shot Image Classification
Lu Liu, William L. Hamilton, Guodong Long, Jing Jiang, Hugo Larochelle

Tradeoffs in Data Augmentation: An Empirical Study
Raphael Gontijo-Lopes, Sylvia Smullin, Ekin Dogus Cubuk, Ethan Dyer

Coping with Label Shift via Distributionally Robust Optimisation
Jingzhao Zhang, Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

Rethinking Attention with Performers (see the blog post)
Krzysztof Marcin Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Quincy Davis, Afroz Mohiuddin, Lukasz Kaiser, David Benjamin Belanger, Lucy J Colwell, Adrian Weller

Teaching with Commentaries
Aniruddh Raghu*, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
Vinay Venkatesh Ramasesh, Ethan Dyer, Maithra Raghu

Model-Based Offline Planning
Arthur Argenson, Gabriel Dulac-Arnold

The Geometry of Integration in Text Classification RNNs
Kyle Aitken*, Vinay Venkatesh Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan

On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L Smith, Benoit Dherin, David Barrett, Soham De

The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers (see the blog post)
Preetum Nakkiran*, Behnam Neyshabur, Hanie Sedghi

Learning Energy-Based Models by Diffusion Recovery Likelihood
Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P Kingma

Latent Skill Planning for Exploration and Transfer
Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti

PseudoSeg: Designing Pseudo Labels for Semantic Segmentation
Yuliang Zou*, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister

WaveGrad: Estimating Gradients for Waveform Generation
Nanxin Chen*, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, William Chan

One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks
Atish Agarwala, Abhimanyu Das, Brendan Juba*, Rina Panigrahy, Vatsal Sharan*, Xin Wang, Qiuyi Zhang

Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

Explainable Deep One-Class Classification
Philipp Liznerski, Lukas Ruff, Robert A. Vandermeulen, Billy Joe Franks, Marius Kloft, Klaus Robert Muller

Net-DNF: Effective Deep Modeling of Tabular Data
Liran Katzir, Gal Elidan, Ran El-Yaniv

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu

Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral
Lucio M. Dery, Yann Dauphin, David Grangier

Average-Case Acceleration for Bilinear Games and Normal Matrices
Carles Domingo-Enrich, Fabian Pedregosa, Damien Scieur

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Anurag Ajay*, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum

Training Independent Subnetworks for Robust Prediction
Marton Havasi*, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, Dustin Tran

Benchmarks for Deep Off-Policy Evaluation
Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Thomas Paine

TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks
Martin Trimmel, Henning Petzka, Cristian Sminchisescu

Mastering Atari with Discrete World Models (see the blog post)
Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit
Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning
Elan Markowitz, Keshav Balasubramanian, Mehrnoosh Mirtaheri, Sami Abu-El-Haija, Bryan Perozzi, Greg Ver Steeg, Aram Galstyan

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
Paul Pu Liang*, Manzil Zaheer, Yuan Wang, Amr Ahmed

HyperGrid Transformers: Towards A Single Model for Multiple Tasks
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms
Maruan Al-Shedivat*, Jennifer Gillenwater, Eric Xing, Afshin Rostamizadeh

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
Thao Nguyen, Maithra Raghu, Simon Kornblith

A Unifying View on Implicit Bias in Training Linear Neural Networks
Chulhee Yun*, Shankar Krishnan, Hossein Mobahi

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine

Lipschitz Recurrent Neural Networks
N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael R Zhang*, Thomas Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, Ziyu Wang, Mohammad Norouzi

The Importance of Pessimism in Fixed-Dataset Policy Optimization
Jacob Buckman, Carles Gelada, Marc G Bellemare

Monotonic Kronecker-Factored Lattice
William Taylor Bakst, Nobuyuki Morioka, Erez Louidor

Adversarially Guided Actor-Critic
Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee*, Seunghoon Hong

Dataset Meta-Learning from Kernel Ridge-Regression
Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Dual-Mode ASR: Unify and Improve Streaming ASR with Full-Context Modeling
Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N Sainath, Yonghui Wu, Ruoming Pang

Implicit Gradient Regularization
David Barrett, Benoit Dherin

Deconstructing the Regularization of BatchNorm
Yann Dauphin, Ekin Dogus Cubuk

C-Learning: Learning to Achieve Goals via Recursive Classification
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Colorization Transformer
Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner

Control-Aware Representations for Model-based Reinforcement Learning
Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh

Evaluations and Methods for Explanation through Robustness Analysis
Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Kumar Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh

Learning and Evaluating Representations for Deep One-Class Classification
Kihyuk Sohn, Chun-Liang Li, Jinsung Yoon, Minho Jin, Tomas Pfister

No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models
Will Sussman Grathwohl, Jacob Jin Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud

Neural Thompson Sampling
Weitong ZHANG, Dongruo Zhou, Lihong Li, Quanquan Gu

A Design Space Study for LISTA and Beyond
Tianjian Meng, Xiaohan Chen, Yifan Jiang, Zhangyang Wang

i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning
Kibok Lee, Yian Zhu, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin, Honglak Lee

Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments
Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Charles Blundell, Sergey Levine, Yoshua Bengio, Michael Curtis Mozer

Calibration of Neural Networks using Splines
Kartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley

Extreme Memorization via Scale of Initialization
Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur

Molecule Optimization by Explainable Evolution
Binghong Chen, Tianzhe Wang, Chengtao Li, Hanjun Dai, Le Song

Combining Ensembles and Data Augmentation Can Harm Your Calibration
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran

Workshops
Science and Engineering of Deep Learning
Speakers and Panelists include: Alex Hanna
Moderator and Advisors include: Emily Denton
Organizers include: Negar Rostemzadeh, Samy Bengio*

Synthetic Data Generation: Quality, Privacy, Bias
Speakers include: Jinsung Yoon, Emily Denton
Program Committee includes: Syed Ashrafulla

Enormous Language Models: Perspectives and Benchmarks
Speakers and Panelists include: Noam Shazeer, Natalie Schluter
Organizers include: Colin Raffel, Adam Roberts, Jascha Sohl-Dickstein, Katherine Lee, William Fedus, Aitor Lewkowycz

The Role of Mathematical Reasoning in General Artificial Intelligence
Speakers and Panelists include: Markus Rabe, Christian Szegedy

Weakly Supervised Learning
Invited Speakers include: Lu Jiang

Learning to Learn
Organizers include: Yevgen Chebotar

Embodied Multimodal Learning (EML)
Invited Speakers include: Sergey Levine

Distributed and Private Machine Learning
Program Committee includes: Peter Kairouz, Ananda Theertha Suresh

S2D-OLAD: From Shallow to Deep, Overcoming Limited and Adverse Data
Invited Speakers include: Alex Hanna, Hugo Larochelle
Organizers include: Vincent Dumoulin

Responsible AI (RAI)
Speakers include: Been Kim

Energy-Based Models: Current Perspectives, Challenges, and Opportunities
Organizers include: Adji Bousso Dieng, Igor Mordatch

A Roadmap to Never-Ending RL
Invited Session Panelists include: Aleksandra Faust
Program Committee includes: Coline Devin, Karol Hausman, Ben Eysenbach, Ofir Nachum, Ryan Julian, Tianhe Yu, Dumitru Erhan, Marc Pickett, Shixiang Gu

2nd Workshop on Practical ML for Developing Countries: Learning Under Limited/low Resource Scenarios
Program Committee includes: Pablo Samuel Castro

Beyond Static Papers: Rethinking How We Share Scientific Understanding in ML
Speakers include: David Ha, Hugo Larochelle
Organizers include: Sara Hooker

* Indicates work done while at Google

Categories
Misc

Delivering fast recommendations from Google Analytics 360 SQL Knowledge Graph with RAPIDS cuGraph

Introduction

In part 1 of this blog series, we introduced The GA360 SQL Knowledge Graph that timbr has created, acting as a user-friendly strategic tool that shortens time to value. We discussed how users can conveniently connect GA360 exports to BigQuery in no time with the use of an SQL Ontology Template, which allows users to understand, explore, and query the data by means of concepts, instead of dealing with many tables and columns. In addition to the many features and capabilities the GA360 SQL Knowledge Graph has to offer, we touched on the fact that the knowledge graph, queryable in SQL, is empowered with graph algorithms.

In this second part of our blog series, we take a deep dive into the use of graph algorithms with our GA360 SQL Knowledge Graph. We do so with RAPIDS cuGraph, created by NVIDIA: a collection of powerful graph algorithms implemented on NVIDIA GPUs that lets us analyze data at unmatched speeds.

Figure 1: The timbr SQL Knowledge Graph combining Google Analytics & Big Query, empowered by RAPIDS cuGraph.

RAPIDS cuGraph is paving the way in the graph world with multi-GPU graph analytics, allowing users to scale to billion and even trillion scale graphs, with performance speeds never seen before. cuGraph is equipped with many graph algorithms, falling into the following classes: Centrality, Community, Components, Core, Layout, Linear Assignment, Link Analysis, Link Prediction, Traversal, Structure, and other unique algorithms.

As interest in improving analytics and boosting performance rises, many companies are turning to different graph options to get more out of their data. One of the main things holding companies back is the implementation cost and the barriers to adoption of these new technologies. Learning new languages to connect your existing data infrastructure to new graph technologies is not only a headache but also a large expense.

This is where timbr comes into the picture. Timbr dramatically reduces implementation costs, as it does not require companies to transfer data or learn new languages. Instead, timbr acts as a virtual layer over the company’s existing data infrastructure, turning simple columns and tables into an easily accessible knowledge graph empowered with tools for exploration, visualization, and querying of data using graph algorithms. This is all delivered in a semantic SQL familiar to every analyst. 

In the blog post, we demonstrate the power of RAPIDS cuGraph combined with timbr by applying the Louvain Community Detection Algorithm as well as the Jaccard Similarity Algorithm on The GA360 SQL Knowledge Graph.

The Louvain Community Detection Algorithm

The Louvain community detection algorithm is used for detecting communities in large networks with a high density of connections, helping us uncover the different connections in a network. 

In order to understand the connections and be able to quantify their strength, we use what’s called modularity. Modularity is used to measure the strength of a partitioning of a graph into groups, clusters or communities, by constructing a score representing the modularity of a particular partitioning. The higher the modularity score, the denser the connections between the nodes in that community. 
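
For readers who want to see the algorithm outside of timbr's SQL layer, here is a minimal sketch of calling Louvain directly through the RAPIDS cuGraph Python API on a toy edge list; the integer vertex IDs and column names are illustrative assumptions, not data from this post.

import cudf
import cugraph

# Toy product/visitor edge list with integer IDs (illustrative only)
edges = cudf.DataFrame({
    "src": [0, 0, 1, 1, 2, 3],
    "dst": [10, 11, 10, 12, 12, 11],
})

# Build an undirected graph from the edge list
G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="src", destination="dst")

# Louvain returns a vertex-to-community assignment and the modularity score
parts, modularity = cugraph.louvain(G)
print(parts.head())
print(f"modularity = {modularity:.3f}")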

Community detection is used today in many industries for many different reasons. The banking industry uses community detection for fraud analysis to find anomalies and evaluate whether a group has just a few discrete bad behaviors or is acting as a fraud ring. The health industry uses community detection to investigate different biological networks to identify various disease modules. The stock market uses community detection to build portfolios based on the correlation of stock prices.

So, with the many different uses of community detection that exist today, what can be done with community detection when it comes to Google Analytics? And how does it perform with RAPIDS cuGraph compared to a standard CPU implementation such as NetworkX?

Let’s take a look:

After connecting our Google Analytics knowledge graph to cuGraph, we began with our first query, requesting to see all the products purchased, allocated to their specific communities based on the customers who purchased these products. To run this query we used the gtimbr schema, which is timbr's virtual schema for running graph algorithms, and the Louvain algorithm for community detection.

Here is the exact query that was used:

SELECT id as productsku, community
FROM gtimbr.louvain(
 SELECT distinct productsku, info_of[hits].has_session[ga_sessions].fullvisitorid
 FROM dtimbr.product )

To understand the difference in performance when running this algorithm query, we tested running this query with cuGraph and then ran it again using NetworkX.

Here is the performance difference:

cuGraph vs NetworkX Performance Speeds

                    NetworkX    RAPIDS cuGraph
Performance Speed   5 seconds   0.04 seconds

After running the query, we received a list of 982 products, each product belonging to a community ID.

Having a large list of products, we needed to understand how many communities we were dealing with, so we ran the following query:

SELECT community, count(productsku) as number_of_products
FROM dtimbr.product_community
GROUP BY community
ORDER BY 1 asc

These were our results:

Figure 2: Product community detection results.

It was pretty clear from our results that having 982 products represented by just 6 communities makes it hard to understand the connections between the different products and customers. To resolve this, we decided to drill down into each community and create sub-communities to really highlight which products belong with which customers.

The first step was to simply create a new concept in the knowledge graph called product_community and map the data to it from our first query showing each of the 982 products and the community they belong to.

Figure 3: The product_community concept in the Knowledge Graph model.

Mapping the data was then performed in simple SQL, similar to the syntax used when creating and mapping data with tables and columns in relational databases, and looked as follows:

CREATE OR REPLACE MAPPING map_product_community into product_community AS
 SELECT id as productsku, community
 FROM gtimbr.louvain(
       SELECT distinct productsku, info_of[hits].has_session[ga_sessions].fullvisitorid
       FROM dtimbr.product )

Now that we had a concept representing our products by the community, we created a second concept called product_community_level2 to represent the sub-communities of our original 6 communities.

Figure 4: The concept representing sub-communities in the Knowledge Graph model.

To create the sub-communities for our new concept we created a new mapping for each of the 6 original communities. So for example, here is the new mapping to present the sub-communities for community ID “0”:

 CREATE OR REPLACE MAPPING map_product_community_level2_0 into product_community_level2 AS
 SELECT id as productsku, community, 0 as parent_community
 FROM gtimbr.louvain(
       SELECT distinct productsku, `info_of[hits].has_session[ga_sessions].fullvisitorid`
       FROM dtimbr.product
       WHERE productsku IN (SELECT distinct productsku
                            FROM dtimbr.product_community
                            WHERE community = 0))

Once we queried and mapped the data of all the sub-communities to the new concept, we decided to view the results in our built-in BI Tool and created the following bar chart where we can clearly see the breakdown of the sub-communities for each of the original 6 communities:

Figure 5: Community results as a bar chart in timbr's built in BI module.

Lastly, we wanted to view the communities and sub-communities with all their products using timbr's data explorer. We entered the concepts with the specific communities we wanted and asked to see their products. In this case, we asked for communities 0, 1, and 4 as well as their sub-communities, showing us products by sub-community within the larger community.

Figure 6: Product community detection on a graph interface.

If we zoom in for example on community ID number “0”, we can see all the different product numbers appearing as pink nodes. Each product number is connected to the different sub-community that it belongs to, which appear as light blue nodes on the graph.

Figure 7: Close up of community number "0" and its sub-communities.

Link Prediction using Jaccard Similarity Algorithm

A variety of Similarity Algorithms are used today, algorithms such as Jaccard, Cosine, Pearson, Overlap, and others. In our GA Knowledge Graph, we demonstrate the use of the Jaccard similarity with an emphasis on link prediction.

The Jaccard similarity algorithm examines the similarity between different pairs and sets of items, whether it be people, products, or anything else. When using the similarity algorithm, we become exposed to connections between different pairs of people or items that we would have never been able to identify without the use of this unique algorithm.

The use of link prediction with similarity acts as a recommendation algorithm, extending the idea of a linkage measure to recommendations in a bipartite network. In our case, the bipartite network consists of products and customers.
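
As with Louvain, the same computation can be sketched directly against the cuGraph Python API; the toy user/keyword edge list below is an illustrative assumption, not data from this post.

import cudf
import cugraph

# Toy user/keyword edge list with integer IDs (illustrative only)
edges = cudf.DataFrame({
    "user": [0, 0, 1, 1, 2, 2],
    "keyword": [100, 101, 100, 102, 101, 102],
})

G = cugraph.Graph()
G.from_cudf_edgelist(edges, source="user", destination="keyword")

# Each row of the result is a pair of vertices with their Jaccard coefficient;
# high-scoring pairs are candidates for predicted ("similar") links
scores = cugraph.jaccard(G)
print(scores.head())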

The Similarity Algorithm and link prediction are used today in many different use cases. Social networks use the algorithm for many purposes: recommending users who to connect with based on similar relationships and connections, deciding which advertisements to show to which users based on common interests, and offering a user a product because the similarity algorithm matched them with a different user who bought that same product, which we will touch on shortly.

Governments use the algorithm to compare populations. Scientists use it to discover connections between different biological components, enabling them to safely develop new drugs. Companies use it to advance their machine learning efforts and link prediction analysis.

So, now let’s apply similarity and link prediction in our Google Analytics Knowledge Graph empowered by NVIDIA’s cuGraph and witness its strength.

We began by creating a relationship called similar in our concept ga_sessions, a concept which contains data about all the sessions and visits of users to our website. Unlike in our example earlier on community detection, where we created a relationship between two concepts in our knowledge graph, here we decided to create a relationship between `ga_sessions` and itself, where the relationship would calculate the similarity between customers that searched for more than 1 similar keyword.

Figure 8: The similarity relationship in the Knowledge Graph model.

Once the relationship was created, it was time to map the data to the relationship. timbr allows us to not only map data to concepts but also to map data directly to relationships and extend them with their own properties (thus building a Property Graph).

The query we ran, which gives us the similarity between customers that searched for more than 1 similar keyword and which we later mapped to our relationship similar in ga_sessions, went as follows:

SELECT id as fullvisitorid, similar_id as similarid, similarity
FROM gtimbr.jaccard(
	SELECT `has_session[ga_sessions].fullvisitorid`, keyword
	FROM dtimbr.traffic_source
	WHERE `has_session[ga_sessions].fullvisitorid` in (
	-- Users that searched more than 1 keyword
		SELECT distinct `has_session[ga_sessions].fullvisitorid` as id
		FROM dtimbr.traffic_source
		WHERE keyword IS NOT NULL AND keyword != '(not provided)'
		GROUP BY `has_session[ga_sessions].fullvisitorid`
		HAVING count(1) > 1)
	AND keyword IS NOT NULL AND keyword != '(not provided)')

Once again, we compared the performance speeds running this algorithm query with cuGraph and NetworkX and received the following results:

cuGraph vs NetworkX Performance Speeds

                    NetworkX     RAPIDS cuGraph
Performance Speed   24 seconds   0.04 seconds

In the query, we asked for the fullvisitorid which is the user ID, the similar_id which returns IDs of similar users, as well as similarity which returns a Jaccard similarity score for each match of user IDs.

These were the results:

Figure 9: Results of similarity between users.

Next, we wanted to create a visualization using the similar relationship we had created. We wrote the following query to do so:

SELECT DISTINCT  `fullvisitorid` AS user_id,
 `similar[ga_sessions].fullvisitorid` AS similar_user_id,
 `similar[ga_sessions]_similarity` AS similarity
FROM  dtimbr.ga_sessions
WHERE `similar[ga_sessions].fullvisitorid` IS NOT NULL

Now that we had all the visitor IDs and similar visitor IDs that shared more than 1 similar keyword, as well as _similarity (where the underscore prefix denotes the similarity score stored on the relationship), we were ready to visualize the results.

We decided to choose the Sankey Diagram to represent our findings.

Figure 10: Similarity results as a Sankey diagram in timbr's built in BI module.

We were able to see the different users on the left and right side connecting to their similar match in the middle. Many of these users both in the middle and on the sides connected to multiple users.

Combining the community and similarity algorithms

In our final example, we decided to build a recommendation query combining the relationships we've created for our community and similarity algorithms.

Here is the recommendation query:

SELECT similar[ga_sessions].has_hits[hits].ecommerce_product_data[product].productsku as productsku,
similar[ga_sessions].has_hits[hits].ecommerce_product_data[product].in_community[product_community].community AS community, 
COUNT(distinct similar[ga_sessions].fullvisitorid) num_of_users
FROM  dtimbr.ga_sessions
WHERE fullvisitorid = '9209808985108850988'
AND similar[ga_sessions].has_hits[hits].ecommerce_product_data[product].productsku is not null
GROUP BY similar[ga_sessions].has_hits[hits].ecommerce_product_data[product].productsku,  similar[ga_sessions].has_hits[hits].ecommerce_product_data[product].in_community[product_community].community
ORDER BY num_of_users DESC

In this query, we chose to focus on a random visitor with ID number '9209808985108850988'. We wanted the query to recommend products for our chosen user based on similar users who bought those products. In order to gain more insight, we decided to ask for the communities that the recommended products belong to and see if there are any visible trends.

After running the query, the following results were returned:

Figure 11: Community and similarity combined results.

We were able to see the list of recommended products for user ID number ‘9209808985108850988’, and the community these products belong to, as well as the number of similar users to user ‘9209808985108850988’ who bought the specific product on the list.

Interestingly enough, the recommended products for user '9209808985108850988' with the most similar purchasers largely fall in community '1'. If we were investigating further, this could have directed us to check whether it is worth recommending more products from community '1' to user '9209808985108850988'.

Conclusion

We were able to demonstrate timbr's advanced semantic capabilities using the Google Analytics Knowledge Graph combined with graph algorithms, all in simple SQL, allowing us to analyze, explore, and visualize our data. Not only were we able to perform in-depth analysis, but we were able to do so at extremely high speeds, going through large amounts of data in a matter of seconds.

We were able to accomplish our tasks and leverage our knowledge graph by connecting with RAPIDS cuGraph and NVIDIA GPU capabilities. Using cuGraph allowed us to query and analyze a massive amount of data in a fraction of the time it would have taken using a standard CPU implementation such as NetworkX.

Click on timbr.ai and learn how you can leverage your data and performance speeds like never before.

Categories
Misc

Tf.data.dataset: Repeat, Batch, Shuffle

Tf.data.dataset: Repeat, Batch, Shuffle submitted by /u/Denis_Vo
[visit reddit] [comments]
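
For readers who don't follow the link, here is a quick illustrative example (not taken from the linked post) of how shuffle, batch, and repeat compose in tf.data; the order of the calls determines what gets shuffled and what a batch spans.

import tensorflow as tf

ds = tf.data.Dataset.range(10)

# Shuffle individual elements, group them into batches of 4, then repeat
# the whole (reshuffled) dataset twice
ds = ds.shuffle(buffer_size=10).batch(4).repeat(2)

for batch in ds:
    print(batch.numpy())
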
Categories
Misc

TensorFlow significantly slower than PyTorch when training a small number of batches

Recently, I am learning and playing around with Deep Reinforcement Learning. Basically, for many DRL algorithms, we need to train a single batch with 1 epoch at a time. I observed that TensorFlow 2 performs significantly slower (9 – 22 times slower) than PyTorch.

This is the first time I have met this problem. I used to do more supervised computer vision tasks; therefore, I suspect that the performance issue is caused by a small number of batches per epoch/training (since, unlike DRL, common CV tasks have a lot of batches and epochs, I saw only a minor performance difference between the two frameworks).
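
To make the setup concrete, here is a minimal sketch of the per-batch training pattern described above (one gradient update per call, as in many DRL loops); the model and data shapes are illustrative assumptions, not the code from the linked issue.

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2),
])
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(32, 4).astype("float32")
y = np.random.rand(32, 2).astype("float32")

# With only one batch per "epoch", the fixed per-call overhead dominates
# the measured time, which is where the framework gap shows up
for step in range(100):
    model.train_on_batch(x, y)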

However, I could not solve the problem. I asked on StackOverflow and even opened an issue, but nobody has answered yet. I personally prefer TensorFlow, so I don't want to move to PyTorch unless I have to. I just wonder if anyone can help explain why, or help me improve the performance on a small number of batches.

Github Issue with reproducible code and more detailed explanation:

https://github.com/tensorflow/tensorflow/issues/48844

Any help would be appreciated, thank you so much!

submitted by /u/seermer
[visit reddit] [comments]

Categories
Misc

Use TensorFlow to run basic regressions like Google AutoML (tabular)

So I have a powerful machine… at least I think I do. With a GeForce 3080 and all that. Anyways, I'm fairly new to the ML game. Really liked Google's AutoML, where I just fed it a spreadsheet and it did MAE, RMSLE, etc. But because I'm new, I can't afford paying for node hours. Is it possible to basically run the same simulation on my Windows PC? Got TensorFlow installed, didn't enable the GPU yet.
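
A minimal sketch of what this can look like locally (assuming a numeric spreadsheet and a hypothetical target column; not an AutoML replacement, just a basic Keras regression reporting MAE):

import numpy as np
import pandas as pd
import tensorflow as tf

# Stand-in for the exported spreadsheet; in practice: df = pd.read_csv("my_data.csv")
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((500, 4)), columns=["f1", "f2", "f3", "target"])

features = df[["f1", "f2", "f3"]].values.astype("float32")
labels = df["target"].values.astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(features, labels, epochs=10, validation_split=0.2, verbose=0)

print(model.evaluate(features, labels, verbose=0))  # [mse, mae]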

submitted by /u/WhoKnows2019
[visit reddit] [comments]

Categories
Misc

What is the best operating system for machine learning, deep learning and TensorFlow?

Hi !!!,

What is the best operating system for machine learning, deep learning?

I would like to go deeper into this area, how can I start?

Thanks!!!

submitted by /u/KIProf
[visit reddit] [comments]

Categories
Misc

Building World-Class AI Models with NVIDIA NeMo and DefinedCrowd

Speech is the most natural form of human communication. So, it’s not surprising that we’ve always wanted to interact with and command machines by voice. However, for conversational AI to provide a seamless, natural, and human-like experience, it needs to be trained on large amounts of data representative of the problem the model is trying to solve. The difficulty for machine learning teams is the scarcity of this high-quality, domain-specific data.

Companies are trying to solve this problem and accelerate the widespread adoption of conversational AI with innovative solutions that guarantee the scalability and internationality of models. NVIDIA and DefinedCrowd are two such companies. By providing machine learning engineers with a model-building toolkit and high-quality training data respectively, NVIDIA and DefinedCrowd integrate to create world-class AI simply, easily, and quickly.

DefinedCrowd, a one-stop shop for AI training data

I am the director of machine learning at DefinedCrowd, and our core business is providing high-quality AI training data to companies building world-class AI solutions. Our customers can access this data through DefinedData, an online marketplace of off-the-shelf AI training data available in multiple languages, domains, and recording types.

If you can’t find what you’re looking for in DefinedData, our workflows can serve as standalone or end-to-end data services to build any speech- or text-enabled AI architecture from scratch, to improve solutions already developed, or to evaluate models in production, all with the DefinedCrowd quality guarantee.

Creating conversational AI applications the easy way

NVIDIA NeMo is a toolkit built by NVIDIA for creating conversational AI applications. This toolkit includes collections of pretrained modules for automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS), enabling researchers and data scientists to easily compose complex neural network architectures and focus on designing their applications.

Video: Watch how simple and fast it is to create world class conversational AI with NVIDIA NeMo and DefinedCrowd. Build a world-class model quickly and easily with the NeMo toolkit, and train it for high-performance with high-quality training data from DefinedCrowd.

NeMo and DefinedCrowd integration

Here’s how to connect DefinedCrowd speech workflows to train and improve an ASR model using NVIDIA NeMo.  The code can also be accessed on this Google Colab link.

Step 1: Install NeMo Toolkit and dependencies

# First, install NeMo Toolkit and dependencies to run this notebook
!apt-get install -y libsndfile1 ffmpeg
!pip install Cython

## Install NeMo dependencies in the correct versions
!pip install torchtext==0.8.0 torch==1.7.1 pytorch-lightning==1.2.2

## Install NeMo
!python -m pip install nemo_toolkit[all]==1.0.0b3

Step 2: Obtain data using the DefinedCrowd API

Here’s how to connect to the DefinedCrowd API to obtain collected speech data. For more information, see DefinedCrowd API (v2).

# For the demo, use a sandbox environment
auth_url = "https://sandbox-auth.definedcrowd.com"
api_url = "https://sandbox-api.definedcrowd.com"

# These variables should be obtained at the DefinedCrowd Enterprise Portal for your account.
client_id = ""
client_secret = ""
project_id = ""

Authentication

payload = {
    "client_id": client_id,
    "client_secret": client_secret,
    "grant_type": "client_credentials",
    "scope": "PublicAPIv2",
}
files = []
headers = {}

# request the Auth 2.0 access token
response = requests.request(
    "POST", f"{auth_url}/connect/token", headers=headers, data=payload, files=files
)
if response.status_code == 200:
    print("Authentication success!")
    access_token = response.json()["access_token"]
else:
    print("Authentication Failed")

Authentication success!

List of deliverables

# GET /projects/{project-id}/deliverables
headers = {"Authorization": "Bearer " + access_token}
response = requests.request(
    "GET", f"{api_url}/projects/{project_id}/deliverables", headers=headers
)

if response.status_code == 200:
    # Pretty print the response
    print(json.dumps(response.json(), indent=4))

    # Get the first deliverable ID
    deliverable_id = response.json()[0]["id"]

[
    {
        "projectId": "eb324e45-c4f9-41e7-b5cf-655aa693ae75",
        "id": "258f9e15-2937-4846-b9c3-3ae1164b7364",
        "type": "Flat",
        "fileName": "data_Flat_eb324e45-c4f9-41e7-b5cf-655aa693ae75_258f9e15-2937-4846-b9c3-3ae1164b7364_2021-03-22-14-34-37.zip",
        "createdTimestamp": "2021-03-22T14:34:37.8037259",
        "isPartial": false,
        "downloadCount": 2,
        "status": "Downloaded"
    }
]

Final deliverable for speech data collection

# Name to give to the deliverable file
filename = "scripted_monologue_en_GB.zip"

# GET /projects/{project-id}/deliverables/{deliverable-id}/download
headers = {"Authorization": "Bearer " + access_token}
response = requests.request(
    "GET",
    f"{api_url}/projects/{project_id}/deliverables/{deliverable_id}/download/",
    headers=headers,
)

if response.status_code == 200:
    # save the deliverable file
    with open(filename, "wb") as fp:
        fp.write(response.content)
    print("Deliverable file saved with success!")

Deliverable file saved with success!

# Extract the contents from the downloaded file
!unzip  scripted_monologue_en_GB.zip &> /dev/null
!rm -f en-gb_single-scripted_Dataset.zip

Step 3: Analyze the speech dataset

Here’s how to analyze the data received from DefinedCrowd. The data consists of scripted speech collected through the DefinedCrowd Neevo platform from several speakers in the UK (crowd members from DefinedCrowd).

Each row of the dataset contains information about the speech prompt, crowd member, device used, and the recording. The following data is found with this delivery:

  • Recording:
    • RecordingId
    • PromptId
    • Prompt
  • Audio File:
    • RelativeFileName
    • Duration
    • SampleRate
    • BitDepth
    • AudioCommunicationBand
    • RecordingEnvironment
  • Crowd Member:
    • SpeakerId
    • Gender
    • Age
    • Accent
    • LivingCountry
  • Recording Device:
    • Manufacturer
    • DeviceType
    • Domain

This data can be used for multiple purposes, but in this tutorial, I use it to improve an existing ASR model for British speakers.

import pandas as pd

# Look in the metadata file
dataset = pd.read_csv("metadata.tsv", sep="\t", index_col=[0])

# Check the data for the first row
dataset.iloc[0]

RecordingId                               165559628
PromptId                                   64977250
RelativeFileName                Audio/165559628.wav
Prompt                    The Avengers' extinction.
Duration                               00:00:02.815
SpeakerId                                    128209
Gender                                       Female
Age                                              26
Manufacturer                                  Apple
DeviceType                                iPhone 6s
Accent                                      Suffolk
Domain                                      generic
SampleRate                                    16000
BitDepth                                         16
AudioCommunicationBand                    Broadband
LivingCountry                        United Kingdom
Native                                         True
RecordingEnvironment                         silent
Name: 0, dtype: object

# How many rows do you have?
len(dataset)

50000

# Check some examples from the dataset
import librosa
import IPython.display as ipd

for index, row in dataset.sample(4, random_state=1).iterrows():

    print(f"Prompt: {dataset.iloc[index].Prompt}")
    audio_file = dataset.iloc[index].RelativeFileName

    # Load and listen to the audio file
    audio, sample_rate = librosa.load(audio_file)
    ipd.display(ipd.Audio(audio, rate=sample_rate))

For audio samples, see the DefinedCrowd x NeMo – ASR Training tutorial on Google Colab.

Step 4: Prepare the data

After downloading the speech data from DefinedCrowd API, you must adapt it for the format expected by NeMo for ASR training. For this, you create manifests for the training and evaluation data, including each audio file’s metadata.

NeMo requires that you adapt the data to a particular manifest format. Each line corresponds to one audio sample, so the line count equals the number of samples represented by the manifest. A line must contain the path to an audio file, the corresponding transcript, and the audio sample duration. For example, here is what one line might look like in a NeMo-compatible manifest:

{"audio_filepath": "path/to/audio.wav", "duration": 3.45, "text": "this is a nemo tutorial"}

For the creation of the manifest, also standardize the transcripts.

import os

# Function to build a manifest
def build_manifest(dataframe, manifest_path):
    with open(manifest_path, "w") as fout:
        for index, row in dataframe.iterrows():
            transcript = row["Prompt"]

            # The model uses lowercased data for training/testing
            transcript = transcript.lower()

            # Removing linguistic marks (they are not necessary for this demo)
            transcript = (
                transcript.replace("", "")
                .replace("", "")
                .replace("[b_s/]", "")
                .replace("[uni/]", "")
                .replace("[v_n/]", "")
                .replace("[filler/]", "")
                .replace('"', "")
                .replace("[n_s/]", "")
            )

            audio_path = row["RelativeFileName"]

            # Get the audio duration
            try:
                duration = librosa.core.get_duration(filename=audio_path)
            except Exception as e:
                print("An error occurred: ", e)

            if os.path.exists(audio_path):
                # Write the metadata to the manifest
                metadata = {
                    "audio_filepath": audio_path,
                    "duration": duration,
                    "text": transcript,
                }
                json.dump(metadata, fout)
                fout.write("\n")
            else:
                continue

Step 5: Train and test splits

To test the quality of the model, you must reserve some data for model testing. Evaluate the model performance on this data.

import json
from sklearn.model_selection import train_test_split
# Split 10% for testing (500 prompts) and 90% for training (4500 prompts)
trainset, testset = train_test_split(dataset, test_size=0.1, random_state=1)
# Build the manifests
build_manifest(trainset, "train_manifest.json")
build_manifest(testset, "test_manifest.json")

Step 6: Configure the model

Here’s how to use the QuartzNet15x5 model as a base model for fine-tuning with the data. To measure the improvement, benchmark the performance of the base model first and, later, of the fine-tuned version. Some of the following functions were retrieved from the NeMo Tutorial on ASR.

# Import Nemo and the functions for ASR
import torch
import nemo
import nemo.collections.asr as nemo_asr
import logging
from nemo.utils import _Logger
# Set up the log level by NeMo
logger = _Logger()
logger.set_verbosity(logging.ERROR)

Step 7: Set training parameters

For training, NeMo uses a Python dictionary as data structure to keep all the parameters. For more information, see the NeMo ASR Config User Guide.

For this tutorial, load a preexisting file with the standard ASR configuration and change only the necessary fields.

## Download the config to use in this example
!mkdir configs
!wget -P configs/ https://raw.githubusercontent.com/NVIDIA/NeMo/main/examples/asr/conf/config.yaml &> /dev/null

# --- Config Information ---#
from ruamel.yaml import YAML

config_path = "./configs/config.yaml"

yaml = YAML(typ="safe")
with open(config_path) as f:
    params = yaml.load(f)

Step 8: Download the base model

For the ASR model, use a pretrained QuartzNet15x5 model from the NGC catalog.

The QuartzNet15x5 model was trained on six datasets: LibriSpeech, Mozilla Common Voice (validated clips from en_1488h_2019-12-10), WSJ, Fisher, Switchboard, and NSC Singapore English. It was trained with Apex/Amp optimization level O1 for 600 epochs. The model achieves a WER of 3.79% on LibriSpeech dev-clean, and a WER of 10.05% on dev-other.

# This line downloads the pretrained QuartzNet15x5 model from NGC and instantiates it for you
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En", strict=False)

Step 9: Evaluate the base model performance

The word error rate (WER) is a valuable measurement tool for comparing different ASR models and evaluating improvements within one system. To obtain the results, assess how the model performs by using the testing set.
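
As a small, self-contained illustration of what WER measures (not part of the original tutorial), the metric is the number of word substitutions, deletions, and insertions divided by the number of reference words, which can be computed with a simple edit distance:

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("this is a nemo tutorial", "this is nemo tutorials"))  # 0.4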

# Configure the model parameters for testing

# Parameters for training, validation, and testing are specified using the 
# train_ds, validation_ds, and test_ds sections of your configuration file

# Bigger batch-size = bigger throughput
params["model"]["validation_ds"]["batch_size"] = 8

# Set up the test data loader and make sure the model is on GPU
params["model"]["validation_ds"]["manifest_filepath"] = "test_manifest.json"
quartznet.setup_test_data(test_data_config=params["model"]["validation_ds"])

# Comment out this line if you don't want to use GPU acceleration
_ = quartznet.cuda()

# Compute the WER metric between the hypothesis and predictions.

wer_numerators = []
wer_denominators = []

# Loop over all test batches.
# Iterating over the model's `test_dataloader` gives you:
# (audio_signal, audio_signal_length, transcript_tokens, transcript_length)
# See the AudioToCharDataset for more details.
with torch.no_grad():
    for test_batch in quartznet.test_dataloader():
        input_signal, input_signal_length, targets, targets_lengths = [x.cuda() for x in test_batch]
                
        log_probs, encoded_len, greedy_predictions = quartznet(
            input_signal=input_signal, 
            input_signal_length=input_signal_length
        )
        # The model has a helper object to compute WER
        quartznet._wer.update(greedy_predictions, targets, targets_lengths)
        _, wer_numerator, wer_denominator = quartznet._wer.compute()
        wer_numerators.append(wer_numerator.detach().cpu().numpy())
        wer_denominators.append(wer_denominator.detach().cpu().numpy())

# First, sum all numerators and denominators. Then, divide.
print(f"WER = {sum(wer_numerators)/sum(wer_denominators)*100:.2f}%")

WER = 39.70%

Step 10: Fine-tune the model

The base model got a WER of 39.7%, which is not very good. Maybe providing some data from the same domain and language dialects can improve the ASR model. For simplification, train for only one epoch using DefinedCrowd’s data.

import pytorch_lightning as pl
from omegaconf import DictConfig
import copy

# Before training, you must provide the train manifest for training
params["model"]["train_ds"]["manifest_filepath"] = "train_manifest.json"

# Use the smaller learning rate for fine-tuning
new_opt = copy.deepcopy(params["model"]["optim"])
new_opt["lr"] = 0.001
quartznet.setup_optimization(optim_config=DictConfig(new_opt))

# Batch size depends on the GPU memory available
params["model"]["train_ds"]["batch_size"] = 8

# Point to the data to be used for fine-tuning as the training set
quartznet.setup_training_data(train_data_config=params["model"]["train_ds"])

# Clean the torch cache
torch.cuda.empty_cache()

# Now you can create a PyTorch Lightning trainer.
trainer = pl.Trainer(gpus=1, max_epochs=1)

# The fit function starts the training
trainer.fit(quartznet)

Step 11: Compare model performance

Compare the final model performance with the fine-tuned model that you received from training with additional data.

# Configure the model parameters for testing
params["model"]["validation_ds"]["batch_size"] = 8

# Set up the test data loader and make sure the model is on GPU
params["model"]["validation_ds"]["manifest_filepath"] = "test_manifest.json"
quartznet.setup_test_data(test_data_config=params["model"]["validation_ds"])
_ = quartznet.cuda()

# Compute the WER metric between the hypothesis and predictions.

wer_numerators = []
wer_denominators = []

# Loop over all test batches.
# Iterating over the model's `test_dataloader` gives you:
# (audio_signal, audio_signal_length, transcript_tokens, transcript_length)
# See the AudioToCharDataset for more details.
with torch.no_grad():
    for test_batch in quartznet.test_dataloader():
        input_signal, input_signal_length, targets, targets_lengths = [x.cuda() for x in test_batch]
                
        log_probs, encoded_len, greedy_predictions = quartznet(
            input_signal=input_signal, 
            input_signal_length=input_signal_length
        )
        # The model has a helper object to compute WER
        quartznet._wer.update(greedy_predictions, targets, targets_lengths)
        _, wer_numerator, wer_denominator = quartznet._wer.compute()
        wer_numerators.append(wer_numerator.detach().cpu().numpy())
        wer_denominators.append(wer_denominator.detach().cpu().numpy())

# First, sum all numerators and denominators. Then, divide.
print(f"WER = {sum(wer_numerators)/sum(wer_denominators)*100:.2f}%")

WER = 24.36%

After fine-tuning the neural network ASR architecture for a single epoch, I achieved a WER of 24.36%, an improvement over the initial 39.7% from the base model. For better results, consider using more epochs in the training.

Conclusion

In this tutorial, I demonstrated how to load speech data collected by DefinedCrowd and how to use it to train and measure the performance of an ASR model. I hope I have shown you how easy it is to create world-class AI solutions with NVIDIA and DefinedCrowd.