[_Derived_]RecvAsync is cancelled – LSTM


Tensorflow broke in my conda environment and I can’t seem to get it working again. I’m having different issues getting tensorflow-gpu==2.3.0 and 2.4.1 working.

GTX 1070 GPU drivers:

-CUDA 11.0.3


installed with $conda install cudatoolkit=11.0 cudnn=8.0 -c=conda-forge

-Python 3.8.8

Tensorflow 2.4.1:

tensorflow            2.3.0  mkl_py38h8557ec7_0
tensorflow-base       2.3.0  eigen_py38h75a453f_0
tensorflow-estimator  2.4.0  pyh9656e83_0          conda-forge
tensorflow-gpu        2.3.0  he13fc11_0

installed with pip install --upgrade tensorflow-gpu==2.4.1
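Because the listing above mixes 2.3.0 and 2.4.x packages, it can help to see which versions the active interpreter actually reports. The following is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_versions` is made up for illustration, and it only sees distributions that ship metadata in the active environment, so some conda-only packages may show as not installed.

```python
from importlib import metadata


def installed_versions(names):
    """Return {distribution name: version string, or None if not found}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the listing in the post.
    packages = [
        "tensorflow",
        "tensorflow-base",
        "tensorflow-estimator",
        "tensorflow-gpu",
    ]
    for name, version in installed_versions(packages).items():
        print(f"{name:22s} {version or 'not installed'}")
```

Running this inside the activated environment shows at a glance whether pip and conda left different TensorFlow versions side by side.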

I have set all the environment variables correctly. Checking with print(tf.config.list_physical_devices('GPU')) gives: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

So tensorflow seems to be installed and recognises my GPU. I’ve been working on an LSTM model; when training with $ , it runs for 6 epochs and then gives this error:

Epoch 1/50
2021-02-27 14:50:38.552734: I tensorflow/compiler/mlir/] None of the MLIR optimization passes are enabled (registered 2)
2021-02-27 14:50:38.882403: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cublas64_11.dll
2021-02-27 14:50:39.546250: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cublasLt64_11.dll
2021-02-27 14:50:39.794953: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library cudnn64_8.dll
37/37 [==============================] - 7s 55ms/step - loss: 7.0684 - accuracy: 0.1270
Epoch 2/50
37/37 [==============================] - 2s 54ms/step - loss: 4.8889 - accuracy: 0.1828
Epoch 3/50
37/37 [==============================] - 2s 54ms/step - loss: 4.7884 - accuracy: 0.1666
Epoch 4/50
37/37 [==============================] - 2s 54ms/step - loss: 4.6866 - accuracy: 0.1480
Epoch 5/50
37/37 [==============================] - 2s 55ms/step - loss: 4.5179 - accuracy: 0.1630
Epoch 6/50
17/37 [============>.................] - ETA: 1s - loss: 4.2505 - accuracy: 0.1484
2021-02-27 14:50:55.955000: E tensorflow/stream_executor/] CUDNN_STATUS_INTERNAL_ERROR in tensorflow/stream_executor/cuda/ 'cudnnRNNBackwardWeights( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, input_desc.handles(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), output_desc.handles(), output_data.opaque(), workspace.opaque(), workspace.size(), rnn_desc.params_handle(), params_backprop_data->opaque(), reserve_space_data->opaque(), reserve_space_data->size())'
2021-02-27 14:50:55.955194: W tensorflow/core/framework/] OP_REQUIRES failed at : Internal: Failed to call ThenRnnBackward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 256, 256, 1, 100, 64, 256]
2021-02-27 14:50:55,957 : MainThread : INFO : Saving model history to model_history.csv
2021-02-27 14:50:55,961 : MainThread : INFO : Saving model to D:projectproject_enginefftest_checkpointsbatch_0synthetic
Traceback (most recent call last):
  File "", line 65, in <module>
    model.train()
  ... ... ...
  File "", line 201, in train_rnn
    , epochs=store.epochs, callbacks=_callbacks)
  File "", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "", line 828, in __call__
    result = self._call(*args, **kwds)
  File "", line 855, in _call
    return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
  File "", line 2942, in __call__
    return graph_function._call_flat(
  File "", line 1918, in _call_flat
    return self._build_call_outputs(
  File "", line 555, in call
    outputs = execute.execute(
  File "", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.CancelledError: [_Derived_]RecvAsync is cancelled.
[[{{node gradient_tape/sequential/embedding/embedding_lookup/Reshape/_20}}]] [Op:__inference_train_function_4800]
Function call stack:
train_function

Tensorflow forums with similar issues mention memory or driver issues, but I don’t think that is the case here, because the model would not start training at all if it were. I also know the code is fine because I trained with the same code with no issue in an old environment I was using 2 months ago. It also runs fine in a CPU-only tensorflow environment.

Does anyone have any suggestions on how to fix this?

Tensorflow 2.3.0:

Secondly, I can’t even try another version of tensorflow gpu in a different environment.

conda install -c anaconda tensorflow-gpu 

Tensorflow GPU successfully installs but doesn’t run on my GPU for reasons stated here –

I’ve now lost 2 days and a lot of will to live; any help with either issue would be massively appreciated.

submitted by /u/nuusain


Loading Array too large for Memory?

I dumped pixel arrays with their labels to a pickle file, but it will take up too much memory to load and use Keras at the same time.

How can I load this large array into a CNN without it overloading memory?
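One common way around this is to avoid a single giant pickle and read samples lazily in batches. The sketch below is a standard-library illustration, assuming the data can be re-dumped once as one pickle record per (pixels, label) sample; the function names `dump_samples` and `iter_batches` are hypothetical.

```python
import pickle


def dump_samples(path, samples):
    """Write each (pixels, label) pair as its own pickle record."""
    with open(path, "wb") as f:
        for sample in samples:
            pickle.dump(sample, f)


def iter_batches(path, batch_size):
    """Yield lists of at most batch_size samples, reading the file lazily."""
    batch = []
    with open(path, "rb") as f:
        while True:
            try:
                batch.append(pickle.load(f))
            except EOFError:
                break  # no more records in the file
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly smaller, batch
```

Only one batch is held in memory at a time, so a generator like this (or a `keras.utils.Sequence` built on the same idea) could feed a CNN during training.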


submitted by /u/llub888


InvalidArgumentError: logits and labels must have the same first dimension

Hi, guys, I am working on a project relating to image segmentation, and I ran into trouble when I tried to train a model.

I used the code from and used my own dataset for training. Everything was fine until running ‘’

The complete warning is: InvalidArgumentError: logits and labels must have the same first dimension, got logits shape [32,92160] and labels shape [247808] [[node sparse_categorical_crossentropy/SparseSoftmaxCrossEntropyWithLogits/SparseSoftmaxCrossEntropyWithLogits (defined at <ipython-input-48-dac0e73a3d3d>:11) ]] [Op:__inference_train_function_27962]

My code block is shown below:

from google.colab import drive

!unzip /content/drive/MyDrive/data/
!unzip /content/drive/MyDrive/data/

import os

input_dir = "image/"
target_dir = "mask/"
img_size = (88, 88)
num_classes = 10
batch_size = 32

# Collect image and mask paths in matching sorted order
input_img_paths = sorted(
    os.path.join(input_dir, fname)
    for fname in os.listdir(input_dir)
    if fname.endswith(".jpg")
)
target_img_paths = sorted(
    os.path.join(target_dir, fname)
    for fname in os.listdir(target_dir)
    if fname.endswith(".jpg")
)

from tensorflow import keras
import numpy as np
from tensorflow.keras.preprocessing.image import load_img


class SAR(keras.utils.Sequence):
    def __init__(self, batch_size, img_size, input_img_paths, target_img_paths):
        self.batch_size = batch_size
        self.img_size = img_size
        self.input_img_paths = input_img_paths
        self.target_img_paths = target_img_paths

    def __len__(self):
        return len(self.target_img_paths) // self.batch_size

    def __getitem__(self, idx):
        """Returns tuple (input, target) correspond to batch #idx."""
        i = idx * self.batch_size
        batch_input_img_paths = self.input_img_paths[i : i + self.batch_size]
        batch_target_img_paths = self.target_img_paths[i : i + self.batch_size]
        # Images: (batch, height, width, 3)
        x = np.zeros((self.batch_size,) + self.img_size + (3,), dtype="float32")
        for j, path in enumerate(batch_input_img_paths):
            img = load_img(path, target_size=self.img_size)
            x[j] = img
        # Masks: (batch, height, width, 1)
        y = np.zeros((self.batch_size,) + self.img_size + (1,), dtype="uint8")
        for j, path in enumerate(batch_target_img_paths):
            img = load_img(path, target_size=self.img_size, color_mode="grayscale")
            y[j] = np.expand_dims(img, 2)
            y[j] -= 1
        return x, y

from tensorflow.keras import layers
import tensorflow as tf


def get_model(img_size, num_classes):
    inputs = keras.Input(shape=img_size + (3,))

    ### [First half of the network: downsampling inputs] ###
    # Entry block
    #x = layers.Flatten()(inputs) #additional
    x = layers.Conv2D(32, 3, strides=2, padding="same")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.Activation("relu")(x)
    previous_block_activation = x  # Set aside residual

    for filters in [64, 128, 256]:
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.SeparableConv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
        # Project residual
        residual = layers.Conv2D(filters, 1, strides=2, padding="same")(
            previous_block_activation
        )
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    ### [Second half of the network: upsampling inputs] ###
    for filters in [256, 128, 64, 32]:
        x = layers.Activation("relu")(x)
        x = layers.Conv2DTranspose(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.Conv2DTranspose(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.UpSampling2D(2)(x)
        # Project residual
        residual = layers.UpSampling2D(2)(previous_block_activation)
        residual = layers.Conv2D(filters, 1, padding="same")(residual)
        x = layers.add([x, residual])  # Add back residual
        previous_block_activation = x  # Set aside next residual

    # Add a per-pixel classification layer
    outputs = layers.Conv2D(num_classes, 3, activation="softmax", padding="same")(x)
    # Define the model
    model = keras.Model(inputs, outputs)
    return model

# Free up RAM in case the model definition cells were run multiple times
keras.backend.clear_session()
# Build model
model = get_model(img_size, num_classes)

import random

# Split our img paths into a training and a validation set
val_samples = 1000
train_input_img_paths = input_img_paths[:-val_samples]
train_target_img_paths = target_img_paths[:-val_samples]
val_input_img_paths = input_img_paths[-val_samples:]
val_target_img_paths = target_img_paths[-val_samples:]
# Instantiate data Sequences for each split
train_gen = SAR(
    batch_size, img_size, train_input_img_paths, train_target_img_paths
)
val_gen = SAR(batch_size, img_size, val_input_img_paths, val_target_img_paths)

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

epochs = 15
model.fit(train_gen, epochs=epochs, validation_data=val_gen)

I would be grateful if you guys could help me deal with this problem.
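The numbers in the error message are consistent with a spatial size mismatch between the masks and the model output: 247808 = 32 × 88 × 88 and 92160 = 96 × 96 × 10. The sketch below is one possible reading, not a confirmed diagnosis. It assumes that each stride-2 layer with "same" padding rounds the size up, and it counts four downsampling steps (the entry Conv2D plus three MaxPooling2D layers) and four UpSampling2D(2) steps, as in get_model above. The helper name `output_size` is made up for illustration.

```python
import math


def output_size(input_size, num_down=4, num_up=4):
    """Spatial size after num_down stride-2 'same' layers and num_up 2x upsamplings."""
    size = input_size
    for _ in range(num_down):
        size = math.ceil(size / 2)  # 'same' padding with stride 2 rounds up
    return size * 2 ** num_up


batch_size, num_classes = 32, 10
for side in (88, 96, 160):
    out = output_size(side)
    print(
        side, "->", out,
        "| label values per batch:", batch_size * side * side,
        "| output values per sample:", out * out * num_classes,
    )
```

Under these assumptions an 88×88 input comes back as 96×96, while sizes divisible by 16 (such as 96 or 160) come back unchanged, so resizing both images and masks to such a size may be worth trying.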

submitted by /u/Apprehensive_Ad_6830


[Beginners Tutorial] How to Use Google Colab for Deep Learning

If you’re a programmer, you want to explore deep learning, and need a platform to help you do it – this tutorial is exactly for you.

In this tutorial you will learn:

- Getting around in Google Colab
- Installing python libraries in Colab
- Downloading large datasets in Colab
- Training a Deep learning model in Colab
- Using TensorBoard in Colab

Google Colab for Deep Learning

submitted by /u/kk_ai


Tom Cruise deepfake videos are all over the internet and passing the best deepfake detectors!

submitted by /u/MLtinkerer

Retracing AI’s Steps: Go-Explore Algorithms Solve Trickiest Atari Games

A team of Uber AI researchers has achieved record high scores and beaten previously unsolved Atari games with algorithms that remember and build off their past successes.

Highlighted this week in Nature, the Go-Explore family of algorithms is designed to address limitations of traditional reinforcement learning algorithms, which struggle with complex games that provide sparse or deceptive feedback.

Performance on Atari games is a popular benchmark for reinforcement learning algorithms. But many algorithms fail to thoroughly explore promising avenues, instead going off track to find potential new solutions.

In this paper, the researchers applied a simple principle, “first return, then explore,” creating algorithms that remember promising states from past games, return to those states, and then intentionally explore from that point to further maximize reward.

The researchers used a variety of NVIDIA GPUs at OpenAI and Uber data centers to develop the algorithms. 

The software determines which plays to revisit by storing screen grabs of past games and grouping together similar-looking images to find starting points it should return to in future rounds. 
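The return-then-explore loop can be illustrated on a toy problem. The following is a minimal sketch, not the researchers’ implementation: the environment is a made-up one-dimensional line with a single reward at `GOAL`, the agent’s position stands in for a group of similar-looking screen grabs, and the rule for choosing which stored state to revisit (least visited first) is an assumption. All names are hypothetical.

```python
import random

GOAL = 20          # position that yields the only reward
EXPLORE_STEPS = 5  # random actions taken after returning to a stored state


def step(position, action):
    """Deterministic toy environment: move left or right, clipped at 0."""
    return max(0, position + action)


def replay(actions):
    """'Return' phase: reach a stored state by replaying its action sequence."""
    position = 0
    for action in actions:
        position = step(position, action)
    return position


def go_explore(iterations=10_000, seed=0):
    rng = random.Random(seed)
    archive = {0: []}  # cell -> shortest known action sequence reaching it
    visits = {0: 0}    # how often each cell was chosen as a starting point
    for _ in range(iterations):
        # Choose a stored cell, preferring rarely chosen ones (ties: farthest).
        cell = min(archive, key=lambda c: (visits[c], -c))
        visits[cell] += 1
        actions = list(archive[cell])
        position = replay(actions)       # first return ...
        for _ in range(EXPLORE_STEPS):   # ... then explore
            action = rng.choice((-1, 1))
            actions.append(action)
            position = step(position, action)
            # Remember new cells, or shorter routes to known ones.
            if position not in archive or len(actions) < len(archive[position]):
                archive[position] = list(actions)
                visits.setdefault(position, 0)
            if position == GOAL:
                return archive[GOAL]
    return None


if __name__ == "__main__":
    trajectory = go_explore()
    print("steps to goal:", None if trajectory is None else len(trajectory))
```

Because the archive keeps a way back to every state seen so far, exploration always resumes from the frontier instead of starting over, which is the point of the principle described above.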

“The reason our approach hadn’t been considered before is that it differs strongly from the dominant approach that has historically been used for addressing these problems in the reinforcement learning community, called ‘intrinsic motivation,’” said researchers Adrien Ecoffet, Joost Huizinga, and Jeff Clune. “In intrinsic motivation, instead of dividing exploration into returning and exploring like we do, the agent is simply rewarded for discovering new areas.”

With the return-and-explore technique, Go-Explore achieved massive improvements on a collection of 55 Atari games, beating state-of-the-art algorithms 85.5 percent of the time. The algorithm set a record — beating both the human world record and past reinforcement learning records — on the complex Montezuma’s Revenge game.

The paper also demonstrated how Go-Explore could be applied to real-world challenges including robotics, drug design, and language processing. 

A preprint of the paper is available on arXiv under open-access terms. Read the full paper, “First return, then explore,” in Nature.



Meet the Maker: DIY Builder Takes AI to Bat for Calling Balls and Strikes

Baseball players have to think fast when batting against blurry-fast pitches. Now, AI might be able to assist. Nick Bild, a Florida-based software engineer, has created an application that can signal to batters whether pitches are going to be balls or strikes. Dubbed Tipper, it can be fitted on the outer edge of glasses to ...

The post Meet the Maker: DIY Builder Takes AI to Bat for Calling Balls and Strikes appeared first on The Official NVIDIA Blog.


What Is Cloud Gaming?

Cloud gaming uses powerful, industrial-strength GPUs inside secure data centers to stream your favorite games over the internet to you. So you can play the latest games on nearly any device, even ones that can’t normally play that game. But First, What Is Cloud Gaming? While the technology is complex, the concept is simple. Cloud ...

The post What Is Cloud Gaming? appeared first on The Official NVIDIA Blog.


In the Drink of an AI: Startup Opseyes Instantly Analyzes Wastewater

Let’s be blunt. Potentially toxic waste is just about the last thing you want to get in the mail. And that’s just one of the opportunities for AI to make the business of analyzing wastewater better. It’s an industry that goes far beyond just making sure water coming from traditional sewage plants is clean. Just ...

The post In the Drink of an AI: Startup Opseyes Instantly Analyzes Wastewater appeared first on The Official NVIDIA Blog.


Lyra: A New Very Low-Bitrate Codec for Speech Compression

Connecting to others online via voice and video calls is something that is increasingly a part of everyday life. The real-time communication frameworks, like WebRTC, that make this possible depend on efficient compression techniques, codecs, to encode (or decode) signals for transmission or storage. A vital part of media applications for decades, codecs allow bandwidth-hungry applications to efficiently transmit data, and have led to an expectation of high-quality communication anywhere at any time.

As such, a continuing challenge in developing codecs, both for video and audio, is to provide increasing quality, using less data, and to minimize latency for real-time communication. Even though video might seem much more bandwidth hungry than audio, modern video codecs can reach lower bitrates than some high-quality speech codecs used today. Combining low-bitrate video and speech codecs can deliver a high-quality video call experience even in low-bandwidth networks. Yet historically, the lower the bitrate for an audio codec, the less intelligible and more robotic the voice signal becomes. Furthermore, while some people have access to a consistent high-quality, high-speed network, this level of connectivity isn’t universal, and even those in well connected areas at times experience poor quality, low bandwidth, and congested network connections.

To solve this problem, we have created Lyra, a high-quality, very low-bitrate speech codec that makes voice communication available even on the slowest networks. To do this, we’ve applied traditional codec techniques while leveraging advances in machine learning (ML) with models trained on thousands of hours of data to create a novel method for compressing and transmitting voice signals.

Lyra Overview
The basic architecture of the Lyra codec is quite simple. Features, or distinctive speech attributes, are extracted from speech every 40ms and are then compressed for transmission. The features themselves are log mel spectrograms, a list of numbers representing the speech energy in different frequency bands, which have traditionally been used for their perceptual relevance because they are modeled after human auditory response. On the other end, a generative model uses those features to recreate the speech signal. In this sense, Lyra is very similar to other traditional parametric codecs, such as MELP.
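To make the idea of a log mel feature concrete, here is a small standard-library sketch. It is an illustration only and not Lyra’s front end: the mel formula is the common HTK-style one, the bands are simple rectangles rather than the overlapping filters usually used, and the function names, band count, and sample rate are all assumptions.

```python
import math


def hz_to_mel(hz):
    """Common HTK-style mel formula (assumed; Lyra's exact scale is not stated)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(num_bands, f_min, f_max):
    """Edges (in Hz) of num_bands bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    width = (hi - lo) / num_bands
    return [mel_to_hz(lo + i * width) for i in range(num_bands + 1)]


def log_band_energies(power_spectrum, sample_rate, num_bands):
    """Sum power-spectrum bins per mel band, then take the log of each sum."""
    nyquist = sample_rate / 2.0
    edges = mel_band_edges(num_bands, 0.0, nyquist)
    bin_hz = nyquist / (len(power_spectrum) - 1)
    energies = [0.0] * num_bands
    for k, power in enumerate(power_spectrum):
        freq = k * bin_hz
        band = 0
        while band < num_bands - 1 and freq >= edges[band + 1]:
            band += 1
        energies[band] += power
    # Small offset avoids log(0) for empty or silent bands.
    return [math.log(e + 1e-10) for e in energies]


if __name__ == "__main__":
    print([round(edge) for edge in mel_band_edges(4, 0.0, 8000.0)])
```

The bands get wider toward high frequencies, which reflects the perceptual motivation mentioned above: the ear resolves low frequencies more finely than high ones.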

However, traditional parametric codecs, which simply extract from speech critical parameters that can then be used to recreate the signal at the receiving end, achieve low bitrates but often sound robotic and unnatural. These shortcomings have led to the development of a new generation of high-quality audio generative models that have revolutionized the field by being able to not only differentiate between signals, but also generate completely new ones. DeepMind’s WaveNet was the first of these generative models that paved the way for many to come. Additionally, WaveNetEQ, the generative model-based packet-loss-concealment system currently used in Duo, has demonstrated how this technology can be used in real-world scenarios.

A New Approach to Compression with Lyra
Using these models as a baseline, we’ve developed a new model capable of reconstructing speech using minimal amounts of data. Lyra harnesses the power of these new natural-sounding generative models to maintain the low bitrate of parametric codecs while achieving high quality, on par with state-of-the-art waveform codecs used in most streaming and communication platforms today. The drawback of waveform codecs is that they achieve this high quality by compressing and sending over the signal sample-by-sample, which requires a higher bitrate and, in most cases, isn’t necessary to achieve natural sounding speech.

One concern with generative models is their computational complexity. Lyra avoids this issue by using a cheaper recurrent generative model, a WaveRNN variation, that works at a lower rate, but generates in parallel multiple signals in different frequency ranges that it later combines into a single output signal at the desired sample rate. This trick enables Lyra to not only run on cloud servers, but also on-device on mid-range phones in real time (with a processing latency of 90ms, which is in line with other traditional speech codecs). This generative model is then trained on thousands of hours of speech data and optimized, similarly to WaveNet, to accurately recreate the input audio.

Comparison with Existing Codecs
Since the inception of Lyra, our mission has been to provide the best quality audio using a fraction of the bitrate data of alternatives. Currently, the royalty-free open-source codec Opus is the most widely used codec for WebRTC-based VOIP applications and, with audio at 32kbps, typically obtains transparent speech quality, i.e., indistinguishable from the original. However, while Opus can be used in more bandwidth-constrained environments down to 6kbps, it starts to demonstrate degraded audio quality. Other codecs are capable of operating at comparable bitrates to Lyra (Speex, MELP, AMR), but each suffers from increased artifacts and results in a robotic-sounding voice.

Lyra is currently designed to operate at 3kbps, and listening tests show that Lyra outperforms any other codec at that bitrate and compares favorably to Opus at 8kbps, thus achieving more than a 60% reduction in bandwidth. Lyra can be used wherever the bandwidth conditions are insufficient for higher bitrates and existing low-bitrate codecs do not provide adequate quality.
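The figures quoted here can be checked with a little arithmetic. The sketch below assumes that one set of features is sent per 40ms extraction interval and that the whole 3kbps budget goes to those features, which ignores any packet overhead; the function names are hypothetical.

```python
FRAME_MS = 40   # feature extraction interval given in the overview
LYRA_KBPS = 3
OPUS_KBPS = 8


def bits_per_frame(kbps, frame_ms):
    """Bits available per frame at a given bitrate (1 kbps = 1000 bits/s)."""
    return kbps * 1000 * frame_ms // 1000


def bandwidth_reduction(new_kbps, old_kbps):
    """Fractional saving of new_kbps relative to old_kbps."""
    return 1 - new_kbps / old_kbps


print(bits_per_frame(LYRA_KBPS, FRAME_MS))                  # 120
print(f"{bandwidth_reduction(LYRA_KBPS, OPUS_KBPS):.1%}")   # 62.5%
```

So at 3kbps each 40ms of speech has to fit in about 120 bits (15 bytes), and 3kbps versus 8kbps is a 62.5% saving, in line with the "more than 60%" stated above.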

[Audio samples: Clean Speech and Noisy Environment, each as Reference, Opus@6kbps, and Lyra@3kbps]

Ensuring Fairness
As with any ML-based system, the model must be trained to make sure that it works for everyone. We’ve trained Lyra with thousands of hours of audio with speakers in over 70 languages using open-source audio libraries, and then verified the audio quality with expert and crowdsourced listeners. One of the design goals of Lyra is to ensure universally accessible high-quality audio experiences. Lyra trains on a wide dataset, including speakers in a myriad of languages, to make sure the codec is robust to any situation it might encounter.

Societal Impact and Where We Go From Here
The implications of technologies like Lyra are far reaching, both in the short and long term. With Lyra, billions of users in emerging markets can have access to an efficient low-bitrate codec that allows them to have higher quality audio than ever before. Additionally, Lyra can be used in cloud environments enabling users with various network and device capabilities to chat seamlessly with each other. Pairing Lyra with new video compression technologies, like AV1, will allow video chats to take place, even for users connecting to the internet via a 56kbps dial-in modem.

Duo already uses ML to reduce audio interruptions, and is currently rolling out Lyra to improve audio call quality and reliability on very low bandwidth connections. We will continue to optimize Lyra’s performance and quality to ensure maximum availability of the technology, with investigations into acceleration via GPUs and TPUs. We are also beginning to research how these technologies can lead to a low-bitrate general-purpose audio codec (i.e., music and other non-speech use cases).

Thanks to everyone who made Lyra possible including Jan Skoglund, Felicia Lim, Michael Chinen, Bastiaan Kleijn, Tom Denton, Andrew Storus, Yero Yeh (Chrome Media), Henrik Lundin, Niklas Blum, Karl Wiberg (Google Duo), Chenjie Gu, Zach Gleicher, Norman Casagrande, Erich Elsen (DeepMind).