Categories
Offsites

Announcing WIT: A Wikipedia-Based Image-Text Dataset

Multimodal visio-linguistic models rely on rich datasets in order to model the relationship between images and text. Traditionally, these datasets have been created by either manually captioning images, or crawling the web and extracting the alt-text as the caption. While the former approach tends to result in higher quality data, the intensive manual annotation process limits the amount of data that can be created. On the other hand, the automated extraction approach can lead to bigger datasets, but these require either heuristics and careful filtering to ensure data quality or scaling-up models to achieve strong performance. An additional shortcoming of existing datasets is the dearth of coverage in non-English languages. This naturally led us to ask: Can one overcome these limitations and create a high-quality, large-sized, multilingual dataset with a variety of content?

Today we introduce the Wikipedia-Based Image Text (WIT) Dataset, a large multimodal dataset, created by extracting multiple different text selections associated with an image from Wikipedia articles and Wikimedia image links. This was accompanied by rigorous filtering to only retain high quality image-text sets. As detailed in “WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning”, presented at SIGIR ’21, this resulted in a curated set of 37.5 million entity-rich image-text examples with 11.5 million unique images across 108 languages. The WIT dataset is available for download and use under the Creative Commons license. We are also excited to announce that we are hosting a competition with the WIT dataset in Kaggle in collaboration with Wikimedia Research and other external collaborators.

Dataset        Images   Text    Contextual Text   Languages
Flickr30K      32K      158K    -                 < 8
SBU Captions   1M       1M      -                 1
MS-COCO        330K     1.5M    -                 < 4; 7 (test only)
CC-3M          3.3M     3.3M    -                 1
CC-12M         12M      12M     -                 1
WIT            11.5M    37.5M   ~119M             108
WIT’s increased language coverage and larger size relative to previous datasets.

The unique advantages of the WIT dataset are:

  1. Size: WIT is the largest multimodal dataset of image-text examples that is publicly available.
  2. Multilingual: With 108 languages, WIT has 10x or more languages than any other dataset.
  3. Contextual information: Unlike typical multimodal datasets, which have only one caption per image, WIT includes extensive page-level and section-level contextual information.
  4. Real world entities: Wikipedia, being a broad knowledge-base, is rich with real world entities that are represented in WIT.
  5. Challenging test set: In our recent work accepted at EMNLP, all state-of-the-art models demonstrated significantly lower performance on WIT vs. traditional evaluation sets (e.g., ~30 point drop in recall).

Generating the Dataset
The main goal of WIT was to create a large dataset without sacrificing on quality or coverage of concepts. Thus, we started by leveraging the largest online encyclopedia available today: Wikipedia.

For an example of the depth of information available, consider the Wikipedia page for Half Dome (Yosemite National Park, CA). As shown below, the article has numerous interesting text captions and relevant contextual information for the image, such as the page title, main page description, and other contextual information and metadata.

Example wikipedia page with various image-associated text selections and contexts we can extract. From the Wikipedia page for Half Dome : Photo by DAVID ILIFF. License: CC BY-SA 3.0.
Example of the Wikipedia page for this specific image of Half Dome. From the Wikipedia page for Half Dome : Photo by DAVID ILIFF. License: CC BY-SA 3.0.

We started by selecting Wikipedia pages that have images, then extracted various image-text associations and surrounding contexts. To further refine the data, we performed a rigorous filtering process to ensure data quality. This included text-based filtering to ensure caption availability, length and quality (e.g., by removing generic default filler text); image-based filtering to ensure each image is a certain size with permissible licensing; and finally, image-and-text-entity–based filtering to ensure suitability for research (e.g., excluding those classified as hate speech). We further randomly sampled image-caption sets for evaluation by human editors, who overwhelmingly agreed that 98% of the samples had good image-caption alignment.
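The text-based and image-based filtering stages described above can be pictured with a minimal sketch. It is not the pipeline used to build WIT: the `Candidate` type, the thresholds, the filler strings, and the license list are all hypothetical values chosen only to illustrate the shape of such filters.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds and lists, for illustration only.
MIN_CAPTION_CHARS = 5
MIN_IMAGE_SIDE_PX = 100
FILLER_CAPTIONS = {"image", "photo", "thumbnail"}
ALLOWED_LICENSES = {"CC BY-SA 3.0", "CC BY-SA 4.0", "Public domain"}


@dataclass
class Candidate:
    """One image-caption pair extracted from a page (hypothetical structure)."""
    caption: Optional[str]
    width: int
    height: int
    license: str


def passes_text_filter(c: Candidate) -> bool:
    """Caption must exist, be long enough, and not be generic filler text."""
    if not c.caption:
        return False
    text = c.caption.strip()
    if len(text) < MIN_CAPTION_CHARS:
        return False
    return text.lower() not in FILLER_CAPTIONS


def passes_image_filter(c: Candidate) -> bool:
    """Image must meet a minimum size and carry a permissible license."""
    big_enough = min(c.width, c.height) >= MIN_IMAGE_SIDE_PX
    return big_enough and c.license in ALLOWED_LICENSES


def keep(c: Candidate) -> bool:
    """A candidate survives only if it passes both filter stages."""
    return passes_text_filter(c) and passes_image_filter(c)
```

Keeping each stage as a separate predicate makes it easy to count how many candidates each stage rejects, which is useful when tuning thresholds.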

Highly Multilingual
With data in 108 languages, WIT is the first large-scale, multilingual, multimodal dataset.

# of Image-Text Sets   Unique Languages   # of Images    Unique Languages
> 1M                   9                  > 1M           6
500K – 1M              10                 500K – 1M      12
100K – 500K            36                 100K – 500K    35
50K – 100K             15                 50K – 100K     17
14K – 50K              38                 13K – 50K      38
WIT: coverage statistics across languages.
Example of an image that is present in more than a dozen Wikipedia pages across >12 languages. From the Wikipedia page for Wolfgang Amadeus Mozart.

The First Contextual Image-Text Dataset
Most multimodal datasets only offer a single text caption (or multiple versions of a similar caption) for the given image. WIT is the first dataset to provide contextual information, which can help researchers model the effect of context on image captions as well as the choice of images.

WIT dataset example showing image-text data and additional contextual information.

In particular, key textual fields of WIT that may be useful for research include:

  • Text captions: WIT offers three different kinds of image captions. This includes the (potentially context influenced) “Reference description”, the (likely context independent) “Attribution description” and “Alt-text description”.
  • Contextual information: This includes the page title, page description, URL and local context about the Wikipedia section including the section title and text.
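To make the two groups of fields concrete, here is a small sketch of how one example might be held in memory. The class name and attribute names are assumptions that mirror the fields listed above; they are not the dataset's actual column names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WitExample:
    """Hypothetical in-memory record; attribute names are illustrative only."""
    # Text captions (any of them may be missing for a given image)
    reference_description: Optional[str] = None
    attribution_description: Optional[str] = None
    alt_text_description: Optional[str] = None
    # Contextual information
    page_title: str = ""
    page_description: str = ""
    page_url: str = ""
    section_title: str = ""
    section_text: str = ""

    def captions(self) -> List[str]:
        """Return the captions that are present, in the order listed above."""
        candidates = [
            self.reference_description,
            self.attribution_description,
            self.alt_text_description,
        ]
        return [c for c in candidates if c]

    def context(self) -> str:
        """Join the non-empty page-level and section-level text fields."""
        parts = [
            self.page_title,
            self.page_description,
            self.section_title,
            self.section_text,
        ]
        return "\n".join(p for p in parts if p)
```

Separating `captions()` from `context()` reflects the distinction drawn above: captions describe the image, while context describes where the image appears.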

WIT has broad coverage across these different fields, as shown below.

Image-Text Fields of WIT     Train Val Test Total / Unique
Rows / Tuples   37.1M     261.8K     210.7K   37.6M
Unique Images 11.4M 58K 57K 11.5M
Reference Descriptions 16.9M 150K 104K   17.2M / 16.7M  
Attribution Descriptions 34.8M 193K 200K 35.2M / 10.9M
Alt-Text 5.3M 29K 29K 5.4M / 5.3M
Context Texts 119.8M
Key fields of WIT include both text captions and contextual information.

A High-Quality Training Set and a Challenging Evaluation Benchmark
The broad coverage of diverse concepts in Wikipedia means that the WIT evaluation sets serve as a challenging benchmark, even for state-of-the-art models. We found that for image-text retrieval, the mean recall scores for traditional datasets were in the 80s, whereas for the WIT test set, it was in the 40s for well-resourced languages and in the 30s for the under-resourced languages. We hope this in turn can help researchers to build stronger, more robust models.

WIT Dataset and Competition with Wikimedia and Kaggle
Additionally, we are happy to announce that we are partnering with Wikimedia Research and a few external collaborators to organize a competition with the WIT test set. We are hosting this competition in Kaggle. The competition is an image-text retrieval task. Given a set of images and text captions, the task is to retrieve the appropriate caption(s) for each image.
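As a rough illustration of the retrieval task, the sketch below ranks captions for each image by embedding similarity and computes recall at K, one common retrieval metric. It is not the competition's official scoring code; the function names, the dot-product similarity, and the toy two-dimensional embeddings are all assumptions.

```python
from typing import Dict, List, Sequence, Set


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product used here as a simple similarity score."""
    return sum(x * y for x, y in zip(a, b))


def rank_captions(image_vec: Sequence[float],
                  caption_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return caption indices sorted from most to least similar to the image."""
    scores = [dot(image_vec, c) for c in caption_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def recall_at_k(image_vecs: Sequence[Sequence[float]],
                caption_vecs: Sequence[Sequence[float]],
                relevant: Dict[int, Set[int]],
                k: int) -> float:
    """Fraction of images with at least one correct caption in the top k."""
    hits = 0
    for img_idx, vec in enumerate(image_vecs):
        top_k = set(rank_captions(vec, caption_vecs)[:k])
        if relevant[img_idx] & top_k:
            hits += 1
    return hits / len(image_vecs)


# Toy data: two images, three captions, one correct caption per image.
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.7]]
relevant = {0: {0}, 1: {2}}

print(recall_at_k(images, captions, relevant, k=1))  # 0.5
print(recall_at_k(images, captions, relevant, k=2))  # 1.0
```

In the toy data the second image's correct caption is ranked second, so it counts as a miss at K=1 and a hit at K=2.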

To enable research in this area, Wikipedia has kindly made available images at 300-pixel resolution and ResNet-50-based image embeddings for most of the training and test datasets. Kaggle will host all of this image data in addition to the WIT dataset itself and will provide Colab notebooks. Further, competitors will have access to a discussion forum on Kaggle in order to share code and collaborate. This enables anyone interested in multimodality to get started and run experiments easily. We are excited to see what will result from the WIT dataset and the Wikipedia images on the Kaggle platform.

Conclusion
We believe that the WIT dataset will aid researchers in building better multimodal multilingual models and in identifying better learning and representation techniques, ultimately leading to improved Machine Learning models in real-world tasks over visio-linguistic data. For any questions, please contact wit-dataset@google.com. We would love to hear about how you are using the WIT dataset.

Acknowledgements
We would like to thank our co-authors in Google Research: Jiecao Chen, Michael Bendersky and Marc Najork. We thank Beer Changpinyo, Corinna Cortes, Joshua Gang, Chao Jia, Ashwin Kakarla, Mike Lee, Zhen Li, Piyush Sharma, Radu Soricut, Ashish Vaswani, Yinfei Yang, and our reviewers for their insightful feedback and comments.

We thank Miriam Redi and Leila Zia from Wikimedia Research for collaborating with us on the competition and providing image pixels and image embedding data. We thank Addison Howard and Walter Reade for helping us host this competition in Kaggle. We also thank Diane Larlus (Naver Labs Europe (NLE)), Yannis Kalantidis (NLE), Stéphane Clinchant (NLE), Tiziano Piccardi (Ph.D. student at EPFL), Lucie-Aimée Kaffee (Ph.D. student at University of Southampton), and Yacine Jernite (Hugging Face) for their valuable contributions to the competition.

Categories
Misc

Achieving Noise-Free Audio for Virtual Collaboration and Content Creation Applications

Maxine’s Audio Effects SDK enables you to build applications that integrate features such as noise removal and room echo removal into your applications to improve audio quality. This post showcases these effects and how to build applications that provide high audio quality.

With audio and video streaming, conferencing, and telecommunication on the rise, it has become essential for developers to build applications with outstanding audio quality and enable end users to communicate and collaborate effectively. Various background noises can disrupt communication, ranging from traffic and construction to dogs barking and babies crying. Moreover, a user could talk in a large room that amplifies echoes.

NVIDIA Maxine offers an easy-to-use Audio Effects SDK with AI neural network audio quality enhancement algorithms to address poor audio quality in virtual collaboration and content creation applications. With the Audio Effects SDK, you can remove virtually any type of noise, including room echo, and build applications that enable easy-to-understand conversations and productive meetings.

In this post, you learn how to build high audio-quality applications using containers on Linux or the SDK on Windows. Both are demonstrated with prebuilt sample applications.

Build applications with no background noise or room echo

The Maxine Audio Effects SDK enables you to integrate noise removal and room echo removal features for narrowband, wideband, and ultra-wideband audio into your applications.

Video 1. Maxine’s Audio Effects SDK demo of Noise Removal and Room Echo Cancellation

Noise Removal

As we have started working from home more, there are many potential noise sources in the background of our calls, such as the sound of keystrokes or the compressor of an air conditioner. The distractions around us become a part of our surroundings, like slamming doors, moving furniture, or vacuuming.

With the Noise Removal effect, you can remove different noise profiles from audio streams while retaining the emotional aspects of the speaker’s voice. For example, when an end user is excitedly pitching a new idea in an elevated tone with an air conditioner running in the background, noise removal retains only the speaker’s voice.

Room Echo Cancellation

When a person speaks in a closed room, the sound bounces off all the surrounding surfaces. How much the voice gets absorbed, dampened, or continues to reflect for multiple iterations depends upon the surfaces’ size, geometry, and material. Such continued sound wave reflections build up over time and cause reverberations.

The echo is more noticeable in large rooms with more reflective surfaces, such as concrete or stone walls. For example, think about the voice sound reverberations in a high-ceiling cathedral. Such reverberant voices are unsuitable for popularly used speech encoding methods such as linear predictive coding or code-excited linear prediction. The encoding of reverberant speech results in severe distortions, rendering voices unintelligible in extreme cases.

It is essential to remove such reverberations from the voice recording before sending it. In situations where echo removal is not possible before encoding, it is essential to remove as much of the echo as possible before rendering the decoded voice through the speaker to the listener. The Room Echo Cancellation effect eliminates unwanted echoes from speech when users talk in a reverberant environment. In addition, this feature supports wideband and ultra-wideband signals.

You can combine the noise removal and room echo removal features for better end-to-end audio quality in both directions.

Get Maxine Audio Effects SDK for Windows or Linux

Using containers with Kubernetes provides a robust and easy-to-scale deployment strategy. We offer the Maxine Audio Effects SDK for Windows and Linux platforms in addition to prepackaged containers. The benefits of using containers are high scalability and time and cost savings due to faster deployment and reduced maintenance time. In addition, because of the prepackaged nature of containers, you don’t have to worry about specific installations inside the container.

In this post, we focus on how to use the Audio Effects SDK containers. Before proceeding with the installation, make sure that you meet all the hardware requirements.

If you have considerable experience with NVIDIA TensorRT and cuDNN and want to deploy the Audio Effects SDK on a bare-metal Linux system, download the SDK for your specific platform on the Maxine Getting Started page.

Audio Effects SDK Docker containers

There are four steps to install and take advantage of high-performance Audio Effects SDK and its state-of-the-art AI models on containers:

You need access to NVIDIA Turing, NVIDIA Volta, or NVIDIA Ampere Architecture generation data center GPUs: T4, V100, A100, A10, or A30.

Install the Audio Effects SDK on Windows

Installing the SDK on Windows is a straightforward process:

You must have an NVIDIA RTX card to benefit from the accelerated throughput and reduced latency of the Audio Effects SDK on Windows. To run this SDK on a datacenter card like A100, use the Linux package.

Using the Audio Effects SDK with prebuilt sample applications

The Audio Effects SDK comes with the prebuilt effects_demo and effects_delayed_streams_demo sample applications to demonstrate how to use the SDK. You can also build your own sample application. In this post, we focus on running the effects_demo sample application.

Real-time Audio Effects demonstration

The effects_demo application demonstrates how to use the SDK to apply effects to audio. It can be used to apply Noise Removal, Room Echo Cancellation, or both effects combined to input audio files and write the outputs to file.

To run this application, navigate to the samples/effects_demo directory and run the application using one of the following scripts:

$ ./run_effect.sh -a turing -s 16 -b 1 -e denoiser
$ ./run_effect.sh -a turing -s 48 -b 1 -e dereverb
$ ./run_effect.sh -a turing -s 16 -b 400 -e denoiser
$ ./run_effect.sh -a turing -s 48 -b 400 -e dereverb_denoiser

The run_effect.sh bash script accepts the following arguments:

  • -a: Architecture can be NVIDIA Turing, NVIDIA Volta, A100, or A10, depending on your GPU.
  • -s: Sample rate in kHz (48 or 16).
  • -b: Batch size.
  • -e: Effect to run:
    • denoiser (NR)
    • dereverb (RER)
    • dereverb_denoiser (combined)
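If you launch the script from Python, for example from a test harness, a small helper can validate the arguments before building the command line. This is a sketch rather than part of the SDK: `build_run_effect_cmd` is a hypothetical name, and the accepted values are simply those that appear in the examples and argument list above.

```python
from typing import List

# Values taken from the argument list above.
EFFECTS = {"denoiser", "dereverb", "dereverb_denoiser"}
SAMPLE_RATES_KHZ = {16, 48}


def build_run_effect_cmd(arch: str, sample_rate_khz: int,
                         batch_size: int, effect: str) -> List[str]:
    """Build the argument vector for run_effect.sh after basic validation."""
    if sample_rate_khz not in SAMPLE_RATES_KHZ:
        raise ValueError(f"sample rate must be 16 or 48 kHz, got {sample_rate_khz}")
    if effect not in EFFECTS:
        raise ValueError(f"unknown effect: {effect}")
    if batch_size < 1:
        raise ValueError("batch size must be at least 1")
    return [
        "./run_effect.sh",
        "-a", arch,
        "-s", str(sample_rate_khz),
        "-b", str(batch_size),
        "-e", effect,
    ]


cmd = build_run_effect_cmd("turing", 48, 400, "dereverb_denoiser")
print(" ".join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, cwd=...)` with the working directory set to samples/effects_demo, which avoids shell quoting issues.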

You can also execute the effects_demo binary by passing a configuration file as follows:

# For running denoiser on NVIDIA Turing GPU with 48kHz input and batch size 1
$ ./effects_demo -c turing_denoise48k_1_cfg.txt

This config file should contain the following parameters:

  • effect
  • sample_rate
  • model : Models are available in the /usr/local/AudioFX/models directory within the container.
  • real_time : Simulates audio reception from the physical device or stream.
  • intensity_ratio  : Specifies the denoising intensity ratio.
  • input_wav_list
  • output_wav_list

After you run the effects_demo sample application, the denoised output files are available in the same directory as the executable.

Audio Effects SDK demonstration on delayed streams

The effects_delayed_streams_demo application demonstrates handling delayed streams. In telecommunication, where the user’s audio might not reach the server in real time, we recommend applying the denoising effect in a delayed manner. In this sample application, each input stream falls under one of the following categories:

  • one_step_delay_streams: These streams have a delay of one frame. For example, if the frame size is 5 ms, these streams have a delay of 5 ms.
  • two_step_delay_streams: These streams have a delay of two frames. For example, if the frame size is 5 ms, these streams have a delay of 10 ms.
  • always_active_streams: These streams have no delay and are always active.
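The delay arithmetic behind these categories is simple enough to sketch. The helper names below are hypothetical, and the 240-sample frame at 48 kHz is only chosen to match the 5 ms example above; it is not claimed to be the SDK's preset frame size.

```python
# Number of frames of delay for each stream category described above.
DELAY_STEPS = {
    "always_active_streams": 0,
    "one_step_delay_streams": 1,
    "two_step_delay_streams": 2,
}


def frame_duration_ms(frame_size_samples: int, sample_rate_hz: int) -> float:
    """Duration of one frame in milliseconds."""
    return 1000.0 * frame_size_samples / sample_rate_hz


def stream_delay_ms(category: str, frame_ms: float) -> float:
    """Delay of a stream: number of delayed frames times the frame duration."""
    return DELAY_STEPS[category] * frame_ms


frame_ms = frame_duration_ms(240, 48000)  # 240 samples at 48 kHz
print(frame_ms)                                             # 5.0
print(stream_delay_ms("one_step_delay_streams", frame_ms))  # 5.0
print(stream_delay_ms("two_step_delay_streams", frame_ms))  # 10.0
```

Expressing the delay in frames first and converting to milliseconds afterwards keeps the calculation valid for any frame size or sample rate.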

To run this application, navigate to the samples/effects_delayed_streams_demo directory and execute the binary as follows:

$ ./effects_delayed_streams_demo -c config-file

Here, -c config-file is the path to the configuration file, for example, turing_denoise48k_10_cfg.txt. The configuration file accepts the following parameters:

  • effect
  • frame_size: An unsigned integer that specifies the number of samples per frame per audio stream for the audio effect.
  • sample_rate 
  • model : Models are available in the /usr/local/AudioFX/models directory within the container.
  • one_step_delay_streams: Specifies the stream identifiers that belong to the one_step_delay_streams category.
  • two_step_delay_streams: Specifies the stream identifiers that belong to the two_step_delay_streams category.
  • input_wav_list
  • output_wav_list

After you run the effects_delayed_streams_demo sample application, the denoised output files are available in the same directory as the executable.

Run Audio Effects features with the API

The sample applications use easy-to-use Audio Effects SDK APIs to run the effects. They capitalize on significant performance advantages and control over batching of low-level APIs. Creating and running the audio effects in Maxine is a simple three-step process (Figure 1).

Running audio effects in Maxine starts with creating the effect, moves to loading the model, and ends with using the effect.
Figure 1. Steps and functions to run the Audio Effects SDK

Create the effect

To create the effect for either noise removal or room echo removal, call the NvAFX_CreateEffect function that takes a handle with the required parameters. This function returns the status code after creating the desired effect. Check for any errors using this status code before proceeding further.

// Create an effect handle

NvAFX_Handle handle;

// Call CreateEffect function and pass any one of the desired effects:
// NVAFX_EFFECT_DENOISER, NVAFX_EFFECT_DEREVERB,
// NVAFX_EFFECT_DEREVERB_DENOISER

NvAFX_Status err = NvAFX_CreateEffect(NVAFX_EFFECT_DENOISER, &handle);

Each provided model supports a specific audio sample rate that can be specified by calling NvAFX_SetU32. The sample_rate value should be an unsigned 32-bit integer value (48000/16000). Additionally, the proper model path for the GPU platform used should be passed using the NvAFX_SetString API call as follows:

// Pass parameter selector NVAFX_PARAM_SAMPLE_RATE and unsigned int
// Pass parameter selector NVAFX_PARAM_MODEL_PATH and character string
NvAFX_Status err;
err = NvAFX_SetU32(handle, NVAFX_PARAM_SAMPLE_RATE, sample_rate);
err = NvAFX_SetString(handle, NVAFX_PARAM_MODEL_PATH, model_file.c_str());

As the number of I/O audio channels and the number of samples per frame are preset for each effect, you must pass these parameters to the effects function. To get the list of supported values, call the NvAFX_GetU32 function, which returns the list of preset values.

// Pass the selector string to get specific information like:
// NVAFX_PARAM_NUM_SAMPLES_PER_FRAME,
// NVAFX_PARAM_NUM_CHANNELS,

unsigned num_samples_per_frame, num_channels;
NvAFX_Status err;
err = NvAFX_GetU32(handle, NVAFX_PARAM_NUM_SAMPLES_PER_FRAME,
&num_samples_per_frame);
err = NvAFX_GetU32(handle, NVAFX_PARAM_NUM_CHANNELS, &num_channels);

To run the effect on a GPU, you must get the list of supported devices using the NvAFX_GetSupportedDevices function, which fetches the number of supported GPUs.

// The function fills the array with the CUDA device indices of devices 
// that are supported by the model, in descending order of preference,
// where the first device is the most preferred device.

int numSupportedDevices = 0;
NvAFX_GetSupportedDevices(handle, &numSupportedDevices, nullptr);
std::vector<int> ret(numSupportedDevices);
NvAFX_GetSupportedDevices(handle, &numSupportedDevices, ret.data());

You can then set the GPU device to be used by passing the correct GPU device number, as follows:

NvAFX_SetU32(handle, NVAFX_PARAM_USE_DEFAULT_GPU, use_default_gpu_);

Load an audio effect

After the effect is created, the model must be loaded using the NvAFX_Load function. Loading an effect selects and loads a model and validates the parameters that were set for the effect. This function loads the model into the GPU memory and makes it ready for inference. To load an audio effect, call the NvAFX_Load function and specify the effect handle that was created.

NvAFX_Status err = NvAFX_Load(handle);

Run the audio effect

Finally, run the loaded audio effect to apply the desired effect on the input data. After an effect is run, the contents of the input memory buffer are read, the audio effect is applied, and the output is written to the output memory buffer. Call the NvAFX_Run function for running the loaded audio effect on the input buffer.

// Pass the effect handle, input, and output memory buffer, and the parameters of the effect

NvAFX_Status err = NvAFX_Run(handle, input, output, num_samples, num_channels);

After the audio effect has been applied to the input memory buffer and is no longer required, release its resources by calling the NvAFX_DestroyEffect function with the effect handle.

NvAFX_Status err = NvAFX_DestroyEffect(handle);

Summary

Now that we have explored the Maxine Audio Effects features, shown you how to run the sample applications with appropriate parameters, and walked through the easy-to-use, high-performance API, you can start integrating these AI audio features into your applications using Maxine containers or bare-metal installations on Windows and Linux.
 

For more information, see the Maxine Getting Started page. Let us know what you think or if you have any questions.


Categories
Misc

Trash Talk: Startup’s AI-Driven Detection System Primed to Take a Bite Out of Global Waste

Of the 8.3 billion tons of virgin plastic waste created each year, despite decades of efforts to reduce the amount that ends up in landfills, only about 9 percent gets recycled. London-based computer vision startup Recycleye looks to give those recycling numbers a big boost with its AI-driven system for identifying waste materials. By automating Read article >

The post Trash Talk: Startup’s AI-Driven Detection System Primed to Take a Bite Out of Global Waste appeared first on The Official NVIDIA Blog.

Categories
Misc

Architecture Firm Brings New Structure to Design Workflows With Real-Time Rendering and Virtual Collaboration

When working on future skyscrapers, bridges or other projects, Kohn Pedersen Fox looks beyond traditional processes. The global architecture firm aims to find the most creative and optimal design using advanced technologies like generative design, deep learning and immersive visualization. And during design reviews, KPF relies on collaborative sessions so their teams, clients and stakeholders Read article >

The post Architecture Firm Brings New Structure to Design Workflows With Real-Time Rendering and Virtual Collaboration appeared first on The Official NVIDIA Blog.

Categories
Misc

Find the Love We Shared in September: NVIDIA Canvas Update Paints With New Styles

NVIDIA Canvas, the AI-powered painting app that enables artists to paint by material, using AI to turn doodles into beautiful artwork, released an update today introducing custom styles. Now users can apply the look and feel or “style” of their own images to their final Canvas painting. Supporting the new Canvas update is the September Read article >

The post Find the Love We Shared in September: NVIDIA Canvas Update Paints With New Styles appeared first on The Official NVIDIA Blog.

Categories
Misc

Second partial derivative of ANN with respect to model input returns NoneType

This post is a follow up on this one: https://www.reddit.com/r/tensorflow/comments/pk5dqj/custom_loss_function_error_attributeerror/

Basically I need to compute 3 derivatives of the ANN I’m training with respect to (wrt) some input variables. I need those derivatives for a custom loss function.

I finally managed to calculate the 2 first order partial derivatives. The problem is in the second order derivative. It returns NoneType and I don’t know why. I’ve already tried different examples to no avail. For example tried the Jacobian (https://www.tensorflow.org/api_docs/python/tf/GradientTape#jacobian).

import pandas as pd
from tensorflow import keras
import tensorflow as tf
from tensorflow.keras import layers, losses
import numpy as np

# Hyperparameters
n_hidden_layers = 2 # Number of hidden layers.
n_units = 128 # Number of neurons of the hidden layers.
n_batch = 64 # Number of observations used per gradient update.
n_epochs = 30

# Sample data
x_train = {'strike': [200, 2925],
           'Time to Maturity': [0.312329, 0.0356164],
           "RF Rate": [0.08, 2.97],
           "Sigma 20 Days Annualized": [0.123251, 0.0837898],
           "Underlying Price": [1494.82, 2840.69]
           }
call_X_train = pd.DataFrame(x_train, columns = ['strike', "Time to Maturity", "RF Rate",
                                                "Sigma 20 Days Annualized", "Underlying Price"]
                            )
x_test = {'strike': [200],
          'Time to Maturity': [0.0356164],
          "RF Rate": [2.97],
          "Sigma 20 Days Annualized": [0.0837898],
          "Underlying Price": [2840.69]
          }
call_X_test = pd.DataFrame(x_test, columns = ['strike', "Time to Maturity", "RF Rate",
                                              "Sigma 20 Days Annualized", "Underlying Price"]
                           )
y_train = np.array([1285.25, 0.8])
call_y_train = pd.Series(y_train)
y_test = np.array([0.8])
call_y_test = pd.Series(y_test)

# Creates hidden layers
def hl(tensor, n_units):
    hl_output = layers.Dense(n_units, activation = layers.LeakyReLU(alpha = 1))(tensor) # alpha = 1 makes the function LeakyReLU C^inf
    return hl_output

# Create model using Keras' Functional API
def mlp3_call(n_hidden_layers, n_units):
    # Create input layer
    inputs = keras.Input(shape = (call_X_train.shape[1],))
    x = layers.LeakyReLU(alpha = 1)(inputs)
    # Create hidden layers
    for _ in range(n_hidden_layers):
        x = hl(x, n_units)
    # Create output layer
    outputs = layers.Dense(1, activation = keras.activations.softplus)(x)
    # Actually create the model
    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

# Custom loss function
def constrained_mse(y_true, y_pred):
    mse = losses.mse(y_true, y_pred)
    x = tf.convert_to_tensor(call_X_train, np.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        with tf.GradientTape(persistent=True) as tape2:
            tape2.watch(x)
            y = model(x)
        grad_y = tape2.gradient(y, x)
        dy_dstrike = grad_y[0, 0]
        dy_dttm = grad_y[0, 1]
    d2y_dstrike2 = tape.gradient(dy_dstrike, x[:,0])
    loss = mse + dy_dstrike + dy_dttm + d2y_dstrike2
    return loss

model = mlp3_call(n_hidden_layers, n_units)
model.compile(loss = constrained_mse, optimizer = keras.optimizers.Adam(),)
history = model.fit(call_X_train, call_y_train, batch_size = n_batch, epochs = n_epochs,
                    validation_split = 0.01, verbose = 1)

submitted by /u/Snoo37084

Categories
Misc

Confused about tf.keras.layers.Flatten

The following example

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(2, activation='relu', input_shape=(2,2,)))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(1))

tx = np.random.rand(2,2)
res = model(tx)
print(res)

Gives error

ValueError: Input 0 of layer dense_1 is incompatible with the layer: expected axis -1 of input shape to have value 4 but received input with shape (2, 2) 

But if I comment out the line with the Flatten layer, then everything works fine.

What is wrong with this code, and how do I properly flatten the output of a layer?

submitted by /u/warpod

Categories
Misc

What is the best way to recalculate a recommendation system, if the dataset changes?

submitted by /u/uvcrtok

Categories
Misc

How do you create a model which takes as input a string and passes it to a tokenizer?

As I asked here on StackOverflow, I’m having problems building a model with strings as input since the input layer is a tf.keras.Input(shape=(1,), dtype=tf.string, name='text') but the BERT tokenizer expects a string. How do you extract the input string from the keras input?

submitted by /u/childintime9

Categories
Misc

What is the Yolov4 MakeFile Config for 3080 GPU?


submitted by /u/-JuliusSeizure