Categories
Misc

NVIDIA Chief Scientist Highlights New AI Research in GTC Keynote

NVIDIA researchers are defining ways to make faster AI chips in systems with greater bandwidth that are easier to program, said Bill Dally, NVIDIA’s chief scientist, in a keynote released today for a virtual GTC China event. He described three projects as examples of how the 200-person research team he leads is working to stoke…

The post NVIDIA Chief Scientist Highlights New AI Research in GTC Keynote appeared first on The Official NVIDIA Blog.

Categories
Misc

NVIDIA and Tencent Cloud Demonstrate XR Streaming From the Cloud

At GTC China, NVIDIA announced that Tencent Cloud demonstrated CloudXR streaming an immersive high-rise office building experience. The NVIDIA CloudXR platform uses Tencent Cloud’s stable and efficient cloud GPU computing power to turn any end device, including head-mounted displays (HMDs) and connected Windows and Android devices, into a high-fidelity XR display capable of showcasing professional-quality graphics.

The CloudXR platform includes the NVIDIA CloudXR software development kit, NVIDIA Quadro Virtual Workstation software and NVIDIA AI SDKs to deliver photorealistic graphics, with the mobile convenience of all-in-one XR headsets. 

Independent software vendors from industries spanning manufacturing, architecture, media and entertainment, and healthcare are adopting the CloudXR platform and accessing it from a growing number of major edge and cloud service providers. 

The ability to stream high-fidelity experiences from the cloud removes the need for users to be tethered to workstations or external VR tracking systems. With CloudXR, professionals can now easily set up, scale and access immersive experiences from anywhere in the world. 

The key to a great XR experience is extremely low perceived latency, and a core feature of CloudXR is its ability to manage perceived latency. However, users still require fast access between the client and the server. Tencent Cloud lets users connect to its regional data centers, enabling ultra-low-latency XR experiences.

“The CloudXR experience was amazing; it was indistinguishable from a tethered experience,” said Zhu Yi Ting, CTO of Sheencity. “CloudXR streaming from Tencent’s cloud allows us to reach even more customers with our rich immersive software package.”

NVIDIA’s early access partner Sheencity has deployed CloudXR on Tencent Cloud GPU Cloud Computing instances, allowing it to stream high-quality VR and AR experiences to XR users across China.

Sheencity developed Mars, a smart visual design platform that provides cloud software services to more than 1,000 well-known design institutes and 200 architectural and landscape universities. Its current models are large, with rich textures, and require the highest fidelity for design decisions.

By viewing designs in virtual reality and changing features such as building height, facade material, color, green area, building spacing, and lighting conditions, professionals can view and compare multiple design schemes in real time.

“Tencent Cloud will work with NVIDIA to deepen comprehensive cooperation in the VR/AR industry and create unique, high-quality immersive experiences for users anytime, anywhere,” said Song Dan Dan, director of Heterogeneous Computing Products at Tencent Cloud. “With supercomputing power combined with the performance of the cloud, we can jointly accelerate the popularization and application of VR/AR in smart life.”

Private Beta Now Available

NVIDIA is working with Tencent to make CloudXR generally available via the Tencent Marketplace. In the meantime, CloudXR is available on Tencent through a Private Beta program. Sign up now to get the latest news and updates on upcoming CloudXR releases, including the Private Beta.

Categories
Misc

Stuttgart Supercomputing Center Shifts into AI Gear

Stuttgart’s supercomputer center has been cruising down the autobahn of high performance computing like a well-torqued coupe, and now it’s making a pitstop for some AI fuel. Germany’s High-Performance Computing Center Stuttgart (HLRS), one of Europe’s largest supercomputing centers, has tripled the size of its staff and increased its revenues from industry collaborations 20x since…

The post Stuttgart Supercomputing Center Shifts into AI Gear appeared first on The Official NVIDIA Blog.

Categories
Misc

[Question] Sampling Images from a normal distribution for VAE

Hello there,

I am currently working on a VAE using tensorflow-probability. I would like to later train it on celeb_a, but right now I am using mnist to test everything.

My model looks like this, inspired by this example:
```
import tensorflow as tf
import tensorflow_probability as tfp

# aliases as used in the TFP examples (assumed; not shown in the original post)
tfk = tf.keras
tfkl = tf.keras.layers
tfd = tfp.distributions
tfpl = tfp.layers

# encoded_size, base_depth, input_shape and mnist_digits are defined elsewhere

prior = tfd.Independent(tfd.Normal(loc=tf.zeros(encoded_size), scale=1),
                        reinterpreted_batch_ndims=1)

inputs = tfk.Input(shape=input_shape)
x = tfkl.Lambda(lambda x: tf.cast(x, tf.float32) - 0.5)(inputs)
x = tfkl.Conv2D(base_depth, 5, strides=1, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(base_depth, 5, strides=2, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(2 * base_depth, 5, strides=1, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(2 * base_depth, 5, strides=2, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2D(4 * encoded_size, 7, strides=1, padding='valid', activation=tf.nn.leaky_relu)(x)
x = tfkl.Flatten()(x)
x = tfkl.Dense(tfpl.IndependentNormal.params_size(encoded_size))(x)
x = tfpl.IndependentNormal(encoded_size,
                           activity_regularizer=tfpl.KLDivergenceRegularizer(prior))(x)

encoder = tfk.Model(inputs, x, name='encoder')
encoder.summary()

inputs = tfk.Input(shape=(encoded_size,))
x = tfkl.Reshape([1, 1, encoded_size])(inputs)
x = tfkl.Conv2DTranspose(2 * base_depth, 7, strides=1, padding='valid', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(2 * base_depth, 5, strides=1, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(2 * base_depth, 5, strides=2, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(base_depth, 5, strides=1, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(base_depth, 5, strides=2, padding='same', activation=tf.nn.leaky_relu)(x)
x = tfkl.Conv2DTranspose(base_depth, 5, strides=1, padding='same', activation=tf.nn.leaky_relu)(x)
mu = tfkl.Conv2D(filters=1, kernel_size=5, strides=1, padding='same', activation=None)(x)
mu = tfkl.Flatten()(mu)
sigma = tfkl.Conv2D(filters=1, kernel_size=5, strides=1, padding='same', activation=None)(x)
sigma = tf.exp(sigma)
sigma = tfkl.Flatten()(sigma)
x = tf.concat((mu, sigma), axis=1)
x = tfkl.LeakyReLU()(x)
x = tfpl.IndependentNormal(input_shape)(x)

decoder = tfk.Model(inputs, x)
decoder.summary()

# the vae model itself was not shown in the post; presumably something like
# vae = tfk.Model(inputs=encoder.inputs, outputs=decoder(encoder.outputs[0]))

negloglik = lambda x, rv_x: -rv_x.log_prob(x)

vae.compile(optimizer=tf.optimizers.Adam(learning_rate=1e-4),
            loss=negloglik)

# mnist_digits are normed between 0.0 and 1.0
history = vae.fit(mnist_digits, mnist_digits, epochs=100, batch_size=300)
```

My problem here is that the loss stops decreasing at around ~470 and the images sampled from the returned distribution look like random noise. When I use a Bernoulli distribution instead of the normal distribution in the decoder, the loss steadily decreases and the sampled images look like they should. I can’t use a Bernoulli distribution for RGB, though, which I will need when I train the model on celeb_a. I also can’t just use a deterministic decoder, as I later want to decompose the ELBO (loss term minus the KL divergence) as seen in this.

Can someone explain to me why the normal distribution just “doesn’t work”? How can I improve it so that it actually learns a distribution that I can sample from?
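
A variant that may be worth experimenting with (this is an addition, not part of the original post): parameterize the decoder scale with a softplus plus a small floor instead of tf.exp, and drop the LeakyReLU on the concatenated parameters, so the scale stays positive without collapsing towards zero:

```
# hypothetical tweak to the decoder head above
sigma = tfkl.Conv2D(filters=1, kernel_size=5, strides=1, padding='same')(x)
sigma = tfkl.Flatten()(sigma)
sigma = tf.nn.softplus(sigma) + 1e-3  # bounded away from zero
x = tf.concat((mu, sigma), axis=1)    # no LeakyReLU here: it would distort mu and sigma
x = tfpl.IndependentNormal(input_shape)(x)
```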

submitted by /u/tadachs


Categories
Misc

How to use a consecutive sequence of one-channel images to predict the next frame’s label with Conv1D and LSTM?

Hi,

I am quite new to temporal forecasting with images and LSTMs. I really appreciate your help.

The input is a sequence of images, where each image is 28*28; the number of images in the sequence is treated as the batch size and is left as None, the first argument of input_shape.

Suppose 4 consecutive seconds of images are fed into the NN; the expected output of the NN would be the label for the 5th second.

But I am having a hard time making Conv1D and LSTM work together and ending up with one numerical label.

model = Sequential()
model.add(Conv1D(40, 2, strides=2, padding='same', activation='relu',
                 input_shape=(None, 28, 28)))
model.add(Reshape((None, 576)))  # or model.add(Flatten())
model.add(LSTM(10, activation='relu', stateful=True, return_sequences=True))
  1. Is the batch size set properly?
  2. How do I link Conv1D and LSTM together? I mean the data
    dimensionality: is it necessary to get the numerical labels
    from Conv1D and then pass them to the LSTM, or should the
    original-dimensional data be passed directly from Conv1D to the
    LSTM, and then one numerical label be computed from the LSTM
    result as the final output of the NN?
  3. Also, is a TimeDistributed() layer needed? (see the sketch below)
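
A minimal sketch of one way to wire this up (the layer sizes are illustrative, and it assumes one 28x28 grayscale frame per time step): wrap the per-frame convolution in TimeDistributed so it runs on every step, let an LSTM consume the per-frame feature vectors, and let a Dense layer emit the single label:

import tensorflow as tf
from tensorflow.keras import Sequential, layers

model = Sequential([
    # input: (batch, time, 28, 28, 1) - one grayscale image per second
    layers.TimeDistributed(layers.Conv2D(16, 3, activation='relu'),
                           input_shape=(None, 28, 28, 1)),
    layers.TimeDistributed(layers.MaxPooling2D(2)),
    layers.TimeDistributed(layers.Flatten()),
    layers.LSTM(32),  # summarize the whole sequence (return_sequences=False)
    layers.Dense(1)   # one numerical label for the next time step
])
model.compile(optimizer='adam', loss='mse')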

Thank you so much!

submitted by /u/boydbuilding


Categories
Misc

Warhammer 40,000 The New Edition – Trailer (Remastered 8K 60FPS) Resolution increased using neural networks to 8K 60FPS


submitted by /u/stepanmetior

Categories
Misc

Free browser extension for ML community that thousands of machine learning engineers/data scientists use everyday! Drop a comment for any questions/feature requests you may have!


submitted by /u/MLtinkerer

Categories
Misc

A TensorFlow tip to Optimize your Training



Originally posted here.

💡 #TensorFlowTip

Use .prefetch to reduce the step time of training and data extraction:

  • overlaps the preprocessing and model execution
  • while the model executes training step n, the input pipeline reads the data for step n+1
  • reduces the idle time for both the GPU and CPU
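
A minimal sketch of the pattern (the dataset and the map function here are placeholders, and tf.data.experimental.AUTOTUNE simply lets TensorFlow pick the buffer size):

import tensorflow as tf

# toy pipeline: map() stands in for expensive preprocessing
dataset = tf.data.Dataset.range(1000)
dataset = dataset.map(lambda x: x * 2)
dataset = dataset.batch(32)

# overlap preprocessing with training: while the model consumes batch n,
# the input pipeline prepares batch n+1
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)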

See the speedup with .prefetch in the image below. Try it for yourself here.


Speedup with .prefetch

submitted by /u/Rishit-dagli


Categories
Offsites

Metrics and summaries in TensorFlow 2

In this relatively short post, I’m going to show you how to deal with metrics and summaries in TensorFlow 2. Metrics, which can be used to monitor various important variables during the training of deep learning networks (such as accuracy or various losses), were somewhat unwieldy in TensorFlow 1.X. Thankfully, in TensorFlow 2 they are much easier to use. Summary logging, for visualization of training in the TensorBoard interface, has also undergone some changes in TensorFlow 2 that I will be demonstrating. Please note – at the time of writing, only the alpha version of TensorFlow 2 is available, but it is probably safe to assume that the syntax and forms demonstrated in this tutorial will remain the same in TensorFlow 2.0. To install the alpha version, use the following command:
pip install tensorflow==2.0.0-alpha0
In this tutorial, I’ll be using a generic MNIST Convolutional Neural Network example, but utilizing full TensorFlow 2 design paradigms. To learn more about CNNs, see this tutorial – to understand more about TensorFlow 2 paradigms, see this tutorial. All the code for this tutorial can be found as a Google Colaboratory file on my Github repository.

Eager to build deep learning systems? Get the book here

 

TensorFlow 2 metrics

Metrics in TensorFlow 2 can be found in the TensorFlow Keras distribution – tf.keras.metrics. Metrics, along with the rest of TensorFlow 2, are now computed in an Eager fashion. In TensorFlow 1.X, metrics were gathered and computed using the declarative, tf.Session style. All that is required now is to declare a metric as a Python variable, use the method update_state() to add observations to the metric, result() to compute the current value of the metric, and finally reset_states() to reset all the states of the metric. The code below shows a simple implementation of a Mean metric:
mean_metric = tf.keras.metrics.Mean()
mean_metric.update_state(2.0)
mean_metric.update_state(3.0)
mean_metric.update_state(4.0)
print(mean_metric.result().numpy())
This will print the average result -> 3.0. As can be observed, there is an internal memory for the metric, which can be appended to using update_state(). The Mean metric operation is executed when result() is called. Finally, to reset the memory of the metric, we can use reset_states() as follows:
mean_metric.reset_states()
print(mean_metric.result().numpy())
This will print the default response of an empty metric – 0.0.
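
The same pattern applies to the other metrics in tf.keras.metrics. For instance, a quick illustration (the values here are made up) with SparseCategoricalAccuracy, which will appear again later in this post:
acc_metric = tf.keras.metrics.SparseCategoricalAccuracy()
# integer labels on the left, per-class scores/logits on the right
acc_metric.update_state([0, 1], [[0.9, 0.1], [0.2, 0.8]])
print(acc_metric.result().numpy())  # 1.0 - both predictions are correct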

TensorFlow 2 summaries

Metrics fit hand-in-glove with summaries in TensorFlow 2. In order to log summaries in TensorFlow 2, the developer uses the with Python context manager. First, one creates a summary_writer object like so:
summary_writer = tf.summary.create_file_writer('/log')
To log something to the summary writer, the developer must first enclose the part of the code which does the logging within a Python with statement. The logging looks like so:
with summary_writer.as_default():
  tf.summary.scalar('mean', mean_metric.result(), step=1)
The with context can surround the full training loop, or just the area of the code where you are storing the summaries. As can be observed, the logged scalar value is set by using the metric result() method. The step value needs to be provided to the summary – this allows TensorBoard to plot the variation of various values, images etc. between training steps. The step number can be tracked manually, but the easiest way is to use the iterations property of whatever optimizer you are using. This will be demonstrated in the example below.

TensorFlow 2 metrics and summaries – CNN example

In this example, I’ll show how to use metrics and summaries in the context of a CNN MNIST classification example. In this example, I’ll use a custom training loop, rather than a Keras fit loop. In the next section, I’ll show you how to implement custom metrics even within the Keras fit functionality. As usual for any machine learning task, the first step is to prepare the training and validation data. In this case, we’ll be using the prepackaged Keras MNIST dataset, then converting the numpy data arrays into a TensorFlow dataset (for more on TensorFlow datasets, see here and here). This looks like the following:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
BATCH_SIZE=64
# first the training set
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(10000).batch(BATCH_SIZE)
train_dataset = train_dataset.map(lambda x, y: (tf.cast(x, tf.float32) / 255.0, y))
train_dataset = train_dataset.map(lambda x, y: (tf.expand_dims(x, -1), y))  # make the tensors 4D; pixels already scaled above
train_dataset = train_dataset.repeat()
# now the validation set
valid_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(5000).shuffle(10000)
valid_dataset = valid_dataset.map(lambda x, y: (tf.cast(x, tf.float32) / 255.0, y))
valid_dataset = valid_dataset.map(lambda x, y: (tf.expand_dims(x, -1), y))  # make the tensors 4D; pixels already scaled above
valid_dataset = valid_dataset.repeat()
In the lines above, some preprocessing is applied to the image data to normalize it (divide the pixel values by 255, make the tensors 4D for consumption into CNN layers). Next I define the CNN model, using the Keras sequential paradigm:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(32, 2, 1, activation='relu', input_shape=(28, 28, 1)))
model.add(tf.keras.layers.MaxPool2D(2))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Conv2D(32, 2, 1, activation='relu'))
model.add(tf.keras.layers.MaxPool2D(2))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(10))
The model declaration above is all standard Keras – for more on the sequential model type of Keras, see here. Next, we create a custom training loop function in TensorFlow. It is now best practice to encapsulate core parts of your code in Python functions – this is so that the @tf.function decorator can be applied easily to the function. This signals to TensorFlow to perform Just In Time (JIT) compilation of the relevant code into a graph, which allows the performance benefits of a static graph as per TensorFlow 1.X. Otherwise, the code will execute eagerly, which is not a big deal, but if one is building production or performance-dependent code it is better to decorate with @tf.function. Here are the training loop and optimization/loss function definitions:
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
def train(ds_train, optimizer, loss_fn, model, num_batches, log_freq=10):
  avg_loss = tf.keras.metrics.Mean()
  avg_acc = tf.keras.metrics.SparseCategoricalAccuracy()
  batch_idx = 0
  for batch_idx, (images, labels) in enumerate(ds_train):
    # images arrive from the dataset pipeline already shaped (batch, 28, 28, 1)
    with tf.GradientTape() as tape:
      logits = model(images)
      loss_value = loss_fn(labels, logits)
    grads = tape.gradient(loss_value, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    avg_loss.update_state(loss_value)
    avg_acc.update_state(labels, logits)
    if batch_idx % log_freq == 0:
      print(f"Batch {batch_idx}, average loss is {avg_loss.result().numpy()}, average accuracy is {avg_acc.result().numpy()}")
      tf.summary.scalar('loss', avg_loss.result(), step=optimizer.iterations)
      tf.summary.scalar('acc', avg_acc.result(), step=optimizer.iterations)
      avg_loss.reset_states()
      avg_acc.reset_states()
    if batch_idx > num_batches:
      break
As can be observed, I have created two metrics for use in this training loop – avg_loss and avg_acc. These are Mean and SparseCategoricalAccuracy metrics, respectively. The Mean metric has been discussed previously. The SparseCategoricalAccuracy metric takes, as input, the training labels and logits (raw, unactivated outputs from your model). Because it is a sparse categorical accuracy measure, it can take the training labels in scalar integer form, rather than one-hot encoded label vectors. Calling result() on this metric will calculate the average accuracy of all the labels/logits pairs passed during the update_state() calls – see the avg_acc.update_state(labels, logits) line above. Every log_freq number of batches, the results of the metrics are printed and also passed as summary scalars. After the metrics are logged in the summaries, their states are reset. You will notice that I have not provided a with context for these summaries – this is applied in the outer epoch loop, shown below:
import datetime as dt

num_epochs = 10
summary_writer = tf.summary.create_file_writer('./log/{}'.format(dt.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")))
for i in range(num_epochs):
  print(f"Epoch {i + 1} of {num_epochs}")
  with summary_writer.as_default():
    train(train_dataset, optimizer, loss_fn, model, 10000//BATCH_SIZE)
As can be observed, the summary_writer.as_default() is supplied as context to the whole train function. So far so good. However, this is utilizing a “manual” TensorFlow training loop, which is no longer the easiest way to train in TensorFlow 2, given the tight Keras integration. In the next example, I’ll show you how to include run-of-the-mill metrics in the Keras API, but also custom metrics.

TensorFlow 2 Keras metrics and summaries

To include normal metrics such as accuracy in Keras is straightforward – one supplies a list of metrics to be logged in the compile statement like so:
metric_model.compile(optimizer=tf.optimizers.Adam(),
                     loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                     metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
However, if one wishes to log more complicated or custom metrics, it becomes difficult to see how to set this up in Keras. One easy way of doing so is by creating a custom Keras layer whose sole purpose is to add a metric to the model / training. In the example below, I have created a custom layer which adds the standard deviation of the kernel weights as a metric:
class MetricLayer(tf.keras.layers.Layer):
  def __init__(self, layer_to_log):
    super(MetricLayer, self).__init__()
    self.layer_to_log = layer_to_log
    
  def call(self, input):
    self.add_metric(tf.keras.backend.std(self.layer_to_log.variables[0]),
                    name=f'std_of_{self.layer_to_log.name}_kernel',
                    aggregation='mean')
    return input
A few things to notice about the creation of the custom layer above. First, notice that the layer is defined as a Python class object which inherits from the keras.layers.Layer object. The only variable passed to the initialization of this custom class is the layer with the kernel weights which we wish to log. The call method tells Keras / TensorFlow what to do when the layer is called in a feed-forward pass. In this case, the input is passed straight through to the output – it is, in essence, a dummy layer. However, you’ll notice that within the call a metric is added. The value of the metric is the standard deviation of layer_to_log.variables[0]. For a CNN layer, the zero index [0] of the layer variables is the kernel weights. A name is provided to the metric for ease of viewing during training, and finally the aggregation method of the metric is specified – in this case, a ‘mean’ aggregation of the standard deviations. To include this layer, one can just add it as a sequential element in the Keras model. Below, I take the existing CNN model created in the previous example and create a new model with the custom metric layer appended to the end:
metric_model = tf.keras.Sequential()
metric_model.add(model)
metric_model.add(MetricLayer(model.layers[0]))
As can be observed in the above, the first layer of the previous model is passed to the custom MetricLayer. Running the fit training method on this model will now generate the SparseCategoricalAccuracy metric along with the custom standard deviation from the first layer. To monitor the training in TensorBoard, one must also include the TensorBoard callback. All of this looks like the following:
metric_model.compile(optimizer=tf.optimizers.Adam(),
                     loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                     metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

callbacks = [
  # Write TensorBoard logs to `./logs` directory
  tf.keras.callbacks.TensorBoard(log_dir='./log/{}'.format(dt.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")), update_freq='batch')
]

metric_model.fit(train_dataset, steps_per_epoch=10000//BATCH_SIZE, epochs=5,
                 validation_data=valid_dataset, validation_steps=5,
                 callbacks=callbacks)
The code above will perform the training and ensure all the metrics (including the metric added in the custom metric layer) are output to TensorBoard via the TensorBoard callback. This concludes my quick introduction to metrics and summaries in TensorFlow 2. Watch out for future posts and updates of existing posts as the transition to TensorFlow 2 develops.  

Eager to build deep learning systems? Get the book here

 

The post Metrics and summaries in TensorFlow 2 appeared first on Adventures in Machine Learning.

Categories
Offsites

An introduction to Global Average Pooling in convolutional neural networks

For those familiar with convolutional neural networks (if you’re not, check out this post), you will know that, for many architectures, the final layers are often of the fully connected variety. This is like bolting a standard neural network classifier onto the end of an image processor. The convolutional neural network starts with a series of convolutional (and, potentially, pooling) layers which create feature maps which represent different components of the input images. The fully connected layers at the end then “interpret” the output of these feature maps and make category predictions. However, as with many things in the fast-moving world of deep learning research, this practice is starting to fall by the wayside in favor of something called Global Average Pooling (GAP). In this post, I’ll introduce the benefits of Global Average Pooling and apply it on the Cats vs Dogs image classification task using TensorFlow 2. In the process, I’ll compare its performance to the standard fully connected layer paradigm. The code for this tutorial can be found in a Jupyter Notebook on this site’s Github repository, ready for use in Google Colaboratory.


Eager to build deep learning systems? Get the book here


Global Average Pooling

Global Average Pooling is an operation that calculates the average output of each feature map in the previous layer. This fairly simple operation reduces the data significantly and prepares the model for the final classification layer. It also has no trainable parameters – just like Max Pooling (see here for more details). The diagram below shows how it is commonly used in a convolutional neural network:

Global Average Pooling in a CNN architecture

As can be observed, the final layers consist simply of a Global Average Pooling layer and a final softmax output layer. In the architecture above, there are 64 averaging calculations, corresponding to the 64 7 x 7 feature maps at the output of the second convolutional layer. The GAP layer transforms the dimensions from (7, 7, 64) to (1, 1, 64) by averaging over the 7 x 7 spatial values of each channel. Global Average Pooling has the following advantages over the fully connected final layers paradigm:

  • The removal of a large number of trainable parameters from the model. Fully connected or dense layers have lots of parameters. A 7 x 7 x 64 CNN output being flattened and fed into a 500-node dense layer yields 1.56 million weights which need to be trained. Removing these layers speeds up the training of your model.
  • The elimination of all these trainable parameters also reduces the tendency of over-fitting, which needs to be managed in fully connected layers by the use of dropout.
  • The authors argue in the original paper that removing the fully connected classification layers forces the feature maps to be more closely related to the classification categories – so that each feature map becomes a kind of “category confidence map”.
  • Finally, the authors also argue that, due to the averaging operation over the feature maps, the model is more robust to spatial translations in the data. In other words, as long as the requisite feature is included or activated in the feature map somewhere, it will still be “picked up” by the averaging operation.
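
To make the averaging operation concrete, here is a minimal sketch (the shapes are illustrative) showing that a GlobalAveragePooling2D layer is just a per-channel mean over the spatial axes, along with the parameter arithmetic from the first bullet point:
import tensorflow as tf

# a dummy batch of feature maps: (batch, height, width, channels) = (1, 7, 7, 64)
feature_maps = tf.random.normal((1, 7, 7, 64))

gap = tf.keras.layers.GlobalAveragePooling2D()
pooled = gap(feature_maps)  # shape (1, 64): one average per channel

# equivalent manual computation: mean over the spatial axes 1 and 2
manual = tf.reduce_mean(feature_maps, axis=[1, 2])
tf.debugging.assert_near(pooled, manual)

# parameter arithmetic from the bullet list above: flattening
# 7 * 7 * 64 = 3,136 features into a 500-node dense layer costs
# 3,136 * 500 = 1,568,000 (~1.56 million) weights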

To test out these ideas in practice, in the next section I’ll show you an example comparing the benefits of the Global Average Pooling with the historical paradigm. This example problem will be the Cats vs Dogs image classification task and I’ll be using TensorFlow 2 to build the models. At the time of writing, only TensorFlow 2 Alpha is available, and the reader can follow this link to find out how to install it.

Global Average Pooling with TensorFlow 2 and Cats vs Dogs

To download the Cats vs Dogs data for this example, you can use the following code:

import tensorflow as tf
from tensorflow.keras import layers
import tensorflow_datasets as tfds

split = (80, 10, 10)
splits = tfds.Split.TRAIN.subsplit(weighted=split)

(cat_train, cat_valid, cat_test), info = tfds.load('cats_vs_dogs', split=list(splits), with_info=True, as_supervised=True)

The code above utilizes the TensorFlow Datasets repository, which allows you to import common machine learning datasets into TF Dataset objects. For more on using Dataset objects in TensorFlow 2, check out this post. A few things to note. First, the split tuple (80, 10, 10) signifies the (training, validation, test) split as percentages of the dataset. This is then passed to the tensorflow_datasets split object, which tells the dataset loader how to break up the data. Finally, the tfds.load() function is invoked. The first argument is a string specifying the dataset name to load. The following arguments specify whether a split should be used, whether to return an argument with information about the dataset (info) and whether the dataset is intended to be used in a supervised learning problem, with labels being included. In order to examine the images in the dataset, the following code can be run:

import matplotlib.pylab as plt

for image, label in cat_train.take(2):
  plt.figure()
  plt.imshow(image)

This produces a couple of sample images from the dataset. As can be observed, the images are of varying sizes. This will need to be rectified so that the images have a consistent size to feed into our model. As usual, the image pixel values (which range from 0 to 255) need to be normalized – in this case, to between 0 and 1. The function below performs these tasks:

IMAGE_SIZE = 100
def pre_process_image(image, label):
  image = tf.cast(image, tf.float32)
  image = image / 255.0
  image = tf.image.resize(image, (IMAGE_SIZE, IMAGE_SIZE))
  return image, label

In this example, we’ll be resizing the images to 100 x 100 using tf.image.resize. To get state-of-the-art levels of accuracy, you would probably want a larger image size, say 200 x 200, but in this case I’ve chosen speed over accuracy for demonstration purposes. As can be observed, the image values are also cast into the tf.float32 datatype and normalized by dividing by 255. Next we apply this function to the datasets, and also shuffle and batch where appropriate:

TRAIN_BATCH_SIZE = 64
cat_train = cat_train.map(pre_process_image).shuffle(1000).repeat().batch(TRAIN_BATCH_SIZE)
cat_valid = cat_valid.map(pre_process_image).repeat().batch(1000)

For more on TensorFlow datasets, see this post. Now it is time to build the model – in this example, we’ll be using the Keras API in TensorFlow 2. In this example, I’ll be using a common “head” model, which consists of layers of standard convolutional operations – convolution and max pooling, with batch normalization and ReLU activations:

head = tf.keras.Sequential()
head.add(layers.Conv2D(32, (3, 3), input_shape=(IMAGE_SIZE, IMAGE_SIZE, 3)))
head.add(layers.BatchNormalization())
head.add(layers.Activation('relu'))
head.add(layers.MaxPooling2D(pool_size=(2, 2)))

head.add(layers.Conv2D(32, (3, 3)))
head.add(layers.BatchNormalization())
head.add(layers.Activation('relu'))
head.add(layers.MaxPooling2D(pool_size=(2, 2)))

head.add(layers.Conv2D(64, (3, 3)))
head.add(layers.BatchNormalization())
head.add(layers.Activation('relu'))
head.add(layers.MaxPooling2D(pool_size=(2, 2)))

Next, we need to add the “back-end” of the network to perform the classification.

Standard fully connected classifier results

In the first instance, I’ll show the results of a standard fully connected classifier, without dropout. Because, for this example, there are only two possible classes – “cat” or “dog” – the final output layer is a dense / fully connected layer with a single node and a sigmoid activation.

standard_classifier = tf.keras.Sequential()
standard_classifier.add(layers.Flatten())
standard_classifier.add(layers.BatchNormalization())
standard_classifier.add(layers.Dense(100))
standard_classifier.add(layers.Activation('relu'))
standard_classifier.add(layers.BatchNormalization())
standard_classifier.add(layers.Dense(100))
standard_classifier.add(layers.Activation('relu'))
standard_classifier.add(layers.Dense(1))
standard_classifier.add(layers.Activation('sigmoid'))

As can be observed, in this case, the classifier back-end includes two 100-node dense layers. To combine the head model and this standard classifier, the following commands can be run:

standard_model = tf.keras.Sequential([
    head, 
    standard_classifier
])

Finally, the model is compiled, a TensorBoard callback is created for visualization purposes, and the Keras fit command is executed:

import datetime as dt

standard_model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='binary_crossentropy',
              metrics=['accuracy'])

callbacks = [tf.keras.callbacks.TensorBoard(log_dir='./log/{}'.format(dt.datetime.now().strftime("%Y-%m-%d-%H-%M-%S")))]

standard_model.fit(cat_train, steps_per_epoch = 23262//TRAIN_BATCH_SIZE, epochs=10, validation_data=cat_valid, validation_steps=10, callbacks=callbacks)

Note that the loss used is binary crossentropy, due to the binary classes for this example. The training progress over the first 7 epochs can be seen in the figures below:

Standard classifier accuracy (red – training, blue – validation)

Standard classifier loss (red – training, blue – validation)

As can be observed, with a standard fully connected classifier back-end to the model (without dropout), the training accuracy reaches high values but it overfits with respect to the validation dataset. The validation dataset accuracy stagnates around 80% and the loss begins to increase – a sure sign of overfitting.

Global Average Pooling results

The next step is to test the results of the Global Average Pooling in TensorFlow 2. To build the GAP layer and associated model, the following code is added:

average_pool = tf.keras.Sequential()
average_pool.add(layers.GlobalAveragePooling2D())  # one average per feature map / channel
average_pool.add(layers.Dense(1, activation='sigmoid'))

pool_model = tf.keras.Sequential([
    head, 
    average_pool
])
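
The GAP model is then compiled and fit in the same way as the standard model (a sketch, assuming the same settings and reusing the callbacks defined earlier):

pool_model.compile(optimizer=tf.keras.optimizers.Adam(),
                   loss='binary_crossentropy',
                   metrics=['accuracy'])

pool_model.fit(cat_train, steps_per_epoch=23262//TRAIN_BATCH_SIZE, epochs=10,
               validation_data=cat_valid, validation_steps=10, callbacks=callbacks)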

The accuracy results for this model, along with the results of the standard fully connected classifier model, are shown below:

Global average pooling accuracy vs standard fully connected classifier model (pink – GAP training, green – GAP validation, blue – FC classifier validation)

As can be observed from the graph above, the Global Average Pooling model has a higher validation accuracy by the 7th epoch than the fully connected model. The training accuracy is lower than the FC model, but this is clearly due to overfitting being reduced in the GAP model. A final comparison including the case of the FC model with a dropout layer inserted is shown below:

standard_classifier_with_do = tf.keras.Sequential()
standard_classifier_with_do.add(layers.Flatten())
standard_classifier_with_do.add(layers.BatchNormalization())
standard_classifier_with_do.add(layers.Dense(100))
standard_classifier_with_do.add(layers.Activation('relu'))
standard_classifier_with_do.add(layers.Dropout(0.5))
standard_classifier_with_do.add(layers.BatchNormalization())
standard_classifier_with_do.add(layers.Dense(100))
standard_classifier_with_do.add(layers.Activation('relu'))
standard_classifier_with_do.add(layers.Dense(1))
standard_classifier_with_do.add(layers.Activation('sigmoid'))

Global average pooling validation accuracy vs FC classifier with and without dropout (green – GAP model, blue – FC model without DO, orange – FC model with DO)

As can be seen, of the three model options sharing the same convolutional front end, the GAP model has the best validation accuracy after 7 epochs of training (the x-axis in the graph above is the number of batches). Dropout improves the validation accuracy of the FC model, but the GAP model is still narrowly out in front. Further tuning could be performed on the fully connected models and the results may improve. However, one would expect Global Average Pooling to be at least equivalent to an FC model with dropout, even though it has hundreds of thousands fewer parameters. I hope this short tutorial gives you a good understanding of Global Average Pooling and its benefits. You may want to consider it in the architecture of your next image classifier design.
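
To verify the parameter gap yourself, Keras can count the trainable parameters of each back-end (a quick check; it assumes the two back-end models have already been built, for example by training them as above):

# the FC back-end carries hundreds of thousands of weights...
print(standard_classifier.count_params())
# ...while the GAP back-end carries only one weight per channel plus a bias
print(average_pool.count_params())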


Eager to build deep learning systems? Get the book here

The post An introduction to Global Average Pooling in convolutional neural networks appeared first on Adventures in Machine Learning.