Categories
Misc

Completing a TensorFlow android app

Hello everyone!

I have some questions on finishing the implementation of my TensorFlow application. I need advice on how to optimize my model.

Background

I have been working on an object detection Android app based on the one provided by TensorFlow. I have added Bluetooth capabilities and implemented my own standalone Simple Online and Realtime Tracking algorithm (just so I could understand the code better in case I have to tune things). I do not want to get into the specifics of my application, but the simplest analogy is an Android app looking down at a conveyor belt. When the app sees a specific object at a certain location on the conveyor belt, it sends a Bluetooth signal for some mechanism to take action on that object at that location (this probably describes half the possible apps here haha).

My application has been tested and works successfully when using one of the default tflite models in a simulation environment. However, the objects I plan to track are not in the standard tflite models. Therefore I need to create my own custom model. This is the final step of my app development.

I have (with much pain) figured out how to create a model generation pipeline: tfrecords > train > convert to tflite > test on the Android app. I have not studied machine learning, but I realize that with my technical/programming/math skills I can more or less brute-force a basic model and then learn the theory in more detail once my prototype is working. I have spent a fair bit of time browsing TensorFlow's GitHub issues to produce a model that can somewhat detect my objects, but not well enough, and it is slower than the example tflite model (on my phone, inference time is now 150 ms instead of an average of 50 ms). I am now looking to decrease the inference time and increase the accuracy of my model.

My current model generation pipeline uses ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8 (as I couldn't get ssd_mobilenet_v2_320x320_coco17_tpu-8 to work), trains on my tfrecords, converts to tflite (with the tf.lite.Optimize.DEFAULT optimization flag) and finally attaches metadata. I plug this into the Android app and then test.
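For reference, my conversion step looks roughly like the sketch below (the SavedModel path and output filename are placeholders):

import tensorflow as tf

# convert the exported SavedModel to tflite with the default optimization flag
converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# write the tflite flatbuffer to disk (metadata is attached in a separate step)
with open("model.tflite", "wb") as f:
    f.write(tflite_model)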

My computer is slow, so I eventually plan on renting an EC2 instance, sweeping a bunch of parameters in ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8's pipeline.config, generating a batch of tflite models and rating their accuracy. As a final step, I will test the models for speed on my phone. The best combination of speed and accuracy will be the tflite model of choice.

Questions

In ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8’s pipeline.config what parameters are good to vary to get a good parameter sweep?

What parameters are good to vary so that the resulting tflite model is faster? (The 5 MB example tflite model runs at ~50 ms inference, while my 10 MB model takes ~150 ms.)

What EC2 machine do you recommend using? I understand that Amazon has machine learning tools, but given the time I have spent creating my model generation pipeline, I am very hesitant to jump into additional exploratory work.

I’ll add the ssd_mobilenet_v2_fpnlite_320x320_coco17_tpu-8’s pipeline.config file in the comments.

submitted by /u/tensorpipelinetest

Categories
Misc

What Will NVIDIA CEO Jensen Huang Cook Up This Time at NVIDIA GTC?

Don’t blink. Accelerated computing is moving innovation forward faster than ever. And there’s no way to get smarter, quicker, about how it’s changing your world than to tune in to NVIDIA CEO Jensen Huang’s GTC keynote on Monday, April 12, starting at 8:30 a.m. PT. The keynote will once again be delivered from the kitchen in Huang’s home.


Categories
Misc

GTC 21: 5 Data Center Networking and Ecosystem Sessions You Shouldn’t Miss!

We at NVIDIA are on a mission to bring the next generation data center vision to reality. Join us at NVIDIA GTC’21 (Apr 12-16, 2021) to witness the data center innovation we are pioneering.

As NVIDIA CEO Jensen Huang stated at last year’s GTC, “the data center is the new unit of computing.” The traditional way of using the server as the unit of computing is quickly fading away. More and more applications are moving to data centers located at the edge, in different availability zones, or in private enterprise clouds. Modern workloads such as AI/ML, edge computing, cloud-native microservices and 5G services are becoming increasingly disaggregated, distributed and data hungry. These applications demand efficient, secure and accelerated computing and data processing across all layers of the application stack.

Computing accelerated by NVIDIA GPUs and data processing units (DPUs) is at the heart of modern data centers. DPUs are a game-changing new technology that accelerates GPU and CPU access to data while enabling software-defined, hardware-accelerated infrastructure. With DPUs, organizations can efficiently and effectively deploy networking, cyber security, and storage in virtualized as well as containerized environments.

Register for the top DPU sessions at GTC to learn how NVIDIA Networking solutions are powering next-generation data centers.

Palo Alto Networks and NVIDIA present  Accelerated 5G Security: DPU-Based Acceleration of Next-Generation Firewalls [S31671]

5G offers many new capabilities, such as lower latency, higher reliability and throughput, agile service deployment through cloud-native architectures, and greater device density. A new approach is needed to achieve L7 security at these rates with software-based firewalls. Integrating the Palo Alto Networks next-generation firewall with the NVIDIA DPU enables industry-leading, high-performance security. NVIDIA’s BlueField-2 DPU provides a rich set of network offload engines designed to address the acceleration needs of security-focused network functions in today’s most demanding markets, such as 5G and the cloud.

Speakers:

  • Sree Koratala, VP Product Management Mobility Security, Palo Alto Networks
  • John McDowall, Senior Distinguished Engineer, Palo Alto Networks
  • Ash Bhalgat, Senior Director, Cloud, Telco & Security Market Development, NVIDIA

China Mobile, Nokia and NVIDIA present Turbocharge Cloud-Native Applications with Virtual Data Plane Accelerated Networking [S31563]

Great progress has been made leveraging hardware to accelerate cloud networking for a wide range of cloud-based applications. This talk will examine how cloud networking for public cloud infrastructure as a service can be accelerated using NVIDIA’s new open-standards technology called virtual data plane acceleration (vDPA). In addition, this presentation will examine the early validation results and acceleration benefits of deploying NVIDIA ASAP2 vDPA technology in China Mobile’s BigCloud cloud service.

Speakers:

  • Sharko Cheng, Senior Network Architect, Cloud Networking Products Department, CMCC
  • Mark Iskra, Director, Nokia/Nuage Networks
  • Ash Bhalgat, Senior Director, Cloud, Telco & Security Market Development, NVIDIA

Red Hat and NVIDIA present  Implementing Virtual Network Offloading Using Open Source tools on BlueField-2 [S31380]

NVIDIA and Red Hat have been working together to provide an elegant and 100% open-source solution using the BlueField DPU for hardware offloading of the software-defined networking tasks in cloud-native environments. With BlueField DPUs, we can encrypt, encapsulate, switch, and route packets right on the DPU, effectively dedicating all the server’s processing capacity to running business applications. This talk will discuss typical use cases and demonstrate the performance advantages of using BlueField’s hardware offload capabilities with Red Hat Enterprise Linux and the Red Hat OpenShift container platform.

Speakers:

  • Rashid Khan, Director of Networking, Red Hat
  • Rony Efraim, Networking Software and Systems Architect, NVIDIA

NVIDIA DPU team presents  Program Data Center Infrastructure Acceleration with the Release of DOCA and the Latest DPU Software [S32205]

NVIDIA is releasing the first version of DOCA, a set of libraries, SDKs, and tools for programming the NVIDIA BlueField DPU, as well as the new version 3.6 of the DPU software. Together, these enable new infrastructure acceleration and management features in BlueField and simplify programming and application integration. DPU developers can offload and accelerate networking, virtualization, security, and storage features including VirtIO for NFV/VNFs, BlueField SNAP for elastic storage virtualization, regular expression matching for malware detection, and deep packet inspection to enable sophisticated routing, firewall, and load-balancing applications.

Speakers:

  • Ami Badani, VP Marketing, NVIDIA
  • Ariel Kit, Director of Product Marketing for Networking, NVIDIA

F5/NGINX and NVIDIA present kTLS Hardware Offload Performance Benchmarking for NGINX Web Server [S31551]

Encrypted communication usage is steadily growing across most internet services. TLS is the leading security protocol implemented on top of TCP. kTLS (kernel TLS) provides TLS operations support in the Linux kernel layer. It was introduced in kernel v4.13 as a software offload for user-space TLS libraries and was extended in kernel v4.18 with an infrastructure for performing hardware-accelerated encryption/decryption in SmartNICs and DPUs. This session will review the life cycle of a hardware-offloaded kTLS connection and the driver-hardware interaction that supports it, while demonstrating and analyzing the significant performance gain of offloading kTLS operations to hardware, using NGINX as the target workload and NVIDIA’s mlx5e driver on top of a ConnectX-6 Dx SmartNIC.

Speakers:

  • Damian Curry, Business Development Technical Director, F5
  • Bar Tuaf, Software engineer, NVIDIA

Register today for free and start building your schedule. 

Once you are signed in, you can explore all GTC conference topics here. Topics include areas of interest such as data center networking and virtualization, HPC, deep learning, data science, and autonomous machines, or industries including healthcare, public sector, retail, and telecommunications.

See you at GTC ’21!

Categories
Misc

World of Difference: GTC to Spotlight AI Developers in Emerging Markets

Startups don’t just come from Silicon Valley — they hail from Senegal, Saudi Arabia, Pakistan, and beyond. And hundreds will take the stage at the GPU Technology Conference. GTC, running April 12-16, will spotlight developers and startups advancing AI in Africa, Latin America, Southeast Asia, and the Middle East. Registration is free, and provides access…


Categories
Misc

NVIDIA-Powered Systems Ready to Bask in Ice Lake

Data-hungry workloads such as machine learning and data analytics have become commonplace. To cope with these compute-intensive tasks, enterprises need accelerated servers that are optimized for high performance. Intel’s 3rd Gen Intel Xeon Scalable processors (code-named “Ice Lake”), launched today, are based on a new architecture that enables a major leap in performance and scalability.


Categories
Misc

We built Datature, a platform that allows anyone to train their own TensorFlow Object Detection / Segmentation model using drag and drop!

submitted by /u/xusty
Categories
Misc

Tutorial: Building a Magic Wand

submitted by /u/AR_MR_XR
Categories
Misc

Jetson Project of the Month: DeepWay, AI-based navigation aid for the visually impaired


Satinder Singh won the Jetson Project of the Month for DeepWay, an AI-based navigation assistance system for the visually impaired. The project, which runs on an NVIDIA Jetson Nano Developer Kit, monitors the path of a person and provides guidance to walk on the correct side and avoid any oncoming pedestrians. 

In addition to the Jetson Nano, Satinder’s setup includes an Arduino Nano, a webcam, two servo motors, a USB audio adapter, 3D printed eye-glasses and a few peripherals. The Arduino Nano is used to control the servo motors, which nudge the user in the correct direction by gently tapping them on either side of the head. If the Jetson Nano identifies pedestrians, a voice prompt sent through the USB audio adapter can be used to warn the user. 

To train his convolutional neural network, Satinder collected 10,000 images across three classes for left, center and right positions within a lane, and trained his model on a GPU-enabled virtual machine on Microsoft Azure. He used both PyTorch and Keras for training a U-Net semantic segmentation model. After further analysis, he picked the U-Net model trained in Keras for its better performance.

DeepWay – Project Walk Through 

Satinder believes that his solution stands out among others because of its portability and affordability. He plans to further optimize it using NVIDIA TensorRT. We can’t wait to see how this project evolves.

The World Health Organization estimates at least 2.2 billion people around the world have a vision impairment. Many among this group would benefit from navigation aids. Open-source prototypes such as DeepWay foster innovation around these use-cases and are great examples of AI being used for social good.

If you’re interested in building your own ‘DeepWay’ or extending it, Satinder has shared the instructions and the code here.

Categories
Misc

Harvesting AI: Startup’s Weed Recognition for Herbicides Grows Yield for Farmers

When French classmates Guillaume Jourdain, Hugo Serrat and Jules Beguerie were looking at applying AI to agriculture in 2014 to form a startup, it was hardly a sure bet. It was early days for such AI applications, and people said it couldn’t be done. But farmers they spoke with wanted it. So they rigged together…


Categories
Misc

Python TensorFlow Tutorial – Build a Neural Network

Updated for TensorFlow 2

Google’s TensorFlow has been a hot topic in deep learning recently.  The open source software, designed to allow efficient computation of data flow graphs, is especially suited to deep learning tasks.  It is designed to be executed on single or multiple CPUs and GPUs, making it a good option for complex deep learning tasks, and it can even be run on certain mobile operating systems.  This introductory tutorial will give an overview of some of the basic concepts of TensorFlow in Python.  These will be a good stepping stone to building more complex deep learning networks in the package, such as convolutional neural networks, natural language models, and recurrent neural networks.  We’ll be creating a simple three-layer neural network to classify the MNIST dataset.  This tutorial assumes that you are familiar with the basics of neural networks, which you can brush up on in the neural networks tutorial if required.  To install TensorFlow, follow the instructions here. The code for this tutorial can be found in this site’s GitHub repository.  Once you’re done, you also might want to check out a higher-level deep learning library that sits on top of TensorFlow called Keras – see my Keras tutorial.

First, let’s have a look at the main ideas of TensorFlow.

1.0 TensorFlow graphs

TensorFlow is based on graph based computation – “what on earth is that?”, you might say.  It’s an alternative way of conceptualising mathematical calculations.  Consider the following expression $a = (b + c) * (c + 2)$.  We can break this function down into the following components:

$$\begin{align}
d &= b + c \\
e &= c + 2 \\
a &= d * e
\end{align}$$

Now we can represent these operations graphically as:

[Figure: Simple computational graph]

This may seem like a silly example – but notice a powerful idea in expressing the equation this way: two of the computations ($d=b+c$ and $e=c+2$) can be performed in parallel.  By splitting up these calculations across CPUs or GPUs, this can give us significant gains in computational times.  These gains are a must for big data applications and deep learning – especially for complicated neural network architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).  The idea behind TensorFlow is to give developers the ability to create these computational graphs in code and gain significant performance improvements via parallel operations and other efficiency gains.

We can look at a similar graph in TensorFlow below, which shows the computational graph of a three-layer neural network.

[Figure: TensorFlow data flow graph]

The animated data flows between different nodes in the graph are tensors which are multi-dimensional data arrays.  For instance, the input data tensor may be 5000 x 64 x 1, which represents a 64 node input layer with 5000 training samples.  After the input layer, there is a hidden layer with rectified linear units as the activation function.  There is a final output layer (called a “logit layer” in the above graph) that uses cross-entropy as a cost/loss function.  At each point we see the relevant tensors flowing to the “Gradients” block which finally flows to the Stochastic Gradient Descent optimizer which performs the back-propagation and gradient descent.

Here we can see how computational graphs can be used to represent the calculations in neural networks, and this, of course, is what TensorFlow excels at.  Let’s see how to perform some basic mathematical operations in TensorFlow to get a feel for how it all works.

2.0 A Simple TensorFlow example

So how can we make TensorFlow perform the little example calculation shown above – $a = (b + c) * (c + 2)$? First, there is a need to introduce TensorFlow variables.  The code below shows how to declare these objects:

import tensorflow as tf
# create TensorFlow variables
const = tf.Variable(2.0, name="const")
b = tf.Variable(2.0, name='b')
c = tf.Variable(1.0, name='c')

As can be observed above, TensorFlow variables can be declared using the tf.Variable function.  The first argument is the value to be assigned to the variable. The second is an optional name string which can be used to label the constant/variable – this is handy for when you want to do visualizations.  TensorFlow will infer the type of the variable from the initialized value, but it can also be set explicitly using the optional dtype argument.  TensorFlow has many of its own types like tf.float32, tf.int32 etc.
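For instance, a small illustration of setting the type explicitly (the variable name g here is purely illustrative):

# explicitly declare the data type rather than letting TensorFlow infer it from the value
g = tf.Variable(3, dtype=tf.float32, name='g')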

The objects assigned to the Python variables are actually TensorFlow tensors. Thereafter, they act like normal Python objects – therefore, if you want to access the tensors you need to keep track of the Python variables. In previous versions of TensorFlow, there were global methods of accessing the tensors and operations based on their names. This is no longer the case.

To examine the tensors stored in the Python variables, simply call them as you would a normal Python variable. If we do this for the “const” variable, you will see the following output:

<tf.Variable 'const:0' shape=() dtype=float32, numpy=2.0>

This output gives you a few different pieces of information – first, is the name ‘const:0’ which has been assigned to the tensor. Next is the data type, in this case, a TensorFlow float 32 type. Finally, there is a “numpy” value. TensorFlow variables in TensorFlow 2 can be converted easily into numpy objects. Numpy stands for Numerical Python and is a crucial library for Python data science and machine learning. If you don’t know Numpy, what it is, and how to use it, check out this site. The command to access the numpy form of the tensor is simply .numpy() – the use of this method will be shown shortly.

Next, some calculation operations are created:

# now create some operations
d = tf.add(b, c, name='d')
e = tf.add(c, const, name='e')
a = tf.multiply(d, e, name='a')

Note that d and e are automatically converted to tensor values upon the execution of the operations. TensorFlow has a wealth of calculation operations available to perform all sorts of interactions between tensors, as you will discover as you progress through this book.  The purpose of the operations shown above is pretty obvious: they instantiate the operations b + c, c + 2.0, and d * e. However, these operations are an unwieldy way of doing things in TensorFlow 2. The operations below are equivalent to those above:

d = b + c
e = c + 2
a = d * e

To access the value of variable a, one can use the .numpy() method as shown below:

print(f"Variable a is {a.numpy()}")

The computational graph for this simple example can be visualized by using the TensorBoard functionality that comes packaged with TensorFlow. This is a great visualization feature and is explained more in this post. Here is what the graph looks like in TensorBoard:

[Figure: Simple TensorFlow graph]

The larger two vertices or nodes, b and c, correspond to the variables. The smaller nodes correspond to the operations, and the edges between the vertices are the scalar values emerging from the variables and operations.
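As an aside, because TensorFlow 2 executes operations eagerly, capturing a graph like this for TensorBoard typically involves wrapping the calculation in a tf.function and tracing it. A minimal sketch is shown below – the ./logs directory and function name are illustrative assumptions:

# wrap the calculation in a tf.function so TensorFlow builds a graph for it
@tf.function
def simple_graph(b, c):
    d = b + c
    e = c + 2.0
    return d * e

# trace one call and export the graph so TensorBoard (run with --logdir ./logs) can display it
writer = tf.summary.create_file_writer("./logs")
tf.summary.trace_on(graph=True)
simple_graph(tf.constant(2.0), tf.constant(1.0))
with writer.as_default():
    tf.summary.trace_export(name="simple_graph_trace", step=0)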

The example above is a trivial example – what would this look like if there was an array of b values from which an array of equivalent a values would be calculated? TensorFlow variables can easily be instantiated using numpy variables, like the following:

import numpy as np

b = tf.Variable(np.arange(0, 10), name='b')

Calling b shows the following:

<tf.Variable 'b:0' shape=(10,) dtype=int32, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])>

Note the numpy value of the tensor is an array. Because the numpy variable passed during the instantiation is a range of int32 values, we can’t add it directly to c as c is of float32 type. Therefore, the tf.cast operation, which changes the type of a tensor, first needs to be utilized like so:

d = tf.cast(b, tf.float32) + c

Running the rest of the previous operations, using the new b tensor, gives the following value for a:

Variable a is [ 3.  6.  9. 12. 15. 18. 21. 24. 27. 30.]

In numpy, the developer can directly access slices or individual indices of an array and change their values directly. Can the same be done in TensorFlow 2? Can individual indices and/or slices be accessed and changed? The answer is yes, but not quite as straight-forwardly as in numpy. For instance, if b was a simple numpy array, one could easily execute b[1] = 10 – this would change the value of the second element in the array to the integer 10. The same direct item assignment doesn’t work on a TensorFlow variable; instead, the assign method must be used:

b[1].assign(10)

This will then flow through to a like so:

Variable a is [ 3. 33.  9. 12. 15. 18. 21. 24. 27. 30.]

The developer could also run the following, to assign a slice of b values:

b[6:9].assign([10, 10, 10])

A new tensor can also be created by using the slice notation:

f = b[2:5]
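Assuming the assignments above have already been applied to b, a quick check of the new tensor gives:

print(f.numpy())   # prints [2 3 4] given the earlier assignments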

The explanations and code above show you how to perform some basic tensor manipulations and operations. In the section below, an example will be presented where a neural network is created using the Eager paradigm in TensorFlow 2. It will show how to create a training loop, perform a feed-forward pass through a neural network and calculate and apply gradients to an optimization method.

3.0 A Neural Network Example

In this section, a simple three-layer neural network built in TensorFlow is demonstrated.  In following chapters, more complicated neural network structures such as convolutional neural networks and recurrent neural networks are covered.  For this example, though, it will be kept simple.

In this example, the MNIST dataset will be used, which is packaged as part of the TensorFlow installation. The MNIST dataset is a set of 28×28 pixel grayscale images which represent hand-written digits.  It has 60,000 training rows and 10,000 testing rows. It is a very common, basic image classification dataset used in machine learning.

The data can be loaded by running the following:

from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

As can be observed, the Keras MNIST data loader returns Python tuples corresponding to the training and test set respectively (Keras is another deep learning framework, now tightly integrated with TensorFlow, as mentioned earlier). The data sizes of the tuples defined above are:

  • x_train: (60,000 x 28 x 28)
  • y_train: (60,000)
  • x_test: (10,000 x 28 x 28)
  • y_test: (10,000)

The x data is the image information – 60,000 images of 28 x 28 pixels in the training set. The images are grayscale (i.e. black and white), with pixel values ranging from 0 to 255, where 255 is the maximum white intensity. The x data will need to be scaled so that it resides between 0 and 1, as this improves training efficiency. The y data is the matching image labels – signifying what digit is displayed in the image. This will need to be transformed to “one-hot” format.

When using a standard, categorical cross-entropy loss function (this will be shown later), a one-hot format is required when training classification tasks, as the output layer of the neural network will have the same number of nodes as the total number of possible classification labels. The output node with the highest value is considered as a prediction for that corresponding label. For instance, in the MNIST task, there are 10 possible classification labels – 0 to 9. Therefore, there will be 10 output nodes in any neural network performing this classification task. If we have an example output vector of [0.01, 0.8, 0.25, 0.05, 0.10, 0.27, 0.55, 0.32, 0.11, 0.09], the maximum value is in the second position / output node, and therefore this corresponds to the digit “1”. To train the network to produce this sort of outcome when the digit “1” appears, the loss needs to be calculated according to the difference between the output of the network and a “one-hot” array of the label 1. This one-hot array looks like [0, 1, 0, 0, 0, 0, 0, 0, 0, 0].

This conversion is easily performed in TensorFlow, as will be demonstrated shortly when the main training loop is covered.
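As a quick illustration (the same tf.one_hot call appears in the training loop later):

# convert an integer label into a one-hot vector over 10 classes
label = tf.constant(1)
one_hot_label = tf.one_hot(label, 10)
print(one_hot_label.numpy())   # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]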

One final thing that needs to be considered is how to extract the training data in batches of samples. The function below can handle this:

def get_batch(x_data, y_data, batch_size):
    idxs = np.random.randint(0, len(y_data), batch_size)
    return x_data[idxs,:,:], y_data[idxs]

As can be observed in the code above, the data to be batched i.e. the x and y data is passed to this function along with the batch size. The first line of the function generates a random vector of integers, with random values between 0 and the length of the data passed to the function. The number of random integers generated is equal to the batch size. The x and y data are then returned, but the return data is only for those random indices chosen. Note, that this is performed on numpy array objects – as will be shown shortly, the conversion from numpy arrays to tensor objects will be performed “on the fly” within the training loop.
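For instance, an illustrative call (before any scaling is applied) looks like:

batch_x, batch_y = get_batch(x_train, y_train, batch_size=100)
print(batch_x.shape, batch_y.shape)   # (100, 28, 28) (100,)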

There is also the requirement for a loss function and a feed-forward function, but these will be covered shortly.

# Python optimisation variables
epochs = 10
batch_size = 100

# normalize the input images by dividing by 255.0
x_train = x_train / 255.0
x_test = x_test / 255.0
# convert x_test to tensor to pass through model (train data will be converted to
# tensors on the fly)
x_test = tf.Variable(x_test)

First, the number of training epochs and the batch size are created – note these are simple Python variables, not TensorFlow variables. Next, the input training and test data, x_train and x_test, are scaled so that their values are between 0 and 1. Input data should always be scaled when training neural networks, as large, uncontrolled, inputs can heavily impact the training process. Finally, the test input data, x_test is converted into a tensor. The random batching process for the training data is most easily performed using numpy objects and functions. However, the test data will not be batched in this example, so the full test input data set x_test is converted into a tensor.

The next step is to set up the weight and bias variables for the three-layer neural network.  There are always L – 1 sets of weight/bias tensors, where L is the number of layers.  These variables are defined in the code below:

# now declare the weights connecting the input to the hidden layer
W1 = tf.Variable(tf.random.normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random.normal([300]), name='b1')
# and the weights connecting the hidden layer to the output layer
W2 = tf.Variable(tf.random.normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random.normal([10]), name='b2')

The weight and bias variables are initialized using the tf.random.normal function – this function creates tensors of random numbers, drawn from a normal distribution. It allows the developer to specify things like the standard deviation of the distribution from which the random numbers are drawn.

Note the shape of the variables. The W1 variable is a [784, 300] tensor – the 784 nodes are the size of the input layer. This size comes from the flattening of the input images – if we have 28 rows and 28 columns of pixels, flattening these out gives us 1 row or column of 28 x 28 = 784 values.  The 300 in the declaration of W1 is the number of nodes in the hidden layer. The W2 variable is a [300, 10] tensor, connecting the 300-node hidden layer to the 10-node output layer. In each case, a name is given to the variable for later viewing in TensorBoard – the TensorFlow visualization package. The next step in the code is to create the computations that occur within the nodes of the network. If the reader recalls, the computations within the nodes of a neural network are of the following form:

$$z = Wx + b$$

$$h=f(z)$$

Where W is the weights matrix, x is the layer input vector, b is the bias and f is the activation function of the node. These calculations comprise the feed-forward pass of the input data through the neural network. To execute these calculations, a dedicated feed-forward function is created:

def nn_model(x_input, W1, b1, W2, b2):
    # flatten the input image from 28 x 28 to 784
    x_input = tf.reshape(x_input, (x_input.shape[0], -1))
    x = tf.add(tf.matmul(tf.cast(x_input, tf.float32), W1), b1)
    x = tf.nn.relu(x)
    logits = tf.add(tf.matmul(x, W2), b2)
    return logits

Examining the first line, the x_input data is reshaped from (batch_size, 28, 28) to (batch_size, 784) – in other words, the images are flattened out. On the next line, the input data is then converted to tf.float32 type using the TensorFlow cast function. This is important – the x_input data comes in as tf.float64 type, and TensorFlow won’t perform a matrix multiplication operation (tf.matmul) between tensors of different data types. This re-typed input data is then matrix-multiplied by W1 using the TensorFlow matmul function (which stands for matrix multiplication). Then the bias b1 is added to this product. On the line after this, the ReLU activation function is applied to the output of this line of calculation. The ReLU function is usually the best activation function to use in deep learning – the reasons for this are discussed in this post.

The output of this calculation is then multiplied by the final set of weights W2, with the bias b2 added. The output of this calculation is titled logits. Note that no activation function has been applied to this output layer of nodes (yet). In machine/deep learning, the term “logits” refers to the un-activated output of a layer of nodes.

The reason no activation function has been applied to this layer is that there is a handy function in TensorFlow called tf.nn.softmax_cross_entropy_with_logits. This function does two things for the developer – it applies a softmax activation function to the logits, which transforms them into a quasi-probability (i.e. the sum of the output nodes is equal to 1). This is a common activation function to apply to an output layer in classification tasks. Next, it applies the cross-entropy loss function to the softmax activation output. The cross-entropy loss function is a commonly used loss in classification tasks. The theory behind it is quite interesting, but it won’t be covered in this book – a good summary can be found here. The code below applies this handy TensorFlow function, and in this example,  it has been nested in another function called loss_fn:

def loss_fn(logits, labels):
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                                              logits=logits))
    return cross_entropy

The arguments to softmax_cross_entropy_with_logits are labels and logits. The logits argument is supplied from the outcome of the nn_model function. The usage of this function in the main training loop will be demonstrated shortly. The labels argument is supplied from the one-hot y values that are fed into loss_fn during the training process. The output of the softmax_cross_entropy_with_logits function will be the output of the cross-entropy loss value for each sample in the batch. To train the weights of the neural network, the average cross-entropy loss across the samples needs to be minimized as part of the optimization process. This is calculated by using the tf.reduce_mean function, which, unsurprisingly, calculates the mean of the tensor supplied to it.
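As a small illustration of that last function:

# tf.reduce_mean simply averages over all elements of the tensor passed to it
print(tf.reduce_mean(tf.constant([1.0, 2.0, 3.0])).numpy())   # 2.0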

The next step is to define an optimizer function. In many examples within this book, the versatile Adam optimizer will be used. The theory behind this optimizer is interesting, and is worth further examination (such as shown here) but won’t be covered in detail within this post. It is basically a gradient descent method, but with sophisticated averaging of the gradients to provide appropriate momentum to the learning. To define the optimizer, which will be used in the main training loop, the following code is run:

# setup the optimizer
optimizer = tf.keras.optimizers.Adam()

The Adam object can take a learning rate as input, but for the present purposes, the default value is used.
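For example, a one-liner to override the default learning rate of 0.001 would look like:

# pass an explicit learning rate instead of relying on the default
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)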

3.1 Training the network

Now that the appropriate functions, variables and optimizers have been created, it is time to define the overall training loop. The training loop is shown below:

total_batch = int(len(y_train) / batch_size)
for epoch in range(epochs):
    avg_loss = 0
    for i in range(total_batch):
        batch_x, batch_y = get_batch(x_train, y_train, batch_size=batch_size)
        # create tensors
        batch_x = tf.Variable(batch_x)
        batch_y = tf.Variable(batch_y)
        # create a one hot vector
        batch_y = tf.one_hot(batch_y, 10)
        with tf.GradientTape() as tape:
            logits = nn_model(batch_x, W1, b1, W2, b2)
            loss = loss_fn(logits, batch_y)
        gradients = tape.gradient(loss, [W1, b1, W2, b2])
        optimizer.apply_gradients(zip(gradients, [W1, b1, W2, b2]))
        avg_loss += loss / total_batch