**Updated for TensorFlow 2**

Google’s TensorFlow has been a hot topic in deep learning recently. The open source software, designed to allow efficient computation of data flow graphs, is especially suited to deep learning tasks. It is designed to be executed on single or multiple CPUs and GPUs, making it a good option for complex deep learning tasks, and in its most recent incarnations it can even be run on certain mobile operating systems. This introductory tutorial to TensorFlow will give an overview of some of the basic concepts of TensorFlow in Python. These will be a good stepping stone to building more complex deep learning networks, such as Convolutional Neural Networks, natural language models, and Recurrent Neural Networks, in the package. We’ll be creating a simple three-layer neural network to classify the MNIST dataset. This tutorial assumes that you are familiar with the basics of neural networks, which you can get up to speed on in the neural networks tutorial if required. To install TensorFlow, follow the instructions here. The code for this tutorial can be found in this site’s GitHub repository. Once you’re done, you also might want to check out Keras, a higher-level deep learning library that sits on top of TensorFlow; see my Keras tutorial.

First, let’s have a look at the main ideas of TensorFlow.

1.0 TensorFlow graphs

TensorFlow is based on graph-based computation: “what on earth is that?”, you might say. It’s an alternative way of conceptualising mathematical calculations. Consider the following expression $a = (b + c) * (c + 2)$. We can break this function down into the following components:

$$
\begin{align}
d &= b + c \\
e &= c + 2 \\
a &= d * e
\end{align}
$$

Now we can represent these operations graphically as:

Simple computational graph

This may seem like a silly example, but notice a powerful idea in expressing the equation this way: two of the computations ($d=b+c$ and $e=c+2$) can be performed in parallel. By splitting up these calculations across CPUs or GPUs, this can give us significant gains in computational times. These gains are a must for big data applications and deep learning, especially for complicated neural network architectures such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). The idea behind TensorFlow is to provide the ability to create these computational graphs in code and allow significant performance improvements via parallel operations and other efficiency gains.

We can look at a similar graph in TensorFlow below, which shows

the computational graph of a three-layer neural network.

TensorFlow data flow graph

The animated data flows between different nodes in the graph are tensors, which are multi-dimensional data arrays. For instance, the input data tensor may be 5000 x 64 x 1, which represents a 64 node input layer with 5000 training samples. After the input layer, there is a hidden layer with rectified linear units as the activation function. There is a final output layer (called a “logit layer” in the above graph) that uses cross-entropy as a cost/loss function. At each point we see the relevant tensors flowing to the “Gradients” block, which finally flows to the Stochastic Gradient Descent optimizer which performs the back-propagation and gradient descent.

Here we can see how computational graphs can be used to

represent the calculations in neural networks, and this, of course,

is what TensorFlow excels at. Let’s see how to perform some basic

mathematical operations in TensorFlow to get a feel for how it all

works.

2.0 A Simple TensorFlow example

So how can we make TensorFlow perform the little example calculation shown above, $a = (b + c) * (c + 2)$? First, we need to introduce TensorFlow variables. The code below shows how to declare these objects:

```python
import tensorflow as tf

# create TensorFlow variables
const = tf.Variable(2.0, name="const")
b = tf.Variable(2.0, name='b')
c = tf.Variable(1.0, name='c')
```

As can be observed above, TensorFlow variables can be declared using the tf.Variable function. The first argument is the value to be assigned to the variable. The second is an optional name string which can be used to label the constant/variable; this is handy for when you want to do visualizations. TensorFlow will infer the type of the variable from the initialized value, but it can also be set explicitly using the optional dtype argument. TensorFlow has many of its own types, like tf.float32, tf.int32 etc.
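For example, here is a minimal sketch of setting the type explicitly (the variable name g is illustrative only):

```python
# the dtype argument overrides type inference from the initial value
g = tf.Variable(1, dtype=tf.float32, name='g')
print(g.dtype)  # <dtype: 'float32'>
```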

The objects assigned to the Python variables are actually TensorFlow tensors. Thereafter, they act like normal Python objects; therefore, if you want to access the tensors, you need to keep track of the Python variables. In previous versions of TensorFlow, there were global methods of accessing the tensors and operations based on their names. This is no longer the case.

To examine the tensors stored in the Python variables, simply call them as you would a normal Python variable. If we do this for the “const” variable, you will see the following output:

```
<tf.Variable 'const:0' shape=() dtype=float32, numpy=2.0>
```

This output gives you a few different pieces of information. First is the name, “const:0”, which has been assigned to the tensor. Next is the data type, in this case a TensorFlow float 32 type. Finally, there is a “numpy” value. TensorFlow variables in TensorFlow 2 can be converted easily into numpy objects. Numpy stands for Numerical Python and is a crucial library for Python data science and machine learning. If you don’t know Numpy, what it is, and how to use it, check out this site. The command to access the numpy form of the tensor is simply .numpy(); the use of this method will be shown shortly.

Next, some calculation operations are created:

```python
# now create some operations
d = tf.add(b, c, name='d')
e = tf.add(c, const, name='e')
a = tf.multiply(d, e, name='a')
```

Note that d and e are automatically converted to tensor values upon the execution of the operations. TensorFlow has a wealth of calculation operations available to perform all sorts of interactions between tensors, as you will discover as you progress through this book. The purpose of the operations shown above is pretty obvious: they instantiate the operations b + c, c + 2.0, and d * e. However, these operations are an unwieldy way of doing things in TensorFlow 2. The operations below are equivalent to those above:

```python
d = b + c
e = c + 2
a = d * e
```

To access the value of variable a, one can use the .numpy()

method as shown below:

```python
print(f"Variable a is {a.numpy()}")
```

The computational graph for this simple example can be

visualized by using the TensorBoard functionality that comes

packaged with TensorFlow. This is a great visualization feature and

is explained more in

this post. Here is what the graph looks like in

TensorBoard:

Simple TensorFlow graph

The larger two vertices or nodes, b and c, correspond to the

variables. The smaller nodes correspond to the operations, and the

edges between the vertices are the scalar values emerging from the

variables and operations.

The example above is trivial; what would it look like if there was an array of b values from which an array of equivalent a values would be calculated? TensorFlow variables can easily be instantiated using numpy arrays, like the following:

```python
import numpy as np

b = tf.Variable(np.arange(0, 10), name='b')
```

Calling b shows the following:

```
<tf.Variable 'b:0' shape=(10,) dtype=int32, numpy=array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])>
```

Note the numpy value of the tensor is an array. Because the numpy variable passed during the instantiation is a range of int32 values, we can’t add it directly to c, as c is of float32 type. Therefore, the tf.cast operation, which changes the type of a tensor, first needs to be utilized like so:

```python
d = tf.cast(b, tf.float32) + c
```

Running the rest of the previous operations, using the new b

tensor, gives the following value for a:

```
Variable a is [ 3.  6.  9. 12. 15. 18. 21. 24. 27. 30.]
```

In numpy, the developer can directly access slices or individual indices of an array and change their values directly. Can the same be done in TensorFlow 2? Can individual indices and/or slices be accessed and changed? The answer is yes, but not quite as straightforwardly as in numpy. For instance, if b was a simple numpy array, one could easily execute b[1] = 10, which would change the value of the second element in the array to the integer 10. In TensorFlow 2, the same result is achieved by using the assign method:

```python
b[1].assign(10)
```

After re-running the d, e and a operations above, this new value flows through to a like so:

```
Variable a is [ 3. 33.  9. 12. 15. 18. 21. 24. 27. 30.]
```

The developer could also run the following, to assign a slice of b values:

```python
b[6:9].assign([10, 10, 10])
```

A new tensor can also be created by using the slice notation:

```python
f = b[2:5]
```
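Assuming the assignments above have been run in order, b now holds [0, 10, 2, 3, 4, 5, 10, 10, 10, 9], so inspecting the new tensor would show:

```python
print(f.numpy())  # [2 3 4]
```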

The explanations and code above show you how to perform some

basic tensor manipulations and operations. In the section below, an

example will be presented where a neural network is created using

the Eager paradigm in TensorFlow 2. It will show how to create a

training loop, perform a feed-forward pass through a neural network

and calculate and apply gradients to an optimization method.

3.0 A Neural Network Example

In this section, a simple three-layer neural network built in TensorFlow is demonstrated. In following chapters, more complicated neural network structures such as convolutional neural networks and recurrent neural networks are covered. For this example, though, it will be kept simple.

In this example, the MNIST dataset will be used, which is packaged as part of the TensorFlow installation. This MNIST dataset is a set of 28 x 28 pixel grayscale images which represent hand-written digits. It has 60,000 training rows and 10,000 testing rows. It is a very common, basic, image classification dataset that is used in machine learning.

The data can be loaded by running the following:

```python
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
```

As can be observed, the Keras MNIST data loader returns Python

tuples corresponding to the training and test set respectively

(Keras is another deep learning framework, now tightly integrated

with TensorFlow, as mentioned earlier). The data sizes of the

tuples defined above are:

- x_train: (60,000 x 28 x 28)
- y_train: (60,000)
- x_test: (10,000 x 28 x 28)
- y_test: (10,000)
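As a quick check, these shapes can be confirmed directly (a minimal sketch, assuming the load above has been run):

```python
print(x_train.shape)  # (60000, 28, 28)
print(y_train.shape)  # (60000,)
print(x_test.shape)   # (10000, 28, 28)
print(y_test.shape)   # (10000,)
```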

The x data is the image information: 60,000 images of 28 x 28 pixels in the training set. The images are grayscale (i.e. black and white) with maximum values, specifying the intensity of whites, of 255. The x data will need to be scaled so that it resides between 0 and 1, as this improves training efficiency. The y data is the matching image labels, signifying what digit is displayed in the image. This will need to be transformed to “one-hot” format.

When using a standard, categorical cross-entropy loss function (this will be shown later), a one-hot format is required when training classification tasks, as the output layer of the neural network will have the same number of nodes as the total number of possible classification labels. The output node with the highest value is considered as the prediction for that corresponding label. For instance, in the MNIST task, there are 10 possible classification labels: 0 to 9. Therefore, there will be 10 output nodes in any neural network performing this classification task. If we have an example output vector of [0.01, 0.8, 0.25, 0.05, 0.10, 0.27, 0.55, 0.32, 0.11, 0.09], the maximum value is in the second position / output node, and therefore this corresponds to the digit “1”. To train the network to produce this sort of outcome when the digit “1” appears, the loss needs to be calculated according to the difference between the output of the network and a “one-hot” array of the label 1. This one-hot array looks like [0, 1, 0, 0, 0, 0, 0, 0, 0, 0].

This conversion is easily performed in TensorFlow, as will be

demonstrated shortly when the main training loop is covered.
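As a preview, the conversion itself is a one-liner using tf.one_hot; here is a minimal sketch with made-up labels:

```python
labels = tf.constant([1, 4])
# second argument is the depth, i.e. the number of classes
print(tf.one_hot(labels, 10).numpy())
# [[0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
```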

One final thing that needs to be considered is how to extract

the training data in batches of samples. The function below can

handle this:

```python
def get_batch(x_data, y_data, batch_size):
    idxs = np.random.randint(0, len(y_data), batch_size)
    return x_data[idxs,:,:], y_data[idxs]
```

As can be observed in the code above, the data to be batched, i.e. the x and y data, is passed to this function along with the batch size. The first line of the function generates a random vector of integers, with random values between 0 and the length of the data passed to the function. The number of random integers generated is equal to the batch size. The x and y data are then returned, but only for those randomly chosen indices. Note that this is performed on numpy array objects; as will be shown shortly, the conversion from numpy arrays to tensor objects will be performed “on the fly” within the training loop.
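As an illustration, a single batch could be drawn like so (a sketch, assuming the data has been loaded as above):

```python
batch_x, batch_y = get_batch(x_train, y_train, batch_size=100)
print(batch_x.shape)  # (100, 28, 28)
print(batch_y.shape)  # (100,)
```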

There is also the requirement for a loss function and a feed-forward function, but these will be covered shortly. Before that, some optimization variables are set up and the data is scaled:

```python
# Python optimisation variables
epochs = 10
batch_size = 100

# normalize the input images by dividing by 255.0
x_train = x_train / 255.0
x_test = x_test / 255.0
# convert x_test to tensor to pass through model (train data will be converted to
# tensors on the fly)
x_test = tf.Variable(x_test)
```

First, the number of training epochs and the batch size are created; note these are simple Python variables, not TensorFlow variables. Next, the input training and test data, x_train and x_test, are scaled so that their values are between 0 and 1. Input data should always be scaled when training neural networks, as large, uncontrolled inputs can heavily impact the training process. Finally, the test input data, x_test, is converted into a tensor. The random batching process for the training data is most easily performed using numpy objects and functions. However, the test data will not be batched in this example, so the full test input data set x_test is converted into a tensor.

The next step is to set up the weight and bias variables for the three-layer neural network. There are always $L - 1$ weight/bias tensors, where $L$ is the number of layers. These variables are defined in the code below:

```python
# now declare the weights connecting the input to the hidden layer
W1 = tf.Variable(tf.random.normal([784, 300], stddev=0.03), name='W1')
b1 = tf.Variable(tf.random.normal([300]), name='b1')
# and the weights connecting the hidden layer to the output layer
W2 = tf.Variable(tf.random.normal([300, 10], stddev=0.03), name='W2')
b2 = tf.Variable(tf.random.normal([10]), name='b2')
```

The weight and bias variables are initialized using the tf.random.normal function; this function creates tensors of random numbers, drawn from a normal distribution. It allows the developer to specify things like the standard deviation of the distribution from which the random numbers are drawn.

Note the shape of the variables. The W1 variable is a [784, 300] tensor; the 784 nodes are the size of the input layer. This size comes from the flattening of the input images: if we have 28 rows and 28 columns of pixels, flattening these out gives us 1 row or column of 28 x 28 = 784 values. The 300 in the declaration of W1 is the number of nodes in the hidden layer. The W2 variable is a [300, 10] tensor, connecting the 300-node hidden layer to the 10-node output layer. In each case, a name is given to the variable for later viewing in TensorBoard, the TensorFlow visualization package. The next step in the code is to create the computations that occur within the nodes of the network. If the reader recalls, the computations within the nodes of a neural network are of the following form:

$$z = Wx + b$$

$$h=f(z)$$

where $W$ is the weights matrix, $x$ is the layer input vector, $b$ is the bias, and $f$ is the activation function of the node. These calculations comprise the feed-forward pass of the input data through the neural network. To execute these calculations, a dedicated feed-forward function is created:

```python
def nn_model(x_input, W1, b1, W2, b2):
    # flatten the input image from 28 x 28 to 784
    x_input = tf.reshape(x_input, (x_input.shape[0], -1))
    x = tf.add(tf.matmul(tf.cast(x_input, tf.float32), W1), b1)
    x = tf.nn.relu(x)
    logits = tf.add(tf.matmul(x, W2), b2)
    return logits
```

Examining the first line, the x_input data is reshaped from (batch_size, 28, 28) to (batch_size, 784); in other words, the images are flattened out. On the next line, the input data is then converted to tf.float32 type using the TensorFlow cast function. This is important: the x_input data comes in as tf.float64 type, and TensorFlow won’t perform a matrix multiplication operation (tf.matmul) between tensors of different data types. This re-typed input data is then matrix-multiplied by W1 using the TensorFlow matmul function (which stands for matrix multiplication). Then the bias b1 is added to this product. On the line after this, the ReLU activation function is applied to the output of this line of calculation. The ReLU function is usually the best activation function to use in deep learning; the reasons for this are discussed in this post.

The output of this calculation is then multiplied by the final set of weights W2, with the bias b2 added. The output of this calculation is titled logits. Note that no activation function has been applied to this output layer of nodes (yet). In machine/deep learning, the term “logits” refers to the un-activated output of a layer of nodes.
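To make the term concrete, here is a small sketch showing softmax turning raw logits into quasi-probabilities (the values are made up):

```python
raw_logits = tf.constant([2.0, 1.0, 0.1])
probs = tf.nn.softmax(raw_logits)
print(probs.numpy())        # approximately [0.659 0.242 0.099]
print(probs.numpy().sum())  # 1.0
```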

The reason no activation function has been applied to this layer is that there is a handy function in TensorFlow called tf.nn.softmax_cross_entropy_with_logits. This function does two things for the developer. First, it applies a softmax activation function to the logits, which transforms them into a quasi-probability (i.e. the sum of the output nodes is equal to 1). This is a common activation function to apply to an output layer in classification tasks. Next, it applies the cross-entropy loss function to the softmax activation output. The cross-entropy loss function is a commonly used loss in classification tasks. The theory behind it is quite interesting, but it won’t be covered in this book; a good summary can be found here. The code below applies this handy TensorFlow function, and in this example, it has been nested in another function called loss_fn:

```python
def loss_fn(logits, labels):
    cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                                           logits=logits))
    return cross_entropy
```

The arguments to softmax_cross_entropy_with_logits are labels

and logits. The logits argument is supplied from the outcome of the

nn_model function. The usage of this function in the main training

loop will be demonstrated shortly. The labels argument is supplied

from the one-hot y values that are fed into loss_fn during the

training process. The output of the

softmax_cross_entropy_with_logits function will be the output of

the cross-entropy loss value for each sample in the batch. To train

the weights of the neural network, the average cross-entropy loss

across the samples needs to be minimized as part of the

optimization process. This is calculated by using the

tf.reduce_mean function, which, unsurprisingly, calculates the mean

of the tensor supplied to it.
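As a quick illustration, loss_fn can be exercised with a dummy two-class batch (a sketch with made-up values):

```python
dummy_logits = tf.constant([[2.0, 0.5], [0.3, 1.5]])
dummy_labels = tf.constant([[1.0, 0.0], [0.0, 1.0]])  # one-hot labels
print(loss_fn(dummy_logits, dummy_labels).numpy())  # a single scalar loss value
```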

The next step is to define an optimizer function. In many examples within this book, the versatile Adam optimizer will be used. The theory behind this optimizer is interesting, and is worth further examination (such as shown here), but it won’t be covered in detail within this post. It is basically a gradient descent method, but with sophisticated averaging of the gradients to provide appropriate momentum to the learning. To define the optimizer, which will be used in the main training loop, the following code is run:

```python
# setup the optimizer
optimizer = tf.keras.optimizers.Adam()
```

The Adam object can take a learning rate as input, but for the

present purposes, the default value is used.
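If a different learning rate were desired, it could be passed explicitly, for instance:

```python
# illustrative only: 0.001 is the documented default learning rate for Adam
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
```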

3.1 Training the network

Now that the appropriate functions, variables and optimizers

have been created, it is time to define the overall training loop.

The training loop is shown below:

```python
total_batch = int(len(y_train) / batch_size)
for epoch in range(epochs):
    avg_loss = 0
    for i in range(total_batch):
        batch_x, batch_y = get_batch(x_train, y_train, batch_size=batch_size)
        # create tensors
        batch_x = tf.Variable(batch_x)
        batch_y = tf.Variable(batch_y)
```