Categories
Misc

Defining loss when number of inputs is greater than number of outputs

I have trained a model which outputs multiple images (say 2) at a time, and takes in multiple inputs (say 5) to do so. However, my loss (MSE) is supposed to apply to only 2 pairs, each made of one of the inputs and one of the outputs. Meaning, I define my model as:

themodel = Model([ip1, ip2, ip3, ip4, ip5], [op1, op2])
themodel.compile(optimizer='Adam', loss=['mse', 'mse'])

My model seems to train correctly; I just couldn’t find confirmation in the docs (the compile method) of how TF decides which tensors to apply the loss to. My assumption is that it applies the first loss to the first input-output pair, and so on until the output tensors run out, with nothing applied to the remaining input tensors. Is this what is happening? Also, another silly question: does this mean that for multiple-output models, a loss has to be defined for all the outputs, and that it is not related to the number of inputs?
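As a hedged illustration (using the R Keras interface and a hypothetical model, not the poster’s): when a list of losses is given, Keras pairs each loss with the corresponding model output by position; the inputs do not enter into that mapping.

library(keras)

# five image inputs (shapes made up for illustration)
ip <- lapply(1:5, function(i) layer_input(shape = c(32, 32, 1)))

# combine all inputs, then produce two image outputs
merged <- layer_concatenate(ip)
op1 <- merged %>% layer_conv_2d(filters = 1, kernel_size = 3, padding = "same")
op2 <- merged %>% layer_conv_2d(filters = 1, kernel_size = 3, padding = "same")

themodel <- keras_model(inputs = ip, outputs = list(op1, op2))

# the first 'mse' is paired with op1, the second with op2;
# which inputs the targets happen to correspond to is a matter of the training data
themodel %>% compile(optimizer = "adam", loss = list("mse", "mse"))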

submitted by /u/juggy94


Categories
Misc

Kubeflow: Cloud Native ML Toolbox


submitted by /u/SoulmanIqbal

Categories
Offsites

Convolutional LSTM for spatial forecasting

This post is the first in a loose series exploring forecasting
of spatially-determined data over time. By spatially-determined I
mean that whatever the quantities we’re trying to predict – be
they univariate or multivariate time series, of spatial
dimensionality or not – the input data are given on a spatial
grid.

For example, the input could be atmospheric measurements, such
as sea surface temperature or pressure, given at some set of
latitudes and longitudes. The target to be predicted could then
span that same (or another) grid. Alternatively, it could be a
univariate time series, like a meteorological index.

But wait a second, you may be thinking. For time-series
prediction, we have that time-honored set of recurrent
architectures (e.g., LSTM, GRU), right? Right. We do; but, once we
feed spatial data to an RNN, treating different locations as
different input features, we lose an essential structural
relationship. Importantly, we need to operate in both space and
time. We want both: recurrence relations and convolutional filters.
Enter convolutional RNNs.

What to expect from this post

Today, we won’t jump into real-world applications just yet.
Instead, we’ll take our time to build a convolutional LSTM
(henceforth: convLSTM) in torch. For one, we have to – there is
no official PyTorch implementation.

Keras, on the other hand, has one. If you’re interested in
quickly playing around with a Keras convLSTM, check out this
nice example.

What’s more, this post can serve as an introduction to
building your own modules. This is something you may be familiar
with from Keras or not – depending on whether you’ve used
custom models or rather, preferred the declarative define ->
compile -> fit style. (Yes, I’m implying there’s some
transfer going on if one comes to torch from Keras custom training.
Syntactic and semantic details may be different, but both share the
object-oriented style that allows for great flexibility and
control.)

Last but not least, we’ll also use this as a hands-on
experience with RNN architectures (the LSTM, specifically). While
the general concept of recurrence may be easy to grasp, it is not
necessarily self-evident how those architectures should, or could,
be coded. Personally, I find that independent of the framework
used, RNN-related documentation leaves me confused. What exactly is
being returned from calling an LSTM, or a GRU? (In Keras this
depends on how you’ve defined the layer in question.) I suspect
that once we’ve decided what we want to return, the actual code
won’t be that complicated. Consequently, we’ll take a detour
clarifying what it is that torch and Keras are giving us.
Implementing our convLSTM will be a lot more straightforward
thereafter.

A torch convLSTM

The code discussed here may be found on GitHub. (Depending on
when you’re reading this, the code in that repository may have
evolved though.)

My starting point was one of the PyTorch implementations found
on the net, namely,
this one. If you search for “PyTorch convGRU” or “PyTorch
convLSTM”, you will find stunning discrepancies in how these are
realized – discrepancies not just in syntax and/or engineering
ambition, but on the semantic level, right at the center of what
the architectures may be expected to do. As they say, let the buyer
beware. (Regarding the implementation I ended up porting, I am
confident that while numerous optimizations will be possible, the
basic mechanism matches my expectations.)

What do I expect? Let’s approach this task in a top-down
way.

Input and output

The convLSTM’s input will be a time series of spatial data,
each observation being of size (time steps, channels, height,
width).

Compare this with the usual RNN input format, be it in torch or
Keras. In both frameworks, RNNs expect tensors of size (timesteps,
input_dim). input_dim is 1 for univariate time series and greater
than 1 for multivariate ones. Conceptually, we may match this to convLSTM’s
channels dimension: There could be a single channel, for
temperature, say – or there could be several, such as for
pressure, temperature, and humidity. The two additional dimensions
found in convLSTM, height and width, are spatial indexes into the
data.
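To make the shapes concrete, here is a quick sketch (the dimensions are made up for illustration):

library(torch)

# a "classic" RNN input: (batch, timesteps, input_dim) --
# say, 8 series, 10 time steps, 2 features (temperature and pressure)
x_rnn <- torch_randn(c(8, 10, 2))

# a convLSTM input: (batch, timesteps, channels, height, width) --
# the same 2 features, now given on a 16 x 16 spatial grid
x_convlstm <- torch_randn(c(8, 10, 2, 16, 16))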

In sum, we want to be able to pass data that:

  • consist of one or more features,

  • evolve in time, and

  • are indexed in two spatial dimensions.

How about the output? We want to be able to return forecasts for
as many time steps as we have in the input sequence. This is
something that torch RNNs do by default, while Keras equivalents do
not. (You have to pass return_sequences = TRUE to obtain that
effect.) If we’re interested in predictions for just a single
point in time, we can always pick the last time step in the output
tensor.

However, with RNNs, it is not all about outputs. RNN
architectures also carry through hidden states.

What are hidden states? I carefully phrased that sentence to be
as general as possible – deliberately circling around the
confusion that, in my view, often arises at this point. We’ll
attempt to clear up some of that confusion in a second, but let’s
first finish our high-level requirements specification.

We want our convLSTM to be usable in different contexts and
applications. Various architectures exist that make use of hidden
states, most prominently perhaps, encoder-decoder architectures.
Thus, we want our convLSTM to return those as well. Again, this is
something a torch LSTM does by default, while in Keras it is
achieved using return_state = TRUE.
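For example, in the R interface to Keras, this (together with per-time-step outputs) is requested when the layer is created; a minimal sketch:

library(keras)

# return per-time-step outputs as well as the final hidden and cell states
lstm <- layer_lstm(units = 1, return_sequences = TRUE, return_state = TRUE)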

Now though, it really is time for that interlude. We’ll sort
out the ways things are called by both torch and Keras, and inspect
what you get back from their respective GRUs and LSTMs.

Interlude: Outputs, states, hidden values … what’s what?

For this to remain an interlude, I summarize findings on a high
level. The code snippets in the appendix show how to arrive at
these results. Heavily commented, they probe return values from
both Keras and torch GRUs and LSTMs. Running these will make the
upcoming summaries seem a lot less abstract.

First, let’s look at the ways you create an LSTM in both
frameworks. (I will generally use LSTM as the “prototypical RNN
example”, and just mention GRUs when there are differences
significant in the context in question.)

In Keras, to create an LSTM you may write something like
this:

lstm <- layer_lstm(units = 1)

The torch equivalent would be:

lstm <- nn_lstm(
  input_size = 2, # number of input features
  hidden_size = 1 # number of hidden (and output!) features
)

Don’t focus on torch‘s input_size parameter for this
discussion. (It’s the number of features in the input tensor.)
The parallel occurs between Keras’ units and torch’s
hidden_size. If you’ve been using Keras, you’re probably
thinking of units as the thing that determines output size
(equivalently, the number of features in the output). So when torch
lets us arrive at the same result using hidden_size, what does that
mean? It means that somehow we’re specifying the same thing,
using different terminology. And it does make sense, since at every
time step current input and previous hidden state are added:

\[ \mathbf{h}_t = \mathbf{W}_{x}\mathbf{x}_t + \mathbf{W}_{h}\mathbf{h}_{t-1} \]

Now, about those hidden states.

When a Keras LSTM is defined with return_state = TRUE, its
return value is a structure of three entities called output, memory
state, and carry state. In torch, the same entities are referred to
as output, hidden state, and cell state. (In torch, we always get
all of them.)
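As a quick, hedged sketch of what this looks like on the torch side (the appendix has more thorough probes):

library(torch)

lstm <- nn_lstm(input_size = 2, hidden_size = 1)
x <- torch_randn(c(10, 8, 2)) # (timesteps, batch, input_size), torch's default layout

ret <- lstm(x)
output <- ret[[1]]   # per-time-step outputs: (timesteps, batch, hidden_size)
h_n <- ret[[2]][[1]] # "hidden state", reported for the final time step
c_n <- ret[[2]][[2]] # "cell state", reported for the final time step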

So are we dealing with three different types of entities? We are
not.

The cell, or carry, state is that special thing that sets LSTMs
apart from GRUs, and is deemed responsible for the “long” in “long
short-term memory”. Technically, it could be reported to the user
at all points in time; as we’ll see shortly though, it is
not.

What about outputs and hidden, or memory, states? Confusingly,
these really are the same thing. Recall that for each item in
the input sequence, we’re combining it with the previous state,
resulting in a new state, to be made use of in the next
step:

\[ \mathbf{h}_t = \mathbf{W}_{x}\mathbf{x}_t + \mathbf{W}_{h}\mathbf{h}_{t-1} \]

Now, say that we’re interested in looking at just the final
time step – that is, the default output of a Keras LSTM. From
that point of view, we can consider those intermediate computations
as “hidden”. Seen like that, output and hidden states feel
different.

However, we can also request to see the outputs for every time
step. If we do so, there is no difference – the
outputs (plural) equal the hidden states. This can
be verified using the code in the appendix.
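For instance, with torch (a minimal sketch, single layer, default configuration):

library(torch)

lstm <- nn_lstm(input_size = 2, hidden_size = 1)
x <- torch_randn(c(10, 8, 2))

ret <- lstm(x)
output <- ret[[1]]   # all per-time-step outputs
h_n <- ret[[2]][[1]] # "hidden state" from the final time step

# the last time step of output is the same as h_n
torch_allclose(output[10, , ], h_n[1, , ]) # TRUE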

Thus, of the three things returned by an LSTM, two are really
the same. How about the GRU, then? As there is no “cell state”,
we really have just one type of thing left over – call it outputs
or hidden states.

Let’s summarize this in a table.

Table 1: RNN terminology. Comparing torch-speak and
Keras-speak. In row 1, the terms are parameter names. In rows 2 and
3, they are pulled from current documentation.

Referring to this entity | torch says | Keras says
Number of features in the output (this determines both how many output features there are and the dimensionality of the hidden states) | hidden_size | units
Per-time-step output; latent state; intermediate state … (could be named “public state”, in the sense that we, the users, are able to obtain all values) | hidden state | memory state
Cell state; inner state … (LSTM only; could be named “private state”, in that we are able to obtain a value only for the last time step – more on that in a second) | cell state | carry state

Now, about that public vs. private distinction. In both
frameworks, we can obtain outputs (hidden states) for every time
step. The cell state, however, we can access only for the very last
time step. This is purely an implementation decision. As we’ll
see when building our own recurrent module, there are no obstacles
inherent in keeping track of cell states and passing them back to
the user.

If you dislike the pragmatism of this distinction, you can
always go with the math. When a new cell state has been computed
(based on prior cell state, input, forget, and cell gates – the
specifics of which we are not going to get into here), it is
transformed to the hidden (a.k.a. output) state making use of yet
another, namely, the output gate:

\[ h_t = o_t \odot \tanh(c_t) \]

Definitely, then, hidden state (output, resp.) builds on cell
state, adding additional modeling power.

Now it is time to get back to our original goal and build that
convLSTM. First though, let’s summarize the return values
obtainable from torch and Keras.

Table 2: Contrasting ways of obtaining various return values
in torch vs. Keras. Cf. the appendix for complete examples.

To achieve this goal | in torch do | in Keras do
access all intermediate outputs (= per-time-step outputs) | ret[[1]] | return_sequences = TRUE
access both “hidden state” (output) and “cell state” from the final time step (only!) | ret[[2]] | return_state = TRUE
access all intermediate outputs and the final “cell state” | both of the above | return_sequences = TRUE, return_state = TRUE
access all intermediate outputs and “cell states” from all time steps | no way | no way

convLSTM, the plan

In both torch and Keras RNN architectures, single time steps are
processed by corresponding Cell classes: There is an LSTM Cell
matching the LSTM, a GRU Cell matching the GRU, and so on. We do
the same for ConvLSTM. In convlstm_cell(), we first define what
should happen to a single observation; then in convlstm(), we build
up the recurrence logic.

Once we’re done, we create a dummy dataset, as
reduced-to-the-essentials as can be. With more complex datasets,
even artificial ones, chances are that if we don’t see any
training progress, there are hundreds of possible explanations. We
want a sanity check that, if failed, leaves no excuses. Realistic
applications are left to future posts.

A single step: convlstm_cell

Our convlstm_cell’s constructor takes arguments input_dim,
hidden_dim, and bias, just like a torch LSTM Cell.

But we’re processing two-dimensional input data. Instead of
the usual affine combination of new input and previous state, we
use a convolution of kernel size kernel_size. Inside convlstm_cell,
it is self$conv that takes care of this.

Note how the channels dimension, which in the original input
data would correspond to different variables, is creatively used to
consolidate four convolutions into one: Each channel output will be
passed to just one of the four cell gates. Once in possession of
the convolution output, forward() applies the gate logic, resulting
in the two types of states it needs to send back to the caller.
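In condensed form (a sketch, with bias terms left out), the gate logic forward() implements is:

\[
\begin{aligned}
i_t &= \sigma(\mathbf{W}_{i} * [\mathbf{x}_t, \mathbf{h}_{t-1}]) \\
f_t &= \sigma(\mathbf{W}_{f} * [\mathbf{x}_t, \mathbf{h}_{t-1}]) \\
o_t &= \sigma(\mathbf{W}_{o} * [\mathbf{x}_t, \mathbf{h}_{t-1}]) \\
g_t &= \tanh(\mathbf{W}_{g} * [\mathbf{x}_t, \mathbf{h}_{t-1}]) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
\]

where * denotes the convolution performed by self$conv on the channel-wise concatenation of current input and previous hidden state, and the four weight blocks correspond to the four chunks produced by torch_split().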

library(torch)
library(zeallot)

convlstm_cell <- nn_module(
  
  initialize = function(input_dim, hidden_dim, kernel_size, bias) {
    
    self$hidden_dim <- hidden_dim
    
    padding <- kernel_size %/% 2
    
    self$conv <- nn_conv2d(
      in_channels = input_dim + self$hidden_dim,
      # for each of input, forget, output, and cell gates
      out_channels = 4 * self$hidden_dim,
      kernel_size = kernel_size,
      padding = padding,
      bias = bias
    )
  },
  
  forward = function(x, prev_states) {
    
    c(h_prev, c_prev) %<-% prev_states
    
    combined <- torch_cat(list(x, h_prev), dim = 2) # concatenate along channel axis
    combined_conv <- self$conv(combined)
    c(cc_i, cc_f, cc_o, cc_g) %<-% torch_split(combined_conv, self$hidden_dim, dim = 2)
    
    # input, forget, output, and cell gates (corresponding to torch's LSTM)
    i <- torch_sigmoid(cc_i)
    f <- torch_sigmoid(cc_f)
    o <- torch_sigmoid(cc_o)
    g <- torch_tanh(cc_g)
    
    # cell state
    c_next <- f * c_prev + i * g
    # hidden state
    h_next <- o * torch_tanh(c_next)
    
    list(h_next, c_next)
  },
  
  init_hidden = function(batch_size, height, width) {
    
    list(
      torch_zeros(batch_size, self$hidden_dim, height, width, device = self$conv$weight$device),
      torch_zeros(batch_size, self$hidden_dim, height, width, device = self$conv$weight$device))
  }
)

Now convlstm_cell has to be called for every time step. This is
done by convlstm.

Iteration over time steps: convlstm

A convlstm may consist of several layers, just like a torch
LSTM. For each layer, we are able to specify hidden and kernel
sizes individually.

During initialization, each layer gets its own convlstm_cell. On
call, convlstm executes two loops. The outer one iterates over
layers. At the end of each iteration, we store the final pair
(hidden state, cell state) for later reporting. The inner loop runs
over input sequences, calling convlstm_cell at each time step.

We also keep track of intermediate outputs, so we’ll be able
to return the complete list of hidden_states seen during the
process. Unlike a torch LSTM, we do this for every layer.

convlstm <- nn_module(
  
  # hidden_dims and kernel_sizes are vectors, with one element for each layer in n_layers
  initialize = function(input_dim, hidden_dims, kernel_sizes, n_layers, bias = TRUE) {
    
    self$n_layers <- n_layers
    
    self$cell_list <- nn_module_list()
    
    for (i in 1:n_layers) {
      cur_input_dim <- if (i == 1) input_dim else hidden_dims[i - 1]
      self$cell_list$append(convlstm_cell(cur_input_dim, hidden_dims[i], kernel_sizes[i], bias))
    }
  },
  
  # we always assume batch-first
  forward = function(x) {
    
    c(batch_size, seq_len, num_channels, height, width) %<-% x$size()
    
    # initialize hidden states
    init_hidden <- vector(mode = "list", length = self$n_layers)
    for (i in 1:self$n_layers) {
      init_hidden[[i]] <- self$cell_list[[i]]$init_hidden(batch_size, height, width)
    }
    
    # list containing the outputs, of length seq_len, for each layer
    # this is the same as h, at each step in the sequence
    layer_output_list <- vector(mode = "list", length = self$n_layers)
    
    # list containing the last states (h, c) for each layer
    layer_state_list <- vector(mode = "list", length = self$n_layers)
    
    cur_layer_input <- x
    hidden_states <- init_hidden
    
    # loop over layers
    for (i in 1:self$n_layers) {
      
      # every layer's hidden state starts from 0 (non-stateful)
      c(h, c) %<-% hidden_states[[i]]
      
      # outputs, of length seq_len, for this layer
      # equivalently, list of h states for each time step
      output_sequence <- vector(mode = "list", length = seq_len)
      
      # loop over time steps
      for (t in 1:seq_len) {
        c(h, c) %<-% self$cell_list[[i]](cur_layer_input[ , t, , , ], list(h, c))
        # keep track of output (h) for every time step
        # h has dim (batch_size, hidden_size, height, width)
        output_sequence[[t]] <- h
      }
      
      # stack hs for all time steps over seq_len dimension
      # stacked_outputs has dim (batch_size, seq_len, hidden_size, height, width)
      # same as input to forward (x)
      stacked_outputs <- torch_stack(output_sequence, dim = 2)
      
      # pass the list of outputs (hs) to next layer
      cur_layer_input <- stacked_outputs
      
      # keep track of list of outputs for this layer
      layer_output_list[[i]] <- stacked_outputs
      # keep track of last state for this layer
      layer_state_list[[i]] <- list(h, c)
    }
    
    list(layer_output_list, layer_state_list)
  }
)

Calling the convlstm

Let’s see the input format expected by convlstm, and how to
access its different outputs.

Here is a suitable input tensor.

# batch_size, seq_len, channels, height, width
x <- torch_rand(c(2, 4, 3, 16, 16))

First we make use of a single layer.

model <- convlstm(input_dim = 3, hidden_dims = 5, kernel_sizes = 3, n_layers = 1)

c(layer_outputs, layer_last_states) %<-% model(x)

We get back a list of length two, which we immediately split up
into the two types of output returned: intermediate outputs from
all layers, and final states (of both types) for the last
layer.

With just a single layer, layer_outputs[[1]] holds all of the
layer’s intermediate outputs, stacked on dimension two.

dim(layer_outputs[[1]]) # [1] 2 4 5 16 16

layer_last_states[[1]] is a list of tensors, the first of which
holds the single layer’s final hidden state, and the second, its
final cell state.

dim(layer_last_states[[1]][[1]])
# [1] 2 5 16 16

dim(layer_last_states[[1]][[2]])
# [1] 2 5 16 16

For comparison, this is how return values look for a multi-layer
architecture.

model <- convlstm(input_dim = 3, hidden_dims = c(5, 5, 1), kernel_sizes = rep(3, 3), n_layers = 3)

c(layer_outputs, layer_last_states) %<-% model(x)

# for each layer, tensor of size (batch_size, seq_len, hidden_size, height, width)
dim(layer_outputs[[1]])
# 2 4 5 16 16
dim(layer_outputs[[3]])
# 2 4 1 16 16

# list of 2 tensors for each layer
str(layer_last_states)
# List of 3
#  $ :List of 2
#   ..$ :Float [1:2, 1:5, 1:16, 1:16]
#   ..$ :Float [1:2, 1:5, 1:16, 1:16]
#  $ :List of 2
#   ..$ :Float [1:2, 1:5, 1:16, 1:16]
#   ..$ :Float [1:2, 1:5, 1:16, 1:16]
#  $ :List of 2
#   ..$ :Float [1:2, 1:1, 1:16, 1:16]
#   ..$ :Float [1:2, 1:1, 1:16, 1:16]

# h, of size (batch_size, hidden_size, height, width)
dim(layer_last_states[[3]][[1]])
# 2 1 16 16

# c, of size (batch_size, hidden_size, height, width)
dim(layer_last_states[[3]][[2]])
# 2 1 16 16

Now we want to sanity-check this module with the
simplest-possible dummy data.

Sanity-checking the convlstm

We generate black-and-white “movies” of diagonal beams
successively translated in space.

Each sequence consists of six time steps, and each beam of six
pixels. Just a single sequence is created manually. To create that
one sequence, we start from a single beam:

library(torchvision)

beams <- vector(mode = "list", length = 6)
beam <- torch_eye(6) %>% nnf_pad(c(6, 12, 12, 6)) # left, right, top, bottom
beams[[1]] <- beam

Using torch_roll(), we create a pattern where this beam moves
up diagonally, and stack the individual tensors along the timesteps
dimension.

for (i in 2:6) {
  beams[[i]] <- torch_roll(beam, c(-(i-1), i-1), c(1, 2))
}

init_sequence <- torch_stack(beams, dim = 1)

That’s a single sequence. Thanks to
torchvision::transform_random_affine(), we almost effortlessly
produce a dataset of a hundred sequences. Moving beams start at
random points in the spatial frame, but they all share that
upward-diagonal motion.

sequences <- vector(mode = "list", length = 100)
sequences[[1]] <- init_sequence

for (i in 2:100) {
  sequences[[i]] <- transform_random_affine(init_sequence, degrees = 0, translate = c(0.5, 0.5))
}

input <- torch_stack(sequences, dim = 1)

# add channels dimension
input <- input$unsqueeze(3)
dim(input)
# [1] 100 6 1 24 24

That’s it for the raw data. Now we still need a dataset and a
dataloader. Of the six time steps, we use the first five as input
and try to predict the last one.

dummy_ds <- dataset(
  
  initialize = function(data) {
    self$data <- data
  },
  
  .getitem = function(i) {
    list(x = self$data[i, 1:5, ..], y = self$data[i, 6, ..])
  },
  
  .length = function() {
    nrow(self$data)
  }
)

ds <- dummy_ds(input)
dl <- dataloader(ds, batch_size = 100)

Here is a tiny-ish convLSTM, trained for motion prediction:

model <- convlstm(input_dim = 1, hidden_dims = c(64, 1), kernel_sizes = c(3, 3), n_layers = 2)

optimizer <- optim_adam(model$parameters)

num_epochs <- 100

for (epoch in 1:num_epochs) {
  
  model$train()
  batch_losses <- c()
  
  for (b in enumerate(dl)) {
    
    optimizer$zero_grad()
    
    # last-time-step output from last layer
    preds <- model(b$x)[[2]][[2]][[1]]
    
    loss <- nnf_mse_loss(preds, b$y)
    batch_losses <- c(batch_losses, loss$item())
    
    loss$backward()
    optimizer$step()
  }
  
  if (epoch %% 10 == 0)
    cat(sprintf("\nEpoch %d, training loss:%3f\n", epoch, mean(batch_losses)))
}
Epoch 10, training loss:0.008522
Epoch 20, training loss:0.008079
Epoch 30, training loss:0.006187
Epoch 40, training loss:0.003828
Epoch 50, training loss:0.002322
Epoch 60, training loss:0.001594
Epoch 70, training loss:0.001376
Epoch 80, training loss:0.001258
Epoch 90, training loss:0.001218
Epoch 100, training loss:0.001171

Loss decreases, but that in itself is not a guarantee the model
has learned anything. Has it? Let’s inspect its forecast for the
very first sequence and see.

For printing, I’m zooming in on the relevant region in the
24×24-pixel frame. Here is the ground truth for time step six:

b$y[1, 1, 6:15, 10:19]
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0

And here is the forecast. This does not look bad at all, given
there was neither experimentation nor tuning involved.

round(as.matrix(preds[1, 1, 6:15, 10:19]), 2)
       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10]
 [1,]  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00     0
 [2,] -0.02  0.36  0.01  0.06  0.00  0.00  0.00  0.00  0.00     0
 [3,]  0.00 -0.01  0.71  0.01  0.06  0.00  0.00  0.00  0.00     0
 [4,] -0.01  0.04  0.00  0.75  0.01  0.06  0.00  0.00  0.00     0
 [5,]  0.00 -0.01 -0.01 -0.01  0.75  0.01  0.06  0.00  0.00     0
 [6,]  0.00  0.01  0.00 -0.07 -0.01  0.75  0.01  0.06  0.00     0
 [7,]  0.00  0.01 -0.01 -0.01 -0.07 -0.01  0.75  0.01  0.06     0
 [8,]  0.00  0.00  0.01  0.00  0.00 -0.01  0.00  0.71  0.00     0
 [9,]  0.00  0.00  0.00  0.01  0.01  0.00  0.03 -0.01  0.37     0
[10,]  0.00  0.00  0.00  0.00  0.00  0.00 -0.01 -0.01 -0.01     0

This should suffice for a sanity check. If you made it to the
end, thanks for your patience! In the best case, you’ll be able
to apply this architecture (or a similar one) to your own data.

Categories
Misc

Paid ML gigs: Get compensated while further sharpening your skills on your own schedule.

submitted by /u/MLtinkerer


Categories
Misc

`set_session` is not available when using TensorFlow 2.0.

Hi All.

I am using Keras and TensorFlow 2.0. I have code that tries to
set the number of inter- and intra-op threads. I have added the
session stuff for compatibility, but it still won’t work right.

from keras import backend as K

….

….

import tensorflow as tf

session_conf = tf.compat.v1.ConfigProto(
    inter_op_parallelism_threads=int(os.environ['NUM_INTER_THREADS']),
    intra_op_parallelism_threads=int(os.environ['NUM_INTRA_THREADS']))

sess = tf.compat.v1.Session(
    graph=tf.compat.v1.get_default_graph(),
    config=session_conf)

K.set_session(sess)

Then it blows up with:

RuntimeError: `set_session` is not available when using
TensorFlow 2.0.

Any advice?

submitted by /u/dunn_ditty


Categories
Misc

All AIs on Quality: Startup’s NVIDIA Jetson-Enabled Inspections Boost Manufacturing

Once the founder of a wearable computing startup, Arye Barnehama understands the toils of manufacturing consumer devices. He moved to Shenzhen in 2014 to personally oversee production lines for his brain waves-monitoring headband, Melon. It was an experience that left an impression: manufacturing needed automation. His next act is Elementary Robotics, which develops robotics for …

The post All AIs on Quality: Startup’s NVIDIA Jetson-Enabled Inspections Boost Manufacturing appeared first on The Official NVIDIA Blog.

Categories
Misc

Pinterest Trains Visual Search Faster with Optimized Architecture on NVIDIA GPUs

Pinterest now has more than 440 million reasons to offer the best visual search experience. That’s because its monthly active users are tracking this high for its popular image sharing and social media service. Visual search enables Pinterest users to search for images using text, screenshots or camera photos. It’s the core AI behind how …

The post Pinterest Trains Visual Search Faster with Optimized Architecture on NVIDIA GPUs appeared first on The Official NVIDIA Blog.

Categories
Misc

NVIDIA SimNet v20.12 Released

With this release, use cases such as heat sinks, data center cooling, aerodynamics and deformation of solids in linear elastic regime can be solved.

NVIDIA recently announced the release of SimNet v20.12 with support for new physics such as Fluid Mechanics, Linear Elasticity and Conductive as well as Convective Heat Transfer. Systems governed by Ordinary Differential Equations (ODEs) as well as Partial Differential Equations (PDEs) can now be solved. With this release, use cases such as heat sinks, data center cooling, aerodynamics and deformation of solids in linear elastic regime can be solved. 

Previously announced in Sep, NVIDIA SimNet is a Physics Informed Neural Networks (PINNs) toolkit for students and researchers who are either looking to get started with AI-driven physics simulations or are looking to leverage a powerful framework to implement their domain knowledge to solve complex nonlinear physics problems with real-world applications. 

SimNet v20.12 highlights 

Multi-parameter training of Complex Geometries and Physics:  

As a result of enhancements in network architectures as well as performance improvements, SimNet v20.12 converges to a lower loss faster. This enables training on several parameters in a single run. For a 10-parameter Limerock case, training and inference for 59,049 configurations (3 values for each design parameter) took 1,000 V100 GPU hours. For the same number of solver runs, the solver would take over 18.4 million hours (at 26 hours per configuration on a 12-core workstation).

Linear Elasticity in Solids:  

Linear elastic solid deformation is now included in the release in both Navier-Cauchy as well as Equilibrium forms. The solution has good agreement with finite element results.  

The stresses from the linear elasticity formulation from SimNet were used in a digital twin model, developed by University of Central Florida, using RNN to model fatigue crack growth in an aircraft panel.   

Improved STL geometry library: 

The PySDF library for STL geometries has been enhanced to deliver about 10x more performance, with better accuracy for complex geometries.

Integral form of Partial Differential Equations:  

Some physics problems have no classical PDE (or strong) form but only a variational (or weak) form. This requires handling the PDEs in a form other than their original (classical) one, especially for interface problems, concave domains, singular problems, etc. In SimNet, the PDEs can be solved not only in their strong form, but also in their weak form.

For example, a point source represented by delta Dirac function cannot be solved by the differential equations based PINNs but an integral form can capture the singular behavior at the center. 

Strong Scaling Performance:   

For the multi-GPU cases, the learning rate is gradually increased from the baseline value; this allows the model to train without diverging early on, and to converge faster as a result of the increased global batch size coupled with the increased learning rate. For the NVSwitch heat sink case, the loss function evolution as the number of GPUs is increased from 1 to 16 shows progressive scaling, from 2x for the 2-GPU case to 8x for the 16-GPU case.

SimNet in other news / events: 

Read the paper, NVIDIA SimNet: an AI-accelerated multi-physics simulation framework here

Give SimNet v20.12 a try by requesting access today. 

Categories
Misc

Refreshing a Live Service Game

We talked to Haiyong Qian, NetEase Game Engine Development Researcher and Manager of NetEase Thunder Fire Games Technical Center, to see what he’s learned as the Justice team added NVIDIA ray-tracing solutions to their development pipeline.

How NetEase Thunder Fire Games keeps their Massively Multiplayer Online Game (MMO)  “Justice” looking new years after release.

Delivering an endless stream of content to players in a live service game is an enormous undertaking. Managing that responsibility while staying graphically competitive is a herculean feat. 

The fidelity bar is constantly being raised. Most games released in 2018 didn’t support real-time ray-tracing. Now, it’s a feature that players expect in cutting-edge games, and it’s been integrated into a wide range of titles. Justice – NetEase’s popular Chinese MMO – runs on an engine that debuted in 2012, but the game is beautiful by 2020 standards. This is thanks to talented artists, smart engine design, and the integration of real-time ray tracing and DLSS.

We talked with Haiyong Qian, NetEase Game Engine Development Researcher and Manager of NetEase Thunder Fire Games Technical Center, to see what he’s learned as the Justice team added NVIDIA ray-tracing solutions to their development pipeline.

NVIDIA: What is the development team size for Justice?

Qian: More than 300 members in the whole development team, with 20 members on the game engine tech team.

NVIDIA: Why did you decide to add ray traced effects into the game?

Qian: Applying ray tracing technology to real-time rendering, especially in gaming, has always been a dream of ours as game developers, but it was impossible to achieve before due to performance limitations. In 2018, NVIDIA launched the first RTX GPUs, which paved the way for this dream to come true, and we did not hesitate to try it in Justice.

NVIDIA: A lot of developers starting out with real-time ray tracing struggle with performance because they try to make everything reflective. Do you have any advice on materials to use when building an environment that will be ray traced?

Qian: There are still many optimization methods. For example, materials with high roughness in the scene do not need to participate in ray tracing. In addition, if the game engine is based on a deferred rendering architecture, rays can be emitted in screen space based on the G-buffer information to reduce the number of ray bounces.

NVIDIA: How long did it take to add RTXGI to your game? What does RTXGI do to improve the look of the game?

Qian: Before integrating RTXGI, we had already completed the DX12 upgrade of our game engine and the RT & DLSS integration. With that work done, adding RTXGI to the game was an easy task, which took about 2 weeks to finish. RTXGI solves some problems of traditional GI: light leaking and excessively long baking times. It also supports dynamic light sources, which greatly improves the expressiveness of the scene.

NVIDIA: What were your team’s biggest personal learnings about real-time ray tracing from working on Justice?

Qian: First of all, if there is a breakthrough in technology, there must be sufficient accumulation. Secondly, the combined team effort is very important. Without the close cooperation among our team members and the dedicated collaboration with NVIDIA China team, this could not be possible.

NVIDIA: How were you able to balance computationally expensive real-time ray tracing features with performance?

Qian: Justice is an open-world MMO game. RT features are currently enabled in several suitable scenes, which achieves a good balance between image quality and performance. And of course, with the help of the killer app, DLSS, we will gradually enable RT in more and more scenes.

NVIDIA: Did you experience any bottlenecks or challenges when incorporating real-time ray tracing into the demo? If so, how did you overcome them?

Qian: As the first in-house game engine in China to integrate RTX functionality, we faced tons of difficulties and challenges. For more than 2 months, our entire team basically slept only 3 or 4 hours a day. There were endless tech issues to be solved. I would like to take this chance to thank the NVIDIA China team for their generous help in overcoming these difficulties one by one; we finally made it to today’s accomplishment.

NVIDIA: What has been made easier for your team with the integration of real-time ray tracing and DLSS into your pipeline?

Qian: It’s the advanced architecture of our engine. We can adopt RT and DLSS with only minor modifications to the render pipeline. Instead, the DX12 API upgrade accounted for the largest workload during the whole RTX development process.

NVIDIA: How is real-time ray tracing and DLSS changing game development?

Qian: From the perspective of artists, it brings out a brand-new content creation pipeline and a better, richer visual quality. And from a game design perspective, RT can bring whole new gameplay elements.

NVIDIA: How has your audience responded to the new look of your game, after real-time raytracing and DLSS has been added?

Qian: Players are very excited. They have fully affirmed the performance of RT and DLSS. You can see this on Weibo and in the Baidu Tieba gamer communities. There is also a lot of feedback from overseas players on YouTube. Of course, after our RTX content was featured on stage at GTC China 2018 and in Jensen’s CES 2019 keynote, we were confident that Chinese game content would be accepted worldwide.

NVIDIA: If you had built the game from the ground up with real-time ray tracing and DLSS in mind, what would you have done differently?

Qian: We would probably have considered DX12 API support in our engine from the very beginning.

NVIDIA: Are you planning to release any other games with real-time ray tracing and DLSS?

Qian: Yes, there are several games under development by NetEase Thunder Fire studio which will feature RTX technologies. Please stay tuned. 

NVIDIA: What real-time ray tracing effect in Justice are you most excited for your players to see?

Qian: All RT features can bring out more realistic representation of the game world. The most exciting one among them must be ray tracing reflections. 

NVIDIA: What advice would you give to other developers who are building live service games, and want to keep their games looking competitive graphically?

Qian: Our strategy is to provide players with the best experience, regardless of whether the game is still in development or has been released to the public. Therefore, as long as a technology can enhance the player’s gaming experience, we will go all out to implement it in the game. And of course, first of all, you should have a good engine with relatively good extensibility, because none of us knows what an advanced technology will look like in the future.

NVIDIA: Can you talk about any future plan you want to incorporate NVIDIA technology into Justice (such as NVIDIA Real-Time Denoiser, RTX Direct Illumination, etc.)?

Qian: In terms of technology, we have always been radical, and we will constantly push various new tech features, including those you mentioned. As long as these technologies can improve our players’ experience, we’ll work on them.

NVIDIA: What made you decide on integrating the latest ray tracing technology which no games have tried before, e.g. releasing the first real-time RTX demo in China and the first RTXGI powered game in the world?

Qian: I will summarize two main points: 1) It fulfills the dream that real-time ray tracing can be applied to games; 2) We think these technologies can bring our players a better game experience.

NVIDIA: Adding ray tracing effects into an in-house game engine could be more challenging than using existing commercial engines. If so, what are the challenges and what are the strengths of Justice’s engine?

Qian: I think the difficulties are the same, but the difficulties of using commercial engines have been solved by others; for an in-house engine, we must overcome them ourselves. As I said before, the biggest challenge was upgrading the engine to DX12, because when we designed this engine 8 years ago, DX12 had not yet been released, and its features were unforeseen at the time. Another big challenge was how to balance RT effects and performance. Fortunately, our team has very rich experience in independent research, and our engine architecture allows full freedom of horizontal and vertical expansion. The NVIDIA China content team also gave us very strong support. Eventually, these tasks were successfully accomplished.

Categories
Misc

Deep regression

Have people been using deep learning to do regression? I noticed
that fitting polynomials using least squares leads to much better
accuracy! Is there any rule of thumb to get arbitrary accuracy with
deep regression?

submitted by /u/matibilkis
