Categories
Misc

Data Science

Hi, I'm very new to ML in general, and I'm overwhelmed by the choice of loss functions and accuracy metrics. I'm doing my master's project, and my supervisor wants me to teach myself TensorFlow and throw the data through a model.

I'm feeding in an array of particle data from a decay and asking the model to predict 1 or -1 at the end, representing the parity of the original particle.

What loss function and accuracy metric are best to use for something like this?

Other parameters: about 2 million events, with 6 particles in each decay/event and the four-momentum of each particle. That makes 24 input values.

The first layer is a tf.keras.layers.Flatten.
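
For a two-class target like this, one common setup (not from the post; the layer sizes and the remapping of labels from {-1, +1} to {0, 1} are my assumptions) is a sigmoid output trained with binary cross-entropy. A minimal sketch:

import tensorflow as tf

# Sketch: 6 particles x 4 momentum components = 24 inputs per event.
# Labels are assumed remapped from {-1, +1} to {0, 1}: y = (y_raw + 1) // 2.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(6, 4)),
    tf.keras.layers.Flatten(),                       # -> 24 values
    tf.keras.layers.Dense(64, activation="relu"),    # illustrative size
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(parity = +1)
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.keras.metrics.BinaryAccuracy()],
)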

submitted by /u/DriftingRumour
[visit reddit] [comments]

Categories
Misc

Generating negative outputs with only nonnegative weights and biases.

I am trying to have a layer with only positive weights generate a partially negative output. What would be the least hacky way to achieve this? My initial idea was to use a modified activation function shifted by 1 along the x-axis; however, this feels a bit hacky to me, and I was wondering if there is a better way to achieve it.
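
A minimal sketch of the shifted-activation idea (the shift of 1 and the layer width are arbitrary illustrative choices, and it assumes the layer's inputs are also nonnegative): with nonnegative weights, biases, and inputs, the pre-activation z is >= 0, so tanh(z - 1) still reaches negative values.

import tensorflow as tf

# Sketch: Dense layer constrained to nonnegative weights and biases, with an
# activation shifted along the x-axis so outputs can be partially negative.
# For z >= 0, tanh(z - 1.0) lies roughly in [-0.76, 1).
layer = tf.keras.layers.Dense(
    32,
    kernel_constraint=tf.keras.constraints.NonNeg(),
    bias_constraint=tf.keras.constraints.NonNeg(),
    activation=lambda z: tf.math.tanh(z - 1.0),
)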

submitted by /u/salted_kinase
[visit reddit] [comments]

Categories
Misc

Storing a large Tensor in multiple GPUs

Hi Guys,

Is there a way to store a tensor across multiple GPUs? I have a tensor so large that it requires more than 16 GB of GPU RAM to store, and I was wondering how that can be achieved. Thanks!
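
One manual approach (a sketch, assuming two visible GPUs named /GPU:0 and /GPU:1; the split axis is an arbitrary choice) is to split the tensor and pin each shard to a device:

import tensorflow as tf

# Sketch: shard a large tensor across GPUs by splitting along axis 0 and
# materializing each piece on a different device.
def shard_across_gpus(big_tensor, num_gpus=2):
    pieces = tf.split(big_tensor, num_or_size_splits=num_gpus, axis=0)
    shards = []
    for i, piece in enumerate(pieces):
        with tf.device(f"/GPU:{i}"):
            shards.append(tf.identity(piece))  # copy the piece onto GPU i
    return shards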

submitted by /u/dr_meme_69
[visit reddit] [comments]

Categories
Misc

Correct way to get output of hidden layers after replacing some hidden layers?

Hello,

I am working on a project that replaces layers with new layers to see whether the changes affect the model positively or negatively. I then want to get the output feature map and input feature map after the replacement. The issue I am having is that after a couple of changes, the model summary shows multiple connections and a new column called 'Connected to' appears. Here are the summaries and the code I am using for replacing layers. I sometimes get this warning after replacing a convolutional layer with the code provided.

pastebin to warning and model summary

I have tried creating an Input layer and then using the same functional approach, with the Input layer first and the conv2d_0 layer second. However, after two layer changes I get a ValueError (graph disconnected) for the input layer.

Code:

inputs = self.model.layers[0].input
x = self.model.layers[0](inputs)
for layer in self.model.layers[1:]:
    if layer.name == layer_name:
        new_layer = ...  # creation of custom layer that generates output of same shape as replaced layer
        x = new_layer(x)
    else:
        layer.trainable = False
        x = layer(x)
self.model = tf.keras.Model(inputs, x)
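
One way to avoid the disconnected-graph error (a sketch, assuming a purely sequential model; rebuild_with_replacement is a hypothetical helper, not from the post) is to rebuild from a fresh Input tensor instead of reusing the old model's input:

import tensorflow as tf

# Sketch: rebuild a sequential model from a fresh Input tensor, swapping one
# layer by name, so every rebuild starts from a clean graph.
def rebuild_with_replacement(model, layer_name, new_layer):
    inputs = tf.keras.Input(shape=model.input_shape[1:])
    x = inputs
    for layer in model.layers:
        if isinstance(layer, tf.keras.layers.InputLayer):
            continue  # skip the old input layer; a fresh one was created above
        if layer.name == layer_name:
            x = new_layer(x)
        else:
            layer.trainable = False
            x = layer(x)
    return tf.keras.Model(inputs, x)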

submitted by /u/ElvishChampion
[visit reddit] [comments]

Categories
Misc

Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans

Four words: smart, sustainable, Super Bowl. Polestar's commercial during the big game made it clear no-compromise electric vehicles are now mainstream. Polestar Chief Operating Officer Dennis Nobelius sees driving enjoyment and autonomous-driving capabilities complementing one another in sustainable vehicles that keep driving — and the driver — front and center. NVIDIA's Katie Washabaugh spoke with Nobelius about the brand's plans.

The post Polestar’s Dennis Nobelius on the Sustainable Performance Brand’s Plans appeared first on NVIDIA Blog.

Categories
Misc

TTS mobile help

I'm trying to implement fastspeech_quant.tflite in a Flutter app, using the tflite_flutter package. I've loaded the model like this: Interpreter _interpreter = await Interpreter.fromAsset('fastspeech_quant.tflite');

Next I wanted to run inference on some text, so I would use _interpreter.runForMultipleInputs(input, output).

I just don't understand how to format the input and output for the model. So I ran _interpreter.getInputTensors() and I get:

[
Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6b16ef80, name: input_ids, type: TfLiteType.int32, shape: [1, 1], data: 4},
Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6b16eff0, name: speaker_ids, type: TfLiteType.int32, shape: [1], data: 4},
Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6b16f060, name: speed_ratios, type: TfLiteType.float32, shape: [1], data: 4},
Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6b16f0d0, name: f0_ratios, type: TfLiteType.float32, shape: [1], data: 4},
Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6b16f140, name: energy_ratios, type: TfLiteType.float32, shape: [1], data: 4}
]

_interpreter.getOutputTensors() gives me:

[
Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6ae108e0, name: Identity, type: TfLiteType.float32, shape: [1, 1, 80], data: 320},
Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6ae118a0, name: Identity_1, type: TfLiteType.float32, shape: [1, 1, 80], data: 320},
Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6ae05820, name: Identity_2, type: TfLiteType.int32, shape: [1, 1], data: 4},
Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6ae04e10, name: Identity_3, type: TfLiteType.float32, shape: [1, 1], data: 4},
Tensor{_tensor: Pointer<TfLiteTensor>: address=0x6f6ae03750, name: Identity_4, type: TfLiteType.float32, shape: [1, 1], data: 4}
]

I need an example of how I would go about it. I’ve combed through examples but it’s just not clicking for me.
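
For reference, here is a sketch driving the same five inputs from Python's tf.lite.Interpreter (the token ids and ratio values are placeholders, and the tensor names in get_input_details() may carry prefixes in some exports). In tflite_flutter, runForMultipleInputs takes the inputs as a List in the same tensor order and the outputs as a Map from output index to a reusable buffer.

import numpy as np
import tensorflow as tf

# Sketch: exercise fastspeech_quant.tflite from Python to see the expected
# shapes and dtypes of the five inputs listed above.
interpreter = tf.lite.Interpreter(model_path="fastspeech_quant.tflite")
index_of = {d["name"]: d["index"] for d in interpreter.get_input_details()}

ids = np.array([[1, 2, 3, 4, 5]], dtype=np.int32)  # placeholder token ids
# input_ids has a dynamic sequence dimension, so resize before allocating.
interpreter.resize_tensor_input(index_of["input_ids"], ids.shape)
interpreter.allocate_tensors()

interpreter.set_tensor(index_of["input_ids"], ids)
interpreter.set_tensor(index_of["speaker_ids"], np.array([0], dtype=np.int32))
interpreter.set_tensor(index_of["speed_ratios"], np.array([1.0], dtype=np.float32))
interpreter.set_tensor(index_of["f0_ratios"], np.array([1.0], dtype=np.float32))
interpreter.set_tensor(index_of["energy_ratios"], np.array([1.0], dtype=np.float32))
interpreter.invoke()

# Per the dump above, the first two outputs are mel spectrograms [1, time, 80];
# index 1 (Identity_1) is assumed here to be the post-net mel.
mel = interpreter.get_tensor(interpreter.get_output_details()[1]["index"])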

submitted by /u/kai_zen_kid
[visit reddit] [comments]

Categories
Misc

Build Mainstream Servers for AI Training and 5G with the NVIDIA H100 CNX

Learn about the H100 CNX, an innovative new hardware accelerator for GPU-accelerated I/O intensive workloads.

There is an ongoing demand for servers with the ability to transfer data from the network to a GPU at ever faster speeds. As AI models keep getting bigger, the sheer volume of data needed for training requires techniques such as multinode training to achieve results in a reasonable timeframe. Signal processing for 5G is more sophisticated than in previous generations, and GPUs can help increase the speed at which this happens. Devices such as robots and sensors are also starting to use 5G to communicate with edge servers for AI-based decisions and actions.

Purpose-built AI systems, such as the recently announced NVIDIA DGX H100, are specifically designed from the ground up to support these requirements for data center use cases. Now, another new product can help enterprises also looking to gain faster data transfer and increased edge device performance, but without the need for high-end or custom-built systems.

Announced by NVIDIA CEO Jensen Huang at NVIDIA GTC last week, the NVIDIA H100 CNX is a high-performance package for enterprises. It combines the power of the NVIDIA H100 with the advanced networking capabilities of the NVIDIA ConnectX-7 SmartNIC. Available as a PCIe board, this advanced architecture delivers unprecedented performance for GPU-powered, I/O-intensive workloads in mainstream data center and edge systems.

Design benefits of the H100 CNX 

In standard PCIe devices, the control plane and data plane share the same physical connection. However, in the H100 CNX, the GPU and the network adapter connect through a direct PCIe Gen5 channel. This provides a dedicated high-speed path for data transfer between the GPU and the network using GPUDirect RDMA and eliminates bottlenecks of data going through the host. 

Figure 1. High-level architecture of the H100 CNX: the GPU and the SmartNIC connect through an integrated PCIe Gen5 switch, the network attaches via two 200 Gbps links or one 400 Gbps link, and the host CPU (with a directly attached PCIe NVMe drive) connects to the same switch.

With the GPU and SmartNIC combined on a single board, customers can leverage servers at PCIe Gen4 or even Gen3. Achieving a level of performance once only possible with high-end or purpose-built systems saves on hardware costs. Having these components on one physical board also improves space and energy efficiency.

Integrating a GPU and a SmartNIC into a single device creates a balanced architecture by design. In systems with multiple GPUs and NICs, a converged accelerator card enforces a 1:1 ratio of GPU to NIC. This avoids contention on the server’s PCIe bus, so the performance scales linearly with additional devices. 

Core acceleration software libraries from NVIDIA such as NCCL and UCX automatically make use of the best-performing path for data transfer to GPUs. Existing accelerated multinode applications can take advantage of the H100 CNX without any modification, so customers can immediately benefit from the high performance and scalability.

H100 CNX use cases

The H100 CNX delivers GPU acceleration along with low-latency and high-speed networking. This is done at lower power, with a smaller footprint and higher performance than two discrete cards. Many use cases can benefit from this combination, but the following are particularly notable. 

5G signal processing

5G signal processing with GPUs requires data to move from the network to the GPU as quickly as possible, and predictable latency is critical too. NVIDIA converged accelerators combined with the NVIDIA Aerial SDK provide the highest-performing platform for running 5G applications. Because data doesn't go through the host PCIe system, processing latency is greatly reduced. This performance gain is seen even when using commodity servers with slower PCIe systems.

Accelerating edge AI over 5G

NVIDIA AI-on-5G is made up of the NVIDIA EGX enterprise platform, the NVIDIA Aerial SDK for software-defined 5G virtual radio access networks, and enterprise AI frameworks, including SDKs such as NVIDIA Isaac and NVIDIA Metropolis. Edge devices such as video cameras, industrial sensors, and robots can use AI and communicate with the server over 5G.

The H100 CNX makes it possible to provide this functionality in a single enterprise server, without deploying costly purpose-built systems. The same accelerator applied to 5G signal processing can be used for edge AI with the NVIDIA Multi-Instance GPU technology. This makes it possible to share a GPU for several different purposes.  

Multinode AI training

Multinode training involves data transfer between GPUs on different hosts. In a typical data center network, servers often run into various limits around performance, scale, and density. Most enterprise servers don't include a PCIe switch, so the CPU becomes a bottleneck for this traffic, and data transfer is bound by the speed of the host PCIe backplane. Although a 1:1 ratio of GPUs to NICs is ideal, the number of PCIe lanes and slots in the server can limit the total number of devices.

The design of the H100 CNX alleviates these problems. There is a dedicated path from the network to the GPU for GPUDirect RDMA to operate at near line speeds. The data transfer also occurs at PCIe Gen5 speeds regardless of the host PCIe backplane. Scaling up GPU power within a host can be done in a balanced manner, since the 1:1 ratio of GPUs to NICs is inherently achieved. A server can also be equipped with more acceleration power, since fewer PCIe lanes and device slots are required for converged accelerators than for discrete cards.

The NVIDIA H100 CNX is expected to be available for purchase in the second half of this year.  If you have a use case that could benefit from this unique and innovative product, contact your favorite system vendor and ask when they plan to offer it with their servers. 

Learn more about the NVIDIA H100 CNX.

Categories
Misc

Add bounds to output from network

I basically have a simple CNN that outputs a single value at the end. This number corresponds to an angle, so I know the bounds have to be between 0 and 359. My intuition tells me that if I were to somehow limit the final value to this range, instead of leaving it unbounded as with most activation functions, I would reach some form of convergence sooner.

To try this, I changed the last layer to apply the sigmoid activation function, then added an additional Lambda layer where I just multiplied the value by 359. However, this model still has a very high loss (using MSE, at times it's actually greater than 359², which leads me to believe I'm not actually bounding the output between 0 and 359).
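
A minimal sketch of the setup described (the convolutional stack and input size are placeholders; only the last two layers implement the bounding):

import tensorflow as tf

# Sketch: sigmoid output scaled into (0, 359). For targets in [0, 359], if
# this really is the final layer, MSE can never exceed 359**2, so a larger
# loss suggests the scaling layer is not in the model that was trained.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 1)),         # placeholder input size
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
    tf.keras.layers.Lambda(lambda x: 359.0 * x),      # map (0, 1) -> (0, 359)
])
model.compile(optimizer="adam", loss="mse")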

Is it a good idea to bound my output like this, and what would be the best way to implement it?

submitted by /u/NSCShadow
[visit reddit] [comments]

Categories
Misc

(Image Classification) High training accuracy and low validation accuracy

I have 15 classes, each with around 90 training images and 7 validation images. Am I doing something wrong, or are my images just really bad? The model is supposed to identify 15 different fish species, and some of them do look pretty similar. Any help is appreciated.

https://preview.redd.it/8pujy5nglfq81.png?width=616&format=png&auto=webp&s=d312eb69116499c6f3ab48891dcc937cabcedbda

https://preview.redd.it/56xx5pgllfq81.png?width=920&format=png&auto=webp&s=49be6d38b34df73be0de196aab9b477a44eaabfd

https://preview.redd.it/f9eejttnlfq81.png?width=1114&format=png&auto=webp&s=fc80852cf39a62bfa22417cdd3c95188898b3a1c
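
With only ~90 training images per class, a gap like this usually points to overfitting. A minimal augmentation sketch (the ranges are illustrative choices, not from the post) that is a common first step:

import tensorflow as tf

# Sketch: light augmentation layers placed at the front of an image model to
# stretch a small dataset.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])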

submitted by /u/IraqiGawad
[visit reddit] [comments]

Categories
Misc

Error: No gradients provided for any variable:

Code:

import numpy as np
import tensorflow as tf

state = np.append(np.ndarray.flatten(old_field), next_number).reshape(1, -1)
next_q = target(np.append(np.ndarray.flatten(game.field), action).reshape(1, -1))  # renamed from `next`, which shadows a builtin
nextState = game.field
target_q_values = (next_q * 0.9) + reward  # no need for max? in theory, not taking the maximum
train_step(state, target_q_values)  # pass inputs and targets; the loss is computed inside the tape

Train_step:

@tf.function
def train_step(state, target_q_values):
    # The forward pass and the loss must both be computed inside the
    # GradientTape; computing the loss outside it is why no gradients
    # were provided for any variable.
    with tf.GradientTape() as tape:
        current = model(state)
        loss = mse(target_q_values, current)
    gradient = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradient, model.trainable_variables))

submitted by /u/Striking-Warning9533
[visit reddit] [comments]