I’m using tensorflow.js and I need a layer that can take in an image and output that image resized to a new resolution (bilinear filtering is fine). I can’t find one in the tf.js API, so I’m not sure what I can use. I need to make sure the model can still be serialized to disk, so I think writing a custom layer class might be off the table.
I did this on Linux Mint, an Ubuntu variant, and I did run pip install tensorflow and all of that, though when I try pip3 install tensorflow, I get a wall of red text.
Relevant code is:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Defining the model using the default linear activation function
model = keras.Sequential()
model.add(layers.Dense(14))  # hidden layer with 14 units
model.add(layers.Dense(4))   # hidden layer with 4 units
model.add(layers.Dense(1))   # output layer producing a single value
2022-02-27 11:34:48.204164: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2022-02-27 11:34:48.204436: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2022-02-27 11:34:48.204446: W tensorflow/stream_executor/cuda/cuda_driver.cc:326] failed call to cuInit: UNKNOWN ERROR (303)
2022-02-27 11:34:48.204464: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (term-IdeaPad-Flex): /proc/driver/nvidia/version does not exist
2022-02-27 11:34:48.204624: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-27 11:34:48.204886: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
I have an all-AMD laptop, but I don’t see why NVIDIA would be needed when I see people using TensorFlow in virtual machines. If you know of a way for me to fix this, or of a different site I could upload my code to and have it work, let me know.
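For anyone who hits the same log output: these are informational warnings, not errors; with no NVIDIA driver present, TensorFlow simply falls back to the CPU. A minimal sketch for checking which devices TensorFlow sees and quieting the log output (the environment variable must be set before the import):

import os
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"  # hide INFO and WARNING messages; must be set before importing tensorflow

import tensorflow as tf

# On an all-AMD machine the GPU list is empty and TensorFlow runs on the CPU.
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))
print("CPUs visible to TensorFlow:", tf.config.list_physical_devices("CPU"))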
I am in the process of learning TensorFlow and I am wondering if TF is a workable solution for what I am trying to achieve. My side project is a chess website where users can submit their chess ratings, and the website then uses their data to compare ratings between different chess websites and orgs. My data set currently has around 7,500 rows and looks like this:
My backend is a Python API that is hosted on Heroku. What I would like to achieve is this: once a player enters their ratings, I use machine learning/TensorFlow to predict each rating they leave empty (each null value) for that player. Is that doable with TF in a backend hosted on Heroku?
Also, if anyone has any tips to point me in the right direction, they are most welcome. I should also note that I suspect this might not be the most appropriate use of TF, or that TF might not be the best solution, but I am using this side project to grow and demonstrate skills I take interest in.
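Purely as an illustration of the kind of model this could be, a minimal sketch that predicts one site’s rating from the others; the file name, column names, and architecture are assumptions, not from the post:

import numpy as np
import pandas as pd
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical data: one row per player, one column per rating source.
df = pd.read_csv("ratings.csv")
feature_cols = ["lichess", "chesscom", "uscf"]  # ratings the player did enter
target_col = "fide"                             # rating to predict when it is missing

train_df = df.dropna(subset=feature_cols + [target_col])
X = train_df[feature_cols].to_numpy(dtype=np.float32)
y = train_df[target_col].to_numpy(dtype=np.float32)

normalizer = layers.Normalization()
normalizer.adapt(X)  # learn per-column mean and variance from the training data

model = keras.Sequential([
    normalizer,
    layers.Dense(32, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=50, validation_split=0.2, verbose=0)

# Predict the missing rating for a new player who entered only the feature columns.
new_player = np.array([[1850.0, 1900.0, 1700.0]], dtype=np.float32)
print("Predicted rating:", float(model.predict(new_player, verbose=0)[0, 0]))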
I found a very similar issue to the one I am having on Stack Overflow here, but I’m posting here because no one figured it out. That post pretty much explains the issue I am having, but I am using an ImageDataGenerator instead. I can’t seem to get it to work within the train_step of a custom Keras model. Any help is appreciated.
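Without the linked post it is hard to know the exact failure, so purely to pin down the setup being described, here is the standard TF 2.x pattern for overriding train_step and feeding the model from an ImageDataGenerator; the directory, image size, and batch size are placeholders:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator


class CustomModel(keras.Model):
    def train_step(self, data):
        x, y = data  # the generator yields (images, labels) batches
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}


inputs = keras.Input(shape=(64, 64, 3))
outputs = keras.layers.Dense(10, activation="softmax")(keras.layers.Flatten()(inputs))
model = CustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

gen = ImageDataGenerator(rescale=1.0 / 255).flow_from_directory(
    "data/train", target_size=(64, 64), batch_size=32, class_mode="sparse"
)
model.fit(gen, epochs=5, steps_per_epoch=len(gen))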
I wrote a solution to the problem first with NumPy, then optimized it with Numba. When it comes to tensors, I have no idea how to even start the optimization. Are there any easy-to-follow tutorials out there? How do I even tell TensorFlow the rules and goal of the optimization?
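As a generic sketch only (the quadratic objective below is a placeholder, not the poster’s actual problem), this is the usual way to express an optimization goal in TensorFlow: define variables, write the objective as ops on them, and let an optimizer minimize it using gradients from tf.GradientTape:

import tensorflow as tf

x = tf.Variable([0.0, 0.0])  # parameters to optimize

def objective():
    # Placeholder goal: minimize (x0 - 3)^2 + (x1 + 1)^2
    return (x[0] - 3.0) ** 2 + (x[1] + 1.0) ** 2

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
for step in range(100):
    with tf.GradientTape() as tape:
        loss = objective()
    grads = tape.gradient(loss, [x])
    opt.apply_gradients(zip(grads, [x]))

print("optimum found:", x.numpy(), "loss:", float(objective()))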
Hey (obligatory: I am pretty noob). I’m training a neural network with TensorFlow (deuh) and during training the network takes about 300 microseconds per sample. This is with dropout, layer normalization, and backpropagation of course, and with 4 threads. During prediction, however, I can only predict one sample at a time (due to external needs). I would expect this to take about 1 ms or even less, but when actually timing it I get more like 12 ms. Are there any ways I can speed this up, and how much could a less complex neural network gain me?
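As a hedged illustration of one commonly suggested approach (not necessarily the fix for this particular model): model.predict() carries noticeable per-call overhead for single samples, so calling the model directly, or through a tf.function-compiled wrapper, is often much faster. The model and shapes below are placeholders:

import time
import numpy as np
import tensorflow as tf
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    keras.layers.Dense(1),
])

@tf.function
def fast_infer(x):
    return model(x, training=False)

sample = np.random.rand(1, 32).astype(np.float32)
fast_infer(tf.constant(sample))  # first call traces and compiles the graph

for label, fn in [("model.predict", lambda x: model.predict(x, verbose=0)),
                  ("direct call", lambda x: model(x, training=False)),
                  ("tf.function", lambda x: fast_infer(tf.constant(x)))]:
    start = time.perf_counter()
    for _ in range(100):
        fn(sample)
    print(f"{label}: {(time.perf_counter() - start) / 100 * 1e3:.2f} ms/sample")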
NVIDIA delivers industry-leading SDN performance benchmark results
The NVIDIA BlueField-2 data processing unit (DPU) delivers unmatched software-defined networking (SDN) performance, programmability, and scalability. It integrates eight Arm CPU cores, the secure and advanced ConnectX-6 Dx cloud network interface, and hardware accelerators that together offload, accelerate, and isolate SDN functions, performing connection tracking, flow matching, and advanced packet processing.
This post outlines the basic tenets of an accurate SDN performance benchmark and demonstrates the actual results achievable on the NVIDIA ConnectX-6 Dx with accelerators enabled. The BlueField-2 and next-generation BlueField-3 DPUs include additional acceleration capabilities and offer higher performance for a broader range of use cases.
SDN performance benchmark best practices
Any SDN performance evaluation of the BlueField DPUs or ConnectX SmartNICs should leverage the full power of the hardware accelerators. BlueField-2’s packet processing actions are programmable through the NVIDIA ASAP2 (accelerated switching and packet processing) engine. The SDN accelerators featured on both the BlueField DPUs and ConnectX SmartNICs rely on ASAP2 and other programmable hardware accelerators to achieve line-rate networking performance.
NVIDIA ASAP2 support has been integrated into the upstream Linux Kernel and the Data Plane Development Kit (DPDK) framework and is readily available in a range of Linux OS distributions and cloud management platforms.
Connection tracking acceleration is available starting with Linux Kernel 5.6. The best practice is to use a modern enterprise Linux OS, for example, Ubuntu 20.04, Red Hat Enterprise Linux 8.4, and so on. These newer kernels include inbox support for SDN connection tracking acceleration on ConnectX-6 Dx SmartNICs and BlueField-2 DPUs. Benchmarking SDN connection tracking on a Linux system with an outdated kernel would be misleading.
Finally, for any SDN benchmark to be effective, it must be representative of SDN pipelines implemented in real-world cloud data centers, where hundreds of thousands of connections are the norm. Both ConnectX-6 Dx SmartNICs and BlueField-2 DPUs are designed for, and deployed in, hyperscale environments, and deliver breakthrough network performance at cloud scale.
Accelerated SDN performance
Now look at the NVIDIA ConnectX-6 Dx performance. The following benchmarks show the throughput and latency of an SDN pipeline with connection tracking hardware acceleration enabled. We ran the tests using a system setup, testing tools, and procedures similar to other reported results, and we ran Open vSwitch (OVS) with DPDK to seamlessly enable connection tracking acceleration on the ConnectX-6 Dx SmartNIC.
The following charts show the SDN performance observed with the iperf3 tool for 4 and 16 iperf instances, with one flow per instance.
Figure 1. Observed SDN performance with the iperf3 tool for 4 instances
Figure 2. Observed SDN performance with 16 iperf instances
Key findings:
ConnectX-6 Dx provides higher throughput, achieving up to 120% and 150% higher throughput for 4 and 16 instances, respectively, across all packet sizes tested.
ConnectX-6 Dx achieves more than 90% of line rate for packets as small as 1 KB, whereas the other offerings need 8-KB packets to do so.
The following chart shows the observed performance for an SDN pipeline with 32 instances on the same system setup. The results show that ConnectX-6 Dx provides much better scaling as the number of flows increases and up to 4x higher throughput.
Figure 3. Observed SDN performance with 32 iperf instances
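For context on how a run like the ones above can be driven, here is a purely illustrative Python sketch (not the actual harness behind these results) that launches multiple single-flow iperf3 clients and sums their reported throughput; the server address, port range, and duration are assumptions:

# Illustrative only: each iperf3 server must already be running (iperf3 -s -p <port>).
import json
import subprocess

SERVER = "192.168.1.10"   # assumed address of the system under test
BASE_PORT = 5201          # assumed base port, one iperf3 server per port
NUM_INSTANCES = 4         # e.g. 4, 16, or 32 instances, one flow each
DURATION = 30             # seconds per run

procs = [
    subprocess.Popen(
        ["iperf3", "-c", SERVER, "-p", str(BASE_PORT + i),
         "-t", str(DURATION), "-P", "1", "-J"],
        stdout=subprocess.PIPE,
    )
    for i in range(NUM_INSTANCES)
]

total_gbps = 0.0
for p in procs:
    out, _ = p.communicate()
    report = json.loads(out)
    total_gbps += report["end"]["sum_received"]["bits_per_second"] / 1e9

print(f"{NUM_INSTANCES} instances, aggregate throughput: {total_gbps:.1f} Gb/s")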
The following benchmark measures latency using sockperf. The results indicate that ConnectX-6 Dx provides ~20-30% lower latency compared to other offerings for all packet sizes that were tested.
Figure 4. Observed one-way latency for an SDN pipeline with connection tracking
Non-accelerated connection tracking implementations create bottlenecks on the host CPU. Offloading connection tracking to the on-chip accelerators means the performance achieved in these benchmarks is not strongly dependent on the host CPU or its ability to drive the test bench. These results are also indicative of the performance achievable on the BlueField-2 DPU, which integrates ConnectX-6 Dx.
BlueField-3 supports higher performance levels
NVIDIA welcomes the opportunity to test and showcase the performance of ConnectX-6 Dx and BlueField-2 while also adhering to industry best practices and operating standards. The data shown in this post compares the performance benchmark results for ConnectX-6 Dx to results reported elsewhere. The ConnectX-6 Dx provides up to 4x higher throughput and up to 30% lower latency compared to other offerings. These benchmark results demonstrate NVIDIA's leadership position in SDN acceleration technologies.
BlueField-3 is the next-generation NVIDIA DPU and integrates the advanced ConnectX-7 adapter and additional acceleration engines. Providing 400 Gb/s networking, more powerful Arm CPU cores, and a highly programmable Datapath Accelerator (DPA), BlueField-3 delivers even higher levels of performance and programmability to address the most demanding workloads in massive-scale data centers. Existing DPU-accelerated SDN applications built on BlueField-2 using DOCA will benefit from the performance enhancements that the BlueField-3 brings, without any code changes.
Learn more about modernizing your data center infrastructure with BlueField DPUs. Stay tuned for even higher SDN performance with BlueField-3 arriving in 2022.