Categories
Misc

Meet the Researcher: Lorenzo Baraldi, Artificial Intelligence for Vision, Language and Embodied AI

‘Meet the Researcher’ is a monthly series in which we spotlight different researchers in academia who are using NVIDIA technologies to accelerate their work. This month, we spotlight Lorenzo Baraldi, Assistant Professor at the University of Modena and Reggio Emilia in Italy.

Before working as a professor, Baraldi was a research intern at Facebook AI Research. He serves as an Associate Editor of the journal Pattern Recognition Letters and works on the integration of Vision, Language, and Embodied AI.

What are your research areas of focus?

I work within the AImageLab research group on Computer Vision and Deep Learning. I focus mainly on the integration of vision, language, and action. The final goal of our research is to develop agents that can perceive and act in our world while being capable of communicating with humans.

What motivated you to pursue this research area of focus?

Combining the ability to perceive the visual world around us with the abilities to act and to express ourselves in natural language is something that humans do quite naturally, and it is one of the keys to human intelligence. In the last few years, we have witnessed tremendous achievements in areas that consider only one of those abilities: Computer Vision, Natural Language Processing, and Robotics. How to combine these abilities, however, is still not understood and is a thrilling field of research.

Tell us about your current research projects.

We are mainly working in three directions. First, we integrate vision and language, for example by developing algorithms that can describe images in natural language; a recent paper from this line of work, presented at CVPR, introduced a Transformer-based model for image captioning. Second, we integrate vision and action by developing agents for autonomous navigation. We are interested in agents that move in indoor and outdoor scenarios and possibly interact with people, including in crowded situations. Third, we integrate all of this with the ability to understand language, for instance by training agents that can move by following an instruction, or curiosity-driven agents that can describe what they see along their path.

Overview of Baraldi’s image captioning approach. Building on a Transformer-like encoder-decoder architecture, the approach includes a memory-aware region encoder that augments self-attention with memory vectors.
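To make the caption concrete, here is a minimal NumPy sketch of self-attention over image regions with extra memory slots. It is not the authors' implementation: the caption only says that self-attention is augmented with memory vectors, and appending learnable memory keys and values to the projected regions is one reading of that, assumed here. The function name, shapes, and random weights are all illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def memory_augmented_attention(regions, w_q, w_k, w_v, mem_k, mem_v):
    """Single-head attention over regions plus memory slots (illustrative).

    regions:        (n_regions, d) region features
    w_q, w_k, w_v:  (d, d) projection matrices
    mem_k, mem_v:   (n_mem, d) memory keys and values (learned in a real model)
    """
    q = regions @ w_q
    # Memory slots are appended to the keys and values only, so queries
    # can attend to them but they never act as queries themselves.
    k = np.concatenate([regions @ w_k, mem_k], axis=0)
    v = np.concatenate([regions @ w_v, mem_v], axis=0)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_regions, n_mem, d = 5, 3, 8
regions = rng.normal(size=(n_regions, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
mem_k = rng.normal(size=(n_mem, d))
mem_v = rng.normal(size=(n_mem, d))

out, weights = memory_augmented_attention(regions, w_q, w_k, w_v, mem_k, mem_v)
print(out.shape, weights.shape)
```

Each region gets one output vector of size d, while its attention weights span the regions plus the memory slots, which is how information not present in the current image can enter the encoding.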

What problems or challenges does your research address?

I think one of the main challenges we need to solve is finding the right way of integrating multi-modal information, which can come from visual, textual, or motor perception. In other words, we need to find the right architecture for dealing with this information, which is why a lot of our research involves the design of new architectures. Secondly, most of the approaches we design are generative and sequential: we generate sentences, we generate actions or paths for robots, and so on. Again, how to generate sequences conditioned on multi-modal information is still a challenge.

Sentences generated on the ACRV Robotic Vision Challenge dataset.

What is the (expected) impact of your work on the field/community/world?

If the research efforts that the community is devoting to this area are successful, we will have algorithms that can understand us and help us in our daily lives, seeing with us and acting in the world on our behalf. I think in the long run this might also change the way we interact with computers, which might become a lot easier and language-based.

How have you used NVIDIA technology either in your current or previous research?

Performing large-scale training on NVIDIA GPUs is one of the most important ingredients powering our research, and I am sure this will become even more important in the near future. We do that locally, with a distributed GPU cluster in our lab, and at a bigger scale in conjunction with CINECA, the Italian supercomputing center, and with the NVIDIA AI Technical Centre (NVAITC) of Modena. The partnership we have with NVAITC and CINECA has not only increased our computational capacity, but has also provided us with the knowledge and support we needed to get the most out of the technologies NVIDIA provides. I would say this collaboration is having an important impact on our research capabilities.

Did you achieve any breakthroughs in that research or any interesting results using NVIDIA technology?

Most, if not all, of the research we carry out is powered in some way by NVIDIA technologies. Apart from the results on the integration of vision, language, and action, we also have a few other research lines of which I am particularly proud. One is related to video understanding: detecting people and objects, understanding their relationships, and finding the best way of extracting spatio-temporal features are important challenges. Sometimes we also like to apply our research to cultural heritage: using NVIDIA GPUs, we have developed algorithms for retrieving paintings using natural language, and generative networks for translating artworks to reality.

What is next for your research?

Even though things are evolving rapidly in our area, there are still a lot of key issues that need to be addressed, and that is what our lab is concentrating on. One is going beyond the limitations of traditional supervised learning and fighting dataset bias: in the end, we would like our algorithms to describe and understand any connection between images and text, not just those that are annotated in current datasets. To this end, we are working towards algorithms that can describe objects that are not present in the training dataset, and we constantly explore the new possibilities offered by self-supervised and weakly-supervised learning. How to properly manage the temporal dimension is another key issue that has been central to our research, and it has brought advancements in architectural design, not only for managing sequences of words, but also for understanding video streams.

Any advice for new researchers?

There are at least three capabilities I would recommend pursuing. One is to learn to code well and elegantly, because translating ideas to reality is always going to involve implementation. The second is to learn to have good ideas: that is potentially the trickiest part, but it is even more important, because every valuable piece of research needs to start from a good idea. I think reading papers, especially from the past, and thinking openly, freely, and on a large scale are of great help in this sense. The third is time management: always focus on what is impactful.

Baraldi’s colleague, Matteo Tomei, will be presenting their lab’s recent work at NVIDIA GTC in April, “More Efficient and Accurate Video Networks: A New Approach to Maximize the Accuracy/Computation Trade-off”.

Categories
Misc

I saw some Tesla K80 accelerator cards. They have no display port, since they are meant for compute workloads. Are these any good for building AI with TensorFlow?

Price: $140

Specs:

  • CUDA cores: 4,992
  • Core speed: 562-875 MHz per card
  • RAM: 24 GB
  • RAM speed: 480 GB/s

This is a 2-slot PCIe card, basically two cards in one. No cooling is included (I have a plan for that).

The display card will be my old GTX 950.
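Since the K80 is a dual-GPU board, software sees it as two separate devices, so the totals listed above are shared between them. The sketch below just does that arithmetic; the `BoardSpec` class is a made-up helper for illustration, and it assumes the resources are split evenly between the two GPUs.

```python
from dataclasses import dataclass

@dataclass
class BoardSpec:
    """Board-level totals as listed in the post (hypothetical helper)."""
    cuda_cores: int
    memory_gb: float
    bandwidth_gb_s: float
    gpus_on_board: int

    def per_gpu(self):
        # Assumption: resources are divided evenly between the GPUs.
        n = self.gpus_on_board
        return {
            "cuda_cores": self.cuda_cores // n,
            "memory_gb": self.memory_gb / n,
            "bandwidth_gb_s": self.bandwidth_gb_s / n,
        }

k80 = BoardSpec(cuda_cores=4992, memory_gb=24, bandwidth_gb_s=480, gpus_on_board=2)
per_gpu = k80.per_gpu()
print(per_gpu)
```

In practice this means a model placed on a single device has about 12 GB available, not 24 GB, unless training is spread across both devices.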

submitted by /u/isaiahii10

Categories
Misc

I’ve been working on a real-time object detection project and I’m faced with an error while trying to capture images to label and train on. Please help.
submitted by /u/Field_Great
Categories
Misc

NVIDIA Clara Parabricks Pipelines v3.5 Accelerates Google’s DeepVariant v1.0

NVIDIA recently released NVIDIA Clara Parabricks Pipelines version 3.5, adding a set of new features to the software suite that accelerates end-to-end genome sequencing analysis.

With the release of v3.5, Clara Parabricks Pipelines now provides acceleration to Google’s DeepVariant 1.0, in addition to a suite of existing DNA and RNA tools. The addition of DeepVariant to Parabricks Pipelines brings highly accurate variant calling for both short- and long-read sequencing data to the community.

This new release also enables graphical reports of QC metrics from binary alignment map (BAM) files to variant call files (VCF). Researchers can use these graphical reports to better assess the quality of their sequencing data and the subsequent variant calling before moving the results for additional downstream analysis. 

Parabricks Pipelines is packaged with enterprise support for A100 and other NVIDIA GPUs, offering one of the industry’s fastest compute frameworks for whole genome and whole exome applications. For a whole genome at 30x coverage, a server with 32 virtual CPUs takes about 1,200 minutes to generate a variant call file (VCF), while a server with eight A100 Tensor Core GPUs running Clara Parabricks takes less than 25 minutes to go from FASTQ to VCF.
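The two runtimes quoted above imply the following speedup. This is only arithmetic on the figures in the text; the genomes-per-day numbers additionally assume runs are executed back to back on a single server.

```python
def speedup(baseline_minutes: float, accelerated_minutes: float) -> float:
    """Ratio of baseline runtime to accelerated runtime."""
    return baseline_minutes / accelerated_minutes

cpu_minutes = 1200  # 32-vCPU server, whole genome at 30x coverage
gpu_minutes = 25    # upper bound: the text says "less than 25 minutes"

ratio = speedup(cpu_minutes, gpu_minutes)

# Assumption: one server, runs executed back to back, 24 hours a day.
genomes_per_day_cpu = 24 * 60 / cpu_minutes
genomes_per_day_gpu = 24 * 60 / gpu_minutes

print(f"speedup: at least {ratio:.0f}x")
print(f"genomes per day: {genomes_per_day_cpu:.1f} (CPU) vs at least {genomes_per_day_gpu:.1f} (GPU)")
```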

Start a free one-month trial of NVIDIA Clara Parabricks Pipelines today and learn how to get set up in just 10 minutes with this step-by-step instructional video.

Categories
Misc

Webinar: Create Gesture-Based Interactions with a Robot

In this webinar, you will learn how to train your own gesture recognition deep learning pipeline. We’ll start with a pre-trained detection model, repurpose it for hand detection, and use it together with the purpose-built gesture recognition model.

NVIDIA pre-trained deep learning models and the Transfer Learning Toolkit (TLT) give you a rapid path to building your next AI project. Whether you’re a DIY enthusiast or building a next-gen product with AI, you can use these models out of the box or fine-tune with your own dataset. The purpose-built, pre-trained models are trained on the large datasets collected and curated by NVIDIA and can be applied to a wide range of use cases. TLT is a simple AI toolkit, shipped with Jupyter notebooks, that requires little to no coding for taking pre-trained models and customizing them with your own data.

Date: March 3, 2021
Time: 11:00am – 12:00pm PT
Duration: 1 hour

Join this webinar to explore:

  • Highly optimized pre-trained models for various industry use cases
  • How to fine-tune with your own data on new pre-trained models and use them to reduce your total development time
  • Developing an end-to-end training pipeline and deploying the trained model on NVIDIA SDKs

Join us after the presentation for a live Q&A session.

Categories
Misc

The following shows up in the command prompt

I’m trying to create a chatbot using NeuralNine’s tutorial, but I ran into a problem.

C:\Users\chakk\Desktop\chatbot>python main.py
2021-02-21 16:14:30.544425: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-02-21 16:14:30.544542: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-21 16:14:31.738286: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-21 16:14:31.738724: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-02-21 16:14:31.757352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2021-02-21 16:14:31.757788: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-02-21 16:14:31.758162: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublas64_11.dll'; dlerror: cublas64_11.dll not found
2021-02-21 16:14:31.759016: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cublasLt64_11.dll'; dlerror: cublasLt64_11.dll not found
2021-02-21 16:14:31.759344: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cufft64_10.dll'; dlerror: cufft64_10.dll not found
2021-02-21 16:14:31.759848: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'curand64_10.dll'; dlerror: curand64_10.dll not found
2021-02-21 16:14:31.760201: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-02-21 16:14:31.760504: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusparse64_11.dll'; dlerror: cusparse64_11.dll not found
2021-02-21 16:14:31.760798: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-02-21 16:14:31.760831: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-02-21 16:14:31.761295: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-21 16:14:31.761890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-02-21 16:14:31.761963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
2021-02-21 16:14:31.762260: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set

C:\Users\chakk\Desktop\chatbot>python main.py
2021-02-21 16:21:50.579668: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudart64_110.dll'; dlerror: cudart64_110.dll not found
2021-02-21 16:21:50.579795: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-02-21 16:21:51.786031: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-02-21 16:21:51.786480: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-02-21 16:21:51.797963: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
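The log shows that the driver library (nvcuda.dll) loads and the RTX 2080 SUPER is detected, but the CUDA runtime and cuDNN libraries are not found, so TensorFlow falls back to the CPU. The file names point to the CUDA 11.0 runtime and cuDNN 8. As a rough way to see which of those files are reachable, the sketch below looks for them in the directories on PATH. The helper `find_missing` is made up for this example, and checking PATH alone is an assumption about how the libraries are located on this machine.

```python
import os

# DLL names copied from the "Could not load dynamic library" warnings.
REQUIRED_DLLS = [
    "cudart64_110.dll", "cublas64_11.dll", "cublasLt64_11.dll", "cufft64_10.dll",
    "curand64_10.dll", "cusolver64_10.dll", "cusparse64_11.dll", "cudnn64_8.dll",
]

def find_missing(dll_names, search_dirs):
    """Return the names in dll_names that are not in any of search_dirs."""
    missing = []
    for name in dll_names:
        found = any(
            os.path.isfile(os.path.join(directory, name))
            for directory in search_dirs
            if directory  # skip empty PATH entries
        )
        if not found:
            missing.append(name)
    return missing

if __name__ == "__main__":
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    for name in find_missing(REQUIRED_DLLS, path_dirs):
        print("not on PATH:", name)
```

If every name is reported, the CUDA toolkit and cuDNN are probably not installed, or their bin directories have not been added to PATH; the guide linked in the log covers both steps.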

submitted by /u/AviationAddiction21

Categories
Misc

[Help] How to optimize PoseNet or Handpose in JavaScript?

I’m working on an experiment where users can interact with a 3D object using their gestures or hands. PoseNet and Handpose are great libraries, but the performance is not up to par just yet: even without any 3D object, the frame rate hovers around 10-12 FPS, which is not enough if you want to build an interactive installation.

Is there a way to optimize this, especially on macOS?

I’ve tried the following:

  • Using a web worker (didn’t help much)
  • Using a WebSocket and running TensorFlow on the server (didn’t help much, because I can’t run the GPU backend)

What I haven’t tried:

  • Running a TPU server, which seems a bit excessive and perhaps costly. Is there an alternative to this?
  • Running it on an NVIDIA platform (I might need to rent one)

submitted by /u/buangakun3

Categories
Misc

A package to sizeably boost your performance

I am glad to present the TensorFlow implementation of “Gradient Centralization”, a new optimization technique to sizeably boost your performance 🚀, available as a ready-to-use Python package!

Project Repo: https://github.com/Rishit-dagli/Gradient-Centralization-TensorFlow

Please consider giving it a ⭐ if you like it😎. Here is an example showing the impact of the package!

https://preview.redd.it/69woozdxjui61.png?width=1280&format=png&auto=webp&s=0f3acbaf28a0dbc05455e1633eee9a82a95dae17
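For readers unfamiliar with the technique, the core operation of Gradient Centralization is small: before the optimizer step, each weight gradient has its mean subtracted so that it is zero-mean per output unit. The NumPy sketch below illustrates that one step. It is not the package's API; it assumes a Keras-style kernel layout in which the last axis indexes output units, and the function name is made up for this example.

```python
import numpy as np

def centralize_gradient(grad):
    """Subtract the mean over all axes except the last from a gradient.

    Rank-1 gradients (for example biases) are returned unchanged.
    Assumes the last axis indexes output units (Keras-style kernels).
    """
    grad = np.asarray(grad, dtype=float)
    if grad.ndim <= 1:
        return grad
    axes = tuple(range(grad.ndim - 1))
    return grad - grad.mean(axis=axes, keepdims=True)

rng = np.random.default_rng(0)
dense_grad = rng.normal(loc=0.5, size=(4, 3))       # (inputs, units)
conv_grad = rng.normal(loc=0.5, size=(3, 3, 2, 5))  # (h, w, in_ch, out_ch)
bias_grad = np.array([1.0, 2.0, 3.0])

centered_dense = centralize_gradient(dense_grad)
centered_conv = centralize_gradient(conv_grad)
print(centered_dense.mean(axis=0))  # close to zero for every output unit
```

In a real training loop this transformation would be applied to the gradients between computing them and applying them with the optimizer.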

submitted by /u/Rishit-dagli

Categories
Misc

Why don’t the loss values make sense for Dice, Focal, and IoU losses for a boundary-detection U-Net in Keras?

I am using Keras for boundary/contour detection using a U-Net. When I use binary cross-entropy as the loss, the losses decrease over time as expected and the predicted boundaries look reasonable.

However, I have tried custom losses for Dice, Focal, and IoU, with varying LRs, and none of them are working well. I either get NaNs or non-decreasing/barely-decreasing values for the losses. This is regardless of what I use for the LR, anywhere from .01 to 1e-6, or whether I vary ALPHA, GAMMA, and other parameters. This doesn’t make sense, since in my images most of the pixels are background and the pixels corresponding to boundaries are the minority. For imbalanced datasets, IoU, Dice, and Focal should work better than binary cross-entropy.

The code I used for the losses is from https://www.kaggle.com/bigironsphere/loss-function-library-keras-pytorch#Jaccard/Intersection-over-Union-(IoU)-Loss

from keras import backend as K

def DiceLoss(targets, inputs, smooth=1e-6):
    # flatten label and prediction tensors
    inputs = K.flatten(inputs)
    targets = K.flatten(targets)

    intersection = K.sum(K.dot(targets, inputs))
    dice = (2*intersection + smooth) / (K.sum(targets) + K.sum(inputs) + smooth)
    return 1 - dice

ALPHA = 0.8
GAMMA = 2

def FocalLoss(targets, inputs, alpha=ALPHA, gamma=GAMMA):
    inputs = K.flatten(inputs)
    targets = K.flatten(targets)

    BCE = K.binary_crossentropy(targets, inputs)
    BCE_EXP = K.exp(-BCE)
    focal_loss = K.mean(alpha * K.pow((1-BCE_EXP), gamma) * BCE)
    return focal_loss

def IoULoss(targets, inputs, smooth=1e-6):
    # flatten label and prediction tensors
    inputs = K.flatten(inputs)
    targets = K.flatten(targets)

    intersection = K.sum(K.dot(targets, inputs))
    total = K.sum(targets) + K.sum(inputs)
    union = total - intersection

    IoU = (intersection + smooth) / (union + smooth)
    return 1 - IoU

Even if I try different code for the losses, such as the code below

from keras import backend as K

smooth = 1.

def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return -dice_coef(y_true, y_pred)

the loss values still don’t improve. That is, it will show something like

loss: nan - dice_coef_loss: .9607 - val_loss: nan - val_dice_coef_loss: .9631 

and the values don’t change much from epoch to epoch.
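One way to narrow this down is to evaluate the same formulas outside the training loop, on arrays whose correct answer is known. The NumPy sketch below is only an illustration, not a diagnosis of the model above: it uses an element-wise product for the intersection (as in the second snippet), the `1 - dice` sign convention (as in the first), and clips predictions before taking logarithms, on the assumption that exact 0 or 1 predictions are one possible source of NaN. The function names and the toy mask are made up for this example.

```python
import numpy as np

def dice_loss(y_true, y_pred, smooth=1.0):
    """Soft Dice loss in [0, 1]; 0 means perfect overlap."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)  # element-wise product, then sum
    dice = (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)
    return 1.0 - dice

def focal_loss(y_true, y_pred, alpha=0.8, gamma=2.0, eps=1e-7):
    """Focal loss built on binary cross-entropy, with clipped predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    # Without clipping, a prediction of exactly 0 or 1 gives 0 * log(0) = NaN.
    p = np.clip(np.asarray(y_pred, dtype=float).ravel(), eps, 1.0 - eps)
    bce = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return np.mean(alpha * (1.0 - np.exp(-bce)) ** gamma * bce)

# Toy 8x8 mask with one row of boundary pixels (imbalanced, like the post).
mask = np.zeros((8, 8))
mask[3, :] = 1.0
empty = np.zeros_like(mask)

print("dice, perfect prediction:", dice_loss(mask, mask))
print("dice, all-background prediction:", dice_loss(mask, empty))
print("focal, all-background prediction:", focal_loss(mask, empty))
```

If the Keras versions give different values on the same arrays, that points at the loss code; if they agree, the NaNs are more likely coming from elsewhere, such as the range of the model outputs.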

Can anyone help?

submitted by /u/74throwaway