Categories
Offsites

RLDS: An Ecosystem to Generate, Share, and Use Datasets in Reinforcement Learning

Most reinforcement learning (RL) and sequential decision making algorithms require an agent to generate training data through large numbers of interactions with its environment to achieve optimal performance. This is highly inefficient, especially when generating those interactions is difficult, such as when collecting data with a real robot or interacting with a human expert. This issue can be mitigated by reusing external sources of knowledge, for example, the RL Unplugged Atari dataset, which includes data of a synthetic agent playing Atari games.

However, very few such datasets exist, and sequential decision making spans a wide variety of tasks and ways of generating data (e.g., expert or noisy demonstrations, human or synthetic interactions, etc.), so it is unrealistic, and not even desirable, for the whole community to work on a small number of representative datasets, because no small set will ever be representative enough. Moreover, some of these datasets are released in a form that only works with certain algorithms, which prevents researchers from reusing the data. For example, rather than including the sequence of interactions with the environment, some datasets provide a set of randomized interactions, making it impossible to reconstruct the temporal relation between them, while others are released in slightly different formats, which can introduce subtle bugs that are very difficult to identify.

In this context, we introduce Reinforcement Learning Datasets (RLDS), and release a suite of tools for recording, replaying, manipulating, annotating and sharing data for sequential decision making, including offline RL, learning from demonstrations, and imitation learning. RLDS makes it easy to share datasets without any loss of information (e.g., keeping the sequence of interactions instead of randomizing them) and is agnostic to the underlying original format, enabling users to quickly test new algorithms on a wider range of tasks. Additionally, RLDS provides tools for collecting data generated by either synthetic agents (EnvLogger) or humans (RLDS Creator), as well as for inspecting and manipulating the collected data. Finally, integration with TensorFlow Datasets (TFDS) facilitates the sharing of RL datasets with the research community.

With RLDS, users can record interactions between an agent and an environment in a lossless and standard format. They can then use and transform this data to feed different RL or sequential decision making algorithms, or to perform data analysis.

Dataset Structure
Algorithms in RL, offline RL, or imitation learning may consume data in very different formats, and, if the format of the dataset is unclear, it’s easy to introduce bugs caused by misinterpretations of the underlying data. RLDS makes the data format explicit by defining the contents and the meaning of each of the fields of the dataset, and provides tools to re-align and transform this data to fit the format required by any algorithm implementation. In order to define the data format, RLDS takes advantage of the inherently standard structure of RL datasets — i.e., sequences (episodes) of interactions (steps) between agents and environments, where agents can be, for example, rule-based/automation controllers, formal planners, humans, animals, or a combination of these. Each of these steps contains the current observation, the action applied to the current observation, the reward obtained as a result of applying the action, and the discount obtained together with the reward. Steps also include additional information to indicate whether the step is the first or last of the episode, or if the observation corresponds to a terminal state. Each step and episode may also contain custom metadata that can be used to store environment-related or model-related data.
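
As a rough illustration of this structure, a single episode can be pictured as the following nested Python dictionaries. The field names follow the RLDS convention described above; the observation and action contents depend on the environment, and this is a conceptual sketch rather than the exact on-disk format.

# Conceptual sketch of one RLDS episode (not the exact storage format).
step = {
    'observation': ...,    # environment observation at this step
    'action': ...,         # action applied to the current observation
    'reward': 1.0,         # reward obtained as a result of applying the action
    'discount': 1.0,       # discount obtained together with the reward
    'is_first': False,     # True only for the first step of the episode
    'is_last': False,      # True only for the last step of the episode
    'is_terminal': False,  # True if the observation is a terminal state
    # ...plus optional custom step metadata
}

episode = {
    'steps': [step, ...],  # ordered sequence of steps
    # ...plus optional custom episode metadata
}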

Producing the Data
Researchers produce datasets by recording the interactions with an environment made by any kind of agent. To maintain its usefulness, raw data is ideally stored in a lossless format by recording all the information that is produced, keeping the temporal relation between the data items (e.g., ordering of steps and episodes), and without making any assumption on how the dataset is going to be used in the future. For this, we release EnvLogger, a software library to log agent-environment interactions in an open format.

EnvLogger is an environment wrapper that records agent–environment interactions and saves them in long-term storage. Although EnvLogger is seamlessly integrated in the RLDS ecosystem, we designed it to be usable as a stand-alone library for greater modularity.
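
As a minimal sketch of how the wrapper is typically used (assuming a dm_env-style environment; make_environment and select_action are hypothetical placeholders, and exact constructor arguments may differ between EnvLogger versions):

import envlogger

env = make_environment()  # hypothetical dm_env-style environment constructor
num_episodes = 10

# Wrap the environment so that every interaction is recorded to long-term storage.
with envlogger.EnvLogger(env, data_directory='/tmp/agent_trajectories') as logged_env:
    for _ in range(num_episodes):
        timestep = logged_env.reset()
        while not timestep.last():
            action = select_action(timestep)  # hypothetical action-selection logic
            timestep = logged_env.step(action)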

As in most machine learning settings, collecting human data for RL is a time-consuming and labor-intensive process. The common approach to address this is crowd-sourcing, which requires providing user-friendly access to environments, something that may be difficult to scale to large numbers of participants. Within the RLDS ecosystem, we release a web-based tool called RLDS Creator, which provides a universal interface to any human-controllable environment through a browser. Users can interact with the environments, e.g., play the Atari games online, and the interactions are recorded and stored so that they can be loaded back later using RLDS for analysis or to train agents.

Sharing the Data
Datasets are often onerous to produce, and sharing them with the wider research community not only enables reproducibility of former experiments, but also accelerates research, as it makes it easier to run and validate new algorithms on a range of scenarios. For that purpose, RLDS is integrated with TensorFlow Datasets (TFDS), an existing library for sharing datasets within the machine learning community. Once a dataset is part of TFDS, it is indexed in the global TFDS catalog, making it accessible to any researcher by using tfds.load(name_of_dataset), which loads the data in either TensorFlow or NumPy format.
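
For example, an RLDS-format dataset can be loaded and iterated with a few lines of Python. Here, name_of_dataset is the placeholder used above rather than a real catalog entry, and in RLDS-format datasets each episode's 'steps' field is itself a nested tf.data.Dataset:

import tensorflow_datasets as tfds

# Load an RLDS-format dataset from the TFDS catalog (placeholder name).
ds = tfds.load('name_of_dataset', split='train')

for episode in ds.take(1):
    # Each element is an episode; its steps are stored as a nested dataset.
    for step in episode['steps']:
        observation = step['observation']
        action = step['action']
        reward = step['reward']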

TFDS is independent of the underlying format of the original dataset, so any existing dataset in an RLDS-compatible format can be used with RLDS, even if it was not originally generated with EnvLogger or RLDS Creator. Also, with TFDS, users keep ownership and full control over their data, and all datasets include a citation to credit the dataset authors.

Consuming the Data
Researchers can use the datasets to analyze, visualize, or train a variety of machine learning algorithms, which, as noted above, may consume data in formats different from how it has been stored. For example, some algorithms, like R2D2 or R2D3, consume full episodes; others, like Behavioral Cloning or ValueDice, consume batches of randomized steps. To enable this, RLDS provides a library of transformations for RL scenarios. These transformations have been optimized, taking into account the nested structure of the RL datasets, and they include auto-batching to accelerate some of these operations. Using those optimized transformations, RLDS users have full flexibility to easily implement high-level functionalities, and the pipelines they develop are reusable across RLDS datasets. Example transformations include statistics across the full dataset for selected step fields (or sub-fields) and flexible batching that respects episode boundaries. You can explore the existing transformations in this tutorial and see more complex real examples in this Colab.
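
As a rough sketch of a computation that respects episode boundaries, written with plain tf.data rather than the optimized RLDS transformation library, the undiscounted return of each episode could be computed as follows (assuming ds is an RLDS-format dataset loaded as in the previous section):

import tensorflow as tf

def episode_return(steps):
    # Sum the per-step rewards of a single episode.
    return steps.reduce(
        tf.constant(0.0, dtype=tf.float32),
        lambda total, step: total + tf.cast(step['reward'], tf.float32))

for episode in ds.take(3):
    print(episode_return(episode['steps']).numpy())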

Available Datasets
At the moment, the following datasets (compatible with RLDS) are in TFDS:

Our team is committed to quickly expanding this list in the near future, and external contributions of new datasets to RLDS and TFDS are welcome.

Conclusion
The RLDS ecosystem not only improves reproducibility of research in RL and sequential decision making problems, but also enables new research by making it easier to share and reuse data. We hope the capabilities offered by RLDS will initiate a trend of releasing structured RL datasets that hold all of the information and cover a wider range of agents and tasks.

Acknowledgements
Besides the authors of this post, this work has been done by Google Research teams in Paris and Zurich in collaboration with DeepMind, in particular by Sertan Girgin, Damien Vincent, Hanna Yakubovich, Daniel Kenji Toyama, Anita Gergely, Piotr Stanczyk, Raphaël Marinier, Jeremiah Harmsen, Olivier Pietquin and Nikola Momchev. We also want to thank the other engineers and researchers who provided feedback and contributed to the project, in particular George Tucker, Sergio Gomez, Jerry Li, Caglar Gulcehre, Pierre Ruyssen, Etienne Pot, Anton Raichuk, Gabriel Dulac-Arnold, Nino Vieillard, Matthieu Geist, Alexandra Faust, Eugene Brevdo, Tom Granger, Zhitao Gong, Toby Boyd and Tom Small.

Categories
Misc

Real or Not Real? Attorney Steven Frank Uses Deep Learning to Authenticate Art

Leonardo da Vinci’s portrait of Jesus, known as Salvator Mundi, was sold at a British auction for nearly half a billion dollars in 2017, making it the most expensive painting ever to change hands. However, even art history experts were skeptical about whether the work was an original of the master rather than one of Read article >

The post Real or Not Real? Attorney Steven Frank Uses Deep Learning to Authenticate Art appeared first on The Official NVIDIA Blog.

Categories
Misc

Cloud Service, OEMs Raise the Bar on AI Training with NVIDIA AI

Look who just set new speed records for training AI models fast: Dell Technologies, Inspur, Supermicro and — in its debut on the MLPerf benchmarks — Azure, all using NVIDIA AI. Our platform set records across all eight popular workloads in the MLPerf training 1.1 results announced today. NVIDIA A100 Tensor Core GPUs delivered the Read article >

The post Cloud Service, OEMs Raise the Bar on AI Training with NVIDIA AI appeared first on The Official NVIDIA Blog.

Categories
Misc

The Need for Speed: Edge AI with NVIDIA GPUs and SmartNICs, Part 2

The NVIDIA Network Operator includes an RDMA shared device plug-in and the OFED driver. The NVIDIA GPU Operator includes NVIDIA GPU monitoring, the NVIDIA container runtime, the NVIDIA driver, and the NVIDIA Kubernetes device plug-in. When deployed together, they automatically enable the GPUDirect RDMA driver. Both operators are part of the NVIDIA EGX stack, which contains Kubernetes, a container engine, and a Linux distribution, and runs on bare metal or virtualized infrastructure.

This is the second post in a two-part series. The first post described how to integrate the NVIDIA GPU and Network Operators using preinstalled drivers. This post describes the following tasks:

  • Clean up the preinstalled driver integration
  • Install the Network Operator with a custom driver container
  • Install the GPU Operator with a custom driver container

NVIDIA Driver integration

The preinstalled driver integration method is suitable for edge deployments requiring signed drivers for secure and measured boot. Use the driver container method when the edge node has an immutable operating system. Driver containers are also appropriate when not all edge nodes have accelerators.

Clean up preinstalled driver integration

First, uninstall the previous configuration and reboot to clear the preinstalled drivers.

  1. Delete the test pods and network attachment.
$ kubectl delete pod roce-shared-pod
pod "roce-shared-pod" deleted

$ kubectl delete macvlannetwork  roce-shared-macvlan-network
macvlannetwork.mellanox.com "roce-shared-macvlan-network" deleted
  2. Uninstall the Network Operator Helm chart.
$ helm delete -n network-operator network-operator
release "network-operator" uninstalled

3. Uninstall MOFED to remove the preinstalled drivers and libraries.

$ rmmod nvidia_peermem

$ /etc/init.d/openibd stop
Unloading HCA driver:                                      [  OK  ]

$ cd ~/MLNX_OFED_LINUX-5.4-1.0.3.0-rhel7.9-x86_64

$ ./uninstall.sh 

4. Remove the GPU test pod.

$ kubectl delete pod cuda-vectoradd
pod "cuda-vectoradd" deleted

5. Uninstall the NVIDIA Linux driver.

$ ./NVIDIA-Linux-x86_64-470.57.02.run --uninstall

6. Remove GPU Operator.

$ helm uninstall gpu-operator-1634173044

7. Reboot.

$ sudo shutdown -r now

Install the Network Operator with a custom driver container

This section describes the steps for installing the Network Operator with a custom driver container.

The driver build script executed in the container image needs access to kernel development and packages for the target kernel. In this example the kernel development packages are provided through an Apache web server.

Once the container is built, upload it to a repository the Network Operator Helm chart can access from the host.

The GPU Operator will use the same web server to build the custom GPU Operator driver container in the next section.

  1. Install the Apache web server and start it.
$ sudo firewall-cmd --state
not running

$ sudo yum install createrepo yum-utils httpd -y

$ systemctl start httpd.service && systemctl enable httpd.service && systemctl status httpd.service
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-10-20 18:10:43 EDT; 4h 45min ago
...
  2. Create a mirror of the upstream CentOS 7 Base package repository. It could take ten minutes or more to download all the CentOS Base packages to the web server. Note that the custom package repository requires 500 GB free space on the /var partition.
$ cd /var/www/html
$ mkdir -p repos/centos/7/x86_64/os
$ reposync -p /var/www/html/repos/centos/7/x86_64/os/ --repo=base  --download-metadata -m

3. Copy the Linux kernel source files into the Base packages directory on the web server. This example assumes the custom kernel was compiled as an RPM using rpmbuild.

$ cd repos/centos/7/x86_64/os
$ sudo cp ~/rpmbuild/RPMS/x86_64/*.rpm .

The Network Operator requires the following files:

  • kernel-headers-${KERNEL_VERSION}
  • kernel-devel-${KERNEL_VERSION}

Ensure the presence of these additional files for the GPU Operator:

  • gcc-${GCC_VERSION}
  • elfutils-libelf.x86_64
  • elfutils-libelf-devel.x86_64
$ for i in $(rpm -q kernel-headers kernel-devel elfutils-libelf elfutils-libelf-devel gcc | grep -v "not installed"); do ls $i*; done
kernel-headers-3.10.0-1160.42.2.el7.custom.x86_64.rpm
kernel-devel-3.10.0-1160.42.2.el7.custom.x86_64.rpm
elfutils-libelf-0.176-5.el7.x86_64.rpm
elfutils-libelf-devel-0.176-5.el7.x86_64.rpm
gcc-4.8.5-44.el7.x86_64.rpm

4. Browse to the web repository to make sure it is accessible via HTTP.

$ elinks http://localhost/repos/centos/7/x86_64/os --dump
                       Index of /repos/centos/7/x86_64/os

      [1][ICO]          [2]Name       [3]Last modified [4]Size [5]Description
   --------------------------------------------------------------------------
   [6][PARENTDIR] [7]Parent Directory                        -  
   [8][DIR]       [9]base/            2021-10-21 22:55       -  
   [10][DIR]      [11]extras/         2021-10-02 00:29       -  
   --------------------------------------------------------------------------

References

   Visible links
   2. http://localhost/repos/centos/7/x86_64/os/?C=N;O=D
   3. http://localhost/repos/centos/7/x86_64/os/?C=M;O=A
   4. http://localhost/repos/centos/7/x86_64/os/?C=S;O=A
   5. http://localhost/repos/centos/7/x86_64/os/?C=D;O=A
   7. http://localhost/repos/centos/7/x86_64/
   9. http://localhost/repos/centos/7/x86_64/os/base/
  11. http://localhost/repos/centos/7/x86_64/os/extras/

5. MOFED driver container images are built from source code in the mellanox/ofed-docker repository on GitHub. Clone the ofed-docker repository.

$ git clone https://github.com/Mellanox/ofed-docker.git
$ cd ofed-docker/

6. Make a build directory for the custom driver container.

$ mkdir centos
$ cd centos/

7. Create a Dockerfile that installs the MOFED dependencies and source archive into a CentOS 7.9 base image. Specify the MOFED and CentOS versions.

$ sudo cat 

8. Modify the RHEL entrypoint.sh script included in the ofed-docker repository to install the custom kernel source packages from the web server. Specify the path to the base/Packages directory on the web server in the _install_prerequisites() function.

In this example, 10.150.168.20 is the IP address of the web server created earlier in this section.

$ cp ../rhel/entrypoint.sh .
$ cat entrypoint.sh
...
# Install the kernel modules header/builtin/order files and generate the kernel version string.
_install_prerequisites() {
 
    echo "Installing dependencies"
    yum -y --releasever=7 install createrepo elfutils-libelf-devel kernel-rpm-macros numactl-libs initscripts grubby linux-firmware libtool
 
    echo "Installing Linux kernel headers..."
    rpm -ivh http://10.150.168.20/repos/centos/7/x86_64/os/base/Packages/kernel-3.10.0-1160.45.1.el7.custom.x86_64.rpm
    rpm -ivh http://10.150.168.20/repos/centos/7/x86_64/os/base/Packages/kernel-devel-3.10.0-1160.45.1.el7.custom.x86_64.rpm
    rpm -ivh http://10.150.168.20/repos/centos/7/x86_64/os/base/Packages/kernel-headers-3.10.0-1160.45.1.el7.custom.x86_64.rpm
 
    # Prevent depmod from giving a WARNING about missing files 
    touch /lib/modules/${KVER}/modules.order
    touch /lib/modules/${KVER}/modules.builtin
 
    depmod ${KVER}
...

9. The OFED driver container mounts a directory from the host file system for sharing driver files. Create the directory.

$ mkdir -p /run/mellanox/drivers

10. Upload the new CentOS driver image to a registry. This example uses an NGC private registry. Log in to the registry.

$ sudo yum install -y podman

$ sudo podman login nvcr.io
Username: $oauthtoken
Password: *****************************************
Login Succeeded!

11. Use Podman to build the driver container image.

$ sudo podman build --no-cache --tag nvcr.io/nv-ngc5g/mofed-5.4-1.0.3.0:centos7-amd64 .

12. Verify the image, then push it to the registry.

$ sudo podman images nvcr.io | grep mofed
nvcr.io/nv-ngc5g/mofed-5.4-1.0.3.0 centos7-amd64 d61e555bddda 2 minutes ago  1.13 GB
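
The push reuses the tag from the build step; a command along these lines completes the upload:

$ sudo podman push nvcr.io/nv-ngc5g/mofed-5.4-1.0.3.0:centos7-amd64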

13. Override the values.yaml file included in the NVIDIA Network Operator Helm chart to install the custom driver image. Specify the image name, repository, and version for the custom driver container.

$ cat 

14. Install the NVIDIA Network Operator with the new values.yaml.

$ helm install -f ./roce_shared_values_driver.yaml -n network-operator --create-namespace --wait network-operator mellanox/network-operator

15. View the pods deployed by the Network Operator. The MOFED pod should be in the Running status. This is the custom driver container. Note that it may take several minutes to compile the drivers before the pod starts.

$ kubectl -n nvidia-network-operator-resources get pods
NAME                      READY   STATUS    RESTARTS   AGE
cni-plugins-ds-zr9kf      1/1     Running   0          10m
kube-multus-ds-w57rz      1/1     Running   0          10m
mofed-centos7-ds-cbs74    1/1     Running   0          10m
rdma-shared-dp-ds-ch8m2   1/1     Running   0          2m27s
whereabouts-z947f         1/1     Running   0          10m

16. Verify that the MOFED drivers are loaded on the host.

$ lsmod | egrep '^ib|^mlx|^rdma'
rdma_ucm               27022  0 
rdma_cm                65212  1 rdma_ucm
ib_ipoib              124872  0 
ib_cm                  53085  2 rdma_cm,ib_ipoib
ib_umad                27744  0 
mlx5_ib               384793  0 
mlx5_core            1360822  1 mlx5_ib
ib_uverbs             132833  2 mlx5_ib,rdma_ucm
ib_core               357959  8 rdma_cm,ib_cm,iw_cm,mlx5_ib,ib_umad,ib_uverbs,rdma_ucm,ib_ipoib
mlx_compat             55063  11 rdma_cm,ib_cm,iw_cm,auxiliary,mlx5_ib,ib_core,ib_umad,ib_uverbs,mlx5_core,rdma_ucm,ib_ipoib
mlxfw                  22321  1 mlx5_core

17. The root filesystem of the driver container should be bind-mounted to the /run/mellanox/drivers directory on the host.

$ ls /run/mellanox/drivers
anaconda-post.log  bin  boot  dev  etc  home  host  lib  lib64  media  mnt  opt  proc  root  run  sbin  srv  sys  tmp  usr  var

Install the GPU Operator with a custom driver container

This section describes the steps for installing the GPU Operator with a custom driver container.

As with the Network Operator, the driver build script executed by the GPU Operator container needs access to development packages for the target kernel.

This example uses the same web server that delivered development packages to the Network Operator in the previous section.

Once the container is built, upload it to a repository the GPU Operator Helm chart can access from the host. As in the Network Operator example, this example uses the private registry on NGC.

  1. Build a custom driver container.
$ cd ~
$ git clone https://gitlab.com/nvidia/container-images/driver.git
$ cd driver/centos7

2. Update the CentOS Dockerfile to use driver version 470.74. Comment out unused arguments.

$ grep ARG Dockerfile 
ARG BASE_URL=http://us.download.nvidia.com/XFree86/Linux-x86_64
#ARG BASE_URL=https://us.download.nvidia.com/tesla
ARG DRIVER_VERSION=470.74
ARG DRIVER_TYPE=passthrough
ARG VGPU_LICENSE_SERVER_TYPE=FNE
ARG PUBLIC_KEY=''
#ARG PUBLIC_KEY=empty
ARG PRIVATE_KEY

3. Build the GPU driver container image and push it to NGC.

$  sudo podman build --no-cache --tag nvcr.io/nv-ngc5g/driver:470.74-centos7 .

4. View the GPU driver container image.

$ podman images nvcr.io |  grep  470
nvcr.io/nv-ngc5g/driver                             470.74-centos7           630f0f8e77f5  2 minutes ago   1.28 GB

5. Verify that the following files are available in the custom repository created for the Network Operator installation:

  • elfutils-libelf.x86_64
  • elfutils-libelf-devel.x86_64
  • kernel-headers-${KERNEL_VERSION}
  • kernel-devel-${KERNEL_VERSION}
  • gcc-${GCC_VERSION}

These files are needed to compile the driver for the custom kernel image.

$ cd /var/www/html/repos/centos/7/x86_64/os/base/Packages/

$ for i in $(rpm -q kernel-headers kernel-devel elfutils-libelf elfutils-libelf-devel gcc | grep -v "not installed"); do ls $i*; done
kernel-headers-3.10.0-1160.45.1.el7.custom.x86_64.rpm
kernel-devel-3.10.0-1160.45.1.el7.custom.x86_64.rpm
elfutils-libelf-0.176-5.el7.x86_64.rpm
elfutils-libelf-devel-0.176-5.el7.x86_64.rpm
gcc-4.8.5-44.el7.x86_64.rpm

6. Unlike the Network Operator, the GPU Operator uses a custom Yum repository configuration file. Create a Yum repo file referencing the custom mirror repository.

$ cd /var/www/html/repos

$ cat 
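
Based on the ConfigMap contents shown in step 7 below, the custom-repo.repo file saved under /var/www/html/repos would contain the following:

[base]
name=CentOS Linux $releasever - Base
baseurl=http://10.150.168.20/repos/centos/$releasever/$basearch/os/base/
gpgcheck=0
enabled=1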

7. The GPU Operator uses a Kubernetes ConfigMap to configure the custom repository. The ConfigMap must be available in the gpu-operator-resources namespace. Create the namespace and the ConfigMap.

$ kubectl create ns gpu-operator-resources

$ kubectl create configmap repo-config -n gpu-operator-resources --from-file=/var/www/html/repos/custom-repo.repo
configmap/repo-config created

$ kubectl describe cm -n gpu-operator-resources repo-config 
Name:         repo-config
Namespace:    gpu-operator-resources
Labels:       <none>
Annotations:  <none>

Data
====
custom-repo.repo:
----
[base]
name=CentOS Linux $releasever - Base
baseurl=http://10.150.168.20/repos/centos/$releasever/$basearch/os/base/
gpgcheck=0
enabled=1

8. Install the GPU Operator Helm chart. Specify the custom repository location, the custom driver version, and the custom driver image name and location.

$ helm install nvidia/gpu-operator --generate-name --set driver.repoConfig.configMapName=repo-config  --set driver.repoConfig.destinationDir=/etc/yum.repos.d --set driver.image=driver --set driver.repository=nvcr.io/nv-ngc5g --set-string driver.version="470.74" --set toolkit.version=1.7.1-centos7 --set operator.defaultRuntime=crio

9. View the deployed pods.

$ kubectl get pods -n gpu-operator-resources
NAME                                       READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-r6kq6                1/1     Running     0          3m33s
nvidia-container-toolkit-daemonset-62pbj   1/1     Running     0          3m33s
nvidia-cuda-validator-ljd5l                0/1     Completed   0          119s
nvidia-dcgm-9nsfx                          1/1     Running     0          3m33s
nvidia-dcgm-exporter-zm82v                 1/1     Running     0          3m33s
nvidia-device-plugin-daemonset-bp66r       1/1     Running     0          3m33s
nvidia-device-plugin-validator-8pbmv       0/1     Completed   0          108s
nvidia-driver-daemonset-4tx24              1/1     Running     0          3m33s
nvidia-mig-manager-kvcgc                   1/1     Running     0          3m32s
nvidia-operator-validator-g9xz5            1/1     Running     0          3m33s

10. Verify the driver is loaded.

$ lsmod |  grep  nvidia
nvidia_modeset       1195268  0 
nvidia_uvm            995356  0 
nvidia              35237551  114 nvidia_modeset,nvidia_uvm
drm                   456166  5 ast,ttm,drm_kms_helper,nvidia

11. Run nvidia-smi from the driver daemonset pod.
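
A plausible invocation, assuming the driver daemonset pod name from the previous step (the generated pod name suffix will differ per cluster):

$ kubectl exec -n gpu-operator-resources nvidia-driver-daemonset-4tx24 -- nvidia-smi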

Defaulted container "nvidia-driver-ctr" out of: nvidia-driver-ctr, k8s-driver-manager (init)
Thu Oct 28 02:37:50 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.74       Driver Version: 470.74       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-PCI...  On   | 00000000:23:00.0 Off |                    0 |
| N/A   25C    P0    32W / 250W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-PCI...  On   | 00000000:E6:00.0 Off |                    0 |
| N/A   27C    P0    32W / 250W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The NVIDIA peer memory driver that enables GPUDirect RDMA is not built automatically. Repeat this process to build a custom nvidia-peermem driver container. This additional step is needed for any Linux operating system that the nvidia-peermem installer in GPU Operator does not yet support.

The future with NVIDIA Accelerators

NVIDIA accelerators help future-proof an edge AI investment against the exponential growth of sensor data. NVIDIA operators are cloud-native software that streamlines accelerator deployment and management on Kubernetes. The operators support popular Kubernetes platforms out of the box and can be customized to support alternative platforms.

Recently, NVIDIA announced converged accelerators that combine DPU and GPU capability onto a single PCI device. The converged accelerators are ideal for edge AI applications with demanding compute and network performance requirements. The NVIDIA operators are being enhanced to facilitate converged accelerator deployment on Kubernetes.

Both the NVIDIA GPU Operator and Network Operator are open source software projects published under the Apache 2.0 license. NVIDIA welcomes upstream participation for both projects.

Register for the GTC 2021 session, Exploring Cloud-native Edge AI, to learn more about accelerating edge AI with NVIDIA GPUs and SmartNICs.

Categories
Misc

whats poppin my dudes ( git diff tfjs tf.py)

anyone got any tips on the disparity between python and transpiling-to-javascript support? for example I made a model in py, transpiled it, and then it turns out that, after all that and lots of digging into bugs, some transformative layers that we can use in python are not supported in the javascript version? any other things worth noting? what are the gaps?

submitted by /u/doctor_slimm
[visit reddit] [comments]

Categories
Misc

How To Build Custom Object Detector In Live Video With TensorFlow | Introduction | #AI

submitted by /u/Minayafl
[visit reddit] [comments]

Categories
Offsites

MURAL: Multimodal, Multi-task Retrieval Across Languages

For many concepts, there is no direct one-to-one translation from one language to another, and even when there is, such translations often carry different associations and connotations that are easily lost for a non-native speaker. In such cases, however, the meaning may be more obvious when grounded in visual examples. Take, for instance, the word “wedding”. In English, one often associates it with a bride in a white dress and a groom in a tuxedo, but when the word is translated into Hindi (शादी), a more appropriate association may be a bride wearing vibrant colors and a groom wearing a sherwani. What each person associates with the word may vary considerably, but if they are shown an image of the intended concept, the meaning becomes more clear.

The word “wedding” in English and Hindi conveys different mental images. Images are taken from Wikipedia, credited to Psoni2402 (left) and David McCandless (right) with CC BY-SA 4.0 license.

With current advances in neural machine translation and image recognition, it is possible to reduce this sort of ambiguity in translation by presenting a text paired with a supporting image. Prior research has made much progress in learning image–text joint representations for high-resource languages, such as English. These representation models strive to encode the image and text into vectors in a shared embedding space, such that the image and the text describing it are close to each other in that space. For example, ALIGN and CLIP have shown that training a dual-encoder model (i.e., one trained with two separate encoders) on image–text pairs using a contrastive learning loss works remarkably well when provided with ample training data.

Unfortunately, such image–text pair data does not exist at the same scale for the majority of languages. In fact, more than 90% of this type of web data belongs to the top-10 highly-resourced languages, such as English and Chinese, with much less data for under-resourced languages. To overcome this issue, one could either try to manually collect image–text pair data for under-resourced languages, which would be prohibitively difficult due to the scale of the undertaking, or one could seek to leverage pre-existing datasets (e.g., translation pairs) that could inform the necessary learned representations for multiple languages.

In “MURAL: Multimodal, Multitask Representations Across Languages”, presented at Findings of EMNLP 2021, we describe a representation model for image–text matching that uses multitask learning applied to image–text pairs in combination with translation pairs covering 100+ languages. This technology could allow users to express words that may not have a direct translation into a target language using images instead. For example, the word “valiha” refers to a type of tube zither played by the Malagasy people, which lacks a direct translation into most languages, but could be easily described using images. Empirically, MURAL shows consistent improvements over state-of-the-art models and competitive baselines across a range of benchmarks. Moreover, MURAL does remarkably well for the majority of the under-resourced languages on which it was tested. Additionally, we discover interesting linguistic correlations learned by MURAL representations.

MURAL Architecture
The MURAL architecture is based on the structure of ALIGN, but employed in a multitask fashion. Whereas ALIGN uses a dual-encoder architecture to draw together representations of images and associated text descriptions, MURAL employs the dual-encoder structure for the same purpose while also extending it across languages by incorporating translation pairs. The dataset of image–text pairs is the same as that used for ALIGN, and the translation pairs are those used for LaBSE.

MURAL solves two contrastive learning tasks: 1) image–text matching and 2) text–text (bitext) matching, with both tasks sharing the text encoder module. The model learns associations between images and text from the image–text data, and learns the representations of hundreds of diverse languages from the translation pairs. The idea is that a shared encoder will transfer the image–text association learned from high-resource languages to under-resourced languages. We find that the best model employs an EfficientNet-B7 image encoder and a BERT-large text encoder, both trained from scratch. The learned representation can be used for downstream visual and vision-language tasks.

The architecture of MURAL: dual encoders with a text encoder shared between the two tasks, trained using a contrastive learning loss.
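
For intuition, here is a minimal sketch of the kind of in-batch contrastive (softmax) loss used for each of the two tasks. It illustrates the general technique rather than the exact MURAL training code; the temperature value, the L2 normalization, and the encoder names in the final comment are assumptions of this sketch.

import tensorflow as tf

def contrastive_loss(emb_a, emb_b, temperature=0.07):
    # In-batch softmax contrastive loss between two aligned sets of embeddings,
    # e.g., images vs. their captions, or source vs. target translations.
    emb_a = tf.math.l2_normalize(emb_a, axis=-1)
    emb_b = tf.math.l2_normalize(emb_b, axis=-1)
    logits = tf.matmul(emb_a, emb_b, transpose_b=True) / temperature
    labels = tf.range(tf.shape(logits)[0])  # matching pairs lie on the diagonal
    loss_a = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
    loss_b = tf.keras.losses.sparse_categorical_crossentropy(labels, tf.transpose(logits), from_logits=True)
    return tf.reduce_mean(loss_a + loss_b)

# Multitask objective with a shared text encoder (encoder functions are hypothetical):
# total_loss = contrastive_loss(image_encoder(images), text_encoder(captions)) + \
#              contrastive_loss(text_encoder(src_sentences), text_encoder(tgt_sentences))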

Multilingual Image-to-Text and Text-to-Image Retrieval
To demonstrate MURAL’s capabilities, we choose the task of cross-modal retrieval (i.e., retrieving relevant images given a text and vice versa) and report the scores on various academic image–text datasets covering well-resourced languages, such as MS-COCO (and its Japanese variant, STAIR), Flickr30K (in English) and Multi30K (extended to German, French, Czech), XTD (test-only set with seven well-resourced languages: Italian, Spanish, Russian, Chinese, Polish, Turkish, and Korean). In addition to well-resourced languages, we also evaluate MURAL on the recently published Wikipedia Image–Text (WIT) dataset, which covers 108 languages, with a broad range of both well-resourced (English, French, Chinese, etc.) and under-resourced (Swahili, Hindi, etc.) languages.

MURAL consistently outperforms prior state-of-the-art models, including M3P, UC2, and ALIGN, in both zero-shot and fine-tuned settings evaluated on well-resourced and under-resourced languages. We see remarkable performance gains for under-resourced languages when compared to the state-of-the-art model, ALIGN.

Mean recall on various multilingual image–text retrieval benchmarks. Mean recall is a common metric used to evaluate cross-modal retrieval performance on image–text datasets (higher is better). It measures the Recall@N (i.e., the chance that the ground truth image appears in the first N retrieved images) averaged over six measurements: Image→Text and Text→Image retrieval for N=[1, 5, 10]. Note that XTD scores report Recall@10 for Text→Image retrieval.

Retrieval Analysis
We also analyzed zero-shot retrieved examples on the WIT dataset comparing ALIGN and MURAL for English (en) and Hindi (hi). For under-resourced languages like Hindi, MURAL shows improved retrieval performance compared to ALIGN that reflects a better grasp of the text semantics.

Comparison of the top-5 images retrieved by ALIGN and by MURAL for the Text→Image retrieval task on the WIT dataset for the Hindi text, “एक तश्तरी पर बिना मसाले या सब्ज़ी के रखी हुई सादी स्पगॅत्ती”, which translates to the English, “A bowl containing plain noodles without any spices or vegetables”.

Even for Image→Text retrieval in a well-resourced language, like French, MURAL shows better understanding for some words. For example, MURAL returns better results for the query “cadran solaire” (“sundial”, in French) than ALIGN, which doesn’t retrieve any text describing sundials (below).

Comparison of the top-5 text results from ALIGN and from MURAL on the Image→Text retrieval task for the same image of a sundial.

Embeddings Visualization
Previously, researchers have shown that visualizing model embeddings can reveal interesting connections among languages — for instance, representations learned by a neural machine translation (NMT) model have been shown to form clusters based on their membership to a language family. We perform a similar visualization for a subset of languages belonging to the Germanic, Romance, Slavic, Uralic, Finnic, Celtic, and Finno-Ugric language families (widely spoken in Europe and Western Asia). We compare MURAL’s text embeddings with LaBSE’s, which is a text-only encoder.

A plot of LaBSE’s embeddings shows distinct clusters of languages influenced by language families. For instance, Romance languages (in purple, below) fall into a different region than Slavic languages (in brown, below). This finding is consistent with prior work that investigates intermediate representations learned by an NMT system.

Visualization of text representations of LaBSE for 35 languages. Languages are color coded based on their genealogical association. Representative languages include: Germanic (red) — German, English, Dutch; Uralic (orange) — Finnish, Estonian; Slavic (brown) — Polish, Russian; Romance (purple) — Italian, Portuguese, Spanish; Gaelic (blue) — Welsh, Irish.

In contrast to LaBSE’s visualization, MURAL’s embeddings, which are learned with a multimodal objective, show some clusters that are in line with areal linguistics (where elements are shared by languages or dialects in a geographic area) and contact linguistics (where languages or dialects interact and influence each other). Notably, in the MURAL embedding space, Romanian (ro) is closer to Slavic languages like Bulgarian (bg) and Macedonian (mk) than it is in LaBSE, which is in line with the Balkan sprachbund. Another possible effect of language contact brings the Finnic languages, Estonian (et) and Finnish (fi), closer to the Slavic language cluster. The fact that MURAL pivots on images as well as translations appears to add an additional view on language relatedness as learned in deep representations, beyond the language family clustering observed in a text-only setting.

Visualization of text representations of MURAL for 35 languages. Color coding is the same as the figure above.

Final Remarks
Our findings show that training jointly using translation pairs helps overcome the scarcity of image–text pairs for many under-resourced languages and improves cross-modal performance. Additionally, it is interesting to observe hints of areal linguistics and contact linguistics in the text representations learned by using a multimodal model. This warrants more probing into different connections learned implicitly by multimodal models, such as MURAL. Finally, we hope this work promotes further research in the multimodal, multilingual space where models learn representations of and connections between languages (expressed via images and text), beyond well-resourced languages.

Acknowledgements
This research is in collaboration with Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, and Jason Baldridge. We thank Zarana Parekh, Orhan Firat, Yuqing Chen, Apu Shah, Anosh Raj, Daphne Luong, and others who provided feedback for the project. We are also grateful for general support from Google Research teams.

Categories
Misc

Creating Smarter Spaces with NVIDIA Metropolis and Edge AI

Learn how AI-enabled video analytics is helping companies and employees work smarter and safer.

What do a factory floor, retail store, and major roadway have in common? They are a few examples of valuable and constrained infrastructure that need to be optimized. Manufacturers aim for early detection of defects in the assembly process. Retailers seek to better understand their customer journey and deliver more frictionless checkout experiences. Traffic planners look to reduce traffic gridlock.  

Over one billion cameras are deployed worldwide in nearly all of our important spaces, generating tremendous amounts of data, but without a system for analyzing this data, valuable insights are lost. Enter AI-powered computer vision, which unlocks the insights hidden in video, enabling cities and companies to improve their safety and operational efficiency.

AI-enabled video analytics solutions streamline tasks across industries, from healthcare to manufacturing, helping companies and their employees work smarter and safer.

NVIDIA Metropolis is an application framework, set of developer tools, and partner ecosystem that unites visual data and AI to enable greater functionality and efficiency across a range of physical spaces and environments. 

Transit hubs, retail stores, and factories use vision AI applications for more efficient, accessible, and safe operations. The following examples illustrate vision AI applications transforming how we use and manage our most critical spaces. 

Airports: With terminals serving and moving millions of passengers a year, airports are small cities, industrial sites, and transportation hubs. AI-enabled video analytics solutions identify and manage incidents in real time to minimize disruptions to passengers and airport operations. These solutions help airlines accelerate airplane turnarounds, deliver safer airport operations, and provide parking management to passengers. 

Factories: Companies are increasingly automating their manufacturing processes with IoT sensors, the most common of which are video cameras. These cameras capture vast amounts of data that, when combined with the power of AI, produce valuable insights that manufacturers can use to improve operational efficiency. Real-time understanding and responses are critical, such as identifying product defects on assembly lines, scanning for workplace hazards and signaling when machines require maintenance.

Farms: Farmers around the world are turning to vision AI applications to automate and improve their operations and yield quality. These applications help in a wide range of use cases, from counting cows to detecting weeds to the robotic pollination of tomatoes. These computer vision applications help farmers revolutionize food production by improving yield while using fewer resources.

Stadiums: Millions of people around the world visit stadiums to enjoy live sporting and cultural events. AI-enabled video analytics solutions are used to automate perimeter protection, weapons detection, crowd analytics, parking management, and suspicious behavior monitoring to provide a safer and more cohesive experience for visitors.

Hospitals: AI-enabled video analytics solutions help keep track of operating room procedures, ultimately improving patient care and surgical outcomes. By using accurate action logging, hospital staff can monitor surgical procedures, enforce disinfecting protocols, and check medical supply inventory levels in real time. AI-enabled video analytics reduces the need for human input on certain routine tasks, giving doctors and nurses more time with their patients.

Universities: AI vision helps university administrators better understand how physical spaces, like offices, gyms, and halls, are used. AI applications can also analyze real-time video footage and generate insights that inform better campus management, from detecting crowd flow patterns to creating immediate alerts for abnormal activities like fires, accidents, or water leakage.



A new generation of AI applications at the edge is driving incredible operational efficiency and safety gains across a broad range of spaces. Download a free e-book to learn how Metropolis and edge AI are helping build smarter and safer spaces around the world.

Categories
Misc

How to get reproducible results in TensorFlow?

I’m working on a project based on a conda environment, using:

  • tensorflow-gpu=2.4.0,
  • cudatoolkit=10.2.89,
  • cudnn=7.6.5.

I’d like to have reproducible results, so I tried with:

import os
import random
import numpy as np
from numpy.random import default_rng
import tensorflow as tf

random.seed(0)
rng = default_rng(0)
tf.random.set_seed(0)

And launching the python script from the terminal as:

PYTHONHASHSEED=0 python /path/to/main.py 

But my results are not reproducible.

Without posting my code (because it is long and includes many files), what could be some other aspects I should consider in order to get reproducibility?

PS: the artificial neural network is a CNN and is created by adding layers such as, e.g., tf.keras.layers.Convolution2D(…)

submitted by /u/RainbowRedditForum
[visit reddit] [comments]

Categories
Misc

whats poppin my dudes

any suggestions on how I could avoid the ‘loading’ aspect of a model in a server that serves client requests to a web API endpoint? such that the model is permanently ‘loaded’ and only has to make predictions?

# to save compute time that is (duh)

beep bop

submitted by /u/doctor_slimm
[visit reddit] [comments]