DataBloom - Part 466

Misc

Deploying Rich Cluster API on DGX for Multi-User Sharing

Post author By
Post date June 28, 2021
No Comments on Deploying Rich Cluster API on DGX for Multi-User Sharing

Efficient DGX System Sharing for Data Science Teams NVIDIA DGX is the universal system for AI and Data Science infrastructure. Hence, many organizations have incorporated DGX systems into their data centers for their data scientists and developers. The number of people running on these systems varies, from small teams of one to five people, to … Continued

Efficient DGX System Sharing for Data Science Teams

NVIDIA DGX is the universal system for AI and Data Science infrastructure. Hence, many organizations have incorporated DGX systems into their data centers for their data scientists and developers. The number of people running on these systems varies, from small teams of one to five people, to much larger teams of tens or maybe even hundreds of users. In this post we demonstrate how even a single DGX system can be efficiently utilized by multiple users. The focus in this blog is not on large DGX POD clusters. Rather, depending on the use cases, the deployment scale might be just one to several systems.

The deployment is managed via DeepOps toolkit, an open-source repository for common cluster management tasks. Using DeepOps organizations can customize features and scalability that are appropriate for their cluster and team sizes. Even when sharing just one DGX, users will have the capabilities and features of a rich cluster API to dynamically and efficiently run their workloads. That said, please note that DeepOps will require commitment from an organization’s DevOps professionals to tweak and maintain the underlying deployment. It is free and open-source software without support from NVIDIA.

DeepOps cluster API

DeepOps covers setting up login nodes, management, monitoring, Kubernetes, Slurm, and even storage provisioning. Figure 1 illustrates a cluster with the DeepOps components.

DeepOps Cluster Management Diagram. — *Figure 1. DeepOps view of cluster management.*

The above setup with management nodes, Kubernetes and other components would require additional servers, either physical or virtual, to be provisioned. Some organizations might acquire just DGX systems, and even with one node this will be sufficient without additional servers to set up a multi-user environment. However, deploying Kubernetes with just one node is not supported. Refer to the “Zero to Data Science” blog on deploying Kubernetes. We will rely on Slurm for the cluster resource manager and scheduling. This limited scope is illustrated in Figure 2.

DeepOps Slurm Login on Compute Node Diagram. — *Figure 2. Simplified cluster management with one of the compute nodes functioning as a login node.*

Since the major difference in this setup is that one of the compute nodes functions as a login node, a few modifications are recommended. The GPU devices are restricted from regular login ssh sessions. When a user needs to run something on a GPU they would need to start a Slurm job session. The requested GPUs would then be available within the Slurm job session.

Additionally, in multi-user environments one typically wants to restrict elevated privileges for regular users. Rootless docker is used with the Slurm setup to avoid granting elevated rights to end users running containers. Rootless docker solves a couple problems. 1. The users can work in a familiar fashion with docker and NGC containers; 2. There is no need to add users to a privileged docker group; 3. The rootless docker sessions will be constrained to the Slurm job resources (privileged docker can bypass Slurm allocation constraints whereas rootless cannot).

In summary the users would interact with the GPUs per Figure 3.

Slurm environment for multi user multi-gpu workloads sharing a DGX. — *Figure 3. Multi-user team sharing a DGX system and its GPUs for various workloads.*

DeepOps deployment instructions (single node options)

The DeepOps Slurm deployment instructions for single node setup are available on github.In this blog the platform is DGX. The recommended setup is the latest DGX OS with the latest firmware updates. A pristine freshly imaged DGX is preferred. Refer to the NVIDIA enterprise support portal for the latest DGX updates as well as the DGX release notes.

The steps shown in this document should be executed by a user with sudo privileges, such as the user created during initial DGX OS installation.

1. Clone the repo:

$ git clone https://github.com/NVIDIA/deepops.git
$ cd deepops
$ git checkout tags/21.06

2. Run initial setup script:

This will install Ansible and other prerequisite utilities for running deepops.

$ ./scripts/setup.sh

3. Edit config and options.

After running the setup script in step 2, a copy of “config.example” directory will be made to “config” directory. When one of the compute nodes also functions as a login node a few special configurations have to be set.

a. Configuring inventory “config/inventory”

Let a host be named “gpu01” (DGX-1 with 8 GPUs) with an ssh reachable ip address of “192.168.1.5” and some admin user “dgxuser”. Then a single node config would look like this:

$ vi config/inventory
[all]
gpu01     ansible_host=192.168.1.5

[slurm-master]
gpu01

[slurm-node]
gpu01

[all:vars]
# SSH User
ansible_user=dgxuser
ansible_ssh_private_key_file='~/.ssh/id_rsa'

If the machine has a different hostname (i.e. not gpu01), then use the desired host name. Running the deployment will change the hostname to what is set in the inventory file. Otherwise one can set the “deepops_set_hostname” var in “config/group_vars/all.yml” (refer to DeepOps Ansible configuration options).

Depending on how the ssh is configured on the cluster one might have to generate a passwordless private key. Leave the password blank when prompted:

$ ssh-keygen -t rsa
... accept defaults ...
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ./.ssh/authorized_keys

b. Configuring “config/group_vars/slurm-cluster.yml”

We need to add users to Slurm configuration for the ability to ssh as local users. This is needed when a compute node also functions as a login node. Additionally, set the singularity install option (which is “no” by default). Singularity can be used to run containers and it will be used to set up rootless options as well. Do not set up a default NFS with single node deployment.

$ vi config/group_vars/slurm-cluster.yml
slurm_allow_ssh_user:
- "user1"
- "user2"
- "user3”

slurm_login_on_compute: true

slurm_cluster_install_singularity: yes

slurm_enable_nfs_server: false
slurm_enable_nfs_client_nodes: false

Note: After deployment new users have to be manually added to “/etc/localusers” and “/etc/slurm/localusers.backup” on the node that functions as a login node.

4. Verify the configuration.

Check that ansible can run successfully and reach hosts. Run the hostname utility on “all” nodes. The “all” refers to the section in the “config/inventory” file. $ ansible all --connection=local -m raw -a "hostname"

For non-local setup does not set a connection to local. This requires ssh config to be set up properly in the “config/inventory”. Check connections via. $ ansible all -m raw -a "hostname"

5. Install Slurm.

When running Ansible, specify “–forks=1” so that Ansible does not perform potentially conflicting operations required for a slurm-master and slurm-node in parallel on the same node.The “–forks=1” option will ensure that the installation steps are serial.

$ ansible-playbook -K --forks=1 --connection=local -l slurm-cluster playbooks/slurm-cluster.yml

For non-local installs do not set connection to local. The forks option is still required.

# NOTE: If SSH requires a password, add: `-k`
# NOTE: If sudo on remote machine requires a password, add: `-K`
# NOTE: If SSH user is different than current user, add: `-u ubuntu`
$ ansible-playbook -K --forks=1 -l slurm-cluster playbooks/slurm-cluster.yml

During Slurm playbook reboot is usually done twice:

Once after installing the NVIDIA driver, because the driver sometimes requires a reboot to load correctly.
Once after setting some grub options used for Slurm compute nodes to configure cgroups correctly, because of modification to the kernel command line.

The above reboot sequence cannot be automated when the compute and login nodes are on the same system. The recommended approach is to reboot manually when prompted and then run Ansible again. Setting “slurm_login_on_compute” to true, the slurm-cluster playbook will restrict GPUs in ssh sessions on the slurm-master by running the following command:

$ sudo systemctl set-property sshd.service DeviceAllow="/dev/nvidiactl"

The reasoning for hiding GPUs in regular ssh sessions is that we want to avoid having users run a compute task on the GPUs without a Slurm job. If you desire to use docker within slurm then also install rootless docker after slurm deployment via playbook:

$ ansible-playbook -K --forks=1 --connection=local --limit slurm-cluster playbooks/container/docker-rootless.yml

6. Post install information.

The admin users can access the GPUs that are restricted from regular ssh login sessions. This could be useful in situations when maybe GPU firmware needs to be updated. Let “dgxuser” be an admin user, they would access GPUs via command:

login-session:$ nvidia-smi -L
No devices found.
login-session:$ sudo systemd-run --scope --uid=root --gid=root -- bash
login-session-with-gpus:$ nvidia-smi -L | wc -l
8

Refer to official Slurm documentation for additional admin configurations and options. Typical Slurm options one might want to configure are time limits on jobs, accounts, qos and priority settings, etc.

Monitoring GPUs

Refer to DeepOps documentation regarding how monitoring is configured and deployed on the Slurm cluster.

The grafana dashboard will be available at the ip address of the manager node on port 3000 (with above config http://192.168.1.5:3000). Either open the url at the manager node’s ip address or tunnel. SSH tunnel example:

$ ssh -L localhost:3000:localhost:3000 dgxuser@192.168.1.5
# open url:
http://127.0.0.1:3000/

On RHEL the grafana service might need to be exposed via firewall-cmd.

$ firewall-cmd --zone=public --add-port=3000/tcp --permanent
$ firewall-cmd --reload

Logins and running jobs

After Slurm is deployed, the users can now request GPUs and run containers. The following examples demonstrate the working pattern for a multi-user team sharing a single DGX system.

Initial SSH to Login Node

Let the ip address of the login system be “192.168.1.5” and user “testuser”. They would ssh to the system as follows:

$ ssh testuser@192.168.1.5
testuser@192.168.1.5's password: 
Welcome to NVIDIA DGX Server Version 5.0.0 (GNU/Linux 5.4.0-52-generic x86_64)

  System information as of Wed 09 Dec 2020 10:16:09 PM UTC

  System load:  0.11                Processes:                 908
  Usage of /:   12.9% of 437.02GB   Users logged in:           2
  Memory usage: 1%                  IPv4 address for docker0:  172.17.0.1
  Swap usage:   0%                  IPv4 address for enp1s0f0: 192.168.1.5
  Temperature:  47.0 C

The system has 0 critical alerts and 5 warnings. Use 'sudo nvsm show alerts' for more details.

Last login: Tue Dec  1 00:01:32 2020 from 172.20.176.144
login-session:$

At this point, the tester would not be able to run anything on GPUs. This is because the GPUs are not available to the user via the initial ssh login session.

login-session:$ nvidia-smi -L
No devices found.

It might be useful just in case to ssh again in another terminal window. The second login session could be used for monitoring jobs or to perform auxiliary tasks, whereas the first session would be used to launch jobs interactively (more on this in the following sections).

Allocating GPUs

To request a GPU the testuser must first run a Slurm command with GPU allocation. Example:

login-session:$ srun --ntasks=1 --cpus-per-task=5 --gpus-per-task=1 --pty bash

compute-session:$ nvidia-smi -L
GPU 0: Tesla P100-SXM2-16GB (UUID: GPU-61ba3c7e-584a-7eb4-d993-2d0b0a43b24f)

The job allocations details in Slurm can be viewed in another pane (such as one of the tmux panes in the login session without GPU access) via “squeue” command and details of the job can be viewed via “scontrol”.

             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
               106     batch     bash testuser  R       3:34      1 gpu01

login-session:$ scontrol show jobid -dd 106
JobId=106 JobName=bash
   UserId=testuser(1002) GroupId=testuser(1002) MCS_label=N/A
   . . .
   NumNodes=1 NumCPUs=6 NumTasks=1 CPUs/Task=5 ReqB:S:C:T=0:0:*:*
   TRES=cpu=6,node=1,billing=6,gres/gpu=1
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:1 CoreSpec=*
   JOB_GRES=gpu:1
     Nodes=gpu01 CPU_IDs=0-5 Mem=0 GRES=gpu:1(IDX:0)
   . . .

DeepOps deploys Slurm with “pam_slurm_adopt” such that ssh sessions are permitted and adopted to allocated nodes. What that means is once a user has a Slurm job, additional ssh sessions will be adopted to the job. Proceeding with the above example, let us assume that job 106 is running. If the testuser were to make another ssh connection it would be adopted into job 106. If there were multiple jobs the default behaviour would be adopted into the latest job.

$ ssh testuser@192.168.1.5
testuser@192.168.1.5's password: 
Welcome to NVIDIA DGX Server Version 5.0.0 (GNU/Linux 5.4.0-52-generic x86_64)
. . .
compute-session:$ echo NGPUS: $(nvidia-smi -L | wc -l) NCPUS: $(nproc)
NGPUS: 1 NCPUS: 6

Such ssh sessions are useful in various use cases: monitoring, debugging, additional launch commands, and so on. These adopted ssh sessions are automatically terminated when the corresponding Slurm job ends. The above scenario is also why it is convenient to use tmux with the initial ssh session.

Running containers

NVIDIA provides many AI, HPC, and Data Science containers. These are hosted on NGC. Take advantage of these GPU optimized containers. DeepOps enables running containers with several containerization alternatives: docker, singularity, and enroot with pyxis.

Rootless Docker

Docker remains the de facto industry standard for containerization. Docker is very easy to work with when building and extending containers. NGC and many other data science software stacks are distributed as docker containers. Fortunately, it is straightforward to incorporate rootless docker with Slurm. The reason for using rootless docker is that we want to avoid granting elevated privileges to users unnecessarily.

DeepOps sets up rootless docker as a module package. Environment modules are a popular way to set up cluster wide software for sharing. On a side note, DeepOps can set up easybuild or spack to manage environment modules and packages. The workflow with rootless docker is as follows: 1. A user starts a slurm job; 2. Module load the rootless docker package; 3. Start rootless docker daemon; 4. Work with docker containers per regular docker workflow. The following examples will illustrate the commands.

Start by reserving desired resources:

login-session:$ srun --ntasks=1 --gpus-per-task=4 --cpus-per-gpu=10 
  --gpu-bind=closest --pty bash

Load and start rootless docker:

compute-session:$ module load rootless-docker
compute-session:$ start_rootless_docker.sh --quiet

The option “–quiet” redirects rootless docker messages to dev null to reduce verbosity. Then run a docker image:

compute-session:$ docker run --gpus=all --rm -it 
  --shm-size=10g --ulimit memlock=-1 --ulimit stack=67108864 
  nvcr.io/nvidia/tensorflow:21.04-tf2-py3 
  mpirun --allow-run-as-root -H localhost:4 
    --report-bindings --bind-to none --map-by slot -np 4 
    python /workspace/nvidia-examples/cnn/resnet.py

The “–gpus=all” refers to all GPUs that are scheduled in the Slurm session, not all GPUs on the host. This could be placed in a script and run with srun or sbatch. Example:

$ cat test-resnetbench-docker.sh 
#!/bin/bash

module load rootless-docker

start_rootless_docker.sh --quiet

docker run --gpus=all --rm -t 
  --shm-size=10g --ulimit memlock=-1 --ulimit stack=67108864 
  nvcr.io/nvidia/tensorflow:21.04-tf2-py3 
  mpirun --allow-run-as-root -H localhost:4 
    --report-bindings --bind-to none --map-by slot -np 4 
    python /workspace/nvidia-examples/cnn/resnet.py

stop_rootless_docker.sh
Run the script:
login-session:$ srun --ntasks=1 --gpus-per-task=4 --cpus-per-gpu=10 
  --gpu-bind=closest ${PWD}/test-resnetbench-docker.sh

The Slurm constraints are enforced for rootless docker. This can be verified by starting the container in an interactive slurm session and checking the number of GPUs and CPUs available.

login-session:$ srun --ntasks=1 --gpus-per-task=4 --cpus-per-gpu=10 
  --gpu-bind=closest --pty bash
compute-session:$ module load rootless-docker
compute-session:$ start_rootless_docker.sh --quiet
compute-session:$ docker run --gpus=all --rm -it 
  nvcr.io/nvidia/tensorflow:21.04-tf2-py3 
  bash -c 'echo NGPUS: $(nvidia-smi -L | wc -l) NCPUS: $(nproc)'
NGPUS: 4 NCPUS: 40

Rootless docker runs via user namespace remapping. Files created are owned by the user, and within the container a user would not have write/execute permissions to filesystem or executables that the user already does not have permission to outside of the container. Thus user privileges are not elevated within Slurm when using rootless docker.

A user can explicitly stop the rootless docker daemon with “stop_rootless_docker.sh” script, or just exit the Slurm session. Upon ending a slurm session, processes in the session are killed and therefore the user’s rootless docker process will end.

compute-session:$ stop_rootless_docker.sh 
compute-session:$ exit
exit                                                
login-session:$

These scripts “start_rootless_docker.sh” and “stop_rootless_docker.sh” appear on a user’s path after loading the rootless docker module.

Enroot and singularity

Although beyond the scope of this blog, enroot and singularity could also be deployed via DeepOps. These are especially useful for multi-node jobs if running on more than one DGX system. The examples below though are for single node jobs.

Enroot with pyxis can be tested by running:

login-session:$ srun --mpi=pmi2 --ntasks=4 --gpus-per-task=1 
  --cpus-per-gpu=10 --gpu-bind=closest 
  --container-image='nvcr.io#nvidia/tensorflow:21.04-tf2-py3' 
  python /workspace/nvidia-examples/cnn/resnet.py

The pyxis+enroot is invoked via option “–container-image” to run the container. Refer to enroot and pyxis documentation for further details.

Singularity could be used in a similar fashion to enroot. Using singularity it is typically more efficient to convert the docker container to singularity format (“*.sif”) prior to running the job. Don’t forget the “–nv” option for GPUs. Example:

login-session:$ singularity pull docker://nvcr.io/nvidia/tensorflow:21.04-tf2-py3
login-session:$ srun --mpi=pmi2 --ntasks=4 --gpus-per-task=1 
  --cpus-per-gpu=10 --gpu-bind=closest 
  singularity exec --nv ~/singularity_images/tensorflow_21.04-tf2-py3.sif 
  python /workspace/nvidia-examples/cnn/resnet.py

Refer to singularity documentation for further details. Building containers with singularity is permitted to non-privileged users via the “–fakeroot” option.

Enroot and Singularity excel at running containerized multinode jobs, which is somewhat difficult and less convenient to do using docker on Slurm. The focus of the blog is on single node DeepOps deployment, but if multinode containerized jobs on Slurm are of interest consider enroot and singularity for that task.

Conclusion

DeepOps aims to make managing GPU clusters flexible and scalable. Various organizations ranging from enterprises to small businesses, startups, and educational institutions, can adopt and deploy DeepOps features relevant to their teams. Admins have tools at their disposal to manage GPUs effectively. Meanwhile, users can collaborate and work productively and efficiently on shared GPU resources.

Get started with DeepOps and refer to the tips in this blog to work productively with your GPUs and DGX systems.

Link to DeepOps: DeepOps
Landing Product Page: DGX Systems
Solutions Page: Data Center
Newsletter: Data Science Newsletter

Offsites

Take All Your Pictures to the Cleaners, with Google Photos Noise and Blur Reduction

Post author By
Post date June 28, 2021
No Comments on Take All Your Pictures to the Cleaners, with Google Photos Noise and Blur Reduction

Posted by Mauricio Delbracio, Research Scientist and Sungjoon Choi, Software Engineer, Google Research

Despite recent leaps in imaging technology, especially on mobile devices, image noise and limited sharpness remain two of the most important levers for improving the visual quality of a photograph. These are particularly relevant when taking pictures in poor light conditions, where cameras may compensate by increasing the ISO or slowing the shutter speed, thereby exacerbating the presence of noise and, at times, increasing image blur. Noise can be associated with the particle nature of light (shot noise) or be introduced by electronic components during the readout process (read noise). The captured noisy signal is then processed by the camera image processor (ISP) and later may be further enhanced, amplified, or distorted by a photographic editing process. Image blur can be caused by a wide variety of phenomena, from inadvertent camera shake during capture, an incorrect setting of the camera’s focus (automatic or not), or due to the finite lens aperture, sensor resolution or the camera’s image processing.

It is far easier to minimize the effects of noise and blur within a camera pipeline, where details of the sensor, optical hardware and software blocks are understood. However, when presented with an image produced from an arbitrary (possibly unknown) camera, improving noise and sharpness becomes much more challenging due to the lack of detailed knowledge and access to the internal parameters of the camera. In most situations, these two problems are intrinsically related: noise reduction tends to eliminate fine structures along with unwanted details, while blur reduction seeks to boost structures and fine details. This interconnectedness increases the difficulty of developing image enhancement techniques that are computationally efficient to run on mobile devices.

Today, we present a new approach for camera-agnostic estimation and elimination of noise and blur that can improve the quality of most images. We developed a pull-push denoising algorithm that is paired with a deblurring method, called polyblur. Both of these components are designed to maximize computational efficiency, so users can successfully enhance the quality of a multi-megapixel image in milliseconds on a mobile device. These noise and blur reduction strategies are critical components of the recent Google Photos editor updates, which includes “Denoise” and “Sharpen” tools that enable users to enhance images that may have been captured under less than ideal conditions, or with older devices that may have had more noisy sensors or less sharp optics.

A demonstration of the “Denoise” and “Sharpen” tools now available in the Google Photos editor.

How Noisy is An Image?
In order to accurately process a photographic image and successfully reduce the unwanted effects of noise and blur, it is vitally important to first characterize the types and levels of noise and blur found in the image. So, a camera-agnostic approach for noise reduction begins by formulating a method to gauge the strength of noise at the pixel level from any given image, regardless of the device that created it. The noise level is modeled as a function of the brightness of the underlying pixel. That is, for each possible brightness level, the model estimates a corresponding noise level in a manner agnostic to either the actual source of the noise or the processing pipeline.

To estimate this brightness-based noise level, we sample a number of small patches across the image and measure the noise level within each patch, after roughly removing any underlying structure in the image. This process is repeated at multiple scales, making it robust to artifacts that may arise from compression, image resizing, or other non-linear camera processing operations.

The two segments on the left illustrate signal-dependent noise present in the input image (center). The noise is more prominent in the bottom, darker crop and is unrelated to the underlying structure, but rather to the light level. Such image segments are sampled and processed to generate the spatially-varying noise map (right) where red indicates more noise is present.

Reducing Noise Selectively with a Pull-Push Method
We take advantage of self-similarity of patches across the image to denoise with high fidelity. The general principle behind such so-called “non-local” denoising is that noisy pixels can be denoised by averaging pixels with similar local structure. However, these approaches typically incur high computational costs because they require a brute force search for pixels with similar local structure, making them impractical for on-device use. In our “pull-push” approach¹, the algorithmic complexity is decoupled from the size of filter footprints thanks to effective information propagation across spatial scales.

The first step in pull-push is to build an image pyramid (i.e., multiscale representation) in which each successive level is generated recursively by a “pull” filter (analogous to downsampling). This filter uses a per-pixel weighting scheme to selectively combine existing noisy pixels together based on their patch similarities and estimated noise, thus reducing the noise at each successive, “coarser” level. Pixels at coarser levels (i.e., with lower resolution) pull and aggregate only compatible pixels from higher resolution, “finer” levels. In addition to this, each merged pixel in the coarser layers also includes an estimated reliability measure computed from the similarity weights used to generate it. Thus, merged pixels provide a simple per-pixel, per-level characterization of the image and its local statistics. By efficiently propagating this information through each level (i.e., each spatial scale), we are able to track a model of the neighborhood statistics for increasingly larger regions in a multiscale manner.

After the pull stage is evaluated to the coarsest level, the “push” stage fuses the results, starting from the coarsest level and generating finer levels iteratively. At a given scale, the push stage generates “filtered” pixels following a process similar to that of the pull stage, but going from coarse to finer levels. The pixels at each level are fused with those of coarser levels by doing a weighted average of same-level pixels along with coarser-level filtered pixels using the respective reliability weights. This enables us to reduce pixel noise while preserving local structure, because only average reliable information is included. This selective filtering and reliability (i.e. information) multiscale propagation is what makes push-pull different from existing frameworks.

This series of images shows how filtering progresses through the pull-push process. Coarser level pixels pull and aggregate only compatible pixels from finer levels, as opposed to the traditional multiscale approaches using a fixed (non-data dependent) kernel. Notice how the noise is reduced throughout the stages.

The pull-push approach has a low computational cost, because the algorithm to selectively filter similar pixels over a very large neighborhood has a complexity that is only linear with the number of image pixels. In practice, the quality of this denoising approach is comparable to traditional non-local methods with much larger kernel footprints, but operates at a fraction of the computational cost.

Image enhanced using the pull-push denoising method.

How Blurry Is an Image?
An image with poor sharpness can be thought of as being a more pristine latent image that was operated on by a blur kernel. So, if one can identify the blur kernel, it can be used to reduce the effect. This is referred to as “deblurring”, i.e., the removal or reduction of an undesired blur effect induced by a particular kernel on a particular image. In contrast, “sharpening” refers to applying a sharpening filter, built from scratch and without reference to any particular image or blur kernel. Typical sharpening filters are also, in general, local operations that do not take account of any other information from other parts of the image, whereas deblurring algorithms estimate the blur from the whole image. Unlike arbitrary sharpening, which can result in worse image quality when applied to an image that is already sharp, deblurring a sharp image with a blur kernel accurately estimated from the image itself will have very little effect.

We specifically target relatively mild blur, as this scenario is more technically tractable, more computationally efficient, and produces consistent results. We model the blur kernel as an anisotropic (elliptical) Gaussian kernel, specified by three parameters that control the strength, direction and aspect ratio of the blur.

Gaussian blur model and example blur kernels. Each row of the plot on the right represents possible combinations of σ₀, ρ and θ. We show three different σ₀ values with three different ρ values for each.

Computing and removing blur without noticeable delay for the user requires an algorithm that is much more computationally efficient than existing approaches, which typically cannot be executed on a mobile device. We rely on an intriguing empirical observation: the maximal value of the image gradient across all directions at any point in a sharp image follows a particular distribution. Finding the maximum gradient value is efficient, and can yield a reliable estimate of the strength of the blur in the given direction. With this information in hand, we can directly recover the parameters that characterize the blur.

Polyblur: Removing Blur by Re-blurring
To recover the sharp image given the estimated blur, we would (in theory) need to solve a numerically unstable inverse problem (i.e., deblurring). The inversion problem grows exponentially more unstable with the strength of the blur. As such, we target the case of mild blur removal. That is, we assume that the image at hand is not so blurry as to be beyond practical repair. This enables a more practical approach — by carefully combining different re-applications of an operator we can approximate its inverse.

Mild blur, as shown in these examples, can be effectively removed by combining multiple applications of the estimated blur.

This means, rather counterintuitively, that we can deblur an image by re-blurring it several times with the estimated blur kernel. Each application of the (estimated) blur corresponds to a first order polynomial, and the repeated applications (adding or subtracting) correspond to higher order terms in a polynomial. A key aspect of this approach, which we call polyblur, is that it is very fast, because it only requires a few applications of the blur itself. This allows it to operate on megapixel images in a fraction of a second on a typical mobile device. The degree of the polynomial and its coefficients are set to invert the blur without boosting noise and other unwanted artifacts.

The deblurred image is generated by adding and subtracting multiple re-applications of the estimated blur (polyblur).

Integration with Google Photos
The innovations described here have been integrated and made available to users in the Google Photos image editor in two new adjustment sliders called “Denoise” and “Sharpen”. These features allow users to improve the quality of everyday images, from any capture device. The features often complement each other, allowing both denoising to reduce unwanted artifacts, and sharpening to bring clarity to the image subjects. Try using this pair of tools in tandem in your images for best results. To learn more about the details of the work described here, check out our papers on polyblur and pull-push denoising. To see some examples of the effect of our denoising and sharpening up close, have a look at the images in this album.

Acknowledgements
The authors gratefully acknowledge the contributions of Ignacio Garcia-Dorado, Ryan Campbell, Damien Kelly, Peyman Milanfar, and John Isidoro. We are also thankful for support and feedback from Navin Sarma, Zachary Senzer, Brandon Ruffin, and Michael Milne.

¹ The original pull-push algorithm was developed as an efficient scattered data interpolation method to estimate and fill in the missing pixels in an image where only a subset of the pixels are specified. Here, we extend its methodology and present a data-dependent multiscale algorithm for denoising images efficiently. ^↩

Misc

Kitware ParaView Omniverse Connector Now Available in Open Beta

Post author By
Post date June 28, 2021
No Comments on Kitware ParaView Omniverse Connector Now Available in Open Beta

NVIDIA Omniverse has expanded to address the scientific visualization community with aConnector to Kitware ParaView, one of the world’s most popular scientific visualization applications.

NVIDIA Omniverse has expanded to address the scientific visualization community with a Connector to Kitware ParaView, one of the world’s most popular scientific visualization applications.

Now available by downloading Omniverse Open Beta, the Omniverse ParaView Connector enables researchers to boost their productivity and speed their discoveries. Large datasets no longer need to be downloaded and exchanged, and colleagues can get instantaneous feedback as Omniverse users can work in the same workspace in the cloud.

ParaView, an open-source, multi-platform data analysis and visualization application, enables users to quickly build visualizations so they can analyze and explore their data using qualitative and quantitative techniques. Researchers use ParaView to analyze massive datasets, and typically run the application on supercomputers to analyze datasets of petascale, as well as on laptops for smaller data.

The Omniverse ParaView Connector provides researchers with a toolkit that allows them to send and live sync their models to an NVIDIA Omniverse Nucleus Server, or alternatively locally output them to USD, with OpenVDB in case of volumes. This enables the user to edit and sync their ParaView data with any Omniverse Connect applications, or just import it into applications supporting USD or OpenVDB.

Simulating fuel injection in a combustion engine

Omniverse ParaView Connector works in multiple deployments and environments, including headless EGL Kitware ParaView servers, multi-node MPI environments, and python environments. The Connector adds to ParaView a render view, two filters, and an additional tab in ParaView’s settings dialog for global configuration.

For more information, see the Install Instructions for a walkthrough in configuring and using the connector, and check out the additional features available, including:

Supports client-server setups
Multi-environment deployment including: (EGL) Kitware ParaView servers, multi-node (MPI) environments, and python environments
Outputs to an Omniverse Nucleus server enabling multiple connections and outputs at once
Converts to USD for local output
Supports color-mapped surface meshes
Animated timesteps
Point datasets
Slice representations
Structured volume representations
Transfer of any point and/or cell arrays
Manual specification of uniform arrays over time

Download the NVIDIA Omniverse open beta today and get started with the Kitware ParaView Omniverse Connector.

For additional support, check out the Omniverse ParaView forum and join the Omniverse Discord server to chat with the community.

Learn more about the Omniverse ParaView Connector, and catch up on the GTC session on-demand session, Cinematic Scientific Visualization with the Omniverse ParaView Connector.

Misc

Superclouds: AI, Cloud-Native Supercomputers Sail into the TOP500

Post author By
Post date June 28, 2021
No Comments on Superclouds: AI, Cloud-Native Supercomputers Sail into the TOP500

NVIDIA technologies power 342 systems on the TOP500 list released at the ISC High Performance event today, including 70 percent of all new systems and eight of the top 10. The latest ranking of the world’s most powerful systems shows high performance computing centers are increasingly adopting AI. It also demonstrates that users continue to Read article >

The post Superclouds: AI, Cloud-Native Supercomputers Sail into the TOP500 appeared first on The Official NVIDIA Blog.

Misc

NVIDIA Powers Next-Generation Supercomputer at University of Edinburgh

Post author By
Post date June 28, 2021
No Comments on NVIDIA Powers Next-Generation Supercomputer at University of Edinburgh

ISC—NVIDIA today announced that its NVIDIA HGX™ high performance computing platform will power Tursa, the new DiRAC supercomputer to be hosted by the University of Edinburgh.

Misc

NVIDIA Aerial 5G Platform Extends Support for Arm

Post author By
Post date June 28, 2021
No Comments on NVIDIA Aerial 5G Platform Extends Support for Arm

Mobile World Congress—NVIDIA today announced it is extending support for Arm-based CPUs in the NVIDIA Aerial™ A100 AI-on-5G platform, bringing more choice to the 5G ecosystem.

Misc

NVIDIA and Google Cloud to Create Industry’s First AI-on-5G Lab to Speed Development of AI Everywhere

Post author By
Post date June 28, 2021
No Comments on NVIDIA and Google Cloud to Create Industry’s First AI-on-5G Lab to Speed Development of AI Everywhere

Mobile World Congress—NVIDIA today announced that it is partnering with Google Cloud to establish the industry’s first AI-on-5G Innovation Lab, enabling network infrastructure players and AI software partners to develop, test and adopt solutions that will help accelerate the creation of smart cities, smart factories and other advanced 5G and AI applications.

Misc

NVIDIA, Partners Extending Arm Ecosystem from Exascale to the Edge

Post author By
Post date June 28, 2021
No Comments on NVIDIA, Partners Extending Arm Ecosystem from Exascale to the Edge

Supercomputing centers around the world looking for a flexible, energy-efficient alternative to diversify their systems are beginning to turn to Arm for their exascale supercomputers. Arm is the world’s most popular CPU architecture, but the x86 architecture has more than 97 percent of the data center market, thanks to its ecosystem of partners, software and Read article >

The post NVIDIA, Partners Extending Arm Ecosystem from Exascale to the Edge appeared first on The Official NVIDIA Blog.

Misc

Here, There, Everywhere: NVIDIA Platform Accelerates Quantum Circuit Simulation Ecosystem

Post author By
Post date June 28, 2021
No Comments on Here, There, Everywhere: NVIDIA Platform Accelerates Quantum Circuit Simulation Ecosystem

Quantum computing promises scientific leaps — simulating molecules of atoms for drug discovery, for instance — in the near future. Handling exponentially more information than today’s computers, quantum computers harness the physics that govern subatomic particles to make parallel calculations. Teams worldwide in academia, industry and national labs are researching quantum computers and algorithms. Many Read article >

The post Here, There, Everywhere: NVIDIA Platform Accelerates Quantum Circuit Simulation Ecosystem appeared first on The Official NVIDIA Blog.

Misc

From Genomes to Proteins to Cells, Digital Biology Revolution Marches on with HPC and AI

Post author By
Post date June 28, 2021
No Comments on From Genomes to Proteins to Cells, Digital Biology Revolution Marches on with HPC and AI

Scientists and healthcare researchers — once limited by the number of samples that could be studied in a wet lab, or the quality of microscopes to peer into cells — are harnessing powerful computational tools to draw insights from an ever-growing trove of biological data. Backing this digital biology revolution is a combination of high Read article >

The post From Genomes to Proteins to Cells, Digital Biology Revolution Marches on with HPC and AI appeared first on The Official NVIDIA Blog.