In my model I have a keras.layers.Lambda layer as the output layer. I use the layer to do some post-processing and return either a 0 or a 1. The problem arises when I run the predict function, the lambda layer only returns a small portion of the last batch data.
Imagine you have a lambda function that returns the same constant no matter the input; the layer then won’t give predictions for all of the data that was passed to it.
For example:
def output_function(x):
    # Return 1 no matter the input
    return 1

model = keras.Sequential([
    keras.layers.Dense(1, activation='sigmoid'),
    keras.layers.Lambda(output_function)
])

test_data = tf.random.uniform(shape=(79, 1))
model.predict(test_data, batch_size=32)  # Returns an array of only 3 predictions
I suspect that it is because I don’t return a tensor of the shape (None, 1), but rather just (1). But I don’t know how to rewrite the output_function to return a tensor of shape (None, 1).
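One possible fix (a minimal sketch, not from the original post) is to build the constant output from the input tensor so that it keeps the batch dimension and ends up with shape (None, 1):

import tensorflow as tf
from tensorflow import keras

def output_function(x):
    # Build the output from the input so it keeps the batch dimension:
    # x has shape (None, 1), so the result also has shape (None, 1).
    return tf.ones_like(x)

model = keras.Sequential([
    keras.layers.Dense(1, activation='sigmoid'),
    keras.layers.Lambda(output_function)
])

test_data = tf.random.uniform(shape=(79, 1))
print(model.predict(test_data, batch_size=32).shape)  # (79, 1)

If the real post-processing thresholds the sigmoid output rather than returning a constant, something like tf.cast(x > 0.5, tf.float32) keeps the (None, 1) shape in the same way.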
Researchers create a new AI algorithm that can analyze mammography scans, identify whether a lesion is malignant, and show how it reached its conclusion.
A recently developed AI platform is giving medical professionals screening for breast cancer a new, transparent tool for evaluating mammography scans. The researchers created an AI model that evaluates the scans and highlights the parts of an image the algorithm finds relevant. The work could help medical professionals determine whether a patient needs an invasive—and often nerve-wracking—biopsy.
“If a computer is going to help make important medical decisions, physicians need to trust that the AI is basing its conclusions on something that makes sense,” Joseph Lo, professor of radiology at Duke and study coauthor, said in a press release. “We need algorithms that not only work, but explain themselves and show examples of what they’re basing their conclusions on. That way, whether a physician agrees with the outcome or not, the AI is helping to make better decisions.”
One in every eight women in the US will develop invasive breast cancer during their lifetime. When the cancer is detected early, a woman has a 93 percent or higher survival rate in the first five years.
Mammography, which uses low-energy X-rays to examine breast tissue for diagnosis and screening, is an effective tool for early detection, but it requires a highly skilled radiologist to interpret the scans. However, false negatives and false positives do occur, resulting in missed diagnoses and in up to 40% of biopsied lesions turning out to be benign.
Using AI for medical imaging analysis has grown significantly in recent years and offers advantages in interpreting data. Implementing AI models also carries risks, especially when an algorithm fails.
“Our idea was to instead build a system to say that this specific part of a potential cancerous lesion looks a lot like this other one that I’ve seen before,” said study lead and Duke computer science Ph.D. candidate Alina Barnett. “Without these explicit details, medical practitioners will lose time and faith in the system if there’s no way to understand why it sometimes makes mistakes.”
Using 1,136 images from 484 patients within the Duke University Health System, researchers trained the algorithm to locate and evaluate potentially cancerous areas. This was accomplished by training the models to identify unhealthy tissue, or lesions, which often appear as bright or irregular shapes with fuzzy edges on a scan.
Radiologists then labeled these images, teaching the algorithm to focus on the fuzzy edges, also known as margins. Often associated with quick-growing cancerous breast tumor cells, margins are a strong indicator of cancerous lesions. With these carefully labeled images, the AI can compare cancerous and benign edges, and learn to distinguish between them.
The AI model uses the cuDNN-accelerated PyTorch deep learning framework and can be run on two NVIDIA P100 or V100 GPUs.
Figure 1. Top image shows an AI model for spotting pre-cancerous lesions in mammography without revealing the decision-making process. Bottom image shows the IAIA-BL model, which tells doctors where it’s looking and how it’s drawing its conclusions. Credit: Alina Barnett, Duke University.
The researchers found the AI to be as effective as other machine learning-based mammography models, but it holds the advantage of having transparency in its decision-making. When the model is wrong, a radiologist can see how the mistake was made.
According to the study, the model could also be a useful tool when teaching medical students how to read mammogram scans and for resource-constrained areas of the world lacking cancer specialists.
The code from the study is available through GitHub.
Deep learning models require a lot of data to produce accurate predictions. Here’s how to solve the data processing problem for the medical domain with NVIDIA DALI.
Deep learning models require vast amounts of data to produce accurate predictions, and this need becomes more acute every day as models grow in size and complexity. Even large datasets, such as the well-known ImageNet with more than a million images, are not sufficient to achieve state-of-the-art results in modern computer vision tasks.
To address this, data augmentation techniques are used to artificially increase the size of a dataset by introducing random disturbances to the data, such as geometric deformations, color transforms, and noise addition. These disturbances help produce models that are more robust in their predictions, avoid overfitting, and deliver better accuracy.
In medical imaging tasks, data augmentation is critical because datasets contain mere hundreds or thousands of samples at best. Models, on the other hand, tend to produce large activations that require a lot of GPU memory, especially when dealing with volumetric data such as CT and MRI scans. This typically results in training with small batch sizes on a small dataset. To avoid overfitting, more elaborate data preprocessing and augmentation techniques are required.
Preprocessing, however, often has a significant impact on the overall performance of the system. This is especially true in applications dealing with large inputs, such as volumetric images. These preprocessing tasks are typically run on the CPU due to simplicity, flexibility, and availability of libraries such as NumPy.
In some applications, such as segmentation or detection in medical images, GPU utilization during training is often suboptimal because data preprocessing is performed on the CPU. One solution is to fully overlap data processing with training, but that is not always simple to achieve.
Such a performance bottleneck leads to a chicken and egg problem. Researchers avoid introducing more advanced augmentations into their models due to performance reasons, and libraries don’t put the effort into optimizing preprocessing primitives due to low adoption.
GPU acceleration solution
You can improve the performance of applications with heavy data preprocessing pipelines significantly by offloading data preprocessing to the GPU. The GPU is typically underutilized in such scenarios but can be used to do the work that the CPU cannot complete in time. The result is better hardware utilization, and ultimately faster training.
Figure 1. U-Net3D BraTS21 training performance comparison
The performance difference becomes even more significant when you look at the NVIDIA submission for the MLPerf UNet3D benchmark. It used the same network architecture as the BraTS21 winning solution, but with a more complex data loading pipeline and larger input volumes (KITS19 dataset). The performance boost is an impressive 2x end-to-end training speedup compared with the native pipeline (Figure 2).
Figure 2. U-Net3D MLPerf Training 1.1 training performance comparison
This was made possible by NVIDIA Data Loading Library (DALI). DALI provides a set of GPU-accelerated building blocks, enabling you to build a complete data processing pipeline that includes data loading, decoding, and augmentation, and to integrate it with a deep learning framework of choice (Figure 3).
Figure 3. DALI overview and its usage as a tool for accelerated data loading and preprocessing in DL applications
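As a rough illustration of what such a pipeline looks like in code, here is a minimal sketch. The dataset layout (a directory of .npy volumes), the batch size, and the chosen augmentation are illustrative assumptions rather than the MLPerf or BraTS21 pipeline; see the DALI documentation for the exact operator signatures.

import nvidia.dali.fn as fn
from nvidia.dali import pipeline_def

@pipeline_def(batch_size=2, num_threads=4, device_id=0)
def volumetric_pipeline(data_dir):
    # Read CT/MRI volumes stored as .npy files (hypothetical layout) on the CPU...
    volumes = fn.readers.numpy(device="cpu", file_root=data_dir, file_filter="*.npy")
    # ...then move them to the GPU so that the augmentations run there.
    volumes = volumes.gpu()
    # Label the dimensions so spatial operators treat the data as a 3D volume.
    volumes = fn.reshape(volumes, layout="DHW")
    # Example GPU-accelerated augmentation: a random flip along the horizontal axis.
    volumes = fn.flip(volumes, horizontal=fn.random.coin_flip())
    return volumes

pipe = volumetric_pipeline(data_dir="/path/to/volumes")  # hypothetical path
pipe.build()
(batch,) = pipe.run()

In a real training job, the pipeline would typically be wrapped in a framework iterator (for example, DALIGenericIterator from the PyTorch plugin) instead of calling pipe.run() directly.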
Volumetric image operations
Originally, DALI was developed as a solution for image classification and detection workflows. Later, it was extended to cover other data domains, such as audio, video, and volumetric images. For more information about volumetric data processing, see 3D Transforms or Numpy Reader.
DALI supports a wide range of image-processing operators. Some can also be applied to volumetric images. Here are some examples worth mentioning:
Resize
Warp affine
Rotate
Random object bounding box
To showcase some of the mentioned operations, we use a sample from the BraTS19 dataset, consisting of MRI scans labeled for brain tumor segmentation. Figure 4 shows a two-dimensional slice extracted from a brain MRI scan volume, where the darker area represents a region labeled as an abnormality.
Figure 4. A slice from a BraTS19 dataset sample
Resize operator
Resize upscales or downscales the image to a desired shape by interpolating the input pixels. The scaling is configurable for each dimension separately, including the selection of the interpolation method.
Figure 5. Reference slice from a BraTS19 dataset sample (left) compared with a resized sample (right)
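Inside a pipeline definition like the sketch shown earlier (where volume is the GPU data node with a DHW layout, and with nvidia.dali.types imported as types), a volumetric resize might look like this; the target shape and interpolation method are illustrative:

# Resize the volume to 128x224x224 voxels, selecting the interpolation method.
resized = fn.resize(volume, size=[128, 224, 224], interp_type=types.INTERP_LINEAR)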
Warp affine operator
Warp affine applies a geometric transformation by mapping pixel coordinates from source to destination with a linear transformation.
Warp affine can be used to perform multiple transformations (rotation, flip, shear, scale) in one go.
Figure 6. A slice of a volume transformed with WarpAffine
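As a sketch, a single warp_affine call inside such a pipeline could combine scaling and a small shear in one pass; the 3x4 row-major matrix below is illustrative, not taken from the article:

# Combined scale + shear in one call; the 3x4 matrix is [R | t], row-major.
warped = fn.warp_affine(volume,
                        matrix=[0.9, 0.1, 0.0, 0.0,
                                0.0, 1.0, 0.0, 0.0,
                                0.0, 0.0, 1.1, 0.0],
                        interp_type=types.INTERP_LINEAR)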
Rotate operator
Rotate allows you to rotate a volume around an arbitrary axis, provided as a vector, and an angle. It can also optionally extend the canvas so that the entire rotated image is contained in it. Figure 7 shows an example of a rotated volume.
Figure 7. A slice from a rotated volume
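A sketch of the corresponding call, assuming the same pipeline context as before; the angle and axis vector are illustrative values:

# Rotate the volume by 30 degrees around the given axis vector; by default the
# canvas is extended so the whole rotated volume fits, padded with fill_value.
rotated = fn.rotate(volume, angle=30.0, axis=(0.0, 0.0, 1.0), fill_value=0)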
Random object bounding box operator
Random object bounding box is an operator suited for detection and segmentation tasks. As mentioned earlier, medical datasets tend to be rather small, with target classes (such as abnormalities) occupying a comparatively small area. Furthermore, in many cases the input volume is much larger than the volume expected by the network. If you were to use random cropping windows for training, then the majority would not contain the target. This could cause the training convergence to slow down or bias the network towards false-negative results.
This operator selects pseudo-random crops that can be biased towards sampling a particular label. Connected component analysis is performed on the label map as a pre-step. Then, a connected blob is selected at random, with equal probability. By doing that, the operator avoids overrepresenting larger blobs.
You can also choose to restrict the selection to the K largest blobs or to specify a minimum blob size. When a particular blob is selected, a random cropping window is generated within the range containing the given blob. Figure 8 shows this cropping window selection process.
Figure 8. A visualization of the Random object bounding box operation on an artificial 2D image with a set of objects belonging to three different classes (each highlighted with different color)
The gain in learning speed can be significant. On the KITS19 dataset, nnU-Net achieved the same accuracy in the test run in 2,134 epochs with the Random object bounding box operator as it did in 3,222 epochs with random crop.
The process of finding connected components is typically slow, but because the number of samples in the dataset is small, the operator can be configured to cache the connected-component information so that it is only calculated during the first epoch of training.
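A rough sketch of how this operator might be used together with fn.slice inside such a pipeline, where label is the label-map data node read alongside the image volume; the probabilities and the k_largest value are illustrative, and a full pipeline (such as nnU-Net's) typically derives a fixed-size training crop from the returned box in additional steps:

# Pick a region biased towards foreground blobs; the connected-component
# analysis is cached so it only runs during the first epoch.
roi_start, roi_end = fn.segmentation.random_object_bbox(
    label, format="start_end", foreground_prob=0.7, k_largest=2,
    cache_objects=True)
image_roi = fn.slice(volume, start=roi_start, end=roi_end, axis_names="DHW",
                     out_of_bounds_policy="pad")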
Accelerate on your own
You can download the latest version of the prebuilt and tested DALI pip packages. The NGC containers for TensorFlow, PyTorch, and MXNet have DALI integrated. You can review the many examples and read the latest release notes for a detailed list of new features and enhancements.
See how DALI can help you accelerate data preprocessing for your deep learning applications. The best place to start is the NVIDIA DALI Documentation, which includes numerous examples and tutorials. You can also watch our GTC 2021 talk about DALI. DALI is an open-source project, and our code is available on the /NVIDIA/DALI GitHub repo. We welcome your feedback and contributions.
I have trained a model using tf1.15, converted it to tflite, and compiled it for the edgetpu, which is all working. However, the confidence values for all bounding boxes are 50%. It seems to recognise objects reasonably well, placing many boxes over and around the object it’s trying to detect, but they all have 50% confidence, so it is difficult to get a clean output. I believe the issue was not caused by any of the conversions but arose during training, as tensorboard also shows this behaviour.
Browse through MORF Gallery — virtually or at an in-person exhibition — and you’ll find robots that paint, digital dreamscape experiences, and fine art brought to life by visual effects. The gallery showcases cutting-edge, one-of-a-kind artwork from award-winning artists who fuse their creative skills with AI, machine learning, robotics and neuroscience.
With a new year underway, NVIDIA is helping enterprises worldwide add modern workloads to their mainstream servers using the latest release of the NVIDIA AI Enterprise software suite. NVIDIA AI Enterprise 1.1 is now generally available. Optimized, certified and supported by NVIDIA, the latest version of the software suite brings new updates, including production support.
How long would it take to train a model on a dataset of about 2,000 pictures on my laptop? (I am a beginner, but I need to train it for a project.) Specs: Ryzen 5 3500U, 10 GB of RAM, Vega 8 GPU.
I was going to do it on the CPU, because I think it’s more powerful (correct me if I am wrong).
Posted by Oran Lang and Inbar Mosseri, Software Engineers, Google Research
Neural networks can perform certain tasks remarkably well, but understanding how they reach their decisions — e.g., identifying which signals in an image cause a model to determine it to be of one class and not another — is often a mystery. Explaining a neural model’s decision process may have high social impact in certain areas, such as analysis of medical images and autonomous driving, where human oversight is critical. These insights can also be helpful in guiding health care providers, revealing model biases, providing support for downstream decision makers, and even aiding scientific discovery.
Previous approaches for visual explanations of classifiers, such as attention maps (e.g., Grad-CAM), highlight which regions in an image affect the classification, but they do not explain what attributes within those regions determine the classification outcome: For example, is it their color? Their shape? Another family of methods provides an explanation by smoothly transforming the image between one class and another (e.g., GANalyze). However, these methods tend to change all attributes at once, thus making it difficult to isolate the individual affecting attributes.
In “Explaining in Style: Training a GAN to explain a classifier in StyleSpace”, presented at ICCV 2021, we propose a new approach for a visual explanation of classifiers. Our approach, StylEx, automatically discovers and visualizes disentangled attributes that affect a classifier. It allows exploring the effect of individual attributes by manipulating those attributes separately (changing one attribute does not affect others). StylEx is applicable to a wide range of domains, including animals, leaves, faces, and retinal images. Our results show that StylEx finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are interpretable by people as measured in user studies.
Explaining a Cat vs. Dog Classifier: StylEx provides the top-K discovered disentangled attributes which explain the classification. Moving each knob manipulates only the corresponding attribute in the image, keeping other attributes of the subject fixed.
For instance, to understand a cat vs. dog classifier on a given image, StylEx can automatically detect disentangled attributes and visualize how manipulating each attribute can affect the classifier probability. The user can then view these attributes and make semantic interpretations for what they represent. For example, in the figure above, one can draw conclusions such as “dogs are more likely to have their mouth open than cats” (attribute #4 in the GIF above), “cats’ pupils are more slit-like” (attribute #5), “cats’ ears do not tend to be folded” (attribute #1), and so on.
The video below provides a short explanation of the method:
How StylEx Works: Training StyleGAN to Explain a Classifier
Given a classifier and an input image, we want to find and visualize the individual attributes that affect its classification. For that, we utilize the StyleGAN2 architecture, which is known to generate high quality images. Our method consists of two phases:
Phase 1: Training StylEx
A recent work showed that StyleGAN2 contains a disentangled latent space called “StyleSpace”, which contains individual semantically meaningful attributes of the images in the training dataset. However, because StyleGAN training is not dependent on the classifier, it may not represent those attributes that are important for the decision of the specific classifier we want to explain. Therefore, we train a StyleGAN-like generator to satisfy the classifier, thus encouraging its StyleSpace to accommodate classifier-specific attributes.
This is achieved by training the StyleGAN generator with two additional components. The first is an encoder, trained together with the GAN with a reconstruction-loss, which forces the generated output image to be visually similar to the input. This allows us to apply the generator on any given input image. However, visual similarity of the image is not enough, as it may not necessarily capture subtle visual details important for a particular classifier (such as medical pathologies). To ensure this, we add a classification-loss to the StyleGAN training, which forces the classifier probability of the generated image to be the same as the classifier probability of the input image. This guarantees that subtle visual details important for the classifier (such as medical pathologies) will be included in the generated image.
Training StylEx: We jointly train the generator and the encoder. A reconstruction-loss is applied between the generated image and the original image to preserve visual similarity. A classification-loss is applied between the classifier output of the generated image and the classifier output of the original image to ensure the generator captures subtle visual details important for the classification.
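As a rough illustration, the two extra terms might be combined as below. The encoder, generator, and classifier interfaces, the L1 reconstruction term, the KL term between classifier outputs, and the loss weights are assumptions for this sketch rather than the exact formulation in the paper; in actual training these terms are added on top of the usual StyleGAN adversarial losses.

import torch
import torch.nn.functional as F

def stylex_extra_losses(encoder, generator, classifier, x,
                        recon_weight=1.0, cls_weight=1.0):
    # Encode the input and reconstruct it with the StyleGAN-like generator.
    w = encoder(x)
    x_rec = generator(w)

    # Reconstruction loss: keep the generated image visually similar to the input.
    recon_loss = F.l1_loss(x_rec, x)

    # Classification loss: match the classifier's output distribution on the
    # generated image to its output on the original image, so that details the
    # classifier relies on (e.g. subtle pathologies) survive the reconstruction.
    with torch.no_grad():
        target_probs = F.softmax(classifier(x), dim=-1)
    cls_loss = F.kl_div(F.log_softmax(classifier(x_rec), dim=-1),
                        target_probs, reduction="batchmean")

    return recon_weight * recon_loss + cls_weight * cls_loss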
Phase 2: Extracting Disentangled Attributes
Once trained, we search the StyleSpace of the trained Generator for attributes that significantly affect the classifier. To do so, we manipulate each StyleSpace coordinate and measure its effect on the classification probability. We seek the top attributes that maximize the change in classification probability for the given image. This provides the top-K image-specific attributes. By repeating this process for a large number of images per class, we can further discover the top-K class-specific attributes, which teaches us what the classifier has learned about the specific class. We call our end-to-end system “StylEx”.
A visual illustration of image-specific attribute extraction: once trained, we search for the StyleSpace coordinates that have the highest effect on the classification probability of a given image.
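The coordinate search can be sketched roughly as follows. The generator and classifier interfaces, the flat style-code tensor, the perturbation size, and the helper name are all hypothetical; the sketch only illustrates ranking StyleSpace coordinates by their effect on the classification probability.

import torch

def top_k_style_attributes(generator, classifier, style_codes, target_class,
                           k=10, delta=2.0):
    # style_codes: a (batch, num_coordinates) tensor of StyleSpace codes
    # for the images being explained (a simplifying assumption).
    with torch.no_grad():
        # Probability of the target class for the unmodified images.
        base_prob = classifier(generator(style_codes))[:, target_class]

        effects = []
        for coord in range(style_codes.shape[1]):
            # Perturb one StyleSpace coordinate, keeping all others fixed.
            perturbed = style_codes.clone()
            perturbed[:, coord] += delta
            prob = classifier(generator(perturbed))[:, target_class]
            # Record how strongly this coordinate moves the classification probability.
            effects.append((prob - base_prob).abs().mean().item())

    # Return the K coordinates with the largest effect on the classifier.
    return sorted(range(len(effects)), key=lambda c: -effects[c])[:k]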
StylEx is Applicable to a Wide Range of Domains and Classifiers
Our method works on a wide variety of domains and classifiers (binary and multi-class). Below are some examples of class-specific explanations. In all the domains tested, the top attributes detected by our method correspond to coherent semantic notions when interpreted by humans, as verified by human evaluation.
For perceived gender and age classifiers, below are the top four detected attributes per classifier. Our method exemplifies each attribute on multiple images that are automatically selected to best demonstrate that attribute. For each attribute we flicker between the source and attribute-manipulated image. The degree to which manipulating the attribute affects the classifier probability is shown at the top-left corner of each image.
Top-4 automatically detected attributes for a perceived-gender classifier.
Top-4 automatically detected attributes for a perceived-age classifier.
Note that our method explains a classifier, not reality. That is, the method is designed to reveal image attributes that a given classifier has learned to utilize from data; those attributes may not necessarily characterize actual physical differences between class labels (e.g., a younger or older age) in reality. In particular, these detected attributes may reveal biases in the classifier training or dataset, which is another key benefit of our method. It can further be used to improve fairness of neural networks, for example, by augmenting the training dataset with examples that compensate for the biases our method reveals.
Adding the classifier loss into StyleGAN training turns out to be crucial in domains where the classification depends on fine details. For example, a GAN trained on retinal images without a classifier loss will not necessarily generate fine pathological details corresponding to a particular disease. Adding the classification loss causes the GAN to generate these subtle pathologies as an explanation of the classifier. This is exemplified below for a retinal image classifier (DME disease) and a sick/healthy leaf classifier. StylEx is able to discover attributes that are aligned with disease indicators, for instance “hard exudates”, which is a well known marker for retinal DME, and rot for leaf diseases.
Top-4 automatically detected attributes for a DME classifier of retina images.
Top-4 automatically detected attributes for a classifier of sick/healthy leaf images.
Finally, this method is also applicable to multi-class problems, as demonstrated on a 200-way bird species classifier.
Top-4 automatically detected attributes in a 200-way classifier trained on CUB-2011 for (a) the class “brewer blackbird” and (b) the class “yellow bellied flycatcher”. Indeed, we observe that StylEx detects attributes that correspond to attributes in the CUB taxonomy.
Broader Impact and Next Steps
Overall, we have introduced a new technique that enables the generation of meaningful explanations for a given classifier on a given image or class. We believe that our technique is a promising step towards detection and mitigation of previously unknown biases in classifiers and/or datasets, in line with Google’s AI Principles. Additionally, our focus on multiple-attribute based explanation is key to providing new insights about previously opaque classification processes and aiding in the process of scientific discovery. Finally, our GitHub repository includes a Colab and model weights for the GANs used in our paper.
Acknowledgements
The research described in this post was done by Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald (as an intern), Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani and Inbar Mosseri. We would like to thank Jenny Huang and Marilyn Zhang for leading the writing process for this blogpost, and Reena Jana, Paul Nicholas, and Johnny Soraker for ethics reviews of our research paper and this post.
GANcraft is a hybrid neural rendering pipeline to represent large and complex scenes using Minecraft.
Scientists at NVIDIA and Cornell University introduced a hybrid unsupervised neural rendering pipeline to represent large and complex scenes efficiently in voxel worlds. Essentially, a 3D artist only needs to build the bare minimum, and the algorithm will do the rest to build a photorealistic world. The researchers applied this hybrid neural rendering pipeline to Minecraft block worlds to generate a far more realistic version of the Minecraft scenery.
Previous works from NVIDIA and the broader research community (pix2pix, pix2pixHD, MUNIT, SPADE) have tackled the problem of image-to-image translation (im2im)—translating an image from one domain to another. At first glance, these methods might seem to offer a simple solution to the task of transforming one world to another—translating one image at a time. However, im2im methods do not preserve viewpoint consistency, as they have no knowledge of the 3D geometry, and each 2D frame is generated independently. As can be seen in the images that follow, the results from these methods produce jitter and abrupt color and texture changes.
Figure 1. A comparison of prior works (MUNIT, SPADE, wc-vid2vid, NSVF-W) and GANcraft.
Enter GANcraft, a new method that directly operates on the 3D input world.
“As the ground truth photorealistic renderings for a user-created block world simply doesn’t exist, we have to train models with indirect supervision,” the researchers explained in the study.
The method works by randomly sampling camera views in the input block world and then imagining what a photorealistic version of that view would look like. This is done with the help of SPADE, prior work from NVIDIA on image-to-image translation, and was the key component in the popular GauGAN demo. GANcraft overcomes the view inconsistency of these generated “pseudo-groundtruths” through the use of a style-conditioning network that can disambiguate the world structure from the rendering style. This enables GANcraft to generate output videos that are view consistent, as well as with different styles as shown in this image!
Figure 2. GANcraft’s methodology enables view consistency in a variety of different styles.
While the results of the research are demonstrated in Minecraft, the method works with other voxel-based 3D block worlds. The potential to shorten the amount of time and expertise needed to build high-definition worlds increases the value of this research. It could help game developers, CGI artists, and the animation industry cut down on the time it takes to build these large and impressive worlds.
If you would like a further breakdown of the potential of this technology, Károly Zsolnai-Fehér highlights the research in his YouTube series: Two Minute Papers:
Figure 3. The YouTube series, Two Minute Papers, covers significant developments in AI as they come onto the scene.
GANcraft was implemented in the Imaginaire library. This library is optimized for the training of generative models and generative adversarial networks, with support for multi-GPU, multi-node, and automatic mixed-precision training. Implementations of over 10 different research works produced by NVIDIA, as well as pretrained models, have been released. This library will continue to be updated with newer works over time.