Categories
Misc

NVIDIA Showcases the Latest in Graphics, AI, and Virtual Collaboration at SIGGRAPH

Developers, researchers, graphics professionals, and others from around the world will get a sneak peek at the latest innovations in computer graphics at the SIGGRAPH 2021 virtual conference, taking place August 9-13.

NVIDIA will be presenting the breakthroughs that NVIDIA RTX technology delivers, from real-time ray tracing to AI-enhanced workflows.

Watch the NVIDIA special address on Tuesday, August 10 at 8:00 a.m. PDT, where we will showcase the latest tools and solutions that are driving graphics, AI, and the emergence of shared worlds.

And on Wednesday, August 11, catch the global premiere of “Connecting in the Metaverse: The Making of the GTC Keynote” at 11:00 a.m. PDT. The new documentary highlights the creative minds and groundbreaking technologies behind the NVIDIA GTC 2021 keynote. See how a small team of artists used NVIDIA Omniverse to blur the line between real and rendered.

Explore the Latest from NVIDIA Research

At SIGGRAPH, the NVIDIA Research team will be presenting the following papers:

Don’t miss our Real-Time Live! demo on August 10 at 4:30 p.m. PDT to see how NVIDIA Research creates AI-driven digital avatars.

Dive into Technical Training with NVIDIA Deep Learning Institute

Here’s a preview of some DLI sessions you don’t want to miss:

Omniverse 101: Getting Started with Universal Scene Description for Collaborative 3D Workflows

This free self-paced training provides an introduction to USD. Go through a series of hands-on exercises consisting of training videos accompanied by live scripted examples, and learn about concepts like layer composition, references and variants.
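
For readers new to USD, the following minimal Python sketch, written against the open-source pxr API with hypothetical file names, hints at the concepts the course covers: references, variants, and layer composition.

from pxr import Usd, UsdGeom

# Create a new stage (the root layer of a composed scene).
stage = Usd.Stage.CreateNew("shot.usda")

# Define a prim and reference an external asset layer into it.
asset = stage.DefinePrim("/World/Chair", "Xform")
asset.GetReferences().AddReference("assets/chair.usda")  # hypothetical asset file

# Add a variant set with two look variants and select one.
looks = asset.GetVariantSets().AddVariantSet("look")
looks.AddVariant("red")
looks.AddVariant("blue")
looks.SetVariantSelection("red")

# Author an opinion that only applies inside the selected variant.
with looks.GetVariantEditContext():
    UsdGeom.Xform.Define(stage, "/World/Chair/RedTrim")

# Sublayer another department's layer (layer composition), then save.
stage.GetRootLayer().subLayerPaths.append("lighting.usda")  # hypothetical layer
stage.GetRootLayer().Save()

The DLI exercises walk through these mechanics in far more depth, with training videos and live scripted examples.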

Fundamentals of Ray Tracing Development using NVIDIA Nsight Graphics and NVIDIA Nsight Systems

With NVIDIA RTX and real-time ray-tracing APIs like DXR and Vulkan Ray Tracing, it's now easier than ever to create stunning visuals at interactive frame rates. This instructor-led workshop will show attendees how to use NVIDIA Nsight Graphics and NVIDIA Nsight Systems to profile and optimize 3D applications that use ray tracing. Space is limited.

Graphics and Omniverse Teaching Kit

Designed for college and university educators looking to bring graphics and NVIDIA Omniverse into the classroom, this teaching kit includes downloadable teaching materials and online courses that provide the foundation for understanding and building hands-on expertise in graphics and Omniverse.

Discover the Latest Tools and Solutions in Our Virtual Demos

We’ll be showcasing how NVIDIA technology is transforming workflows in some of our exciting demos, including:

  • Factory of the Future: Explore the next era of manufacturing with this demo, which showcases BMW Group’s factory of the future – designed, simulated, operated, and maintained entirely in NVIDIA Omniverse.
  • Multiple Artists, One Server: See how teams can accelerate visual effects production with the NVIDIA EGX Platform, which enables multiple artists to work together on a powerful, secure server from anywhere.
  • 3D Photogrammetry on an RTX Mobile Workstation: Watch how NVIDIA RTX-powered mobile workstations help drive the process of 3D scanning using photogrammetry, whether in a studio or in a remote location.
  • Interactive Volumes with NanoVDB in Blender Cycles: Learn how NanoVDB makes volume rendering more GPU memory efficient, meaning larger and more complex scenes can be interactively adjusted and rendered with NVIDIA RTX-accelerated ray tracing and AI denoising.

Enter for a Chance to Win Some Gems

Attendees can win a limited-edition hard copy of Ray Tracing Gems II, the follow-up to 2019’s Ray Tracing Gems.

Ray Tracing Gems II brings the community of rendering experts back together to share their knowledge. The book covers everything in ray tracing and rendering, from basic concepts geared toward beginners to full ray tracing deployment in shipping AAA games.

Learn more about the sweepstakes and enter for a chance to win.

Join NVIDIA at SIGGRAPH and learn more about the latest tools and technologies driving real-time graphics, AI-enhanced workflows, and virtual collaboration.

Categories
Misc

GFN Thursday Brings ‘Evil Genius 2: World Domination,’ ‘Escape From Naraka’ with RTX, and More This Week on GeForce NOW

This GFN Thursday shines a spotlight on the latest games joining the collection of over 1,000 titles in the GeForce NOW library from the many publishers that have opted in to stream their games on our open cloud-gaming service. Members can look forward to 14 games this week, including Evil Genius 2: World Domination from Rebellion.


Categories
Misc

RTX for Indies: Stunning Ray-Traced Lighting Achieved with RTXGI in Action-Platformer Escape from Naraka

Developed by XeloGames, an indie studio of just three, and published by Headup Games, Escape from Naraka achieves eye-catching ray-traced lighting using RTXGI and significant performance boosts from DLSS.

Developed by XeloGames, an indie studio of just three, and published by Headup Games, Escape from Naraka achieves eye-catching ray-traced lighting using RTX Global Illumination (RTXGI) and significant performance boosts from Deep Learning Super Sampling (DLSS). NVIDIA had the opportunity to speak with the XeloGames team about their experience using NVIDIA’s SDKs while developing their debut title.  

“We believe that, sooner or later, everyone will have ray tracing,” XeloGames said, discussing their motivation to use RTXGI, “so it’s really good for us to start earlier, especially in Indonesia.”

Starting early, in this case, is an understatement for XeloGames. Escape from Naraka is actually the first-ever ray tracing title from Indonesia; the team used ray-traced reflections, shadows, and global illumination to paint a dramatic labyrinth for the player to explore. 

Such a feat, executed by such a small studio, speaks to the usefulness of RTXGI as a tool for development. Escape from Naraka was made in Unreal Engine 4, using NVIDIA’s NvRTX branch to bring ray tracing and DLSS into production. Once ray-traced global illumination was integrated into the engine, XeloGames reported benefits they immediately experienced:

“RTXGI really helps with how quick you can set up a light in a scene. Instead of the old ways where you have to manually adjust every light, you can put in a Dynamic Diffuse Global Illumination (DDGI) volume and immediately see a difference.”

Rapid in-engine updates expedited the task of lighting design in Escape from Naraka, alongside the ability to make any object emissive for “cost-free performance lighting”, XeloGames added. Of course, implementing RTXGI in their title came with its challenges as well. For Escape from Naraka specifically, a unique obstacle presented itself: the abundance of rocks in the level design often made it challenging to find opportunities to make full use of ray-traced lighting. “Rocks are not really that great at bouncing lights around”, the XeloGames developers remarked.

RTXGI is undoubtedly a powerful tool to have in a game developer’s toolkit, but the mileage that can be achieved with its features varies case by case. An important step before using ray-traced global illumination is deciding whether its features are the right fit for your game.

Despite the rock challenge (mitigated by making certain textures emissive to brighten darker areas) and a couple of bugs that had to be squashed along the way, XeloGames’ three-person team achieved a beautiful integration of RTXGI in Escape from Naraka. Check out the Escape from Naraka Official RTX Reveal Trailer for a look at how RTX Global Illumination enhances the game’s visual appeal:

“It definitely made scenes look more natural,” said a XeloGames developer of the enhancements RTXGI brought to the game. “Lights bounce around more naturally instead of just directly.”

The results of global illumination speak for themselves, pairing excellently with ray-traced reflections and shadows for stunning results.

RTXGI is not the only NVIDIA feature XeloGames packed into their newest release. Deep Learning Super Sampling (DLSS) is implemented as well to bring an AI-powered frame rate boost.

“Adding NVIDIA DLSS to the game was fast and easy with the UE4 plugin, providing our players maximum performance as they take on all the challenges Escape From Naraka has to offer.” 

XeloGames reported a swift implementation of DLSS with NvRTX, emphasizing the importance of using DLSS as a frame booster as well as an enabler to turn ray tracing on with the performance headroom it provides. In concert, RTXGI and DLSS empower a rich and fully-immersive experience in Escape from Naraka.  

Escape from Naraka is available now on Steam.

Check out XeloGames at their official website

Learn more and download RTXGI here

Learn more and download DLSS here

Explore and use NVIDIA’s NvRTX branch for Unreal Engine 4 here.

Categories
Misc

Getting attribute error in windows but not in Linux

AttributeError: ‘google.protobuf.pyext._message.RepeatedCompositeCo’ object has no attribute ‘_values’

Protobuf version=3.15, Mediapipe version=0.8.6, Tensorflow version=2.5.0. I have tried installing all of these versions in a virtual environment, but the error won’t go away.

submitted by /u/singh_prateek

Categories
Misc

Free Ray Tracing Gems II Chapter Covers Ray Tracing in Remedy’s Control

This chapter, written by Juha Sjöholm, Paula Jukarainen, and Tatu Aalto, presents how all ray tracing-based effects were implemented in Remedy Entertainment’s Control.

Next week, Ray Tracing Gems II will finally be available in its entirety as a free download, or for purchase as a physical release from Apress or Amazon. Since the start of July, we’ve been providing early access to a new chapter every week. Today’s chapter, by Juha Sjöholm, Paula Jukarainen, and Tatu Aalto, presents how all ray tracing-based effects were implemented in Remedy Entertainment’s Control. This includes opaque and transparent reflections, near-field indirect diffuse illumination, contact shadows, and the denoisers tailored for these effects.

You can download the full chapter free here

You can also learn more about Game of the Year Winner Control here

We’ve collaborated with our partners to make limited edition versions of the book, including custom covers that highlight real-time ray tracing in Fortnite, Control, and Watch Dogs: Legion.

To win a limited edition print copy of Ray Tracing Gems II, enter the giveaway contest here: https://developer.nvidia.com/ray-tracing-gems-ii

Categories
Misc

GPU Accelerating Node.js JavaScript for Visualization and Beyond

NVIDIA GTC21 featured a lot of great and engaging content, especially around RAPIDS, so it would have been easy to miss our debut presentation, “Using RAPIDS to Accelerate Node.js JavaScript for Visualization and Beyond.” Yep – we are bringing the power of GPU-accelerated data science to the JavaScript Node.js community with the Node-RAPIDS project. Node-RAPIDS is an …

NVIDIA GTC21 featured a lot of great and engaging content, especially around RAPIDS, so it would have been easy to miss our debut presentation, “Using RAPIDS to Accelerate Node.js JavaScript for Visualization and Beyond.” Yep – we are bringing the power of GPU-accelerated data science to the JavaScript Node.js community with the Node-RAPIDS project.

Node-RAPIDS is an open-source technical preview of modular RAPIDS library bindings for Node.js, along with complementary methods for enabling high-performance, browser-based visualizations.

Venn diagram showing intersections of Performance, Reach, and Usability, highlighting the intersection of all three.

What’s the problem with web viz?

Around a decade ago, a mini-renaissance in web-based data visualization showed the benefits of highly interactive, easy-to-share tools such as D3. While not as performant as C/C++ or Python frameworks, these tools took off because of JavaScript’s accessibility. It is no surprise that JavaScript often ranks as the most popular developer language, ahead of Python and Java, and there is now a full catalog of visualization and data tools for it.

Yet, this large community of JavaScript developers is impeded by the lack of first-class, accelerated data tools in their preferred language. Analysis is most effective when it sits as close as possible to the data source, the science, and the visualization. Fully accessing GPU hardware from JavaScript (beyond WebGL limitations and hacks) requires being a polyglot, setting up complicated middleware plumbing, or using non-JavaScript frameworks like Plotly Dash. As a result, data engineers, data scientists, visualization specialists, and front-end developers are often siloed, even within organizations. This is detrimental because data visualization is the ideal medium of communication between these groups.

As for the RAPIDS Viz team, ever since our first proof of concept, we’ve wanted to build tools that can more seamlessly interact with hundreds of millions of data points in real-time through our browsers – and we finally have a way.

Why Node.js

If you are not familiar with Node.js, it is an open-source, cross-platform runtime environment, built on C/C++, that executes JavaScript code outside of a web browser. Over 1 million Node.js downloads occur per day. npm, the default JavaScript package manager, is owned by Microsoft (through GitHub). Node.js powers the backend of online marketplaces like eBay and AliExpress, and is used by high-traffic websites such as Netflix, PayPal, and Groupon. Clearly, it is a powerful framework.

Figure 1: XKCD – Node.js is a Universal Connector.

Node.js is the connector that gives us JavaScript with direct access to hardware, which results in a streamlined API and the ability to use NVIDIA CUDA ⚡. By creating node-rapids bindings, we enable a massive developer community with the ability to use GPU acceleration without the need to learn a new language or work in a new environment. We also give the same community access to a high-performance data science platform: RAPIDS!

Here is a snippet of node-RAPIDS in action based on our basic notebook, which shows a 6x speedup for a small regex example: 
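
The notebook’s snippet is not reproduced here. For a rough sense of the underlying operation, the Python cuDF call below performs an equivalent GPU regex filter; the Node-RAPIDS cudf bindings wrap the same libcudf string functionality, and the data here is purely illustrative.

import cudf

# Illustrative only: a GPU-accelerated regex filter using the Python cuDF API.
# The Node-RAPIDS cudf bindings expose the same underlying string operations.
s = cudf.Series(["node-rapids", "rapids.ai", "hello world", "gpu regex"])
mask = s.str.contains(r"rapids", regex=True)  # regex evaluated on the GPU
print(s[mask])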

Node-RAPIDS: designed as building blocks

Figure 2: Node-RAPIDS module overview, showing components for memory management (cuda, rmm), data science (cudf, cuml, cugraph, cuspatial), graphics, streaming, and front end.

Like most Node projects, Node-RAPIDS is designed to be modular. Our aim is not to build turnkey web applications, but to create an inventory of functionality that enables or accelerates a wide variety of use cases and pipelines. Figure 2 shows an overview of the current and planned Node-RAPIDS modules, grouped into general categories. A Node-RAPIDS application can use as many or as few of the modules as needed.

To make starting out less daunting, we are also building a catalog of demos that can serve as templates for generalized applications. As we develop more bindings, we will create more demos to showcase their capabilities.

Figure 3: Notional architecture of a GPU-accelerated Node.js cross-filter application built with node-rapids.

Figure 3 shows an idealized stack for a geospatial cross-filter dashboard application using the RAPIDS cuDF and cuSpatial libraries. We have a simple demo using Deck.gl that you can preview in our video, and you can explore the demo code on GitHub.

Figure 4: Example of Streaming ETL Process.

The last example, in Figure 4, is a server-side-only ETL pipeline without any visualization. We have an example of a straightforward ETL process using the cuDF bindings and the nteract notebook desktop application, which you can preview in our video and nteract with (get it?) in our notebook.

What is next?

While we have been thinking about this project for a while, we are just getting started in development. RAPIDS is an incredible framework, and we want to bring it to more people and more applications – RAPIDS everywhere as we say.

Near-term next steps:

  • Some short-term next steps are to continue building core RAPIDS binding features, which you can check out on our current binding coverage table.
  • If the idea of GPU-accelerated SQL queries straight from your web app sounds interesting (it does to us), we hope to get started on some BlazingSQL bindings soon too.
  • Most notably, we plan to start creating and publishing modular Docker containers, which will dramatically simplify the current from-source tech preview installation process.

As always, we need community engagement to help guide us. If you have feature requests, questions, or use cases, you can reach out to us!

This project has enormous potential. It can accelerate a wide variety of Node.js applications, as well as bring first-class, high-performance data science and visualization tools to a huge community. We hope you join us at the beginning of this exciting project.

Resources:

Categories
Misc

Developing a Question Answering Application Quickly Using NVIDIA Riva

There is a high chance that you have asked your smart speaker a question like, “How tall is Mount Everest?” If you did, it probably said, “Mount Everest is 29,032 feet above sea level.” Have you ever wondered how it found an answer for you? Question answering (QA) is loosely defined as a system consisting …

There is a high chance that you have asked your smart speaker a question like, “How tall is Mount Everest?” If you did, it probably said, “Mount Everest is 29,032 feet above sea level.” Have you ever wondered how it found an answer for you?

Question answering (QA) is loosely defined as a system consisting of information retrieval (IR) and natural language processing (NLP), concerned with answering questions posed by humans in a natural language. If you are not familiar with information retrieval, it is a technique for obtaining information relevant to a query from a pool of resources, for example webpages or documents in a database. The easiest way to understand the concept is the search engine that you use daily. 

You then need an NLP system to find an answer within the IR system that is relevant to the query. Although I just listed what you need for building a QA system, it is not a trivial task to build IR and NLP from scratch. Here’s how NVIDIA Riva makes it easy to develop a QA system.

Riva overview

NVIDIA Riva is an accelerated SDK for building multimodal conversational AI services that use an end-to-end deep learning pipeline. The Riva framework includes optimized services for speech, vision, and natural language understanding (NLU) tasks. In addition to providing several pretrained models for the entire pipeline of your conversational AI service, Riva is also architected for deployment at scale. In this post, I look closely into the QA function of Riva and how you can create your own QA application with it.

Riva QA function

To understand how the Riva QA function works, start with Bidirectional Encoder Representations from Transformers (BERT). It’s a transformer-based, NLP, pretraining method developed by Google in 2018, and it completely changed the field of NLP. BERT understands the contextual representation of a given word in a text. It is pretrained on a large corpus of data, including Wikipedia. 

With pretrained BERT as a strong NLP engine, you can fine-tune it to perform QA using many question-answer pairs, like those in the Stanford Question Answering Dataset (SQuAD). The model can then find an answer to a natural-language question within a given context: sentences or paragraphs. Figure 1 shows an example of QA, where the word “gravity” is highlighted as the answer to the query, “What causes precipitation to fall?”. In this example, the paragraph is the context, and the successfully fine-tuned QA model returns the word “gravity” as the answer.

Figure 1. Question-answer pairs for a sample passage in the SQuAD dataset.
Source: SQuAD: 100,000+ Questions for Machine Comprehension of Text.
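
As a generic illustration of this kind of extractive QA (separate from the Riva pipeline shown below), a SQuAD-fine-tuned model can be exercised in a few lines with the open-source Hugging Face transformers library; the model name and context text here are illustrative.

from transformers import pipeline

# Generic BERT-style extractive QA, shown only to illustrate the concept.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = ("In meteorology, precipitation is any product of the condensation "
           "of atmospheric water vapor that falls under gravity.")
result = qa(question="What causes precipitation to fall?", context=context)
print(result["answer"], result["score"])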

Create a QA system with Riva

Teams of engineers and researchers at NVIDIA deliver a quality QA function that you can use right out of the box with Riva. The Riva NLP service provides a set of high-level API actions, including NaturalQuery for QA. The Wikipedia API allows you to fetch articles posted on Wikipedia, an online encyclopedia, with a query in natural language. That’s the information retrieval system discussed earlier. Combining the Wikipedia API and the Riva QA function, you can create a simple QA system with a few lines of Python code. 

Start by installing the Wikipedia API for Python. Next, import the Riva NLP service API and gRPC, the underlying communication framework for Riva.

!pip install wikipedia
import wikipedia as wiki
import grpc
import riva_api.riva_nlp_pb2 as rnlp
import riva_api.riva_nlp_pb2_grpc as rnlp_srv

Now, create an input query. Use the Wikipedia API to fetch the relevant articles, with the number to fetch set by max_articles_combine. Ask the question, “What is speech recognition?”, print the titles of the articles returned from the search, and finally add the summary of each article to the combined_summary variable.

input_query = "What is speech recognition?"

# Fetch matching Wikipedia articles and combine their summaries into one context.
wiki_articles = wiki.search(input_query)
max_articles_combine = 3
combined_summary = ""
if len(wiki_articles) == 0:
    print("ERROR: Could not find any matching results in Wikipedia.")
else:
    for article in wiki_articles[:min(len(wiki_articles), max_articles_combine)]:
        print(f"Getting summary for: {article}")
        combined_summary += "\n" + wiki.summary(article)
Figure 2. Printed output of the Python code: titles of the three articles fetched from Wikipedia for the query about speech recognition.

Next, open a gRPC channel that points to the location where the Riva server is running. Because you are running the Riva server locally, it is ‘localhost:50051‘. Then, instantiate NaturalQueryRequest, and send a request to the Riva server, passing both the query and the context. Finally, print the response, returned from the Riva server.

channel = grpc.insecure_channel('localhost:50051')
riva_nlp = rnlp_srv.RivaLanguageUnderstandingStub(channel)

# Build the NaturalQuery request from the question and the combined context.
req = rnlp.NaturalQueryRequest()
req.query = input_query
req.context = combined_summary
resp = riva_nlp.NaturalQuery(req)

print(f"Query: {input_query}")
print(f"Answer: {resp.results[0].answer}")
Figure 3. Example query and answer generated by the Riva QA function.

Summary

With Riva QA and the Wikipedia API, you just created a simple QA application. If there’s an article in Wikipedia that is relevant to your query, you can, in principle, find an answer. Imagine that you have a database full of articles relevant to your domain, company, industry, or anything of interest. You can create a QA service that finds answers to questions specific to your field of interest. Obviously, you would need an IR system to fetch relevant articles from your database, like the Wikipedia API used in this post. With the IR system in your pipeline, Riva can help you find the answer. We look forward to the cool applications that you’ll create with Riva.

Categories
Misc

An Exclusive Invitation: Peek Behind the Omniverse Curtain at the Inaugural Omniverse User Group

Join the first NVIDIA Omniverse User Group, an exclusive event hosted by the lead engineers, designers, and artists of Omniverse on August 12, during the virtual SIGGRAPH conference.

The Omniverse User Group inaugural event is open to all developers, researchers, creators, students, professionals, and hobbyists of all levels, whether current Omniverse power users or curious explorers. The two-hour event will feature a look into the Omniverse roadmap, and provide sneak peeks of never-before-seen technologies and experiments.

Those who attend the Omniverse User Group will:

  • Hear the vision and future of Omniverse from Rev Lebaredian, VP of Omniverse & Simulation Technology, and Richard Kerris, VP of Omniverse Developer Platform
  • Learn how you can build on and extend the Omniverse ecosystem
  • See the unveiling of “Create With Marbles: Marvelous Machines” contest submissions and winners
  • Attend “Meet the Expert” breakout sessions and speak with Omniverse engineering leads about specific platform applications and features
Image courtesy of Antonio Covelo (@ant_vfx on Twitter), one of the participants of the first Omniverse “Create With Marbles” contest

Omniverse User Group Event Details
When: Thursday, August 12 from 5:00 pm – 6:30 pm PDT/8:00 pm – 9:30 pm EDT
Where: Virtual Event via Zoom

Register now to join this exclusive event.

Mark Your Calendars for NVIDIA at SIGGRAPH

Artists and developers can explore the latest news about NVIDIA Omniverse at SIGGRAPH. Watch the NVIDIA special address on Tuesday, August 10, at 8:00 am PDT to learn about the latest tools and solutions that are driving graphics, AI, and the emergence of shared worlds. The address will be presented by Richard Kerris, Vice President of Omniverse, and Sanja Fidler, Senior Director of AI Research at NVIDIA.

And tune in to the global premiere of “Connecting in the Metaverse: The Making of the GTC Keynote.” The new documentary premieres on August 11th at 9:00 am PDT, highlighting the creative minds and groundbreaking technologies behind the making of the NVIDIA GTC 2021 keynote. See how a small team of artists used NVIDIA Omniverse to blur the line between real and rendered.

Join NVIDIA at SIGGRAPH and learn more about the latest tools and technologies driving real-time graphics, AI-enhanced workflows and virtual collaboration.

For additional support, check out the developer forum and join the Omniverse Discord server to chat with the community.

Categories
Offsites

Mapping Africa’s Buildings with Satellite Imagery

An accurate record of building footprints is important for a range of applications, from population estimation and urban planning to humanitarian response and environmental science. After a disaster, such as a flood or an earthquake, authorities need to estimate how many households have been affected. Ideally there would be up-to-date census information for this, but in practice such records may be out of date or unavailable. Instead, data on the locations and density of buildings can be a valuable alternative source of information.

A good way to collect such data is through satellite imagery, which can map the distribution of buildings across the world, particularly in areas that are isolated or difficult to access. However, detecting buildings with computer vision methods in some environments can be a challenging task. Because satellite imaging involves photographing the earth from several hundred kilometres above the ground, even at high resolution (30–50 cm per pixel), a small building or tent shelter occupies only a few pixels. The task is even more difficult for informal settlements, or rural areas where buildings constructed with natural materials can visually blend into the surroundings. There are also many types of natural and artificial features that can be easily confused with buildings in overhead imagery.

Objects that can confuse computer vision models for building identification (clockwise from top left): pools, rocks, enclosure walls, and shipping containers.

In “Continental-Scale Building Detection from High-Resolution Satellite Imagery”, we address these challenges, using new methods for detecting buildings that work in rural and urban settings across different terrains, such as savannah, desert, and forest, as well as informal settlements and refugee facilities. We use this building detection model to create the Open Buildings dataset, a new open-access data resource containing the locations and footprints of 516 million buildings with coverage across most of the African continent. The dataset will support several practical, scientific and humanitarian applications, ranging from disaster response or population mapping to planning services such as new medical facilities or studying human impact on the natural environment.

Model Development
We built a training dataset for the building detection model by manually labelling 1.75 million buildings in 100k images. The figure below shows some examples of how we labelled images in the training data, taking into account confounding characteristics of different areas across the African continent. In rural areas, for example, it was necessary to identify different types of dwelling places and to disambiguate them from natural features, while in urban areas we needed to develop labelling policies for dense and contiguous structures.

(1) Example of a compound containing both dwelling places as well as smaller outbuildings such as grain stores. (2) Example of a round, thatched-roof structure that can be difficult for a model to distinguish from trees, and where it is necessary to use cues from pathways, clearings and shadows to disambiguate. (3) Example of several contiguous buildings for which the boundaries cannot be easily distinguished.

We trained the model to detect buildings in a bottom-up way, first by classifying each pixel as building or non-building, and then grouping these pixels together into individual instances. The detection pipeline was based on the U-Net model, which is commonly used in satellite image analysis. One advantage of U-Net is that it is a relatively compact architecture, and so can be applied to large quantities of imaging data without a heavy compute burden. This is critical, because the final task of applying this to continental-scale satellite imagery means running the model on many billions of image tiles.

Example of segmenting buildings in satellite imagery. Left: Source image; Center: Semantic segmentation, with each pixel assigned a confidence score that it is a building vs. non-building; Right: Instance segmentation, obtained by thresholding and grouping together connected components.
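
The thresholding-and-grouping step in the right panel can be sketched with standard SciPy tooling; the threshold and minimum-size values below are illustrative, not the values used in the paper.

import numpy as np
from scipy import ndimage

def instances_from_probs(prob_map, threshold=0.5, min_pixels=4):
    """Threshold a per-pixel building-probability map and group connected
    pixels into individual building instances."""
    binary = prob_map >= threshold
    labels, _ = ndimage.label(binary)  # 4-connectivity by default
    # Zero out tiny components that are unlikely to be real buildings.
    sizes = np.bincount(labels.ravel())
    for idx in np.where(sizes < min_pixels)[0]:
        if idx != 0:
            labels[labels == idx] = 0
    return labels

# Toy example: two blobs in a 16x16 probability map become two instances.
probs = np.zeros((16, 16), dtype=np.float32)
probs[2:6, 2:6] = 0.9
probs[9:14, 8:13] = 0.8
print(np.unique(instances_from_probs(probs)))  # 0 (background) plus one id per building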

Initial experiments with the basic model had low precision and recall, for example due to the variety of natural and artificial features with building-like appearance. We found a number of methods that improved performance. One was the use of mixup as a regularisation method, where random training images are blended together by taking a weighted average. Though mixup was originally proposed for image classification, we modified it to be used for semantic segmentation. Regularisation is important in general for this building segmentation task, because even with 100k training images, the training data do not capture the full variation of terrain, atmospheric and lighting conditions that the model is presented with at test time, and hence, there is a tendency to overfit. This is mitigated by mixup as well as random augmentation of training images.
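
The mixup adaptation can be sketched as blending both the images and the per-pixel targets with a Beta-sampled weight; this is a schematic NumPy version, not the exact implementation used for the model.

import numpy as np

def mixup_segmentation(images, masks, alpha=0.2, rng=None):
    """Blend random pairs of (image, mask) examples with a Beta-sampled weight.
    images: (N, H, W, C) floats; masks: (N, H, W) per-pixel building targets."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(images))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_masks = lam * masks + (1.0 - lam) * masks[perm]  # soft, blended targets
    return mixed_images, mixed_masks

# Toy usage with random arrays standing in for satellite tiles and labels.
rng = np.random.default_rng(0)
imgs = rng.random((8, 64, 64, 3)).astype(np.float32)
msks = rng.integers(0, 2, size=(8, 64, 64)).astype(np.float32)
mixed_imgs, mixed_msks = mixup_segmentation(imgs, msks, rng=rng)
print(mixed_imgs.shape, mixed_msks.shape)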

Another method that we found to be effective was the use of unsupervised self-training. We prepared a set of 100 million satellite images from across Africa, and filtered these to a subset of 8.7 million images that mostly contained buildings. This dataset was used for self-training using the Noisy Student method, in which the output of the best building detection model from the previous stage is used as a ‘teacher’ to then train a ‘student’ model that makes similar predictions from augmented images. In practice, we found that this reduced false positives and sharpened the detection output. The student model gave higher confidence to buildings and lower confidence to background.

Difference in model output between the student and teacher models for a typical image. In panel (d), red areas are those that the student model finds more likely to be buildings than the teacher model, and blue areas more likely to be background.
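
Schematically, the self-training stage filters the unlabeled pool with the teacher and uses the teacher's soft outputs as targets for the student; the sketch below uses random arrays as stand-ins for imagery and teacher predictions, and the filtering thresholds are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real data: an unlabeled tile pool and teacher soft predictions
# (per-pixel building probabilities) for each tile.
unlabeled_tiles = rng.random((1000, 64, 64)).astype(np.float32)
teacher_probs = rng.random((1000, 64, 64)).astype(np.float32)

# Keep only tiles the teacher believes contain buildings, mirroring the
# filtering of 100 million tiles down to a building-rich subset.
contains_building = (teacher_probs > 0.7).mean(axis=(1, 2)) > 0.01
student_inputs = unlabeled_tiles[contains_building]
student_targets = teacher_probs[contains_building]  # soft pseudo-labels

print(f"{student_inputs.shape[0]} tiles selected for self-training")

A student model would then be trained on augmented versions of these tiles against the pseudo-labels, and can in turn serve as the next teacher.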

One problem that we faced initially was that our model had a tendency to create “blobby” detections, without clearly delineated edges and with a tendency for neighbouring buildings to be merged together. To address this, we applied another idea from the original U-Net paper, which is to use distance weighting to adapt the loss function to emphasise the importance of making correct predictions near boundaries. During training, distance weighting places greater emphasis at the edges by adding weight to the loss — particularly where there are instances that nearly touch. For building detection, this encourages the model to correctly identify the gaps in between buildings, which is important so that many close structures are not merged together. We found that the original U-Net distance weighting formulation was helpful but slow to compute. So, we developed an alternative based on Gaussian convolution of edges, which was both faster and more effective.

Distance weighting schemes to emphasise nearby edges: U-Net (left) and Gaussian convolution of edges (right).
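
One way to build a Gaussian-convolved edge weighting of this kind is sketched below; the sigma and weight constants are illustrative rather than the values used in the model.

import numpy as np
from scipy import ndimage

def edge_distance_weights(mask, sigma=3.0, edge_weight=4.0):
    """Per-pixel loss weights that emphasise building boundaries.
    mask: binary (H, W) array, 1 = building, 0 = background."""
    # Edge pixels are where the mask changes between erosion and dilation.
    eroded = ndimage.binary_erosion(mask)
    dilated = ndimage.binary_dilation(mask)
    edges = (dilated ^ eroded).astype(np.float32)

    # Blur the edge map so the extra weight falls off smoothly with distance.
    blurred = ndimage.gaussian_filter(edges, sigma=sigma)
    blurred /= blurred.max() + 1e-8
    return 1.0 + edge_weight * blurred  # >= 1 everywhere, largest near edges

# Toy example: two buildings separated by a narrow gap get heavy weights there.
mask = np.zeros((32, 32), dtype=np.uint8)
mask[8:20, 4:14] = 1
mask[8:20, 16:28] = 1
weights = edge_distance_weights(mask)
print(weights.min(), weights.max())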

Our technical report has more details on each of these methods.

Results
We evaluated the performance of the model on several different regions across the continent, in different categories: urban, rural, and medium-density. In addition, with the goal of preparing for potential humanitarian applications, we tested the model on regions with displaced persons and refugee settlements. Precision and recall did vary between regions, so achieving consistent performance across the continent is an ongoing challenge.

Precision-recall curves, measured at 0.5 intersection-over-union threshold.

When visually inspecting the detections for low-scoring regions, we noted various causes. In rural areas, label errors were problematic. For example, single buildings within a mostly-empty area can be difficult for labellers to spot. In urban areas, the model had a tendency to split large buildings into separate instances. The model also underperformed in desert terrain, where buildings were hard to distinguish against the background.

We carried out an ablation study to understand which methods contributed most to the final performance, measured in mean average precision (mAP). Distance weighting, mixup and the use of ImageNet pre-training were the biggest factors for the performance of the supervised learning baseline. The ablated models that did not use these methods had a mAP difference of -0.33, -0.12 and -0.07 respectively. Unsupervised self-training gave a further significant boost of +0.06 mAP.

Ablation study of training methods. The first row shows the mAP performance of the best model combined with self-training, and the second row shows the best model with supervised learning only (the baseline). By disabling each training optimization from the baseline in turn, we observe the impact on mAP test performance. Distance weighting has the most significant effect.

Generating the Open Buildings Dataset
To create the final dataset, we applied our best building detection model to satellite imagery across the African continent (8.6 billion image tiles covering 19.4 million km2, 64% of the continent), which resulted in the detection of 516M distinct structures.

Each building’s outline was simplified as a polygon and associated with a Plus Code, which is a geographic identifier made up of numbers and letters, akin to a street address, and useful for identifying buildings in areas that don’t have formal addressing systems. We also include confidence scores and guidance on suggested thresholds to achieve particular precision levels.
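
For readers unfamiliar with Plus Codes, a building centroid can be encoded with Google's open-source openlocationcode Python package (assumed installed via pip); the coordinates below are hypothetical.

from openlocationcode import openlocationcode as olc

# Encode a hypothetical building centroid (latitude, longitude) as a Plus Code.
lat, lng = 5.6037, -0.1870
code = olc.encode(lat, lng)
print(code)  # a 10-character Plus Code for these coordinates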

The sizes of the structures vary as shown below, tending towards small footprints. The inclusion of small structures is important, for example, to support analyses of informal settlements or refugee facilities.

Distribution of building footprint sizes.

The data is freely available and we look forward to hearing how it is used. In the future, we may add new features and regions, depending on usage and feedback.

Acknowledgements
This work is part of our AI for Social Good efforts and was led by Google Research, Ghana. Thanks to the co-authors of this work: Wojciech Sirko, Sergii Kashubin, Marvin Ritter, Abigail Annkah, Yasser Salah Edine Bouchareb, Yann Dauphin, Daniel Keysers, Maxim Neumann and Moustapha Cisse. We are grateful to Abdoulaye Diack, Sean Askay, Ruth Alcantara and Francisco Moneo for help with coordination. Rob Litzke, Brian Shucker, Yan Mayster and Michelina Pallone provided valuable assistance with geo infrastructure.

Categories
Misc

An AI a Day Keeps Dr.Fill at Play: Matt Ginsberg on Building GPU-Powered Crossword Solver

9 Down, 14 letters: Someone skilled in creating and solving crossword puzzles. This April, the fastest “cruciverbalist” at the American Crossword Puzzle Tournament was Dr.Fill, a crossword puzzle-solving AI program created by Matt Ginsberg. Dr.Fill perfectly solved the championship puzzle in 49 seconds. The first human champion, Tyler Hinman, filled the 15×15 crossword in exactly …
