Categories
Offsites

Revisiting Mask Transformer from a Clustering Perspective

Panoptic segmentation is a computer vision problem that serves as a core task for many real-world applications. Due to its complexity, previous work often divides panoptic segmentation into semantic segmentation (assigning semantic labels, such as “person” and “sky”, to every pixel in an image) and instance segmentation (identifying and segmenting only countable objects, such as “pedestrians” and “cars”, in an image), and further divides it into several sub-tasks. Each sub-task is processed individually, and extra modules are applied to merge the results from each sub-task stage. This process is not only complex, but it also introduces many hand-designed priors when processing sub-tasks and when combining the results from different sub-task stages.

Recently, inspired by Transformer and DETR, an end-to-end solution for panoptic segmentation with mask transformers (an extension of the Transformer architecture that is used to generate segmentation masks) was proposed in MaX-DeepLab. This solution adopts a pixel path (consisting of either convolutional neural networks or vision transformers) to extract pixel features, a memory path (consisting of transformer decoder modules) to extract memory features, and a dual-path transformer for interaction between pixel features and memory features. However, the dual-path transformer, which utilizes cross-attention, was originally designed for language tasks, where the input sequence consists of dozens or hundreds of words. In vision tasks, and segmentation problems in particular, the input sequence instead consists of tens of thousands of pixels, which is not only a much larger input scale, but also a lower-level embedding compared to language words.

In “CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation”, presented at CVPR 2022, and “kMaX-DeepLab: k-means Mask Transformer”, to be presented at ECCV 2022, we propose to reinterpret and redesign cross-attention from a clustering perspective (i.e., grouping pixels with the same semantic labels together), which better adapts to vision tasks. CMT-DeepLab is built upon the previous state-of-the-art method, MaX-DeepLab, and employs a pixel clustering approach to perform cross-attention, leading to a more dense and plausible attention map. kMaX-DeepLab further redesigns cross-attention to be more like a k-means clustering algorithm, with a simple change on the activation function. We demonstrate that CMT-DeepLab achieves significant performance improvements, while kMaX-DeepLab not only simplifies the modification but also further pushes the state-of-the-art by a large margin, without test-time augmentation. We are also excited to announce the open-source release of kMaX-DeepLab, our best performing segmentation model, in the DeepLab2 library.

Overview
Instead of directly applying cross-attention to vision tasks without modifications, we propose to reinterpret it from a clustering perspective. Specifically, we note that the mask transformer object queries can be considered cluster centers (which aim to group pixels with the same semantic labels), and the process of cross-attention is similar to the k-means clustering algorithm, which adopts an iterative process of (1) assigning pixels to cluster centers, where multiple pixels can be assigned to a single cluster center and some cluster centers may have no assigned pixels, and (2) updating the cluster centers by averaging the pixels assigned to them (cluster centers with no assigned pixels are left unchanged).

In CMT-DeepLab and kMaX-DeepLab, we reformulate the cross-attention from the clustering perspective, which consists of iterative cluster-assignment and cluster-update steps.

Given the popularity of the k-means clustering algorithm, in CMT-DeepLab we redesign cross-attention so that the spatial-wise softmax operation (i.e., the softmax operation that is applied along the image spatial resolution) that in effect assigns cluster centers to pixels is instead applied along the cluster centers. In kMaX-DeepLab, we further simplify the spatial-wise softmax to cluster-wise argmax (i.e., applying the argmax operation along the cluster centers). We note that the argmax operation is the same as the hard assignment (i.e., a pixel is assigned to only one cluster) used in the k-means clustering algorithm.
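To make this reinterpretation concrete, here is a toy, pure-Python sketch (an illustration under our own naming and a simple dot-product similarity, not the released DeepLab2 code) of the two cluster-wise operations: softmax over centers, in the spirit of CMT-DeepLab, versus hard argmax over centers, in the spirit of kMaX-DeepLab, followed by a k-means-style center update:

```python
import math


def cluster_attention(pixels, centers, mode="argmax"):
    """For each pixel, compute its similarity to every cluster center,
    then normalize along the centers: softmax (CMT-DeepLab-style soft
    assignment) or argmax (kMaX-DeepLab / k-means-style hard assignment).
    Returns one weight row per pixel, with one entry per center."""
    weights = []
    for p in pixels:
        sims = [sum(pi * ci for pi, ci in zip(p, c)) for c in centers]
        if mode == "softmax":
            m = max(sims)
            exps = [math.exp(s - m) for s in sims]
            z = sum(exps)
            weights.append([e / z for e in exps])
        else:  # hard assignment: a pixel belongs to exactly one cluster
            k = max(range(len(sims)), key=lambda i: sims[i])
            weights.append([1.0 if i == k else 0.0 for i in range(len(sims))])
    return weights


def update_centers(pixels, centers, weights):
    """k-means-style update: each center becomes the (weighted) mean of
    its assigned pixels; centers receiving zero total weight keep their
    previous value, matching the description in the text."""
    new_centers = []
    for k, c in enumerate(centers):
        total = sum(w[k] for w in weights)
        if total == 0:
            new_centers.append(c)  # no assigned pixels: leave unchanged
            continue
        new_centers.append([
            sum(w[k] * p[d] for w, p in zip(weights, pixels)) / total
            for d in range(len(c))
        ])
    return new_centers
```

Iterating `cluster_attention` and `update_centers` mirrors the assignment/update loop described above; with `mode="argmax"` the weight rows are exactly the one-hot hard assignments of k-means.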

Reformulating the cross-attention of the mask transformer from the clustering perspective significantly improves the segmentation performance and simplifies the complex mask transformer pipeline to be more interpretable. First, pixel features are extracted from the input image with an encoder-decoder structure. Then, a set of cluster centers are used to group pixels, which are further updated based on the clustering assignments. Finally, the clustering assignment and update steps are iteratively performed, with the last assignment directly serving as segmentation predictions.

To convert a typical mask Transformer decoder (consisting of cross-attention, multi-head self-attention, and a feed-forward network) into our proposed k-means cross-attention, we simply replace the spatial-wise softmax with cluster-wise argmax.

The meta architecture of our proposed kMaX-DeepLab consists of three components: pixel encoder, enhanced pixel decoder, and kMaX decoder. The pixel encoder is any network backbone, used to extract image features. The enhanced pixel decoder includes transformer encoders to enhance the pixel features, and upsampling layers to generate higher resolution features. The series of kMaX decoders transform cluster centers into (1) mask embedding vectors, which multiply with the pixel features to generate the predicted masks, and (2) class predictions for each mask.
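As a rough sketch of that final prediction step (names are illustrative and this is not the DeepLab2 implementation), each mask-embedding vector multiplies with every pixel feature, and each resulting row is one predicted mask:

```python
def predict_masks(pixel_features, mask_embeddings):
    """Toy version of the mask-prediction step: for each cluster
    center's mask-embedding vector, take its inner product with every
    pixel feature. Each row of the result is one predicted mask
    (logits over the flattened pixels)."""
    return [
        [sum(e * f for e, f in zip(emb, feat)) for feat in pixel_features]
        for emb in mask_embeddings
    ]
```

The class prediction for each mask would come from a separate head applied to the same cluster center, per the description above.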

The meta architecture of kMaX-DeepLab.

Results
We evaluate CMT-DeepLab and kMaX-DeepLab using the panoptic quality (PQ) metric on two of the most challenging panoptic segmentation datasets, COCO and Cityscapes, against MaX-DeepLab and other state-of-the-art methods. CMT-DeepLab achieves significant performance improvement, while kMaX-DeepLab not only simplifies the modification but also further pushes the state-of-the-art by a large margin, with 58.0% PQ on the COCO val set, and 68.4% PQ, 44.0% mask Average Precision (mask AP), and 83.5% mean Intersection-over-Union (mIoU) on the Cityscapes val set, without test-time augmentation or using an external dataset.

Method              PQ
MaX-DeepLab         51.1% (-6.9%)
MaskFormer          52.7% (-5.3%)
K-Net               54.6% (-3.4%)
CMT-DeepLab         55.3% (-2.7%)
kMaX-DeepLab        58.0%
Comparison on COCO val set.
Method              PQ              mask AP         mIoU
Panoptic-DeepLab    63.0% (-5.4%)   35.3% (-8.7%)   80.5% (-3.0%)
Axial-DeepLab       64.4% (-4.0%)   36.7% (-7.3%)   80.6% (-2.9%)
SWideRNet           66.4% (-2.0%)   40.1% (-3.9%)   82.2% (-1.3%)
kMaX-DeepLab        68.4%           44.0%           83.5%
Comparison on Cityscapes val set.

Designed from a clustering perspective, kMaX-DeepLab not only achieves higher performance but also produces a more plausible attention map, making its working mechanism easier to understand. In the example below, kMaX-DeepLab iteratively performs clustering assignments and updates, gradually improving mask quality.

kMaX-DeepLab’s attention map can be directly visualized as a panoptic segmentation, which makes the model’s working mechanism easier to interpret.

Conclusions
We have demonstrated a way to better design mask transformers for vision tasks. With simple modifications, CMT-DeepLab and kMaX-DeepLab reformulate cross-attention to be more like a clustering algorithm. As a result, the proposed models achieve state-of-the-art performance on the challenging COCO and Cityscapes datasets. We hope that the open-source release of kMaX-DeepLab in the DeepLab2 library will facilitate future research on designing vision-specific transformer architectures.

Acknowledgements
We are thankful for the valuable discussions and support from Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Florian Schroff, Hartwig Adam, and Alan Yuille.

Categories
Misc

Just Released: Modulus v22.07

Accelerate your AI-based simulations using NVIDIA Modulus. The 22.07 release brings advancements with weather modeling, novel network architectures, geometry modeling, performance, and more.

Categories
Misc

Sequences That Stun: Visual Effects Artist Surfaced Studio Arrives ‘In the NVIDIA Studio’

Visual effects savant Surfaced Studio steps In the NVIDIA Studio this week to share his clever film sequences, Fluid Simulation and Destruction, as well as his creative workflows. These sequences feature quirky visual effects that Surfaced Studio is renowned for demonstrating on his YouTube channel.

The post Sequences That Stun: Visual Effects Artist Surfaced Studio Arrives ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.

Categories
Misc

Just Released: Lightning-Fast Simulations with PennyLane and the NVIDIA cuQuantum SDK

Discover how the new PennyLane simulator device, lightning.gpu, offloads quantum gate calls to the NVIDIA cuQuantum software development kit to speed up the simulation of quantum circuits.

Categories
Misc

AI on the Sky: Stunning New Images from the James Webb Space Telescope to be Analyzed by, Train, AI

The release Monday by U.S. President Joe Biden of the first full-color image from the James Webb Space Telescope is already astounding — and delighting — humans around the globe. “We can see possibilities nobody has ever seen before, we can go places nobody has ever gone before,” Biden said during a White House press event.

The post AI on the Sky: Stunning New Images from the James Webb Space Telescope to be Analyzed by, Train, AI appeared first on NVIDIA Blog.

Categories
Misc

DLI Course:  Getting Started with DOCA Flow

In this new course, learn how to create software-defined, cloud-native, DPU-accelerated services with zero-trust protection to meet the performance and security demands of modern data centers.

Categories
Misc

Advice on Building a Data Science Career: Q&A with Ken Jee


Ken Jee, a data science professional, shares insights on leveraging university resources, benefits of content creation, and useful learning methods for AI topics.

Ken Jee is a data scientist and YouTube content creator who has quickly become known for creating engaging and easy-to-follow videos. Jee has helped countless people learn about data science, machine learning, and AI and is the initiator of the popular #66daysofdata movement.

Currently, Jee works as the Head of Data Science at Scouts Consulting Group. In this post, he discusses his work as a data scientist and offers advice for anyone looking to enter the field. We explore the importance of university education, the relevancy of math for data scientists, creating visibility within the industry, and the value of an open mind when it comes to new technologies.

This post is a transcription of bits and pieces of the wisdom Jee shared when I had the pleasure of speaking with him on my podcast. At the conclusion of this article, you’ll find a link to the entire discussion. While Jee’s answers have been edited for brevity and conciseness, their intent is maintained.

Why did you start making data science videos on YouTube?

I started making data science videos on YouTube because I didn’t see the resources that I was looking for when I was trying to learn data science.

I also saw making videos as the best way to improve my communication skills. Creating content has given me a competitive advantage because it has attracted employers to me rather than going out to get them. I usually refer to this as the concept of content gravity. The more content that I create, the more pull I have on employers and opportunities coming to me. 

I love working on interesting data projects and creating easy-to-digest content that can help others learn and grow. I believe that data science skills are valuable and shareable and that data-driven content has a great potential to go viral. Companies should encourage their employees to have side hustles and be public about them, as it looks good for the company.

I see a future where everyone uses social media to share their work and ideas and where this is accepted and expected in most roles. In some of my previous job roles, I’ve been referred to as “the guy who makes YouTube videos.” My external efforts outside of work have aided my internal visibility within companies.

How did you become interested in data science?

I became interested in data science because I wanted to improve my golfing skills. I started to explore how data could help me analyze my performance and find ways to improve. I soon discovered that I had a unique advantage: the ability to analyze data and create data-driven actions to improve my golfing abilities. This led me to explore further other performance improvement methods supported by data and intelligence.

How essential is mathematics in data science?

I believe that mathematics is less important when breaking into the data science field. What’s important is getting your hands dirty and coding. I recommend that people get their hands dirty by building projects and coding, as this will help them intuitively find where the math is valuable and important.

 I also recommend reviewing calculus, linear algebra, and discrete math, but only once you have a reason to do so and understand how they are relevant to data science. As you continue to progress within the field, you will gradually learn where math skills are important and relevant. And once you see the value that they bring, you will be more motivated to learn them.

Is self-directed learning more important than a formal degree when entering the data science field?

One of the primary reasons I encourage people to investigate unusual learning methods, as opposed to attending a university, is that many students underutilize the resources available at institutions. I made full use of office hours with professors and asked questions of PhDs who knew their subjects deeply, but I discovered very few students did the same.

In my opinion, having a degree is only useful if you put in the effort and make the most of the available opportunities. I recommend taking advantage of other options available at university, such as side projects. Doing so can help students get the most out of their education and give them an edge in the job market. However, I warn that simply getting a degree does not guarantee a successful career.

Editor’s Note: Jee contributes to the data science learning platform 365DataScience, educating learners on starting a successful data science career. He also has a master’s degree in computer science and another in business, marketing, and management. Jee holds a bachelor’s degree in economics.

Obtaining a master’s degree in an advanced subject such as data science is not always the best method to stand out. Having an impressive portfolio, unique work or volunteer experience can be more valuable.

It is worth considering whether you can invest the time and money into obtaining a master’s degree, as it is undoubtedly a viable resource. But it’s also important to weigh the opportunity cost of returning to school to land a job: attending graduate school to obtain a particular role within AI carries an opportunity cost, and you essentially must determine whether it will provide a good return on investment.

How do you learn?

I learn best by struggling through something on my own at my own pace, rereading the same thing over and over again until I understand it. In grad school, I fell in love with reading, and the majority of my knowledge came from textbooks. 

I recommend looking at things from different angles to get a diverse understanding of a topic. One of the most important keys to accelerating learning is finding a suitable medium that explains the topic in a way that makes sense to you: this could be reading a blog post, watching a video, or listening to a podcast.

Although my primary method of obtaining knowledge in grad school was through books, I admit that my learning of data science concepts and topics today involves videos and YouTube tutorials. Specifically, I want to mention the popular data science YouTube channel StatQuest with Josh Starmer.

What are the best skills to differentiate yourself as a data scientist?

Data scientists have to learn coding, math, and business in order to be successful. I differentiated myself from the competition with my unique combination of skills. My business knowledge and ability to meet the strategic requirement for coding and data science made me a highly desirable candidate. My resume and portfolio stood out from the competition. Additionally, my communication skills and business knowledge gave me a distinct edge in job interviews.

How did you become the head of data science at your current company?

I discovered very early on that I didn’t fit well into corporate bureaucracy. My focus was on creating value, getting noticed for adding that value, and finding satisfaction in my work. My title has progressed from data scientist to Head of Data Science, and I am now responsible for all data-related work as Director of Data Science.

This change reflects the increased responsibilities I have taken on within my current company, from being solely responsible for all data science activities to managing teams of data scientists. If you are looking for a job, I recommend that you create your own opportunities by reaching out to potential employers.

You may be surprised at how open they are to hiring you if they see that you are willing and able to do the work. I advise data science practitioners to find a position that doesn’t yet exist or make one for themselves. This way, you can skip the line and get to where you want to be without waiting for opportunities.

What is your advice to entry-level data scientists?

Entry-level data scientists should share their work and journey with others. People are hesitant to produce content because they are afraid of being judged, but this is not usually the case. People are more likely to be positive and supportive. I also recommend learning to code first, as this is a valuable skill for data scientists. However, I recognize that everyone learns differently, so this is not a one-size-fits-all approach.

Summary from the author

Jee’s journey within data science is unique, but the steps that led to his success are replicable and adaptable to your data science career. My discussion with him revealed the importance of using digital content to communicate your expertise and presence within the data science field, which can sometimes be filled with noise. His advice to data science practitioners is to focus on creating value and making sure that you’re learning continuously to keep up with the rapidly changing field. So whatever your goals are for your data science career, don’t forget to enjoy the journey and document it along the way!

You can watch or listen to the entire conversation with Ken Jee on YouTube or Spotify.

Find more of Jee’s content on: YouTube | Twitter | LinkedIn | Podcast

Categories
Misc

Upcoming Event: Join NVIDIA at Microsoft Inspire 2022

Learn how NVIDIA and Azure together enable global on-demand access to the latest GPUs and developer solutions to build, deploy, and scale AI-powered services.

Categories
Misc

Windfall: Omniverse Accelerates Turning Wind Power Into Clean Hydrogen Fuel

Engineers are using the NVIDIA Omniverse 3D simulation platform as part of a proof of concept that promises to become a model for putting green energy to work around the world. Dubbed Gigastack, the pilot project — led by a consortium that includes Phillips 66 and Denmark-based renewable energy company Ørsted — will create low-emission hydrogen.

The post Windfall: Omniverse Accelerates Turning Wind Power Into Clean Hydrogen Fuel appeared first on NVIDIA Blog.

Categories
Misc

Build Tools for the 3D World with the Extend the Omniverse Contest

Announcing our first Omniverse developer contest for building an Omniverse Extension. Show us how you’re extending Omniverse to transform 3D workflows and virtual worldbuilding.

Developers across industries are building 3D tools and applications to help teams create virtual worlds in art, design, manufacturing, and more. NVIDIA Omniverse, an extensible platform for full fidelity design, simulation, and developing USD-based workflows, has an ever-growing ecosystem of developers building Python-based extensions. We’ve launched contests in the past for building breathtaking 3D simulations using the Omniverse Create app. 

Today, we’re announcing our first NVIDIA Omniverse contest specifically for developers, engineers, technical artists, hobbyists, and researchers to develop Python tools for 3D worlds. The contest runs from July 11 to August 19, 2022. The overall winner will be awarded an NVIDIA RTX A6000, and the runners-up in each category will win a GeForce RTX 3090 Ti.

The challenge? Build an Omniverse Extension using Omniverse Kit and the developer-centric Omniverse application Omniverse Code. Contestants can create Python extensions in one of the following categories for the Extend the Omniverse contest:

  • Layout and scene authoring tools
  • Omni.ui with Omniverse Kit
  • Scene modifier and manipulator tools

Layout and scene authoring tools

The demand for 3D content and environments is growing exponentially. Layout and scene authoring tools help scale workflows for world-building, leveraging rules-based algorithms and AI to generate assets procedurally.

Instead of tediously placing every component by hand, creators can paint in broader strokes and automatically generate physical objects like books, lamps, or fences to populate a scene. With the ability to iterate layout and scenes more freely, creators can accelerate their workflows and free up time to focus on creativity. 

Universal Scene Description (USD) is at the foundation of the layout and scene authoring tools contestants can develop in Omniverse. The powerful, easily extensible scene description handles incredibly large 3D datasets without skipping a beat—enabling creation, editing, querying, rendering, and collaboration in 3D worlds.

Video 1. How to build a tool using Omniverse Code that programmatically creates a scene

Omni.ui with Omniverse Kit

Well-crafted user interfaces provide a superior experience for artists and developers alike. They can boost productivity and enable nontechnical and technical users to harness the power of complex algorithms. 

Building custom user interfaces has never been simpler than with Omni.ui, Omniverse’s UI toolkit for creating beautiful and flexible graphical UI design. Omni.ui was designed using modern asynchronous technologies and UI design patterns to be reactive and responsive. 

Using Omniverse Kit, you can deeply customize the final look of applications with widgets for creating visual components, receiving user input, and creating data models. With its style sheet architecture that feels akin to HTML or CSS, you can change the look of your widgets or create a new color scheme for an entire app.

Existing widgets can be combined and new ones can be defined to build the interface that you’ve always wanted. These extensions can range from floating panels in the navigation bar to markup tools in Omniverse View and Showroom. You can also create data models, views, and delegates to build robust and flexible interfaces.
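As a minimal sketch of what such an extension can look like (the class and window names here are illustrative, not from the contest materials; this runs only inside Omniverse Kit as an extension, not as a standalone script):

```python
import omni.ext
import omni.ui as ui


class ExampleWindowExtension(omni.ext.IExt):
    """Hypothetical extension: a small floating panel built with
    omni.ui widgets, whose label is updated from a button callback."""

    def on_startup(self, ext_id):
        self._count = 0
        self._window = ui.Window("Example Panel", width=260, height=120)
        with self._window.frame:
            with ui.VStack(spacing=4):
                self._label = ui.Label("Clicks: 0")
                ui.Button("Click me", clicked_fn=self._on_click)

    def _on_click(self):
        # Widget state is driven directly from Python
        self._count += 1
        self._label.text = f"Clicks: {self._count}"

    def on_shutdown(self):
        self._window = None
```

Enabling the extension in a Kit app makes the panel appear; the clicked_fn callback shows how widget state can be driven from Python, and the same structure extends to data models, views, and delegates.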

Video 2. How to use Omniverse Kit and Omni.ui, the toolkit to create custom UIs in Python

Scene modifier and manipulator tools

Scene modifier and manipulator tools offer new ways for artists to interact with their scenes. Whether it’s changing the geometry of an object, the lighting of a scene, or creating animations, these tools enable artists to modify and manipulate scenes with limited manual work.

Using omni.ui.scene, Omniverse’s low-code module for building UIs in 3D space, you can develop 3D widgets and manipulators to create and move shapes in a 3D projected scene with Python. Many primitive objects are available, including text, image, rectangle, arc, line, curve, and mesh, with more regularly being added.

Video 3. How to build a scene modifier tool in Omniverse

We can’t wait to see what extensions you’ll create to contribute to the ecosystem of extensions that are expanding what’s possible in the Omniverse. Read more about the contest, or watch the video below for a step-by-step guide on how to enter. You can also visit the GitHub contest page for sample code and other resources to get started. 

Video 4. How to submit to the contest

Don’t miss these upcoming events:

  • Join the Omniverse community on Discord on July 13, 2022, for the Getting Started – #ExtendOmniverse Developer Contest livestream
  • Join us at SIGGRAPH for hands-on developer labs where you can learn how to build extensions in Omniverse.  

Learn more in the Omniverse Resource Center, which details how developers can build custom applications and extensions for the platform. 

Follow Omniverse on Instagram, Twitter, YouTube, and Medium for additional resources and inspiration. Check out the Omniverse forums and join our Discord Server to chat with the community.