NVIDIA’s GPU Technology Conference is a hotbed for sharing groundbreaking innovations — making it the perfect forum for developers, students and professionals from underrepresented communities to discuss the challenges and opportunities surrounding AI. Last month’s GTC brought together virtually tens of thousands of attendees from around the world, with more than 20,000 developers from emerging Read article >
NetEase Thunder Fire Games Uses Mesh Shading To Create Beautiful Game Environments for Justice In December, we interviewed Haiyong Qian, NetEase Game Engine Development Researcher and Manager of NetEase Thunder Fire Games Technical Center, to see what he’s learned as the Justice team added NVIDIA ray-tracing solutions to their development pipeline. Their results were nothing … Continued
NetEase Thunder Fire Games Uses Mesh Shading To Create Beautiful Game Environments for Justice
In December, we interviewed Haiyong Qian, NetEase Game Engine Development Researcher and Manager of NetEase Thunder Fire Games Technical Center, to see what he’s learned as the Justice team added NVIDIA ray-tracing solutions to their development pipeline. Their results were nothing short of stunning with local 8K DLSS support and global illumination through RTXGI.
Recently, NetEase introduced Mesh Shader support to Justice. Not only are the updated environments breathtaking, the game supports 1.8 billion triangles running over 60 FPS in 4K on an NVIDIA 3060Ti.
To learn more about the implementation and results, we sat down with Yuheng Zou, game engine developer at NetEase. His work focuses on the rendering engine in Justice, specifically GPU features enabled by DirectX 12.
Q: What are you trying to achieve by adding mesh shading to Justice?
Our first thought is to render some highly detailed models which may need insane number of triangles. Soon we found we can combine Mesh Shaders with auto-generated LODs to achieve almost only-resolution-relevant rendering complexity, instead of polygon number. And we decided to try it out. With so much potential of Mesh Shader, we conceive that it would be the main stream of future games.
Q: What is not currently working with regular compute / draw indirect / traditional methods?
The simple draw call just doesn’t work for this. It lacks the ability to process mesh in a coarser grain than triangle, like meshlet culling.
Compute or draw indirect may be fine, but we do need to make a huge change on the rendering pipeline. The underlying idea of the algorithm is, in the first place, to do culling, then draw the effective parts of mesh. While it can be achieved by culling and compacting with compute shader and then drawing indirect, the data exchange between the two-step process can sometimes be fatal to GPU under highpoly rendering context. Mesh shader solves this problem from the hardware level.
Q: How do Mesh Shaders solve this?
Mesh shader can extend the scalability of geometry stage, and is very easy to integrate to engine runtime. It has the ability to encapsulate the culling procedure in a single API call, which omits tedious state and resource set up procedure as draw indirect requires. With MeshShaders, the culling algorithms we use can be of great flexibility. For example, in the shadow pass, we don’t have the depth information so occlusion culling is simply ignored in the shader.
Q: Is the end result of adding Mesh Shaders something your players will quickly notice, or is the effect more subtle?
Our technology customizes a highpoly mesh pipeline, including production, processing, serialization, streaming and rendering, aiming to provide our players with a refreshed experience with such high-fidelity contents. Actually, it works. Soon after “Wan Fo Ku” released, our players found the models presented were much more elaborate than the traditional one, with many close-up screenshots posted on the forums. While adding Mesh Shaders under customized highpoly scene do boost the rendering effectiveness, how to optimize our traditional scene remains subtle and needs more engineering efforts.
Q: What kinds of environments benefited most from the technology?
Our technology enables the ability of rendering parallax and silhouette of models in an incredible fidelity. For scenes like caves, these details can produce a visually better image. It also provides Chinese ancient buildings, furniture and ornaments with “meticulous” rendering result, which enables the culture carried by them to be expressed in Justice to the finest extent.
Q: What is the one thing you wish you knew before you added mesh shading? What would you do differently based on your learnings?
With such detailed models, large texture resolution is a must. We will pay more attention to texture loading or streaming. This also makes further requirements to our mesh/texture compression algorithms.
Q: Any other tips for developers looking to work with Mesh Shaders for the first time?
Mesh shader has the possibility of boosting geometry stage drastically. However, careful profiling and optimization is required. We highly recommend NSight for debugging and profiling Mesh Shaders.
Doing GPU culling with Mesh Shaders will sometimes require a change on mesh representation data format. A clever data format design will enable your trial and error, making a lot faster progress.
Since 2016, Justice has been closely engaged with NVIDIA in China on video game graphics technology innovation. As one of the best PC MMO games in China market, Justice has attracted thirty millions of players in the past three years with its excellent technologies and beautiful graphics.
Learn more about NVIDIA’s technical resources for Game Developers here.
Every day, vast amounts of geospatial imagery are being collected, and yet, until recently, one of the biggest potential users of that trove — property insurers — had made surprisingly little use of it. Now, CAPE Analytics, a computer vision startup and NVIDIA Inception member, seeks to turn that trove of geospatial imagery into better Read article >
A team of NVIDIA artists released never-before-seen imagery and behind-the-scenes videos from the Omniverse RTX Racer playable sample project. The clips and imagery are the result of 3 weeks of progress, showcasing the Omniverse platform’s power in multi-GPU rendering, dynamic lighting, and real-time rendering.
A team of NVIDIA artists released never-before-seen imagery and behind-the-scenes videos from the Omniverse RTX Racer playable sample project. The clips and imagery are the result of 3 weeks of progress, showcasing the Omniverse platform’s power in multi-GPU rendering, dynamic lighting, and real-time rendering.
The racing demo consists of a loop level, where a player controls a dune buggy vehicle, with the ability to steer, accelerate, decelerate, and brake.
With no difference between editing and “play” mode within NVIDIA Omniverse, users can move the camera and control the vehicle, or change the lighting, simultaneously, in real time.
Creative and Collaborative Workflows in Omniverse
The photorealistic scenes, characters, and props were created using a variety of creative applications such as Pixologic ZBrush, Autodesk 3ds Max, Autodesk Maya and Blender, textured in Substance Painter, and rendered with Omniverse RTX Renderer. Using Omniverse Create, two artists worked collaboratively on the rocky, desert environment, leveraging the platform’s live sync abilities to adjust environment art, layout and lighting simultaneously. The Omniverse platform’s simulation suite, including NVIDIA PhysX 5, delivered physically accurate simulation, providing players with a true-to-reality experience as they raced their buggy across the rugged terrain.
The cheeky alien main character was sculpted entirely in ZBrush, refined in Autodesk Maya and textured in Substance Painter.
Alien driver. Substance Painter. Rendered in NVIDIA Iray.
The dune buggy, a custom vehicle designed to race around desert dunes, was modeled in Autodesk Fusion 360 and textured in Substance Painter. The vehicle is physically based, and went through various iterations in size and dimensions to find the optimal racing balance in the physically accurate environment.
Dune buggy. Autodesk Fusion 360 viewport.
One Renderer, Dual Personalities
Omniverse RTX Renderer works in two different modes: real-time ray-traced mode and referenced path-traced mode. The demo is able to run in both modes without a significant difference in graphics quality — the materials and visual output are very similar.
At a high level, both modes implement the same rendering algorithm for path tracing, but with different optimization and approximations choices. The real-time ray-traced mode is typically used in real-time simulation applications such as games, while the reference path-traced mode is used for projects requiring ultra-high fidelity, final-frame quality results as it accumulates many samples over many frames, often requiring multiple GPUs.
Leveraging the power of four NVIDIA RTX 8000 GPUs, the NVIDIA team experimented with referenced path-traced mode, ultimately achieving offline film quality rendering at near real-time speeds.
Bringing it to Life with Omniverse Physics
The Omniverse RTX Racer project also showcases NVIDIA Omniverse’s advanced physics simulation suite including PhysX 5, exclusively available to the public as part of the platform, NVIDIA Flow, and Blast.
PhysX provides the powerful vehicle authoring technology that made the dune buggy come to life. The same vehicle authoring technology is used as part of the new NVIDIA DRIVE Sim platform for training and validating self-driving cars, now built entirely on Omniverse. With accurate steering, suspension, wheel collision, and tire friction behavior, users can easily create multiple instances of different styles of vehicles.
The team of artists and developers used this project to experiment with Flow, a sparse grid-based fluids simulation package for real-time applications that was recently integrated into the Omniverse platform. Flow was used to simulate the buggy’s blazing rocket engine and cloudy dust trail. This effect is written in Slang, the NVIDIA cross-platform shading language, and simulated entirely on NVIDIA GPUs.
The Omniverse RTX Racer sample is a living project with future iterations to come. Learn more about NVIDIA Omniverse and download the open beta today.
I have a question about TFRecords and how to train tf.keras models with them. For this I built a toy example, loading the iris dataset, writing the the data to a TFRecord, reading it back in and trying to train a simple MLP I found in a tutorial.
For encoding/writing/reading/decoding the TFRecords I followed mostly the official [Tutorial](https://www.tensorflow.org/tutorials/load_data/tfrecord). As far as I am able to see I can recover the original data, so I thought I plug the dataset into the MLP as the [fit method](https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit) should be able to work with tf.data datasets. In an example notebook with the MNIST dataset it worked fine, but in my case fit throws the following error:
Watch the replay of this joint session at GTC 21 to learn about achieving near-line rate speed of a next-generation firewall through the use of DPUs for a highly efficient 5G native security solution.
5G is unlike earlier generations of wireless networks. 5G offers many new capabilities such as lower latency, higher reliability and throughput, agile service deployment through cloud-native architectures, greater device density, and more. The adoption of 5G and its expanded capabilities drives the bandwidth requirements of mobile networks to 100Gbps and beyond.
With 5G and the increasingly frequent implementations of cloud computing, a new direction in cyber-security is required to maintain adequate protection. Today’s cyber-attack methods are demonstrating increased sophistication and going after larger attack surfaces. Coupling this with modern cloud environments – which are more vulnerable than on-premises deployments – makes proper security enforcement difficult. With 5G, a new approach to security is needed to achieve adequate protection.
Figure 1. Properly securing 5G networks is becoming increasingly challenging
Next-generation 5G Firewall
Palo Alto Networks and NVIDIA have collaborated to create a scalable, adaptive security solution that combines the Palo Alto Next-Generation Firewall with the NVIDIA BlueField-2 Data Processing Unit (DPU). Integrating these two raises the bar for high-performance security in virtualized software-defined networks. The NVIDIA BlueField-2 DPU provides a rich set of network offload engines designed to address evolving security needs within demanding markets such as 5G and the cloud. Palo Alto Networks has taken its expertise in securing enterprise and mobile networks and applied it to 5G. They used this knowledge to implement a 5G-native security initiative that includes a virtual firewall. The virtual firewall is designed to meet the stringent security needs of 5G cloud-native environments, offering scale, operational simplicity, and automation, enabling customers to gain unparalleled security protection.
For data centers looking to modernize their security infrastructure within 5G and cloud environments, the power of a software-defined, hardware-accelerated security architecture from NVIDIA and Palo Alto Networks provides increased infrastructure efficiency, granular zero-trust security across the entire solution stack, and streamlined security and management operations.
Figure 3. Intelligent traffic offload provided by BlueField-2 DPU
The dynamic nature of this solution has intelligent traffic offloads built in so that it adapts to real-time threats without requiring changes to the network infrastructure. The NVIDIA ASAP2 VNF offload technology filters or steers traffic for elephant flows identified based on AppID. Further, this AppID is used to inspect the first few packets to detect if it contains a threat or if it can offload the flow. If the packet is not suitable for offload, it is sent to the firewall for inspection. If the firewall determines the session is of no threat, it is sent to the PAN gRPCd process that calls the DPU daemon to add the session to the DPU session table for future offloading. The DPU will handle all subsequent packets in the flow without consuming any server CPU cycles for firewall processing. The solution provides up to 100Gb/s throughput with 80% of traffic offloaded to the DPU and ensures the highest performance without utilizing the CPU. This results in an throughput increase of 5X compared to host-based traditional firewall security solutions.
At GTC21, NVIDIA and Palo Alto Networks jointly presented the intelligent traffic offload use-case for 5G native security. Watch the replay of this joint session to learn about achieving near-line rate speed of a next-generation firewall through the use of DPUs for a highly efficient 5G native security solution. Don’t miss the demonstration showcasing the flexibility, programmability, and agility of the Palo Alto Networks and NVIDIA joint cyber-security solution. The GA of this solution is targeted for May 2021. Please connect with your NVIDIA or Palo Alto Networks sales representatives to learn more.
Click here to watch the recorded session from GTC.