Advanced API Performance: Vulkan Clearing and Presenting

This post covers best practices for Vulkan clearing and presentation on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.

With the recent release of Vulkan 1.3, it’s timely to add some Vulkan-specific tips that the other Advanced API Performance posts do not explicitly cover. In addition to introducing new Vulkan 1.3 core features, this post shares a set of good practices for clearing and presenting surfaces.

Vulkan 1.3 Core

Vulkan 1.3 brings improvements through extensions to key parts of the API. This section summarizes our recommendations for obtaining the best performance when working with a number of these new features.

Clears

This section provides guidelines for achieving good performance when invoking clear commands. These commands clear a region within a color image or within the bound framebuffer attachments.

Use VK_ATTACHMENT_LOAD_OP_CLEAR to clear attachments at the beginning of a subpass instead of issuing clear commands such as vkCmdClearAttachments. This can allow the driver to skip loading unnecessary data.
Outside of a render pass instance, prefer vkCmdClearColorImage over a compute shader (CS) invocation to clear images. This path enables bandwidth optimizations.
If possible, batch clears to avoid interleaving single clears between dispatches.
Coordinate VkClearDepthStencilValue with the test function to achieve better depth testing performance:
- 0.5 ≤ depth value ≤ 1.0 pairs with VK_COMPARE_OP_LESS_OR_EQUAL
- 0.0 ≤ depth value ≤ 0.5 pairs with VK_COMPARE_OP_GREATER_OR_EQUAL
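
As a rough illustration of the load-op and depth-clear recommendations above, the following sketch keeps the load op, the depth compare op, and the clear value in one place so that they cannot drift apart. DepthConfig and makeDepthConfig are hypothetical names, not part of the Vulkan API. The sketch assumes the common convention of clearing to 1.0 for a conventional depth buffer and to 0.0 for a reversed-Z depth buffer, which is consistent with the pairing listed above. The fallback definitions exist only so that the snippet builds without the Vulkan SDK; their names and values mirror the Vulkan headers.

```cpp
#include <cstdint>

#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Fallback stand-ins so the sketch builds without the Vulkan SDK.
// Names and numeric values mirror the Vulkan headers.
enum VkCompareOp {
    VK_COMPARE_OP_LESS_OR_EQUAL = 3,
    VK_COMPARE_OP_GREATER_OR_EQUAL = 6
};
enum VkAttachmentLoadOp {
    VK_ATTACHMENT_LOAD_OP_LOAD = 0,
    VK_ATTACHMENT_LOAD_OP_CLEAR = 1,
    VK_ATTACHMENT_LOAD_OP_DONT_CARE = 2
};
struct VkClearDepthStencilValue {
    float depth;
    uint32_t stencil;
};
#endif

// Hypothetical helper type: bundles the depth settings that should agree.
struct DepthConfig {
    VkAttachmentLoadOp loadOp;           // how the attachment is initialized
    VkCompareOp compareOp;               // depth test function
    VkClearDepthStencilValue clearValue; // value written by the clear
};

// Builds a consistent depth setup. reversedZ selects a GREATER_OR_EQUAL
// test with a 0.0 clear; otherwise LESS_OR_EQUAL with a 1.0 clear.
DepthConfig makeDepthConfig(bool reversedZ) {
    DepthConfig cfg{};
    // Clear as part of the render pass instead of a separate clear command.
    cfg.loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR;
    if (reversedZ) {
        cfg.compareOp = VK_COMPARE_OP_GREATER_OR_EQUAL;
        cfg.clearValue = {0.0f, 0u};
    } else {
        cfg.compareOp = VK_COMPARE_OP_LESS_OR_EQUAL;
        cfg.clearValue = {1.0f, 0u};
    }
    return cfg;
}
```

In an application, these values would typically be copied into VkAttachmentDescription::loadOp, VkPipelineDepthStencilStateCreateInfo::depthCompareOp, and the VkClearValue::depthStencil entry passed through VkRenderPassBeginInfo::pClearValues.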

Not recommended

Specifying more than 30 unique clear values per application (or more than 15 on Turing) does not make the most of clear bandwidth optimizations.
Avoid “clear shaders” unless the compute clear can overlap with a neighboring dispatch.
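
One way to check an application against the unique clear value limits above is a small debug-time audit that records every clear color the renderer uses. The sketch below is a hypothetical helper, not an NVIDIA or Vulkan facility: ClearValueAudit and its methods are invented names, and it assumes that two clear colors count as the same value when all four float components compare equal (NaN components are not handled).

```cpp
#include <array>
#include <cstddef>
#include <set>

// Hypothetical debug helper: counts distinct RGBA clear colors.
class ClearValueAudit {
public:
    // budget: for example 30, or 15 when targeting Turing, per the text above.
    explicit ClearValueAudit(std::size_t budget) : budget_(budget) {}

    // Records one clear color. Returns true while the number of distinct
    // colors seen so far is still within the budget.
    bool record(const std::array<float, 4>& rgba) {
        seen_.insert(rgba);
        return seen_.size() <= budget_;
    }

    // Number of distinct clear colors recorded so far.
    std::size_t uniqueCount() const { return seen_.size(); }

private:
    std::size_t budget_;
    std::set<std::array<float, 4>> seen_;
};
```

A renderer could call record next to each site that fills in a VkClearColorValue and log a warning the first time it returns false.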

Present

This section describes the preferred way to use the presentation modes that a surface supports in order to achieve good performance.

Rely on VK_PRESENT_MODE_FIFO_KHR or VK_PRESENT_MODE_MAILBOX_KHR (for VSync on). Noteworthy aspects:
- VK_PRESENT_MODE_FIFO_KHR is preferred as it neither drops frames nor tears.
- VK_PRESENT_MODE_MAILBOX_KHR may offer lower latency, but frames might be dropped.
- VK_PRESENT_MODE_FIFO_RELAXED_KHR is compelling when your application only occasionally lags behind the refresh rate, allowing tearing so that it can “catch back up”.
Rely on VK_PRESENT_MODE_IMMEDIATE_KHR for VSync off.
On Windows systems, use the VK_EXT_full_screen_exclusive extension to bypass compositing.
Handle both VK_ERROR_OUT_OF_DATE_KHR and VK_SUBOPTIMAL_KHR results so that stale swapchains are re-created, for example when a window is resized.
For latency-sensitive applications, use the Vulkan Reflex SDK to minimize latency by completing game engine work just-in-time for rendering.
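
The present mode guidance and the swapchain result handling above can be sketched as two small helpers. This is a minimal illustration under stated assumptions, not a definitive implementation: SyncPreference, choosePresentMode, and swapchainNeedsRecreate are hypothetical names, and the fallback definitions exist only so that the snippet builds without the Vulkan SDK (their names and values mirror the Vulkan headers). The final fallback to VK_PRESENT_MODE_FIFO_KHR relies on the Vulkan specification requiring support for that mode.

```cpp
#include <algorithm>
#include <vector>

#if __has_include(<vulkan/vulkan.h>)
#include <vulkan/vulkan.h>
#else
// Fallback stand-ins so the sketch builds without the Vulkan SDK.
// Names and numeric values mirror the Vulkan headers.
enum VkPresentModeKHR {
    VK_PRESENT_MODE_IMMEDIATE_KHR = 0,
    VK_PRESENT_MODE_MAILBOX_KHR = 1,
    VK_PRESENT_MODE_FIFO_KHR = 2,
    VK_PRESENT_MODE_FIFO_RELAXED_KHR = 3
};
enum VkResult {
    VK_SUCCESS = 0,
    VK_SUBOPTIMAL_KHR = 1000001003,
    VK_ERROR_OUT_OF_DATE_KHR = -1000001004
};
#endif

// Hypothetical application-level setting.
enum class SyncPreference { VSyncOn, VSyncOnLowLatency, VSyncOff };

// Picks a present mode from the modes the surface reports as supported.
VkPresentModeKHR choosePresentMode(const std::vector<VkPresentModeKHR>& supported,
                                   SyncPreference pref) {
    auto has = [&](VkPresentModeKHR mode) {
        return std::find(supported.begin(), supported.end(), mode) != supported.end();
    };
    if (pref == SyncPreference::VSyncOff && has(VK_PRESENT_MODE_IMMEDIATE_KHR)) {
        return VK_PRESENT_MODE_IMMEDIATE_KHR;
    }
    if (pref == SyncPreference::VSyncOnLowLatency && has(VK_PRESENT_MODE_MAILBOX_KHR)) {
        return VK_PRESENT_MODE_MAILBOX_KHR; // lower latency, may drop frames
    }
    // FIFO never drops frames or tears, and support for it is required.
    return VK_PRESENT_MODE_FIFO_KHR;
}

// True when an acquire or present result means the swapchain is stale.
bool swapchainNeedsRecreate(VkResult result) {
    return result == VK_ERROR_OUT_OF_DATE_KHR || result == VK_SUBOPTIMAL_KHR;
}
```

The supported list would normally come from vkGetPhysicalDeviceSurfacePresentModesKHR, and the result from vkAcquireNextImageKHR or vkQueuePresentKHR. Because VK_SUBOPTIMAL_KHR is a success code, an application can still present the current image and re-create the swapchain afterwards.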

More information

For more information about using Vulkan with NVIDIA GPUs, see Vulkan Do’s and Don’ts.

To view the Vulkan API state, use the API Inspector in Nsight Graphics (free download).

With Nsight Systems, you can view Vulkan usage on a unified CPU-GPU timeline, investigate stutter, and track GPU cold spots to their CPU origins. Download Nsight Systems for free.

Acknowledgments

Thanks to Piers Daniell, Ivan Fedorov, Adam Moss, Ryan Prescott, Joshua Schnarr, Juha Sjöholm, and Márton Tamás for their feedback and contributions.