This post covers best practices for Vulkan clearing and presentation on NVIDIA GPUs. To get a high and consistent frame rate in your applications, see all Advanced API Performance tips.
With the recent Vulkan 1.3 release, it’s timely to add some Vulkan-specific tips that are not necessarily explicitly covered by the other Advanced API Performance posts. In addition to introducing new Vulkan 1.3 core features, this post shares a set of good practices for clearing and presenting surfaces.
Vulkan 1.3 Core
Vulkan 1.3 improves key parts of the API by promoting a number of extensions to core. This section summarizes our recommendations for obtaining the best performance when working with several of these new features.
- Skip framebuffer and render pass object setup by taking advantage of dynamic rendering.
- Reduce the number of pipeline state objects with core support for dynamic states.
- Simplify synchronization and avoid unnecessary image layout transitions by using the improved synchronization API.
This section provides guidelines for achieving good performance when invoking clear commands. These commands clear a region within a color image or within the bound framebuffer attachments.
- Use VK_ATTACHMENT_LOAD_OP_CLEAR to clear attachments at the beginning of a subpass instead of clear commands. This can allow the driver to skip loading unnecessary data.
- Outside of a render pass instance, prefer vkCmdClearColorImage over a compute shader (CS) invocation to clear images. This path enables bandwidth optimizations.
- If possible, batch clears to avoid interleaving single clears between dispatches.
- Coordinate the VkClearDepthStencilValue with the depth test function to achieve better depth testing performance:
  - 0.5 ≤ depth value: use VK_COMPARE_OP_LESS_OR_EQUAL
  - 0.0 ≤ depth value: use VK_COMPARE_OP_GREATER_OR_EQUAL
- Avoid specifying more than 30 unique clear values per application (or more than 15 on Turing). Beyond that, clears do not make the most of the clear bandwidth optimizations.
- “Clear shaders” should be avoided unless there is overlap of a compute clear with a neighboring dispatch.
The following section offers insight into the preferred way of using the presentation modes supported by a surface in order to achieve good performance.
- With VSync on, rely on one of the following presentation modes. Noteworthy aspects:
  - VK_PRESENT_MODE_FIFO_KHR is preferred as it does not drop frames and lacks tearing.
  - VK_PRESENT_MODE_MAILBOX_KHR may offer lower latency, but frames might be dropped.
  - VK_PRESENT_MODE_FIFO_RELAXED_KHR is compelling when your application only occasionally lags behind the refresh rate, allowing tearing so that it can “catch back up”.
- Rely on
- On Windows systems, use the VK_EXT_full_screen_exclusive extension to bypass compositing.
- Handle both out-of-date and suboptimal swapchain results, and re-create stale swapchains, for example when a window is resized.
- For latency-sensitive applications, use the Vulkan Reflex SDK to minimize latency by completing game engine work just-in-time for rendering.
For more information about using Vulkan with NVIDIA GPUs, see Vulkan Do’s and Don’ts.
To view the Vulkan API state, use the API Inspector in Nsight Graphics (free download).
With Nsight Systems, you can view Vulkan usage on a unified CPU-GPU timeline, investigate stutter, and track GPU cold spots to their CPU origins. Download Nsight Systems for free.
Thanks to Piers Daniell, Ivan Fedorov, Adam Moss, Ryan Prescott, Joshua Schnarr, Juha Sjöholm, and Márton Tamás for their feedback and contributions.