QUANTECH BLOG – 3D ENGINE OPTIMISATION FOR PC  

When it comes to technology, Quantic Dream follows one essential principle: technology is first and foremost at the service of the experience. It must never hinder creativity; on the contrary, it must enhance it.

It is within this framework that our developers devise, day after day, the solutions that serve the vision of our artists and scriptwriters. In this new series of technical blogs, we are giving the floor to our development teams, who reveal very specific aspects of their work.

Please note that this is not meant for casual readers, such as our beloved gamers or fans; here, we want to directly address developers, engineers, researchers, students, and tech enthusiasts.

This essay on 3D engine optimisation for PC is written by 3D Engine Programmer Kevin Havranek.

Table of contents

  • Table of contents
  • Introduction
  • Interface with the graphics API
    • “RENDER_CONTEXT” library
    • Exposing the pipelines
    • Exposing resources
    • Pseudo-Code
  • Reducing our Draw Calls
    • Limit the CPU -> GPU exchanges
    • Vertex Buffers suppression
    • Merging Index Buffers (and others)
  • Conclusion
  • Thanks

Introduction

Historically, the internal technology developed by Quantic Dream has targeted PlayStation consoles, and we also had a PC version running on OpenGL for our publishing needs. Classically, our 3D engine uses a library that serves as an interface between the rest of the engine and the graphics library in use.
To port Detroit: Become Human to the PC, we had to rewrite the interface of our rendering engine to support Vulkan. I invite you to read the article we wrote back then for AMD GPU Open. This optimisation work has been an ongoing process ever since, giving us a more powerful 3D engine to support our following productions.
The aim of the rendering interface optimisations is to drive the GPU as quickly as possible. A particularly recurrent operation is, for example, the DrawPrimitive() function and its variants. When there are many objects, this call must be fast to maintain a good frame rate.
The first Vulkan implementation of our interface, for reasons of time and ease of use, was straightforward: it essentially re-expressed the OpenGL and DirectX concepts. A big part of the work since has been to expose the Vulkan concepts and modify the engine to use them.

Interface with the graphics API
“RENDER_CONTEXT” library
The RENDER_CONTEXT in our engine is simply the interface that bridges our engine and the different graphics APIs we support, such as Vulkan, OpenGL and the PlayStation APIs.

This library has a central class, RENDER_CONTEXT, which drives the GPU and creates the classes for textures, buffers, render targets, etc. Each class has one implementation per graphics API.

When we ported our engine to the PC with Vulkan, most of the work was done on this interface. Due to time constraints, we could not modify the engine in depth, so we focused on a Vulkan implementation that respected the interface as it was at the time.

We quickly realised that to get good performance, we needed to expose some of the Vulkan concepts in our interface and therefore modify the engine in depth.

Exposing the pipelines
Unlike OpenGL and other graphics APIs, Vulkan uses the concept of a VkPipeline, an object that bundles a set of shaders together with their render states. With OpenGL, the driver took care of generating the different pipelines needed for display; with Vulkan, it is up to developers to create VkPipeline objects themselves.

When Vulkan was first implemented for Detroit: Become Human, programmers sent the shaders and their states directly to our interface, which was then responsible for creating and selecting the different pipelines at display time. In the end, we reproduced what OpenGL did, but at the level of our interface.

This caused two issues:

  • It is time-consuming, because each draw call involves an entire process of looking up an existing VkPipeline (depending on the states and resources used) and creating one if none exists.
  • Programmers are not aware of this and may produce sub-optimal code that requires a higher number of VkPipeline objects.

To allow more control over the creation and chaining of VkPipeline objects, we reworked our interface so that developers can create them directly. We therefore allowed our different rendering modules to be driven by a new shader management system, conceptually close to the VkPipeline system proposed by Vulkan. Once these modules were rewritten, this new approach let us get rid of the automatic management of VkPipeline, which was hurting performance. In addition, it forces the developer to define the different states when creating a pipeline, which prevents the unpredictable behaviour that a “state machine” style of management could introduce.
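To make the cost of that automatic management concrete, here is a minimal, self-contained C++ sketch of a pipeline cache of the kind our interface used to maintain per draw call. All names (PIPELINE_KEY, PIPELINE_CACHE, etc.) are hypothetical illustrations, not our actual engine code; a real key would also cover shader handles, blend state, render-target formats, and so on:

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical, simplified render-state key used to look up a pipeline.
struct PIPELINE_KEY
{
    bool     _bDepthTest;
    uint32_t _eDepthFunc;
    uint32_t _eCullMode;
    uint64_t _uShaderHash;

    bool operator==(const PIPELINE_KEY& o) const
    {
        return _bDepthTest == o._bDepthTest && _eDepthFunc == o._eDepthFunc
            && _eCullMode == o._eCullMode && _uShaderHash == o._uShaderHash;
    }
};

struct PIPELINE_KEY_HASH
{
    size_t operator()(const PIPELINE_KEY& k) const
    {
        size_t h = std::hash<uint64_t>{}(k._uShaderHash);
        size_t s = std::hash<uint32_t>{}((k._bDepthTest ? 1u : 0u)
                 | (k._eDepthFunc << 1) | (k._eCullMode << 8));
        return h ^ (s + 0x9e3779b9 + (h << 6) + (h >> 2));
    }
};

struct PIPELINE { /* wraps a VkPipeline */ };

// The "before" path: every draw call pays for a hash plus a lookup,
// and a pipeline is created lazily on a cache miss.
class PIPELINE_CACHE
{
public:
    PIPELINE* GetOrCreate(const PIPELINE_KEY& Key)
    {
        auto It = _Cache.find(Key);
        if (It != _Cache.end())
            return It->second;               // hit: still costs a hash + lookup
        PIPELINE* pPipeline = new PIPELINE;  // miss: vkCreateGraphicsPipelines
        _Cache.emplace(Key, pPipeline);      // (ownership omitted for brevity)
        return pPipeline;
    }
    size_t Size() const { return _Cache.size(); }
private:
    std::unordered_map<PIPELINE_KEY, PIPELINE*, PIPELINE_KEY_HASH> _Cache;
};
```

Creating pipelines explicitly at initialisation removes this hash-and-lookup from the per-draw path entirely.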

Exposing resources
To manage and send resources to the GPU with Vulkan, we must use an object of type VkDescriptorSet. Whereas in OpenGL a simple call to a bind function is enough to send a resource to the shader, Vulkan requires first updating the VkDescriptorSet and then binding it to the VkCommandBuffer that will execute the shader; the process of sending resources is therefore more complex.

In Detroit: Become Human, for the first implementation of Vulkan – and as with VkPipeline – the management of VkDescriptorSet was done internally in our interface. This involved automatic management of VkDescriptorSetLayout objects, which describe the resources, and their formats, that our shaders and VkDescriptorSet would use. The management of these VkDescriptorSetLayout objects therefore brought new concepts that we had to understand in order to optimise our engine.

The difference with OpenGL is that we need to know the descriptor combinations in advance to create a VkDescriptorSetLayout. Behind an interface, and without any indication from the developer, it is impossible to know this combination before the shader executes with its associated resources. It was therefore necessary to generate the VkDescriptorSetLayout and the VkDescriptorSet at the time of the draw call or dispatch. Even though our executions used very few distinct descriptor combinations, we still had to generate a VkDescriptorSetLayout / VkDescriptorSet pair for each execution. A first problem was therefore a higher than necessary number of VkDescriptorSet objects and updates, when this work could have been done at initialisation rather than during the first run.

The first half of the work on the interface was to give developers the ability to create and update their own VkDescriptorSet, so that this can be done at the initialisation of our various modules rather than being left to the interface itself. To use a VkDescriptorSet, all you have to do is bind it before a draw call or dispatch.

It was also necessary to give developers the possibility to create their own VkDescriptorSetLayout, to define binding schemes and group different combinations of similar descriptors, which our interface did not offer before.

We took advantage of this new possibility in some of our rendering modules, which perform many draw calls with similar resource combinations. Where our interface previously generated a VkDescriptorSetLayout / VkDescriptorSet pair for each draw call coming from these modules, it is now possible to create a single pair for all the draw calls of the frame. This new approach gave us a significant performance gain on the update of the different VkDescriptorSet.
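The gain can be modelled with a simple counting sketch. The types below are hypothetical stand-ins (the real engine wraps vkUpdateDescriptorSets and vkCmdBindDescriptorSets); the point is only that updates move out of the per-draw path:

```cpp
#include <cstdint>

// Hypothetical stand-in for the engine's wrapper around a
// VkDescriptorSetLayout / VkDescriptorSet pair; the update count
// models the vkUpdateDescriptorSets traffic.
struct RESOURCE_SET
{
    uint32_t _uUpdateCount = 0;
    void Update() { ++_uUpdateCount; }   // would call vkUpdateDescriptorSets
    void Bind() const {}                 // would call vkCmdBindDescriptorSets
};

// Before: the interface creates and updates a set for every draw call.
uint32_t FrameUpdatesPerDrawSet(uint32_t uDrawCount)
{
    uint32_t uUpdates = 0;
    for (uint32_t i = 0; i < uDrawCount; ++i)
    {
        RESOURCE_SET Set;   // one layout/set pair per draw call
        Set.Update();
        Set.Bind();
        uUpdates += Set._uUpdateCount;
    }
    return uUpdates;        // grows linearly with the draw count
}

// After: the set is created and updated once at module initialisation;
// during the frame it is only bound. Returns the extra updates performed.
uint32_t FrameUpdatesSharedSet(RESOURCE_SET& Set, uint32_t uDrawCount)
{
    uint32_t uBefore = Set._uUpdateCount;
    for (uint32_t i = 0; i < uDrawCount; ++i)
        Set.Bind();                      // bind only, no per-draw update
    return Set._uUpdateCount - uBefore;  // zero extra updates
}
```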

With this optimisation, we halved the update time of the descriptor sets.

Pseudo-Code

// Before: states and resources are rebound one by one each frame;
// the interface builds VkPipeline / VkDescriptorSet objects internally.
void RENDERER::Init()
{
}

void RENDERER::Display()
{
    RENDER_CONTEXT::SetTexture(0, _Tex0);
    RENDER_CONTEXT::SetBuffer(0, _Buf0);
    RENDER_CONTEXT::SetDepthTest(true);
    RENDER_CONTEXT::SetDepthFunc(LESS);
    RENDER_CONTEXT::SetCullMode(FRONT);

    RENDER_CONTEXT::SetVertexShader(_VertexShader);
    RENDER_CONTEXT::SetPixelShader(_PixelShader);

    RENDER_CONTEXT::Draw();
}

// After: the pipeline and the resource set are created explicitly at
// initialisation; Display() only binds them.
void RENDERER::Init()
{
    BINDINGS_LAYOUT BindingsLayout;
    BindingsLayout.RegisterTexture(0, TEXTURE_TYPE);
    BindingsLayout.RegisterBuffer(0, BUFFER_TYPE);

    PIPELINE_CREATE_INFOS CreateInfos;
    CreateInfos._bDepthTest = true;
    CreateInfos._eDepthFunc = LESS;
    CreateInfos._eCullMode = FRONT;
    CreateInfos._pVertexShader = _VertexShader;
    CreateInfos._pPixelShader = _PixelShader;

    _pPipeline = RENDER_CONTEXT::CreatePipeline(CreateInfos, BindingsLayout);
    _pResourceSet = RENDER_CONTEXT::CreateResourceSet(BindingsLayout);

    _pResourceSet->SetTexture(0, _Tex0);
    _pResourceSet->SetBuffer(0, _Buf0);
}

void RENDERER::Display()
{
    RENDER_CONTEXT::SetPipeline(_pPipeline);
    RENDER_CONTEXT::SetResourceSet(0, _pResourceSet);

    RENDER_CONTEXT::Draw();
}

Reducing our Draw Calls
Limit the CPU -> GPU exchanges
When working with a GPU, it is better to limit exchanges with the CPU, which are expensive. One area of optimisation is the number of draw calls: the fewer draw calls there are, the higher the number of frames per second (FPS).

To reduce the number of draw calls, it is possible to submit several of them in a single command. With Vulkan, we use the vkCmdDrawIndexedIndirect function. To do this, the draw calls must share certain characteristics: the same shader pipelines, the same descriptor sets and the same buffers. So, the fewer of these we have, the more draw calls can be grouped together.
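For reference, vkCmdDrawIndexedIndirect reads an array of VkDrawIndexedIndirectCommand structures from a GPU buffer. The sketch below mirrors that structure (its layout follows the Vulkan specification) and fills one command per object; MERGED_DRAW and BuildIndirectCommands are hypothetical illustrations, not our actual engine code:

```cpp
#include <cstdint>
#include <vector>

// Mirrors VkDrawIndexedIndirectCommand from the Vulkan specification:
// the GPU reads an array of these when vkCmdDrawIndexedIndirect executes.
struct DRAW_INDEXED_INDIRECT_COMMAND
{
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t firstIndex;
    int32_t  vertexOffset;
    uint32_t firstInstance;
};
static_assert(sizeof(DRAW_INDEXED_INDIRECT_COMMAND) == 20,
              "must be tightly packed for the GPU");

// Hypothetical per-object description once the buffers are merged.
struct MERGED_DRAW
{
    uint32_t uIndexCount;
    uint32_t uFirstIndex;    // offset into the merged index buffer
    int32_t  iVertexOffset;  // offset into the merged vertex data
};

// Build the command array once on the CPU; uploading it and issuing
// a single vkCmdDrawIndexedIndirect(cmd, buffer, 0, count, stride)
// then replaces count separate draw calls.
std::vector<DRAW_INDEXED_INDIRECT_COMMAND> BuildIndirectCommands(
    const std::vector<MERGED_DRAW>& Draws)
{
    std::vector<DRAW_INDEXED_INDIRECT_COMMAND> Commands;
    Commands.reserve(Draws.size());
    for (const MERGED_DRAW& Draw : Draws)
        Commands.push_back({Draw.uIndexCount, 1u, Draw.uFirstIndex,
                            Draw.iVertexOffset, 0u});
    return Commands;
}
```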

Our team is working on reducing the number of shader pipelines, which we would like to be much lower than in Detroit: Become Human. When we exposed the descriptor sets, we took the opportunity to reduce their number, as mentioned earlier. Finally, we simply decided to stop using vertex buffers and to merge the index buffers.

Vertex Buffers suppression
It is possible to eliminate the use of the vertex buffer (and thus the associated vertex declaration) by fetching the vertex information (position, normal, UVs, etc.) directly in the shader from a regular buffer. The vertex format varies from shader to shader; for example, the number of UV sets may change depending on the material. As we decided to put all this information in a single regular buffer, we fetch the data according to the vertex format defined by the material.

To do this, we created a generic access system to fetch our information, depending on the declaration we want for our shaders. We used a descriptor with a generic type (a uint, for example), and it was then up to us to reconstruct our data. Say we want 3 floats for a position: we perform 3 uint loads and reinterpret them as floats.
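On the shader side this reinterpretation is a uintBitsToFloat (GLSL) or asfloat (HLSL) per load; the C++ sketch below shows the same bit-exact round trip on the packing side. PackPosition and FetchPosition are hypothetical helpers for illustration only:

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <vector>

// Bit-exact reinterpretation, equivalent to GLSL floatBitsToUint /
// uintBitsToFloat; std::memcpy avoids undefined behaviour in C++.
uint32_t FloatToBits(float f)
{
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return u;
}

float BitsToFloat(uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Packing side: write a position as 3 raw uints into the generic buffer.
void PackPosition(std::vector<uint32_t>& Buffer, float x, float y, float z)
{
    Buffer.push_back(FloatToBits(x));
    Buffer.push_back(FloatToBits(y));
    Buffer.push_back(FloatToBits(z));
}

// Fetch side: 3 uint loads reinterpreted as floats, mirroring what the
// shader does when reconstructing a position from the generic buffer.
std::array<float, 3> FetchPosition(const std::vector<uint32_t>& Buffer,
                                   size_t uOffset)
{
    return { BitsToFloat(Buffer[uOffset + 0]),
             BitsToFloat(Buffer[uOffset + 1]),
             BitsToFloat(Buffer[uOffset + 2]) };
}
```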

The savings in vertex buffer and vertex declaration setup between each draw call improved the frame rate.

Merging Index Buffers (and others)
There are also other buffers which are specific to our engine and which had to undergo this merging work; for example, the buffer containing the information for the instances of our objects. The work to be done on this type of buffer is almost always the same, and the difficulties encountered are mostly linked to the architecture of our engine, which was not designed for this type of processing.

We still have to merge the index buffers. This is the last step before we can merge our draw calls as much as possible. Once this is done, we should be able to submit all the primitives that use the same shader with a single multi-draw call, which should give us a considerable performance gain in frame display time.
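As a sketch of what this merging could look like (hypothetical names, not our actual engine code): each mesh's indices are appended unchanged to one big index buffer, and we record per mesh the firstIndex and vertexOffset that an indirect draw command will later need to address the merged buffers:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping for one mesh inside the merged buffers.
struct MESH_RANGE
{
    uint32_t uFirstIndex;   // where this mesh's indices start in the merged IB
    uint32_t uIndexCount;
    int32_t  iVertexOffset; // added to every index at draw time by the GPU
};

// Append a mesh's indices unchanged to the merged index buffer and
// advance the running vertex offset by the mesh's vertex count, so the
// original (mesh-local) indices stay valid via vertexOffset.
MESH_RANGE AppendMesh(std::vector<uint32_t>& MergedIndices,
                      int32_t& iRunningVertexOffset,
                      const std::vector<uint32_t>& MeshIndices,
                      uint32_t uMeshVertexCount)
{
    MESH_RANGE Range;
    Range.uFirstIndex   = static_cast<uint32_t>(MergedIndices.size());
    Range.uIndexCount   = static_cast<uint32_t>(MeshIndices.size());
    Range.iVertexOffset = iRunningVertexOffset;

    MergedIndices.insert(MergedIndices.end(),
                         MeshIndices.begin(), MeshIndices.end());
    iRunningVertexOffset += static_cast<int32_t>(uMeshVertexCount);
    return Range;
}
```

Each MESH_RANGE maps directly onto one indexed-indirect command, which is what makes the single multi-draw submission possible.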

Conclusion
With the work done on our interface, as well as the various refactorings of our buffers, we have already taken a big step towards a healthier engine architecture that respects the new standards brought by recent rendering APIs such as Vulkan. Coupled with the work to reduce draw calls, we are already gaining performance in our engine.

The next step is to complete our last buffer-merging tasks, especially the one on our index buffers, to reach an optimal number of draw calls and maximise performance even further. Other optimisation work on various aspects of our engine is also underway and will no doubt be the subject of a future article.

Kevin Havranek

Thanks
Ronan Marchalot, Éric Lescot, Grégory Lecop, Clément Roge, Alexandre Lamure, Nathan Biette, Bertrand Cavalie, Jean-Charles Perrier, Max Besnard, Lisa Pendse, Mélanie Sfar.