UE4 - Overview of Static Mesh Optimization Options
As you build out a larger level, the cost of rendering a large number of static meshes becomes apparent: it can take a lot of processing power and slow your framerate to a crawl. Unreal supports a bewildering array of choices to help optimize your static meshes.
Before looking at the available solutions, let's review a few important concepts:
When rendering static meshes, a common CPU cost is the actual function call to render the mesh. Under the covers, this draw call often contains a reference to a VBO already loaded on the GPU, as well as transform information describing where to render the mesh. The call itself does not have much overhead, as the real work is done by the GPU, but the cost adds up: making thousands of draw calls every frame causes significant CPU driver overhead. For example, imagine a wall that is made of 10000 individual bricks. This would result in 10000 draw calls. Incidentally, the driver overhead of a draw call is one of the main things addressed in newer graphics APIs like Vulkan and Metal, but it is still something to focus optimization efforts on.
“The main reason to make fewer draw calls is that graphics hardware can transform and render triangles much faster than you can submit them. If you submit few triangles with each call, you will be completely bound by the CPU and the GPU will be mostly idle. The CPU won’t be able to feed the GPU fast enough.”
One way to view draw calls is: stat scenerendering
Notice the Mesh Draw Calls and the Static List Draw Calls in the lower section.
(Static List Draw Calls are a subset of Mesh Draw Calls)
Often, a single mesh has several materials applied. For example, on a building, the windows, bricks, and doors might each use a separate material on the same mesh. The number of materials has a direct impact on the render time for that mesh: changing materials typically creates an additional draw call. In addition, performance varies widely among different types of materials. The GPU only supports a finite number of bound textures, and the engine works to optimize the current working texture set and minimize loading new textures.
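To make the arithmetic concrete, here is a minimal sketch of a back-of-the-envelope draw call estimate. It is not engine code: MeshInfo and estimateDrawCalls are hypothetical names, and the model assumes exactly one draw call per material section of each visible mesh, which ignores extra passes such as shadows.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a static mesh: only the number of
// material sections matters for this estimate.
struct MeshInfo {
    std::size_t materialSections;
};

// Assumption: one draw call per material section of every visible mesh.
// Real renderers add further passes, so treat the result as a lower bound.
std::size_t estimateDrawCalls(const std::vector<MeshInfo>& visibleMeshes) {
    std::size_t calls = 0;
    for (const MeshInfo& mesh : visibleMeshes) {
        calls += mesh.materialSections;
    }
    return calls;
}
```

Feeding this a wall of 10000 single-material bricks gives 10000 calls, while one building mesh with separate window, brick, and door materials gives 3.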
One way to help view material performance is to use the ShaderComplexity viewmode (Alt-8)
This highlights the relative cost of different parts of the scene - (red==costly, green==cheap)
The fastest mesh to render is no mesh at all. The engine culls meshes that are not in the view frustum or that are occluded. See this great article by Tim Hobson: http://timhobsonue4.snappages.com/culling-visibilityculling.htm
Since any object that intersects the camera's view frustum will be rendered, a very large mesh is rendered in full even if only a small portion of it is visible. Thus, keeping lots of small pieces helps with culling, but increases the number of draw calls.
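The trade-off between large and small meshes is easier to see with a toy frustum test. The following is a rough sketch of a common bounding-sphere check, not the engine's actual culling code; Vec3, Plane, Sphere, and isVisible are all names I made up for illustration, and the plane normals are assumed to point into the frustum.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Plane in the form dot(normal, p) + d = 0, with the normal pointing
// into the frustum (an assumption of this sketch).
struct Plane { Vec3 normal; float d; };

struct Sphere { Vec3 center; float radius; };

float signedDistance(const Plane& plane, const Vec3& p) {
    return plane.normal.x * p.x + plane.normal.y * p.y +
           plane.normal.z * p.z + plane.d;
}

// A sphere is culled only when it lies entirely behind at least one plane.
// The test is conservative: it may keep spheres that sit just outside a
// frustum corner, but it never rejects a visible one.
bool isVisible(const std::array<Plane, 6>& frustum, const Sphere& bounds) {
    for (const Plane& plane : frustum) {
        if (signedDistance(plane, bounds.center) < -bounds.radius) {
            return false;
        }
    }
    return true;
}
```

A small mesh centered well outside the frustum is rejected, but a mesh at the same position with bounds large enough to poke into the view passes the test and is submitted in full.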
Some types of culling include:
Precomputed Visibility Volumes
See Tim's article for more details.
The command stat initviews helps show statistics relevant to culling:
In this scene, notice that while 849 primitives are processed, 776 are frustum culled. This is because the camera is at the edge of the map looking away.
Another useful command is: FREEZERENDERING
This command freezes the culling state and lets you fly around to see what is actually being rendered.
The image on the left was taken when flying around after freezing the camera when it was on the streetcorner looking at the building on the corner. The image on the right shows the same view unfrozen. This helps you visualize the effects of culling.
Batching - One of the techniques that Unity implements is referred to as static or dynamic batching. I have seen a few posts requesting this feature in UE. https://docs.unity3d.com/Manual/DrawCallBatching.html
Static batching refers to combining vertex and index buffers - this is similar (but not identical) to automated merging actors in UE. For example, Unity Static batching can still cull individually. Static batching costs additional memory and overhead.
Dynamic batching refers to combining several similar small meshes on the CPU and drawing them all at once. It is really only an advantage if that work is cheaper than the draw calls it replaces. AFAIK, this approach has no direct counterpart in UE. It is kind of like HISM, but lets the CPU do the work every frame. In general, utilizing HISM provides better benefits but has more conditional constraints.
1) Merge Actors - A built-in tool that is no longer experimental. Merge Actors will combine multiple static meshes into a single mesh. It can also merge materials into a single texture atlas. https://docs.unrealengine.com/latest/INT/Engine/Actors/Merging/
It can be accessed from the Window > Developer Tools menu.
The primary benefit of merging actors is that you can now render them in a single call. The downside is that the merged mesh is no longer culled effectively: if any part of a large object is visible, the entire object is rendered.
One useful technique is to build out a level with modular pieces, then make a pass to combine them into bigger chunks. For example, modular building pieces can be combined into a single mesh.
2) Level of Detail (LOD) - The farther a mesh is from the camera, the smaller it appears on screen and the fewer pixels it occupies. A mesh with less detail can be used when it is farther away. These are set up in the static mesh inspector. You can also build simplified LOD meshes externally and import them. https://docs.unrealengine.com/latest/INT/Engine/Content/Types/StaticMeshes/HowTo/LODs/
The higher density mesh is used close up, while the lower density proxy mesh is used at farther distances.
The primary benefit of LOD is that you no longer waste GPU bandwidth processing vertices that add no visible value. Since there are fewer vertices, the burden on the GPU is reduced.
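As an illustration of how a LOD might be chosen, here is a minimal sketch that picks a LOD index from the fraction of the screen a mesh's bounding sphere covers. This is my own simplified model, not the engine's exact formula; screenSize, selectLod, and the threshold values in the note below are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Approximate fraction of the screen height covered by a bounding sphere:
// the sphere radius divided by half the view height at that distance.
float screenSize(float sphereRadius, float distance, float verticalFovRadians) {
    const float halfViewHeight = distance * std::tan(verticalFovRadians * 0.5f);
    return sphereRadius / halfViewHeight;
}

// thresholds[i] is the minimum screen size at which LOD i is still used,
// sorted from largest to smallest (LOD 0 first). Anything smaller than
// every threshold falls back to the last, coarsest LOD.
std::size_t selectLod(float size, const std::vector<float>& thresholds) {
    for (std::size_t i = 0; i < thresholds.size(); ++i) {
        if (size >= thresholds[i]) {
            return i;
        }
    }
    return thresholds.empty() ? 0 : thresholds.size() - 1;
}
```

With a 90 degree field of view and made-up thresholds of 0.4, 0.1, and 0.02, a mesh with a radius of 1 unit uses LOD 0 at a distance of 2 units, LOD 1 at 5 units, and the coarsest LOD at 100 units.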
3) Hierarchical Level of Detail (HLOD) - HLODs combine multiple Static Mesh Actors into clusters, then for each cluster, a combined Static Mesh Actor proxy is created. At longer view distances, the proxy is rendered. In contrast, regular LOD is applied to individual static meshes. So not only does HLOD reduce the draw calls, it also renders fewer vertices when viewed at a distance. UE 4.14 includes a new built-in tool that will create these islands of grouped actors and generate combined meshes for them.
The tool functions similarly to light baking - a pass is made to group actors into islands, then the mesh proxies for these islands are created in a single pass.
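To give a feel for the grouping pass, here is a deliberately naive sketch that buckets actors into clusters using a fixed grid. The built-in tool uses its own clustering logic that I am not reproducing here; ActorPos, CellKey, and clusterByGrid are hypothetical names, and the grid approach is simply the easiest stand-in to show.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

struct ActorPos { float x, y; };

using CellKey = std::pair<int, int>;

// Assign each actor (by index) to a square grid cell. Every non-empty cell
// is one cluster, which would then get a single combined proxy mesh.
std::map<CellKey, std::vector<std::size_t>> clusterByGrid(
        const std::vector<ActorPos>& actors, float cellSize) {
    std::map<CellKey, std::vector<std::size_t>> clusters;
    for (std::size_t i = 0; i < actors.size(); ++i) {
        const CellKey key{
            static_cast<int>(std::floor(actors[i].x / cellSize)),
            static_cast<int>(std::floor(actors[i].y / cellSize))};
        clusters[key].push_back(i);
    }
    return clusters;
}
```

At a distance, the renderer would then draw one proxy per cluster instead of one mesh per actor, so the draw call count drops from the number of actors to the number of non-empty cells.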
The primary downside is that it takes a loooong time to generate all the HLODs.
Epic did a great Twitch on HLOD https://www.youtube.com/watch?v=WhcxGbKWdbI&t=3028s
It is worth mentioning Simplygon at this point. Simplygon is a magical tool that helps with creating LODs. It is currently available for free on the marketplace. It integrates with Unreal (and is probably the source of some of the built-in features above). They have an indie friendly licensing model: 2% royalty after first $25K per quarter revenue. I highly recommend at least looking into it.
4) Instanced Static Mesh (ISM) - The actual vertices of a static mesh are already stored in GPU memory in a vertex buffer. The draw call consists of telling the GPU to render the mesh at a specific transform and orientation. When rendering a batch of identical meshes, a further possible optimization is to store the transforms on the GPU as well. Using this approach, we can render all of them with a single draw call. An instanced static mesh leverages this approach.
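The idea behind instancing can be sketched without any graphics API: group every placement of the same mesh together and keep only the per-instance transforms. This is an illustrative model rather than Unreal's implementation; Transform, Placement, and buildInstanceBatches are names I invented, and the transform is reduced to a position to keep things short.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Simplified transform: position only (a real one also carries
// rotation and scale).
struct Transform { float x, y, z; };

// One placed copy of a mesh in the level.
struct Placement {
    std::string meshName;
    Transform transform;
};

// Group placements by mesh. Each group can be submitted as one instanced
// draw: the mesh is referenced once and the transforms travel as a buffer.
std::map<std::string, std::vector<Transform>> buildInstanceBatches(
        const std::vector<Placement>& placements) {
    std::map<std::string, std::vector<Transform>> batches;
    for (const Placement& p : placements) {
        batches[p.meshName].push_back(p.transform);
    }
    return batches;
}
```

For the brick wall from earlier, 10000 placements of the same brick collapse into a single batch, so the number of draw calls tracks the number of distinct meshes rather than the number of placed copies.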
Unreal lets you add instances to an InstancedStaticMeshComponent. You can see these instances in the inspector.
One downside to using Instanced Static Meshes is that they are not individually culled. Another is that they are static and cannot be moved at runtime. Finally, you cannot leverage LODs.
But the main downside to instanced static meshes is the poor tooling support in UE. To mitigate this, I highly recommend using the Instance Tool by Mary Nate. https://www.unrealengine.com/marketplace/instance-tool
The instance tool adds a new mode that lets you easily convert between Static Meshes and Instanced Static Meshes, as well as utilize the transform tool for the ISMs. This tool works with both ISM and HISM.
5) Hierarchical Instanced Static Mesh (HISM) - These function very similarly to Instanced Static Meshes, but support LODs and distance culling. LODs and distance culling come with additional performance cost, which is why both approaches are available.
Today's GPU is, quite frankly, magical and amazing. It can render a huge number of complex meshes. But even the most powerful GPU has limitations. When rendering a large scene (e.g. a city block, dense jungle, or ancient ruins), even the most powerful GPU can be slowed to a crawl, especially when it has a zombie horde to render on top.
Optimizing these scenes requires leveraging a somewhat bewildering array of techniques. When working on larger scenes, it becomes important to focus on how to get all the data to the GPU in an efficient manner. The goal of this post was to at least introduce some of these options.