GPUGame | Casual Distraction

About

Introduction

GPUGame sits at the intersection of Tech Art and Engineering. It implements all collision detection and AI inside a persistent Niagara Particle System, which enables significantly higher performance when rendering large animated 3D hordes. For example, we can easily render over 1000 zombies on screen at over 120 fps. It is particularly amazing to see this 3D horde rendered in PCVR.

Overview

All enemies and bullets are added to a persistent Niagara System we call GPUGame. In this article, we will sometimes refer to the traditional CPU-based Game thread and Game framework as CPUGame. Data is sent between GPUGame and CPUGame using Niagara Data Channels (NDC). GPUGame uses the built-in Niagara Mesh Renderers to directly render static meshes that are animated using Bone-based Vertex Animation. It is important to emphasize that there is no need to send any render data out of GPUGame, since it is already on the GPU. (We leverage the material from the Vertex Animation Manager plugin, but this is not required.)

GPUGame resolves collision detection using a Niagara Neighborgrid. (See Ghislain's excellent Neighborgrid tutorial.) Each enemy or bullet is represented as a particle. Every frame, each particle is injected into a 2D grid that acts as a spatial partition. A second pass over the particles then evaluates each particle's relevant neighbors and determines the appropriate response (e.g. apply depenetration, apply damage, kill the particle). Any important information can be exported from GPUGame to CPUGame through an NDC Notify mechanism.

It is worth emphasizing at this point just how much faster the GPU can be than the CPU for this kind of work. It is easy for developers to underestimate how significant this difference is. There is a reason real-time 3D graphics has relied on GPUs for decades, and a reason most LLM training runs on GPUs: a GPU can evaluate huge amounts of computation in parallel. While each thread on a general-purpose CPU runs its own program, a GPU runs entire groups of threads in lockstep, with many such groups in flight at once. For work that parallelizes well, the result is not just twice as fast or even 10 times as fast; it can be orders of magnitude faster. The key to tapping this is utilizing parallel algorithms. If we can define our collision and AI in a way that can run in a compute shader, we can leverage this magic for our game.

Neighborgrid
Collision Detection and Response

Collision is handled using Niagara Neighborgrid. This system involves allocating a fixed block of GPU memory to store “neighbor cell lists”. Each frame, a first pass over all particles injects each particle into the appropriate neighbor list in this grid. Then, in a second pass over all the particles, we can use these neighbor lists to quickly identify any intersecting particles using a sphere/sphere check.
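The two passes described above can be sketched on the CPU as follows. This is only an illustrative Python reference, not Niagara code: the dictionary stands in for the fixed GPU neighbor lists, and the names build_grid, find_overlaps and cell_size are assumptions made for this example. It also assumes cell_size is at least as large as the biggest sum of two radii, so that any overlapping pair always sits in the same or adjacent cells.

```python
from collections import defaultdict
from math import floor, hypot


def build_grid(positions, cell_size):
    """Pass 1: inject each particle index into the grid cell that contains it."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        grid[(floor(x / cell_size), floor(y / cell_size))].append(i)
    return grid


def find_overlaps(positions, radii, cell_size):
    """Pass 2: test each particle against particles in its 3x3 block of cells."""
    grid = build_grid(positions, cell_size)
    overlaps = set()
    for i, (x, y) in enumerate(positions):
        cx, cy = floor(x / cell_size), floor(y / cell_size)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j <= i:
                        continue  # report each pair once, skip self
                    ox, oy = positions[j]
                    # Sphere/sphere check (circle/circle in this 2D sketch).
                    if hypot(ox - x, oy - y) < radii[i] + radii[j]:
                        overlaps.add((i, j))
    return overlaps
```

On the GPU the second pass runs once per particle in parallel, and the response (depenetration, damage, kill) would be applied where this sketch only records the overlapping pair.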

Bone based Vertex Animation

Animation is a core aspect of our system. After all, a zombie needs to shamble. Skeletal animation doesn't scale to the size of horde we want, so instead we leverage bone-based vertex animation. In general, this process involves baking the keyframes of a regular skeletal mesh animation into textures that are read on the GPU. These textures store the bone positions and rotations, which are used to offset vertex positions through World Position Offset (WPO) at runtime. It is worth emphasizing that the textures are very small: they only store #Frames * #Bones entries, which is much smaller than, say, an Alembic cache or traditional per-vertex animation.
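To make the size difference concrete, here is a small back-of-the-envelope calculation in Python. The layout of two texels per bone per frame (one for position, one for rotation) is an assumption for illustration, as are the function names and the example counts of frames, bones and vertices.

```python
def bone_texture_texels(num_frames, num_bones, texels_per_bone=2):
    """Texels needed for bone-based animation.

    Assumed layout: one position texel and one rotation texel
    per bone per frame.
    """
    return num_frames * num_bones * texels_per_bone


def per_vertex_texels(num_frames, num_vertices):
    """Texels needed if every vertex stored its own offset per frame."""
    return num_frames * num_vertices


# Hypothetical asset: 60 baked frames, 30 bones, 5000 vertices.
bone_based = bone_texture_texels(60, 30)   # 3600 texels
per_vertex = per_vertex_texels(60, 5000)   # 300000 texels
```

With these made-up numbers, the bone-based texture is roughly 80 times smaller, and the gap grows with mesh density because the bone count does not change when vertices are added.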

Looping - Typically we want the animation to loop. But some animations are one-offs that need to run once and then return to a certain state, e.g. the locomotion state (much like a fire-and-forget Animation Montage). These can be handled by defining an animation notify at the end of the animation.

Animation Notifies - It is often useful to define notification times in an animation that get triggered at key points. The “current animation playhead” for each particle is compared to the stored notify times, and crossing one triggers the matching event, e.g. Attack or Stop Animation. Part 4 explains how these notifies are sent to the CPU.
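The playhead comparison described above can be sketched like this. It is a minimal Python illustration rather than the actual Niagara module: advance_playhead and its parameters are hypothetical names, and it assumes the frame delta is positive and shorter than the animation length.

```python
def advance_playhead(prev_time, delta, length, notifies, looping=True):
    """Advance a playhead and return (new_time, names of notifies crossed).

    notifies is a list of (time, name) pairs with 0 <= time <= length.
    Assumes 0 < delta < length.
    """
    new_time = prev_time + delta
    # Notifies crossed before reaching the end of the animation.
    fired = [name for t, name in notifies
             if prev_time < t <= min(new_time, length)]
    if new_time >= length:
        if looping:
            # Wrap around and fire anything at the start of the next loop.
            new_time -= length
            fired += [name for t, name in notifies if t <= new_time]
        else:
            # One-off animation: hold on the last frame.
            new_time = length
    return new_time, fired
```

A one-off animation would place its Stop Animation notify at the end of the clip, so the particle can switch back to its locomotion state on the frame that notify fires.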

Root Motion - Root motion is extremely useful for animations that involve complex movement, for example a climbing dismount.

Navigation with FlowMap and HeightMap

Initially, our zombies were kind of brainless, as zombies often are. We updated GPUGame with the player positions every frame. Each zombie was spawned with a TargetIndex pointing to a specific player target, and it would march relentlessly toward that target.

This works great for chasing down nearby players, but is less useful when the zombie is far from the player and blocked by walls.

We experimented with some simple collision detection that would prevent the zombie walking through walls by depenetrating them. This improved things a bit, but still did not give the intelligent pathing through the world we were looking for.

What we needed was a flowmap. The flowmap is a precalculated “blanket” that covers the world and stores the direction a zombie should follow at each location.

The flowmap is created ahead of time, by tracing into the map and determining the nearest spline and nearest blocking collision.

If not close to its target, the zombie can use the flowmap direction to get a sense of where to go.
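The choice between chasing the target directly and following the flowmap could look roughly like the sketch below. It is a Python illustration under several assumptions: the flowmap is a 2D list of unit direction vectors indexed as flowmap[row][col] with its origin at world (0, 0), and steer, cell_size and chase_radius are hypothetical names not taken from the project.

```python
from math import floor, hypot


def steer(pos, target, flowmap, cell_size, chase_radius):
    """Return the unit direction a zombie should move in this frame."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = hypot(dx, dy)
    if dist <= chase_radius:
        # Close to the target: march straight at it.
        return (dx / dist, dy / dist) if dist > 0 else (0.0, 0.0)
    # Far from the target: look up the precalculated flow direction,
    # clamping to the edge of the map.
    col = min(max(floor(pos[0] / cell_size), 0), len(flowmap[0]) - 1)
    row = min(max(floor(pos[1] / cell_size), 0), len(flowmap) - 1)
    return flowmap[row][col]
```

On the GPU the lookup would be a texture sample rather than a list index, but the decision is the same: use the flowmap for long-range pathing and the direct vector for the final approach.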

Notifications

As mentioned before, we use NDC as the communication channel between CPUGame and GPUGame. On the client, both Blueprint and C++ can read and write NDC. A Niagara GPU Particle system can also read and write NDC data using a Niagara Data Interface.

There are 2 directions notifications can travel:

CPU->GPU

These are typically spawn events, e.g. SpawnZombie, SpawnBullet.

GPU->CPU

Since collision is resolved on the GPU, these typically involve key lifetime events such as ZombieKilled, ZombieDamage, BulletKilled, etc…

While playing the game, if a zombie is killed on the GPU, a KillZombie notify is sent to the CPU. The CPU will update the score, add currency, play a sound, play a blood VFX, and spawn a ragdoll.
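The CPU side of this flow can be pictured as a simple loop that drains the notifies read from the channel and reacts to each one. The Python sketch below is illustrative only: Notify, CpuGame and drain are hypothetical names, the reward values are made up, and the cosmetic effects are recorded as requests instead of being played.

```python
from dataclasses import dataclass, field


@dataclass
class Notify:
    kind: str        # e.g. "ZombieKilled", "ZombieDamage", "BulletKilled"
    entity_id: int   # which particle the event refers to


@dataclass
class CpuGame:
    score: int = 0
    currency: int = 0
    cosmetic_requests: list = field(default_factory=list)

    def handle(self, notify):
        if notify.kind == "ZombieKilled":
            self.score += 1
            self.currency += 10  # hypothetical reward value
            for effect in ("sound", "blood_vfx", "ragdoll"):
                self.cosmetic_requests.append((effect, notify.entity_id))
        # Other notify kinds are ignored in this sketch.


def drain(game, notifies):
    """Process every notify that arrived from the GPU this frame."""
    for notify in notifies:
        game.handle(notify)
```

Keeping the handler on the CPU means gameplay bookkeeping such as score and currency stays in the regular game framework, while the GPU only reports that the event happened.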

The Notification Rabbit Hole:

ZombieID: Sometimes, we need a way to send data into or out of GPUGame that is specific to an existing particle, e.g. kill a specific zombie, detonate a rocket, etc. In particular, we need to be able to update or kill a specific zombie particle. To achieve this, we added a ZombieID attribute to all particles. (EntityID would probably have been a better name for this.) We spawn all particles using NDC, so this was simply a matter of setting the ZombieID when writing out the NDC data. Zombies are only spawned on the server, in the GameMode. The GameMode keeps a monotonically increasing int and uses that as the ZombieID when spawning. Each spawn is then replicated to the clients with the correct ZombieID. The net result is that every zombie on every client has the same correct, unique ZombieID. Bullets could also use this ID. Since bullets can be spawned on each client, we need to treat their IDs as a separate unique space, so [EnemyType + ZombieID] becomes the unique identifier.
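The ID scheme above can be sketched in a few lines. This is a Python illustration, not the project's GameMode code: ZombieIdAllocator and entity_key are hypothetical names, and the string enemy types are placeholders.

```python
from itertools import count


class ZombieIdAllocator:
    """Server-side counter that hands out monotonically increasing IDs."""

    def __init__(self):
        self._next = count(1)

    def allocate(self):
        return next(self._next)


def entity_key(enemy_type, zombie_id):
    """Combine the type and the ID so each type has its own ID space."""
    return (enemy_type, zombie_id)


allocator = ZombieIdAllocator()
first = allocator.allocate()
second = allocator.allocate()

# A zombie and a bullet may share the same numeric ID without clashing.
zombie_key = entity_key("Zombie", first)
bullet_key = entity_key("Bullet", first)
```

Because the counter lives only on the server and each spawn is replicated with its ID, every client ends up addressing the same zombie by the same key.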