Long time with no posts …
I wondered how big I could make my world. Could I make it global? What about the world generation time – how long would that take?
What if it were nil, and the world was entirely procedurally generated?
The components are already there;
- Diamond square height generation
- Trees, grass, rivers and erosion
There were two key things to overcome;
- The system, once running, must have a baseline memory allocation that does not grow – consequently all items must be able to be regenerated on demand.
- The performance of key items like the diamond-square fractal must be really fast.
The first problem is solved by having a clear dependency concept behind every renderable object: small tiles need bigger tiles so they can be generated from the parent heights; trees need a heightmap so they can be located in the world; rivers need heightmaps (and need to be able to deform them). Linking all this up with the scene composition engine, which had previously been able to assume all dependencies were available in the pre-generated landscape store, was a big engineering challenge. The important structural changes were;
- No component can demand a world resource, they can only request a world resource
- Code must be resilient to a resource not being available
- Resource requests may cascade a number of further resource requests which may be carried out over multiple frames
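These three rules can be sketched as a request-only resource store. The sketch below is purely illustrative – the class and method names are my own inventions, not the engine's – but it shows the shape of the idea: a request never blocks, a miss queues generation, and a generator whose own requests miss is retried on a later frame, which is how one request cascades into further requests over multiple frames.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>

struct Resource { std::string name; };

// Components may only *request* a resource: the call never blocks and never
// fails hard. A miss queues the resource for generation on a later frame.
class ResourceStore {
public:
    Resource* request(const std::string& name) {
        auto it = cache.find(name);
        if (it != cache.end()) return &it->second;
        pending.push(name);
        return nullptr;                    // caller must tolerate "not yet"
    }
    // Run once per frame with a work budget. The generator returns false if
    // its own requests missed, so the resource is retried next frame.
    void update(int budget,
                const std::function<bool(ResourceStore&, const std::string&)>& generate) {
        while (budget-- > 0 && !pending.empty()) {
            std::string name = pending.front();
            pending.pop();
            if (cache.count(name)) continue;
            if (generate(*this, name)) cache[name] = Resource{name};
            else pending.push(name);       // dependencies missing; retry
        }
    }
    bool has(const std::string& name) const { return cache.count(name) != 0; }
private:
    std::map<std::string, Resource> cache;
    std::queue<std::string> pending;
};

// Demo: a child tile can only generate once its parent heightmap exists.
bool demo_requests() {
    auto generate = [](ResourceStore& s, const std::string& name) {
        if (name == "tile/child") return s.request("tile/parent") != nullptr;
        return true;                       // the parent has no dependencies
    };
    ResourceStore store;
    store.request("tile/child");           // frame 0: miss, queued
    store.update(1, generate);             // frame 1: child retried, parent queued
    store.update(2, generate);             // frame 2: parent built, then child
    return store.has("tile/parent") && store.has("tile/child");
}
```

Note that the child tile takes three frame updates to appear: one to discover the missing parent, one to build the parent, one to build the child.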
Heightmap Generation Performance
I need the heightmap data on the CPU so I can query the meshes for runtime generation of trees, and pretty much anything else that needs to be height dependent, including the generation of a tile's vertex buffer. The CPU performance of the fractal-based diamond square algorithm was just about OK, but the real issue came when trying to manipulate the resultant heightfield to overlay deformations (rivers, roads, building area platforms etc). The time required to query every height map point against a large set of deformation meshes was not acceptable.
The answer, like all things DirectX, was to use the shader to implement my diamond square fractal height generation. The steps to implementing this were;
- Read the undeformed parent height field on the CPU.
- Prepare a texture for the child heightfield from one of the quadrants of the parent undeformed heightfield with every other pixel left empty, to be filled in the shader.
- Call the height generation shader passing the child height texture, and execute the diamond square code that fills in the missing pixels by reference to adjacent pixels.
- Record the output texture and read the data into the child tile class as the undeformed height map
- From another data structure describing landscape features like roads and rivers, obtain a vertex buffer which contains the deformations that the feature requires in terms of heightmap offsets
- Render the deformation vertex buffers over the top of the child heightmap
- Read back the newly deformed heightmap to the child tile on the CPU, to be used as the ‘true’ heightmap for all subsequent height queries and mesh generation.
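The refinement that the shader performs can be illustrated with a CPU reference version. This is a sketch of the general technique rather than the actual HLSL: the child grid is seeded at every other texel from one parent quadrant, a diamond pass fills each cell centre from its four seeded corners, and a square pass fills the edge midpoints from their orthogonal neighbours, each new value getting a random offset scaled by a roughness parameter.

```cpp
#include <cstdlib>
#include <vector>

// Refine one parent quadrant (parentN x parentN) into a child heightmap of
// side 2*(parentN-1)+1, e.g. a 65x65 quadrant becomes a 129x129 child.
std::vector<float> refine(const std::vector<float>& quadrant, int parentN, float roughness) {
    const int n = (parentN - 1) * 2 + 1;
    std::vector<float> h(n * n, 0.0f);
    auto at = [&](int x, int y) -> float& { return h[y * n + x]; };
    auto jitter = [&]() { return roughness * (std::rand() / (float)RAND_MAX - 0.5f); };

    // Seed every other texel from the parent quadrant.
    for (int y = 0; y < parentN; ++y)
        for (int x = 0; x < parentN; ++x)
            at(x * 2, y * 2) = quadrant[y * parentN + x];

    // Diamond step: the centre of each cell, from its four seeded corners.
    for (int y = 1; y < n; y += 2)
        for (int x = 1; x < n; x += 2)
            at(x, y) = 0.25f * (at(x - 1, y - 1) + at(x + 1, y - 1) +
                                at(x - 1, y + 1) + at(x + 1, y + 1)) + jitter();

    // Square step: edge midpoints, from their known orthogonal neighbours
    // (seeded texels sideways, diamond-filled texels above and below, or
    // vice versa on odd rows; fewer neighbours at the tile border).
    for (int y = 0; y < n; ++y)
        for (int x = (y % 2 == 0) ? 1 : 0; x < n; x += 2) {
            float sum = 0.0f; int count = 0;
            if (x > 0)     { sum += at(x - 1, y); ++count; }
            if (x < n - 1) { sum += at(x + 1, y); ++count; }
            if (y > 0)     { sum += at(x, y - 1); ++count; }
            if (y < n - 1) { sum += at(x, y + 1); ++count; }
            at(x, y) = sum / count + jitter();
        }
    return h;
}

// Sanity check: with zero roughness a flat parent refines to a flat child.
bool demo_refine() {
    const int parentN = 3;
    std::vector<float> flat(parentN * parentN, 5.0f);
    std::vector<float> child = refine(flat, parentN, 0.0f);
    if (child.size() != 5 * 5) return false;
    for (float v : child)
        if (v != 5.0f) return false;
    return true;
}
```

In the GPU version each new pixel is computed by the shader reading the adjacent filled pixels of the child texture, but the neighbour arithmetic is the same.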
All tiles have both a deformed and an undeformed heightmap data array stored. It took a long while to get to this solution; ultimately the problem was that the diamond square algorithm can only produce a new value with reference to the existing parent values – so it generates a very pleasant ‘random’ landscape, but it doesn’t allow for erosion, rivers, linear features, or any other ability to create absolute changes in height.
By storing the raw output of the diamond square algorithm, any deformations I need can be applied over the top of the raw heightfield, giving the same perceived result at any resolution. Since my tile heightfields are only 129×129 pixels, it’s not a lot of memory.
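A minimal sketch of that overlay, assuming each feature boils down to per-texel height offsets (in the real pipeline the features are rendered as deformation vertex buffers over the heightmap texture, but the effect on the stored arrays is the same idea):

```cpp
#include <vector>

// A deformation sample: one texel of a feature's footprint, carrying the
// height offset it imposes (negative to carve a river, positive to build up).
struct DeformSample { int x, y; float offset; };

// Replay feature deformations over a copy of the raw diamond-square output.
// The raw array stays untouched, so the same features can be re-applied
// whenever the tile is regenerated.
std::vector<float> applyDeformations(std::vector<float> raw, int n,
                                     const std::vector<DeformSample>& samples) {
    for (const DeformSample& s : samples)
        raw[s.y * n + s.x] += s.offset;
    return raw;
}

bool demo_deform() {
    const int n = 4;
    std::vector<float> raw(n * n, 10.0f);
    std::vector<float> deformed = applyDeformations(raw, n, {{1, 1, -3.0f}});
    return raw[1 * n + 1] == 10.0f          // undeformed copy untouched
        && deformed[1 * n + 1] == 7.0f      // carved 3 units down
        && deformed[0] == 10.0f;            // other texels unaffected
}
```

For scale: assuming 32-bit heights, a 129×129 array is about 65 KB, so keeping both the deformed and undeformed copies costs roughly 130 KB per tile.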
I immediately hit the problem of pipeline stalling when reading back the rendered heightfield data to the CPU, but a two-frame delay injected between rendering the heightfield and reading it back was sufficient to remove the stutter. This problem is well documented and relates to the underlying architecture and programming models of GPUs – although the programmer issues commands to the GPU, these are stored in a queue and only executed when the GPU needs them to be, often several frames later than the programmer thinks. If the programmer reads data from the GPU back to the CPU, this forces the GPU to execute all the stored commands so that it can produce the required data, losing all the benefit of the parallel execution of the GPU with respect to the CPU. There is no DirectX API for calling back the CPU when a given GPU resource is available for reading, so most programmers just wait two frames and then retrieve it – it seems to work for me.
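The delayed readback can be sketched as a small ring of staging slots (illustrative only – these are not D3D types): the copy is issued on frame N, and the read only happens on frame N + 2, by which time the GPU has long since executed the copy, so mapping it no longer forces a flush.

```cpp
#include <array>
#include <optional>
#include <vector>

constexpr int kDelayFrames = 2;

// One staging slot per in-flight frame, plus one for the frame being read.
// In the real renderer each slot would own a staging texture the GPU copies
// into; here a plain vector stands in for the mapped data.
class ReadbackRing {
public:
    // Called on the frame the heightfield is rendered.
    void issueCopy(std::vector<float> result, int frame) {
        slots[frame % kSlots] = Slot{frame, std::move(result)};
    }
    // Called every frame: returns the heightfield issued kDelayFrames ago.
    std::optional<std::vector<float>> tryRead(int frame) {
        if (frame < kDelayFrames) return std::nullopt;   // nothing old enough
        const Slot& slot = slots[(frame - kDelayFrames) % kSlots];
        if (slot.frame == frame - kDelayFrames) return slot.data;
        return std::nullopt;
    }
private:
    static constexpr int kSlots = kDelayFrames + 1;
    struct Slot { int frame = -1; std::vector<float> data; };
    std::array<Slot, kSlots> slots;
};

bool demo_readback() {
    ReadbackRing ring;
    ring.issueCopy({1.0f, 2.0f}, 0);                      // rendered on frame 0
    if (ring.tryRead(0) || ring.tryRead(1)) return false; // still in flight
    std::optional<std::vector<float>> r = ring.tryRead(2);
    return r.has_value() && (*r)[0] == 1.0f;              // safe on frame 2
}
```

The ring needs one more slot than the delay so that a copy issued this frame never overwrites the slot being read this frame.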