Global World

Long time with no posts …

I wondered how big I could make my world. Could I make it global ? What about the world generation time, how long would that take ?

What if it was nil, and the world was entirely procedurally generated ?

The components are already there;

  • Diamond square height generation
  • Trees, grass, rivers and erosion
  • DirectX11

There were two key things to overcome

  1. The system, once running, must have a baseline memory allocation that does not grow – consequently all items must be able to be regenerated on demand.
  2. The performance of key items like the diamond-square fractal must be really fast.

Regnerating Landscape

The first problem is solved by having a clear dependency concept behind any renderable object; small tiles need bigger tiles so they can get generated from the parent heights; trees need a heightmap so they can be located in the world. Rivers need heightmaps (and need to be able to deform the heightmaps). Linking all this up with the scene composition engine which had previously been able to assume all dependencies were available in the pre-generated landscape store was a big engineering challenge. The important structural changes were;

  • No component can demand a world resource, they can only request a world resource
  • Code must be resiliant to a resource not being available
  • Resource requests may cascade a number of further resource requests which may be carried out over multiple frames

Heightmap Generation Performance

I need the heightmap data on the CPU so I can query the meshes for runtime generation of trees, and pretty much anything else that needs to be height dependent, including the generation of a tiles vertex buffer. The CPU performance of the fractal based diamond square algorithm was just about OK, but the real issue came when trying to manipulate the resultant heightfield to overlay deformations (rivers , roads, building area platforms etc). The time required to query every height map point against a large set of deformation meshes was not acceptible.

The answer, like all things DirectX, was to use the shader to implement my diamond square fractal height generation. The steps to being able to implement this were;

  1. Read the undeformed parent height field in CPU.
  2. Prepare a texture for the child heightfield from one of the quadrants of the parent undeformed heightfield with every other pixel left empty, to be filled in the shader.
  3. Call the height generation shader passing the child height texture, and execute the diamond square code that fills in the missing pixels by reference to adjacent pixels.
  4. Record the output texture and read the data into the child tile class as the undeformed height map
  5. From another data structure describing landscape features like roads and rivers, obtain a vertex buffer which contains the deformations that the feature requires in terms of heightmap offsets
  6. Render the deformation vertex buffers over the top of the child heightmap
  7. Read back the newly deformed heightmap to the child tile CPU, to be used as the ‘true’ heightmap for all subsequent height queries and mesh generation.

All tiles have both a deformed and an undeformed heightmap data array stored. It took a long while to get to this solution, ultimately the problem was that the diamond square algorithm can only produce a new value with reference to the existing parent values – so it generates a very pleasant ‘random’ landscape, but it doesn’t allow for erosion, rivers, linear features, or any other ability to create absolute changes in height.

By storing the raw output of the diamond square algorithm, any deformations I need can be applied over the top of the raw heightfield and get the same perceived results at any resolution. Since my tile heightfields are only 129×129 pixels its not a lot of memory.

I immediately hit the problem of pipeline stalling when reading back the rendered heightfield data to the CPU, but a 2 frame delay injected between rendering the heightfield and reading it back was sufficient to remove the stutter. This problem is well documented and relates the underlying architecture and programming models of GPUs – although the programmer issues commands to the GPU these are stored in a queue and only executed when the GPU needs them to be – often several frames later than the programmer thinks. If the programmer reads data from the GPU back to the CPU then this causes the GPU to need to execute all the stored commands so that it can retrieve the required data, losing all the benefits of the parallel execution of the GPU with respect to the CPU. There is not DirectX API for calling back the CPU when a given GPU resource is available for reading, so most programmers just wait two frames and then retrieve it – it seems to work for me.







Shading Foliage

The majority of tree foliage is made of simplistic models where the leaves are simple quads of leaf textures. This enables the creation of lots of leaves with very few quads. A typical tree model is shown below, with face normals projecting from each leaf quad.


This gives a reasonable effect but when lit with a directional light, things go wrong. Firstly these models are all unculled – so we dont need to render the back-face of each of the leaf planes and save a huge bunch of vertex data because of this. Unfortunately that means each plane has only one normal, so seen from the reverse it has the same lighting reflectivity as when seen from the top.

To counter this its normal to check the face orientation in the pixel shader, and to invert the normal before calculating the light. In the pixel shader definition we can make use of the SV_ (system variables, or registers) which always exist but must be pulled into the pixel shader to be examined. You dont have to pass these SV_ values out of the vertex shader to accept them into the pixel shader.

PixelToSurface psTextureBump(psMeshIn psInput , bool isFrontFace : SV_IsFrontFace)
//Reverse the normal if its facing away from us.
if (!isFrontFace)
normal *= -1.0f; // rendering the back face, so I invert the normal

So now we have ‘accurate’ normals. But wait … theres a problem; if we use typical directional lighting code the underside of the leaves will now be dark with the upper face light. This isn’t what happens when we look at trees – they are somewhat translucent.

A second problem is that normal based lighting models are based on reflectivity – a plane facing the light is lit 100%; a plane at 45 degrees to the light reflects 50% of the light, and a plane 60 degrees from the light is lit … somewhat less. When you look at the tree image above its clear that for a light source any where near horizontal, that most of the leaf quads are at an extreme angle to the light source and are rendered very dark – in fact only the very few nearly vertical ones are rendered with realistic levels of reflectivity.

When you consider a tree its clear that the leaves, although arranged on branches, dont align with the branch in the way that is easily described in a normal. In fact a tree acts as a fractal in terms of reflection – pretty much all of the visible area of a tree has a good percentage of its leaf cover facing the viewer, no matter what the angle of the branches are.

To get a completely accurate lighting model would require the leaves to be individually rendered with normals, an impossible task.

A good approximation of tree lighting can be done by simply shading a trees pixels based on the depth from the light source – the ‘back’ of a tree is darker than the ‘front’ of a tree when viewed from the light source. To calculate this, we create a model which is centered on 0,0 and is one unit per side (i.e. in a box -1 to +1). I happen to store all my models scaled to this value anyway, so I can scale them at runtime without reference to their ‘natural’ size.

The following code snippet shows how I get the depth of a model


//Rotate vector around Y axis
float3 rotateY(float3 v, float angle)
float s = sin(angle);
float c = cos(angle);
return float3(c*v.x + s*v.z, v.y, -s*v.x + c*v.z);

// Assuming a model XZ centred around 0,0 and of a scale 1 -> -1 this returns the distance from the edge of the model when seen from the lights point of view.
// The value can be used to depth-shade a model. Returns 0 for nearest point, 1 for most distant point.
float modelDepthFromLight(float3 modelPosition)
// rotate further so that the model faces the light source. The light source is expressed in normalized vector, so
float lightAngle = atan2(-Param_DirectionalLightDirection.x, -Param_DirectionalLightDirection.z);
modelPosition = rotateY(modelPosition, lightAngle);
//float distToOrigin = (modelPosition.z + 1.0f) * 0.5f;
float distToOrigin = reverseLerp(modelPosition.z, -1.0f, 1.0f);
return 1.0f-distToOrigin;

By applying this as an input to the pixel shader I can choose to darken pixels which are further away from the light within the models bounding sphere. Although this isn’t ‘proper’ lighting, it does a good job of simulating self-shadowing, and works best on trees that are very symmetrical. If they are asymmetrical you can notice that a branch at the back of a tree is darkened but with no obvious tree cover in front of it to create that shadow.


Here are fir trees with the light source coming from top right. The trees are clearly darker on the left than the right and this can be seen nicely when viewed from behind


Pleasingly some of the edge leaves are picking up the light, but the mass of the tree is in darkness. This is a much cheaper way of similating self-shadowing and works within reason – however for very asymmetric models it does give some artefacts.

Better Models and Z-Fighting

I’ve added some better models now from TurboSquid (they do a nice range of <£5.00 medieval buildings) and added them to the landscape. Still not convincing but much better on the eye than the previous ones from the Sketchup store.


One consequence of using professionally constructed models was immediate and extensive Z-fighting. This is the well known problem that co-planar textures interfere with each other, flipping backwards and forwards between frames, because there is not enough world distance between them.

The problem is that model developers will tend to overlay textured quads on top of the basic model shape to create things like windows and doors, and roof textures. It takes less vertexes to add a simple slim box on the side of a building and paste the window texture on it, than to insert a window frame into the house mesh. Unfortunately the underlying house box model still exists beneath the window box. The diagram shows how a thin Window has been added to the house box (seen from the side)BetterModels2.jpg

This looks great in a modelling application but using a real-world rendering system there just isnt enough differentiation between the plane of the window and the plane of the house. The consequence is that the window keeps flicking between window and wall. This happens because the depth buffer, which keeps track of how far away from the camera a paricular pixel is, is stored as a 24bit number and that number represents the proportional distance between the near clipping plane and the far clipping plane that the pixel lies on. In a modelling application the near and far planes are going to encompass a very short distance; on a real-world application it could be up to 20,000 meters.

This proportional distance is stored as a logarithmic value, on the reasonable basis that the further something is from the camera, the less likely any depth differences are going to be visible to the end user. There is a huge amount of literature explaining Z-fighting on the web. The fixes for it are either;

  1. Avoiding using the hardware depth buffer entirely and write out your own calculation for depth into a texture buffer, and recycle that texture buffer back out on each render so the pixel shader can check the current pixel depth by sampling it.
  2. Do something else.

(1) is not recommended because hardware depth buffering is extremely fast and often takes place before your pixel shader is run, eliminating whole triangles where all three of their vertexes are further away than the recorded pixel depth.

So that leaves (2).

The simplest way I found to achieve a massive reduction in Z-fighting was the technique of reversing the depth buffer. There are three stages to this;

  1. When calculating the projection matrix, simply pass in the Far and Near clip values in reverse order;
  2. When clearing the depth buffer, clear it to 1,0 not 0,1
  3. When setting the depth comparison function, use GREATER_THAN not LESS_THAN

Step 1

float nearClip = ReverseDepthBuffer ? this.FarClippingDistance : this.NearClippingDistance;

float farClip = ReverseDepthBuffer ? this.NearClippingDistance : this.FarClippingDistance;

if (this.CameraType == enumCameraType.Perspective)


return Matrix.PerspectiveFovLH(







Step 2

if (reverseDepthBuffer)


this.Context.ClearDepthStencilView(dsv, SharpDX.Direct3D11.DepthStencilClearFlags.Depth, 0, 1);




this.Context.ClearDepthStencilView(dsv, SharpDX.Direct3D11.DepthStencilClearFlags.Depth, 1, 0);


I found this entirely eliminated my Z-fighting by more smoothly distributing the available depths over the field of view. This is very nicely illustrated in this NVIDIA article.


Settlement Toplogy with Grass and Trees

I’ve now combined the settlement generation with the grass generation I outlined way back. Although the buildings are still cartoonish, and the grass too uniform, the general outline is OK. A problem persists with the blending of the four different densities of grass (I use the Outerra method of doubling the grass blade width whilst halving the number of blades) – its easy to see in the YouTube video. By adjusting the terrain base texture with the same mottling interval as the grass planting distant grass sections dont need to be painted – they just (almost) seamlessly grow of the ground of the same colour as the grass blades.

One thing to work on is shadowing in grass – there are too many polygons being generated to depth test every one of them – so I think I need to add a ‘shadow pass’ to generate a shadow map of the current scene, not just a depth map. It shows where a tree’s shadow doesnt affect the underlying grass, but does affect the underlying landscape.


Settlement Topology with Trees

Following the previous post I could easily add a rotation element to the buldings so they always faced the road. Adding trees is relatively easy and I de-conflicted them from the building by creating a set of bounding spheres and checking for their intersect.


By creating a a set of parallel vertexes to describe the path line I am able to apply a texture blended across the path to prevent a hard edge.ParallelPathLines.jpg


I make the settlement base texture slightly different from the surround so I can spot it easily in the landscape – I need to do a bit more work to make it segue into the standard textures.




Settlement Topology

To generate settlements I overlay my landscape with a Voroni graph and select cells based on the following criteria

  • All the voroni points are within a given deviation in height so my settlements are on relatively flat land
  • The voroni sides contain either a river or a road – my settlements are no in the middle of nowhere
  • The voroni point heights are in the lower third of the landscape range – my settlements are lowland settlements.

Once I have a target site I generate  a random set of points within a bounding circle inside the Voroni cell. I then triangulate these and remove the underlying landscape vertexes and insert the new settlement vertexes. This overrides the natrual landscape with the settlement and allows me to vary the settlement footprint heights as needed and change the texture that it gets painted with. For now I am painting the settlement using a uniform mud texture to make it stand out, but will get more subtle later.

To create a settlement topology I take the settlement voroni graph and describe some paths using the following rules

  1. Take a hull voroni point
  2. Get a random adjacent voroni point, rejecting any other hull points. This will lead my path inward.
  3. Walk the adjacement voroni points iteratively until either (a) I hit another hull point, or (b) I hit an existing path point.

I use these path points as control points on a spline to generate a set of interconnected smooth curver paths. Again I texture these distinctly for debugging but will make them stand out less later. An example of the voroni graph with paths shown in red is below. The yellow points are building locations which are simply the tangents from the centre point of each path line.



The simplistic algorithm is a good start to generating topologically pleasing villages.


Taking control of Triangulation

Up until now I have been using the excellent library to carry out my trianglulation for models I generate myself. Models I buy from turbosquid are already triangulated so dont require any attention.

I’ve been running into problems trying to constrain the trinangulation algorithm to force certain triangle edges to be used; for instance in the case of roads, I definitely want the road edge line to be used irrespective if the triangulation algorithm could make a better (more equalateral) triangle.

The library does support a level of constraints, but its quite poorly documented and I wanted to understand much better the whole concept of Delaunay Triangulation and write a post on the practical implementation. There are quite a few C and C++ libraries out there, but little explanation of how they work or why they work the way they do – most of them concentrate on performance optimisation tricks, but I wanted just to be able to understand the basic process before getting into performance.

Step 1 – Definitions

A Delaunay Triangulation defines a set of triangles which join a set of points such that all the points belong to a triangle, and no triangle edges cross. A Delaunay Triangulation also fulfills the criteria that the triangles are as equalateral as possible. This is important in graphics programming because “long thin” triangles create visual artefacts and suffer from mathematical inaccuracies when carrying out hit tests etc. So the more equalateral the triangle is, the better it looks, the more evenly it tessellates and the easier it is to decimate (make more or less dense accordingly)

Heres a Delaunay triangulation of a mesh of 8 points.


Step 2 – The Method

Starting with an arbitrary set of 2D Points, collect them into a list (or other array type) structure.


Calculate an additional 3 points (we’ll name “Pf”) which, when made into a triangle, are further away from all the original points than they are to each other. This is called the Super Triangle – you could theoretically use infinity points to describe this, but you’ll get floating point inaccuracies from this, so just calculate the bounding box of the original point set and double it. This forms the first Triangle.

DelaunyTriangulator_FirstTriangle - Infinity.jpg

For the first Point (arbitrarily point 1) insert it into the super Triangle and split it into three resultant triangles. You now have three triangles and have used up one of your Points.

DelaunyTriangulator_FirstTriangle - Infinity - Point 1.jpg

Each of your Triangles must be stored in some kind of array, along with its Circumcircle; the center and radius of a circle on whose circumference all the points of the triangle fall. Its clear that them more equalateral the Triangle the tighter the Circumcircle will be to the Triangle area – if the triangle is very un-equalateral, the Circumcircle will need to be much larger.

For every subsequent Point

  1. Gather all the Triangles where the Point falls within their Circumcircle – i.e. the distance of point to the Circumcircle center is less than the Circumcircle radius. (this is where a lot of optimisation is concentrated on – for my case, I just iterate all the Triangles testing each one in turn). We’ll call this set of Triangles T0.
  2. For all the triangles in T0 work out which of their edges are internal – i.e. shared between other members of the set T0. These edges we’ll call E0.
  3. Delete all T0 triangles from the set of all Triangles.
  4. Join all the points of T0 that do not describe any of the E0 edges to the Point we are inserting, to form a set of new Triangles T1.
  5. Add all the new Triangles T1 to the set of all remaining triangles.

This process is illustrated having selected Point 0 and found two of the existing triangle circumcircles overlap Point 0; the green lines show the internal shared edges of those triangles (these get deleted). Consequently we create new triangles 9-0-10, 9-1-0, 8-1-0 and 8-0-10

DelaunyTriangulator_FirstTriangle - Infinity - E0.jpg

DelaunyTriangulator_FirstTriangle - Infinity - E0 - Copy.jpg

The key to the algorithm is that you just keep repeating the same basic move until you run out of points. You will end up with this;


Once you’ve processed every Point.

  1. Delete all Triangles that have as one of their points a point from the set “Pf” (8,9 and 10 in this example)
  2. What you have left is the Delaunay triangulation of your points. Almost….

DelaunyTriangulator_UnConstrained - Copy.jpg

Whereas the mesh is nice and is quite likely to be a Delaunay mesh, there are circumstances where the algorithm will join two triangles along an edge where a a shorter alternative edge existed (in this case the quad 0,1,7,6 has a rather acute angle on the corner 0,1,7, and would be better expressed as triangles 0,6,7 and 0,7,1. For a high quality “true” Delaunay mesh we should

  • Look at each pair of triangles that share a side.
  • If that resultant polygon is convex (you can test this by checking all the internal angles – they should all be less than 180 degrees – if they are not, a concavity exists) then calculate the length of both possible configurations of shared edge;
  • If the alternative shared edge is shorter than the existing one, flip the triangle definitions around so they share the shorter edge, delete the existing triangles, and insert the new pair.

Step 3 – Constraints

The above method will generate a convex polygon – but you have no control over the triangles generated. In many cases (not least of which is defining a concave hull) there are certain edges you want to pre-define which must be present in the eventual mesh, even though they are a sub-optimal Delaunay mesh. For example there may be some reason why we need triangle 3,4,6 to be represented in the eventual mesh, even though it would never get created using Delaunay


  1. Once triangulation has finished, process a list of Constrained Edges that you want to be in your eventual mesh.
  2. For each Constrained Edge (CE1) generate a circle and identify all the Triangles whose circumcircle intersects it. (this is an optimisation step – you could simply iterate all triangles). This is set T2.
  3. For each Triangle in T2 test each edge and if it crosses CE1 its add it to set T3 and delete the Triangle from the main mesh. In this case triangles 3,7,4 and 7,4,6 both intersect the edge 3,6 and get deleted from the mesh.DelaunyTriangulator_Constrained1.jpg
  4. Use a divide-and-conquor algorithm to generate a mesh joining all the triangles in T3 with the Constrained Edge CE1.

Ok – that last step was a bit of a mouthful. Whats a divide-and-conquor algorithm ? All the points of triangles T3 must lie either on one side or other of the CE1 line (line 3,6 in this example). So divide the points into their two constituent piles (Pile1 and Pile2) and add the two points of the constraint line CE1. Since both sets came from a Delaunay triangulation they will form two seperate convex polygons with edge CE1 forming one of the outside edges – just Delaunay both polygons and add the resultant triangles back into the main mesh.

This is hard to picture with just one triangle, so heres a better illustration; the blue line was a Contraint on a Delaunay mesh – all the red points belonged to triangles that intersected the blue line. The points are sorted into Pile1 (red) and Pile2 (yellow). Each set is triangulated using standard Delaunay to produce two convex polygons, but seperately. The subsequent mesh of both halves is no longer Delaunay, but honours the Constraint line.


Step 3 – Data, Optimizations.

You will need to work with double-precision data for all the mathematical elements of this algorithm – floats are just too imprecise. Many examples are quite prescriptive about the kinds of list structures to be used to efficiently parse the many triangles and points you will generate – you can experiment with these, but from the point of view of learning the method, I just stuck with a simple List<> and a couple of others; a Triangle class and a Vector2Double struct. I didn’t bother storing the triangle adjacencies, but did store the Triangle circumcircles in the Triangle class.

In terms of optimisations, a very good method of accellerating the initial Delaunay triangulation is to process all the points in spatial order – either top to bottom or left to right. When each triangle is generated and stored you can check the list of triangles to see if their circumcircles can need be considered any longer for future points – all future points will be further away in either the X or Y coordinate (depending on your choice of spatial order) so triangles whose circumcircles didn’t intersect for the current point and are left/below it, cannot now match any future point, so you no longer need to worry about them. This means the set of triangle/circle matches to do gets smaller and smaller for each subsequent point.