A New DB?

As part of this work I have converted my data store from SQL Server Compact Edition to the key-value pair database RaptorDB, available on NuGet and documented here. Although it's not noticeably faster (data load times were never a limiting factor in my engine) it can store more than SQL CE's 500MB limit – which I was bumping into when generating a full data set.

It's pretty easy to use and works just like a persistent dictionary, so it would be very easy to swap out for another NoSQL database in the future. Happily I took the precaution of hiding my SQL CE implementation behind an IDictionary interface, so there was zero impact when swapping it over.
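As a sketch of that seam (the class and method names here are illustrative, not the engine's actual code), the point is that callers only ever see IDictionary<string, byte[]>, so the concrete store can be swapped in one place:

```csharp
using System.Collections.Generic;

public static class StoreFactory
{
    // Swap the implementation here; callers never know which store is behind it.
    public static IDictionary<string, byte[]> Open(string path)
    {
        // return new SqlCeDictionary(path);      // old SQL CE-backed store
        // return new RaptorDbDictionary(path);   // candidate replacement
        return new Dictionary<string, byte[]>();  // in-memory stand-in for this sketch
    }
}
```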

Combining this with a nice fast generic QuadTree implementation found here, I can now generate, store and query a two-dimensional index of a large object graph for landscape decoration (shrubs, rocks etc.). Rather than use a database index to carry out 2D rectangle searches, I just use a QuadTree index in memory and save it to the database as an object in its own right. When it's deserialized into memory it's as fast as it can be and doesn't rely on the database provider to implement spatial indexing.
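A minimal sketch of that recursive rectangle query, assuming a hypothetical node shape (the actual generic QuadTree linked above differs in detail, but the walk is the same idea):

```csharp
using System.Collections.Generic;
using System.Drawing; // RectangleF

// Hypothetical node shape for illustration only.
class QuadNode<T>
{
    public RectangleF Bounds;
    public List<T> Items = new List<T>();
    public List<QuadNode<T>> Children = new List<QuadNode<T>>();

    // Reject whole subtrees whose bounds miss the query rectangle,
    // recurse into the rest and collect their contents.
    public void Query(RectangleF area, List<T> results)
    {
        if (!Bounds.IntersectsWith(area)) return;
        results.AddRange(Items);
        foreach (var child in Children) child.Query(area, results);
    }
}
```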

During profiling, the method called most often – and one which I can't really improve on – is the rectangle intersect function. This checks whether one rectangle (typically based on the user's view frustum) intersects a rectangle of interest (typically a planted region, or an individual plant). This is at the heart of the 2D QuadTree index and is called many, many times. To be sure I had the best implementation I just used the System.Drawing.RectangleF implementation, hoping that the boffins at Microsoft had got it right. This only really shows up on the profile because it is called so many times – not because it is inherently inefficient. Sixty times a second I walk the entire object QuadTree against the current view frustum and reject all those rectangles which don't intersect the view frustum bounding box. I recurse into those that can be seen until I have a list of visible objects.
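The test itself is tiny, which is why it only matters through call volume. A sketch of the axis-aligned intersection test, equivalent to the logic System.Drawing.RectangleF.IntersectsWith performs:

```csharp
// Self-contained stand-in for illustration; RectangleF.IntersectsWith does
// the same four comparisons.
public struct RectF
{
    public float X, Y, Width, Height;

    public RectF(float x, float y, float w, float h)
    { X = x; Y = y; Width = w; Height = h; }

    // Two axis-aligned rectangles overlap when they overlap on both axes.
    public bool Intersects(RectF other) =>
        other.X < X + Width  && X < other.X + other.Width &&
        other.Y < Y + Height && Y < other.Y + other.Height;
}
```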

Something tells me that I can actually cache the results of my visibility check and only refresh it if the user's viewpoint changes significantly, but I'll wait to do more profiling before I dive into that solution. The fact that VS.net 2015 Community Edition includes the excellent Microsoft profiler for free really helps the process of understanding the code's performance.

After all the work to convert from SQL Server Compact Edition to RaptorDB I got a nasty shock when I examined the file size of my stored data – 15GB. It took a huge amount of time to open (I now understand it reads all its indexes into RAM when the database is opened) and once loaded was indeed really fast; but the slow open time and massive disk size just made it a poor choice.

Scores : RaptorDB 0 : SQL Server Compact Edition 1

This made me search for a more reliable in-process database, just to use as a key-value store. I stumbled across the storage engine built into every Windows OS – ESENT, also known as “Jet Blue”. It underpins several core Windows components (Active Directory and Exchange among them), so it is pretty robust and performant. The API is a pig, but luckily a .net wrapper and associated IDictionary implementation is available on NuGet. It was a simple drop-in for the RaptorDB and, once adjusted to take complex types (explained here) rather than just simple types, could easily replace it.

Scores : RaptorDB 0 : ESENT 1

The only other thing to look out for is the built-in use of BinaryFormatter to serialize all the data being stored. Since I already use my own custom serializer and store/retrieve only byte arrays, I wanted to bypass that, and luckily it implements a plug-in field save/retrieve delegate that allows me to do just that.
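A hedged sketch of the drop-in usage, using the PersistentDictionary shape from the ManagedEsent NuGet package. Note the stock class restricts its value types, which is why the "complex type" adjustment mentioned above was needed; the key format and the serializer here are placeholders:

```csharp
using Microsoft.Isam.Esent.Collections.Generic;

// Store pre-serialized byte arrays against string keys; the dictionary
// persists to the given directory via ESENT.
using (var store = new PersistentDictionary<string, byte[]>(@"data\plants"))
{
    store["tile:12:7"] = mySerializer.Serialize(tileObjects); // custom serializer output
    byte[] raw = store["tile:12:7"];                          // read it straight back
}
```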




Tessellation Shader with Noise

In an earlier post I used DirectX 11 tessellation shaders to generate a high frequency of landscape triangles within the shader, allowing my landscape tiles to show much higher detail when close to the camera. I also mentioned that, other than river surfaces, I hadn't actually used the extra geometry density for anything yet.

Now I have.

Simply by sampling a Perlin noise texture at two frequencies to add extra height within the tessellation shader, I can generate a much more interesting landscape close up.
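A sketch of what the two-frequency height offset might look like inside the domain shader. The texture, sampler and scale constants are assumptions; SampleLevel is used because gradient-based Sample() isn't available outside the pixel shader:

```hlsl
// Coarse sample gives broad undulation, fine sample adds small lumps and bumps.
float coarse = NoiseTexture.SampleLevel(NoiseSampler, worldPos.xz * 0.01f, 0).r;
float fine   = NoiseTexture.SampleLevel(NoiseSampler, worldPos.xz * 0.10f, 0).r;
worldPos.y  += coarse * CoarseHeightScale + fine * FineHeightScale;
```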


The various lumps and bumps in the foreground here are entirely generated using the tessellation shader and Perlin noise.

In order to prevent obvious visual popping I generate a bump map from the Perlin noise at design time, and sample it at the same frequency as the height undulation. I then combine that bump sample with the landscape's more basic normal map to generate a combined normal for light rendering.

Combining normals is not simply a matter of adding both together and renormalizing – this would give an average normal, not a combined normal. Luckily someone has already solved this for me. See here for the details: http://blog.selfshadow.com/publications/blending-in-detail/

In order to maintain a good correspondence between close and distant lighting I make sure that I use a high tessellation factor when rendering my design-time “drapes” and normal maps for the distant landscape tiles. When I render them as a simple texture-with-normal-map in the distance it looks like a high-detail image – the darkening effect of the distant normals gives an illusion of a lot of height variance that doesn't actually exist in the geometry.

Even with the varying height generated from the noise samplers I still place the vegetation directly using a pre-calculated Y coordinate rather than rely on height map sampling in the shader. I just repeat the height undulation code written in HLSL within my C# design pipeline to get an accurate measurement of how high each tree will be at runtime. There are a couple of reasons for this.

  • Trees placed on a slope that use a heightmap to determine their distance from the ground will tend to “skew” in the horizontal plane – the front of the tree is further downhill than the back, and every vertex is offset in the vertical plane based on how high off the ground it is. In the real world trees don't behave like that – they grow vertically without reference to the slope of the ground on either side.
  • Fewer texture lookups at runtime, traded off against an extra float passed in the vertex instance stream. Given that the vertex instance stream is a Matrix, this actually doesn't cost me anything.
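The design-time mirror of the shader's undulation might look like this. SampleNoise and the frequency/scale constants are placeholders standing in for the real HLSL logic; the point is that the C# pipeline and the shader must compute from the same noise data:

```csharp
// Hedged sketch: evaluate the same two-frequency undulation at design time so
// each tree's pre-calculated Y matches the runtime surface exactly.
float PlantedHeight(float x, float z, float baseTerrainHeight)
{
    float coarse = SampleNoise(x * 0.01f, z * 0.01f); // same noise data the shader samples
    float fine   = SampleNoise(x * 0.10f, z * 0.10f);
    return baseTerrainHeight + coarse * CoarseHeightScale + fine * FineHeightScale;
}
```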

Video on YouTube here.



Trees, sparse foliage, rivers and paths

A quick video of all of the pieces put together. The landscape now uses a tessellation shader to generate higher levels of detail, but I don't use those extra triangles at the moment other than on the river surface, which is animated from a height map. The sea doesn't yet use a tessellation shader but a fixed concentric-circle mesh, which achieves the same result but with hard coding.

The trees and shrubbery are generated using the same techniques, planted using a Voronoi cell map. The textures over the landscape have been colour matched so they are not so obvious in transition – but I may have over-done this as it all looks the same basic colour now. Back to the drawing board on that.

YouTube video



Tessellation Shaders and Rivers

Having made the move to DX11 and SharpDX I can now use tessellation shaders to make my landscape more interesting without having enormous vertex buffers. The principles of tessellation shaders are;

  1. Your vertexes are passed into a traditional Vertex Shader. This VS simply passes the vertexes through to the next stage, and typically does no transformations. This is because the vertexes you submit might not be rendered.
  2. The vertexes are passed through to a Hull Shader, where you can decide to discard the set of vertexes passed in. This stage is the first time you see the new features of tessellation – instead of operating on a single vertex at a time, you are passed an array of vertexes which form a triangle (a patch). The hull shader's patch constant function also outputs the tessellation factors that tell the fixed-function tessellator how finely to subdivide. Unless you want to do some culling here, you typically return everything you are passed, unaltered.
  3. The fun starts when your vertex patches (arrays of 3 vertexes) are passed into the Domain Shader. The DS is also passed a set of barycentric coordinates. The job of the domain shader is to output a new vertex based on the three passed in, using the barycentric coordinates. This is where the majority of the work is done.

In the Domain Shader you end up doing all the work typically done in the Vertex Shader: matrix calculations etc. Traditionally, once a vertex leaves the VS its values are interpolated across the triangle and the resultant data structure is passed into the Pixel Shader. Using the DS you are in charge of interpolating the data from the three patch points in any way you see fit. The patch points themselves are never actually included as vertexes for rendering, and are simply used as control points.
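A minimal domain shader sketch of the interpolation described above; the struct names and the combined view-projection matrix are assumptions, not the engine's actual shader:

```hlsl
[domain("tri")]
PS_INPUT DS(HS_CONSTANTS constants,
            float3 bary : SV_DomainLocation,
            const OutputPatch<HS_OUTPUT, 3> patch)
{
    PS_INPUT output;
    // Weight the three control points by the barycentric coordinates - this is
    // the interpolation the fixed pipeline would otherwise do for you, and the
    // place to add height offsets, texture tweaks and so on.
    float3 pos = patch[0].Position * bary.x
               + patch[1].Position * bary.y
               + patch[2].Position * bary.z;
    output.Position = mul(float4(pos, 1.0f), Param_ViewProjectionMatrix);
    return output;
}
```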

A landscape made of large uniform triangles can be tessellated within the DS to include lots of new sub-triangles, and using noise sampling or other techniques you can give the new vertexes different heights from those that would have been interpolated across the larger triangle. In fact you can change everything – the texture coordinates etc.

The key to tessellation shaders is to use the extra triangles to generate some useful content, and not just to subdivide triangles for the sake of it. The problem is that the introduction of new detail needs to be done in a way which won't affect other rendered objects which might not use tessellation. For example, a tree will be placed on a landscape based on a sample of a height map. If the landscape introduces tessellation and generates new bumps and other interesting features, that tree placement will be wrong and the tree may sink into the ground or hover above it.

This is nice on a landscape but really comes into its own when painting water surfaces. These have a large appetite for vertexes, and a need to vary the vertexes over time to provide animation. The DS can generate a dense mat of sub-triangles and use a height map for the water surface to generate ripples and normals.

Careful selection of the kinds of deformation introduced in the DS is important to getting visually good results. Luckily this doesn't apply to water surfaces, since nothing is dependent on their height. The image below uses a linear tessellation algorithm, so closer triangles are subdivided more heavily than distant ones. The bump, height and texture samplers are all linked to the game time and so change their sample coordinates to give the illusion of a flowing river.


SharpDX Resources

As a long term .net programmer I am used to a managed memory runtime environment and know all about the Dispose pattern used to release managed resources. However for some time I’ve been concerned with the memory leaks I was seeing in my SharpDX based 3D landscape.

This post is a reminder to all SharpDX .net programmers that Dispose() implemented in SharpDX objects does not work like the standard Dispose pattern for .net.

All SharpDX objects are wrapped COM objects and are subject to proper disposal via reference counting. The .net runtime tracks all references to .net classes and collects their resources once there are no references to them in the managed call stack (and thread local storage). COM places the requirement to manage the lifetime of objects firmly on the programmer, who must add and remove reference counts as new references to an object are stored.

Using the standard .net Dispose pattern allows me to handle the COM de-reference without any problems; the SharpDX documentation tells us that Dispose() is used to decrement the reference counter.

What I forgot is that COM references aren't just created when you instantiate a new instance; one is also added when you receive an instance reference from the SharpDX factory. So the general pattern;

SharpDX.Direct3D11.Texture2D tex = new SharpDX.Direct3D11.Texture2D(device, textureDescription);
.. do some work
tex.Dispose();

works perfectly OK. What isn't so obvious is that;

SharpDX.Direct3D11.DepthStencilView dsv;
SharpDX.Direct3D11.RenderTargetView[] rtvs = 
   this.Context.OutputMerger.GetRenderTargets(8, out dsv);

to query a list of the existing render targets also increments the reference count of those RenderTargetView instances, and of the DepthStencilView. So you need to remember to dispose of those too.

As a .net programmer this is odd, but manageable. What feels particularly weird is that Dispose() is designed in .net to be used when the programmer knows they hold the last reference to a managed class. If you have four references to a single instance, you clearly should not call Dispose() until you know that you are holding the very last reference, otherwise the other references will suddenly find themselves holding disposed objects.

When using Dispose with the SharpDX classes you call it as soon as you know your particular reference is going to go out of scope – irrespective of how many other references to the same instance exist. Dispose() doesn't actually release any resources – it simply decrements the COM reference count. The SharpDX class factory does the resource release and COM deallocation.

Because it is not obvious when looking at code whether you are using a .net managed class or a wrapped SharpDX COM class, and because Dispose() means something entirely different in each case, I decided to create extension methods for all the SharpDX classes;

/// <summary>
/// Hides the confusion between Dispose (a .net concept) and handle counting (a COM concept).
/// </summary>
/// <param name="leasedObject"></param>
public static void ReleaseReference(this SharpDX.Direct3D11.DeviceContext leasedObject)
{
    // On a SharpDX object, Dispose() just decrements the COM reference count.
    leasedObject.Dispose();
}

This means my code looks like this;

SharpDX.Direct3D11.DepthStencilView dsv;
SharpDX.Direct3D11.RenderTargetView[] rtvs =
   this.Context.OutputMerger.GetRenderTargets(8, out dsv);
.. do some work
foreach (var rtv in rtvs) rtv.ReleaseReference();
dsv.ReleaseReference();

Which is a lot more self-descriptive. I like making code more maintainable; this makes me stop and think when I look at it, and I instantly know what's going on. I wish the SharpDX bods had not reused the existing Dispose() method to do their reference counting – but I can see why they did.

More Grass, Denser Grass

The CodeMasters blog entry http://blog.codemasters.com/grid/10/rendering-fields-of-grass-in-grid-autosport/ made me think again about my Grass rendering using a Geometry Shader. I had followed the suggestions from Outerra http://outerra.blogspot.co.uk/2012/05/procedural-grass-rendering.html to generate my grass but CodeMasters suggested combining this approach with simple billboards.

Instead of each geometry-shader triangle strip representing a single blade of grass, why not just output a quad with a nicely detailed, colourful texture? The textured quad might represent 5 or 10 blades of grass, rotated and scaled. This gives a massive increase in grass density, with better art, than the Outerra model.

With a bit of texture atlasing of various textures I could generate a very varied meadow with only some basic changes to my shader – and use fewer vertexes per location as well. Although the end result is clearly more “billboard” than “geometry”, it still achieves a much higher density of foliage.
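A sketch of a quad-emitting geometry shader along these lines; the struct members and the CameraRight constant are assumptions, not the actual shader. One input point becomes a four-vertex triangle strip:

```hlsl
[maxvertexcount(4)]
void GS(point GS_INPUT input[1], inout TriangleStream<PS_INPUT> grassStream)
{
    float3 right = CameraRight * input[0].HalfWidth;
    float3 up    = float3(0, input[0].Height, 0);
    float3 root  = input[0].Position;

    // Strip order: bottom-left, top-left, bottom-right, top-right.
    float3 corners[4] = { root - right, root - right + up,
                          root + right, root + right + up };
    float2 uvs[4]     = { float2(0, 1), float2(0, 0),
                          float2(1, 1), float2(1, 0) };

    [unroll]
    for (int i = 0; i < 4; i++)
    {
        PS_INPUT v;
        v.Position = mul(float4(corners[i], 1.0f), Param_ViewProjectionMatrix);
        v.TexCoord = uvs[i];
        grassStream.Append(v);
    }
}
```

Atlas selection can then be a per-instance offset applied to v.TexCoord, which is how a single shader can draw several grass varieties.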

Here is the outcome with a four-texture atlas;




This is animated in the normal way, using some Perlin noise textures to generate movement. The density of the grass is overwhelming here – it looks like a forest. Changing the texture atlas to something more “grassy”;



Mmm. That's nice.


Meadows underplanting shadowed trees, distant ocean and mountains. 80 fps.


Shadow Mapping

This is quite a difficult issue to deal with for a large landscape. The basics of shadow mapping are well documented at https://msdn.microsoft.com/en-gb/library/windows/desktop/ee416324(v=vs.85).aspx and briefly;

  1. Draw your scene in two passes. The first pass is drawn from the location of the light source, and the second from the location of the viewer.
  2. On the first (light) pass, you render to an offscreen texture and only actually draw the depth of the pixel not the colour of the scene. The depth is calculated as output.Depth = (ps_input.Position.z / ps_input.Position.w);
  3. On the second (color) pass, you render your scene as normal to the viewport. However you pass in to your shader the texture you drew in Step 2 along with the View Matrix you used when you drew step 2.
  4. In the pixel shader, read the correct depth pixel from the depth texture you generated in Step 2 and compare it with the depth you calculate for your current pixel – if the depth stored in the texture is less than the depth you have calculated then the pixel should be shaded darker – it is in shadow.

This is fairly straightforward, but how does it work? How do you actually use the depth texture you drew in Step 2? The first step is to work out which pixel in your standard rendering pass (Step 3) is equivalent to the same pixel you drew in Step 2. Since both were rendered from different viewpoints (and typically using a different projection matrix), the actual pixel being drawn in your Pixel Shader has no direct relationship to the one you drew in the earlier light pass.

The key is to pass into the vertex shader the View and Projection matrix values you used to generate your light pass in Step 2. You then calculate the vertex's position to generate a value which would have been the same for that vertex in the Step 2 vertex shader.

Vertex Shader Fragment

In Step 2 (depth pass) you would have calculated;

output.Position = mul(vertexPosition, Param_WorldMatrix);
output.Position = mul(output.Position, Param_ViewMatrix);
output.Position = mul(output.Position, Param_ProjectionMatrix);

so in Step 3 (color pass) you need to calculate the same value, passing in the matrixes you used in Step 2 as a new set of parameters, “Param_LightXXXXMatrix”;

output.LightViewPosition = mul(vertexPosition, Param_WorldMatrix);
output.LightViewPosition = mul(output.LightViewPosition, Param_LightViewMatrix);
output.LightViewPosition = mul(output.LightViewPosition, Param_LightProjectionMatrix);

So your pixel shader will now receive the parameter LightViewPosition as well as the Position you would normally calculate in your vertex shader for this pass. The clever part comes in the pixel shader, where you use the passed-in LightViewPosition to generate a texture coordinate that can be used to read the correct pixel from the depth map texture;

Pixel Shader Fragment

This calculation uses the LightViewPosition you calculated in the vertex shader and generates a coordinate correct for sampling the depth map texture.

float2 projectTexCoord;
projectTexCoord.x = ((LightViewPosition.x / LightViewPosition.w) / 2.0f) + 0.5f;
projectTexCoord.y = ((-LightViewPosition.y / LightViewPosition.w) / 2.0f) + 0.5f;

This is called Texture Projection and this trick can be used anywhere you have a texture that is generated via a different View and Projection matrix.

Once you’ve got the texture coordinate for your depth map, you just read out the depth you recorded in Step 2 and compare it to the value you are currently about to write to your color pass.

Pixel Shader Fragment

So now we can sample the depth texture and read back the depth we calculated when we generated the same pixel from the lights position.

float realDistance = (LightViewPosition.z / LightViewPosition.w);
// Note the use of the free interpolated comparison method.
return depthMap.SampleCmp(DepthMapSampler, projectTexCoord, realDistance - depthBias);

So what's with the special “SampleCmp”? Because we used a different location and projection matrix for drawing the depth map, we expect that the pixel we sample from the depth map won't be an exact 1:1 match for the pixel we are drawing to the scene. It may be skewed or scaled such that it represents a slightly different world position. Typically you would do a PCF set of four samples around the point you have calculated and take the average – this gives nice anti-aliasing. However, we need to consider that the depth map does not contain colour – it contains depths. Trying to use anti-aliasing concepts on a depth map would generate nonsense. Two pixels lying next to each other in the depth map might represent depth calculations for two objects very far apart in world space – a nearby object and a really distant object might record wildly different depth values only one pixel apart.

Luckily, in Shader Model 5 the designers gave us the new SampleCmp, which allows us to do a four-tap PCF sample in the hardware, but instead of returning a weighted average of the values it samples, it returns a weighted average of the comparison results – how many of the samples pass the test against the depth value we pass in (the third parameter). This is much more useful and gives our shadows a nice soft edge.
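For SampleCmp to work, the DepthMapSampler has to be created as a comparison sampler. A sketch using the real SharpDX type and member names, though the exact filter and comparison choices here are assumptions:

```csharp
// Comparison sampler: the hardware does the PCF taps and the per-tap
// "is this sample nearer than my depth?" test for us.
var desc = new SharpDX.Direct3D11.SamplerStateDescription
{
    Filter = SharpDX.Direct3D11.Filter.ComparisonMinMagMipLinear,
    AddressU = SharpDX.Direct3D11.TextureAddressMode.Clamp,
    AddressV = SharpDX.Direct3D11.TextureAddressMode.Clamp,
    AddressW = SharpDX.Direct3D11.TextureAddressMode.Clamp,
    ComparisonFunction = SharpDX.Direct3D11.Comparison.LessEqual,
};
var depthMapSampler = new SharpDX.Direct3D11.SamplerState(device, desc);
```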

Shaking Shadows

aka Shadow Trembling, Shadow Shaking etc.

This is visible when you swivel the viewpoint or move the camera through the world. It's caused by the same problem which led us to use the SampleCmp function described previously: the shadow map does not have a 1:1 mapping between its pixels and the pixels being rendered in the color pass. Slight variations in the floating-point calculations between the light projection and the camera projection matrixes lead to pixels moving in and out of shadow seemingly at random around the edges of a shaded area.

This has a relatively simple workaround – don't change the light position or orientation other than in whole texel steps. This is completely documented in the link referenced earlier. Implementing the “Stable Light Frustum” calculations has an awesome benefit – because the light matrixes don't change every time the camera matrixes change, you can afford to redraw your shadow map once every 10 or 20 frames (or when the camera substantially changes orientation or location). This means you can go to town on the GPU cost of calculating the shadows, bringing multiple cascading shadow maps into play, but recalculating only the very nearest ones, and then only quite infrequently.
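The snapping itself is tiny. A hedged sketch of the "whole texel steps" idea, with frustumWidth and shadowMapSize as assumptions:

```csharp
using System;

// Quantize the light frustum's origin to shadow-map texel boundaries so the
// texel coverage doesn't shimmer as the camera moves.
static (float X, float Y) SnapToTexel(float x, float y, float frustumWidth, int shadowMapSize)
{
    float worldUnitsPerTexel = frustumWidth / shadowMapSize;
    return ((float)Math.Floor(x / worldUnitsPerTexel) * worldUnitsPerTexel,
            (float)Math.Floor(y / worldUnitsPerTexel) * worldUnitsPerTexel);
}
```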


These examples use false color to indicate which of the three shadow maps is being used to calculate the shadows;


Here with more natural colours