The vast majority of visual effects you see in games today depend on the clever use of lighting and shadows – without them, games would be dull and lifeless. In this fourth part of our deep look at 3D game rendering, we'll focus on what happens to a 3D world alongside processing vertices and applying textures. It once again involves a lot of math and a sound grasp of the fundamentals of optics.

We'll dive right in to see how this all works. If this is your first time checking out our 3D rendering series, we'd recommend starting at the beginning with our 3D Game Rendering 101, which is a basic guide to how one frame of gaming goodness is made. From there, we've been working through every aspect of rendering in the articles below...

Recap

So far in the series we've covered the key aspects of how shapes in a scene are moved and manipulated, transformed from a 3-dimensional space into a flat grid of pixels, and how textures are applied to those shapes. For many years, this was the bulk of the rendering process, and we can see this by going back to 1993 and firing up id Software's Doom.

The use of light and shadow in this title is very primitive by modern standards: no sources of light are accounted for, as each surface is given an overall, or ambient, color value using the vertices. Any sense of shadows just comes from some clever use of textures and the designer's choice of ambient color.

This wasn't because the programmers weren't up to the task: PC hardware of that era consisted of 66 MHz (that's 0.066 GHz!) CPUs, 40 MB hard drives, and 512 kB graphics cards that had minimal 3D capabilities. Fast forward 23 years, and it's a very different story in the acclaimed reboot.

There's a wealth of technology used to render this frame, boasting cool phrases such as screen space ambient occlusion, pre-pass depth mapping, Bokeh blur filters, tone mapping operators, and so on. The lighting and shadowing of every surface is dynamic: constantly changing with environmental conditions and the player's actions.

Since everything to do with 3D rendering involves math (and a lot of it!), we better get stuck into what's going on behind the scenes of any modern game.

The math of lighting

To do any of this properly, you need to be able to accurately model how light behaves as it interacts with different surfaces. You might be surprised to know that the origins of this date back to the 18th century, and a man called Johann Heinrich Lambert.

In 1760, the Swiss scientist released a book called Photometria – in it, he set down a raft of fundamental rules about the behavior of light; the most notable of which was that surfaces emit light (by reflection or as a light source itself) in such a way that the intensity of the emitted light changes with the cosine of the angle between the surface's normal and the direction of the observer.

This simple rule forms the basis of what is called diffuse lighting. This is a mathematical model used to calculate the color of a surface depending on its physical properties (such as its color and how well it reflects light) and the position of the light source.

For 3D rendering, this requires a lot of information, and this can best be represented with another diagram:

You can see a lot of arrows in the picture – these are vectors, and for each vertex whose color is being calculated, there will be:

  • 3 for the positions of the vertex, light source, and camera viewing the scene
  • 2 for the directions of the light source and camera, from the perspective of the vertex
  • 1 normal vector
  • 1 half-vector (it's always halfway between the light and camera direction vectors)

These are all calculated during the vertex processing stage of the rendering sequence, and the equation (called the Lambertian model) that links them all together is:
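
In rough form, with hats marking normalized vectors and ⊗ standing for a per-channel color multiply (the exact notation varies between textbooks and APIs), it looks like this:

$$C_{diffuse} = \sum_{i=1}^{n} \Big[\, C_{S} \otimes C_{L,i} \cdot \max(\hat{N} \cdot \hat{L}_{i},\ 0) \,\Big] \times Att_{i} \times Spot_{i}$$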

So the color of the vertex, through diffuse lighting, is calculated by multiplying the color of the surface, the color of the light, and the dot product of the vertex normal and light direction vectors, with attenuation and spotlight factors. This is done for each light source in the scene, hence the 'summing' part at the start of the equation.

The vectors in this equation (and all of the rest we will see) are normalized (as indicated by the accent on each vector). A normalized vector retains its original direction, but its length is reduced to unity (i.e. it's exactly 1 unit in magnitude).

The values for the surface and light colors are standard RGBA numbers (red, green, blue, alpha-transparency) – they can be integer (e.g. INT8 for each color channel) but they're nearly always a float (e.g. FP32). The attenuation factor determines how the light level from the source decreases with distance, and it gets calculated with another equation:
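
The usual form, with d being the distance from the light source to the vertex, is:

$$Att = \frac{1}{A_{C} + A_{L}\,d + A_{Q}\,d^{2}}$$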

The terms AC, AL, and AQ are various coefficients (constant, linear, quadratic) that describe the way the light level is affected by distance – these all have to be set out by the programmers when creating the rendering engine. Every graphics API has its own specific way of doing this, but the coefficients are entered when the type of light source is coded.

Before we look at the last factor, the spotlight one, it's worth noting that in 3D rendering, there are essentially 3 types of lights: point, directional, and spotlight.

Point lights emit equally in all directions, whereas a directional light only casts light in one direction (math-wise, it's actually a point light an infinite distance away). Spotlights are complex directional sources, as they emit light in a cone shape. The way the light varies across the body of the cone is determined by the size of the inner and outer sections of the cone.

And yes, there's another equation for the spotlight factor:
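
Direct3D's old fixed-function pipeline, for example, defined it roughly along these lines (with the result clamped to the 0–1 range, and a falloff exponent shaping the transition between the inner and outer cone):

$$Spot = \left( \frac{\rho - \cos(\phi/2)}{\cos(\theta/2) - \cos(\phi/2)} \right)^{falloff}, \quad \rho = \hat{L}_{dcs} \cdot \hat{L}_{dir}$$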

The value for the spotlight factor is either 1 (i.e. the light isn't a spotlight), 0 (if the vertex lies outside the cone), or some calculated value between the two. The angles φ (phi) and θ (theta) set out the sizes of the inner/outer sections of the spotlight's cone.

The two vectors, Ldcs and Ldir (the spotlight's direction reversed and expressed in camera space, and the direction from the vertex to the light, respectively), are used to determine whether or not the cone will actually reach the vertex at all.

Now remember that this is all for calculating the diffuse lighting value and it needs to be done for every light source in the scene or at least, every light that the programmer wants to include. A lot of these equations are handled by the graphics API, but they can be done 'manually' by coders wanting finer control over the visuals.

However, in the real world, there is essentially an infinite number of light sources. This is because every surface reflects light and so each one will contribute to the overall lighting of a scene. Even at night, there is still some background illumination taking place – be it from distant stars and planets, or light scattered through the atmosphere.

To model this, another light value is calculated: one called ambient lighting.

This equation is simpler than the diffuse one, because no directions are involved. Instead, it's a straightforward multiplication of various factors:

  • CSA – the ambient color of the surface
  • CGA – the ambient color of the global 3D scene
  • CLA – the ambient color of any light sources in the scene

Note the use of the attenuation and spotlight factors again, along with the summation of all the lights used.

So we now have background lighting and how light sources diffusely reflect off the different surfaces in the 3D world all accounted for. But Lambert's approach really only works for materials that scatter light off their surface in all directions; objects made from glass or metal produce a different type of reflection, called specular, and naturally, there's an equation for that, too!
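
In the same spirit as the diffuse equation, the specular term usually takes a form like this (Ĥ being the half-vector and p the specular power, both explained below):

$$C_{specular} = \sum_{i=1}^{n} \Big[\, C_{S} \otimes C_{LS,i} \cdot \max(\hat{N} \cdot \hat{H}_{i},\ 0)^{p} \,\Big] \times Att_{i} \times Spot_{i}$$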

The various aspects of this formula should be a little familiar now: we have two specular color values (one for the surface, CS, and one for the light, CLS), as well as the usual attenuation and spotlight factors.

Because specular reflection is highly focused and directional, two vectors are used to determine the intensity of the specular light: the normal of the vertex and the half-vector. The coefficient p is called the specular reflection power, and it's a number that adjusts how bright the reflection will be, based on the material properties of the surface. As the size of p increases, the specular effect becomes brighter but more focused, and smaller in size.

The final lighting aspect to account for is the simplest of the lot, because it's just a number. This is called emissive lighting, and gets applied for objects that are a direct source of light – e.g. a flame, flashlight, or the Sun.

This means we now have 1 number and 3 sets of equations to calculate the color of a vertex in a surface, accounting for background lighting (ambient) and the interplay between various light sources and the material properties of the surface (diffuse and specular). Programmers can choose to just use one or combine all four by just adding them together.
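
To make this more concrete, here's a toy, CPU-side sketch of the whole per-vertex sum for a single point light (the structure, names, and material values are our own, and the spotlight factor is left out for brevity) – real engines do this in shaders, and increasingly per pixel:

```python
import math

# Minimal 3-component vector helpers
def add(a, b): return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def modulate(a, b): return (a[0] * b[0], a[1] * b[1], a[2] * b[2])  # per-channel color multiply
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def normalize(a):
    length = math.sqrt(dot(a, a))
    return scale(a, 1.0 / length)

MATERIAL = {
    "emissive": (0.0, 0.0, 0.0),
    "ambient":  (0.1, 0.1, 0.1),
    "diffuse":  (0.8, 0.2, 0.2),
    "specular": (1.0, 1.0, 1.0),
    "power":    32,                       # the specular reflection power, p
}

def shade_vertex(vertex_pos, vertex_normal, camera_pos, light):
    n = normalize(vertex_normal)
    to_light = sub(light["pos"], vertex_pos)
    distance = math.sqrt(dot(to_light, to_light))
    l = scale(to_light, 1.0 / distance)              # unit vector: vertex -> light
    v = normalize(sub(camera_pos, vertex_pos))       # unit vector: vertex -> camera
    h = normalize(add(l, v))                         # half-vector between the two

    # Attenuation from the constant / linear / quadratic coefficients
    att = 1.0 / (light["att_c"] + light["att_l"] * distance + light["att_q"] * distance ** 2)

    ambient  = modulate(MATERIAL["ambient"], light["ambient"])
    diffuse  = scale(modulate(MATERIAL["diffuse"], light["color"]), max(dot(n, l), 0.0))
    specular = scale(modulate(MATERIAL["specular"], light["color"]),
                     max(dot(n, h), 0.0) ** MATERIAL["power"])

    lit = scale(add(ambient, add(diffuse, specular)), att)
    return add(MATERIAL["emissive"], lit)

light = {
    "pos": (2.0, 4.0, 1.0), "color": (1.0, 1.0, 1.0), "ambient": (0.2, 0.2, 0.2),
    "att_c": 1.0, "att_l": 0.09, "att_q": 0.032,
}
print(shade_vertex((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 4.0), light))
```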

Visually, the combination takes an appearance like this:

The equations we've looked at are employed by graphics APIs, such as Direct3D and OpenGL, when using their standard functions, but there are alternative algorithms for each type of lighting. For example, diffuse can be done via the Oren-Nayar model, which suits very rough surfaces better than the Lambertian one.

The specular equation earlier in this article can be replaced with models that account for the fact that very smooth surfaces, such as glass and metal, are still rough but on a microscopic level. Labelled as microfacet algorithms, they offer more realistic images, at a cost of mathematical complexity.

Whatever lighting model is used, all of them are massively improved by increasing the frequency with which the equation is applied in the 3D scene.

Per-vertex vs per-pixel

When we looked at vertex processing and rasterization, we saw that the results from all of the fancy lighting calculations, done on each vertex, have to be interpolated across the surface between the vertices. This is because all of the properties associated with the surface's material are contained within the vertices; when the 3D world gets squashed into a 2D grid of pixels, there will only be one pixel directly where the vertex is.

The rest of the pixels will need to be given the vertex's color information in such a way that the colors blend properly over the surface. In 1971, Henri Gouraud, then a postgraduate at the University of Utah, proposed a method to do this, and it now goes by the name of Gouraud shading.

His method was computationally fast and was the de facto method of doing this for years, but it's not without issues. It struggles to interpolate specular lighting properly and if the shape is constructed from a low number of primitives, then the blending between the primitives doesn't look right.

A solution to this problem was proposed by Bui Tuong Phong, also of the University of Utah, in 1973 – in his research paper, Phong showed a method of interpolating vertex normals on rasterized surfaces. This meant that diffuse and specular reflection models would work correctly on each pixel, and we can see this clearly using David Eck's online textbook on computer graphics and WebGL.

The chunky spheres are being colored by the same lighting model, but the one on the left is doing the calculations per vertex and then using Gouraud shading to interpolate it across the surface. The sphere on the right is doing this per pixel, and the difference is obvious.

The still image doesn't do enough justice to the improvement that Phong shading brings, but you can try the demo yourself using Eck's online demo, and see it animated.

Phong didn't stop there, though, and a couple of years later, he released another research paper in which he showed how the separate calculations for ambient, diffuse, and specular lighting could all be done in one single equation:
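
In one common form (using C for the camera's direction vector, as this article does, rather than the V often seen in textbooks):

$$C = k_{a}\,C_{ambient} + \sum_{lights} \Big[\, k_{d}\,(\hat{L} \cdot \hat{N})\,C_{diffuse} + k_{s}\,(\hat{R} \cdot \hat{C})^{\alpha}\,C_{specular} \,\Big]$$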

Okay, so lots to go through here! The values indicated by the letter k are reflection constants for ambient, diffuse, and specular lighting – each one is the ratio of that particular type of light reflected to the amount of incident light, and in effect, they take the place of the C values we saw in the earlier equations (the color values of the surface material, for each lighting type).

The vector R is the 'perfect reflection' vector – the direction the reflected light would take, if the surface was perfectly smooth, and is calculated using the normal of the surface and the incoming light vector. The vector C is the direction vector for the camera; both R and C are normalized too.

Lastly, there's one more constant in the equation: the value for α determines how shiny the surface is. The smoother the material (i.e. the more glass/metal-like it is), the higher the number.

This equation is generally called the Phong reflection model, and at the time of the original research, the proposal was radical, as it required a serious amount of computational power. A simplified version was created by Jim Blinn, which replaced the part of the formula using R and C with H and N (the half-way vector and surface normal). The value of R has to be calculated for every light, for every pixel in a frame, whereas H only needs to be calculated once per light, for the whole scene.
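
The difference is easy to see in a quick sketch (the helper names here are our own): the reflection vector has to be recomputed wherever the surface normal changes, whereas the half-vector only needs the light and view directions.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(l, n):
    """Perfect reflection of the (unit) light direction l about the (unit) normal n:
    R = 2(N.L)N - L."""
    d = dot(n, l)
    return tuple(2.0 * d * nc - lc for nc, lc in zip(n, l))

def half_vector(l, v):
    """Blinn's shortcut: the half-vector is just the normalized sum of the light
    and view directions - no surface normal required."""
    return normalize(tuple(lc + vc for lc, vc in zip(l, v)))

l = normalize((0.3, 1.0, 0.2))   # direction towards the light
v = normalize((0.0, 1.0, 1.0))   # direction towards the camera
n = normalize((0.0, 1.0, 0.1))   # surface normal

print(dot(reflect(l, n), v))     # the Phong term, before being raised to alpha
print(dot(n, half_vector(l, v))) # the Blinn-Phong term, before being raised to its power
```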

The Blinn-Phong reflection model is the standard lighting system used today, and is the default method employed by Direct3D, OpenGL, Vulkan, etc.

There are plenty more mathematical models out there, especially now that GPUs can process pixels through vast, complex shaders; together, such formulae are called bidirectional reflectance/transmission distribution functions (BRDF/BTDF for short) and they form the cornerstone of coloring in each pixel that we see on our monitors, when we play the latest 3D games.

However, we've only looked at surfaces reflecting light: translucent materials will allow light to pass through, and as it does so, the light rays are refracted. And certain surfaces, such as water, will reflect and transmit in equal measure.

Taking light to the next level

Let's take a look at Ubisoft's 2018 title Assassin's Creed: Odyssey – this game forces you to spend a lot of time sailing around on water, be it shallow rivers and coastal regions, or the deep sea.

To render the water as realistically as possible, but also maintain a suitable level of performance, Ubisoft's programmers used a gamut of tricks to make it all work. The surface of the water is lit via the usual trio of ambient, diffuse, and specular routines, but there are some neat additions.

The first of which is commonly used to generate the reflective properties of water: screen space reflections (SSR for short). This technique works by rendering the scene but with the pixel colors based on the depth of that pixel – i.e. how far it is from the camera – and stored in what's called a depth buffer. Then the frame is rendered again, with the usual lighting and texturing, but the scene gets stored as a render texture, rather than the final buffer to be sent to the monitor.

After that, a spot of ray marching is done. This involves sending out rays from the camera and then at set stages along the path of the ray, code is run to check the depth of the ray against the pixels in the depth buffer. When they're the same value, the code then checks the pixel's normal to see if it's facing the camera, and if it is, the engine then looks up the relevant pixel from the render texture. A further set of instructions then inverts the position of the pixel, so that it is correctly reflected in the scene.
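
Here's a heavily simplified, CPU-side sketch of that ray-marching idea (toy buffers, our own projection and parameter choices, and no normal check), just to show the depth comparison at its heart:

```python
import numpy as np

W, H, FOCAL = 64, 64, 32.0          # toy resolution and focal length (in pixels)

def project(p):
    """View-space point -> pixel coordinates (camera at the origin, looking down +z)."""
    u = int(p[0] / p[2] * FOCAL + W / 2)
    v = int(p[1] / p[2] * FOCAL + H / 2)
    return u, v

def ssr_trace(origin, direction, depth_buf, color_buf, step=0.25, max_steps=200):
    """March a reflected ray through view space; at each step compare the ray's depth
    with the depth buffer. The first place the ray ends up *behind* the stored surface
    counts as a hit, and that pixel's color is reused as the reflection."""
    p = np.array(origin, dtype=float)
    d = np.array(direction, dtype=float)
    d /= np.linalg.norm(d)
    for _ in range(max_steps):
        p += d * step
        if p[2] <= 0.1:                       # ray went behind the camera
            return None
        u, v = project(p)
        if not (0 <= u < W and 0 <= v < H):   # left the screen: no data to reflect
            return None
        if p[2] >= depth_buf[v, u]:           # ray is now behind the visible surface
            return color_buf[v, u]
    return None

# Synthetic scene: a flat wall at z = 10, colored with a simple gradient
depth_buf = np.full((H, W), 10.0)
color_buf = np.zeros((H, W, 3))
color_buf[..., 0] = np.linspace(0, 1, W)[None, :]

# A point on a 'water surface' below the camera, reflecting a ray up and forwards
hit = ssr_trace(origin=(0.0, -1.0, 4.0), direction=(0.0, 0.5, 1.0),
                depth_buf=depth_buf, color_buf=color_buf)
print(hit)
```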

Light will also scatter about when it travels through materials and for the likes of water and skin, another trick is employed – this one is called sub-surface scattering (SSS). We won't go into any depth on this technique here, but you can read more about how it can be employed to produce amazing results, as seen below, in a 2014 presentation by Nvidia.

Going back to water in Assassin's Creed, the implementation of SSS is very subtle, as it's not used to its fullest extent for performance reasons. In earlier AC titles, Ubisoft employed faked SSS but in the latest release its use is more complex, though still not to the same extent that we can see in Nvidia's demo.

Additional routines are done to modify the light values at the surface of the water, to correctly model the effects of depth, by adjusting the transparency on the basis of distance from the shore. And when the camera is looking at the water close to the shoreline, yet more algorithms are processed to account for caustics and refraction.

The result is impressive, to say the least:

That's water covered, but what about as the light travels through air? Dust particles, moisture, and so on will also scatter the light about. This results in light rays, as we see them, having volume instead of being just a collection of straight rays.

The topic of volumetric lighting could easily stretch to a dozen more articles by itself, so we'll look at how Rise of the Tomb Raider handles this. In the video below, there is 1 main light source: the Sun, shining through an opening in the building.

To create the volume of light, the game engine takes the camera frustum (see below) and exponentially slices it up on the basis of depth into 64 sections. Each slice is then rasterized into grids of 160 x 94 elements, with the whole lot stored in a 3-dimensional FP32 render texture. Since textures are normally 2D, the 'pixels' of the frustum volume are called voxels.

For a block of 4 x 4 x 4 voxels, compute shaders determine which active lights affect this volume, and write this information to another 3D render texture. A formula known as the Henyey-Greenstein phase function is then used to estimate the overall 'density' of the light within the block of voxels.
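
The function itself is compact: given the cosine of the angle between the light's path and the view direction, plus an anisotropy value g, it returns how much of that light gets scattered your way (a straightforward sketch of the formula, not Eidos-Montréal's actual shader code):

```python
import math

def henyey_greenstein(cos_theta, g):
    """Henyey-Greenstein phase function: how much light gets scattered towards the
    viewer at an angle theta from the light's direction of travel. g in (-1, 1)
    controls the anisotropy: 0 is uniform, positive values favour forward scattering."""
    return (1.0 - g * g) / (4.0 * math.pi * (1.0 + g * g - 2.0 * g * cos_theta) ** 1.5)

# Forward-scattering haze: most of the light carries on close to its original direction
for angle in (0, 30, 90, 180):
    print(angle, henyey_greenstein(math.cos(math.radians(angle)), g=0.7))
```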

The engine then runs some more shaders to clean up the data, before ray marching is performed through the frustum slices, accumulating the light density values. On the Xbox One, Eidos-Montréal states that this can all be done in roughly 0.8 milliseconds!

While this isn't the method used by all games, volumetric lighting is now expected in nearly all top 3D titles released today, especially first person shooters and action adventures.

Originally, this lighting technique was called 'god rays' – or to give the correct scientific term, crepuscular rays – and one of the first titles to employ it was the original Crysis from Crytek, in 2007.

It wasn't truly volumetric lighting, though, as the process involved rendering the scene as a depth buffer first, and using it to create a mask – another buffer where the pixel colors are darker the closer they are to the camera.

That mask buffer is sampled multiple times, with a shader taking the samples and blurring them together. This result is then blended with the final scene, as shown below:
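
As a rough numpy sketch of that radial sampling idea (our own parameters and toy mask, not Crytek's shader), each pixel steps towards the light's on-screen position, accumulating mask samples with decaying weights:

```python
import numpy as np

def radial_light_shafts(mask, light_px, num_samples=32, density=0.9, decay=0.95):
    """Crude screen-space 'god rays': for every pixel, step towards the light's
    on-screen position, sampling the occlusion mask with decaying weights. Bright,
    unoccluded texels near the light smear outwards into visible shafts."""
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    # per-pixel step towards the light's screen position
    dx = (light_px[0] - xs) / num_samples * density
    dy = (light_px[1] - ys) / num_samples * density
    result = np.zeros_like(mask)
    weight = 1.0
    sx, sy = xs.copy(), ys.copy()
    for _ in range(num_samples):
        sx += dx
        sy += dy
        xi = np.clip(sx.round().astype(int), 0, w - 1)
        yi = np.clip(sy.round().astype(int), 0, h - 1)
        result += mask[yi, xi] * weight
        weight *= decay
    return result / num_samples

# Toy mask: a bright 'sun' disc in an otherwise dark sky
mask = np.zeros((128, 128))
yy, xx = np.mgrid[0:128, 0:128]
mask[(xx - 96) ** 2 + (yy - 32) ** 2 < 15 ** 2] = 1.0
shafts = radial_light_shafts(mask, light_px=(96, 32))
print(shafts.max(), shafts.mean())
```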

The development of graphics cards in the past 12 years has been colossal. The most powerful GPU at the time of Crysis' launch was Nvidia's GeForce 8800 Ultra – today's fastest GPU, the GeForce RTX 2080 Ti has over 30 times more computational power, 14 times more memory, and 6 times more bandwidth.

Leveraging all that computational power, today's games can do a much better job in terms of visual accuracy and overall performance, despite the increase in rendering complexity.

But what the effect truly demonstrates is that, as important as correct lighting is for visual accuracy, it's the absence of light that really makes the difference.

The essence of a shadow

Let's use Shadow of the Tomb Raider to start the next section of this article. In the image below, all of the graphics settings related to shadows have been disabled on the left; on the right, they're all switched on. Quite the difference, right?

Since shadows occur naturally around us, any game that does them poorly will never look right. This is because our brains are tuned to use shadows as visual references, to generate a sense of relative depth, location, and motion. But doing this in a 3D game is surprisingly hard, or at the very least, hard to do properly.

Let's start with a TechSpot duck. Here she is waddling about in a field, and the Sun's light rays reach our duck and get blocked as expected.

One of the earliest methods of adding a shadow to a scene like this would be to add a 'blob' shadow underneath the model. It's not remotely realistic, as the shape of the shadow has nothing to do with the shape of the object casting it; however, blobs are quick and simple to do.

Early 3D games, like the 1996 original Tomb Raider game, used this method as the hardware at the time – the likes of the Sega Saturn and Sony PlayStation – didn't have the capability of doing much better. The technique involves drawing a simple collection of primitives just above the surface the model is moving on, and then shading it all dark; an alternative to this would be to draw a simple texture underneath.

Another early method was shadow projection. In this process, the primitive casting the shadow is projected onto the plane containing the floor. Some of the math for this was developed by Jim Blinn, in the late 80s. It's a simple process, by today's standards, and works best for simple, static objects.
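
The core of it is a single 4 x 4 matrix. The sketch below (our own numpy derivation, not Blinn's code) builds the classic version from a plane equation and a light position, then squashes a vertex down onto the ground:

```python
import numpy as np

def planar_shadow_matrix(plane, light):
    """4x4 matrix that squashes geometry onto a plane, as seen from a light.
    plane = (a, b, c, d) for the plane ax + by + cz + d = 0;
    light = (x, y, z, w), with w = 1 for a point light, w = 0 for a directional one.
    M = (P.L) * I - outer(L, P), so every transformed point lies on the plane and on
    the line joining the light to the original point."""
    p = np.asarray(plane, dtype=float)
    l = np.asarray(light, dtype=float)
    return np.dot(p, l) * np.identity(4) - np.outer(l, p)

# Ground plane y = 0, point light above and to the side
m = planar_shadow_matrix((0, 1, 0, 0), (2, 5, 1, 1))
vertex = np.array([1.0, 2.0, 0.5, 1.0])      # a vertex of the shadow caster
shadow = m @ vertex
print(shadow[:3] / shadow[3])                 # the y component comes out as 0
```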

But with some optimization, shadow projection provided the first decent attempts at dynamic shadows, as seen in Interplay's 1999 title Kingpin: Life of Crime. As we can see below, only the animated characters (including rats!) have shadows, but it's better than simple blobs.

The biggest issues with them are: (a) the total opaqueness of the actual shadow and (b) the projection method relies on the shadow being cast onto a single, flat plane (i.e. the ground).

These problems could be resolved by applying a degree of transparency to the coloring of the projected primitive and doing multiple projections for each character, but the hardware capabilities of PCs in the late 90s just weren't up to the demands of the extra rendering.

The modern technology behind a shadow

A more accurate way to do shadows was proposed much earlier than this, all the way back in 1977. Whilst working at the University of Texas at Austin, Franklin Crow wrote a research paper in which he proposed several techniques that all involved the use of shadow volumes.

Generalized, the process determines which primitives are facing the light source, and the edges of these are extended onto a plane. So far, this is very much like shadow projection, but the key difference is that the shadow volume created is then used to check whether a pixel is inside or outside of the volume. From this information, shadows can now be cast onto all surfaces, not just the ground.

The technique was improved by Tim Heidmann, whilst working for Silicon Graphics in 1991, further still by Mark Kilgard in 1999, and for the method that we're going to look at, John Carmack at id Software in 2000 (although Carmack's method was independently discovered 2 years earlier by Bilodeau and Songy at Creative Labs, which resulted in Carmack tweaking his code to avoid lawsuit hassle).

The approach requires the frame to be rendered several times (known as multipass rendering – very demanding for the early 90s, but ubiquitous now) and something called a stencil buffer.

Unlike the frame and depth buffers, this isn't created by the 3D scene itself – instead, the buffer is an array of values, equal in dimensions (i.e. same x,y resolution) as the raster. The values stored are used to tell the rendering engine what to do for each pixel in the frame buffer.

The simplest use of the buffer is as a mask:

The shadow volume method goes something like this (a toy sketch of the counting step follows the list):

  • Render the scene into a frame buffer, but just use ambient lighting (also include any emission values if the pixel contains a light source)
  • Render the scene again but only for surfaces facing the camera (aka back-face culling). For each light source, calculate the shadow volumes (like the projection method) and check the depth of each frame pixel against the volume's dimensions. For those inside the shadow volume (i.e. the depth test has 'failed'), increase the value in the stencil buffer corresponding to that pixel.
  • Repeat the above, but with front-face culling enabled, and the stencil buffer entries decreased if they're in the volume.
  • Render the whole scene again, this time with all lighting enabled, and then blend the final frame and stencil buffers together.
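
To show the counting step on its own, here's a toy, CPU-side illustration (this is just the per-pixel logic – a real GPU does it with culling modes and stencil operations, not Python loops):

```python
def in_shadow(pixel_depth, volume_faces):
    """volume_faces: a list of (depth, facing) pairs for the shadow-volume polygons
    covering this pixel, with facing being 'front' or 'back' relative to the camera.
    Following the steps above, a face that fails the depth test (it lies behind the
    visible surface) bumps the stencil value up if it's front facing, and down if it's
    back facing. A non-zero result means the surface sits inside a shadow volume."""
    stencil = 0
    for depth, facing in volume_faces:
        if depth > pixel_depth:                  # depth test 'failed'
            stencil += 1 if facing == 'front' else -1
    return stencil != 0

# One shadow volume spanning depths 3.0 (front face) to 8.0 (back face):
print(in_shadow(5.0, [(3.0, 'front'), (8.0, 'back')]))   # True  - surface is inside it
print(in_shadow(2.0, [(3.0, 'front'), (8.0, 'back')]))   # False - surface is in front of it
```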

We can see this use of stencil buffers and shadow volumes (commonly called stencil shadows) in id Software's 2004 release Doom 3:

Notice how the path the character is walking on is still visible through the shadow? This is the first improvement over shadow projections – others include being able to properly account for the distance of the light source (resulting in fainter shadows) and to cast shadows onto any surface (including the character itself).

But the technique does have some serious drawbacks, the most notable of which is that the edges of the shadow are entirely dependent on the number of primitives used to make the object casting the shadow. This, and the fact that the multipass nature involves lots of read/writes to the local memory, can make the use of stencil shadows a little ugly and rather costly, in terms of performance.

There's also a limit to the number of shadow volumes that can be checked with the stencil buffer – this is because all graphics APIs allocate a relatively low number of bits to it (typically just 8). The performance cost of stencil shadows usually stops this problem from ever appearing though.

Lastly, there's the issue that the shadows themselves aren't remotely realistic. Why? Because all light sources, from lamps to fires, flashlights to the Sun, aren't single points in space – i.e. they emit light over an area. Even if one takes this to its simplest level, as shown below, real shadows rarely have a well defined, hard edge to them.

The darkest area of the shadow is called the umbra; the penumbra is always a lighter shadow, and the boundary between the two is often 'fuzzy' (because the light is emitted over an area, rather than from a single point). This can't be modelled very well using stencil buffers and volumes, as the shadows produced aren't stored in a way that they can be processed. Enter shadow mapping to the rescue!

The basic procedure was developed by Lance Williams in 1978 and it's relatively simple (a sketch of the final depth comparison follows the list):

  • For each light source, render the scene from the perspective of the light, creating a special depth texture (so no color, lighting, texturing, etc). The resolution of this buffer doesn't have to be the same as the final frame's, but higher is better.
  • Then render the scene from the camera's perspective, but once the frame has been rasterized, each pixel's position (in terms of x,y, and z) is transformed using a light source as the coordinate system's origin.
  • The depth of the transformed pixel is compared to the corresponding pixel in the stored depth texture: if it's greater (i.e. the pixel is further from the light than the surface recorded in the texture), the pixel is in shadow and doesn't get the full lighting procedure.
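
A bare-bones sketch of that final comparison might look like this (a toy setup with our own bias value – real implementations also have to worry about filtering and precision):

```python
import numpy as np

SHADOW_MAP_SIZE = 512
BIAS = 0.005      # small offset to stop surfaces shadowing themselves ('shadow acne')

def in_shadow(world_pos, light_view_proj, shadow_map):
    """Transform a world-space position into the light's clip space, then compare its
    depth against the value stored in the shadow map at that spot. Being further from
    the light than the stored surface (beyond the bias) means something is in the way."""
    p = light_view_proj @ np.append(world_pos, 1.0)
    p = p[:3] / p[3]                                   # perspective divide -> NDC, [-1, 1]
    u = int((p[0] * 0.5 + 0.5) * (SHADOW_MAP_SIZE - 1))
    v = int((p[1] * 0.5 + 0.5) * (SHADOW_MAP_SIZE - 1))
    if not (0 <= u < SHADOW_MAP_SIZE and 0 <= v < SHADOW_MAP_SIZE):
        return False                                   # outside the map: treat as lit
    depth_from_light = p[2] * 0.5 + 0.5                # NDC depth mapped to [0, 1]
    return depth_from_light - BIAS > shadow_map[v, u]

# Toy setup: the light's view-projection is just the identity, so NDC equals world
# coordinates, and the shadow map says the nearest surface sits at depth 0.4 everywhere.
light_view_proj = np.identity(4)
shadow_map = np.full((SHADOW_MAP_SIZE, SHADOW_MAP_SIZE), 0.4)
print(in_shadow(np.array([0.0, 0.0, 0.2]), light_view_proj, shadow_map))   # True: behind it
print(in_shadow(np.array([0.0, 0.0, -0.5]), light_view_proj, shadow_map))  # False: in front
```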

This is obviously another multipass procedure, but the last stage can be done using pixel shaders such that the depth check and subsequent lighting calculations are all rolled into the same pass. And because the whole shadowing process is independent of how primitives are used, it's much faster than using the stencil buffer and shadow volumes.

Unfortunately, the basic method described above generates all kinds of visual artifacts (such as perspective aliasing, shadow acne, 'peter panning'), most of which revolve around the resolution and bit size of the depth texture. All GPUs and graphics APIs have limits to such textures, so a whole raft of additional techniques have been created to resolve the problems.

One advantage of using a texture for the depth information, is that GPUs have the ability to sample and filter them very rapidly and via a number of ways. In 2005, Nvidia demonstrated a method to sample the texture so that some of the visual problems caused by standard shadow mapping would be resolved, and it also provided a degree of softness to the shadow's edges; the technique is known as percentage closer filtering.

Around the same time, Futuremark demonstrated the use of cascaded shadow maps (CSM) in 3DMark06, a technique where multiple depth textures, of different resolutions, are created for each light source. Higher resolution textures are used near the camera, with lower detailed textures employed at greater distances from the viewer. The result is a more seamless, distortion-free transition of shadows across a scene.
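
One common way of choosing where each cascade ends is the so-called 'practical' split scheme: a blend of logarithmic and uniform spacing along the camera's view depth (the weighting below is just an assumption – every engine tunes its own):

```python
import math

def cascade_splits(near, far, num_cascades, blend=0.75):
    """'Practical' split scheme for cascaded shadow maps: a weighted mix of logarithmic
    and uniform splits (blend = 1 is fully logarithmic). Returns the far distance of
    each cascade, measured from the camera."""
    splits = []
    for i in range(1, num_cascades + 1):
        f = i / num_cascades
        log_split = near * (far / near) ** f
        uni_split = near + (far - near) * f
        splits.append(blend * log_split + (1.0 - blend) * uni_split)
    return splits

print(cascade_splits(near=0.5, far=500.0, num_cascades=4))
```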

The technique was improved by Donnelly and Lauritzen in 2006 with their variance shadow mapping (VSM) routine, and by Intel in 2010 with their sample distribution shadow maps (SDSM).

Game developers often use a battery of shadowing techniques to improve the visuals, but shadow mapping as a whole rules the roost. However, it can only be applied to a small number of active light sources, as trying to model it to every single surface that reflects or emits light, would grind the frame rate to dust.

Fortunately, there is a neat technique that functions well with any object, giving the impression that the light reaching the object is reduced (because either the object itself or those around it are blocking some of it). The name for this feature is ambient occlusion and there are multiple versions of it. Some have been specifically developed by hardware vendors, for example, AMD created HDAO (high definition ambient occlusion) and Nvidia has HBAO+ (horizon based ambient occlusion).

Whatever version is used, it gets applied after the scene is fully rendered, so it's classed as a post-processing effect, and for each pixel the code essentially calculates how visible that pixel is in the scene (see more about how this is done here and here), by comparing the pixel's depth value with those of the surrounding pixels in the corresponding location in the depth buffer (which is, again, stored as a texture).
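
Stripped right down, the idea can be sketched like this (a very crude toy using only depth values and a fixed ring of offsets – real HDAO/HBAO-style implementations add normals, hemisphere sampling, and blurring on top):

```python
import numpy as np

def crude_ssao(depth, radius=3, strength=1.0):
    """Very crude screen-space ambient occlusion: for every pixel, look at a ring of
    neighbouring depth samples and count how many are closer to the camera. Lots of
    nearer neighbours means the pixel sits in a crevice, so it gets darkened."""
    h, w = depth.shape
    occlusion = np.zeros_like(depth)
    offsets = [(-radius, 0), (radius, 0), (0, -radius), (0, radius),
               (-radius, -radius), (radius, radius), (-radius, radius), (radius, -radius)]
    for dy, dx in offsets:
        shifted = np.roll(np.roll(depth, dy, axis=0), dx, axis=1)
        occlusion += (shifted < depth - 0.01).astype(float)    # neighbour is nearer
    occlusion = occlusion / len(offsets)
    return np.clip(1.0 - strength * occlusion, 0.0, 1.0)       # 1 = fully lit

# Toy depth buffer: a flat floor with a box sitting a unit closer to the camera
depth = np.full((64, 64), 10.0)
depth[20:44, 20:44] = 9.0
ao = crude_ssao(depth)
print(ao.min(), ao.max())          # the darkest values hug the edges of the box
```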

The sampling of the depth buffer and the subsequent calculation of the final pixel color play a significant role in the quality of the ambient occlusion; and just like shadow mapping, all versions of ambient occlusion require the programmer to tweak and adjust their code, on a case-by-case basis, to ensure the effect works correctly.

Done properly, though, and the impact of the visual effect is profound. In the image above, take a close look at the man's arms, the pineapples and bananas, and the surrounding grass and foliage. The changes in pixel color that the use of HBAO+ has produced are relatively minor, but all of the objects now look grounded (on the left, the man looks like he's floating above the soil).

Pick any of the recent games covered in this article, and their list of rendering techniques for handling light and shadow will be as long as this feature piece. And while not every latest 3D title will boast all of these, the fact that universal game engines, such as Unreal, offer them as options to be enabled, and toolkits from the likes of Nvidia provide code to be dropped right in, shows that they're not classed as highly specialized, cutting-edge methods – once the preserve of the very best programmers, almost anyone can utilize the technology.

We couldn't finish this article on lighting and shadowing in 3D rendering without talking about ray tracing. We've already covered the process in this series, but the current employment of the technology demands we accept low frame rates and an empty bank balance.

With next-generation consoles from Microsoft and Sony supporting it, though, within a few years its use will become another standard tool for developers around the world, looking to improve the visual quality of their games to cinematic standards. Just look at what Remedy managed with their latest title Control:

We've come a long way from fake shadows in textures and basic ambient lighting!

There's so much more to cover

In this article, we've tried to cover some of the fundamental math and techniques employed in 3D games to make them look as realistic as possible, looking at the technology behind the modelling of how light interacts with objects and materials. And this has been just a small taste of it all.

For example, we skipped things such as energy conservation in lighting, lens flare, bloom, high dynamic range rendering, radiance transfer, tonemapping, fogging, chromatic aberration, photon mapping, caustics, radiosity – the list goes on and on. It would take another 3 or 4 articles just to cover them as briefly as we have with this feature's content.

We're sure that you've got some great stories to tell about games that have amazed you with their visual tricks, so when you're blasting your way through Call of Mario: Deathduty Battleyard or similar, spare a moment to look at those graphics and marvel at what's going on behind the scenes to make those images. Yes, it's nothing more than math and electricity, but the results are an optical smorgasbord. Any questions: fire them our way, too! Until the next one.