3DMark's new DXR Feature Test

neeyik

3DMark has just been updated for Port Royal DLC owners to include a DirectX Raytracing feature test.

3dmark-directx-raytracing-feature-test-screenshot-3.jpg


Description
The DirectX Raytracing feature test measures pure ray-tracing performance. Use this test to compare the performance of dedicated ray-tracing hardware in the latest graphics cards. In this feature test, there is a minimal amount of traditional rendering. The result of the test depends entirely on the ray-tracing performance of the graphics card.

Method
Instead of using traditional rendering, the whole scene is ray-traced and drawn in one pass. Camera rays are traced across the field of view with small random offsets to simulate a depth of field effect. The frame rate is determined by the time taken to trace and shade a set number of samples for each pixel, combine the results with previous samples and present the output on the screen. You can change the sample count to see how it affects performance and visual quality. The rendering resolution is 2560 × 1440.
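
As a rough illustration of the workload that implies (my arithmetic, not anything from UL's documentation), the number of primary camera rays launched per frame is simply the rendering resolution multiplied by the sample count:

```python
# Rough sketch of the per-frame ray budget at the test's fixed 2560 x 1440
# rendering resolution; the ray-count arithmetic is mine, not 3DMark's code.
WIDTH, HEIGHT = 2560, 1440

def primary_rays_per_frame(samples_per_pixel: int) -> int:
    """Each pixel fires this many jittered camera rays every frame."""
    return WIDTH * HEIGHT * samples_per_pixel

for spp in (12, 20):
    print(f"{spp:>2} spp -> {primary_rays_per_frame(spp):,} primary rays per frame")
# 12 spp -> 44,236,800 primary rays per frame
# 20 spp -> 73,728,000 primary rays per frame
```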

Rendering
There is a minimal amount of traditional rendering in this test. Instead of drawing a GBuffer or using a rasterizer at all, camera rays are traced in a compute shader with random offsets to simulate a depth of field effect. To keep light computation to a minimum, image-based lighting is used in addition to a baked light map.
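
For anyone who hasn't seen the technique before, the depth-of-field trick described above boils down to a thin-lens camera: each ray's origin is jittered on a small aperture while still being aimed at the same point on the focal plane, so anything away from that plane smears across samples. A minimal Python sketch of the idea (the names and the square aperture are my own simplifications, not the test's shader code):

```python
import math
import random

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def camera_ray(pixel_dir, aperture_radius, focus_distance):
    """pixel_dir: unit direction from the camera through the pixel centre."""
    # The point on the focal plane this pixel looks at stays fixed...
    focal_point = tuple(focus_distance * d for d in pixel_dir)
    # ...while the origin is jittered on the aperture (a square here for brevity;
    # a disc is the usual choice) - that offset is what blurs out-of-focus geometry.
    origin = (random.uniform(-aperture_radius, aperture_radius),
              random.uniform(-aperture_radius, aperture_radius),
              0.0)
    direction = normalize(tuple(f - o for f, o in zip(focal_point, origin)))
    return origin, direction
```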

Sampling
Camera rays are randomized with per-pixel offsets. There are 12 samples for each pixel when running the test with default settings. When the camera is stationary, samples are accumulated at a rate of 12 samples per pixel per frame. This improves the appearance of the depth of field effect from slightly grainy to smooth over the span of several frames. When the camera moves, a light motion blur is applied to reduce the noise that is a natural result of this ray-tracing technique.
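
The accumulation step is just a running average over the frames rendered since the camera stopped moving - something like this sketch (names mine, purely illustrative):

```python
def accumulate(running_mean, frame_mean, frames_so_far):
    """Fold one new frame's 12-sample average into the per-pixel running mean.

    running_mean:  colour accumulated since the camera stopped moving
    frame_mean:    this frame's freshly traced samples, already averaged
    frames_so_far: number of frames already folded into running_mean
    """
    return (running_mean * frames_so_far + frame_mean) / (frames_so_far + 1)
```

After several frames the per-pixel estimate settles down, which is why the grainy depth of field smooths out once you stop moving the camera.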

Implementation
The test measures the peak ray-traversal performance of the GPU. All other work, such as illumination and post-processing, is kept to a minimum. The ray tracing acceleration structure is built only once. As the scene is static and non-animated, there is no need to update the acceleration structure during the test. The test casts primary rays only. The rays are approximately sorted by direction on the CPU during the test initialization, which is possible because the sampling pattern in screen space is known beforehand. Generating the optimal ray order during initialization allows more coherent ray traversal for out-of-focus areas without the run-time cost of sorting.
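
UL hasn't published the details of that ordering, but the general idea of pre-sorting a known ray set by quantized direction looks roughly like this (the bucketing scheme and names are my own assumptions):

```python
# Hedged sketch: because the screen-space jitter pattern is fixed, every ray is
# known at init time and can be ordered once so that neighbouring rays in the
# dispatch point in similar directions and traverse similar parts of the BVH.
def direction_key(direction, bins=64):
    """Quantize a unit direction into a coarse bucket index."""
    x, y, z = direction
    qx = int((x * 0.5 + 0.5) * (bins - 1))
    qy = int((y * 0.5 + 0.5) * (bins - 1))
    qz = int((z * 0.5 + 0.5) * (bins - 1))
    return (qx * bins + qy) * bins + qz

def sort_rays_by_direction(rays):
    """rays: list of (origin, direction) tuples; done once at test initialization."""
    return sorted(rays, key=lambda ray: direction_key(ray[1]))
```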


With the default sampling rate, the frames are very noisy - performance is, unsurprisingly, quite low:

https://www.3dmark.com/3dm/52465357? - average frame rate = 21.66 fps

Changing the sample rate from the default 12 to 20 pretty much halves the frame rate, but does little to improve the noise (it's especially poor during motion). It is, of course, a feature test for comparing DXR performance - specifically the BVH traversal and ray-triangle intersection work accelerated by the ray tracing units in Turing, Ampere, and RDNA 2 - rather than denoising implementations.
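
For context, the per-ray work those units accelerate boils down to box tests while walking the BVH, plus a triangle test at the leaves - the latter being essentially the textbook Möller-Trumbore routine below, written in scalar Python purely as an illustration of the maths rather than anything the hardware actually runs:

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_triangle(origin, direction, v0, v1, v2, eps=1e-7):
    """Möller-Trumbore: return the hit distance t along the ray, or None on a miss."""
    edge1, edge2 = sub(v1, v0), sub(v2, v0)
    h = cross(direction, edge2)
    det = dot(edge1, h)
    if abs(det) < eps:
        return None                    # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = inv_det * dot(s, h)
    if u < 0.0 or u > 1.0:
        return None                    # misses on the first barycentric coordinate
    q = cross(s, edge1)
    v = inv_det * dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return None                    # misses on the second barycentric coordinate
    t = inv_det * dot(edge2, q)
    return t if t > eps else None      # only count hits in front of the ray origin
```

The RT units run this kind of test (and the BVH box tests feeding it) in fixed-function hardware, many millions of times per frame, which is why the feature test isolates them so cleanly.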
 
@Tyrchlis Thanks for the result from an RTX 3090.

Now, let's see: 56.78/21.66 = 2.62 times faster. Your 3090 has 82 RT cores, and my 2080 Super has 48; your GPU had an average core clock of 2,011 MHz, whereas mine was 1,949 MHz. So with no change of architecture, if the test really does hit 'RT performance' and nothing else, your 3090 should be (82/48) × (2011/1949) = 1.76 times faster.

Your result is almost 50% better than that scaling alone would predict, which is evidence that Ampere's ray-triangle intersection checking rate really is double Turing's. It's not a full 100% better because only the intersection rate has improved - BVH traversal still seems to run at the same rate, although the RT core is now fully independent of the CUDA cores.
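
If anyone wants to check the arithmetic, here it is as a snippet (the fps figures are simply my run linked above and Tyrchlis's 3090 result):

```python
# Back-of-envelope scaling check: RT core count x average clock vs measured fps.
rt_cores_3090, rt_cores_2080s = 82, 48
clock_3090, clock_2080s = 2011, 1949            # average core clocks, MHz
fps_3090, fps_2080s = 56.78, 21.66              # DXR feature test, default settings

expected = (rt_cores_3090 / rt_cores_2080s) * (clock_3090 / clock_2080s)
observed = fps_3090 / fps_2080s

print(f"expected from cores x clocks: {expected:.2f}x")             # ~1.76x
print(f"observed:                     {observed:.2f}x")             # ~2.62x
print(f"observed vs expected:         {observed / expected:.2f}x")  # ~1.49x
```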
 

Okay, WOW, that actually taught me a bit and gave me the mental picture I needed to follow your reasoning. It also explains some of my own testing in other supposedly "pure" ray tracing benchmarks, which are really nothing of the sort - pretty much every scene, even the 3DMark DirectX Raytracing Feature Test, has some rasterized elements. Getting a solid handle on the pure ray tracing performance differential between Turing and Ampere had eluded me until your explanation.

I skipped Turing because, while I wanted ray tracing badly, I knew from previous big GPU feature launches that the first generation proves the tech but demands performance compromises; the second generation is where performance usually takes off. My 1080 Ti did everything I really needed of it rasterization-wise, even up to today. I bought the 3090 for two reasons: the gaming side needs the fastest ray tracing I can lay my hands on, and the dev side needs gobs of memory for texture edits while building mods, as well as rigging 3D models and other VRAM-intensive creative tasks.

In many more ways than I expected before the purchase, the 3090 has proven to nail my needs exactly. The ways it improved a 4-year-old computer are too numerous to list, but the biggest and most unexpected benefits came from the places I LEAST expected!

The logic here comes back to my original demand for ultra-fast ray tracing in games. The 3090 is universally hailed as a "4K or higher only" GPU, which is a ridiculous, short-sighted assertion when modern ray-traced game engines can bring even a 3090 to its knees at 4K. Watch Dogs: Legion is a GREAT example: it runs super smooth at 1440p WITH the highest levels of ray tracing, yet bogs down at 4K with the same RT settings, even with DLSS chugging away at its best.

My 1440p @ 165 Hz monitor just got a new breath of life. Ray-traced games run at nearly its full refresh rate and PROPERLY within the monitor's G-Sync range from top to bottom. Suddenly, given that ray tracing was my goal for gaming, I no longer need to consider the expensive monitor upgrade I had planned...

Secondly, the planned system upgrade to eliminate CPU and PCIe bottlenecks is likewise rendered irrelevant: the roughly 10% average loss I measured from those combined bottlenecks doesn't noticeably impact gaming when frame rates are already roaming between 144 and 165 fps. A 10% loss is irrelevant - I would only jump up to 150-175-ish in the same scenarios, and it's not worth it if that puts me just above my monitor's G-Sync range.

My 4-year-old i7-7700K + Z270 system just got a new lease on life, possibly for another 3 years. The fact is, the 3090 doubled the performance of the 1080 Ti it replaced across the board, AND gave me ray tracing at the desired levels. ALL of my planned upgrades over the next 3 months are rendered moot. Meh, I'll spend the money on stuff like a flight stick, pedals, a VR set, that kind of stuff now...
 