DirectX just made shader execution reordering official, boosting ray tracing performance in games

Daniel Sims

In context: Shader Execution Reordering significantly reduces the performance cost of ray tracing and path tracing in games such as Alan Wake 2 and Indiana Jones and the Great Circle on recent Nvidia graphics cards. The latest version of DirectX makes these efficiency gains accessible to all developers, facilitating the implementation of ray tracing in future titles.

The recently released Shader Model 6.9 brings Microsoft's official take on Shader Execution Reordering (SER) out of preview. Nvidia and a few game developers have already benefited from the feature, and in the latest test it nearly doubles frame rates on Intel Arc Battlemage GPUs.

One of the biggest performance bottlenecks in ray tracing is divergence, which occurs when rays bounce into different kinds of objects, forcing shader threads and branches to access different types of information. This often causes the GPU to run the threads sequentially rather than in parallel, significantly reducing performance.

Since the process occurs on shaders rather than RT cores, simply adding or enhancing RT cores does not address the issue. SER reduces divergence by regrouping threads so that those doing similar work execute together in parallel.
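To make the divergence problem concrete, here is a toy Python model (not real GPU code, and not the DirectX API). It assumes a hypothetical 32-thread group that must run once for every distinct hit shader its threads need, and it uses a plain sort as a stand-in for the regrouping that SER performs. The material count, ray count, and function names are all illustrative assumptions.

```python
import random

WARP_SIZE = 32  # assumed number of threads that execute in lockstep


def warp_passes(shader_ids, warp_size=WARP_SIZE):
    """Count serialized passes: each group runs once per distinct shader in it."""
    total = 0
    for start in range(0, len(shader_ids), warp_size):
        warp = shader_ids[start:start + warp_size]
        total += len(set(warp))
    return total


def reorder(shader_ids):
    """Toy stand-in for SER: place threads needing the same shader side by side."""
    return sorted(shader_ids)


rng = random.Random(0)  # fixed seed so the run is repeatable
hits = [rng.randrange(8) for _ in range(1024)]  # 8 hypothetical materials, 1024 rays

divergent = warp_passes(hits)
coherent = warp_passes(reorder(hits))
print(f"passes without reordering: {divergent}")
print(f"passes with reordering:    {coherent}")
```

In this model the reordered case needs close to one pass per group, while the unordered case needs close to one pass per material per group. Real hardware also has to pay for the reordering itself, which is one reason the gains vary from scene to scene.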

Microsoft claims that the feature can increase performance in ray tracing, and especially path tracing, by up to 100%, but the impact varies across games depending on the amount of detail in each scene.

For example, according to Khronos, SER improved path tracing performance by approximately 24% in Indiana Jones and the Great Circle, 39% in Alan Wake 2 (with help from opacity micromaps), and a staggering 370% in Black Myth: Wukong. A freely available sample program illustrates the improvement, which can reach 40% on an Nvidia RTX 4090 and 90% on Battlemage graphics cards.
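As a rough guide to what these percentages mean in practice, the sketch below converts an uplift into frame rate and frame time. It assumes each figure is a frame-rate increase over the same baseline, which the sources may not measure in exactly that way, and the 30 fps baseline is purely hypothetical.

```python
def scaled_fps(base_fps, uplift_percent):
    """Frame rate after applying a percentage uplift to the baseline."""
    return base_fps * (1 + uplift_percent / 100)


def frame_time_ms(fps):
    """Milliseconds spent on each frame at a given frame rate."""
    return 1000 / fps


BASE_FPS = 30.0  # hypothetical baseline, not a measured value

cases = [
    ("RTX 4090, sample program", 40),
    ("Battlemage, sample program", 90),
    ("Microsoft's stated upper bound", 100),
]

for label, pct in cases:
    fps = scaled_fps(BASE_FPS, pct)
    print(f"{label}: +{pct}% -> {fps:.1f} fps, {frame_time_ms(fps):.1f} ms per frame")
```

Under that assumption a 100% uplift doubles the frame rate and halves the frame time, which is why a gain of about 90% is described as nearly doubling performance.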

Nvidia introduced SER with the RTX 4000 series GPUs in 2022 and doubled the feature's efficiency with RTX 5000 GPUs. Microsoft previewed SER's standardized implementation in DirectX last year but has now made it a mandatory component of Shader Model 6.9.

This will make implementing the feature easier for developers and extend its benefits to Intel Arc Battlemage. AMD Radeon RX GPUs currently do not include hardware SER support, but future generations likely will.

Shader Model 6.9 also formalizes support for opacity micromaps, a feature that reduces the overhead from ray tracing when rendering transparent objects. All RTX GPUs support the feature, and standardization across DirectX increases the likelihood that future Intel and AMD hardware will as well.

 
Oh AMD, when will you catch up?

It's pretty telling that Intel's second attempt at a GPU is supported but AMD's current gen isn't.

This has been the story for years now, people just didn't want to acknowledge that Intel was pretty much on-par with AMD dGPUs from Alchemist (worse drivers at launch, fine, but competitive raster and better RT, plus, after about a year the drivers were 99% there). Intel was also competitive in iGPUs well before Panther Lake. Strix Halo is cool and all, but way too expensive to actually make a meaningful dent in marketshare for AMD.

Speaking of which where TF is big battlemage man?

Haven't you heard? The stuff we actually want is going Pro/AI only. Another sad reality of the industry...
 
The latest version of DirectX makes these efficiency gains accessible to all developers, facilitating the implementation of ray tracing in future titles.
So this means that this improvement is dependent on developer usage, and can't be implemented through mods or what have you on the user side, right? Shame for current and older games from the last couple of years that would benefit greatly from this.
 
So this means that this improvement is dependent on developer usage, and can't be implemented through mods or what have you on the user side, right? Shame for current and older games from the last couple of years that would benefit greatly from this.
I mean, I've learned never to count modders out. If there is a nerdy and determined enough fan of a game, anything is possible.

But, yeah, probably only new games - or at least actively developed/supported games - will ever see a benefit from this.
 
Oh AMD, when will you catch up?
Nvidia's own 20 and 30 series also lack SER support.
Only 40 and 50 series that are essentially the same thing support it.

And considering these technologies originated from Nvidia and are now simply standardized it's surprising AMD support is as good as it is.
It's pretty telling that Intel's second attempt at a GPU is supported but AMD's current gen isn't.
Intel only has to support two generations of a very limited number of dies.
Let's see how they do on feature support after releasing 4-5 generations in a row.
That's when the cracks start to show as evidenced even by Nvidia themselves.
Intel was pretty much on-par with AMD dGPUs from Alchemist (worse drivers at launch, fine, but competitive raster and better RT, plus, after about a year the drivers were 99% there).
It took more than a year for the driver and their control panel to be sorted out. For most people Alchemist was always DOA and horribly late to the market even if they later fixed the drivers.

Battlemage was also late and has worse efficiency given the die size and power consumption. Given Intel's deal with Nvidia I fail to see why they would even bother keeping their GPU's alive past 2029.
Strix Halo is cool and all, but way too expensive to actually make a meaningful dent in marketshare for AMD.
And the full config Panther Lake that costs nearly as much as Strix Halo is much better?
The anemic variants, still sold for over 1000, don't come anywhere close in performance.
Efficiency yes, but that's expected given the more advanced process node.
So this means that this improvement is dependent on developer usage
This means it's DOA. Most developers lack competence, resources and time to implement something like this. Hence why it will never be implemented in 99% of games.

Look back at what happened with VRS, Sampler Feedback etc. All great things in theory, but never used.
 
SER and Opacity Micro Maps really increase Path Tracing performance. If you've been using path tracing on Cyberpunk since 2023, you've seen the performance improvements over the years as these two features were implemented.
 
hehehe

Oh AMD, when will you catch up?

It's pretty telling that Intel's second attempt at a GPU is supported but AMD's current gen isn't.

Speaking of which where TF is big battlemage man?

When will AMD catch up to NVIDIA's technology, backed by companies that are invested in by NVIDIA? Gee, I wonder when? When you can control the market by supporting some games or engines, you get to decide what is standard and what is not. Intel's cards have absolutely not caught up with AMD. Let me know when someone can upgrade a 9700xt to Intel. Thanks.
 
And considering these technologies originated from Nvidia and are now simply standardized it's surprising AMD support is as good as it is.

Tbf, that's been the history of DX features: NVIDIA creates Pixel Shaders, AMD makes its own implementation, the two get combined and put into the next DX release. Rinse and repeat.
 
Nvidia's own 20 and 30 series also lack SER support.
And Ford and GM cars built generations ago lack power steering and anti-lock brakes. How is either of our statements relevant to today's products?

This means it's DOA. Most developers lack competence, resources and time to implement something like this. Hence why it will never be implemented in 99% of games.
I could be wrong, but most game developers never work with code at this level. It's the developers of the game engines who will implement this, making it in most cases available to games with a simple recompile.
 
hehehe

Oh AMD, when will you catch up?

It's pretty telling that Intel's second attempt at a GPU is supported but AMD's current gen isn't.

Speaking of which where TF is big battlemage man?

Except they do seem to support it.

"RDNA 4 introduces new shader reordering similar to NVIDIA's Shader Execution Reordering (SER) for the GeForce RTX 40 Series. This offers better memory management by allowing shader requests to be carried out more efficiently and not in some strict order. AMD notes that this will improve RDNA 4's performance in many workloads, not just ray-tracing.

Read more: https://www.tweaktown.com/news/1035...nt-improvements-as-outlined-by-amd/index.html"
 
This has been the story for years now, people just didn't want to acknowledge that Intel was pretty much on-par with AMD dGPUs from Alchemist (worse drivers at launch, fine, but competitive raster and better RT, plus, after about a year the drivers were 99% there). Intel was also competitive in iGPUs well before Panther Lake. Strix Halo is cool and all, but way too expensive to actually make a meaningful dent in marketshare for AMD.



Haven't you heard? The stuff we actually want is going Pro/AI only. Another sad reality of the industry...

Hyperbole?

Intel Battlemage B580 was using 50% more transistors and a newer, denser manufacturing process than AMD Navi33 7600XT, and was only just comparable performance-wise. Not forgetting RDNA 3 released 18 months earlier than Battlemage B580, with RDNA 4 released 3 months after Battlemage.

So Intel were at first behind and then later even more behind. And so long as you waited a year until after their competition released their next gen hardware, when they were more behind, their drivers were finally in order such that it could compete with their competitors prior gen and only be regular degree of behind to it.

Hardly worth crowing about.

And this comes from someone (me) who thought Intel actually did pretty well and wished they had continued developing their dGPU arch.
 
It took more than a year for the driver and their control panel to be sorted out. For most people Alchemist was always DOA and horribly late to the market even if they later fixed the drivers.

Battlemage was also late and has worse efficiency given the die size and power consumption. Given Intel's deal with Nvidia I fail to see why they would even bother keeping their GPU's alive past 2029.
Hyperbole?

Intel Battlemage B580 was using 50% more transistors and a newer, denser manufacturing process than AMD Navi33 7600XT, and was only just comparable performance-wise. Not forgetting RDNA 3 released 18 months earlier than Battlemage B580, with RDNA 4 released 3 months after Battlemage.

So Intel were at first behind and then later even more behind. And so long as you waited a year until after their competition released their next gen hardware, when they were more behind, their drivers were finally in order such that it could compete with their competitors prior gen and only be regular degree of behind to it.

Hardly worth crowing about.

And this comes from someone (me) who thought Intel actually did pretty well and wished they had continued developing their dGPU arch.

First launches are typically riddled with bugs. This is just the reality of any major foray into any tech sector. Having 16GB of high-bandwidth VRAM helped the value-prop though. As for Battlemage, it was also still competitive for the price and VRAM, but it being so late can't be ignored. So, there is little doubt here that Intel are planning on shifting gaming to iGPU. Why would they want to stay in the contentious gaming dGPU market when the more lucrative Pro/AI markets are available? Either way, Intel has been a force to reckon with in GPU for years, just not in a way most people are familiar with.

As for Intel's dealings with nVidia (as of the beginning of March 2026, no product announcements, no rumors even, just a press release... woohoo...), they're both aware that since they can't merge, they are better off continuing to develop what they're missing. Hence the rumors of nVidia's imminent APU with ~5070 mobile CUDA cores + ARM CPU. No doubt, Intel and nVidia want to shed their dependence on the other as soon as they can. That's also why AMD launched Strix Halo to begin with, although it was unfortunately priced WAY too high to capture any meaningful marketshare.

And the full config Panther Lake that costs nearly as much as Strix Halo is much better?
The anemic variants, still sold for over 1000, don't come anywhere close in performance.
Efficiency yes, but that's expected given the more advanced process node.

1. While these "anemic variants" exist, they haven't captured anybody's interest around Panther Lake. If the SOC doesn't have the number 8 in the model #, nobody is interested. Moving on... We also don't know how much these laptops are going to end up costing when the dust settles, thanks to the component shortage and the fact that these models are just now showing up on the market. What we do know, however, is that the "full config" (ie 358H) B390 models are currently significantly cheaper than 8060S models. That's already the case from the handful of models that are already available.

As for the 388H, who is going to buy 388H? While AMD prefers to segment their products based on core count and cache, Intel prefers the opposite, to mostly segment them with slight clockspeed differences. This is... not great, but that's been Intel's MO for years. 388H, like all of their top chips, is aimed at those with more money than sense. And like Intel's previous top chips, the only advantage 388H has over 358H is slightly (~9%) higher max turbo clocks, so the obsession with 388H somehow being the legitimate comparison point is frankly unwarranted. Case in point: there were 6-10 258V laptop SKUs for every 1 288V SKU that was produced, and it'll likely be the same for 358H vs 388H. At the end of the day, when it comes to Panther Lake, the smart money is going for 358H (B390) or 338H (B370). Nothing else.

2. 18A is competitive with TSMC's 3nm node (it likely loses to TSMC 2nm); about a single node advantage over TSMC's 4nm, so let's be generous and assume a 30% performance boost (assuming the same power consumption) for Intel's single node advantage for the sake of your argument. That alone does not account for the entirety of both the 40-70% iGPU performance boost over 140V/890m AND the significantly lower power consumption (unless, somehow, 18A with PowerVIA is just that much of a game-changer; I doubt those two factors would be wholly responsible for this). There were significant architectural improvements here too, and it doesn't look like AMD will have any competitive response to it unless they immediately go straight to RDNA 5 for their iGPUs in 2027, on a better node at that. Either N2 or A16 at this point.
 
"No SER hardware support on AMD"

Don't forget the Linux Radeon driver community is nuts: they have given us RT on non-RT hardware and FSR4 on hardware not supported by AMD, and their drivers are better than the official branch most of the time. No doubt RX can still do a lot of stuff by brute force.
 
So this means that this improvement is dependent on developer usage, and can't be implemented through mods or what have you on the user side, right? Shame for current and older games from the last couple of years that would benefit greatly from this.
But, yeah, probably only new games - or at least actively developed/supported games - will ever see a benefit from this.
This means it's DOA. Most developers lack competence, resources and time to implement something like this. Hence why it will never be implemented in 99% of games.

Any game made for RTX 40xx and later may or may not benefit from this.
nVidia already used OMM and SER on cards that were able to run them in HW.
This only made them an official part of Microsoft's definition of Shader Model 6.9.

TL;DR
No gain for rtx 20xx and rtx 30xx as they do not have HW support.
Minimal to no gain for rtx 40xx as it was already used in non-strict DXR games.
 
Hence the rumors of nVidia's imminent APU with ~5070 mobile CUDA cores + ARM CPU. No doubt, Intel and nVidia want to shed their dependence on the other as soon as they can. That's also why AMD launched Strix Halo to begin with, although it was unfortunately priced WAY too high to capture any meaningful marketshare.
You sound like those nVidia crybabies shouting at AMD to lower its prices in the vain hope that nVidia will sell its toys cheaper.

Notebooks with Strix Halo were offered at really high prices (before this madness)
$1200 for 32GB
$1500 for 64GB
$2000 for 128 GB

Compare that to price of nVidia N1/N1X

As for the 388H, who is going to buy 388H?
It may or may not be cheaper than a similar setup on Strix Halo.
Is it worth it?

2. 18A is competitive with TSMC's 3nm node (it likely loses to TSMC 2nm); about a single node advantage over TSMC's 4nm, so let's be generous and assume a 30% performance boost (assuming the same power consumption) for Intel's single node advantage for the sake of your argument. That alone does not account for the entirety of both the 40-70% iGPU performance boost over 140V/890m AND the significantly lower power consumption (unless, somehow, 18A with PowerVIA is just that much of a game-changer; I doubt those two factors would be wholly responsible for this). There were significant architectural improvements here too, and it doesn't look like AMD will have any competitive response to it unless they immediately go straight to RDNA 5 for their iGPUs in 2027, on a better node at that. Either N2 or A16 at this point.

You may see the comparison of notebooks based on:
- Strix Point
- Strix Halo
- Lunar Lake
- Arrow Lake
- Panther Lake

TL;DR - It took a lot of effort to find tests where Panther Lake excels over the rest.
Cougar Cove may or may not give higher performance than a Zen 5 core, but its temperature goes higher and it suffers from thermal throttling.
 