Explainer: What Are Tensor Cores?

neeyik

Posts: 1,285   +1,336
Staff member
  • Thread Starter
  • #26
And you are un-informed... if you don't think directML can do what DLSS is attempting to do.
Well, I've been doing a little bit of exploratory coding with DirectML for a month now, so I'm well aware that it's perfectly possible to do temporal upscaling via methods other than DLSS. At the moment, since I can't use TensorFlow through DML on a GeForce RTX (until Nvidia sort it out in the drivers), I can't judge the relative speed-up that the tensor cores offer over doing the scaling via the CUDA cores (although my bigger problem is trying to stop Nsight from crashing whilst trying to profile the GPU when running the code).
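For anyone wondering what "temporal upscaling" boils down to, here's a minimal, hypothetical sketch in NumPy (no DirectML or DLSS involved): upscale the current low-resolution frame and blend it into an accumulated history. A real upscaler would also reproject the history using motion vectors and choose the blend weight far more cleverly — that's the part DLSS's network learns.

```python
import numpy as np

def upscale_nearest(frame, factor):
    """Nearest-neighbour upscale of a 2D luminance frame."""
    return np.repeat(np.repeat(frame, factor, axis=0), factor, axis=1)

def temporal_accumulate(history, current_lr, factor, alpha=0.1):
    """Blend the upscaled current frame into the accumulated history.
    A real temporal upscaler would reproject `history` with motion
    vectors and adapt `alpha` per pixel; this fixed blend is only a
    toy illustration."""
    upscaled = upscale_nearest(current_lr, factor)
    return (1.0 - alpha) * history + alpha * upscaled

# Feed the same 2x2 frame repeatedly: the 4x4 history converges
# toward the upscaled target over successive frames.
lr = np.array([[0.0, 1.0], [1.0, 0.0]])
hist = np.zeros((4, 4))
for _ in range(50):
    hist = temporal_accumulate(hist, lr, factor=2)
```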

DirectML isn't automatically 'better' than DLSS, because it's just not the same thing - the former is an API, whereas the latter is a very specific compute routine. If one knew the exact neural network used in DLSS 2.0, then it would be possible to run it on any DX12 graphics card, but obviously Nvidia are never going to release that information. Even if one did know it, the performance wouldn't be as good, because the tensor operations would be done on the FP32 SIMD units that all GPUs have, rather than on dedicated units.
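To illustrate that point: a network layer is just multiply-accumulate arithmetic, so any FP32 ALU can evaluate it; dedicated tensor units simply do many of those multiply-accumulates per clock at reduced precision. A toy, hypothetical layer in NumPy (nothing to do with the actual DLSS network):

```python
import numpy as np

# One dense layer of a hypothetical network: y = relu(W @ x + b).
# The same arithmetic can run on plain FP32 SIMD units or, at half
# precision, on dedicated matrix units - the maths doesn't change,
# only the throughput and rounding error do.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)).astype(np.float32)
x = rng.standard_normal(16).astype(np.float32)
b = np.zeros(8, dtype=np.float32)

y_fp32 = np.maximum(W @ x + b, 0.0)  # FP32 SIMD path

# Reduced-precision path, as tensor-core-style hardware would use.
y_fp16 = np.maximum(
    W.astype(np.float16) @ x.astype(np.float16) + b.astype(np.float16), 0.0
)

# For well-scaled data the half-precision result tracks FP32 closely.
err = float(np.max(np.abs(y_fp32 - y_fp16.astype(np.float32))))
```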

Unfortunately, unlike RT cores, which are automatically utilised in any DXR workload that involves acceleration structures (the API leaves it entirely to the GPU and its drivers), use of the tensor cores requires a flag to be enabled (and the data to be in a set format), and that's not part of DirectML. Not yet, at least.
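As a rough illustration of the data-format side of that (a hypothetical helper, not the DirectML API): packing the same NCHW tensor as FP16 rather than FP32 halves the buffer size, which is one consequence of the dedicated units wanting data in a set, reduced-precision format.

```python
import math

def packed_tensor_bytes(sizes, element_bytes):
    """Bytes needed for a densely packed tensor buffer (hypothetical
    helper; real DirectML descriptors also carry strides and flags)."""
    return math.prod(sizes) * element_bytes

nchw = (1, 3, 1080, 1920)              # one RGB 1080p frame as a tensor
fp32_size = packed_tensor_bytes(nchw, 4)
fp16_size = packed_tensor_bytes(nchw, 2)
```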
 

m3tavision

Posts: 549   +318
Yeah. But don't forget that DirectML is evolving.
I suggest you take a long weekend and read up on the latest DirectML implementations. There is a reason Microsoft announced DX12 Ultimate, then patched their OS for Hardware-Accelerated GPU Scheduling. (And you do know that nearly every Series X game will feature DirectML in the future.)

And if you've dabbled with DirectML and know a little about "DLSS", then you do know that "DLSS" is nothing other than nVidia TRYING to utilise left-over bloat from Turing's datacenter/big-business architectural design. The tensor cores were not designed for gaming; it's just that nVidia is trying to get developers to use them, or repurpose them, for some minute aspect of gaming, then make a big deal about it. (See RTX.)

Which is utterly pointless, when nVidia is going to have to support DirectML anyway... which is more robust.


Understand, I am not knocking nVidia's tensor farms, but they really are wasted space for gamers' dollars.
 