Yeah, but it costs more than double the time to sample the textures (from 0.44 ms to 1.15 ms). Sure, that's 16 samples vs 4, but still.
On the other hand, it's nothing new: with an autoencoder you solve most of it, and a simple super-resolution (SR) stage at the output brings back more detail.
Of course, you need HW acceleration to make it efficient. And no, it shouldn't be done in shaders; it should be done in the TMUs: updated ones, with dedicated hardware to accelerate this new decompression/sampling.
Reading the paper, the "compression" takes a long time if you want the best quality, simply because part of the process is training an autoencoder based on a multilayer perceptron (an old type of neural network).
0- It's an experimental operation.
1- They are doing the operation partly in software (shaders) and partly in the tensor cores (hardware, which the 1650 does not have), but the tensor cores aren't hardware specialized for this operation either, and execution in the shaders is much slower in every case. This is why texture mapping units (TMUs) have existed since the first GPUs: dedicated hardware specialized in texture decompression, sampling and filtering.
2- In the case of "neural" decompression, it takes 16 samples instead of the classic 4. So even if it takes twice as long, it has higher quality. And if you reduce that to, say, 8 samples, the times may even out, but the quality will still be higher.
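A quick back-of-envelope check on those numbers, using the 0.44 ms / 1.15 ms timings quoted earlier and assuming (my assumption, not the paper's) that cost scales linearly with sample count:

```python
# Per-sample cost from the timings quoted in this thread; assumes cost
# scales linearly with sample count, which ignores any fixed overhead.
classic_ms, classic_samples = 0.44, 4
neural_ms, neural_samples = 1.15, 16

classic_per_sample = classic_ms / classic_samples  # 0.110 ms/sample
neural_per_sample = neural_ms / neural_samples     # ~0.072 ms/sample

# Hypothetical neural decompression cut down to 8 samples:
neural_8_ms = neural_per_sample * 8                # ~0.58 ms, close to 0.44

print(f"classic: {classic_per_sample:.3f} ms/sample")
print(f"neural:  {neural_per_sample:.3f} ms/sample")
print(f"neural at 8 samples: {neural_8_ms:.2f} ms vs classic {classic_ms} ms")
```

Per sample, the neural path is actually cheaper; the total is higher only because it takes 4x as many samples.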
If this technique is accelerated in the TMUs, a block of specialized fixed-function hardware, it should be much faster and more efficient in every respect, including energy.
This neural compression thing is somewhat "old". It can be done with an autoencoder, with some deconvolution layers at its output (like a super-resolution) to improve quality and detail. Getting that nailed down in hardware is part of what is expected.
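To make the idea concrete, here's a toy sketch of that dataflow (my own illustration, not the paper's actual network): an MLP encoder squeezes a texture tile into a small latent code, an MLP decoder reconstructs it, and an upsampling step stands in for the SR output. The weights are random here, so only the shapes and the compression ratio are meaningful; a real implementation would train them.

```python
import numpy as np

rng = np.random.default_rng(0)

tile = rng.random((8, 8, 3)).astype(np.float32)  # fake 8x8 RGB texture tile
x = tile.reshape(-1)                             # 192 values

# Encoder: 192 -> 16 latent values (the "compressed" representation)
W_enc = rng.standard_normal((16, 192)).astype(np.float32) * 0.1
latent = np.maximum(W_enc @ x, 0.0)              # ReLU

# Decoder MLP: 16 -> 192, i.e. what the sampler would run per fetch
W_dec = rng.standard_normal((192, 16)).astype(np.float32) * 0.1
recon = (W_dec @ latent).reshape(8, 8, 3)

# Stand-in for the SR/deconvolution stage: a learned upsampling layer
# would go here; nearest-neighbour repeat just shows the shape change.
sr = recon.repeat(2, axis=0).repeat(2, axis=1)   # (16, 16, 3)

print("compressed size:", latent.size, "vs original:", x.size)
```

The point is only the ratio: 16 latent values stored instead of 192, with the decoder run at sample time.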
It has to be in the TMUs, but new, modernized TMUs with the ability to accelerate this operation through specific fixed-function hardware units. Notice that in an entire GPU the TMUs are almost the only fixed-function units left; everything else has been programmable for a long time.
And that is faster and more efficient than doing it in the shaders, whether with CUDA, HLSL, GLSL, Cg, OpenCL, DirectML or whatever. Doing it purely in software takes time and capacity that should stay free for everything else.
Ever since texture compression has existed, textures have all shipped compressed, almost always lossy, and hardware has been adapted to accelerate sampling and decompression.
This can help with the amount of memory: you can compress the texture more and deliver the same quality as current methods while consuming less memory, instead of delivering more quality at the same consumption. It means the "Medium" texture-quality option could then cost what today's "Ultra" does in memory, or Ultra could cost what Medium does...
Developers will be able to use textures with higher compression, which occupy less memory while maintaining today's quality, so more can fit into the same amount of VRAM. What this does is save memory.
Suppose they compress at least 25% more. That's like taking a current graphics card from 8 to 10 GB. But it's even more: reading the paper, for a texture with a current compression method to match the (slightly lower) quality of this method, it needs to be 2.83 times bigger in memory. So you could say that in the best case (not all textures compress equally well) it would be like going from 8 GB to more than 16 GB.
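The arithmetic behind those two scenarios, using the 2.83x figure quoted from the paper and the conservative 25% example (both numbers from this thread, applied uniformly to all textures for simplicity):

```python
# Effective-VRAM equivalence under two compression-gain scenarios.
ratio = 2.83       # classic texture must be ~2.83x bigger for similar quality
vram_gb = 8        # example card

best_case_gb = vram_gb * ratio     # ~22.6 GB worth of classic textures
conservative_gb = vram_gb * 1.25   # the 25% example: 10 GB

print(f"best case:    {vram_gb} GB ~ {best_case_gb:.1f} GB classic-equivalent")
print(f"conservative: {vram_gb} GB ~ {conservative_gb:.0f} GB classic-equivalent")
```

So "more than 16 GB" is actually an understatement for the best case: 8 x 2.83 is about 22.6 GB.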
This compression method (note: it is specific to textures/materials as used in graphics) delivers slightly more than twice the visual quality on decompression, using the same amount of memory as the current compression methods.
Developers will be able to keep using 1K/2K/4K textures that consume less memory (if they want to maintain the same visual quality), or the same memory with more quality, and they can make that choice selectively depending on each texture, how it will be used and how it will appear on screen.