FPGA chip shown to be over 50 times more efficient than a Ryzen 4900H

Daniel Sims

WTF?! Many people have probably heard of FPGAs through their use in legacy hardware emulation. A recent FPGA demonstration, however, managed to run a simple game using ray tracing – something typically associated with the most advanced graphics processors.

A pair of new workflow tools from two developers has allowed a modest FPGA chip to achieve stunning efficiency gains over a conventional x86 processor. The results could open new paths for energy-efficient operations across several industries.

The demonstration involved a game depicting little more than a shiny ball bouncing across a checkerboard surface. However, the game used real-time ray tracing, something no one would expect a medium-sized FPGA chip to handle. Furthermore, the FPGA ran the game using far less energy than a much more powerful AMD laptop CPU.
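For readers wondering what "real-time ray tracing" of this scene involves, here is a minimal C++ sketch of a per-pixel trace for a mirror ball over a checkerboard floor. It is not the authors' tr.cpp. The function names, the greyscale output, the floor at y = 0 and the single fixed reflection are all illustrative assumptions. The code deliberately avoids recursion and heap allocation, which is generally the kind of code that C-to-hardware tools need.

```cpp
#include <cassert>
#include <cmath>

// Minimal 3D vector type for the sketch.
struct Vec3 { float x, y, z; };

Vec3 add(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Distance along a unit-length ray to the nearest sphere hit, or -1 on a miss.
float hit_sphere(Vec3 origin, Vec3 dir, Vec3 center, float radius) {
    assert(radius > 0.0f);
    Vec3 oc = sub(origin, center);
    float b = dot(oc, dir);                  // half-b form, valid because |dir| == 1
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - c;
    if (disc < 0.0f) return -1.0f;           // ray misses the sphere
    float t = -b - std::sqrt(disc);          // nearer of the two roots
    return t > 0.0f ? t : -1.0f;
}

// Grey level seen along a ray that ignores the ball: a checkerboard floor at
// y == 0 (camera assumed above it), otherwise a flat sky value.
int floor_or_sky(Vec3 origin, Vec3 dir) {
    if (dir.y >= 0.0f) return 100;           // heading up or level: sky
    float t = -origin.y / dir.y;             // where the ray crosses y == 0
    Vec3 p = add(origin, scale(dir, t));
    int cx = static_cast<int>(std::floor(p.x));
    int cz = static_cast<int>(std::floor(p.z));
    return ((cx + cz) & 1) == 0 ? 255 : 0;   // alternate white and black tiles
}

// Grey level (0..255) for one primary ray. The ball acts as a mirror that
// returns half the brightness of whatever its single reflected ray sees.
int trace(Vec3 origin, Vec3 dir, Vec3 ball, float radius) {
    float ts = hit_sphere(origin, dir, ball, radius);
    float tf = dir.y < 0.0f ? -origin.y / dir.y : -1.0f;  // floor distance, if any
    if (ts > 0.0f && (tf < 0.0f || ts < tf)) {
        Vec3 p = add(origin, scale(dir, ts));               // hit point on the ball
        Vec3 n = scale(sub(p, ball), 1.0f / radius);        // outward unit normal
        Vec3 r = sub(dir, scale(n, 2.0f * dot(dir, n)));    // mirror reflection
        return floor_or_sky(p, r) / 2;
    }
    return floor_or_sky(origin, dir);
}
```

A renderer would call `trace` once per pixel with a ray through that pixel. Every pixel is independent, and that independence lets the same function be spread across CPU threads or pipelined in logic.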

Developers Victor Suarez Rovere and Julian Kemmerer built the demo for an Artix 7 100T in C, expressing the code directly to the circuit using their tools – CflexHDL and PipelineC. Then, they compiled the same demo for a Ryzen 9 4900H, running entirely on the CPU without using its integrated graphics. Both chips ran the game at around 60 frames per second in 1080p, but with drastically different power and thermal profiles.

The Artix – built on a 28nm process – ran at 148MHz with about 100,000 logic elements. The Ryzen, in comparison, is an 8-core, 16-thread 7nm CPU. The developers ran all those threads at around 4.2GHz, close to the processor's 4.4GHz maximum boost clock. Rovere and Kemmerer estimate that the Artix has about one-fifteenth as many transistors as the Ryzen.
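The 148MHz figure is likely not arbitrary. It matches, after rounding, the standard 148.5MHz pixel clock for 1080p at 60Hz. Suppose the circuit generates pixels in step with the video output rather than into a frame buffer, which is an assumption and not something the article states. Then it finishes one pixel per clock cycle. The C++ below does that arithmetic and adds a rough per-pixel cycle budget for the Ryzen, using the article's 16 threads at about 4.2GHz. The function and constant names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Pixels per second for a given raster size and refresh rate.
constexpr std::uint64_t pixels_per_second(std::uint64_t width, std::uint64_t height,
                                          std::uint64_t fps) {
    return width * height * fps;
}

// Standard 1080p60 timing: 1920x1080 visible pixels inside a 2200x1125 raster
// that also contains the horizontal and vertical blanking intervals.
constexpr std::uint64_t kPixelClockHz = pixels_per_second(2200, 1125, 60);
constexpr std::uint64_t kActivePixelsPerSecond = pixels_per_second(1920, 1080, 60);

// Rough CPU budget per visible pixel: thread-cycles per second divided by visible
// pixels per second. This overstates the real budget, since SMT threads share
// cores and memory stalls and display overhead are ignored.
constexpr double cpu_cycles_per_pixel(double threads, double clock_hz) {
    return threads * clock_hz / static_cast<double>(kActivePixelsPerSecond);
}

static_assert(kPixelClockHz == 148'500'000, "1080p60 pixel clock is 148.5 MHz");
```

With these inputs, `cpu_cycles_per_pixel(16.0, 4.2e9)` comes to about 540 thread-cycles per visible pixel. The FPGA, under the assumption above, spends a single 148.5MHz cycle per pixel, with all the per-pixel work spread across pipelined logic.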

Despite the deficit, the FPGA part ran the demo using only 660mW and "stayed barely warm" without any active cooling. The x86 chip, however, consumed 33W – 50 times more power – and reached 88°C with its fans at maximum to achieve the same performance.
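Watts measure power, not energy. Both chips render the same ~60 frames per second, so the 50x power ratio is also a 50x ratio in energy spent per frame. Below is a small C++ sketch of that arithmetic using the reported figures. The function names are illustrative.

```cpp
#include <cassert>

// Energy per rendered frame in joules: average power (W = J/s) divided by frames per second.
double joules_per_frame(double watts, double fps) {
    assert(watts > 0.0 && fps > 0.0);
    return watts / fps;
}

// How many times more energy device A spends per frame than device B.
double energy_ratio(double watts_a, double fps_a, double watts_b, double fps_b) {
    return joules_per_frame(watts_a, fps_a) / joules_per_frame(watts_b, fps_b);
}
```

With the article's numbers, `joules_per_frame(0.66, 60.0)` is about 0.011 J (11 mJ) per frame for the Artix. `joules_per_frame(33.0, 60.0)` is 0.55 J per frame for the Ryzen, so `energy_ratio(33.0, 60.0, 0.66, 60.0)` is 50.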

Rovere and Kemmerer estimate that a 7nm FPGA chip would widen the efficiency gap by a further factor of six, needing 300 times less power than the Ryzen. To be fair, running the demo as intended on the Ryzen system's integrated graphics or a dedicated GPU would likely have been more efficient, but wouldn't have closed the gap with the Artix, much less with a more advanced FPGA part.
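If the developers' projection is taken at face value, the arithmetic gives the same result from either baseline:

$$
\frac{33\ \text{W}}{50 \times 6} = \frac{33\ \text{W}}{300} = 0.11\ \text{W}
\qquad\text{and}\qquad
\frac{0.66\ \text{W}}{6} = 0.11\ \text{W}
$$

That works out to roughly 110mW for a hypothetical 7nm FPGA at the same frame rate. The figure is an extrapolation from their estimate, not a measurement.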

The developers think their demonstration could have applications far beyond game development. The tiny power requirements of designs built with tools like CflexHDL and PipelineC could benefit areas including aerospace, industrial control, and networking. Virtual reality and augmented reality headsets could become smaller, with longer battery life and lower latency. For security, the fixed latency and lack of stored instructions could dramatically shrink a system's attack surface.

In the future, Rovere and Kemmerer plan to port their work to RISC-V and ASICs while making it open source. Along with the above video, a white paper on GitHub explains the demonstration in depth.


 

Mjsun

This seems like smoke and mirrors. FPGAs by nature can be tailored to specific tasks; that is literally their function. Should it surprise us that a chip specifically configured for ONE function runs better than a general-purpose processor? Further, an ASIC configured for this would by definition blow the FPGA out of the water, as the silicon itself is application-specific.

So, what am I missing? Other than including FPGAs in hardware to serve as reprogrammable silicon for specific tasks, we will continue to use CPUs for general purpose and accelerators (ASICs) for specific tasks.
 

NeoMorpheus

So, why compare to a Ryzen and not an Intel one?

Why does it seem that AMD is only mentioned when it can be put in a bad light?

I know that Daniel didn't do this, just reported it from another site.
 

Aranarth

This seems like smoke and mirrors. FPGAs by nature can be tailored to specific tasks; that is literally their function. Should it surprise us that a chip specifically configured for ONE function runs better than a general-purpose processor? Further, an ASIC configured for this would by definition blow the FPGA out of the water, as the silicon itself is application-specific.

So, what am I missing? Other than including FPGAs in hardware to serve as reprogrammable silicon for specific tasks, we will continue to use CPUs for general purpose and accelerators (ASICs) for specific tasks.

"built the demo for an Artix 7 100T in C, expressing the code directly to the circuit"

They took an FPGA and programmed it for a specific task. They literally took software and turned it into hardware.

It is smoke and mirrors AND it is also magic.

You could say this is a SPECIFIC PURPOSE processor as opposed to a general purpose processor. They programmed it to be even more specific than a video card!

Hopefully this helps you understand what is going on.
 

neeyik

So, why compare to a Ryzen and not an Intel one?

Why does it seem that AMD is only mentioned when it can be put in a bad light?

I know that Daniel didn't do this, just reported it from another site.
The researchers used a Ryzen-powered laptop. I'm guessing that's all they had in the office.

So how does the FPGA's ray tracing prowess compare to GeForce or Radeon?
For the power usage, it'd still beat them. But remove power constraints, and a top-end GPU would leave it for dust.
 

human7

So, what am I missing? Other than including FPGAs in hardware to serve as reprogrammable silicon for specific tasks, we will continue to use CPUs for general purpose and accelerators (ASICs) for specific tasks.

Perhaps that is it: PCs of the future could have FPGA cards. When you run a game or whatever, it reconfigures the FPGA to run something on it. The only trick would be having a standard FPGA (so that software support can be standard) and a common software interface like OpenCL or CUDA (for the FPGA).
 

jbc029

Observer: "Very pretty, FPGA."
FPGA: "Thank you"

Observer: "Can I interact wi-"
FPGA: "No"

Observer: "Can I take a screensh-"
FPGA: "No"

Observer: "Can I email any-"
FPGA: "No"

Observer: "Can I expor-"
FPGA: "No"

Observer: "...thank you for your time"
 

VitalyT

Real Performance = Announced / (Dev Overestimate * MBSR).

MBSR = Marketing BS Ratio (~10x).

This will get you twice the performance, if you are very lucky.
 

Mr Majestyk

Well, to the naysayers: get used to FPGAs, because they will be implemented in future desktop chips. I think even Meteor Lake might have something, but if not, Arrow Lake definitely does, and AMD now owns Xilinx.

And yes, FPGAs can be used to accelerate aspects of ray tracing, such as the time-consuming intersection tests. Will we see FPGAs in future desktop GPUs? I'm not sure about the price at this stage; it may be reserved for higher-end stuff like Hopper-class devices.
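For readers wondering what "intersection tests" look like, below is a minimal C++ sketch of the classic ray/axis-aligned-box "slab" test used when traversing bounding volume hierarchies. It is a generic textbook technique, not code from any Intel, AMD or Nvidia product, and the type and function names are illustrative. Each axis costs only a few subtractions, multiplies and min/max operations. That kind of fixed-size arithmetic maps readily onto pipelined logic.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>

struct Ray {
    float o[3];    // origin
    float inv[3];  // 1 / direction, precomputed once per ray
};

struct Box {
    float lo[3];
    float hi[3];
};

// Build a ray. A zero direction component maps to +infinity so that axis never
// clips the interval; an origin lying exactly on a slab plane yields NaN, which
// this sketch does not special-case. (Relies on IEEE infinities, so not for -ffast-math.)
Ray make_ray(float ox, float oy, float oz, float dx, float dy, float dz) {
    const float inf = std::numeric_limits<float>::infinity();
    auto inv = [inf](float d) { return d != 0.0f ? 1.0f / d : inf; };
    return {{ox, oy, oz}, {inv(dx), inv(dy), inv(dz)}};
}

// Slab test: shrink the ray's [t0, t1] interval by each axis' slab in turn.
// Returns true if the ray hits the box at some t >= 0.
bool hit_box(const Ray& r, const Box& b) {
    float t0 = 0.0f;
    float t1 = std::numeric_limits<float>::infinity();
    for (int axis = 0; axis < 3; ++axis) {
        float ta = (b.lo[axis] - r.o[axis]) * r.inv[axis];
        float tb = (b.hi[axis] - r.o[axis]) * r.inv[axis];
        if (ta > tb) std::swap(ta, tb);   // order entry/exit for negative directions
        t0 = std::max(t0, ta);
        t1 = std::min(t1, tb);
        if (t0 > t1) return false;        // intervals no longer overlap: miss
    }
    return true;
}
```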
 

zamroni111

So, why compare to a Ryzen and not an Intel one?

Why does it seem that AMD is only mentioned when it can be put in a bad light?

I know that Daniel didn't do this, just reported it from another site.
If compared to Intel, the result would be even more overwhelming.
The 4900H uses TSMC N7, which is way ahead of Intel's latest "fake" 7.
 

suarezvictor

This seems like smoke and mirrors. FPGAs by nature can be tailored to specific tasks; that is literally their function. Should it surprise us that a chip specifically configured for ONE function runs better than a general-purpose processor? Further, an ASIC configured for this would by definition blow the FPGA out of the water, as the silicon itself is application-specific.

So, what am I missing? Other than including FPGAs in hardware to serve as reprogrammable silicon for specific tasks, we will continue to use CPUs for general purpose and accelerators (ASICs) for specific tasks.
Hi, author here. We don't just claim that an FPGA can be better than a CPU at certain tasks. "Translating" an algorithm from a CPU to hardware is usually a very complex task, and it gets more complex if you need to involve, for example, a mix of floating-point and fixed-point calculations, or vectors of those. We show here how the same unmodified source, containing a complex algorithm, can target an FPGA, simplifying the design. Since there's no CPU involved, you get efficiency gains, and you still keep the sources intact. Indeed, we showed how you can implement a peripheral in hardware, or run it as software on a microcontroller, with the same source and thus the same logic (see the linked article in the video description). We propose this as a tool that eases hardware development. The ability to run your design in software also makes it possible to test it rapidly, so we selected a real-time game as a good example.
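To illustrate why mixing fixed point and floating point matters for hardware, here is a minimal Q16.16 fixed-point sketch in C++. It is not the CflexHDL or PipelineC number type, whose API isn't described in this thread, and all names are hypothetical. A fixed-point multiply is an ordinary integer multiply followed by a constant shift, and a constant shift in hardware is just wiring. A floating-point multiply also needs exponent handling and normalisation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Q16.16 format: 16 integer bits and 16 fractional bits in an int32.
using q16_16 = std::int32_t;
constexpr int kFracBits = 16;

// Conversions (to_fixed truncates toward zero; fine for this illustration).
constexpr q16_16 to_fixed(double v) { return static_cast<q16_16>(v * (1 << kFracBits)); }
constexpr double to_double(q16_16 v) { return static_cast<double>(v) / (1 << kFracBits); }

// Addition needs no rescaling: both operands share the same binary point.
constexpr q16_16 fx_add(q16_16 a, q16_16 b) { return a + b; }

// Multiplication: widen to 64 bits so the product can't overflow, then shift the
// binary point back. Right-shifting a negative value is arithmetic on mainstream
// compilers and guaranteed to be so from C++20.
constexpr q16_16 fx_mul(q16_16 a, q16_16 b) {
    return static_cast<q16_16>((static_cast<std::int64_t>(a) * b) >> kFracBits);
}
```

For example, `to_double(fx_mul(to_fixed(1.5), to_fixed(2.25)))` gives exactly 3.375, because all three values are representable in Q16.16.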
 

suarezvictor

The researchers used a Ryzen-powered laptop. I'm guessing that's all they had in the office.


For the power usage, it'd still beat them. But remove power constraints, and a top-end GPU would leave it for dust.
Hi, author here. Let's consider that a GPU has lots of limitations in terms of capabilities, while an FPGA can implement almost any logic circuit that you can imagine, within the limits of its size. And we can target not just FPGAs but ASICs, once you define your processing in plain C (with some C++ additions).
 

mbk34

I read the article and came away wondering what the hell does FPGA stand for? I know I could google it but you kind of expect the writer to start with a quick definition.
 

neeyik

Hi, author here. Let's consider that a GPU has lots of limitations in terms of capabilities, while an FPGA can implement almost any logic circuit that you can imagine, within the limits of its size. And we can target not just FPGAs but ASICs, once you define your processing in plain C (with some C++ additions).
Hi Victor, thank you for taking the time to stop by and respond to some comments. GPUs do indeed have lots of limitations, but that's simply by design - they are, after all, pretty much a massively parallel collection of FP ALUs, lots of SRAM, and a handful of ASICs thrown in for good measure.

Could you describe the process you followed for compiling the code for the x86 platform?
 

suarezvictor

Hi Victor, thank you for taking the time to stop by and respond to some comments. GPUs do indeed have lots of limitations, but that's simply by design - they are, after all, pretty much a massively parallel collection of FP ALUs, lots of SRAM, and a handful of ASICs thrown in for good measure.

Could you describe the process you followed for compiling the code for the x86 platform?
Yes, the way of compiling the code is simple: just a call to the clang C++ compiler over the raytracer code at https://github.com/JulianKemmerer/PipelineC-Graphics/blob/main/tr.cpp plus the source that sends the calculated pixel colors to the display.

This full command produces the executable code:
clang++-14 -DRTCODE=\"tr.cpp\" -D_FRAME_WIDTH=1920 -D_FRAME_HEIGHT=1080 -I../PipelineC -I../CflexHDL/include -O3 -fopenmp=libiomp5 -ffast-math `sdl2-config --cflags` simulator_main.cpp `sdl2-config --libs` -o tr_sim

The same code is processed untouched to generate the hardware circuit, which runs on the FPGA without any CPU or program instructions; it's just interconnected logic.

The video describes the workflow in full detail, and the code shown there is exactly the one used for the demos.
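The build line above is an ordinary optimised clang++ compile:

- The -D flags choose the ray tracer source (RTCODE) and the frame size.
- -O3 and -ffast-math enable aggressive optimisation. The latter relaxes strict IEEE floating-point rules.
- -fopenmp=libiomp5 enables OpenMP with that runtime library.
- sdl2-config supplies the SDL2 compile and link flags for the display window.

The thread does not show how simulator_main.cpp drives the ray tracer. Below is a minimal C++ sketch of how OpenMP can spread one frame across all CPU threads. It assumes a per-pixel function, named `shade_pixel` here for illustration, that returns a placeholder pattern instead of tracing rays.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pixel function; a real renderer would trace a ray here.
std::uint32_t shade_pixel(int x, int y) {
    return static_cast<std::uint32_t>((x ^ y) & 0xFF);
}

// Fill a width x height frame buffer in row-major order.
// With -fopenmp, rows are split across threads; each pixel is written by exactly
// one thread, so no locking is needed. Without OpenMP the pragma is ignored and
// the loop runs serially with identical results.
std::vector<std::uint32_t> render_frame(int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<std::uint32_t> frame(static_cast<std::size_t>(width) * height);
    #pragma omp parallel for schedule(static)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            frame[static_cast<std::size_t>(y) * width + x] = shade_pixel(x, y);
        }
    }
    return frame;
}
```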
 

suarezvictor

I read the article and came away wondering what the hell does FPGA stand for? I know I could google it but you kind of expect the writer to start with a quick definition.
There's a good article about what an FPGA is: https://en.wikipedia.org/wiki/Field-programmable_gate_array

Basically, it's a chip with independent logic circuit blocks that you can interconnect at will, with a lot of flexibility, to implement your custom hardware. It can be a CPU, a communications receiver, or almost anything you can imagine, within the constraints of the device's size and maximum speed.
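Here is one way to make "logic blocks you can interconnect at will" more concrete. Many FPGAs build their logic from small lookup tables (LUTs), whose outputs are set by configuration bits loaded at power-up. The C++ below is a deliberately simplified software model of a 4-input LUT, with illustrative names. Real devices are more elaborate; the Artix-7, for instance, uses 6-input LUTs. They also add flip-flops, carry chains and programmable routing, none of which is modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Simplified 4-input LUT: the 16-bit config is the truth table.
// Bit i of config is the output for the input combination whose value is i
// (a is bit 0, b bit 1, c bit 2, d bit 3).
struct Lut4 {
    std::uint16_t config;

    bool eval(bool a, bool b, bool c, bool d) const {
        unsigned index = (static_cast<unsigned>(d) << 3) | (static_cast<unsigned>(c) << 2) |
                         (static_cast<unsigned>(b) << 1) | static_cast<unsigned>(a);
        return ((config >> index) & 1u) != 0;
    }
};

// "Programming" the LUT: evaluate any 4-input boolean function on all 16
// input combinations and record the results as configuration bits.
template <typename F>
std::uint16_t make_config(F f) {
    std::uint16_t bits = 0;
    for (unsigned i = 0; i < 16; ++i) {
        if (f((i & 1u) != 0, (i & 2u) != 0, (i & 4u) != 0, (i & 8u) != 0)) {
            bits = static_cast<std::uint16_t>(bits | (1u << i));
        }
    }
    return bits;
}
```

A 4-input AND, for example, sets only bit 15 (config 0x8000). Reloading the 16 bits turns the same block into any other 4-input function.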
 

kiwigraeme

As Mr Majestyk stated above, this is why AMD bought Xilinx: all the big players are trying to bring this tech into their fold. Nvidia tried to get Arm (not FPGA, but with AI, time from design to chip will get quicker).
As someone mentioned ray tracing, imagine having some FPGA right next to the RTX hardware: you could tweak existing input, processing, output, etc.
I have stated it multiple times: building your PC in the future will be far more malleable, with so many modules you can add.
Video processing? Programmable FPGA filters, etc. Need a super-efficient background task while the PC sleeps? Run only the ARM chip.
 

Mjsun

Yes, the way of compiling the code is simple, just a call to the clang C++ compiler over the raytracer code at https://github.com/JulianKemmerer/PipelineC-Graphics/blob/main/tr.cpp plus the source that sends the calculated pixel colors to the display

This full command produces the executable code:
clang++-14 -DRTCODE=\"tr.cpp\" -D_FRAME_WIDTH=1920 -D_FRAME_HEIGHT=1080 -I../PipelineC -I../CflexHDL/include -O3 -fopenmp=libiomp5 -ffast-math `sdl2-config --cflags` simulator_main.cpp `sdl2-config --libs` -o tr_sim

The same code is processed untouched to generate the hardware circuit, which runs on the FPGA without any CPU or program instructions; it's just interconnected logic.

In the video the workflow is described in full detail, the code shown there is exactly the one used for the demos.
This I get. Having the FPGA configured automatically is pretty nifty. Perhaps work with the article authors on a follow up that focuses less on the performance improvement that is inherent with FPGAs and more on the software itself. A good article highlighting this would show two use cases with the FPGA reconfigured automatically without user input, and how quickly this can be accomplished. Demonstrate say the ray tracing program running, then close that application and open and execute a video encoding task, or file compression/decompression.

I know the above also really has little to do with what you have accomplished, which again I take to be dynamic and automatic FPGA programming independent of the coder (again, this is wild, and bravo!), but make sure you highlight what this enables: allowing FPGA reprogramming by a naïve coder. This is kind of like high-level code versus assembly. I am betting configuring the FPGA by hand yields better results, just as a skilled coder writing x86 assembly should outperform someone coding in Python, but with much more time investment and skill required. You have created an FPGA compiler, essentially. Work with the article author to stress that accomplishment, not what an FPGA can do, which is not news.

Awesome work!
 

suarezvictor

This I get. Having the FPGA configured automatically is pretty nifty. Perhaps work with the article authors on a follow up that focuses less on the performance improvement that is inherent with FPGAs and more on the software itself. A good article highlighting this would show two use cases with the FPGA reconfigured automatically without user input, and how quickly this can be accomplished. Demonstrate say the ray tracing program running, then close that application and open and execute a video encoding task, or file compression/decompression.

I know the above also really has little to do with what you have accomplished, which again I take to be dynamic and automatic FPGA programming independent of the coder (again, this is wild, and bravo!), but make sure you highlight what this enables: allowing FPGA reprogramming by a naïve coder. This is kind of like high-level code versus assembly. I am betting configuring the FPGA by hand yields better results, just as a skilled coder writing x86 assembly should outperform someone coding in Python, but with much more time investment and skill required. You have created an FPGA compiler, essentially. Work with the article author to stress that accomplishment, not what an FPGA can do, which is not news.

Awesome work!
The focus of this work is to use C to target a CPU or an FPGA with no code changes, and to gain development speed. It also shows that a CPU or program instructions are not always needed, even for complex algorithms: they can be hardwired. Since the resulting architecture is simpler than a CPU, a side effect is reduced power consumption. A known language like C should motivate software developers to design hardware, and since there are not as many hardware developers, that should increase the availability of new hardware.
 

WhiteLeaff

It's a stupid comparison. CPUs are not intended to run graphics; it would be fairer to compare to a GPU, which would show the same abysmal difference in performance compared to a CPU.
 

suarezvictor

It's a stupid comparison. CPUs are not intended to run graphics; it would be fairer to compare to a GPU, which would show the same abysmal difference in performance compared to a CPU.
FPGAs are not intended to run graphics either; we use a graphics application just as an example of a complex algorithm.
 
Might as well have written an article on Robert Baruch's work, but you can't sensationalize it with "50x better than Ryzen," can you?