New technology enables GPUs to use PCIe-attached memory for expanded capacity

Skye Jacobs

In brief: GPUs have memory limitations when facing the demands of AI and HPC applications. There are ways around this bottleneck, but the solutions can be expensive and cumbersome. Now, a startup headquartered in Daejeon, South Korea, has developed a new approach: using PCIe-attached memory to expand capacity. Developing this solution required jumping through many technical hoops, and there are still challenges ahead – namely, will AMD, Intel, and Nvidia support the technology?

Memory requirements stemming from advanced datasets for AI and HPC applications often swamp the memory built into a GPU. Expanding that memory has typically meant installing expensive high bandwidth memory, which often introduces changes to the existing GPU architecture or software.

One solution to this bottleneck is being offered by Panmnesia, a company backed by South Korea's KAIST research institute, which has introduced new tech that allows GPUs to access system memory directly through a Compute Express Link (CXL) interface. Essentially, it enables GPUs to use system memory as an extension of their own memory.

Called CXL GPU Image, this PCIe-attached memory has a double-digit nanosecond latency that is significantly lower than that of traditional SSDs, the company says.

Panmnesia had to overcome several tech challenges to develop this system.

CXL is a protocol that works on top of a PCIe link, but the technology has to be recognized by the ASIC and its subsystems. In other words, one cannot simply add a CXL controller to the tech stack, as GPUs have no CXL logic fabric or subsystems that support DRAM and/or SSD endpoints.

Also, GPU cache and memory subsystems do not recognize any expansions except unified virtual memory (UVM), which is not fast enough for AI or HPC. In Panmnesia's tests, UVM delivered the worst performance across all tested GPU kernels. CXL, by contrast, provides direct access to the expanded storage via load/store instructions, eliminating the issues that hamper UVM, such as the overhead of host runtime intervention during page faults and data transfers at page granularity.
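
To make the difference concrete, here is a minimal CUDA sketch of the two access models. The UVM path uses the standard cudaMallocManaged call; the CXL path appears only as a comment, since no shipping GPU driver currently exposes CXL-attached memory as an ordinary device pointer – the point is simply that load/store access would reuse the same kernel code, without host-driven page migration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread reads and updates one element. Under UVM, the GPU's first
// touch of a page triggers a fault and a page-granularity migration handled
// by the host runtime. Under the load/store model described above, the same
// loads and stores would travel over the CXL/PCIe link at cacheline
// granularity with no host intervention.
__global__ void scale(float *data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 20;

    // UVM path: managed allocation, migrated on demand at page granularity.
    float *uvm_buf = nullptr;
    cudaMallocManaged(&uvm_buf, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) uvm_buf[i] = 1.0f;   // populated on the host

    unsigned int blocks = (unsigned int)((n + 255) / 256);
    scale<<<blocks, 256>>>(uvm_buf, 2.0f, n);           // GPU faults pages in
    cudaDeviceSynchronize();
    printf("uvm_buf[0] = %f\n", uvm_buf[0]);
    cudaFree(uvm_buf);

    // Hypothetical CXL path: if a driver exposed a CXL-attached expansion as a
    // device-accessible pointer (no such public API exists today), the very
    // same kernel could dereference it directly - only the backing store
    // changes, not the kernel code.
    // float *cxl_buf = map_cxl_expansion(n * sizeof(float));  // illustrative only

    return 0;
}
```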

What Panmnesia developed in response is a series of hardware layers that support all of the key CXL protocols, consolidating them into a unified controller.

The CXL 3.1-compliant root complex has multiple root ports that support external memory over PCIe, plus a host bridge with a host-managed device memory (HDM) decoder that connects to the GPU's system bus and manages the system memory.
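
As a rough software model of what that decoder does – and only a model, since the real logic is hardware and the structure and field names below are invented for illustration, not Panmnesia's register layout or the CXL 3.1 specification's – the routing step amounts to matching a system-bus address against configured windows and handing the request to the root port that owns it:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Simplified, illustrative model of HDM-style decoding: the host bridge
// checks which configured address window an incoming system-bus address
// falls into and routes the request to the corresponding root port (and
// from there to the DRAM or SSD endpoint behind it).
struct HdmWindow {
    uint64_t base;      // start of the host physical address window
    uint64_t size;      // window length in bytes
    int      root_port; // root port that owns this window
};

// Returns the owning root port, or -1 if the address is ordinary local memory.
int decode(const std::vector<HdmWindow> &windows, uint64_t addr) {
    for (const auto &w : windows)
        if (addr >= w.base && addr < w.base + w.size)
            return w.root_port;
    return -1;
}

int main() {
    // Two hypothetical 256 GB expansion windows behind two root ports.
    std::vector<HdmWindow> windows = {
        {0x4000000000ULL, 256ULL << 30, 0},
        {0x8000000000ULL, 256ULL << 30, 1},
    };
    uint64_t addr = 0x4012345678ULL;  // falls inside the first window
    printf("root port: %d\n", decode(windows, addr));
    return 0;
}
```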

Panmnesia also faces challenges beyond its control, a big one being that AMD and Nvidia must add CXL support to their GPUs. It is also possible that industry players will decide they like the approach of using PCIe-attached memory for GPUs – and go on to develop their own technology.

What is the use of it?! Even now, GPUs use system RAM when dedicated VRAM is not enough. PCIe 4.0 speed: 31.5 GB/s max. DDR4-3600 dual channel: 37-51 GB/s.
 

Not in all cases they don't. I run a program that uses the GPU, and when the GPU's VRAM is full, that's all the data that can be loaded onto it – running more tasks means I need GPUs with more VRAM. My PCs generally have 32GB, 64GB or 128GB of RAM in them, and it would be very nice to let the GPU use some of that memory to run more tasks at the same time.
 
Freddie159, my GPU has 8GB of VRAM, but Task Manager says 16GB in total. Apparently it can use 8GB of system RAM in addition.
 
Performance-wise, using PCIe should be much faster. We are talking about huge data sets that can take a very long time to process.
How could that happen!? The GPU will work with this kind of memory without a limit – is this a joke!? Moreover, PCIe 4/5 is not faster than DDR4/5, respectively.
 
Depending on the implementation, it is. We are talking about huge capacities, not 64-128GB of RAM that can run at 6,000-7,000 MT/s. And it also depends on whether the data needs to go through the CPU first or not – not everything is direct.

Since they are comparing it to SSDs, this means they expect your workload to exceed your RAM capacity by a significant amount.

This reminds me a lot of the GPUs AMD used to make for servers/workstations that had SSD slots that could act as extra VRAM. But that didn't work well because of the over-reliance on the API they made for it.
 
How could that happen!? The GPU will work with this kind of memory without a limit – is this a joke!? Moreover, PCIe 4/5 is not faster than DDR4/5, respectively.
You forget that's just one link; use more lanes and you get more bandwidth. Server GPUs have access to more lanes than desktop GPUs, and performance CPUs have plenty of PCIe lanes: the Ryzen 9 7950X has 28 PCIe 5.0 lanes, and server CPUs have 100+. If the motherboard supports it, you can add terabytes of GPU RAM over multiple PCIe links.
 
What is the use of it?! Even now, GPUs use system RAM when dedicated VRAM is not enough. PCIe 4.0 speed: 31.5 GB/s max. DDR4-3600 dual channel: 37-51 GB/s.
Just use more lanes at the same time. CPUs have multiple PCIe lanes: high-end desktop has 28, Threadripper has 64, Epyc servers 100+. We just tend not to use them on desktop because there has been no need to yet. If desktop GPUs are designed for this, they would do PCIe 6.0 for ~126 GB/s on x16. It will definitely be worth it.
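
For what it's worth, here is a quick back-of-the-envelope calculator to sanity-check the bandwidth figures in this thread. It uses the standard per-generation transfer rates with 128b/130b encoding for PCIe 4.0/5.0, ignores PCIe 6.0 flit/FEC overhead (so the Gen 6 number is a raw ceiling, a bit above real payload rates), and prints the theoretical DDR4-3600 dual-channel peak, which sits above the 37-51 GB/s measured figures quoted earlier:

```cpp
#include <cstdio>

// Theoretical PCIe bandwidth per direction: transfer rate x lanes x
// encoding efficiency, converted from GT/s to GB/s.
double pcie_gbps(double gt_per_s, int lanes, double efficiency) {
    return gt_per_s * lanes * efficiency / 8.0;
}

int main() {
    const double enc = 128.0 / 130.0;  // 128b/130b line code (Gen 3-5)
    printf("PCIe 4.0 x16: %.1f GB/s\n", pcie_gbps(16.0, 16, enc)); // ~31.5
    printf("PCIe 5.0 x16: %.1f GB/s\n", pcie_gbps(32.0, 16, enc)); // ~63.0
    printf("PCIe 6.0 x16: %.1f GB/s\n", pcie_gbps(64.0, 16, 1.0)); // ~128 raw
    // Dual-channel DDR4-3600: 2 channels x 8 bytes x 3600 MT/s (theoretical peak).
    printf("DDR4-3600 dual channel: %.1f GB/s\n", 2 * 8 * 3600.0 / 1000.0); // ~57.6
    return 0;
}
```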
 