Intel AVX10, APX are the next step in the x86 instruction set evolution

Alfonso Maruccia

Why it matters: Intel is gearing up for what the company considers the "next major step" in the evolution of the original x86 instruction set architecture (ISA). The Santa Clara corporation is expanding the number of registers for general-purpose x86 operations while introducing a new, all-inclusive vector instruction set based on the renowned AVX-512 ISA.

As explained on Intel's official site for developers, the x86 architecture is now used in data centers, personal computers, and various other environments requiring performance-oriented CPUs and heavy computational workloads. Introduced in 1978 with the 8086 CPU, the original x86 ISA featured only eight 16-bit general-purpose registers, which were later doubled in number (to 16) and quadrupled in size (to 64 bits).

Registers play a critical role in a CPU, as they store the bits of data the processor actively works on at any given moment. This is why Intel presents the Advanced Performance Extensions (APX) tech as a significant evolutionary step for the x86 ISA. It expands the entire x86 instruction set, granting access to more registers and introducing new features to enhance overall CPU performance.

APX, according to Intel, doubles the number of general-purpose x86 registers from 16 to 32, providing compilers with more space to store data. Compared to a binary program compiled for the "baseline" Intel x64 ISA, the corporation explains, APX-compiled code contains "10% fewer loads and more than 20% fewer stores."

In simple terms, register accesses are faster and consume "significantly less dynamic power" compared to complex load and store operations. This improved efficiency in next-generation Intel CPU models could result in higher performance levels. APX will also expand the conditional instruction set of the x86 ISA, which was first introduced with the Pentium Pro processor in the form of the CMOV/SET instructions.

These instructions are extensively used by today's compilers, and according to Intel, APX extends them with conditional forms of load, store, and compare/test instructions, which should let compilers avoid some hard-to-predict branches rather than improving branch prediction itself. Intel also says programmers can take advantage of APX features by simply recompiling their code, as no source code changes are expected. APX once again demonstrates the advantage of the "variable-length instruction encodings of x86," with new features enhancing the entire ISA through "only incremental changes" to the underlying silicon for decoding instructions in hardware.
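As a rough illustration of the conditional instructions discussed above, here is a minimal C sketch of the kind of source code that compilers often turn into CMOV or SET instead of a branch. The function names are hypothetical, and whether a given compiler actually emits CMOVcc or SETcc depends on the compiler, target, and optimization flags; nothing here is specific to APX.

```c
/* Branchy form: a compiler may emit a compare plus a conditional jump here. */
int max_branchy(int a, int b) {
    if (a > b) {
        return a;
    }
    return b;
}

/* Select form: optimizing x86 compilers commonly lower this to
 * CMP + CMOVcc, although the C language does not guarantee it. */
int max_select(int a, int b) {
    return (a > b) ? a : b;
}

/* SETcc-style pattern: the comparison result is materialized as 0 or 1
 * and added to the counter, so the loop body needs no data-dependent branch. */
int count_above(const int *values, int n, int threshold) {
    int count = 0;
    for (int i = 0; i < n; i++) {
        count += (values[i] > threshold);
    }
    return count;
}
```

Both `max_branchy(3, 5)` and `max_select(3, 5)` return 5; the difference lies only in the machine code a compiler is likely to produce, which is why a recompile is enough to pick up new conditional instructions.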

In addition to APX, future Intel CPU generations will include the new AVX10 ISA. This tech, as explained in the official paper, is a new major implementation of the AVX-512 vector instruction set first proposed by Intel in 2013. The new ISA will establish a "common, converged vector instruction set" across all Intel CPU architectures and will be supported on all future processors, including both Performance cores (P-cores) and Efficient cores (E-cores).

AVX-512 vector instructions were present on Intel's 12th-Gen Core consumer CPUs, but they only worked on P-core units and were later unexpectedly disabled with a microcode update. Vector extensions to the x86 ISA have proved extremely popular among developers emulating complex, modern console architectures such as the PlayStation 3 (with the RPCS3 emulator).

The AVX10 extension of the x86 ISA will provide support for all previously introduced AVX (vector) instruction extensions, with a maximum vector register length of 256 bits. The initial AVX10 version (AVX10.1) won't include any new instructions; its sole purpose is to ease the transition from AVX-512 to the proper, all-core compatible (P-cores, E-cores) AVX10 implementation known as AVX10.2.


 
AVX10 is nothing other than Intel admitting that the big-little approach on current Alder Lake CPUs simply sucked from the beginning. Intel is just re-enabling the AVX-512 support it previously disabled. Good luck to those who own Alder Lake; it's EOL already. It does not support an instruction set that almost every previous and future Intel CPU supports.

APX is a different thing. Ever since x86-64 was introduced, having only 16 general-purpose registers has been a strange thing, perhaps a way to avoid cost issues. It took 20 years to get more; better late than never.
 
"The AVX10 extension of the x86 ISA will provide support for all previously introduced AVX (vector) instruction extensions, with a maximum vector register length of 256 bits."

This is imprecise. You're referring to AVX10/256, a subset that might be implemented in E-cores, for example (it doesn't need a full-width 512-bit register file). There is also AVX10/512, which supports all of AVX-512 but will only work on the P-cores.

This is an interesting solution for Intel's hybrid arch screw-up. The VMX capabilities they mention allow you to have a P+E CPU where the operating system enforces a limit of AVX10/256; in my understanding this limit will apply to all cores, even to the AVX10/512-capable P-cores. But it's a good tradeoff for conventional applications (99% of the consumer stuff we run on Windows Home/Pro or Linux) that cannot be bothered with much more complex coding, tuning and scheduling. I suspect you will also be able to use virtualization for more flexibility. For example, say you want to run an app like Adobe Premiere that has some functionality that's 2X faster with 512-bit vectors: it could stuff that into a Windows Service that runs in a small Hyper-V environment which has access only to P-cores but with AVX10/512 enabled.
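To make the "limit applies to all cores" idea concrete, here is a small C sketch of how a runtime might pick one usable vector width on a hybrid CPU. Everything in it is an assumption for illustration: the `VecCaps` struct, its fields, and `effective_vector_bits` are hypothetical names, and this is not how Intel's actual CPUID or VMX enumeration is laid out.

```c
#include <stddef.h>

/* Hypothetical per-core capability record (NOT a real CPUID layout). */
typedef struct {
    int avx10_version;   /* 0 = no AVX10, 1 = AVX10.1, 2 = AVX10.2 */
    int max_vector_bits; /* e.g. 256 on an E-core, 512 on a P-core */
} VecCaps;

/* Returns the vector width (in bits) that is safe on every listed core,
 * capped by a limit imposed by the OS or hypervisor. Returns 0 if the
 * list is empty or any core lacks AVX10 entirely. */
int effective_vector_bits(const VecCaps *cores, size_t n, int os_limit_bits) {
    if (n == 0) {
        return 0;
    }
    int bits = os_limit_bits;
    for (size_t i = 0; i < n; i++) {
        if (cores[i].avx10_version == 0) {
            return 0; /* one core without AVX10 rules it out for all */
        }
        if (cores[i].max_vector_bits < bits) {
            bits = cores[i].max_vector_bits;
        }
    }
    return bits;
}
```

Under this model, a P+E mix of 512-bit and 256-bit cores yields 256 for every thread, while a guest that only sees 512-bit P-cores and a 512-bit limit yields 512, which is the virtualization scenario sketched above.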
 
The AVX design was a mistake to begin with (IMHO). ARM's NEON extensions (optional on 32-bit ARM, although almost all Android-era phones and tablets had them, and required on 64-bit ARM) let you specify how many bits of data you want the SIMD instructions to work on. They do tell you what the hardware capabilities are, but you don't have to keep adding new instructions to get wider SIMD units.

So what about the equivalent of AVX, AVX2, and AVX512? Instead of 3 sets of instructions, you have a single set. If you tell it to run 512-bit SIMD (AVX512-equivalent) on a chip with 128-bit SIMD hardware? Fine, the instruction will run in 4x the cycles it would take on a chip with 512-bit SIMD hardware. If some future chip comes out with, say, 4096-bit SIMD, the compiler can feel free to spit out 4096-bit SIMD instructions, and any chip with NEON will run them.
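The general idea in this post, one loop that works unchanged whatever the hardware vector width is, can be sketched in plain scalar C. This is only a simulation under stated assumptions: `lanes` stands in for a hypothetical hardware width in elements, `add_arrays` is a made-up name, and no NEON or AVX intrinsics are used.

```c
#include <stddef.h>

/* Adds b[i] into a[i] for i in [0, n), handling `lanes` elements per
 * simulated vector step. Returns the number of steps taken, or 0 if
 * lanes is 0. The results in `a` do not depend on `lanes`. */
size_t add_arrays(float *a, const float *b, size_t n, size_t lanes) {
    size_t steps = 0;
    if (lanes == 0) {
        return 0;
    }
    for (size_t i = 0; i < n; i += lanes) {
        /* Tail handling: the last step may cover fewer than `lanes`
         * elements, similar to masking off inactive lanes. */
        size_t active = (n - i < lanes) ? (n - i) : lanes;
        for (size_t j = 0; j < active; j++) {
            a[i + j] += b[i + j];
        }
        steps++;
    }
    return steps;
}
```

With 10 elements, a width of 4 lanes takes 3 steps and a width of 16 lanes takes 1 step, but both leave the same values in `a`; only the step count changes with the hardware width.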

I'm not sure I quite understand the part of the article about AVX10, 10.1, and 10.2: how having no new instructions eases a transition to AVX10.2, and, if AVX10.2 is just re-adding AVX512 instructions, why they would give any of this a new name at all. Ahh well, my current Ivy Bridge doesn't even have AVX2 (just AVX), and my 11th gen Intel notebook has full AVX512, so I really won't have to worry about it.

In contrast, more CPU registers sounds interesting. I mean I'm concerned about backwards compatibility here, but if it's just for AVX10.x stuff (as it appears it may be) I suppose they'll just have some AVX10.x-compatible code and some other code that uses AVX or AVX2 (or no AVX at all) instead, as they (usually...) do now.
 