Why it matters: The Linux kernel still carries an ancient workaround for possible incompatibilities in early ACPI implementations. The workaround is no longer needed on modern hardware and only makes things worse for AMD CPUs by penalizing performance. A patch should arrive soon.
The incredibly successful Zen architecture has turned the modern CPU market upside down, bringing AMD to the top of the performance race and giving users a much-needed competitor after Intel's long run of dominance. There is one area of computing, however, where AMD CPUs still suffer a performance penalty, even though nothing justifies this state of affairs.
On Linux-based operating systems, AMD CPUs slow down when they shouldn't. The reason for this odd behavior dates back to 2002, when support for the Advanced Configuration and Power Interface (ACPI) standard was first added to the open source kernel. Early ACPI implementations had compatibility issues, so the developers had to anticipate quirks such as chipsets moving to an idle state slightly later than expected.
To cope with that, the developers introduced a "dummy wait op" in the kernel: a redundant read operation that the CPU performs while it waits to be stopped completely by the chipset's STPCLK# signal. The dummy wait op was added to Linux in 2002, and it is still there even though processors based on the Zen architecture don't need the workaround anymore.
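The sequence described above can be modeled in a few lines. This is only an illustrative sketch in Python, not kernel code: the function names, the `dummy_wait` flag, and both port numbers are hypothetical, and the port reads are stubbed so the example simply records the order of operations.

```python
from typing import Callable, List

# Hypothetical port addresses, chosen only for illustration.
IDLE_REQUEST_PORT = 0x414
DUMMY_READ_PORT = 0x408


def enter_idle(read_port: Callable[[int], int], dummy_wait: bool = True) -> None:
    """Toy model of an I/O-port based idle entry (not the kernel's code)."""
    # Reading the idle-request port asks the chipset to stop the CPU clock.
    read_port(IDLE_REQUEST_PORT)
    if dummy_wait:
        # The workaround: a redundant read whose only purpose is to keep the
        # CPU occupied until the chipset has actually asserted STPCLK#.
        read_port(DUMMY_READ_PORT)


def trace_idle_entry(dummy_wait: bool) -> List[int]:
    """Return the ports read, in order, during one modeled idle entry."""
    reads: List[int] = []

    def fake_read(port: int) -> int:
        reads.append(port)
        return 0  # the value read is irrelevant to the model

    enter_idle(fake_read, dummy_wait=dummy_wait)
    return reads


if __name__ == "__main__":
    print([hex(p) for p in trace_idle_entry(dummy_wait=True)])
    print([hex(p) for p in trace_idle_entry(dummy_wait=False)])
```

With the flag off, the model performs a single read instead of two, which is roughly the behavior the patch aims for on Zen-based processors.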
The worst part is that the dummy wait op slows the CPU down, as AMD engineer Prateek Nayak explained in a recent patch to the kernel. In specific workloads on Linux systems, Nayak said, "a significant amount of time is spent in the dummy op which incorrectly gets accounted as C-State residency." C-states are an ACPI feature designed to save power when the CPU doesn't need to stay awake, and the dummy wait op can push an AMD CPU deeper into the C-state hierarchy, which lengthens its return to fully awake operation.
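For readers who want to look at C-state accounting on their own machine, Linux exposes per-state counters through the cpuidle sysfs directory. The following is a minimal sketch, assuming a system where that directory is present; the helper name `read_idle_states` is hypothetical, and the numbers it reports are the kernel's own accounting, not a measurement of the dummy wait op itself.

```python
from pathlib import Path
from typing import Dict, List

# cpuidle sysfs directory for CPU 0; other CPUs use cpu1, cpu2, and so on.
DEFAULT_CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def read_idle_states(cpuidle_dir: Path = DEFAULT_CPUIDLE_DIR) -> List[Dict[str, object]]:
    """Read name, entry count, and accumulated time for each idle state."""
    state_dirs = sorted(
        cpuidle_dir.glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order: state2 before state10
    )
    states = []
    for state_dir in state_dirs:
        states.append({
            "name": (state_dir / "name").read_text().strip(),
            # "usage" counts how many times the state was entered.
            "usage": int((state_dir / "usage").read_text()),
            # "time" is the total time accounted to the state, in microseconds.
            "time_us": int((state_dir / "time").read_text()),
        })
    return states


if __name__ == "__main__":
    for state in read_idle_states():
        print(f"{state['name']}: entered {state['usage']} times, "
              f"{state['time_us']} us accounted")
```

If the directory does not exist (for example on a non-Linux system), the function returns an empty list and the script prints nothing.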
While testing different versions of the kernel on a dual-socket Zen 3 system, Nayak found that his patch removing the dummy wait op brought a remarkable improvement in tbench performance: gains over the baseline kernel ranged from a 51 percent increase in mean throughput to a 1,390 percent increase in minimum throughput (MB/s).
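Percentages this large are easier to judge as multipliers. The short sketch below only does the arithmetic on the two figures quoted above; the function names are hypothetical, and no throughput values beyond those percentages are assumed.

```python
def growth_factor(percent_increase: float) -> float:
    """Convert a percent increase over a baseline into a multiplier."""
    return 1.0 + percent_increase / 100.0


def percent_increase(baseline: float, patched: float) -> float:
    """Percent increase of `patched` over a positive `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (patched - baseline) / baseline * 100.0


if __name__ == "__main__":
    print(f"{growth_factor(1390):.2f}x")  # prints 14.90x
    print(f"{growth_factor(51):.2f}x")    # prints 1.51x
```

In other words, a 1,390 percent increase means the minimum throughput was almost 15 times the baseline, while the mean was about one and a half times the baseline.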
The decades-old ACPI compatibility code no longer needs to stay in Linux, so Nayak's patch will likely be added to an upcoming version of the kernel. It may even land in version 6.0, which is expected to ship next week, just before the Rust programming language arrives in Linux 6.1 as decided by Penguin Maximo Linus Torvalds.
As for Intel processors, the dummy wait op doesn't seem to pose a performance issue, although an urgent patch has already been submitted.
Image credit: Ryan