So, speaking as someone who knows very little about this, don't GPUs have something equivalent to IPC, as CPUs do?
For example, a 4-core CPU with higher IPC than another 4-core CPU will be faster, so does the same thing apply in the GPU realm?
To a certain degree, yes. CPUs have to deal with a lot of highly variable, branching threads, whereas for the most part, GPUs don't (unless it's complex compute stuff).
The latter typically handle threads (sequences of operations) in batches of 32, with all 32 carrying out the same instruction on 32 pieces of information at any given moment. There's relatively little difference between today's GPUs in how quickly they issue and process those instructions. For example, an FP32 multiply can be issued in 1 cycle and processed in around 4; an FP64 multiply takes about ten times longer.
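To put rough numbers on the issue-versus-process distinction, here's a minimal sketch of a single pipelined unit. It's a toy model rather than a description of any particular GPU: the function names are my own, and the defaults simply reuse the "issue in 1 cycle, process in around 4" figures for FP32 multiply.

```python
def cycles_independent(n_instructions, issue_interval=1, latency=4):
    """Cycles to finish n independent instructions on one pipelined unit.

    A new instruction can be issued every `issue_interval` cycles, and each
    one takes `latency` cycles to complete, so the work overlaps.
    """
    if n_instructions == 0:
        return 0
    return (n_instructions - 1) * issue_interval + latency


def cycles_dependent(n_instructions, latency=4):
    """Cycles when each instruction must wait for the previous result."""
    return n_instructions * latency


def ipc(n_instructions, cycles):
    """Instructions completed per clock cycle."""
    return n_instructions / cycles if cycles else 0.0


if __name__ == "__main__":
    n = 100
    for label, cycles in (
        ("independent", cycles_independent(n)),
        ("dependent", cycles_dependent(n)),
    ):
        print(f"{label}: {cycles} cycles, IPC = {ipc(n, cycles):.2f}")
```

In this model, a long run of independent multiplies gets close to one instruction per cycle, whereas a chain where each multiply needs the previous result drops to a quarter of that. Bear in mind that each "instruction" here covers a whole batch of 32 threads.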
But those IPCs are heavily dependent on the data being ready to hand. As GPUs have to read and write vast amounts of data all the time, the flow of bits throughout the chip is critical to its IPC. In the case of the GPUs tested in this article, both have 40 Compute Units, each of which contains two 32-thread units. So, depending on the instruction, the chip could be trying to read or write 40 x 2 x 32 = 2560 32-bit data values at once. At a clock speed of 2 GHz, that works out to 2560 x 4 bytes x 2 GHz, or a peak total internal bandwidth of roughly 20 TB/s.
That is a lot. For comparison, the RX 6700 XT has a total peak theoretical bandwidth of 5.29 TB/s between the Level 1 and Level 2 caches. The combined peak bandwidth is even higher in some areas, but in reality, the actual bandwidth is a little less than this (and a lot less for the VRAM).
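For anyone who wants to play with that arithmetic, here's a back-of-envelope sketch. The unit counts and the 2 GHz clock come from the figures above; `active_fraction` is a knob I've made up to show how much the answer depends on how many lanes actually move data on a given clock.

```python
COMPUTE_UNITS = 40       # Compute Units per GPU
SIMD_PER_CU = 2          # 32-thread units per Compute Unit
THREADS_PER_SIMD = 32    # threads handled per unit
BYTES_PER_VALUE = 4      # one 32-bit value
CLOCK_HZ = 2e9           # 2 GHz


def total_lanes():
    """Number of 32-bit values the chip could touch in one clock."""
    return COMPUTE_UNITS * SIMD_PER_CU * THREADS_PER_SIMD


def demand_tb_per_s(active_fraction=1.0):
    """Bandwidth demand in TB/s (1 TB = 1e12 bytes).

    `active_fraction` is a hypothetical parameter: the share of lanes
    that read or write one value on every clock.
    """
    bytes_per_clock = total_lanes() * BYTES_PER_VALUE * active_fraction
    return bytes_per_clock * CLOCK_HZ / 1e12


if __name__ == "__main__":
    print(f"lanes: {total_lanes()}")
    for fraction in (1.0, 0.25, 0.1):
        print(f"{fraction:>5.0%} of lanes active: {demand_tb_per_s(fraction):.2f} TB/s")
```

With every lane moving one 32-bit value per clock, this comes out at roughly 20 TB/s; with a tenth of the lanes doing so, it's roughly 2 TB/s. Real workloads sit somewhere in between, because a lot of instructions work on data already held in registers.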
In short, a GPU's IPC is complex.