Thank you for the very detailed response. My understanding is that a CU is composed of shaders, TMUs and ROPs. I'm still not sure what to make of these responses. I don't know if you're trying to say that die space is so limited that AMD cannot actually increase the number of CUs on the 5000 series of chips, or if you're saying that they could produce an APU with 16 CUs but it would not help because... 16 x 64 SPs is 1024 shaders.
In the slide below, you can see an annotated die shot of the Renoir APU - i.e. the 4000 series:
The GPU part of the chip is the orange block - each CU is a horizontal strip, so there are 8 in this die. The ROPs haven't been labelled but they lie in between the CU block and the I/O strip, in green, on the right. As you can see, space is at a real premium in the APU die layout, and finding room for 16 CUs would be a real challenge.
But the point I was making before was that just increasing the CU count (which adds more shaders and TMUs) and doing nothing else won't make much of a difference. This is because the GPUs in APUs are mostly limited by the number of ROPs and the memory interface.
AMD's GPU architecture has the ROPs directly linked to the GPU's L2 cache, so they can't add more of one without adding more of the other - and doing so requires more space. So increasing the CU count, but not the ROPs and L2 cache, would result in little benefit. Increasing all of them absolutely would improve matters, but would then require a lot more die space.
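To put rough numbers on that argument, here's a very crude sketch in Python. The 64 shaders per CU comes from the 16 x 64 figure quoted above; everything else (the function names and the idea that performance simply follows whichever part scales the least) is my own simplification for illustration, not a real performance model or anything from AMD.

```python
SHADERS_PER_CU = 64  # stream processors per CU, per the 16 x 64 figure above


def shader_count(cus: int) -> int:
    """Total shaders for a given number of CUs."""
    return cus * SHADERS_PER_CU


def relative_speedup(cu_scale: float, rop_scale: float, mem_scale: float) -> float:
    """Toy bottleneck model (an assumption, not a measurement).

    Each argument is how much that resource grows relative to the
    current design (1.0 = unchanged, 2.0 = doubled). The result is
    capped by whichever resource grows the least.
    """
    return min(cu_scale, rop_scale, mem_scale)


if __name__ == "__main__":
    print(shader_count(8))   # 512 shaders in an 8 CU design
    print(shader_count(16))  # 1024 shaders in a 16 CU design

    # Double the CUs only: ROPs/L2 and memory interface unchanged.
    print(relative_speedup(cu_scale=2.0, rop_scale=1.0, mem_scale=1.0))  # 1.0

    # Double everything: better, but this is the case that needs far more die space.
    print(relative_speedup(cu_scale=2.0, rop_scale=2.0, mem_scale=2.0))  # 2.0
```

Real workloads won't behave this neatly (some are more shader-bound than others, so extra CUs would give a small gain rather than none), but it shows why doubling the shader count on its own doesn't double the performance.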