Latency between CCXs is probably still there, since probably nothing changes except the L3 cache, which is now shared by both 4-core groups (so it might no longer be a CCX at all), whereas on Zen/Zen+/Zen2 it is split.
This is still the best we have, and unfortunately that latency between the 4-core groups is still there. To clarify: so far AMD hasn't said anything about an "8-core CCX".
They only said that 8 cores share the same L3 cache, which is a totally different thing from the "8-core CCX" that some sites tend to promote.
But the picture above is of the server version, and the desktop version might be different. Perhaps with an integrated memory controller, like the latest APUs.
AVX-512 support with 512-bit wide FPU units would easily give that "around 50% improvement". In supported software, of course.
Skylake is more effective in games primarily because its overall memory latency is much lower. Zen2 with chiplets is purely a server design, after all. While the chiplet design with an IO die gives some advantages, like the same memory latency for all chiplets, that hardly matters when there is only one chiplet.