It can be, due to the high number of passes that take place in a modern game. Each pass writes frame data to VRAM and reads it back out again, and keeping memory allocated for this matters more than keeping the asset buffers (e.g. vertex, index, constant and texture buffers) resident. So you'd want frame resources to stay put and assets to get hoofed out.
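As a rough illustration of that priority, here's a toy residency manager. It's a sketch under simplifying assumptions, not how any real driver or OS handles residency: the `Resource` and `VramBudget` names are hypothetical, frame resources (render targets and the like) are simply pinned, and assets are evicted least-recently-used once a fixed budget is exceeded.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    size_mb: int
    is_frame_resource: bool  # True for render targets, G-buffer, etc. (pinned)


class VramBudget:
    """Hypothetical toy model: frame resources stay put, assets are evicted LRU."""

    def __init__(self, budget_mb: int):
        self.budget_mb = budget_mb
        self.resident = OrderedDict()  # name -> Resource, least recently used first
        self.evicted = []              # names of evicted assets, in eviction order

    def used_mb(self) -> int:
        return sum(r.size_mb for r in self.resident.values())

    def touch(self, res: Resource) -> None:
        """Record a use of `res`, making it resident if necessary."""
        if res.name in self.resident:
            self.resident.move_to_end(res.name)  # now most recently used
        else:
            self.resident[res.name] = res
        self._evict_if_needed()

    def _evict_if_needed(self) -> None:
        # Walk from least to most recently used, never evicting frame resources
        # or the resource that was just touched (the last entry).
        for name in list(self.resident)[:-1]:
            if self.used_mb() <= self.budget_mb:
                break
            if not self.resident[name].is_frame_resource:
                del self.resident[name]
                self.evicted.append(name)


vram = VramBudget(budget_mb=8192)
vram.touch(Resource("gbuffer", 1024, True))
vram.touch(Resource("shadow_maps", 512, True))
vram.touch(Resource("level1_textures", 4096, False))
vram.touch(Resource("level2_textures", 4096, False))
print(vram.evicted)  # ['level1_textures']
```

If everything left is pinned and the budget is still exceeded, this model just stays oversubscribed; on real hardware that's the point where data ends up living in system memory instead.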
DRAM latency is pretty poor, even with GDDR6, so any read that results in a complete cache miss (i.e. the data isn't in shared memory, L1 or L2) will stall badly. Since this is well known and understood, the drivers and the GPU architecture are designed to keep more than enough threads on the go to mask stalls caused by memory accesses. And where a stall just can't be avoided, a wider bus allows more accesses to take place concurrently, so threads are stalled for a shorter amount of time.
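The "enough threads on the go" point is essentially Little's law: to keep the bus busy, bandwidth × latency worth of data has to be in flight at once. A quick sketch of the arithmetic, assuming 448 GB/s (the RTX 3060 Ti's peak bandwidth), a round 500 ns DRAM latency and 128-byte requests; the latency and request size are illustrative assumptions, not measured figures.

```python
def bytes_in_flight(bandwidth_gb_s: float, latency_ns: float) -> float:
    """Little's law: data that must be outstanding to keep the bus busy.

    GB/s x ns = (1e9 B/s) x (1e-9 s) = bytes, so the units cancel neatly.
    """
    return bandwidth_gb_s * latency_ns


def requests_in_flight(bandwidth_gb_s: float, latency_ns: float,
                       request_bytes: int = 128) -> float:
    """Number of outstanding memory requests of `request_bytes` each."""
    return bytes_in_flight(bandwidth_gb_s, latency_ns) / request_bytes


def achieved_bandwidth_gb_s(requests: float, latency_ns: float,
                            request_bytes: int = 128) -> float:
    """Bandwidth sustained with a given number of requests in flight
    (only meaningful up to the bus's peak bandwidth)."""
    return requests * request_bytes / latency_ns


# Assumed figures: 448 GB/s peak and a round 500 ns DRAM latency.
needed = requests_in_flight(448, 500)            # requests needed to saturate the bus
half = achieved_bandwidth_gb_s(needed / 2, 500)  # half the requests -> half the bandwidth
print(needed, half)
```

With only half the required requests outstanding, the bus sits idle half the time, which is why the GPU keeps so many warps resident to cover for the ones waiting on memory.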
However, if a thread needs data that isn't in local memory (VRAM) at all, it has to be fetched from system memory over PCIe, and the stall will be a far more serious one, although it will only affect a fairly small portion of the overall render time. That said, it's very dependent on the game, its rendering pipeline, and the settings used.
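To put a rough number on "far more serious", here's a back-of-the-envelope comparison of moving the same chunk of data from VRAM versus across the PCIe bus. It uses peak figures only and ignores latency and driver overhead: 448 GB/s is the RTX 3060 Ti's VRAM bandwidth, 32 GB/s is an assumed round figure for PCIe 4.0 x16 in one direction, and the 64 MB chunk is a hypothetical amount of data that didn't fit.

```python
def transfer_ms(size_mb: float, bandwidth_gb_s: float) -> float:
    """Time to move `size_mb` megabytes at a given peak bandwidth.

    MB / (GB/s) = (1e6 B) / (1e9 B/s) = 1e-3 s, i.e. milliseconds.
    """
    return size_mb / bandwidth_gb_s


VRAM_GB_S = 448.0     # RTX 3060 Ti peak memory bandwidth
PCIE_GB_S = 32.0      # assumed round figure for PCIe 4.0 x16, one direction
FRAME_MS = 1000 / 60  # frame budget at 60 fps

chunk_mb = 64  # hypothetical amount of data that didn't fit in VRAM
vram_ms = transfer_ms(chunk_mb, VRAM_GB_S)
pcie_ms = transfer_ms(chunk_mb, PCIE_GB_S)
print(f"VRAM: {vram_ms:.3f} ms, PCIe: {pcie_ms:.3f} ms, "
      f"{pcie_ms / FRAME_MS:.0%} of a 60 fps frame")
```

Even at theoretical peak rates the PCIe path is around 14 times slower, which is why a renderer that keeps going back to data that has spilled out of VRAM gets hit so hard.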
For example, Doom Eternal uses single large memory pools for vertices and textures, and streams data in and out of them as required across numerous passes. Ordinarily it wouldn't be a massive problem if there wasn't enough room in local memory for them, but because the renderer accesses this data so frequently, its performance takes a much bigger hit than in other games.
Another aspect to note is that in Ampere, the ROPs are no longer tied to the L2 cache blocks and memory controllers. In Turing and earlier architectures, each ROP cluster is directly connected to an L2 cache block (512 kB in Turing), which in turn is connected to a memory controller. So the width of the memory bus dictated the number of ROPs (and thus the peak pixel read/write throughput).
In Ampere, the ROPs are part of the GPCs, so bus width and ROP count are independent of each other. This is why the RTX 3060 Ti has 80 ROPs in total (16 more than a 2080 Super). Coupled with its greater memory bandwidth, that should give it better read/write performance than the RTX 3060 across the board. And even in titles where the 3060's larger memory capacity (12 GB vs 8 GB) would be expected to come into play, neither model is really aimed at 4K max-quality settings.
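A simplified model of the difference: on Turing the ROP count follows the number of 32-bit memory controllers, while on Ampere it follows the number of enabled GPCs. The constants (8 ROPs per 32-bit controller on Turing, 16 ROPs per GPC on Ampere) and the per-card GPC counts and memory data rates are taken from the public specs of these three cards; treat it as an illustration rather than a rule for every SKU.

```python
ROPS_PER_CONTROLLER_TURING = 8  # per 32-bit memory controller (+ 512 kB L2 block)
ROPS_PER_GPC_AMPERE = 16        # ROPs live in the GPCs on Ampere


def turing_rops(bus_width_bits: int) -> int:
    """Turing: ROP count follows the number of 32-bit memory controllers."""
    return (bus_width_bits // 32) * ROPS_PER_CONTROLLER_TURING


def ampere_rops(active_gpcs: int) -> int:
    """Ampere: ROP count follows the number of enabled GPCs, not the bus."""
    return active_gpcs * ROPS_PER_GPC_AMPERE


def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth = bus width in bytes x per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


cards = {
    "RTX 2080 Super": (turing_rops(256), bandwidth_gb_s(256, 15.5)),
    "RTX 3060 Ti": (ampere_rops(5), bandwidth_gb_s(256, 14)),
    "RTX 3060": (ampere_rops(3), bandwidth_gb_s(192, 15)),
}
for name, (rops, bw) in cards.items():
    print(f"{name}: {rops} ROPs, {bw:.0f} GB/s")
```

Under the Turing model, a 256-bit bus caps you at 64 ROPs; decoupling them from the bus is what lets the 3060 Ti reach 80 on the same bus width, while the 3060's narrower 192-bit bus and fewer GPCs leave it behind on both counts.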