Could someone explain to me why the card with less memory suffers more when bus bandwidth is reduced?
I'm under the impression that the 4 GB card wouldn't be able to saturate the bus as well, because its own throughput would be inherently lower. I would expect a higher-performing card to suffer from reduced bandwidth sooner.
The reasons for this will vary from game to game, but if we take Shadow of the Tomb Raider as an example, we can get a good idea of what's going on.
For each frame presented on the monitor, there are an awful lot of separate processes that have to take place. Let's assume all of the required assets (vertex buffers, index buffers, and texture buffers) for the frame are already loaded into local memory - the graphics card's RAM. SotTR doesn't do much in the way of asset streaming, which is why this game is a good example to use here.
To render the frame, the GPU first draws out the scene's depth values, at the monitor's resolution, into a buffer that's stored in the RAM. The scene is then rendered again, generating two textures in the process - one contains the scene's normal map and a shadow mask, the other is a velocity map. Both of these are also stored in the RAM, but the important thing to note here is that they are larger than the monitor resolution.
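To put rough numbers on the "larger than the monitor resolution" point, here's a quick sketch of how those buffer sizes add up. The formats (32-bit depth, 8 bytes per texel for the other two) and the 1.25x scale factor are illustrative assumptions, not SotTR's actual settings:

```python
def buffer_mib(width, height, bytes_per_texel):
    """Size of one render target in MiB."""
    return width * height * bytes_per_texel / (1024 ** 2)

# 32-bit depth buffer at the monitor's resolution (1080p assumed)
depth = buffer_mib(1920, 1080, 4)

# normal-map/shadow-mask and velocity textures, assumed to use
# 8 bytes per texel and be 1.25x the monitor size on each axis
scale = 1.25
normal_mask = buffer_mib(int(1920 * scale), int(1080 * scale), 8)
velocity = buffer_mib(int(1920 * scale), int(1080 * scale), 8)

print(f"depth: {depth:.1f} MiB, normal+mask: {normal_mask:.1f} MiB, "
      f"velocity: {velocity:.1f} MiB")
```

Even at these modest assumptions, three buffers already cost around 60 MiB, and the frame has barely started.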
Then the primary lighting and shadowing are done, all of which gets rendered into another set of buffers, again stored in the RAM (the shadow map buffer is especially large). After that, the scene is rendered for the third time, generating three additional textures (HDR, albedo + roughness, normal + metallic). A fourth rendering pass follows to generate the SSAO (screen space ambient occlusion), and yet more buffers are required for this.
Rendering pass number five takes place after the SSAO pass to produce the screen space reflection texture (stored in RAM), before the final pass does the post-processing (blur, bloom, DoF, TAA, tone mapping, etc.) and the UI. All of this produces a final buffer that's actually HDR, so if the monitor is SDR, yet another buffer is required to tone-map the HDR output into.
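Tallying up every render target mentioned above gives a feel for how much VRAM one frame's intermediate buffers can claim before any actual game assets are counted. The sizes below are illustrative guesses at 1080p, not measured SotTR numbers:

```python
# Illustrative per-frame render targets (name -> MiB); all sizes are assumptions.
render_targets = {
    "depth": 8,
    "normals + shadow mask": 25,
    "velocity": 25,
    "shadow map": 64,
    "lighting buffers": 32,
    "HDR colour": 16,
    "albedo + roughness": 8,
    "normal + metallic": 8,
    "SSAO": 8,
    "screen-space reflections": 16,
    "post-processing / SDR output": 16,
}
total = sum(render_targets.values())
print(f"~{total} MiB of render targets per frame")
```

A couple of hundred MiB of render targets is a small slice of an 8 GB card, but on a 4 GB card it's a meaningfully larger share of the space that assets are also competing for.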
So, TL;DR version: the available space in the RAM is under serious demand throughout the rendering of a single frame. The buffers generated during the passes can't be stored in system RAM (well, they can, but no developer would ever let that happen), so if there isn't sufficient room, assets need to be swapped out. They're not sent back anywhere, as they're always kept in system RAM; it's simply a case of the video RAM being flagged as available, and if an asset is required but isn't in local memory, another asset's space may need to be reallocated and the new one copied over.
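That swap-out behaviour - flag VRAM as free, then copy the needed asset back in from system RAM on demand - can be sketched like this. Everything here (the class name, the sizes, the LRU choice of eviction victim) is a hypothetical illustration, not how any real driver works:

```python
from collections import OrderedDict

class VramResidency:
    """Toy model: assets always live in system RAM; VRAM holds copies."""
    def __init__(self, capacity_mib):
        self.capacity = capacity_mib
        self.resident = OrderedDict()  # asset name -> size MiB, in LRU order
        self.mib_copied = 0            # total pushed over the PCIe bus

    def request(self, name, size_mib):
        if name in self.resident:            # already in VRAM: just touch it
            self.resident.move_to_end(name)
            return
        # Evict least-recently-used assets until the new one fits.
        # Nothing is written back - the copy in system RAM is authoritative.
        while sum(self.resident.values()) + size_mib > self.capacity:
            self.resident.popitem(last=False)
        self.resident[name] = size_mib
        self.mib_copied += size_mib          # copy in from system RAM over PCIe

pool = VramResidency(capacity_mib=100)
for asset, size in [("rocks", 60), ("trees", 30), ("lara", 20), ("rocks", 60)]:
    pool.request(asset, size)
print(pool.mib_copied)  # every miss costs another PCIe transfer
```

Note how the second request for "rocks" misses (it was evicted to make room) and has to be copied over the bus again - that re-copying is the traffic that hammers a low-memory card.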
This is why low-RAM cards suffer the most: once assets start flying about during gameplay, the PCIe interface gets hit, and a narrower or slower bus makes every one of those transfers take longer.
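That last point is easy to quantify. Assuming round-number bus speeds (roughly PCIe 3.0 x16/x8/x4, ignoring protocol overhead) and an assumed 500 MiB of assets re-copied per second of gameplay, the time spent on transfers scales directly with link width:

```python
def transfer_ms(size_mib, bandwidth_gib_s):
    """Milliseconds to copy size_mib over a bus with the given bandwidth."""
    return size_mib / (bandwidth_gib_s * 1024) * 1000

evicted_per_second = 500  # MiB of assets re-copied each second (assumed)
for lanes, bw in [("x16", 16), ("x8", 8), ("x4", 4)]:
    ms = transfer_ms(evicted_per_second, bw)
    print(f"{lanes}: {ms:.1f} ms of bus time per second of gameplay")
```

On the x16 link that traffic costs about 30 ms per second; on x4 it's over 120 ms - time during which frames can stall waiting for assets, which is exactly the behaviour the benchmarks show.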