"Effectively, we began this step with three channels (Y, Cb, and Cr), and ended up keeping the information intact in one full channel (the brightness component, Y) and got rid of half the information in the other two channels (Cb and Cr). By going from 3 channels to 1 + ½ + ½ = 2, we are now effectively at 2 channels, or 66% of our original image!"
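I think this is wrong. In 4:2:0 subsampling (replacing the Cb and Cr values of each 2x2 block with a single value per channel), the result should be 50% of the original size, not 66%. Instead of storing 4 Y + 4 Cb + 4 Cr = 12 values per block, you store 4 Y + 1 Cb + 1 Cr = 6 values, and 6/12 = 50%. The 66% figure would only hold if the chroma channels were halved along one axis (4:2:2), where each 2x2 block keeps 4 Y + 2 Cb + 2 Cr = 8 of 12 values.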
For the 4x4 case, instead of storing 16 Y + 16 Cb + 16 Cr = 48 values, you store 16 Y + 1 Cb + 1 Cr = 18 values, which is 18/48 = 37.5% of the original size.
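
To double-check the arithmetic, here is a quick Python sketch. It only counts samples, under the assumption that every block keeps all of its Y values plus exactly one Cb and one Cr value; `stored_fraction` is a name I made up for this comment, not something from the article or any library.

```python
from fractions import Fraction

def stored_fraction(block_w: int, block_h: int) -> Fraction:
    """Fraction of the original samples kept when each block_w x block_h
    block keeps every Y sample but only one Cb and one Cr sample."""
    pixels = block_w * block_h
    original = 3 * pixels   # one Y, one Cb and one Cr per pixel
    kept = pixels + 2       # all Y samples, plus one Cb and one Cr per block
    return Fraction(kept, original)

# 2x1 blocks correspond to 4:2:2, 2x2 blocks to 4:2:0.
for w, h in [(2, 1), (2, 2), (4, 4)]:
    frac = stored_fraction(w, h)
    print(f"{w}x{h}: {frac} = {float(frac):.1%} of the original")
```

This counts raw samples only. It says nothing about the final file size, since the later JPEG stages (DCT, quantization, entropy coding) change the numbers further.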