Your regular DRAM memory controller pulls in a full 64B line of memory per access (on DDR4, a burst of eight transfers over a 64-bit bus). On modern x86 systems, that's exactly one cache line.
If you only access a single 4B or 8B element of that cache line more than 80% of the time, as shown on the slides, you are wasting at least 7/8 of your memory bandwidth on irrelevant data (15/16 in the 4B case). If you instead used that 64B transfer to fetch eight different 8B memory locations, you would get a large boost in effective bandwidth.
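To put numbers on the waste, here is a rough back-of-the-envelope sketch. It assumes the 64B line size mentioned above and an idealized gather where every byte transferred is useful; the function names are made up for illustration, and real hardware would not reach this upper bound.

```python
LINE_BYTES = 64  # cache line / DRAM access size assumed in the text


def useful_fraction(element_bytes, line_bytes=LINE_BYTES):
    """Fraction of a fetched line that is used when only one element is touched."""
    return element_bytes / line_bytes


def ideal_gather_gain(element_bytes, line_bytes=LINE_BYTES):
    """Idealized upper bound: how many useful elements fit in one line-sized transfer."""
    return line_bytes // element_bytes


for size in (4, 8):
    used = useful_fraction(size)
    print(f"{size}B element: {used:.4f} of the line used, "
          f"{1 - used:.4f} wasted, ideal gain {ideal_gather_gain(size)}x")
```

For 8B elements this gives 1/8 used and 7/8 wasted, matching the figure above; for 4B elements the waste grows to 15/16.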