mmap() will keep things in memory after first loading, but the page cache will _also_ keep things in memory after first loading. The difference is in order to re-use that you still need to read the file and store yourself (requiring 2x memory), instead of just doing a memory access. This has two consequences:
* 2x memory. A 20G data set requires 40G (20 for page cache and 20 for LLaMA)
* Things would be _even slower_ if they weren't in page cache after first loading. mmap is fast because it does not require a copy and reduces the working set size
* 2x memory. A 20G data set requires 40G (20 for page cache and 20 for LLaMA)
* Things would be _even slower_ if they weren't in page cache after first loading. mmap is fast because it does not require a copy and reduces the working set size