The footguns can be solved in part by the type-system (preventing certain types ...

simias · on Jan 23, 2021

>How else would you lazy-load a database of (say) 32GB into memory, almost instantly?

By using an existing database engine that will do it for me. If you need to deal with that amount of data and performance is really important you have a lot more to worry about than having to use unsafe blocks to map your data structures.

Maybe we just have different experiences and work on different types of projects but I feel like being able to seamlessly dump and restore binary data transparently is both very difficult to implement reliably and quite niche.

Note in particular that machine representation is not necessarily the most optimal way to store data. For instance any kind of Vec or String in rust will use 3 usize to store length, capacity and the data pointer which on 64 bit architectures is 24 bytes. If you store many small strings and vectors it adds up to a huge amount of waste. Enum variants are also 64 bits on 64 bit architectures if I recall correctly.

For instance I use bincode with serde to serialize data between instances of my application, bincode maps almost 1:1 the objects with their binary representation. I noticed that by implementing a trivial RLE encoding scheme on top of bincode for running zeroes I can divide the average message size by a factor 2 to 3. And bincode only encodes length, not capacity.

My point being that I'm not sure that 32GB of memory-mapped data would necessarily load faster than <16GB of lightly serialized data. Of course in some cases it might, but that's sort of my point, you really need to know what you're doing if you decide to do this.

burntsushi · on Jan 23, 2021

> How else would you lazy-load a database of (say) 32GB into memory, almost instantly?

That's what the fst crate[1] does. It's likely working at a lower level of abstraction than you intend. But the point is that it works, is portable and doesn't require any cooperation from the OS other than the ability to memory map files. My imdb-rename tool[2] uses this technique to build an on-disk database for instantaneous searching. And then there is the regex-automata crate[3] that permits deserializing a regex instantaneously from any kind of slice of bytes.[4]

I think you should maybe provide some examples of what you're suggesting to make it more concrete.

[1] - https://crates.io/crates/fst

[2] - https://github.com/BurntSushi/imdb-rename

[3] - https://crates.io/crates/regex-automata

[4] - https://docs.rs/regex-automata/0.1.9/regex_automata/#example...