Great lecture so far. I haven't had time to watch the whole thing, but one thing I want to mention is that there are techniques to improve buffer manager performance, for instance as described here by Goetz Graefe:
http://www.vldb.org/pvldb/vol8/p37-graefe.pdf
I've implemented an even simpler solution for my open source data store (https://sirix.io): each page stores a number of references, which are themselves lightweight pointer objects (in Java). Each reference holds an in-memory reference to the page as well as a pointer to the location to fetch it from on disk or a flash drive. If the buffer manager uses these objects as keys, then on eviction we can simply null the reference to the in-memory page instance.
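To make that a bit more concrete, here is a minimal sketch of the idea in Java. It is not the actual Sirix code: the names `Page`, `PageReference`, `BufferManager`, `diskOffset` and `readFromDisk` are all made up for illustration, the "disk read" is faked, and the FIFO eviction order is just an assumption to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PageReferenceSketch {

    /** Hypothetical stand-in for a deserialized page. */
    static final class Page {
        final long key;
        Page(long key) { this.key = key; }
    }

    /** Lightweight pointer: where the page lives on disk, plus the page itself if resident. */
    static final class PageReference {
        final long diskOffset; // hypothetical on-disk location
        Page inMemory;         // null while the page is not in the buffer
        PageReference(long diskOffset) { this.diskOffset = diskOffset; }
    }

    /** Tiny buffer manager keyed by the reference objects themselves (identity hashing). */
    static final class BufferManager {
        private final int capacity;
        int diskReads = 0; // counts simulated reads, only for the demo
        private final LinkedHashMap<PageReference, Page> resident;

        BufferManager(int capacity) {
            this.capacity = capacity;
            // Insertion-ordered map, so the eldest entry is the page loaded first (FIFO).
            this.resident = new LinkedHashMap<PageReference, Page>() {
                @Override
                protected boolean removeEldestEntry(Map.Entry<PageReference, Page> eldest) {
                    if (size() > BufferManager.this.capacity) {
                        // Eviction: the key is the reference, so just null its in-memory pointer.
                        eldest.getKey().inMemory = null;
                        return true;
                    }
                    return false;
                }
            };
        }

        Page get(PageReference ref) {
            Page page = ref.inMemory;
            if (page != null) {
                return page; // fast path: no hash lookup at all
            }
            page = readFromDisk(ref.diskOffset);
            ref.inMemory = page;
            resident.put(ref, page); // may evict the eldest resident page
            return page;
        }

        /** Simulated I/O; a real store would deserialize the page at this offset. */
        private Page readFromDisk(long offset) {
            diskReads++;
            return new Page(offset);
        }
    }

    public static void main(String[] args) {
        BufferManager buffer = new BufferManager(2);
        PageReference a = new PageReference(0L);
        PageReference b = new PageReference(4096L);
        PageReference c = new PageReference(8192L);
        buffer.get(a);
        buffer.get(b);
        buffer.get(c); // exceeds capacity, so a is evicted and a.inMemory becomes null
        System.out.println("a resident: " + (a.inMemory != null));
        System.out.println("disk reads: " + buffer.diskReads);
    }
}
```

The point of the sketch is the hit path: following a reference to a resident page is a plain field read, and eviction does not have to search for whoever points at the page because the map key is that pointer.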
I find it hard to reconcile the incredible generosity of making this material available for free on the internet with the cringeworthiness of Andy Pavlo's style. I love the material but the "6th form humour" is really off-putting and he doesn't need it.
Nobody needs anything. You didn't need to write your comment. And neither did I have to write this one. That's a terrible criterion to judge anything by, if it even means anything sensible at all.
I have to agree with the parent poster: the skits do take away from the content. I'm grateful the author was so generous to share his knowledge with the world, but the skits are distracting and off-putting.
Ha! It really comes down to personal preference. A distinctive style of presenting material is what really makes a lecture engaging to me. It also works as a selection filter, given the abundance of content online.
Personally, I enjoy the material on databases—but if I’m being honest, the main reason I watch the first lecture is to find out what sort of trouble Andy Pavlo has gotten into.
Andy’s gags are a tiny, tiny fraction of the material. Like, ~5min out of 10+ hours of lectures in a given class. They’re really easy to skip over if you don’t like them.
I've watched some of these and the material and teacher are awesome. Two questions come to mind:
1. A ton of effort seems to be spent on making things run in parallel, but that introduces quite a bit of overhead too, so how well does a sequential baseline actually perform? By sequential baseline I mean a single thread that just executes all incoming transactions one by one in sequence.
2. This course seems to spend a lot of time on things that the teacher says are things you shouldn't do anyway. For instance there is an entire lecture on skip lists and Bw-trees, and at the end the teacher mentions that these are terrible. This is interesting from a historical perspective, but not only does this take a lot of time, I also lose track of which things you should and which things you shouldn't do. It'd be interesting to have a compressed course that spends less time on things you should not do, perhaps by adding annotations to the video to skip sections that are about things you should not do.
> A ton of effort seems to be spent on making things run in parallel, but that introduces quite a bit of overhead too, so how well does a sequential baseline actually perform? By sequential baseline I mean a single thread that just executes all incoming transactions one by one in sequence.
You should check out the H-Store research project[1] and its commercial successor VoltDB. They’re basically a study in how much you can win with a federation of single-threaded database systems.