You've just invented the field of program synthesis, which is a super active research area right now in programming languages! Here's the paper list from a class at UW focused on this area: http://courses.cs.washington.edu/courses/cse599a2/15wi/ . The Sumit Gulwani paper from the first week is a really good introduction to the field.
Speaking from academic computer science, I have never heard of anyone deciding not to work on something for this reason--perhaps because if an idea could immediately be turned into a successful company, it's probably not good research! And most (all?) CS PhD programs in the USA (which award an MS in passing) guarantee full funding for a number of years, and most successful students are funded the whole time.
You absolutely shouldn't feel "mentally inferior" to folks with PhDs. Doing academic research, which probably involves getting a PhD, is almost certainly the best way to make a significant contribution to Computer Science. These two facts aren't really in conflict, I don't think.
We are reading! This is a great question. We currently don't prove anything about liveness. We'd love to work on this.
As you probably know, Raft and other consensus algorithms are not guaranteed to be live in all situations. But subject to some assumptions about the frequency of failure, they are guaranteed to make progress.
In Verdi, systems are verified with respect to semantics for the network they are running on. Our semantics currently don't include any restrictions about how often failures can happen; a failure "step" can occur at any time. We're not sure what the best way is to introduce this kind of restriction, but we've got a couple ideas. One would be to guarantee that the total number of failures has some finite bound which is unknown to the system itself but which is available during verification. Another would be to model failure probabilistically. We will probably end up doing at least one of these things in the next year or so :).
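The first idea above can be sketched concretely. This is a hypothetical Python model (not Verdi's actual Coq semantics): a failure "step" may fire at any point in an execution, but the total number of failures is drawn from a finite bound that the verified system never observes--only the proof does.

```python
# Hypothetical sketch of a network semantics with a finite failure budget.
# The bound is a parameter of the semantics (visible during verification),
# not something the running system can query.

class Network:
    def __init__(self, nodes, failure_bound):
        self.nodes = set(nodes)
        # Known to the proof, unknown to the system itself.
        self.failures_left = failure_bound

    def step_deliver(self, dst, msg, handler):
        """Normal step: deliver a message to a live node's handler."""
        if dst in self.nodes:
            handler(msg)

    def step_fail(self, node):
        """Failure step: may occur at any time, but only finitely often."""
        if self.failures_left > 0 and node in self.nodes:
            self.nodes.discard(node)
            self.failures_left -= 1
            return True
        # Budget exhausted (or node already down): no failure occurs,
        # which is what makes liveness arguments possible.
        return False
```

Under this kind of semantics, a liveness proof could case-split on whether the budget is exhausted yet, since eventually it must be.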
Thanks. That sounds like extremely interesting research. Being able to say things about liveness relative to probability of failure (or the distribution of probability of failure) would be very interesting.
Hey, I'm Doug Woos--thanks for this excellent summary! It's worth noting that the Raft proof was completed by a team of people, including me, my research partner James Wilcox (http://homes.cs.washington.edu/~jrw12/), and the other folks listed on our web page at http://verdi.uwplse.org/ .
Apologies to your research partner for leaving him out; it's edited! Congratulations on the proof, by the way--I'm looking forward to the weekend so I can have some time to appreciate it in more depth! :-)
Thanks! Let us know if you have any questions about the proof itself--unfortunately, it's not particularly well-documented, and Coq isn't super easy to read in the first place.
At this point, the standard intro text is Software Foundations [1]. I highly recommend it; it will teach you Coq and also probably make you a better programmer. After SF, Certified Programming with Dependent Types [2] gets more into the practice of proving serious programs correct. These books are both available online in the form of literate Coq files.
As far as online lectures, OPLSS [3] often has Coq lectures which are quite good.
One could imagine doing some kind of differential testing of another key-value store against vard, checking that the two return the same values over a large stream of random requests. I'm not sure that doing that would gain you anything over using Jepsen [1], though. One could also imagine implementing another key-value store in Verdi and proving it equivalent to vard; reasons to do this might include using better data structures or serialization mechanisms, or a different consensus protocol. I'm not sure that this would require significantly less effort than proving this new key-value store (usrd? whittend? :)) correct from scratch, but I'd love to see someone do this!
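The differential-testing idea is straightforward to sketch. Here `Store` is a hypothetical in-memory stand-in; in practice one side would be a vard client and the other the system under comparison.

```python
import random

# Hedged sketch of differential testing: fire the same random stream of
# requests at two key-value stores and check that their answers agree.

class Store:
    """Trivial in-memory key-value store standing in for a real client."""
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value
        return value

    def get(self, key):
        return self.data.get(key)

def differential_test(store_a, store_b, num_requests=10000, seed=0):
    rng = random.Random(seed)  # fixed seed so failures are reproducible
    for _ in range(num_requests):
        key = rng.choice("abcdefgh")
        if rng.random() < 0.5:
            value = rng.randint(0, 99)
            ra, rb = store_a.put(key, value), store_b.put(key, value)
        else:
            ra, rb = store_a.get(key), store_b.get(key)
        assert ra == rb, f"divergence on key {key!r}: {ra} != {rb}"
    return True
```

Note this only exercises sequential, single-client behavior; the hard bugs in replicated stores involve concurrency and faults, which is exactly where a Jepsen-style harness (or a proof) earns its keep.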
Your objections to MongoDB's model seem reasonable, but I don't see any evidence in either this comment or the linked blog post that DocumentDB is better (especially in the absence of benchmarks). What is this "battle-tested distributed systems framework"? Several of your complaints about MongoDB have to do with the interaction between persistence to disk and replication to the network; as the Multi-Paxos algorithm does not specify when data should be written to disk (much less what the format should be), what reason is there to believe that DocumentDB does this any better?
I'm totally willing to believe that DocumentDB beats the pants off MongoDB on just about every axis (in fact, that seems pretty likely), but it's going to take some actual numbers and a better description of the internals.
I agree with you - we need numbers before making that kind of conclusion and I haven't run any benchmarks on the public version of DocDB. I'd like to see someone measure MongoDB on Azure vs DocDB on Azure - even then it might not be a fair measurement of db vs. db, since we don't know what machines DocDB is hosted on.
All I can really say is that the replication model provides a significant performance boost over MongoDB in the multiple replica (i.e., production) scenario.
We were using MongoDB at Microsoft for a while (I left MS almost a year ago). I was developing a real-time metrics system with it. It was very unstable at our target load (500k increments per minute, high percentage of tomorrow's documents preallocated the day before). We only managed maybe 10% of that with MongoDB, IIRC. Sometimes it would choke and not come back until I restarted the cluster (~30 machines total, I believe. 3 replicas * 10 shards).
We were so sure that MongoDB should be able to handle this scenario, since they talk about it in their documentation. After talking with the MongoDB devs, we came to the conclusion that even though we were issuing increment operations on preallocated documents, MongoDB was:
a) using a global lock on the "local" db used for replication, and
b) "replicating via disk" instead of via the network. In other words, replication requires writing to the journal before other members of the replica set have a chance to apply the change and ack back. This results in a loss of concurrency.
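To make the concurrency loss in (b) concrete, here is a back-of-the-envelope model (the latency numbers are made up for illustration): if replicas can only pull an op after it hits the journal, the disk write and the network round-trip are serialized; replicating over the network lets them overlap.

```python
# Hypothetical latency model for "replicate via disk" vs. "replicate
# via network". Numbers are illustrative assumptions, not measurements.

JOURNAL_MS = 10  # assumed journal (disk) write latency
NETWORK_MS = 2   # assumed replica round-trip latency

def ack_latency_via_disk():
    # Replicas see the op only after the journal write completes,
    # so the two latencies add up.
    return JOURNAL_MS + NETWORK_MS

def ack_latency_via_network():
    # Journal write and replication proceed concurrently; the ack
    # waits only on whichever finishes last.
    return max(JOURNAL_MS, NETWORK_MS)
```

Under these assumptions the network path acks in 10ms versus 12ms, and the gap compounds under load because the journal becomes a serialization point for every replicated write.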
The lack of async query support in the C# driver didn't help either.
Eventually we used a replicated, write-back cache which sits atop the framework DocDB uses. Not a fair comparison, but the goal was achieved easily with 1/3rd the hardware. We just backed it onto Azure Table Storage. Our queries were all range queries, which table storage supports.
Did you read the whole article? If not, go ahead and read the part about pressuring the intern into giving his assistant oral sex while he (Kennedy) watched. That is absolutely not "normal behavior".
I'm honestly a bit scared if you view this as normal. I would really consider seeing a professional, if I were you, just to get those assumptions of yours checked.
They might agree with you, in which case I'm wrong.