
This is a pretty good blog post (based on a really good paper) that describes the fundamental differences between interpreters and compilers: http://blog.sigfpe.com/2009/05/three-projections-of-doctor-f...


You've just invented the field of program synthesis, which is a super active research area right now in programming languages! Here's the paper list from a class at UW focused on this area: http://courses.cs.washington.edu/courses/cse599a2/15wi/ . The Sumit Gulwani paper from the first week is a really good introduction to the field.


Speaking from academic computer science, I have never heard of anyone making the decision not to work on something for this reason--perhaps because if an idea could immediately be turned into a successful company, it's probably not good research! And most (all?) CS PhD programs (which include an MS in passing in the USA) guarantee full funding for a number of years and most successful students are funded the whole time.


You absolutely shouldn't feel "mentally inferior" to folks with PhDs. Doing academic research, which probably involves getting a PhD, is almost certainly the best way to make a significant contribution to Computer Science. These two facts aren't really in conflict, I don't think.


Not at all, and I'm skeptical any PhD (myself included) would ever say otherwise.


We are reading! This is a great question. We currently don't prove anything about liveness. We'd love to work on this.

As you probably know, Raft and other consensus algorithms are not guaranteed to be live in all situations. But subject to some assumptions about the frequency of failure, they are guaranteed to make progress.

In Verdi, systems are verified with respect to semantics for the network they are running on. Our semantics currently don't include any restrictions about how often failures can happen; a failure "step" can occur at any time. We're not sure what the best way is to introduce this kind of restriction, but we've got a couple ideas. One would be to guarantee that the total number of failures has some finite bound which is unknown to the system itself but which is available during verification. Another would be to model failure probabilistically. We will probably end up doing at least one of these things in the next year or so :).
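To make that first idea a bit more concrete, here's roughly the shape I have in mind (a minimal Coq sketch with made-up names, not our actual Verdi definitions): pair the world state with a failure budget that the semantics tracks but the system itself never reads.

    Section BoundedFailures.
      (* Hypothetical stand-ins for Verdi's real network semantics. *)
      Variable World : Type.
      Variable deliver : World -> World.  (* one ordinary protocol step *)
      Variable crash   : World -> World.  (* the effect of one failure *)

      (* Worlds are paired with a failure budget n. Proofs can case on n,
         but no system code ever inspects it. *)
      Inductive step : World * nat -> World * nat -> Prop :=
      | StepDeliver : forall w n, step (w, n) (deliver w, n)
      | StepFail    : forall w n, step (w, S n) (crash w, n).
    End BoundedFailures.

A liveness proof could then induct on the budget: once it hits zero, only StepDeliver steps are available, so the usual progress arguments apply.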


Thanks. That sounds like extremely interesting research. Being able to say things about liveness relative to probability of failure (or the distribution of probability of failure) would be very interesting.


Hey, I'm Doug Woos--thanks for this excellent summary! It's worth noting that the Raft proof was completed by a team of people, including me, my research partner James Wilcox (http://homes.cs.washington.edu/~jrw12/), and the other folks listed on our web page at http://verdi.uwplse.org/


Apologies to your research partner for leaving him out; it's edited! Congratulations on the proof, by the way. I am looking forward to this weekend so I can have some time to appreciate it in more depth! :-)


Thanks! Let us know if you have any questions about the proof itself--unfortunately, it's not particularly well-documented, and Coq isn't super easy to read in the first place.


Can you recommend any (1) books and (2) online lectures/courses on proofs which use Coq? Thanks!


At this point, the standard intro text is Software Foundations [1]. I highly recommend it; it will teach you Coq and also probably make you a better programmer. After SF, Certified Programming with Dependent Types [2] gets more into the practice of proving serious programs correct. These books are both available online in the form of literate Coq files.

As far as online lectures, OPLSS [3] often has Coq lectures which are quite good.

[1] http://www.cis.upenn.edu/~bcpierce/sf/current/index.html

[2] http://adam.chlipala.net/cpdt/

[3] https://www.cs.uoregon.edu/research/summerschool/summer15/
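If you want a taste of the style SF teaches before diving in, it's roughly this: define a function, state a theorem about it, and prove it by induction. Something like the following (written from memory, so treat it as a sketch rather than the book's exact code):

    Fixpoint double (n : nat) : nat :=
      match n with
      | O => O
      | S n' => S (S (double n'))
      end.

    Theorem double_plus : forall n : nat, double n = n + n.
    Proof.
      induction n as [| n' IH].
      - reflexivity.
      - simpl. rewrite IH. rewrite <- plus_n_Sm. reflexivity.
    Qed.

SF builds up to much more interesting theorems (program equivalences, type soundness) very gradually, with exercises at every step.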


Thanks!


What would be the process to show that another key-value store satisfies all the specifications, so that it is comparable to vard?


One could imagine doing some kind of differential testing of another key-value store against vard to ensure that for a random but large number of requests, they return the same value. I'm not sure that doing that would gain you anything over using Jepsen [1], though. One could also imagine implementing another key-value store in Verdi and proving it equivalent to vard; reasons to do this might include using better data structures or serialization mechanisms, or a different consensus protocol. I'm not sure that this would require significantly less effort than proving this new key-value store (usrd? whittend? :)) correct from scratch, but I'd love to see someone do this!

[1] https://github.com/aphyr/jepsen
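To sketch what "prove it equivalent" could even mean: at the most naive level you can model each store as a function from a trace of requests to a trace of responses and demand that the outputs agree (the types here are made up for illustration; Verdi's actual interfaces are network semantics, not pure functions):

    Definition trace_equivalent {Req Resp : Type}
        (vard other : list Req -> list Resp) : Prop :=
      forall reqs : list Req, vard reqs = other reqs.

The real work would be phrasing this as a refinement between the two systems' network semantics, which is exactly the part I'd love to see someone attempt.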


Investigating theft takes up police time, which expends taxpayer resources. The market wouldn't take this into account.

Also, "doesn't California have bigger issues to worry about" is just silly, as if the government of an enormous state can only do one thing at a time.


Police don't investigate stolen phones.


They do investigate violent crime. At least theoretically.

http://www.crimemapping.com/map.aspx?ll=-13620775.579218755,...


What's to say they wouldn't if stolen phones were less common?


Your objections to MongoDB's model seem reasonable, but I don't see any evidence in either this comment or the linked blog post that DocumentDB is better (especially in the absence of benchmarks). What is this "battle-tested distributed systems framework"? Several of your complaints about MongoDB have to do with the interaction between persistence to disk and replication to the network; as the Multi-Paxos algorithm does not specify when data should be written to disk (much less what the format should be), what reason is there to believe that DocumentDB does this any better?

I'm totally willing to believe that DocumentDB beats the pants off MongoDB on just about every axis (in fact, that seems pretty likely), but it's going to take some actual numbers and a better description of the internals.


I agree with you - we need numbers before making that kind of conclusion and I haven't run any benchmarks on the public version of DocDB. I'd like to see someone measure MongoDB on Azure vs DocDB on Azure - even then it might not be a fair measurement of db vs. db, since we don't know what machines DocDB is hosted on.

All I can really say is that the replication model provides a significant performance boost over MongoDB in the multiple replica (i.e., production) scenario.

We were using MongoDB at Microsoft for a while (I left MS almost a year ago). I was developing a real-time metrics system with it. It was very unstable at our target load (500k increments per minute, high percentage of tomorrow's documents preallocated the day before). We only managed maybe 10% of that with MongoDB, IIRC. Sometimes it would choke and not come back until I restarted the cluster (~30 machines total, I believe. 3 replicas * 10 shards).

We were so sure that MongoDB should be able to handle this scenario, since they talk about it in their documentation. After talking with the MongoDB devs, we came to the conclusion that even though we were issuing increment operations on preallocated documents, MongoDB was:

a) using a global lock on the "local" db used for replication, and

b) "replicating via disk" instead of via the network. In other words, replication requires writing to the journal journal before other members of the replica set have a chance to apply the change and ack back. This results in a loss of concurrency.

The lack of async query support in the C# driver didn't help either.

Eventually we used a replicated, write-back cache which sits atop the framework DocDB uses. Not a fair comparison, but the goal was achieved easily with 1/3rd the hardware. We just backed it onto Azure Table Storage. Our queries were all range queries, which table storage supports.

I can't talk about the framework, unfortunately.


next time you need fast counters, try hypertable (non-reading increments)


It would still make sense to use the replicated write-back cache to avoid trips to disk. We were considering replacing MongoDB with Cassandra, though.

I wanted to avoid having to deploy and maintain a database system, so using table storage was a solid choice.


Seems like this is pretty comparable to Vagrant using the Docker backend. Anyone have thoughts on pros and cons of each?


[deleted]


The parent didn't say "Vagrant," they said "Vagrant using the Docker backend."


Did you read the whole article? If not, go ahead and read the part about pressuring the intern into giving his assistant oral sex while he (Kennedy) watched. That is absolutely not "normal behavior".


This is standard stuff for people in power, don't you read the news?

This includes sports players, and it's not psychopathy that helps them succeed; it's drive, if anything.

Or are you confusing normal with 21st-century Western morals?


I'm honestly a bit scared if you view this as normal. I would really consider seeing a professional, if I were you, just to get those assumptions of yours checked.

They might agree with you, in which case I'm wrong.


There's a big difference between thinking something is normal and thinking it is right. Normal != right.

