Before we go too deep into Zed-hating, I’ve got to say I am psyched for Mongrel2, and I don't even have a particular use in mind for it.
Aside from the fact that it'll undoubtedly be a great piece of software, it's informative and interesting to follow along with his progress on his blog and on his fossil repo.
And if what he promises in terms of features and performance comes to fruition, it'll be a fscking awesome server/gift to hack on and hack with.
Zed, do you think you could have written Mongrel2 in C, and done a good job of it, without already having had the experience with Mongrel v1? What I'm trying to get at is: are we just seeing another example of scripting languages being good for prototyping, but getting better raw performance (memory footprint/speed/disk usage) from a lower-level language? Or do you think you could have ended up with a decent architecture on a first go?
I'm not Zed, but I have taken this approach several times (primarily Ruby->Objective-C, but also Ruby->C). I often find it helpful to validate an idea in code quickly, and rewrite for speed/scalability/platform integration/etc. once I'm happy with a design.
Yes, actually I do this all the time. While I think the total effort of a web server is about the same in most languages, an initial prototype is way easier in a language like Python or Ruby. That's mostly because there's much less ceremony involved and all the bits you need are right there.
As I work on the prototype, though, I start to figure out whether the language I picked at first will work well for it or not. Since I usually throw my first prototype out, with a fixed time to get it working, I'll end up picking a different language or not depending on my experience so far.
I'm not Zed either, but it's worth noting that you can usually do this without having to throw away the higher-level language entirely - most scripting languages have C APIs so that you can profile and move the expensive parts to C. Sometimes it's necessary to change the architecture of the entire project, but more localized changes are often good enough.
Lua and Tcl were designed with this approach in mind (they expect to be embedded), but you can do it with Python, Ruby, Lisp, Erlang, etc. too.
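For instance, here's a minimal sketch of the "move the hot path to C" approach with an embedded Lua interpreter (hypothetical function names, standard Lua C API):

    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>

    /* the "expensive part", rewritten in C and exposed to Lua */
    static int l_add(lua_State *L) {
        double a = luaL_checknumber(L, 1);
        double b = luaL_checknumber(L, 2);
        lua_pushnumber(L, a + b);
        return 1;                                /* one value returned to Lua */
    }

    int main(void) {
        lua_State *L = luaL_newstate();
        luaL_openlibs(L);
        lua_register(L, "cadd", l_add);          /* callable from Lua as cadd() */
        luaL_dostring(L, "print(cadd(1, 2))");   /* the rest of the app stays in Lua */
        lua_close(L);
        return 0;
    }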
Can you describe your experience porting from whichever approach you were using for strings over to bstring? I'm curious what you thought about it.
It sucked, and I should have just started using bstring right away, but I was going with the inspiration at the time and wanted to get something working to see if the idea of HTTP->0mq was viable. As you can see, it didn't take very long to switch at that stage, just a couple days. Now if I'd waited it would have been a total nightmare and probably impossible.
First up, I had some unit tests, which are essential if you want to be able to pivot like this suddenly. Second, I knew a lot about the tools I could use so I had a clear idea of how to attack the problem. Finally, it's just a matter of picking the smallest little piece, converting it, and then stepping through all the parts that piece touches. Kind of like spreading a virus.
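To make that concrete, one tiny converted piece might look like this with bstrlib (a hypothetical sketch, not actual Mongrel2 code):

    #include <stdio.h>
    #include "bstrlib.h"   /* Paul Hsieh's Better String Library */

    int main(void) {
        /* replaces a raw char* plus manual strlen/realloc bookkeeping */
        bstring header = bfromcstr("Content-Type: ");
        bcatcstr(header, "text/plain");            /* safe append, no overflow */
        printf("%s (%d bytes)\n", bdata(header), blength(header));
        bdestroy(header);
        return 0;
    }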
"Despite all the evidence that C is quite alright for many applications, and that no, the world doesn't end because of its use"
The world only doesn't end because people are so used to the terrible security record of C apps that they don't even blink when new remote code execution vulnerabilities are found daily. Perhaps someday we'll realize it doesn't have to be this way.
Also, of course the Boehm GC isn't very good. It's ancient and in no way representative of the latest technology in garbage collection. GC can only really be good when it's integrated with the language. A precise generational collector would perform far better but wouldn't be so easy to drop into a C codebase.
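(For reference, "dropping it into a C codebase" is about this much work -- a hypothetical sketch using libgc:)

    #include <gc.h>         /* Boehm-Demers-Weiser collector */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        GC_INIT();
        char *s = GC_MALLOC(64);     /* drop-in replacement for malloc() */
        strcpy(s, "reclaimed when unreachable");
        printf("%s\n", s);
        return 0;                    /* no free(); the collector handles it */
    }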
Indeed. What Zed demonstrates by writing stuff in C is that he is good at cutting off his nose to spite his face. The reason mongrel2 works so well is because it has made good design decisions, not because it's written in C. I've played with similarly-designed servers written in Perl (Coro/EV), and they are so fast that the disk is the limiting factor for file serving (and use so few resources on top of your web app that you don't even know you have a server). Good design, not language choice, is what makes for good software.
(In other words, the conclusion "C is fast and good for apps" is not what you should reach from watching the mongrel2 development. "You can use any language if you have good design and a clue," is.)
I would have picked Haskell. Then I could drop to C when needed, but probably not ever need to. But then what would there have been to blog about? :)
You're right, it doesn't matter too much what language you do a piece of software in; it's more the design. BUT! When the language tries to make sockets way more complicated than needed, screws with memory management to the point of uselessness, or crams its little version of "stateless" false dogma down your throat, it sure doesn't make it easy.
Alright then, if you'd do it in Haskell, why haven't you? I've spent about 2-3 weeks so far on Mongrel2 and it already does quite a lot. I figure, hey, Haskell's supposedly so much more bad ass, let's see you get it done. In fact, I'll bet you don't even use any of these Haskell web servers as your main web server:
Thus my point. Language fetishists become distracted by the hype and dogma of their favorite language tribe and then become blind to the evidence. Evidence so far says C makes good web servers. Web servers in other languages tend to need to use C. Lots of great software powers the world and is written in C. Using C does not cause the death of the universe. It isn't hard, or scary, it's actually pretty simple.
After a while, you have to start wondering, is it true that the language is so dangerous, or is it just more marketing?
It's not more marketing, Zed, it's years and years and years of evidence that poorly-managed memory is a major source of error in design, and folks in the application development and security fields trying to pass along a message to avoid it if possible. State is also a major source of error in design, which is why stateless dogmatists keep hammering on it. You can get good design without them, and some people do. But everyone makes mistakes sometimes. Picking technologies that help you avoid making them isn't marketing, it's just good sense.
People have a bad habit of calling any received wisdom they disagree with "marketing". There are people out there far smarter than I am, and possibly smarter than you are (though I'm not in a position to judge), who disagree strongly with what you say, and they're not marketeers. No need to go down that route in this debate.
Bad programmers are the cause of everything you said, not the tools they use. For example, if you compile a language to machine code and use a linker, then I can inject memory errors and violate your view of state. Unless you have link time assertions, your carefully crafted "safe state free" languages aren't.
"State is also a major source of error" is an unsupported claim. Show me any usability study that confirms removing state improves a programmer's source and understanding of the solution.
The truth is, all of the concepts of state free, type systems, assertions, and design by contract are just specific limited cases of a more flexible assertion system. Type systems are a compile time assertion that you are getting a type you expect. Design By Contract are runtime assertions about state of the system. So is assert in C. Removal of state like in Haskell is just a very restrictive design by contract alternative compile time assertion.
If all of these are specific cases of a general assertion system, then none of them cover the complete set of assertions you'd need to truly make "safer" software. You'd need link time assertions (dll hashes and version required, function prototypes required), compile time assertions more flexible than types, and runtime assertions like design by contract.
Unless your language has all of those in an optional way (which would be a pain in the ass to use) then your language is truly no better than C.
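To illustrate the compile-time vs. runtime split in plain C (a hypothetical sketch; the static assert needs C11):

    #include <assert.h>

    /* compile-time assertion: roughly what a type check buys you */
    _Static_assert(sizeof(int) >= 4, "need at least 32-bit ints");

    /* runtime assertions: the Design by Contract style of check */
    int withdraw(int balance, int amount) {
        assert(amount > 0);              /* precondition */
        assert(amount <= balance);       /* precondition */
        int result = balance - amount;
        assert(result >= 0);             /* postcondition */
        return result;
    }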
I agree that not all of the things you list necessarily reduce errors, or that those that do necessarily increase productivity over the long run, because you sort of straw-manned me a little bit there. Me, I don't love strong type systems. But garbage collection doesn't fall into that category.
As for state, the concerns there are confined primarily to concurrent systems, and about a gazillion people have discussed its problems with greater eloquence than I can muster (Joshua Bloch comes to mind). But you're a great programmer--no sarcasm intended--and if you say you can produce a great concurrent system and still carry around a ton of state then I believe you. I'm genuinely looking forward to the fruits of your labor. Most people, myself included, can't.
The goal isn't to twist yourself or your users into a pretzel trying to build a language that prevents any error whatsoever from occurring, as far as I'm concerned. The goal is to produce a language that shrinks the potential for bad stuff on the downside and encourages good practices (and stays out of your way) on the upside. C did, in its day, accomplish both things admirably, and for some tasks it remains the best tool. It does not, however, follow that all of its shortcomings are in fact virtues, and that those who say otherwise are marketing.
I'll give you a hint about how it's done: Finite State Machines. They're provable, simple, and are designed to handle state. Removing state isn't the answer. Giving programmers great FSM tools is (like in Erlang).
You are confusing mutable state and state. Of course you need state. What you don't need is randomly overwriting blocks of memory when your program makes a state transition.
Instead, you want more structured state transitions. A FSM is one way of getting that.
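A hypothetical sketch of what that looks like in C -- every transition is explicit, so it's easy to reason about and test:

    /* states and events for one connection (made-up names) */
    typedef enum { CONN_IDLE, CONN_READING, CONN_WRITING, CONN_CLOSED } conn_state;
    typedef enum { EV_READABLE, EV_REQUEST_DONE, EV_RESPONSE_SENT, EV_ERROR } conn_event;

    conn_state conn_next(conn_state s, conn_event ev) {
        if (ev == EV_ERROR) return CONN_CLOSED;
        switch (s) {
            case CONN_IDLE:    return ev == EV_READABLE      ? CONN_READING : CONN_CLOSED;
            case CONN_READING: return ev == EV_REQUEST_DONE  ? CONN_WRITING : CONN_READING;
            case CONN_WRITING: return ev == EV_RESPONSE_SENT ? CONN_IDLE    : CONN_CLOSED;
            default:           return CONN_CLOSED;
        }
    }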
Not really a fair comparison. From 1972 until 2000, C offered the best balance of performance and expressiveness that industry could bear. Lisp sacrificed raw computational performance for increased readability and writability, but that's a tradeoff that was unpopular in industry. Hence, it was never a viable competitor. (I like it, but I am not industry.)
A fair comparison is C to C++ or Java to C#. C has been around longer than C++, and there are more C apps than C++ apps on UNIX. (Windows is another story; MS pushed C++ pretty hard back in the day. Same for Obj-C on Mac.)
Java has been around longer than C#, and there is more stuff written in it, even though it is technically inferior. Time always beats language features.
If you want an accurate measurement, start a totally new project today that depends on nothing. Clone yourself. Then start writing it in C, and in $some_other_language. Then see which one meets your expectations soonest. It will probably be $some_other_language. But add in some dependencies on historical code, and C becomes competitive again.
Standing on the shoulders of a giant is faster, in the short term, than becoming a taller giant.
I suspect that technical matters such as deployment and integration with other components are of greater concern (though I also suspect that the kind of language fetishism which looks balefully on the use of anything other than C as a systems programming language has some effect).
Honestly, it comes down to other languages not understanding how sockets, pointers, and memory really work. When you do servers like this, that's all you're dealing with (plus parsing). In other languages they try to hide these things away from you, or just screw up the concepts making them no easier to use.
For example, if you want to parse an HTTP header in Python you have to either iterate over it by integer indexes, break it up into smaller strings, or use the really bad tokenizer library. Most languages with immutable strings make fast parsing and "copy on write" semantics difficult, so they're already harder to work with.
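In C you can just keep pointers into the original buffer and copy nothing -- a hypothetical sketch (not the parser Mongrel2 actually uses):

    #include <string.h>

    /* returns a pointer into buf (no allocation); the value's length goes in *len */
    const char *find_header(const char *buf, const char *name, size_t *len) {
        const char *p = strstr(buf, name);
        if (p == NULL) return NULL;
        p += strlen(name);
        while (*p == ':' || *p == ' ') p++;          /* skip the ": " */
        const char *end = strstr(p, "\r\n");
        *len = end ? (size_t)(end - p) : strlen(p);
        return p;                                    /* zero copies made */
    }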
Another example is how they screw up sockets. In C sockets are simple, requiring only a few function calls to work with, and then a bunch of error conditions. Everything you get out of them is bytes put into a buffer of your choice. In other languages you have to work with some badly designed OOP layer that makes no sense, exception classes for each errno, randomly sized buffers with no cord data structure, or an event system like Twisted that gives no ring buffer or state machine functionality.
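The whole listening side really is just a handful of calls -- a hypothetical sketch with the error handling trimmed:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int listen_on(int port) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) return -1;
        if (listen(fd, 128) < 0) return -1;
        return fd;   /* then accept(), and read() bytes straight into your own buffer */
    }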
In the end, implementing the solid core of a web server is just easier in C. I wouldn't want to implement a full-blown web app in it, because doing something like a template system is murder (dynamic languages win hands down there). But the core is much easier.
While your criticism of network libraries is generally sound, I believe Haskell gives you more or less the same API as C sockets if you want it.
Boehm's a very good garbage collector, given this:
> A precise generational collector would perform far better but wouldn't be so easy to drop into a C codebase.
It's that last part that's key. Without language support at even the basic level, you won't get much better.
As for this "precise generational collector", I doubt it. Like everything, saying you need a GC all the time to be safe is just more boogey man fear mongering. You don't need a GC. They're nice yeah. But if you know the application very well then memory management isn't too big a deal.
GC isn't required to be safe, and it doesn't guarantee safety, but it does makes safety a whole lot more likely.
Boehm is the best garbage collector given its constraints, but that doesn't make it a good garbage collector overall, or even one worth using. I agree that good GC requires language support; what I disagree with is the attitude that people should use C anyway and forget GC because Boehm isn't good.
> But if you know the application very well then memory management isn't too big a deal.
Agreed.
A related case is Objective-C's memory management scheme, which isn't a full GC but lets you scope your memory allocations. It gets you a level of control that is somewhere between a GC and the kind boost::shared_ptr gives you ... neither having the "wait for the trashman" problem that GC brings, nor overdoing the reference-count touching that boost can get you into.
I had to smile when I read the headline and later the article. During my first year of university, I wrote a verb conjugation engine and a very simple web server for the UI (I wanted cross-platform compatibility and it was easier than learning GTK+).
I, too, chose bstring for strings and used valgrind a lot. It's nice to see I'm not the only one writing C code this way. (Although I don't write much C anymore.)
> stupid J2ME clients that send chunked-encoding when they aren't supposed to (it's for servers morons).
Sorry, Mr. Shaw, but you are wrong (but I still like you and I'm looking forward to Mongrel2).
RFC 2616 is careful to say 'server' when it means servers and 'client' when it means clients. Section 3.6.1 specifically states that HTTP/1.1 applications (so both clients and servers!) MUST support chunked encoding.
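(For reference, a chunked body on the wire looks like this -- a tiny made-up example:)

    /* "hello" sent as a chunked body: size in hex, data, then a terminating zero chunk */
    const char chunked_body[] =
        "5\r\n"
        "hello\r\n"
        "0\r\n"
        "\r\n";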
The developers of that Java HTTP client (Jakarta?) seem to have read the spec more thoroughly than you did. :-)
Furthermore, Chunked to the client is having a renaissance at the moment thanks to the BigPipe ideas coming out of Facebook. Speaking of which, Mongrel2 will be a tremendous server for apps that need this optimization.
No, this is chunked from the client, not to the client. To the client is quite alright and works great. From the client is absolutely retarded and total abuse of the protocol.
Which brings up another point: HTTP is like the old testament. Wanna kill your brother? Probably find something to support it. Wanna sacrifice a goat? Yep, you could do that. Don't want people to sacrifice goats? Sure, rock on. Want adultery? You could probably find that too. It's got everything depending on what you read, where you read, and what context you take it in.
The HTTP RFC is like that. Want to send chunked encoding mixed inside a keep-alive with request pipelining when you don't really need to? Rock on. Want google web accelerator to hit links without you asking it to (in clear violation of every other part of the browser standard)? Go for it. Don't want people to do that? Cool there's another part.
After a while, you have to go with what's most probable, and just be explicit in your own design about the rest.
"Memory is predictable, and with Valgrind not a danger."
I find that statement a tad too optimistic. If you use Valgrind continually (and people really should), you do catch memory misuse in normal situations. It does not say much about how well programs function with 'abnormal' inputs (accidents, exploits).
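(A typical invocation, with a made-up binary name -- Valgrind can only report on the code paths you actually exercise:)

    valgrind --leak-check=full --track-origins=yes ./your_server test_config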
Given that bugs do happen, even to the best of programmers, would you rather have a buffer overflow or an exception? It depends on the application and context, but it is harder to answer (and get it right) than the author pretends.
Well, your logic applies to every language, no matter what, always. The only difference is the types of errors it can't handle and what tools you need to avoid them.
Of course, except that many runtimes/libraries handle tripping over array boundaries gracefully, rather than overwriting memory. And some languages avoid null/wild pointers completely.
Your programs will still be susceptible to buffer overflows, invalid dereferencing, etc. Haskell programmers, say, do not have the same problems. Valgrind is no substitute for runtime checks and a proper type system.
Your runtime checks and proper type system end at dynamic linking. That means you don't have them there, and they aren't a complete total truth of any kind at all. They're just a highly probable enhancement.
Cheers, Zed. Keep it up!