As someone whose last 8 years of professional work have been largely split between...C# and Ruby, I have to disagree. I do agree about C# being quite nice; it's all the things Java should be but isn't. And it's evolving quickly in all the right ways.
But it isn't really like Ruby with static typing. The language isn't Rubyish (no mixins or blocks, for example). It makes you put everything in a class, a la Java. It can be verbose and is occasionally downright clunky (though syntactically it's categorically slicker than Java). The .NET ecosystem doesn't have the Ruby characteristic of lots of small, fast-evolving libraries that are easy to use. In fact, the C# open source ecosystem is kinda poor in general and not a huge part of most developers' lives, whereas Ruby's ecosystem is vibrant and an integral part of its coding culture.
Another way to put all that is that if C# were purely dynamically typed, it wouldn't feel anything like Ruby.
I do see what you're saying: LINQ feels like a static (and lazy!) version of Ruby's Enumerable module, the lambdas look similar, C# actually does have optional dynamic typing, and C# is increasingly full of nice developer-friendly features. In general, I'm a fan. But switching between them doesn't feel like just a static/dynamic change.
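To illustrate the flavor overlap: a LINQ chain really does read a lot like Ruby's Enumerable. This is just an illustrative sketch of my own, with the rough Ruby equivalent in the comments:

```csharp
using System;
using System.Linq;

class RubyishLinq
{
    static void Main()
    {
        // Ruby: (1..10).select { |n| n.even? }.map { |n| n * n }.take(3)
        var squares = Enumerable.Range(1, 10)
            .Where(n => n % 2 == 0)   // select { ... }
            .Select(n => n * n)       // map { ... }
            .Take(3);                 // take(3), but lazy in C#

        Console.WriteLine(string.Join(", ", squares)); // 4, 16, 36
    }
}
```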
Yeah, I mostly agree with that and sort of said so in my post. Not sure it has much impact on my central point, though.
One really key difference with LINQ is that it doesn't produce arrays (or dictionaries, as in your example); it produces lazy enumerables, which you then have to call ToList() or ToDictionary() on. That laziness is actually an awesome feature and my favorite thing about LINQ, because it can massively improve performance by short-circuiting work and not creating intermediate arrays. You can even work on infinite sequences with it. Besides performance, it's just tastier. It's so great I actually wrote a Ruby library to imitate it: https://github.com/icambron/lazer
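Here's a small sketch of what I mean by working on infinite sequences. Naturals() is a made-up helper; the point is that the chain does no work until it's consumed, and only ever produces the handful of items actually asked for:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LazyDemo
{
    // An infinite sequence: yields 0, 1, 2, ... forever.
    static IEnumerable<int> Naturals()
    {
        for (int i = 0; ; i++) yield return i;
    }

    static void Main()
    {
        // Nothing runs yet: this just builds a chain of wrapped enumerators.
        var query = Naturals()
            .Where(n => n % 3 == 0)
            .Select(n => n * n)
            .Take(4);

        // Work happens only now, and only 4 results are ever produced.
        Console.WriteLine(string.Join(", ", query)); // 0, 9, 36, 81
    }
}
```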
Is LINQ really fast/performant though? Wouldn't the above expression cause three sequential loops to run?
One of the biggest performance issues I've seen with modern .NET code is people abusing LINQ and lambdas. Chaining functions like this is most decidedly not fast. I once wrote a library that had to do some heavy signal processing on large data sets, and since I wanted to ship the first version as soon as possible, I just used LINQ in a lot of functions to save time. It wasn't very performant, so I later rewrote most of the functions to use plain native constructs: loops for iteration, hashmaps for caching, and all sorts of improvements like that. I got rid of LINQ completely in that version, and for many functions the runtime went down from something like 500ms-1000ms to the microsecond range.
So sure, LINQ makes development fast and it's very nice to be able to write code such as .Skip(10).Take(50).Where(x => ...). On most web projects, it won't make a huge difference. I've seen Rails "developers" use ActiveRecord in such a way that they would create double and triple nested loops and then hit the database multiple times by using enumerable functions on ActiveRecord objects without realizing how this works, what's going on behind the curtains and so on. I've seen .NET devs do similar things using EntityFramework.
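The "hitting the database multiple times" trap comes from deferred execution: a deferred query re-runs every time you enumerate it. No real ORM here, just a stand-in FetchRows() I made up to simulate a data source, but the shape of the mistake is the same:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredPitfall
{
    static int fetches = 0;

    // Stand-in for a data source: each enumeration "hits the database" again.
    static IEnumerable<int> FetchRows()
    {
        fetches++;
        for (int i = 0; i < 5; i++) yield return i;
    }

    static void Main()
    {
        var query = FetchRows().Where(r => r % 2 == 0); // deferred, not run yet

        var count = query.Count();   // first enumeration: one "fetch"
        var list  = query.ToList();  // second enumeration: another "fetch"

        Console.WriteLine(fetches);  // 2 -- the source was walked twice
    }
}
```

Materializing once with ToList() and reusing that list avoids the repeat trips.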
So yeah, it's convenient and all, but it can also be very dangerous when used by someone who doesn't understand the fundamentals behind these principles.
> Wouldn't the above expression cause three sequential loops to run?
No, it wouldn't; that's the really important point about LINQ I was, clumsily, trying to express above [1]. Take this admittedly totally contrived example:
someList
.Where(i => i % 2 == 0)
.Select(i => i + 7)
.Take(5)
This is not equivalent to a bunch of sequential loops. What it is is a bunch of nested Enumerators. Here's how it works. It gets the list's Enumerator, which is an interface that has a MoveNext() method and a Current property. In this case, MoveNext() just retrieves the next element of the list. Then the Where() call wraps that enumerator with another enumerator [2], but this time its implementation of MoveNext() calls the wrapped MoveNext() until it finds a number divisible by 2, and then sets its Current property to that. That enumerator is wrapped with one whose MoveNext() calls underlying.MoveNext() and sets Current to underlying.Current + 7. Take()'s enumerator just reports that the sequence is finished (its MoveNext() returns false) after 5 underlying MoveNext() calls.
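To make the wrapping concrete, here's a hand-rolled sketch of what a Where()-style enumerator looks like. This isn't the actual BCL implementation, just a minimal version of the idea:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// A hand-rolled version of the Where() idea: an enumerator that wraps
// another enumerator and filters items as they're pulled through.
class WhereEnumerator<T> : IEnumerator<T>
{
    private readonly IEnumerator<T> inner;
    private readonly Func<T, bool> predicate;

    public WhereEnumerator(IEnumerator<T> inner, Func<T, bool> predicate)
    {
        this.inner = inner;
        this.predicate = predicate;
    }

    public T Current => inner.Current;
    object IEnumerator.Current => Current;

    // Keep pulling from the wrapped enumerator until an item passes the filter.
    public bool MoveNext()
    {
        while (inner.MoveNext())
            if (predicate(inner.Current)) return true;
        return false;
    }

    public void Reset() => inner.Reset();
    public void Dispose() => inner.Dispose();
}
```

A Select()-style wrapper is the same shape, except its MoveNext() always advances once and its Current applies the projection.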
So all that returns an enumerable, so as written above, it actually hasn't done any real work yet. It's just wrapped some stuff in some other stuff.
Once we walk the enumerable--either by putting a foreach around it or by calling ToList() on it--we start processing list elements. But they come through one at a time as these MoveNext() calls bring through the list items; think of them as working from the inside out, with each MoveNext() call asking for one item, however that layer of the onion has defined "one item". The item is pulled up through the chain, only "leaving" the original list when it's needed. The entire list is traversed at most once, and in our example, possibly far less: the Take(5) stops calling MoveNext() after it's received 5 values, so we stop processing the list after that happens. If someList were the list of natural numbers, we'd only read the first 10 values from the list.
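You can watch the one-at-a-time flow directly by logging inside the lambdas. Note how the log interleaves per item rather than running each stage over the whole list:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class PullOrder
{
    static void Main()
    {
        var log = new List<string>();

        var query = new[] { 1, 2, 3, 4 }
            .Where(i => { log.Add($"where {i}"); return i % 2 == 0; })
            .Select(i => { log.Add($"select {i}"); return i + 7; });

        foreach (var x in query) log.Add($"got {x}");

        // Each item travels through the whole chain before the next is pulled:
        // where 1, where 2, select 2, got 9, where 3, where 4, select 4, got 11
        Console.WriteLine(string.Join(", ", log));
    }
}
```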
Now, those nested Enumerator calls aren't completely free, but they're not bad either, and you certainly shouldn't be seeing a one second vs microseconds difference. If you craft the chain correctly, it's functionally equivalent to having all of the right short circuitry in the manual for-loop version, and obviously it's way nicer.
So why are you seeing such poor perf on your LINQ chains? Hard to say without looking at them, but a few pointers: (1) Never call ToList() or ToDictionary() until the end of your chain, or anything else that would prematurely "eat" the enumerable. (2) Order the chain so that filters that eliminate the most items go at the beginning of the chain, similar to how you'd put their equivalent if (...) continue; checks at the beginning of your loop body. (3) Just be cognizant of how LINQ chains actually work.
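Pointer (2) is easy to see with a counter. This is a contrived sketch where the "expensive" step is just a counter increment, but it shows how filtering first keeps the costly projection from running on items that were going to be thrown away anyway:

```csharp
using System;
using System.Linq;

class ChainOrder
{
    static void Main()
    {
        var data = Enumerable.Range(1, 1000);
        int expensiveCalls = 0;

        // Cheap, highly selective filter first: the expensive projection
        // only runs on the few items that survive.
        var results = data
            .Where(i => i % 100 == 0)                      // keeps 10 of 1000
            .Select(i => { expensiveCalls++; return i * i; })
            .ToList();

        Console.WriteLine(expensiveCalls); // 10, not 1000
    }
}
```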
[1] In the example in the parent, FindAll isn't actually a LINQ method, so there is one extra loop in there. Always use Where() if you're chaining; use FindAll() when you want a simple List -> List transformation.
[2] A detail elided here: each level actually returns an Enumerable and the layer wrapping it does a GetEnumerator() call on that.
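To make footnote [1] concrete, here's a small sketch counting predicate calls. FindAll() has to filter the whole list eagerly, while Where() stops as soon as the downstream Take() is satisfied:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class WhereVsFindAll
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3, 4, 5, 6 };
        int eagerChecks = 0, lazyChecks = 0;

        // FindAll is a List<T> method, not LINQ: it filters the whole list
        // immediately and allocates a new List<int> before Take() runs.
        var eager = list.FindAll(i => { eagerChecks++; return i % 2 == 0; })
                        .Take(2).ToList();

        // Where is lazy: filtering stops as soon as Take(2) has its 2 items.
        var lazy = list.Where(i => { lazyChecks++; return i % 2 == 0; })
                       .Take(2).ToList();

        Console.WriteLine($"{eagerChecks} vs {lazyChecks}"); // 6 vs 4
    }
}
```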
Thank you for this awesome explanation, it really clarifies how Enumerable methods work. I guess this is the real reason behind deferred execution. Still, I can't help thinking there is a big overhead involved with using such methods. We did some benchmarks in the past and the code that does the same thing manually always ended up being much faster.
The nice thing about Enumerable methods is that they can significantly speed up development and most projects won't suffer for it. However, for speed critical code it's probably not the best tool in the box.