Very interesting. I've been bitten by the same problem in Python once (capturing the iteration variable in a closure). To make things even worse, in Python the behaviour looks inconsistent between list comprehensions and generator expressions:
In [10]: lambdas = [(lambda: i) for i in xrange(10)]
In [11]: [f() for f in lambdas]
Out[11]: [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
In [12]: lambdas_gen = ((lambda: i) for i in xrange(10))
In [13]: [f() for f in lambdas_gen]
Out[13]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The problem was not fixed even in Python 3. I don't know if there has been any discussion about it.
You can get the behaviour you want with lambda alone by binding i as a default argument:
lambdas = [(lambda j=i: j) for i in range(10)]
The behavior is actually consistent between generators and list comprehensions. The generator only looks like it gives each lambda its own i because generators are lazily evaluated: each f() is called right after the generator yields it, while i still holds that iteration's value, but i is still not captured by value. A generator also works only once; run it a second time and it is already exhausted:
In [12]: lambdas_gen = ((lambda: i) for i in xrange(10))
In [13]: [f() for f in lambdas_gen]
Out[13]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [14]: [f() for f in lambdas_gen]
Out[14]: []
That's also why forcing the generator with list() gives the same result as the list comprehension case: the generator runs to completion first, and only then are the f()s evaluated.
Again, changing it to the following properly encloses i:
lambdas_genlist = list((lambda j=i: j) for i in range(10))
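One wrinkle with the default-argument trick is that j is still an ordinary parameter, so a caller can override it by accident. Here is a minimal Python 3 sketch of an alternative that binds the value with functools.partial; the helper name identity is made up for illustration and is not from the thread:

```python
from functools import partial

def identity(value):
    # Hypothetical helper, only here so partial() has something to wrap.
    return value

# Default-argument trick: i is evaluated once, when each lambda is created.
by_default = [(lambda j=i: j) for i in range(10)]

# partial() likewise stores the value i has at creation time.
by_partial = [partial(identity, i) for i in range(10)]

print([f() for f in by_default])   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print([f() for f in by_partial])   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(by_default[3](42))           # 42: a stray argument silently replaces j
```

Both versions copy the current value of i into the callable instead of leaving a reference to the shared variable.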
> This is because the following is simply not a closure in Python
Well, they should be closures, or they shouldn't be there.
This isn't a question of not understanding Python's semantic rules, it's a question of those rules being screwed. I understand why it's not consistent with generators (as you say, i isn't generated yet). I don't understand why it isn't consistent with what you'd expect, namely lambdas being proper closures.
It's an even weirder gotcha than:
def f(x=[]):
    x.append(1)
    return x
and we know how many people get hit with that one ;)
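For anyone who hasn't run into it, here is a small Python 3 sketch of what that gotcha does, next to the usual None-sentinel workaround. The function g and the variable names are illustrative only:

```python
def f(x=[]):
    # The default list is created once, when the def statement runs.
    x.append(1)
    return x

def g(x=None):
    # Common idiom: use None as a sentinel and build a fresh list per call.
    if x is None:
        x = []
    x.append(1)
    return x

first = f()
second = f()
third = g()
fourth = g()

print(second)            # [1, 1]
print(first is second)   # True: both calls returned the same list object
print(third, fourth)     # [1] [1]
```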
Actually, you know, I lied (for the sake of simplicity). They are closures, else how would the lambda evaluate 'i'? The difference is in the binding.
How closures work depends heavily on the language. With lexical closures it all comes down to how scopes are handled [0] and how and when variable binding is done [1] (notably §8). The fact that 'i' can be either bound late (giving the 'outer scope' effect) or bound early (giving the 'inner scope closure' you expect) is actually quite a useful feature (and I assure you both cases are equally useful), although admittedly a bit surprising when coming from other languages.
Default argument value evaluation is a nice gotcha, but it's a trade-off I'm more than willing to accept [2].
Anyway I would definitely not qualify this as 'screwed'.
The inconsistency is because, in the generator-expression case, the calls to f() are being interleaved with the iterations of the generator (so the closed-over variable has the 'correct' value when f() is called). If you change this by running the generator to completion first, the behavior is the same as the list case:
In [1]: lambdas_listgen = list(((lambda: i) for i in range(10)))
In [2]: [f() for f in lambdas_listgen]
Out[2]: [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
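To make the interleaving visible, here is a small Python 3 sketch that pulls lambdas out of the generator by hand with next(). The names f0, f1 and a to d are just for illustration. The same lambda returns a different value each time the generator advances:

```python
lambdas_gen = ((lambda: i) for i in range(10))

f0 = next(lambdas_gen)     # generator paused with i == 0
a = f0()
f1 = next(lambdas_gen)     # generator advanced, i == 1
b = f0()                   # same lambda as before, but it now sees the new i
c = f1()
rest = list(lambdas_gen)   # run the generator to completion
d = f0()

print(a, b, c, d)          # 0 1 1 9
```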
It's important to note that scoping on Python for loop variables does not behave the same way it does for a C# foreach.
While it's possible to write this in Python:
for i in [1, 2, 3, 4]:
    pass
final_value = i
The same in C# is not possible, i.e.:
int[] values = {1, 2, 3, 4};
foreach(int i in values) {
}
var final_value = i; // there is no i here!
It makes no sense to "fix" this in Python because the loop variable is created in the scope outside of the for loop. It seems to make sense in the case of the C# foreach (but not the C# for!) because that variable is inaccessible outside of the foreach loop scope anyway. I would still argue introducing inconsistent behaviour between for and foreach as they are doing in C# 5 is just going to further obscure this problem and not really eliminate it.
Anyway, as far as Python is concerned, closures close over variables, not values. Creating special cases where this is not the case is bound to generate even greater confusion.
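A quick way to see "variables, not values" without any loop at all is the following Python 3 sketch. The function name make_reader is hypothetical, and __closure__ is only used here to peek at the captured cell:

```python
def make_reader():
    x = 1
    reader = lambda: x   # captures the variable x, not the value 1
    x = 2                # rebinding x after the lambda exists
    return reader

reader = make_reader()
print(reader())                              # 2
print(reader.__closure__[0].cell_contents)   # 2
```

The lambda was created while x was 1, yet it reports 2, because the lookup happens when it is called.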
IMO, the problem here is not the closure but that i is updated in place instead of being a new variable at each iteration. I guess it's like that for the sake of performance, but if i were immutable, that wouldn't happen, i.e. i would be a whole new variable at each iteration.
Here's another example:
in: lambdas = [(lambda i: lambda: i)(i) for i in xrange(10)]
in: [f() for f in lambdas]
out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Why is it a different i? It's misleading because sometimes we have a reference and sometimes we've got a whole new variable.
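One way to look at the question above: the immediately-called outer lambda is a function call, and every call gets its own parameter i. A Python 3 sketch with the outer lambda spelled out as a named function (make_getter and build are made-up names) shows the difference in the captured cells:

```python
def make_getter(i):
    # Every call gets its own local variable i, so every lambda gets its own cell.
    return lambda: i

def build():
    shared = [(lambda: i) for i in range(10)]        # all lambdas share one i
    separate = [make_getter(i) for i in range(10)]   # one i per call
    return shared, separate

shared, separate = build()
print([f() for f in shared])     # [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
print([f() for f in separate])   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(shared[0].__closure__[0] is shared[1].__closure__[0])       # True
print(separate[0].__closure__[0] is separate[1].__closure__[0])   # False
```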
Exactly, it desugars to something like a for loop; in fact, in Python 2 the iteration variable of a list comprehension remains visible in the outer scope (another possible source of confusion).
I believe it doesn't desugar to a map+lambda for performance reasons.
It is discouraging to see them make the same mistake that early Lisps did, the solution to which was well understood in the Scheme community since 1975.
1958 - John McCarthy and Paul Graham invent LISP. Due to high costs caused by a post-war depletion of the strategic parentheses reserve LISP never becomes popular[1]. In spite of its lack of popularity, LISP (now "Lisp" or sometimes "Arc") remains an influential language in "key algorithmic techniques such as recursion and condescension.[2]"
In fairness, they stumbled into it more than they deliberately perpetrated the error. In the beginning, when the rules for foreach loops were laid out, C# didn't have closures so it didn't really make a difference.
It's really only a breaking change if you designed your program so that a closure created inside the loop purposely uses the last element of the collection after the loop. Who would do this? I've hit this issue lots of times, as have others, and it's a bug that is fixed by copying the iteration variable into a local inside the loop. It's nice that I won't need to worry about that anymore.
Yep, that's why it's a breaking change (or why breaking changes are bad). On the other hand, if they're using the C# 5 compiler to output C# 4 projects then maybe there will be a new warning (which, of course, is another breaking change if you treat warnings as errors).
I've mentioned this elsewhere but I think the following needs to be emphasised: "The "for" loop will not be changed."[1]
I would argue it's a terrible idea to break consistency between how this is handled by the for and the foreach loop. The for loop can't be changed because the initialiser expression is general and not restricted to variables. I expect the result will be many bemused developers surprised closures behave one way in a for and another way in a foreach.
The end result is that the problem hasn't really been eliminated, just made even more obscure. My personal preference would be for consistency; one can still achieve the desired effect simply by explicitly defining a variable inside the loop scope for the closure to close over.
Closures behave the same way in either case. The closure semantics in C#5 will be unchanged.
The difference is that the for loop's variable is scoped to the entire loop, whereas the foreach loop's variable is only scoped to the embedded statement (i.e., an iteration through the loop). That difference is perfectly justifiable.
In the semantics of a for loop, the loop variable is very clearly (and necessarily) a persistent value which can be changed according to rules described in the loop statement, by code within its embedded statement, or both.
In the semantics of a foreach loop, on the other hand, the implication is that you're successively retrieving otherwise independent values from a collection and operating on each one in succession. There is no reason to need or expect that the variable will have a scope that extends outside the embedded statement. It is simply replaced on each iteration through the loop. The loop statement doesn't offer any opportunity to perform any logic on the variable in between iterations, since all you can do between the parentheses is bind a variable. You cannot use a variable from the outer scope as the loop variable, and the loop variable goes out of scope as soon as the loop exits.
In fact, there is really only one place where this scoping difference is visible: when the loop contains a closure. In that case, the semantics that C#5 will use are indisputably superior. Even in cases where that sort of behavior is desired (which is exceedingly rare), it is much better to require that the behavior be accomplished by using a variable from the outer scope, for the sake of readability. Being a language in which it is easy to write obfuscated code was never high on C#'s priority list.
If the cost of doing it this way is that there is a(nother) difference between the semantics of the for-loop and the foreach-loop, I have no problem with that. I can see no demonstrable need for consistency there; it is a foolish consistency.
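As a rough analogy only (this is Python, not C#, and the names for_style and foreach_style are invented for the sketch), the two scoping rules described above can be modeled like this: one variable shared by the whole loop versus a fresh variable per iteration.

```python
def for_style(values):
    # Models a loop variable scoped to the whole loop:
    # one variable, rebound on every iteration.
    closures = []
    for item in values:
        closures.append(lambda: item)
    return closures

def foreach_style(values):
    # Models a variable scoped to a single iteration: the body runs in its
    # own function call, so each iteration gets a fresh variable.
    def body(item):
        return lambda: item
    return [body(v) for v in values]

print([f() for f in for_style([1, 2, 3, 4])])       # [4, 4, 4, 4]
print([f() for f in foreach_style([1, 2, 3, 4])])   # [1, 2, 3, 4]
```

The closures themselves behave identically in both functions; only the lifetime of the variable they capture differs.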
I never implied closure behaviour would change, only the variable scoping is inconsistent. I agree it makes sense, but it's still going to cause surprises.