C# to take a breaking change in version 5

ot · on Jan 18, 2012

Very interesting. I've been bitten by the same problem in Python once (capturing the iteration variable in a closure). To make things even worse, in Python the behaviour is inconsistent between list comprehensions and generator expressions:

  In [10]: lambdas = [(lambda: i) for i in xrange(10)]
  In [11]: [f() for f in lambdas]
  Out[11]: [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
  
  In [12]: lambdas_gen = ((lambda: i) for i in xrange(10))
  In [13]: [f() for f in lambdas_gen]
  Out[13]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

The problem was not fixed even in Python 3. I don't know if there has been any discussion about it.

lloeki · on Jan 18, 2012

This is because the following is simply not a closure in Python:

    lambdas = [(lambda: i) for i in xrange(10)]

This works, because it is a proper closure:

    def build_closure(i):
        return (lambda: i)

    lambdas = [build_closure(i) for i in range(10)]
    print("%s" % [f() for f in lambdas])

See http://stackoverflow.com/questions/233673/lexical-closures-i...

You can get something close using only a lambda, this way:

    lambdas = [(lambda j=i: j) for i in range(10)]

The behavior is actually consistent between generator expressions and list comprehensions. The generator only appears to give the result you'd expect from a closure because it is lazily evaluated: each f() is called right after its lambda is produced, while i still holds that iteration's value (i is still not enclosed). A generator also works only once; iterate over it a second time and it is already exhausted:

  In [12]: lambdas_gen = ((lambda: i) for i in xrange(10))
  In [13]: [f() for f in lambdas_gen]
  Out[13]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
  In [14]: [f() for f in lambdas_gen]
  Out[14]: []

That's why forcing the generator with list() gives the same result as the list comprehension case: the generator runs to completion first, and only then do you evaluate the f()s.

Again, changing it to the following properly encloses i:

    lambdas_genlist = list((lambda j=i: j) for i in range(10))

wisty · on Jan 18, 2012

> This is because the following is simply not a closure in Python

Well, they should be closures, or they shouldn't be there.

This isn't a question of not understanding Python's semantic rules, it's a question of those rules being screwed. I understand why it's not consistent with generators (as you say - i isn't generated yet). I don't understand why it's not consistent with what you'd expect, namely lambdas not being closures.

It's an even weirder gotcha than:

    def f(x = []):
        x.append(1)
        return x

and we know how many people get hit with that one ;)
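
For anyone who hasn't hit it yet: the default list is created once, when the def statement runs, and is then shared by every call that omits the argument. A minimal sketch (the function names are made up for illustration), next to the usual None-sentinel workaround:

```python
def append_shared(x=[]):
    # The default list is built once, at definition time, and reused.
    x.append(1)
    return x


def append_fresh(x=None):
    # Common workaround: use None as a sentinel and build a new list per call.
    if x is None:
        x = []
    x.append(1)
    return x


# Copy the results so that later calls cannot alter what we recorded.
first = list(append_shared())
second = list(append_shared())
fresh_first = append_fresh()
fresh_second = append_fresh()

print(first, second)              # [1] [1, 1]
print(fresh_first, fresh_second)  # [1] [1]
```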

lloeki · on Jan 18, 2012

Actually, you know, I lied (for the sake of simplicity). They are closures, else how would the lambda evaluate 'i'? The difference is in the binding.

How closures work depends heavily on the language. With lexical closures it all comes down to how scopes are handled [0] and how and when variable binding is done [1] (notably §8). The fact that 'i' can be either bound late (giving the 'outer scope' effect) or bound early (giving the 'inner scope closure' you expect) is actually a quite useful feature (and I assure you both cases are equally useful), although admittedly a bit surprising when coming from other languages.
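
A rough sketch of the two bindings side by side, assuming nothing beyond plain Python (make_readers and threshold are hypothetical names). The first lambda looks the variable up when it is called; the second copies the value into a default argument when it is defined:

```python
def make_readers():
    threshold = 10
    late = lambda: threshold               # looks up `threshold` when called
    early = lambda bound=threshold: bound  # copies the value when defined
    threshold = 20                         # rebind after both lambdas exist
    return late, early


late, early = make_readers()
print(late(), early())  # 20 10
```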

Default argument value evaluation is a nice gotcha, but it's a trade-off I'm more than willing to accept [2].

Anyway I would definitely not qualify this as 'screwed'.

[0] http://stackoverflow.com/a/292502/368409

[1] http://docs.python.org/reference/executionmodel.html

[2] http://stackoverflow.com/a/1651284/368409

obfuscate · on Jan 18, 2012

The inconsistency is because, in the generator-expression case, the calls to f() are being interleaved with the iterations of the generator (so the closed-over variable has the 'correct' value when f() is called). If you change this by running the generator to completion first, the behavior is the same as the list case:

  In [1]: lambdas_listgen = list(((lambda: i) for i in range(10)))
  In [2]: [f() for f in lambdas_listgen]
  Out[2]: [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
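
The interleaving can also be made visible by stepping the generator by hand. A small sketch (the variable names are only for illustration): the same lambda returns a different value once the generator has been advanced, because it reads the generator's single i each time it is called.

```python
gen = ((lambda: i) for i in range(3))

first = next(gen)
seen_immediately = first()     # the generator is paused with i == 0

next(gen)                      # advance the generator; i is now 1
seen_after_advance = first()   # same lambda, same variable, new value

print(seen_immediately, seen_after_advance)  # 0 1
```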

tel · on Jan 18, 2012

It's still really complicated semantics.

baq · on Jan 18, 2012

it's only complicated if you expect something which isn't there. it's a gotcha, true.

obfuscate · on Jan 18, 2012

It's simple semantics with confusing behavior.

hetman · on Jan 18, 2012

It's important to note that scoping on Python for loop variables does not behave the same way it does for a C# foreach.

While it's possible to write this in Python:

    for i in [1, 2, 3, 4]:
        pass
    final_value = i

The same in C# is not possible, i.e.:

    int[] values = {1, 2, 3, 4};
    foreach(int i in values) {
    }
    var final_value = i;       // there is no i here!

It makes no sense to "fix" this in Python because the loop variable is created in the scope outside of the for loop. It seems to make sense in the case of the C# foreach (but not the C# for!) because that variable is inaccessible outside of the foreach loop scope anyway. I would still argue introducing inconsistent behaviour between for and foreach as they are doing in C# 5 is just going to further obscure this problem and not really eliminate it.

Anyway, as far as Python is concerned, closures close over variables not values. Creating special cases where this is not the case is bound to generate even greater confusion.
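
One way to see "variables, not values" directly is to inspect the closure cells. This is a sketch that relies on the function attribute __closure__ (make_readers is a hypothetical name): every lambda created in the loop holds the very same cell, so they all report whatever i was last bound to.

```python
def make_readers():
    readers = []
    for i in range(3):
        readers.append(lambda: i)  # each lambda closes over the variable i
    return readers


readers = make_readers()
results = [r() for r in readers]

# All lambdas share one cell object for i, not one cell per iteration.
same_cell = all(r.__closure__[0] is readers[0].__closure__[0] for r in readers)

print(results, same_cell)  # [2, 2, 2] True
```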

phzbOx · on Jan 18, 2012

IMO, the problem here is not the closure but how i is updated instead of being a new variable at each iteration. I guess it's like that for the sake of performance, but if i was immutable, that wouldn't happen. I.e. i would be a whole new variable at each iteration.

Here's another example:

  in: lambdas = [(lambda i: lambda: i)(i) for i in xrange(10)]
  in: [f() for f in lambdas]
  out: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Why is it a different i? It's misleading because sometimes we have a reference and sometimes we've got a whole new variable.
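
As far as I can tell, the answer is that every function call creates a new local variable for its parameter, so each inner lambda closes over its own i. A sketch under that reading (make_reader and identity are hypothetical helpers), with functools.partial as another way to bind the value at creation time:

```python
from functools import partial


def make_reader(i):
    # Each call gets its own local i, so each returned lambda has its own cell.
    return lambda: i


def identity(x):
    return x


via_factory = [make_reader(i) for i in range(10)]
via_partial = [partial(identity, i) for i in range(10)]

factory_results = [f() for f in via_factory]
partial_results = [f() for f in via_partial]

# Unlike lambdas built directly in a loop, these do not share a cell.
distinct_cells = via_factory[0].__closure__[0] is not via_factory[1].__closure__[0]

print(factory_results)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(partial_results)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(distinct_cells)   # True
```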

sbi · on Jan 18, 2012

Thanks for pointing that out. I would have guessed that the original form would desugar to

  map(lambda i: (lambda: i), xrange(10))

but apparently it doesn't.
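
For comparison, a quick sketch of what the map form gives next to the comprehension, written with range and list() so it also runs on Python 3 (the variable names are only for illustration):

```python
# Each call of the outer lambda creates a fresh i for the inner lambda.
via_map = list(map(lambda i: (lambda: i), range(10)))

# The comprehension reuses a single i for every lambda it builds.
via_comprehension = [(lambda: i) for i in range(10)]

map_results = [f() for f in via_map]
comprehension_results = [f() for f in via_comprehension]

print(map_results)            # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(comprehension_results)  # [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]
```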

ot · on Jan 18, 2012

Exactly, it desugars to something like a for loop; in fact the iteration variable remains visible in the outer scope (another possible source of confusion).

I believe it doesn't desugar to a map+lambda for performance reasons.

baq · on Jan 18, 2012

if it desugared to map, you could redefine map() and unleash all kinds of hell on yourself inadvertently.

MartinCron · on Jan 18, 2012

It is encouraging to see the C# team learn from real-world scenarios and improve the language.

tlb · on Jan 18, 2012

It is discouraging to see them make the same mistake that early Lisps did, the solution to which was well understood in the Scheme community since 1975.

solutionyogi · on Jan 18, 2012

Well, they are human after all.

Overall, I feel that the C# team has definitely achieved their goal of being a 'Pit of Success'. http://www.codinghorror.com/blog/2007/08/falling-into-the-pi...

E.g. Order of evaluation is strictly defined as left to right in C#.

http://blogs.msdn.com/b/ericlippert/archive/2008/05/23/prece...

Disallowing use of uninitialized local variable.

http://stackoverflow.com/questions/7797424/why-arent-unassig...

C# warns you if you do

    if (flag = true) { // flag is a bool; notice the '=' instead of '=='
    }

warning CS0665: Assignment in conditional expression is always constant; did you mean to use == instead of = ?

Obviously, they failed in the case of 'foreach' but I feel great knowing that they will be fixing it in the next release.

skrebbel · on Jan 18, 2012

It's lovely how you confirm the common prejudice:

1958 - John McCarthy and Paul Graham invent LISP. Due to high costs caused by a post-war depletion of the strategic parentheses reserve LISP never becomes popular[1]. In spite of its lack of popularity, LISP (now "Lisp" or sometimes "Arc") remains an influential language in "key algorithmic techniques such as recursion and condescension.[2]"

(from http://james-iry.blogspot.com/2009/05/brief-incomplete-and-m...)

bunderbunder · on Jan 18, 2012

In fairness, they stumbled into it more than they deliberately perpetrated the error. In the beginning, when the rules for foreach loops were laid out, C# didn't have closures so it didn't really make a difference.

pixie_ · on Jan 18, 2012

It's really only a breaking change if you designed your program so that a closure created inside the loop purposely uses the last element of the collection after the loop. Who would do this? I've hit this issue lots of times, as have others, and it's a bug that is fixed by copying the iteration variable into a local inside the loop. It's nice I won't need to worry about that anymore.

solutionyogi · on Jan 18, 2012

Exactly. Their rationale was that there won't be that many people relying on the weird existing behavior.

More discussion on Eric's blog on this topic:

http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closi...

http://blogs.msdn.com/b/ericlippert/archive/2009/11/16/closi...

bunderbunder · on Jan 18, 2012

The bigger issue will be the confusion when people who are used to the C# 5 semantics work on a C# 4 project.

noblethrasher · on Jan 18, 2012

Yep, that's why it's a breaking change (or why breaking changes are bad). On the other hand, if they're using the C# 5 compiler to output C# 4 projects then maybe there will be a new warning (which, of course, is another breaking change if you treat warnings as errors).

frou_dh · on Jan 18, 2012

Agreed. I inherited some code for a live app at work and this exact issue was causing a bug.

hetman · on Jan 18, 2012

I've mentioned this elsewhere but I think the following needs to be emphasised: "The "for" loop will not be changed."[1]

I would argue it's a terrible idea to break consistency between how this is handled by the for and the foreach loop. The for loop can't be changed because the initialiser expression is general and not restricted to variables. I expect the result will be many bemused developers surprised closures behave one way in a for and another way in a foreach.

The end result is that the problem hasn't really been eliminated, just made even more obscure. My personal preference would be for consistency; one can still achieve the desired effect simply by explicitly defining a variable inside the loop scope for the closure to close over.

[1] http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closi...

bunderbunder · on Jan 18, 2012

Closures behave the same way in either case. The closure semantics in C#5 will be unchanged.

The difference is that the for loop's variable is scoped to the entire loop, whereas the foreach loop's variable is only scoped to the embedded statement (i.e., an iteration through the loop). That difference is perfectly justifiable.
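
To make the scoping difference concrete, here is a loose model in Python rather than C#. It is only an analogy: Cell is a hypothetical stand-in for one captured variable slot, and the two functions mimic "one variable for the whole loop" versus "a fresh variable per iteration". The closures themselves are built the same way in both.

```python
class Cell:
    """Hypothetical stand-in for a single captured variable slot."""

    def __init__(self, value=None):
        self.value = value


def capture_shared(items):
    # Model of a variable scoped to the whole loop: one slot, overwritten.
    slot = Cell()
    actions = []
    for item in items:
        slot.value = item
        actions.append(lambda s=slot: s.value)  # every action holds the same slot
    return [action() for action in actions]


def capture_fresh(items):
    # Model of a variable scoped to one iteration: a new slot each time round.
    actions = []
    for item in items:
        slot = Cell(item)
        actions.append(lambda s=slot: s.value)  # each action holds its own slot
    return [action() for action in actions]


shared = capture_shared([1, 2, 3])
fresh = capture_fresh([1, 2, 3])
print(shared)  # [3, 3, 3]
print(fresh)   # [1, 2, 3]
```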

In the semantics of a for loop, the loop variable is very clearly (and necessarily) a persistent value which can be changed according to rules described in the loop statement, by code within its embedded statement, or both.

In the semantics of a foreach loop, on the other hand, the implication is that you're successively retrieving otherwise independent values from a collection and operating on each one in succession. There is no reason to need or expect that the variable will have a scope that extends outside the embedded statement. It is simply replaced on each iteration through the loop. The loop statement doesn't offer any opportunity to perform any logic on the variable in between iterations, since all you can do between the parentheses is bind a variable. You cannot use a variable from the outer scope as the loop variable, and the loop variable goes out of scope as soon as the loop exits.

In fact, there is really only one place where this scoping difference is visible: When the loop contains a closure. In that case, the semantics that C#5 will use are indisputably superior. Even in cases where that sort of behavior is desired (which is exceedingly rare), it is much better to require that the behavior be accomplished by using a variable from the outer scope, for the sake of readability. Being a language in which it is easy to write obfuscated code was never high in C#'s priority list.

If the cost of doing it this way is that there is a(nother) difference between the semantics of the for-loop and the foreach-loop, I have no problem with that. I can see no demonstrable need for consistency there; it is a foolish consistency.

hetman · on Jan 19, 2012

I never implied closure behaviour would change, only the variable scoping is inconsistent. I agree it makes sense, but it's still going to cause surprises.