Reading On Lisp: Then and Now

pge · on April 12, 2011

The Little Schemer book is also a deceptively deep introduction that is a good preparation for On Lisp. If you work your way through it, you'll really understand functions that make functions, which is a good foundation before diving into On Lisp (and, of course, as a bonus, you'll have written an implementation of the famous Y combinator:)).

rickmode · on April 12, 2011

Having beat my head against Common Lisp, then spent a non-trivial amount of time working in Clojure, I totally agree with his premise.

Learning Lisp requires real, significant use of the language. Eventually you begin to think in the language and that's when profiency begins.

Merely reading about the language is not enough.

I'm sure this applies to learning any new programming language paradigm (as in procedural like C, OO like C++, Java and C#, and functional like CL, Clojure and Erlang) or any natural language for that matter.

[Edit: adding following sidenote.] Reading "On Lisp" was a big part of why I recently bought the $139 Kindle. Reading the "On Lisp" PDF isn't bad with the divice in landscape orientation.

adestefan · on April 12, 2011

> Learning Lisp requires real, significant use of the language. Eventually you begin to think in the language and that's when profiency begins.

This applies to any programming language. It's just more pronounced when switching paradigms. When I was learning python I would write python like it was C. The programs work, but it's not a very good way to write python. Even now when I write something quick in any language it ends up being very C-like since that's the language I use the most.

barrkel · on April 12, 2011

"So yes, reading a bottom-up program requires one to understand all the new operators defined by the author. But this will nearly always be less work than having to understand all the code that would have been required without them."

I think this quote (from On Lisp) is not true in practice, for most people. People find it easier to grasp an instance of a pattern or abstraction they already know, than to grasp a new abstraction altogether. It's only after a certain amount of repetition of a new pattern of instances that the need, and understanding, of the new abstraction is understood. Below a certain multiple, it's better not to use the more compressed encoding.

One measure of how maintainable any given piece of software is might be the average portion of the whole one needs to understand to modify a part. There's a tension between reuse (of all kinds) and modularity; reuse increases dependencies, while modularity bottlenecks them. Modifying something that has a lot of dependencies is a hazardous undertaking, because you need to understand more of the program to ensure you get everything right. A piece of software without such intense reuse, with more redundant encoding, is easier to modify without understanding the whole - and will probably be less work to modify too, as the less connected nature of the dependency graph will reduce the impact area.

ohyes · on April 12, 2011

"People find it easier to grasp an instance of a pattern or abstraction they already know, than to grasp a new abstraction altogether."

But the idea is to build the program up until you are working in a set of target abstractions that fit well with the problem domain. If I don't know the problem domain (or have materials to help me understand it), I have no business writing a program for it.

You are always looking for a high level of cohesion and a low level of coupling. If you already have a natural language that experts talk about the problem in, it is likely that the set of terminology derived from it has these properties. (It is well specified, it is precise, work has been done in it). A trivial example might be the difference between doing statistical work in C vs. statistical work in R. R is pretty clearly better for it.

Modifying something that has a lot of dependencies is only hazardous if what that particular something does is underspecified. If it is a bug in one use of that operator, it should be a bug in every use of that operator. If it isn't, perhaps the operator has special cases, or perhaps you really want two different operators (in either case, it needs greater specification).

YMMV, but in my opinion, one should not be modifying a program that one does not understand (at least mostly). It is easier to modify the 'non bottom up' program without understanding the whole, because generally, you do not have a chance at understanding the whole (not really 'whole', more like 'significant chunk'). You have no other option but to try something and see if it works. It also seems that you would frequently 'miss' changes that need to be propagated throughout the code-base (things that are actually bugs in other places, but perhaps are not manifesting).

barrkel · on April 13, 2011

"working in a set of target abstractions that fit well with the problem domain"

I 100% understand the aspiration; I just don't think it works as well as this in commercial software development, where there are competing concerns, not just idealized software construction. Leaving some of the less-commonly used (domain) abstractions out, and instead building them out of more commonly used (non-domain-specific) abstractions, is what I assert is actually desirable, but Lisp doesn't help you much here, as it's almost too easy to create new abstractions. (I'll go further: I think the principal cause of Lisp's relative lack of success is the ease with which it lets you build your own private language, and the concomitant difficulty in teaching other people that language, when you need to bring new people in.)

"If you already have a natural language that experts talk about the problem in"

Your new hires have a natural language they talk about the domain in - a programming language, like Java or C#. They are usually not subject matter experts when they arrive, and certainly they won't be 100% up to date on the particular dialect chosen for your architecture. That's why it's important to choose the right abstractions, and not to move too close to the problem domain.

"statistical work in R. R is pretty clearly better for it."

With concomitant difficulty in finding experts in that language. Whether it is actually a better choice depends on how often those features are needed, which again, comes back to my point: eliminate less-used abstractions, and build them concretely out of more-often used abstractions. So if you don't use statistics much, but do need to calculate a regression or somesuch, then build a function or class for that specific thing, don't immediately jump to R. Every programmer will be familiar with those; bringing in the domain specificity too early will reduce the degree to which you can leverage human talent, and ultimately reduce the value of the business.

"if what that particular something does is underspecified"

In commercial development, everything is underspecified and underdocumented. This won't change.

"one should not be modifying a program that one does not understand (at least mostly)."

This, frankly, is nonsense, and is the reason I've reacted so much to your comment; it makes me inclined to think you don't have much commercial experience. The best way of understanding a program is by fixing bugs in it, and that will involve modifications to areas of it, well in advance of understanding all or most of it.

ohyes · on April 14, 2011

"I 100% understand the aspiration; I just don't think it works as well as this in commercial software development, where there are competing concerns, not just idealized software construction. Leaving some of the less-commonly used (domain) abstractions out, and instead building them out of more commonly used (non-domain-specific) abstractions, is what I assert is actually desirable, but Lisp doesn't help you much here, as it's almost too easy to create new abstractions. (I'll go further: I think the principal cause of Lisp's relative lack of success is the ease with which it lets you build your own private language, and the concomitant difficulty in teaching other people that language, when you need to bring new people in.)"

I think we agree on everything except for what is desirable. Macros certainly allow you to do some interesting things with control constructs, but I have to disagree that they are really that big of a help in constructing your own private language.

I can imagine doing bottom up programming in the style of lisp in a language like C quite easily. The only difference is you wouldn't get to write your own control structures.

"Your new hires have a natural language they talk about the domain in - a programming language, like Java or C#. They are usually not subject matter experts when they arrive, and certainly they won't be 100% up to date on the particular dialect chosen for your architecture. That's why it's important to choose the right abstractions, and not to move too close to the problem domain."

I guess it depends on your business model. If you are a small shop and you expect to have a high retention rate of a few experts that you pay well, I don't see 'new hire' turnover as a big deal. In addition to that, it doesn't seem to me that anyone really knows how to 'program' once they get out of college, unless they were doing it before college.

Your CS student from college is going to need mentoring, just as your mechanical or civil engineer will need mentoring at the beginning of his or her career. If you create the language in a reasonable way (using a functional approach rather than creating piles of macros). It seems it should be possible to read the program just as any other. As you learn about the problem domain and become an expert in it, the program should become easier to understand.

"In commercial development, everything is underspecified and underdocumented. This won't change."

The specification should be in a constant flux of improvement. The initial specification is bad. The finished product specification should be strong. Iterate and get a lot of feedback and clarify that feedback, and the specification has to improve. If it isn't specified as being the wrong thing to have happen, how can it be a bug?

If I want computer programming to be closer to engineering and further from voodoo, the only thing to do is to have better specifications and reasons for doing things.

"This, frankly, is nonsense, and is the reason I've reacted so much to your comment; it makes me inclined to think you don't have much commercial experience. The best way of understanding a program is by fixing bugs in it, and that will involve modifications to areas of it, well in advance of understanding all or most of it."

I think I communicated badly. I agree completely that fixing bugs is the best way to learn a program. I intended to imply that you HAVE to understand what the code that directly interacts with the bug does, in order to actually fix it. If you don't do that, you have possibly created new bugs.

Tycho · on April 12, 2011

sigh

Still don't quite get what people mean by a 'closure.' Maybe because I've not realy tried hands-on programming with something like Lisp. But the concept seems to oscillate between being duh-obvious, to subtle but comprehensible, to transcedentally complicated depending on who's explaining it or in what context.

falcolas · on April 12, 2011

In Python (yes, the heresy of using Python in a lisp article), you can define a closure this way:

    def foo(x):
        y = x
        def bar():
            print y
        return bar

    >>> z = foo(1)
    >>> z()
    1

The value of y is preserved, even through it is not assigned in the function bar (outside of its lexical scope). You can pass bar around all you want, and it will maintain its value of y. Even if you create another instance of bar, they will retain their individual values of y.

Tycho · on April 12, 2011

so if I did this (following from your example):

  n = 35
  a = foo(n)
  a()
  n = 42
  a()

I would get 35 printed both times, despite n having changed. is that the idea?

falcolas · on April 12, 2011

Language specifics mean that this might not always be the case. i.e.

    >>> s = "%03d"
    >>> f = lambda x: s % x # Identical behavior for a regular function closure
    >>> map(f, range(10))
    ['000', '001', '002', '003', '004', '005', '006', '007', '008', '009']
    >>> s = "%04d"
    >>> map(f, range(10))
    ['0000', '0001', '0002', '0003', '0004', '0005', '0006', '0007', '0008', '0009']

I would try and think of it as more of a function parameter that you pre-fill, and never have to provide again for that instance of the function.

rayiner · on April 12, 2011

So say I have a first-order language (where functions are not values). You can write:

    def foo(x): return x + 1

But not:

    def bar(baz, x): return baz(x+1)

That is to say, you pass integers around ('x' above) as values, but not functions. In a minimal higher order language, like C, you can pass functions around like values. Ie: baz could be a pointer to a function in the second line above, which is called by bar.

Now, what happens when you have nested functions?

    def foo():
        def bar(y):
            return y + 1
        return bar

You can't do this in C, but conceptually that's not much different than just defining foo and bar as two separate top-level functions.

But what happens when the inner function refers to a local variable of the outer function?

    def foo():
        x = 1
        def bar(y):
            return x + 1
        return bar

How does 'bar' know what 'x' is? You might ask, why would you want to do this? Well, say you have a function map(fun, list). Map takes a function and a list. It returns a list, each element of which is the value yielded by calling fun with an element in list. Eg:

    def foo(x): return x + 1
    map(foo, [1, 2, 3]) # -> [2, 3, 4]

Map can be defined as:

    def map(fun, list):
        result = []
        foreach item in list:
            result.append(fun(item))

So back to the example, you might want to write something like:

    def foo(list):
        x = some_calculation()
        def bar(y): return x + y
        return map(bar, list)

In other words, the function that you pass to map needs to know not just the current element of the list, but some other information that is held in a local variable in the parent function. This case is very common when using map as replacement for "for" loops (note how in a loop you can refer to the local variables of your function inside the body of the loop).

Now technically the name of the language feature that lets you capture the value of local variables in child functions is "lexically scoped lambda" or something along those lines. A closure is just the most common way to implement the language feature, but people often just use "closure" to refer to the language feature itself.

In a compiler that uses closures, the example above would be translated into code something like:

    class _closure_bar: 
        this.x = Nil

       def _closure_bar(x):
           this.x = x           

       def operator() (y):
           return this.x + y

    def foo(list):
        x = some_calculation()
        bar = new _closure_bar(x)
        return map(bar, list)

Instead of passing around a function as a function pointer like in C, you pass around a callable object whose members are the local variables of the parent function captured by the inner function, and which when called executes the body of the inner function.

Now, there is some additional complexity which occurs when an inner function modified the value of a local variable of an outer function. Languages like Python just disallow such modification, which is why people complain about Python not having "full lambdas". Languages like Common Lisp and Scheme add another indirection for local variables that are captured + assigned-to in an inner function.

sayrer · on April 12, 2011

JavaScript is a pretty good language to illustrate closures, if you use functions and declare all of your variables at the beginning of each function. :)

In the demo below, the closure we're interested in is the combination of the code inside "munger scope", and the environment it resides in. To do its job, munger must search its lexical environment to find each variable. The lexical environment is illustrated by the boxes. Searching for "foo", it must first check munger scope (local), beta scope, alpha scope, and then finally succeeds with the global scope.

In the code at the bottom, the demo1 and demo2 invocations don't influence each other. That combination of code+environment providing independent values is a closure.

   |-- Global Scope ----------------------------------------------------------------|
   |                                                                                |
   |  var foo = "QUX";                                                              |
   |  var alpha = function () {                                                     |
   |                                                                                |
   |     |-- alpha scope ----------------------------------------------------|      |
   |     |                                                                   |      |
   |     |   var a = "AA";                                                   |      |
   |     |   var beta = function () {                                        |      |
   |     |                                                                   |      |
   |     |      |-- beta scope ---------------------------------------|      |      |
   |     |      |                                                     |      |      |
   |     |      |   var b = "BB";                                     |      |      |
   |     |      |   var munger = function () {                        |      |      |
   |     |      |                                                     |      |      |
   |     |      |      |-- munger scope -----------------------|      |      |      |
   |     |      |      |                                       |      |      |      |
   |     |      |      |    a = a + a;                         |      |      |      |
   |     |      |      |    b = b + b;                         |      |      |      |
   |     |      |      |    return a + " " + b + " " + foo;    |      |      |      |
   |     |      |      |                                       |      |      |      |
   |     |      |      |---------------------------------------|      |      |      |
   |     |      |                                                     |      |      |
   |     |      |   }                                                 |      |      |
   |     |      |                                                     |      |      |
   |     |      |   return munger;                                    |      |      |
   |     |      |                                                     |      |      |
   |     |      |-----------------------------------------------------|      |      |
   |     |                                                                   |      |
   |     |   }                                                               |      |
   |     |                                                                   |      |
   |     |  return beta();                                                   |      |
   |     |                                                                   |      |
   |     |-------------------------------------------------------------------|      |
   |                                                                                |
   |  }                                                                             |
   |                                                                                |
   |  js> var demo1 = alpha();                                                      |
   |  js> var demo2 = alpha();                                                      |
   |  js> demo1()                                                                   |
   |  "AAAA BBBB QUX"                                                               |
   |  js> demo1()                                                                   |
   |  "AAAAAAAA BBBBBBBB QUX"                                                       |
   |  js> demo2()                                                                   |
   |  "AAAA BBBB QUX"                                                               |
   |  js> demo1()                                                                   |
   |  "AAAAAAAAAAAAAAAA BBBBBBBBBBBBBBBB QUX"                                       |
   |                                                                                |
   ----------------------------------------------------------------------------------

Tycho · on April 15, 2011

Ahhh, I see, so not only does the closure scoop up the value of variables at 'define-time', but each closure keeps track of the state of its variables (its 'free variables?') so that their values may increment over time rather than reset.

Thanks for you (rather cool) illustration. I tested the code and it also worked :)

rickmode · on April 12, 2011

Here's yet another stab at explaining closures using C. Hopefully this is clear.

The confusion about closures happens because it doesn't exist in other languages. (There isn't a way to express it.)

Let's look at a simple C function that uses a global variable:

  int a = 1;

  int a_plus_2() {
    return 2 + a;
  }

This always returns 2 more than "a". Initially it will return 3. If "a" is set to 5 at some point in the future, "a_plus_2" will then return 7. Simple. No closure here.

In Lisp, "a" would be called a "dynamic variable" because it changing its value affects all code that uses "a" over time, including usage within other scopes; "a" has a dynamic binding. But Lisp also has static binding which means the value of the variable is captured each time a scope is created and remains static over time within that scope.

Imagine a variation of C where variable make this distinction explicit by prefixing with with <dynamic> and <static>. <dynamic> would be the default and in fact is how the real C behaves. <static> now means Lisp notion of static binding (forget C's notion of static, which is totally different). Let's use it.

  <static> int b = 1;
  
  int b_plus_2() {
    return 2 + b;
  }

So here "b" is marked as static. When "b_plus_2" is defined at compile time, the current value of "b" is captured - statically bound - to its current value of 1. So "b_plus_2" will always return 3 _even if the global variable "b" is later modified.

  b = 5; /* modifying b */
  
  int new_b_plus_2 {
    return 2 + b;
  }

Calling "new_b_plus_2" return 7. No surprise here. The twist is that the original "b_plus_2" still returns 3. The value of "b" is captured in a closure. Each new scope encloses the current value of "b", binding its current value statically (unchanging) within that scope. Closures become interesting when the block in question is a function body.

So, yeah - it's a simple concept in the end. It's tricky because it doesn't exist many other languages. It provides a way to encapsulate state analogous to an object in OO languages [1].

The concept is more useful in Lisp with its higher order functions - functions that define other functions. This is very fluid in Lisp, while Python uses its lambda form for anonymous functions, Ruby has blocks and lambdas and...[2], and Java gets close with inner classes.

I hope this helps.

[1] "Objects are a poor man's closure" and "Closures are a poor man's object" http://stackoverflow.com/questions/2497801/closures-are-poor...

[2] I'm don't do Ruby - but's here's a ton of stuff on closures in Ruby: http://innig.net/software/ruby/closures-in-ruby.rb

Tycho · on April 14, 2011

Thanks very much for that. I get it better now. I think. The SO link helped also, but the Ruby one made my head hurt.

What I'm taking from this so far is that

1. Closures preserve any enclosed variables from the original context where the closure was made (defined), rather than using the context of the call.

2. If you didn't have objects, you could use closures instead to capture the 'state' of different instances

3. With some trickery, you can use closures to introduce lazy evaluation. Which is good for computationally expensive functions that you want to forget about until needed.

I'm not sure if I'm right about those though. And I'd probably steer clear of deliberately exploiting closures cause I get a whiff of 'clever code' from the whole subject