Now all you need is to remember what Y is: it's pretty much Russell's paradox in the shape of a function!
If you represent a set as an indicator function, then Russell's set is R = \x. N (x x), where N is logical negation. Now ask it whether it's a member of itself, and you get (\x. N (x x)) (\x. N (x x)), which essentially tries to find a fixed point of logical negation (and, tragically, diverges in the process). Now abstract N away, and you have the fully general Y you know and love: \f. (\x. f (x x)) (\x. f (x x))
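(If you want to run it: here's a minimal sketch in Scheme. Scheme is strict, so the plain Y above diverges; this is the eta-expanded variant, usually called the Z combinator.)

    ;; Z combinator: Y eta-expanded so a strict language doesn't
    ;; evaluate the self-application (x x) eagerly and diverge
    (define Z
      (lambda (f)
        ((lambda (x) (f (lambda (v) ((x x) v))))
         (lambda (x) (f (lambda (v) ((x x) v)))))))

    ;; factorial with no explicit recursion, just the fixed point
    (define fact
      (Z (lambda (self)
           (lambda (n)
             (if (= n 0) 1 (* n (self (- n 1))))))))

    (display (fact 5)) ; prints 120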
I saw this trick just this morning at [0], and I'm absolutely enchanted by this observation.
This cool observation is generalized by Lawvere's fixed point theorem [1]. It basically says that whenever you have an "interpreter" I : A -> (A -> B) that takes elements of A and turns them into functions A -> B, and every such function is realized by some element of A (i.e., I is surjective), then any given function f : B -> B has a fixed point. The fixed point can be constructed by (1) letting q = \(a:A), f (I a a), which is a function A -> B, (2) letting p : A be something for which I p = q, then (3) letting s = I p p. Indeed, s = I p p = q p = f (I p p) = f s, so s is a fixed point of f.
For sets, Russell's paradox appears like this: letting Set be the class of all sets, every set x determines a predicate, i.e., a function Set -> Bool that for each y returns true or false depending on whether or not y is an element of x. Let I : Set -> (Set -> Bool) be the function that takes a set and turns it into its predicate: I x = \y, y ∈ x. An axiom of set theory (extensionality) says that I is injective. What if I were also surjective? (This is the non-axiom "unrestricted comprehension": every predicate determines a set.) Well, then we could apply the fixed point theorem to negation N : Bool -> Bool, but N obviously doesn't have a fixed point! (In the above notation, q = \y, N (y ∈ y) is the predicate that checks whether a set doesn't contain itself, p = {y | N (y ∈ y)} is the supposed set of all sets that don't contain themselves, and s = (p ∈ p) is the impossible fixed point of N.)
Here's what the theorem says for lambda calculus. Let L be the set of all lambda expressions (where equivalent lambda expressions are considered equal). In lambda calculus, every expression is also a function, in the sense that if E is an expression then \x, E x is equivalent to it, so we can take the "interpreter" I : L -> (L -> L) to be the identity function. The fixed point theorem then says that every function f : L -> L has a fixed point. And what is it? Tracing through the construction, we see it's nothing other than (\x, f (x x)) (\x, f (x x))!
(The complexity in stating the theorem precisely comes from needing to restrict what we mean by functions A -> B. For the lambda calculus example, we need L -> L to mean just the functions realizable as lambda expressions. This does work out, because reflexive objects exist [2].)
While we're here, another cool application is that quines exist. I'll give a way that misuses the theorem (though in a correctable way) to derive a quine. Consider the meta-function quote : L -> L that takes a lambda expression and produces a representation of it (like the representation used by the self-interpreter two comments up). I say meta-function because it isn't implemented by a lambda expression itself. If quote were an actual lambda expression, then applying the fixed point theorem to the same I : L -> (L -> L) with it would give a lambda expression s with s = quote s. That is, the expression s would evaluate to its own representation!
The fixed point is purportedly (\x, quote (x x)) (\x, quote (x x)), which doesn't make sense, since quote is not a function. However, suppose q is a lambda expression that quotes representations of lambda expressions, so it satisfies q (quote x) = quote (quote x) for all lambda expressions x. Also, let app : L -> L -> L be the constructor for application, satisfying app (quote x) (quote y) = quote (x y). Then the meta-level body quote (x x) can be rebuilt at the level of representations as app x (q x), and (\x, app x (q x)) (quote (\x, app x (q x))) fixes the problems and is a quine:
    (\x, app x (q x)) (quote (\x, app x (q x)))
    = app (quote (\x, app x (q x)))
          (q (quote (\x, app x (q x))))          [beta]
    = app (quote (\x, app x (q x)))
          (quote (quote (\x, app x (q x))))      [q's equation]
    = quote ((\x, app x (q x))
             (quote (\x, app x (q x))))          [app's equation]
(A way to do all this above board is to use the self-interpreter and work q into the thing we're trying to find a fixed point of.)
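Incidentally, the classic Scheme quine is exactly this (\x, app x (q x)) construction: list plays the role of app (building the representation of an application), and (list 'quote x) plays the role of q (quoting an already-quoted term).

    ;; evaluates to a list equal to the program text itself
    ((lambda (x) (list x (list 'quote x)))
     '(lambda (x) (list x (list 'quote x))))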
It's not quite comparable though, as mine operates on a simple bit stream (akin to stdin), which it needs to tokenize and parse.
Mogensen's one is more like the LISP interpreter that operates on a quoted form of the term it evaluates to.
I'm just curious about something on the theoretical level. I think I can see the beauty of it, but if you take the two extremes:
- a C++ compiler written in C++ in millions of lines of code
- a Perl (PHP, ...) interpreter written in the same language in one line of code (along the lines of `eval $1`)
Given the above, the Lisp interpreter written in Lisp is somewhere in the middle, and it doesn't really say much about the language. Just that you can use the built-in facilities of the language to write an interpreter that implements those facilities using... themselves.
In this regard, how is it fundamentally different from `eval $1`?
You've gotten a couple responses making arguments that the atoms required for a language like lisp are fundamentally simpler than eval in Perl and that makes the difference... Fwiw, I don't find these arguments very compelling. It's not so much that they're entirely false, but the "most beautiful program ever written [sic]" that we're talking about here is "beautiful" precisely because it elides all these "simple" details. I do get the vague sense that it is essentially a more verbose `eval $1`.
I agree. You could probably say, though, that once you decide to implement anything that's not `eval $1`, you jump to the next level: you're forced to parse the input and then use your built-in primitives to implement themselves. I think any minimal interpreter for any dynamic language will probably end up being pretty simple, though not necessarily as elegant as Lisp's. But elegance aside, there's nothing groundbreaking here.
It's not different from eval $1 at all. Because it is eval $1.
However, by the time that $1 has been passed, the Lisp zeroes and ones of the eval function have been turned into the appropriate machine language of the CPU.
Different CPUs, of course, have different evaluation pipelines for integers, floats, vectors, branches, etc.
So it evaluates a block of Lisp purely as integers, floats, branches, and vectors, sends each data type off to the appropriate evaluation line, and reassembles the result.
While this point of view can be seen as technically valid, I don't see it as relevant, because it erases all the specificity of the different paradigms and levels of interpretation. Of course all of this runs on the same computer in the end, but how is that what's interesting here? It's a bit like saying that mammal reproduction is no different from plant reproduction because, in the end, it's all DNA-based cells working together through the same chemical mechanisms.
The idea I'm struggling to explain, and probably failing to convey, is that the Lisp REPL does not look like Lisp when it accepts the Lisp.
It will be machine code that accepts unicode strings, parses those unicode strings into Lisp, compiles the Lisp into machine code, then evaluates the machine code.
So essentially it's not the same as passing a Lisp variable to a Lisp function inside a Lisp text file.
One specificity of Lisp, however, is that the textual representation already looks like the abstract syntax tree contained in the Lisp variable that gets passed to the Lisp function in the text file.
But I can easily admit that this fact can also be seen as some sort of illusion since in the end it is just a sequence of bytes.
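To make that pipeline concrete, here's a minimal sketch in Scheme (assuming an R7RS-ish system with open-input-string, read, and eval): the text becomes ordinary list data under read, and only eval turns that data back into behavior.

    ;; text -> data: read parses the string into a list (the AST)
    (define form (read (open-input-string "(+ 1 2)")))
    (car form)   ; => the symbol +
    (cadr form)  ; => the number 1, sitting in a list
    ;; data -> behavior: eval runs the list as code
    (eval form (interaction-environment)) ; => 3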
`eval $1` uses the millions of lines you were referring to in the first case.
The few primitives necessary for a metacircular interpreter can be easily implemented from scratch (hence the name "primitives", which clearly does not apply to `eval`).
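For a feel of how little is needed, here's a heavily abridged sketch of a McCarthy-style evaluator in Scheme. The association-list environments and the names meta-eval/meta-apply are my own choices, not anyone's canonical version:

    ;; env is an association list of (symbol . value) pairs
    (define (lookup x env) (cdr (assq x env)))

    (define (meta-eval exp env)
      (cond ((symbol? exp) (lookup exp env))
            ((not (pair? exp)) exp)              ; self-evaluating
            ((eq? (car exp) 'quote) (cadr exp))
            ((eq? (car exp) 'if)
             (if (meta-eval (cadr exp) env)
                 (meta-eval (caddr exp) env)
                 (meta-eval (cadddr exp) env)))
            ((eq? (car exp) 'lambda)             ; closure: params, body, env
             (list 'closure (cadr exp) (caddr exp) env))
            (else                                ; application
             (meta-apply (meta-eval (car exp) env)
                         (map (lambda (e) (meta-eval e env)) (cdr exp))))))

    (define (meta-apply f args)
      (if (and (pair? f) (eq? (car f) 'closure))
          (meta-eval (caddr f)
                     (append (map cons (cadr f) args) (cadddr f)))
          (apply f args)))                       ; host primitive

    ;; usage, seeding the environment with host primitives:
    (meta-eval '((lambda (n) (* n n)) (+ 1 2))
               (list (cons '+ +) (cons '* *)))   ; => 9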
What matters most is whether the language is homoiconic or not. Lisp, PostScript, Prolog, Rebol, Snobol, and even Tcl are; C and most others aren't. That makes it much easier to implement a metacircular evaluator, because code = data.
Tcl is an edge case, since it simply represents everything as text, and its execution model is defined by passing everything around and re-evaluating it as strings.
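A tiny illustration of what code = data buys you: a Scheme program is an ordinary list, so ordinary list operations rewrite programs before they run.

    (define form '(+ 1 2 3))                  ; a program, held as plain data
    (define doubled (list '* 2 form))         ; splice it into a bigger program
    doubled                                   ; => (* 2 (+ 1 2 3))
    (eval doubled (interaction-environment))  ; => 12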
>In computer programming, homoiconicity (from the Greek words homo- meaning "the same" and icon meaning "representation") is a property of some programming languages. A language is homoiconic if a program written in it can be manipulated as data using the language, and thus the program's internal representation can be inferred just by reading the program itself. This property is often summarized by saying that the language treats "code as data".
>Languages in which program code is represented as the language's fundamental data type are called 'homoiconic'. Such languages allow code and data to be DeeplyIntertwingled, so that new code can be generated and manipulated by the program itself at runtime. [...]
>Note that HomoiconicLanguages are strongly related to languages with a MetaCircularEvaluator, because it is always easy to make a MetaCircularEvaluator for a homoiconic language, but they are really two different topics. The Lisp example above is not metacircular, but it is homoiconic. You can write (with difficulty) an interpreter for C in C, but it will not make C homoiconic.
>Eliminate this category, mention that meta-circularity is a prerequisite for homoiconicity, but that it doesn't imply homoiconicity. Also, the "strong" vs. "pure" seems to be "real-world implementation" vs. "mathematical ideal". We should mention that (afaik, I could be wrong here) no implemented language is a 'pure homoiconic' language. [...]
>MetaCircularEvaluator (more commonly, "MetaCircularInterpreter"): it is possible to trivially implement a homoiconic language in itself. "Trivially" means that the semantics need not be specified explicitly; instead, they are implemented directly by the language construct being implemented. For instance, Lisp eval might be implemented by calling eval. If you don't already know what eval does, then reading the source code for the MetaCircularInterpreter might not enlighten you, it might just say "eval means eval". Thus "metacircular".
>It is famous that the core of the Lisp language can be written in about 20 lines of Lisp. This is possible because the implementation is a MetaCircularInterpreter. The average person who writes a C compiler or interpreter requires about 20,000 lines of C to do so, and must be (or become) moderately expert about compilers or interpreters. Implementing Lisp in Lisp as a MetaCircularInterpreter teaches one extremely little about compilers/interpreters for other languages.
>A: Tcl is homoiconic because evaluation of data is part of the language: force evaluation of data, and it becomes program, and that's part of the language definition - that's how while works, for instance, in Tcl: it forces evaluation of its first string argument, and if the result is true, then it forces evaluation of its second string argument. Essentially the same is true of Lisp S-expressions as program or data (and is not true of Lisp hash tables nor arrays). The same is true of a subset of Snobol. It is not true of C, Java, C++, even though they're TuringEquivalent.
Also: homoiconicity is something you can't just add to a programming language with a class or library or enough code or extensions. It's a property of the language definition itself.
>Q: Well, the C language doesn't natively support constructs to manipulate C code, true, but what if, when compiled, it...
>A: Nope. Doesn't matter. If a language isn't homoiconic at the source level, then no example of what can be done once it's compiled will change that. Why not? Because compilation means translation to a different language (such as machine language). Anything you can say about the compiled program is a statement about a language other than C.
>Q: Well, but I could write a C program that, when run, would...
>A: Nope. Doesn't matter. Same issue. You could write a C program that implements a Lisp interpreter. That doesn't make C into Lisp.
[ Epic flame war about whether or not machine language is homoiconic redacted! ;) But the final question is interesting: ]
>Q: Is Homoiconic much ado about nothing ?
>A: Yes, after much wasted bandwidth on c2 this seems to be the only logical conclusion. In particular "homoiconicity" should, in principle, facilitate meta-programming techniques, on its own it is of very little value. Languages without homoiconicity have managed to accomplish a lot in this area using lighter techniques. See for example AspectWerkz, RubyOnRails, etc. In the same time some homoiconic languages like Common Lisp fall far short from being fully reflective environments, and this subtracts further from the value of being homoiconic. In the end, what the client programmer should ask for is results. Whether or not a language is fully, 50% or 0% homoiconic matters very little.
> I've heard before that it's only "code is data" and not "data is code" since the data you manipulate will not always be code. Is this correct?
Any data that is going to be processed impacts the result of processing, so data is code, too.
That's most clearly true when data gets passed directly to an execution engine (eval, or sent to a database as SQL after some other bits get stuck onto it, etc.), but there's a perspective in which it is true generally, though not all data processed is unrestricted code.
The difference between "millions of lines" and "a few easily implemented primitives" is quantitative. Thus there is a continuum between the two extremes, which was the point of the OP.
If the given host implementation can be AOT-compiled, the resulting binary will for all practical purposes be a LISP interpreter. Bootstrapping isn’t rare in this area.
This sounds like "Given that modern Javascript engines have JIT capabilities, the eval() function is for all practical purposes a Javascript interpreter"...
The fun of writing an interpreter is that you can extend it with new features easily. With `eval $1` you're stuck with whatever it does (unless you want to do string manipulation of code -- it's only usually a bad idea :-) )
The book Structure and Interpretation of Computer Programs is organized by types of abstractions. The first three sections are about procedures, data, and modularity, but then the fourth is about abstracting language itself (called "metalinguistic abstraction"). This was a principle used often in classic AI research, where you built specific languages to implement different planning systems -- incidentally this is why it's called "scheme". By building a lisp in a lisp, you could make use of all the facilities of the host language and add new useful features. The book demonstrates how to add lazy evaluation, nondeterministic operators, and even Prolog-style logic programming to the language.
These sorts of interpreters depend on having a compatible host language. The last section is about how you can transcend limitations by essentially simulating an appropriate CPU in the host language (i.e., a virtual machine). This is how the original Scheme was implemented in its host Lisp, which was necessary because Scheme has tail recursion and Lisps haven't traditionally had that.
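In miniature, that metalinguistic move looks like this: a compressed evaluator sketch in Scheme (a toy, not SICP's actual code) where growing the language by a new special form, here a lazy unless, is one more cond clause.

    (define (xeval exp env)
      (cond ((symbol? exp) (cdr (assq exp env)))
            ((not (pair? exp)) exp)
            ((eq? (car exp) 'unless)           ; the new special form:
             (if (xeval (cadr exp) env)        ; evaluate only the branch
                 #f                            ; that's actually needed
                 (xeval (caddr exp) env)))
            (else (apply (xeval (car exp) env)
                         (map (lambda (e) (xeval e env)) (cdr exp))))))

    (xeval '(unless (> 1 2) (display "1 is not greater than 2"))
           (list (cons '> >) (cons 'display display)))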
The part that I think I do a poor job of understanding/explaining is that lisp doesn't eval strings of code. Instead, it evals lists of tokenized data.
The main difference is in how basic the language concepts are that the interpreter uses. Pattern matching a list is quite a lot more basic than an 'eval' function that supports executing an arbitrary statement.
Very well written and concise, and it helps get a novice programmer going with Python, which is a huge selling point for practically minded new programmers: you can trick them into it with "this is the best way to learn Python, read this."
I haven't compared it to SICP though, which is much longer.
Seconding Lisp in Small Pieces. I was so impressed with it during my student years that I even translated it into Russian. They don't lie about this “profound sense of enlightenment” once you finally get it.
I haven't read that one. I've always recommended my devs read The Little Schemer if they ever need something to read on a plane ride. I always recommend reading it while NOT in front of a computer.
This is really cool! Is it the most beautiful program ever written? Not IMHO. I can't find it now, but there was a short live coding demo/presentation that essentially went from a blank file, to a stack-based language, to something like a webasm interpreter in like an hour with full explanations that made it all seem painfully obvious. It sent shivers up my spine and was absolutely amazing to watch.
I recently completed Programming Languages, Part B by Dan Grossman on Coursera (https://www.coursera.org/learn/programming-languages-part-b). The homework in Racket was a very similar problem. Needless to say, it was mind-bending, and getting the final interpreter to work with all those (nested) expressions / lambdas was so satisfying!
Glenn Reid wrote a PostScript partial evaluator in PostScript, called "The Distillery", that optimized other PostScript drawing programs. You would send still.ps to your PostScript printer, and then send another PostScript file that drew something to the printer. The Distillery would then partially evaluate the second drawing program and send back a third PostScript program: an optimized drawing program with all the loops and conditionals unrolled and all the calculations and transformations pre-computed, all in the same coordinate system.
It was originally John Warnock's idea, that Glenn implemented. And it led to Adobe Acrobat's "Distiller". Acrobat is basically PostScript without the programming language.
No, you could not make it optimize itself by sending it to a PostScript printer two times in a row. It was not magic: all it did was intercept and capture the side effects of the PostScript drawing commands (fill, stroke, show), read out the path, and optimize it in a uniform coordinate system. Since it didn't do any drawing of its own, it would just output an empty program if run on itself. (Take that, halting problem!)
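The trick sketches easily in any dynamic language. A toy Scheme version, with a single hypothetical line primitive standing in for fill/stroke/show: rebind the primitive so it logs its fully computed arguments, run the drawing program, and the captured log is the unrolled, pre-computed program.

    ;; toy "distillery": the primitives record themselves instead of drawing
    (define captured '())
    (define (line x0 y0 x1 y1)
      (set! captured (cons (list 'line x0 y0 x1 y1) captured)))

    ;; the original drawing program, with a loop and arithmetic
    (do ((i 0 (+ i 1))) ((= i 3))
      (line 0 (* 10 i) 100 (* 10 i)))

    (reverse captured)
    ;; => ((line 0 0 100 0) (line 0 10 100 10) (line 0 20 100 20))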
>From: greid@adobe.com (Glenn Reid)
Newsgroups: comp.lang.postscript
Subject: release 10 of the Distillery
Date: 10 Mar 89 10:21:52 GMT
>Here is another release of the PostScript Language Distillery. I know
it's not terribly long after the last release, but there are some
significant enhancements, and I thought it would be worthwhile.
>I finally took a closer look at user-defined fonts, which now seem to
be working fairly well. In particular, it seems to handle the
Macintosh screen bitmap fonts that get used if the native font is
unavailable when the print file is generated. The entire user-defined
font is reverse-engineered to the output file as it stands, and is used
exactly like the original file used it. I also fixed some rotate text
bugs, rotated charpath, and a few other things.
>I want to emphasize that probably the two best uses of this program,
currently, are to do speed comparisons of various PostScript language
drivers and to convert "non-conforming" files into "conforming" files.
It is not particularly well suited to carefully hand-written programs,
especially not those which use looping constructs. It works (usually),
but it unrolls the loops and makes the files much bigger.
It's also possible to simply write a metacircular PostScript evaluator in PostScript:
>The data structure displays (including those of the Pseudo Scientific Visualizer, described below) can be printed on a PostScript printer by capturing the drawing commands in a file.
>Glenn Reid’s “Distillery” program is a PostScript optimizer, that executes a page description, and (in most cases) produces another smaller, more efficient PostScript program, that prints the same image. [Reid, The Distillery] The trick is to redefine the path consuming operators, like fill, stroke, and show, so they write out the path in device space, and incremental changes to the graphics state. Even though the program that computes the display may be quite complicated, the distilled graphical output is very simple and low level, with all the loops unrolled.
>The NeWS distillery uses the same basic technique as Glenn Reid’s Distillery, but it is much simpler, does not optimize as much, and is not as complete.
>The Metacircular Postscript Interpreter
>A program that interprets the language it is written in is said to be “metacircular”. [Abelson, Structure and Interpretation of Computer Programs] Since PostScript, like Scheme, is a simple yet powerful language, with procedures as first class data structures, implementing “ps.ps", a metacircular PostScript interpreter, turned out to be straightforward (or drawrofthgiarts, with respect to the syntax). A metacircular PostScript interpreter should be compatible with the "exec" operator (modulo bugs and limitations). Some of the key ideas came from Crispin Goswell's PostScript implementation. [Goswell, An Implementation of PostScript]
>The metacircular interpreter can be used as a debugging tool, to trace and single step through the execution of PostScript instructions. It calls a trace function before each instruction, that you can redefine to trace the execution in any way. One useful trace function animates the graphical stack on the PSIBER Space Deck step by step.
>The meta-execution stack is a PostScript array, into which the metacircular interpreter pushes continuations for control structures. (forall, loop, stopped, etc…) A continuation is represented as a dictionary in which the state needed by the control structure is stored (plus some other information to help with debugging).
>It is written in such a way that it can interpret itself: It has its own meta-execution stack to store the program’s state, and it stashes its own state on the execution stack of the interpreter that’s interpreting it, so the meta-interpreter’s state does not get in the way of the program it’s interpreting.
>It is possible to experiment with modifications and extensions to PostScript, by revectoring functions and operators, and modifying the metacircular interpreter.
>The metacircular interpreter can serve as a basis for PostScript algorithm animation. One very simple animation is a two dimensional plot of the operand stack depth (x), against the execution stack depth (y), over time.
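That trace-function idea carries over directly to Scheme: a debugging meta-interpreter is just an evaluator that calls a redefinable hook before each expression. A minimal (hypothetical) sketch:

    ;; print every expression before evaluating it, PSIBER-style
    (define (trace-hook exp) (display "eval: ") (write exp) (newline))

    (define (teval exp env)
      (trace-hook exp)                         ; redefine to taste
      (cond ((symbol? exp) (cdr (assq exp env)))
            ((not (pair? exp)) exp)
            (else (apply (teval (car exp) env)
                         (map (lambda (e) (teval e env)) (cdr exp))))))

    (teval '(+ 1 (* 2 3)) (list (cons '+ +) (cons '* *))) ; => 7, tracing each step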
How beautiful can a nonstop-running live Lisp evolve to be over the duration of an unsigned 64-bit clock count, attached to a nuclear or solar or geothermal power source and accreting new hardware like Spacelab plus the Nauka lab?
I guess as beautiful as the C++ and C code controlling the thermodynamics of propellers in a hostile weather organism while looking for the presence of life.
See https://tromp.github.io/cl/cl.html and https://tromp.github.io/cl/Binary_lambda_calculus.html for details...