The class mentioned in the article was actually made available under Creative Commons by MIT OpenCourseWare[1]; you can see the entirety of the course on YouTube. I watched it and cannot recommend it enough, even for seasoned programmers.
$ mit-scheme
MIT/GNU Scheme running under GNU/Linux
Type `^C' (control-C) followed by `H' to obtain information about interrupts.
Copyright (C) 2014 Massachusetts Institute of Technology
This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Image saved on Friday January 4, 2019 at 11:17:34 PM
Release 9.2 || Microcode 15.3 || Runtime 15.7 || SF 4.41 || LIAR/C 4.118 || Edwin 3.116
1 ]=>
MIT Scheme does not love us as users. That's fine; that's entirely their decision, of which this is one example. This aspect of the MIT Scheme culture is a pretty significant hurdle to it gaining popularity, probably one that was simply too big. I believe it has been replaced with Python for 6.001 nowadays too.
Another example of the same mentality can be seen in the really great SICP video lectures. Jerry Sussman treats the audience like a compiler: he takes the view that if he says it once correctly, successful communication of the idea is no longer his problem. That's a lot easier on the audience now that we can rewind videos and listen to some small number of sentences again and again until we get it. It's not exactly a fun way to learn things. It is also a massive contrast to the way, say, Larry Wall or Guido van Rossum went about things. I've noticed that since Perl, every single new language trumpets how friendly and /helpful/ the community is. I think that's probably pretty significant. This is not a criticism of Jerry; I don't know what his objectives with the language and the SICP course were. Hard to get into is hard to get into, whatever the reasons.
Racket can also be used in the terminal, but I think it's harder to debug there than in DrRacket, especially in the later chapters. Besides, Racket also has a sicp package for the book (https://docs.racket-lang.org/sicp-manual).
The first version of factorial in SBCL takes less time and fewer processor cycles:
CL-USER> (time (fact 5))
Evaluation took:
0.000 seconds of real time
0.000002 seconds of total run time (0.000001 user, 0.000001 system)
100.00% CPU
1,550 processor cycles
0 bytes consed
120
compared to the second version of factorial:
CL-USER> (time (fact2 5))
Evaluation took:
0.000 seconds of real time
0.000027 seconds of total run time (0.000027 user, 0.000000 system)
100.00% CPU
64,240 processor cycles
0 bytes consed
120
I wonder how it works in other Lisp implementations. However, when I reach for bigger numbers, things look different.
CL-USER> (time (fact 10000))
Evaluation took:
0.044 seconds of real time
0.045335 seconds of total run time (0.036745 user, 0.008590 system)
[ Run times consist of 0.021 seconds GC time, and 0.025 seconds non-GC time. ]
102.27% CPU
111,336,538 processor cycles
69,739,552 bytes consed
CL-USER> (time (fact2 10000))
Evaluation took:
0.021 seconds of real time
0.021083 seconds of total run time (0.020312 user, 0.000771 system)
[ Run times consist of 0.003 seconds GC time, and 0.019 seconds non-GC time. ]
100.00% CPU
51,983,719 processor cycles
78,751,760 bytes consed
I wonder if some kind of compiler magic kicks in for higher numbers.
You don't really have a precise measurement when the time is 0.000002 seconds; there are other factors that can fudge the timings. But there is indeed better performance for "fact" on the small input. Notice that I have to make a loop and make sure the result from the function is used:
USER> (time (loop repeat 100000000 for f = (fact2 5) finally (return f)))
Evaluation took:
3.208 seconds of real time
3.208269 seconds of total run time (3.208269 user, 0.000000 system)
100.00% CPU
10,241,019,690 processor cycles
0 bytes consed
120 (7 bits, #x78, #o170, #b1111000)
USER> (time (loop repeat 100000000 for f = (fact 5) finally (return f)))
Evaluation took:
2.972 seconds of real time
2.971409 seconds of total run time (2.971409 user, 0.000000 system)
99.97% CPU
9,484,980,134 processor cycles
0 bytes consed
120 (7 bits, #x78, #o170, #b1111000)
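The same keep-the-result caveat applies in any language. Here is a Python sketch of the idea (the fact/fact2 definitions are my guesses at the recursive and accumulator-helper shapes under discussion, not the thread's actual Common Lisp code):

```python
import timeit

def fact(n):
    # plain recursive factorial
    return 1 if n <= 1 else n * fact(n - 1)

def fact2(n):
    # accumulator-style variant with an internal helper function,
    # analogous to the tail-recursive CL version
    def iter(acc, k):
        return acc if k > n else iter(acc * k, k + 1)
    return iter(1, 1)

assert fact(5) == fact2(5) == 120

# repeat the call many times so the measurement dwarfs timer resolution,
# mirroring the loop trick in the CL transcript above
print(timeit.timeit(lambda: fact(5), number=100_000))
print(timeit.timeit(lambda: fact2(5), number=100_000))
```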
I think the additional local functions might add a bit of overhead that is noticeable on small inputs.
So I added (declare (inline i)) in fact2 (i is iter), recompiled it, and I had a warning:
note: *INLINE-EXPANSION-LIMIT* (50) was exceeded while inlining I
Then, the timing for fact2 was faster:
USER> (time (loop repeat 100000000 for f = (fact2 5) finally (return f)))
Evaluation took:
1.900 seconds of real time
1.897450 seconds of total run time (1.897450 user, 0.000000 system)
99.84% CPU
6,056,874,280 processor cycles
0 bytes consed
And if I look at the disassembly for fact2, there is a lot of repetition, and it does look like it was inlined recursively.
--- edit
Here is a loop fact3:
(defun fact3 (x)
(declare (type integer x) (optimize (speed 3)))
(loop
for n of-type integer = x then (- n 1)
for a of-type integer = 1 then (* n a)
while (> n 0)
finally (return a)))
USER> (time (loop repeat 100000000 for f = (fact3 5) finally (return f)))
Evaluation took:
1.636 seconds of real time
1.635138 seconds of total run time (1.635138 user, 0.000000 system)
99.94% CPU
5,219,556,544 processor cycles
0 bytes consed
It could be faster, but this version supports big integers. Also, please note that this is a micro-benchmark; a ridiculous number of iterations is needed to get 2-3 seconds of run time.
You're right that this is not a fair comparison (I was mostly interested in showing a variant with loop that is guaranteed to work on all implementations, unlike fact2).
Without type declaration, and without speed 3 optimization, the fact3 case is:
Evaluation took:
2.628 seconds of real time
2.624228 seconds of total run time (2.624228 user, 0.000000 system)
99.85% CPU
8,395,768,934 processor cycles
0 bytes consed
Still less than fact2.
But fact2 optimized and with types is:
Evaluation took:
1.416 seconds of real time
1.419171 seconds of total run time (1.419098 user, 0.000073 system)
100.21% CPU
4,530,124,009 processor cycles
0 bytes consed
For reference, here are the disassemblies for both optimized versions.
The code after expansion of the loop is a tagbody (labels and goto statements) which calls setq on local variables, whereas the tail-call recursive case is not expanded; the control-flow graph might be easier to optimize given that LABELS is a special form.
I come from Scheme land, where that internal tail-recursive function is the way to go for speed (internal definitions are easy to verify as never being mutated, which allows skipping a lookup; the same should be true for CL). I thought the same would be true for SBCL, but apparently it needed some nudging.
Look at the disassembly of it. I think the first example is the surprising bit. SBCL does TCO, and the tail-calling version should always be faster. Maybe SBCL is inlining and unrolling the first version for small numbers.
def f(n):
if n <= 0:
return n
return f(n-1)
f(1000)
will blow out your stack. Meanwhile,
scheme@(guile-user)> (define (f n) (if (<= n 0) n (f (- n 1))))
scheme@(guile-user)> (f 1000000)
$1 = 0
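Since CPython never eliminates tail calls, a common workaround is a trampoline (also mentioned later in the thread): the function returns a thunk instead of calling itself, and a driver loop bounces until a real value comes back. A minimal sketch:

```python
def f_tramp(n):
    # instead of recursing directly, return a zero-argument thunk
    # describing the next step, so the Python stack never grows
    if n <= 0:
        return n
    return lambda: f_tramp(n - 1)

def trampoline(step):
    # bounce until we get a non-callable (final) value
    while callable(step):
        step = step()
    return step

print(trampoline(f_tramp(1_000_000)))  # 0, with no stack overflow
```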
Also bear in mind that this was lesson 1 of an intro programming course that culminates in writing a Scheme interpreter in Scheme. There's something to be said about the simplicity / consistency of a language's syntax such that a first-year undergrad can implement it.
That was the point of the lecture talking about how their system /could/ do this. I get the impression that scheme was one of the first languages that mandated that, though I suspect someone with a better view of history can correct that.
For Common Lisp, I thought it was just not required, but that it was common for implementations to do so?
Although, a more nuanced point: it may not be required by the Common Lisp spec, but that's not to say that it's not done.
clisp:
[1]> (defun f (n) (if (<= n 0) n (f (- n 1))))
F
[2]> (f 1000000)
*** - Lisp stack overflow. RESET
sbcl:
* (defun f (n) (if (<= n 0) n (f (- n 1))))
F
* (f 1000000)
0
(Apparently clisp does do some tail call optimization, but I can't figure out what it'd be used for other than a non-terminating loop such as a REPL, if it doesn't even handle this simple case properly.)
Heck, even GCC can do tail call optimization for C, even though that's definitely not in the spec.
#include <stdio.h>
int f(int n) {
if (n <= 0) {
return n;
}
return f(n - 1);
}
int main(int argc, char **argv) {
printf("%d\n", f(10000000));
}
Try compiling that with gcc -O0 (wherein you'll get a segfault) vs -O2 (wherein it'll print 0 as hoped).
> Apparently clisp does do some tail call optimization, but I can't figure out what it'd be used for other than a non-terminating loop such as a REPL, if it doesn't even handle this simple case properly.
You simply forgot to compile the code!
[1]> (defun f (n) (if (<= n 0) n (f (- n 1))))
F
[2]> (compile 'f)
F ;
NIL ;
NIL
[3]> (f 1000000)
0
The limited form of tail call in CLISP is a feature of its compiler only, not of the interpreter.
SBCL compiles everything, so there is no need.
But SBCL requires an existing Common Lisp installation for bootstrapping, whereas CLISP bootstraps itself just from C thanks to its interpreter.
This is what Rich said about TCO in Clojure in 2010:
When speaking about general TCO, we are not just talking about
recursive self-calls, but also tail calls to other functions. Full TCO
in the latter case is not possible on the JVM at present whilst
preserving Java calling conventions (i.e without interpreting or
inserting a trampoline etc).
While making self tail-calls into jumps would be easy (after all,
that's what recur does), doing so implicitly would create the wrong
expectations for those coming from, e.g. Scheme, which has full TCO.
So, instead we have an explicit recur construct.
Essentially it boils down to the difference between a mere
optimization and a semantic promise. Until I can make it a promise,
I'd rather not have partial TCO.
Some people even prefer 'recur' to the redundant restatement of the
function name. In addition, recur can enforce tail-call position.
That's not quite right. In Scala, the annotation is there so you get a compiler error if the compiler does not do the optimization. It doesn't affect whether the compiler actually performs the optimization.
As you can imagine, there are cases where the developer is counting on the optimization but the code isn't such that it can be made.
Looks like it's a scalac optimization that doesn't always work hence the need for the annotation to give the developer feedback.
There is also a link to a Clojure discussion board exploring why Clojure doesn't have the annotation, by design. I didn't read it, as I'm not that interested.
The Common Lisp standard doesn't mandate tail call optimization, but it also doesn't even mandate garbage collection. It can be safely assumed that any non-toy implementation will include both.
> Also bear in mind that this was lesson 1 of an intro programming course
Back in '83, when the author took 6.001, only a minority of students had previously used a computer (note that Hal is quoted as having said, "If you already know how to program..." (emphasis mine)). For the majority of students back then this wasn't just "introduction to programming" but "introduction to computers".
But this is not symbolic differentiation, it's numerical differentiation. A true symbolic differentiator would not give 48.00120000993502 when asked for the value of 3x^2 at 4.
This kind of approximation exercise is given to calculus students all the time ("Use the definition of derivative to approximate the derivative of f(x) at 3 by taking dx=.001," etc.).
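That approximation is easy to reproduce. A Python sketch of such a numerical deriv (the names and the dx value are my choices; dx = 0.0001 reproduces the 48.0012... figure for the derivative of x^3 at 4):

```python
def deriv(f, dx=0.0001):
    # numerical, not symbolic: returns a new function that
    # approximates f' with a forward difference quotient
    return lambda x: (f(x + dx) - f(x)) / dx

cube = lambda x: x ** 3   # true derivative is 3x^2, exactly 48 at x = 4

print(deriv(cube)(4))     # ~48.0012, not exactly 48
```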
Yann LeCun created a Lisp called Lush which allowed for the intermixing of C code with Lisp and also allowed for interaction with Python. He would later make Torch, which used Lua, which was inspired by Scheme. Then TensorFlow and PyTorch became basically the two large players in deep learning. Now Google has JAX, which allows for automatic differentiation of Python code.
This is certainly one of the earlier motivations for making procedures "first-class" in the language. And, indeed, it should be equally impressive in any language that supports procedures as first-class members.
I have come to feel that "first class" will eventually be expanded to "can operate against definitions as easily as with them." (That is, can your language let you modify code with the same style of code that can be used to execute it?)
If there's already a language that's good at CRUD and ETL (or aims to be), and most programs are that, then it's not a problem to have other languages that are good at other things at the expense of being good at CRUD and ETL.
Even if it's true, as you say, that it's rare to write a derivative function, it still does come up. And there should be a good way to express it in at least some languages.
Every few years I see this great story popping up. I love it! A lot of things seem complicated, but are not, if you just approach it the right way, and if you don't get bogged down in up-front technical optimisations which you feel are necessary, but might not be.
It's a lovely blog series, worth delving into other bits and pieces.
This one is particularly relevant: tail-call optimisation _not_ implemented, and why. Usually the sort of thing that is not mentioned when celebrating all the good fun of Lisp.
When I read this I feel like I do when my older friends talk about seeing Miles Davis or Bill Evans in New York or some such. Man, to be there... then. At the feet of legends. Damn.
- I don't get lisp's (lisp-2) dynamic scoping (a big reason I haven't switched from vim to emacs). I'm much more comfortable with the lexical scoping of scheme (lisp-1).
- I don't get big schemes (like R6RS). For a while, I thought of my own implementation of R5RS. But I've heard R7RS is pretty much R5RS with improvements. So R7RS it is.
- But I'd like to have macros. I wonder if R7RS has macros.
- Mainly interested in crazy metaprogramming and logic/relational experiments using Friedman/Byrd school of thought (reasoned schemer, minikanren, constraint kanren, that sort of stuff).
- Finally instead of rolling my own from scratch, I've heard I could use something like chibi as a starting point (additional points since chibi is implemented in C).
> - I don't get lisp's (lisp-2) dynamic scoping (a big reason I haven't switched from vim to emacs). I'm much more comfortable with the lexical scoping of scheme (lisp-1).
Lisp-1 vs Lisp-2 has to do with whether functions and other identifiers live in separate namespaces. So a Lisp-2 requires special functions that live in the function namespace (i.e. funcall or apply in Common Lisp) in order to call a function stored in a local variable.
If you're having a hard time coming to terms with dynamic scoping (and I think most Lispers don't like it) and you're interested in metaprogramming, try writing a macro in a dynamically-scoped lisp that lets you use lexical scoping. This is kind of a fun exercise and you can do it in Elisp.
I used to be interested in Lisp - in particular Clojure - for metaprogramming and miniKanren mostly, so your mindset reminds me of my own. But I realized I could get much the same thing as metaprogramming with data-oriented programming, or by making an interpreted language to do what I needed. The performance of raw native code is comparable in some cases. Where that fails, you could theoretically use Perl or Python to generate C code that has to be performant but is very repetitive. The benefit of that is that mere mortals will understand what you're doing better, and you're not forced to bear the burden of a garbage-collected runtime.
> - I don't get lisp's (lisp-2) dynamic scoping (a big reason I haven't switched from vim to emacs). I'm much more comfortable with the lexical scoping of scheme (lisp-1).
"Dynamic vs lexical scope" doesn't have anything to do with "Lisp-1 vs Lisp-2". "Lisp-1 vs Lisp-2" is about whether function values are distinguishable from "defun"-defined functions. "Dynamic vs lexical scope" is about whether variable name-resolution is done based on the lexical scope stack, or the call stack.
You can enable lexical scoping in Elisp files by starting them with:

    ;; -*- lexical-binding: t; -*-
I think you might really like s7 Scheme. Full TCO, mostly R5RS, and full Common Lisp style macros with keywords, gensyms and first class environments. And it embeds in C in a snap. I'm using it for Scheme For Max and loving it. Based on what you're saying I think it might hit the sweet spot for you.
I don't know about Elisp, but Common Lisp uses primarily lexical scoping. If you ignore ‘defparameter’/‘defvar’, then you'll be living exclusively in lexical-scoping land.
> I'd like to have macros. I wonder if R7RS has macros
R7RS only specifies hygienic macros, but unhygienic macros are a commonly implemented extension. They aren't present in Chibi Scheme, but you might find Chicken Scheme[0] or s7 Scheme[1] interesting.
Chibi has explicit renaming macros. If you really want less hygiene than that, you can implement defmacro on top of er-macro-transformer.
Edit: er-macro-transformer IS unhygienic unless you tell it otherwise, much like gensym, except all renamed bindings introduced in the same macro invocation are the same. Gensym is of course available as well
Dynamic scope in Clojure is a key way to model supervised local side effects (side reads and side mutations) in a tightly controlled way while ensuring that the expression as a whole remains referentially transparent and the effect behaviors are late bound. If you are into crazy metaprogramming, this is pretty important imo. The only other way to model effects that I know of is monads which involve lots of opaque closures, which block metaprogramming. Code-as-data captures a lot of static information about your program which monads lose.
I have very little interest in the ML family (well, let's just say that my interest in ML is in a different part of my brain that I don't want interfering with my thinking/exploration about systems like lisp/scheme and what kind of metaprogramming can be done with them).
I guess I'll have to think about dynamic-scoping a little more.
I mean, once you have a function object instance, you've lost the source code; you can't know anything about what it does (beyond the type) except what you can glean by running it.
Emacs Lisp is a rare case of being in certain ways older than Scheme.
Common Lisp, which is the main representative of the Lisp-2 family, firmly uses lexical scoping, which is what enabled the example in the linked article. It also has wild, unrestricted macros and a much more stable language (less redoing of work when moving from implementation to implementation, especially if you use wrapper libs around stuff outside the standard library).
I must admit that I never understood what these particular examples had to do with Lisp being such a great programming language, given that you can implement exactly the same logic in basically any programming language, including for instance C (although you'd have to cheat a bit for the `deriv` function, admittedly):
Any language supporting closures or generics could get rid of the unsexy DERIV macro.
What sets Lisp apart is its "code as data" approach, which lets the coder write incredibly powerful macros (and also incredibly hard-to-understand code). The rest can easily be emulated in other languages.
I do like this naive approach mind you, there's a certain Zen to be found reimplementing these mathematical concepts from first principles, I just really fail to see what makes Lisp so remarkable in this context.
Well, the major difference here is that your C method requires that all this stuff be defined at compile time, while the Scheme deriv works even with functions that are dynamically defined at run time, anonymous functions, etc.
I'm not sure how you would do that in C, other than by Greenspunning up an implementation of first-class functions.
I'm a big fan of C but, to be fair, your example makes borderline abuse of the macro system. In my opinion, the example doesn't show that the C language per se (without the preprocessor part) is capable of higher-order functions (but as others have pointed out, it is, in the form of function pointers).
The preprocessor is part of the language. Also, there is really no such thing as "abuse" of the macro system; this is taking advantage of a language facility to do something one wants. It is like saying someone abuses the type system when they define a handle type, or abuses the namespace system when they define a static variable.
C's preprocessor is very powerful and like the rest of the language it can be used to shoot yourself in the foot - but that doesn't mean you shouldn't take full advantage of it or that actually taking advantage of it instead of treating it as a poor man's module system is "abusing" it.
You could do deriv properly in C if you have support for “blocks”; otherwise you’ll have to store your state in a struct.
I really wouldn’t have got this stuff 5 years ago but functions returning functions is a super sweet tool and I’m glad it’s finally seeing mainstream adoption. It’s a small leap from the imperative mindset but results in much neater flows when done well.
Doing "code as data" doesn't seem too difficult. The obvious way for C++ would be to link with LLVM. That would let you generate strings of C++ that you can then compile and run. Being less strict about the requirements, it is not so hard for C: you'd write out a C file, compile and link it into something appropriate (.so, .dll, or .dylib), and load it. If you don't insist on the code being in the same language, it's even easier: emit some native code, then call it. Every JIT does this.
It's really not different, particularly the first example.
If you're going to claim it is different, then I claim that LISP can't do addition. When you have (+ 1 1) that is something completely different, because you have the syntax fucked up. It's not addition like I know it, so it doesn't count.
It's very different since Lisp doesn't 'compile and run strings', it treats code as data structures - which means you can manipulate code just like data since they live in the same space, so to speak.
And also, it has nothing to do with the syntax (+ 1 1), the keyword is homoiconicity.
You mean a toy LISP or Scheme? Any serious LISP system uses a compiler. It compiles and runs strings. The "eval" creates native binary executable code.
No decent-performing system totally skips compilation. You just don't notice that it is happening. Even a web browser compiles the javascript before running it; that is what a JIT does.
I mentioned the syntax "(+ 1 1)" because you're trying to disqualify C++ homoiconicity on a superficial syntactic basis. If you can do that, then it is equally valid to say that LISP can't do simple addition.
No, I mean that it has nothing to do with compilation of strings. Homoiconicity means code is represented via data structures, so it can be manipulated with the language itself. C++ is not homoiconic.
You aren't calling them strings, but they are. I didn't specify "NUL-terminated ASCII" or anything like that. Call them S-expressions if you prefer. It makes no difference.
C++ can manipulate itself. Granted, you'll want a library such as LLVM, but you can get the job done yourself if you are a glutton for punishment.
Code is represented via data structures on normal (von Neumann) hardware. LISP is every bit as compiled as C++. Both forms of C++, human-readable and compiled, can be manipulated. It's the same as LISP, except that "human readable" is debatable for any form of LISP.
It's not the same and it's not represented as strings in Lisp, but I'm not going to commit further. You can read up on homoiconicity and Lisp on the internet, then you will see the difference.
OK, done. I read up on both. (and had earlier; I'm not unaware) Still I see no difference. There is none. Superficial syntactical differences do not matter, unless you'll agree that LISP can't do addition.
BTW, in LISP, actual usage of the homoiconicity is the surest way to make a mess of unmaintainable code.
The syntax of those defines sure looks weird. In what dialect does the name of the function being defined appear as the CAR of a list, followed by the parameters?
Yeah but none of those are inherent advantages of Lisp...
I really don't get this sort of "Lisp is cool" article touting advantages the author was impressed by during their childhood that have long since been overtaken by technical advancements in other languages.
Except in most of the other languages, it just comes out so FUGLY. I have done metaclass hacking in Python, and you can do it sure, you can make it do some cool stuff even. But ugh, it burns the eyeballs to implement. Manipulating the AST is just so elegant in Scheme compared to anything else I've seen.
I'll give you directly rewriting the source tree as a list, that is pretty cool. But about 70% of the coolness of macros can in principle be done in a C-like. (I'm doing it!) It just comes out a bit ugly cause C-like parse trees are inherently ugly, imo, cause the language imposes a more diverse structure.
Pattern matching would probably help a ton for destructuring - being able to take "$a + $b" and decompose "2 + 3" into "a=2, b=3", but that's kind of difficult. There's ways you could do it cleanly, like imposing a very predictable structure on your AST nodes, but they end up sort of reinventing Lisp through the backdoor. Luckily you don't need that for most macros, afaict.
I spend all my time these days in S7 Scheme and ANSI C to be honest. (I guess technically in C, and in Scheme-in-C!) So I won't argue against the beauty of C when you want to know exactly what the computer is doing... It's gotten to the point where I just don't even want C++ anymore, I'm becoming one of those weirdos, haha. :-)
Oh no count me 100% in the anti-C++ camp, it's a horrid language. I'm happy enough using D for the dayjob, but I'm writing a replacement on the side.
Weirdly, lots of D people end up doing that. I blame the fact that it's a language that's both obviously better than the languages it's competing with, and also obviously worse than it could be.
I'm also writing my own Scheme-like language in Zig, which will be used mainly for scripting in my game engine. The simplicity of Zig + Lisp is godsent. (all experimental, just a hobby, no production purpose)
Is a language that's both obviously better than the languages it's competing with, and also obviously worse than it could be the definition of an adoptable language?
> “With an iterative process, the computer doesn't need to use up resources for each iteration. If you know a bit about programming, you might have expected that each iterative call, like each recursive call, would consume a small amount of memory and eventually run out. Our system is smart enough to notice the iteration and avoid doing that.”
Yes, getting recursion right is old hat. But the fact that it was solved a long time ago doesn't mean it's solved for me today. If I program like that in my day job in 2020, I do in fact "consume a small amount of memory and eventually run out".
[1]: https://www.youtube.com/watch?v=-J_xL4IGhJA&list=PLE18841CAB...