One of them is that, each time you read a new program, you have to learn a new language. Probably no language benefitted / suffered from this more than Lisp.
Any significant project in any language has a "dictionary" of custom identifiers whose vocabulary you have to learn if you want to become a maintainer.
The thousand functions in a big C program are just as much a language as some Lisp macros.
The functions all do something. That something isn't "transform the code", but that doesn't matter; if you're looking at the function call and don't know what it does, you're just as lost.
Basically, if you're reading Lisp, you go from "outside in" and just assume that everything you see whose definition you are not familiar with is a macro.
Well-behaved code follows certain unwritten conventions. For instance, it avoids creating confusion by exhibiting multiple completely independent uses of the same symbols in the same scope. So for instance if we have (frobozz a b) wrapped in a lexical scope where we have (let (a b) ...) in effect, this well-behavedness principle tells us that the a and b symbols in (frobozz a b) refer to these variables. So, we can probably lay aside our suspicion that, for instance, frobozz is redefining some unrelated a in terms of some unrelated b. However, frobozz might be a macro; so we can't cast aside our suspicion that either a or b has its value clobbered. (Same in Pascal or C++, with ordinary functions: frobozz(a, b) could take VAR parameters or references, respectively).
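To make that concrete, here's a minimal sketch (frobozz and its behaviour are entirely invented) of how a macro can clobber a variable while looking like an ordinary call:

    ;; Invented for illustration: a macro that assigns to its first argument.
    (defmacro frobozz (dst src)
      `(setf ,dst (list ,src ,src)))

    (let ((a 1) (b 2))
      (frobozz a b)   ; expands to (setf a (list b b)) -- a gets clobbered
      a)              ;=> (2 2)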
This isn't really the case. Some libraries are all macros, some libraries have no macros, but most have a few macros that quickly fall into general patterns: some macros for setting up and tearing down context safely (with-thing macros), others for iterating some data structure without revealing its internals (do-thing macros), generally simple stuff.
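For illustration, here are hedged sketches of those two patterns (all the widget names are made up):

    ;; with-thing pattern: establish a context, guarantee teardown.
    (defmacro with-widget ((var &rest open-args) &body body)
      `(let ((,var (open-widget ,@open-args)))
         (unwind-protect (progn ,@body)
           (close-widget ,var))))

    ;; do-thing pattern: iterate a structure without exposing its internals.
    (defmacro do-widget-children ((child widget) &body body)
      `(map nil (lambda (,child) ,@body)
            (widget-children ,widget)))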
> One of them is that, each time you read a new program, you have to learn a new language
Which is not bad at all. You still have to do it with programs written in the same language but built with different libraries and targeting different problem domains. In that case the general-purpose language is obscuring the essence of the code.
And if you're using a well-designed DSL, and you're familiar with the problem domain, it will read naturally, just like pseudocode.
A lot of domain languages need infix operators, though. And even if you understand the domain, it's hard to read and refactor code in the presence of unrestricted macros, whereas in a language where such DSLs are implemented via the type system, the tooling can understand them.
Why do they need infix operators? Most programmers already use prefix notation with function calls. Infix is primarily restricted to mathematical and logical operations, but prefix notation (Lisp style, taking an arbitrary number of arguments) is pretty clear when written neatly:
(and cond1
     cond2
     cond3)
(NB: multiple lines like this would really only be used for a lot of conditions or longer expressions.)
We already do something similar with our Algol-like languages, e.g. by splitting a long chain of && conditions across several lines.
But at that point it's no longer the language of the domain. If you're going to support writing mathematics that looks like the mathematics that mathematicians write then you need infix operators, because mathematics is written with infix operators. Similarly for many other domains.
That is incorrect; the notation (op arg1 arg2) may not be the surface syntax preferred by some practitioners working in that domain, but it corresponds 1:1 to its abstract syntax.
There are ways to provide infix independently. That is to say, one person can develop this domain specific syntax, and another can develop or customize an infix engine for it.
In Common Lisp, there is a well-known "infix.cl" module that provides mappings like a[b,c] -> (aref a b c), f(x, y) -> (f x y), a + b -> (+ a b), and so on.
This isn't understood as creating a new language as such; it's just sugar. I think it allows new operators to be added with custom precedence and associativity. Or you can hack it however you want.
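If I remember right (treat the exact reader syntax as an assumption on my part), using it looks roughly like this:

    ;; Assumed: the package installs an #I(...) reader macro.
    #I( a[i,j] + f(x, y) * 2 )
    ;; which reads as the ordinary prefix form:
    (+ (aref a i j) (* (f x y) 2))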
I've never seen it used in any production code.
The main purpose it serves is to satisfy people who want to know that it can be done; after that it turns out that they don't actually want it done. They just don't want to work with a language that can't do it.
The only program I'm aware of which actually supports writing mathematics that looks like the mathematics mathematicians write (the actual 2D notation) is Tilton's Algebra, written in Common Lisp.
The various mathematics languages out there fail.
Oh, and not to mention that mathematicians have been trained to work with this:
$\sin{(\frac{\pi}{2}-\theta)} = \cos{\theta}$
That's not such a bad example; I can still sort of see the trig identity in that if I squint my eyes.
Still, mathematics isn't entirely infix. And outside computer algebra systems, you'll be hard-pressed to find postfix operations expressed as postfix in programming environments - !, for example. What language allows you to express:
n P k = n! / (n - k)!
as the above? `n P k` will become something like `nperms n k`, and factorial will be moved to prefix as `fact n / fact (n - k)`. We're already diverging from the domain language, so there's no reason to consider a more logically consistent framework any worse in this case than one that seems to have arbitrarily moved some things from infix to prefix, moved postfix almost universally to prefix, and left some infix as infix.
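For comparison, here's what the all-prefix version tends to look like in Lisp (helper names invented); it is consistent, but it's clearly not the textbook notation:

    ;; n P k = n! / (n - k)!, written entirely in prefix.
    (defun fact (n)
      (if (<= n 1) 1 (* n (fact (1- n)))))

    (defun perm (n k)
      (/ (fact n) (fact (- n k))))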
SML allows you to have an infix function named "P", and Haskell allows something similar, although you have to quote it with backticks and can't use a capital letter to start a function name...
In Prolog you have the ability to create new postfix operators. Of course, Prolog being a logic language and not a functional one, along with '!' being used for 'cut', opens up another can of worms.
You're right as far as you go, but it's not an all-or-nothing thing. Most languages won't let you write mathematics exactly like mathematicians do, but the closer you can get the better. For many cases avoiding infix entirely would be a big cost.
(FWIW Scala allows postfix operators, though you'd have to put the n! in brackets)
> Most languages won't let you write mathematics exactly like mathematicians do, but the closer you can get the better
See Wolfram Mathematica for example.
Also, one of my usual DSL tricks is to allow arbitrary TeX in identifiers, which, combined with literate programming tricks, lets you write very idiomatic mathematical expressions as code.
Mathematics isn't written with infix operators alone. Mathematics is written in a fancy 2d notation with all kinds of operator types: infix, prefix, postfix, around-fix, sub-fix, super-fix.
If you look at actual software for maths - several famous ones, like Macsyma, Reduce and Axiom, were written in Lisp - they provide more than infix.
And? DSLs can have any syntax you like. And any type system you want.
And DSLs done the right way, via macros, are much better at integrating with tools than any ad hoc interpreted DSL could ever be. You can easily have syntax and semantic highlighting inferred, along with auto-indentation, intellisense and all the bells and whistles. For no extra cost.
> And DSLs done the right way, via macros, are much better at integrating with tools than any ad hoc interpreted DSL could ever be. You can easily have syntax and semantic highlighting inferred, along with auto-indentation, intellisense and all the bells and whistles. For no extra cost.
No you can't. If the macro is arbitrary code then no tool can offer those things - there's no way to offer intellisense if you don't know what strings are meaningful in the language, and an unconstrained macro could use anything to mean anything.
You know how GNU Bash is customizable with custom completion for any command, so that when you're, say, in the middle of a git command, it will complete on a branch name or whatever?
Similarly, we can teach a syntax highlighter, completer or whatever in some IDE how to work with our custom macro.
Sure - but at that point we've lost a lot of the value of having a standardized language at all. The whole point of a language standard is that multiple independent tools can be written to work with it - that your profiler and your linter and your compiler can be written independently, because they'll be written to the spec. If everyone has to customize all their tools to work with their own code, that's a lot of duplicated effort. Better to have a common standard for how you embed DSLs in the language, so that all the tools already understand how to work with them.
It is a broken approach. A much better way is to have a standard protocol (see SLIME for example, or IPython, or whatever else) and let your tools use the same front-end your compiler does, instead of reimplementing all that crap over and over again from the language standard.
I expect that few C++ tools that do not use libclang will remain.
libclang is just an example, maybe not an ideal one. But, yes, I'm advocating for an executable language spec, one that you'd use as a (probably suboptimal, but canonical) implementation.
Yes you can - as soon as you start wrapping your arbitrarily complex macros into a custom syntax. I do this easily with any language I add macros and extensible syntax to.
Of course I can. If my tools are communicating with my compiler, they know everything it knows. I have a single tiny generic Emacs mode (and a similar Visual Studio extension) that handles all the languages designed on top of my extensibility framework.
It's trivial. Any PEG parser I add on top automatically communicates all the highlighting and indentation data and all that to the tools (and it's inferred from the declarative spec; no additional user input is required). The underlying typing engines do the same, for the nice tooltips and code completion. The compiler core itself does the same with all the symbol definitions, dependencies, etc. Easy.
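As a purely invented illustration (this is not the actual protocol), the kind of data such a front-end could push to an editor for a user-defined syntax might look like:

    ;; Invented example: data a compiler front-end might report to the IDE
    ;; over a SLIME-like connection after parsing a DSL source file.
    (:tokens      ((:span (12 . 19) :class :keyword     :text "defrule")
                   (:span (20 . 24) :class :nonterminal :text "expr"))
     :definitions ((:name "expr" :kind :grammar-rule :file "calc.dsl" :line 3))
     :completions ("expr" "term" "factor"))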
This sounds very interesting :) And I think Colin Fleming is doing something similar in Cursive? In any case, I'd like to see more of what you are talking about - do you have any more documentation of it, a writeup, blog post or video?
If I understand it correctly, Cursive is something different: they don't want to run an inferior Clojure image (unlike SLIME, for example), but instead reproduce a lot of Clojure functionality with their massively complex static analysis tools. But I might have it wrong; all the information I have about Cursive came from one of its advocates, who is very aggressively against the very idea of an inferior REPL for an IDE.
I've got some code published, but not that much in writing; I'm planning to fix that some time later. See the stuff at my github account (username: combinatorylogic). Relevant things there are the Packrat implementation, the literate programming tools and an Emacs mode frontend.
Exactly! You end up defining your macros in a particular restricted subset of lisp, and your tooling for Emacs and Visual Studio has to know about that particular subset. Other people writing similar macros will no doubt have their own, subtly different subset, and their own integrations for their subset. But since your way of writing declarative specs for language customization isn't standardized, you can't use each other's tool integrations.
The way you express DSLs is something that needs to be understood by language tooling, so it belongs in the language spec.
No. Tools do not know anything about the restrictions. In fact they work with a wide range of languages, not just lisp. The only "restriction" is a protocol, built into the macro expander, syntax frontend and compiler core.
So, in your rot13 example, the compiler would report all the new identifiers, with their origins, to the tools.
> So, in your rot13 example, the compiler would report all the new identifiers, with their origins, to the tools.
How can the compiler know which identifier connects to which origin, unless the macro complied with some standard/restriction/protocol? From a certain perspective, all I'm suggesting is making these protocols part of the language standard - that is, defining the DSL that's used to define DSLs, rather than allowing macros to consist of arbitrary code.
> Yes you can - as soon as you start wrapping your arbitrarily complex macros into a custom syntax.
Well, by that definition you get exactly the same if the host language of your DSL is statically typed and doesn't use macros. Custom syntax is custom syntax and whether tools/IDEs understand it has nothing to do with the host language.
Of course macro+syntax extension has absolutely nothing to do with what you can achieve in a language without macros.
And, no, you did not understand. Any custom syntax you're adding (if the right tools are used, like mine, for example) would automatically become available for your IDE and all the other tools, because they're reusing the same compiler front-end.
> And, no, you did not understand. Any custom syntax you're adding (if the right tools are used, like mine, for example) would automatically become available for your IDE and all the other tools, because they're reusing the same compiler front-end.
Just being able to execute the macro isn't enough for the IDE though. E.g. if a macro is "rot13 all identifiers in this block" then sure the IDE can run it, but it can't offer sensible autocompletion inside the block without understanding more about the structure of the macro.
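For concreteness, here is a rough sketch of such a macro in Common Lisp (names invented); the point is that nothing in its definition tells a generic tool which names will be meaningful inside the block:

    ;; Illustrative only: replace every symbol in the body with its rot13'd name.
    (defun rot13-symbol (sym)
      (flet ((rot (c base)
               (code-char (+ (char-code base)
                             (mod (+ (- (char-code c) (char-code base)) 13) 26)))))
        (intern (map 'string
                     (lambda (c)
                       (cond ((char<= #\A c #\Z) (rot c #\A))
                             ((char<= #\a c #\z) (rot c #\a))
                             (t c)))
                     (symbol-name sym)))))

    (defmacro with-rot13-identifiers (&body body)
      ;; Renames everything, operators included -- exactly the kind of
      ;; arbitrary transformation an IDE cannot second-guess.
      (labels ((walk (form)
                 (cond ((null form) form)
                       ((symbolp form) (rot13-symbol form))
                       ((consp form) (cons (walk (car form)) (walk (cdr form))))
                       (t form))))
        `(progn ,@(mapcar #'walk body))))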
The IDE does not execute the macro - it knows the result of its expansion from the compiler. And the compiler keeps track of all the identifiers and their origins.
The IDE can autocomplete the rot13ed identifiers from outside, perhaps. But it can't possibly suggest rot13ed identifiers inside the macro block for autocomplete, because it can't possibly know that that's what the macro does.
Why? You know which macro made the identifiers. You know what this macro consumed. In most practically important cases this is sufficient.
But, yes, you cannot do it with the Common Lisp approach, where macros operate on bare lists rather than Scheme-like syntax objects. The problem here is that the lists have been stripped of the important location metadata. For this reason I had to depart from simple list-based macros and use custom syntax extensions with rich ASTs underneath. Still, on top of a Lisp.
Even with location information, if the IDE's going to offer autocomplete inside the macro it would need to be able to invert the way the macro transforms identifiers, which is not possible to do to arbitrary code.
I agree that this is very rarely practically important - but if you think about it, that's precisely an argument that a more restricted alternative to macros should be adequate.
> And? DSLs can have any syntax you like. And any type system you want.
While you're technically correct, as a practical matter you won't implement a type system/type checker for all the DSLs you create. (Nor do I believe you'll even do it for anything approaching a majority.) Obviously, I'm using the impersonal "you" here.
Implementing real type systems is hard and mostly rather tedious work.
I have a nice DSL which makes building complex type systems (including dependent ones) trivial and fun. So I do not mind adding typing to pretty much any kind of DSL, including the smallest of them.
I still think you should have some links bookmarked to drop in conversations like this. Bookmarked so you can just type a name rather than waste time looking. Many in these discussions might think you're speculating or have some toy project rather than the interesting one I found digging through your old posts that backs up your claims.
Maybe even have a series of examples that illustrate solutions you keep mentioning so others can learn and apply them. Just a thought. :)
Actually I do have such a framework [1], but this is not my point here. My point was that it's relatively trivial to implement such a collection of DSLs on top of pretty much any sufficiently powerful (i.e., CL-like macros) meta-language.
If you don't have a ready-to-use language construction framework targeting your meta-language, just build one. Easy.
As for typing specifically, the approach is rather fun and simple. Firstly, you'd need something like Prolog. It's really not that much work; you can quickly build a passable implementation in a couple of dozen lines of code in pretty much any language. See miniKanren [2] for example. Then any kind of typing is done easily: write a simple pass over your AST (it may require some preparation passes, like resolving the lexical scope - but those are useful for other things too) that spits out Prolog equations for each expression node that needs typing. Then execute your Prolog code and get your types back. An implementation won't look any more complicated than a formal specification of a type system in a paper using type equations (as in [3]).
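A minimal sketch of that recipe in Common Lisp (all names invented; a hand-rolled unifier stands in for the Prolog/miniKanren engine): walk a tiny expression language, emit type equations for each node, then solve them.

    ;; Step 1: emit type equations. Expressions: integers, variables,
    ;; (+ e1 e2) and (if c then else).
    (defvar *tvar-counter* 0)
    (defun fresh-tvar () (intern (format nil "?T~D" (incf *tvar-counter*))))
    (defun tvar-p (x) (and (symbolp x) (char= (char (symbol-name x) 0) #\?)))

    (defun equations (expr env tv)
      "Equations (as (lhs . rhs) pairs) forcing EXPR to have type TV."
      (cond ((integerp expr) (list (cons tv 'int)))
            ((symbolp expr)  (list (cons tv (cdr (assoc expr env)))))
            ((eq (car expr) '+)
             (append (list (cons tv 'int))
                     (equations (second expr) env 'int)
                     (equations (third expr) env 'int)))
            ((eq (car expr) 'if)
             (append (equations (second expr) env 'bool)
                     (equations (third expr) env tv)
                     (equations (fourth expr) env tv)))))

    ;; Step 2: solve the equations with plain first-order unification
    ;; (the role Prolog / miniKanren plays in the full-scale version).
    (defun walk-type (ty subst)
      (let ((b (and (tvar-p ty) (assoc ty subst))))
        (if b (walk-type (cdr b) subst) ty)))

    (defun unify (a b subst)
      (let ((a (walk-type a subst)) (b (walk-type b subst)))
        (cond ((eql a b) subst)
              ((tvar-p a) (acons a b subst))
              ((tvar-p b) (acons b a subst))
              (t (error "Type error: ~S vs ~S" a b)))))

    (defun solve (eqs)
      (reduce (lambda (subst eq) (unify (car eq) (cdr eq) subst))
              eqs :initial-value '()))

    ;; Usage: infer the type of (if flag (+ x 1) 2) given x:int, flag:bool.
    ;; (let* ((tv (fresh-tvar))
    ;;        (s  (solve (equations '(if flag (+ x 1) 2)
    ;;                              '((x . int) (flag . bool)) tv))))
    ;;   (walk-type tv s))   ;=> INT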
At this point "what is a language?" (or perhaps "what should a language be?") becomes a more than academic question. In a language with arbitrary macros one could potentially implement any language on top of that language. If we're using the term by analogy to human language, to my mind the key factor is the ability of implementations with no previous interaction to communicate (that is, how much can one express in a way that another will understand (and how deeply)). Arbitrary macros allow anything to be expressed, but it is very difficult for tools to understand the meaning of arbitrary code.
Of course. But, as I said, you don't have to restrict your macros, you only have to add a bit of a protocol on top of them in order to make them play nicely with all the tools.
And then you'll get the most powerful programming environment possible - building arbitrarily complex hierarchies of languages, with the cost of adding a new language being close to zero, and with support from all your tools for free.
Actually, while building such a hierarchy I naturally came to a number of "restrictions", although they're not enforced. I prefer to build compilers as chains of very trivial transforms, each implemented with at most a total language (or simple term rewriting in most cases). It also helps to maintain a nice interaction with the tools.
All compilers translate a text stream into a tree structure during parsing, so arguing that some language "needs" infix is saying that there is some tree structure out there that can't be traversed depth-first rather than breadth-first.
That can only be true if your supposed "tree" actually contains loops. What language do you know of that generates cyclical parse structures?