TXR – A Programming Language for Convenient Data Munging (nongnu.org)
118 points by joshumax on May 16, 2019 | 73 comments


Author here. Currently working on a debugger. (Threw the old crappy one out.) Backtraces are working. Some of the remaining work is going to require long, uninterrupted concentration that is hard to come by due to taking care of a six-month-old baby.

I have over 50 unreleased patches. There are some bugfixes, including a compiler one, involving dynamically scoped variables used as optional parameters:

      (defvar v)
      (defun f (: (v v)))
      (call (compile 'f)) ;; blows up in virtual machine with "frame level mismatch"
Patch for that:

  diff --git a/share/txr/stdlib/compiler.tl b/share/txr/stdlib/compiler.tl
  index e76849db..ccdbee83 100644
  --- a/share/txr/stdlib/compiler.tl
  +++ b/share/txr/stdlib/compiler.tl
  @@ -868,7 +868,7 @@
                                       ,*(whenlet ((spec-sub [find have-sym specials : cdr]))
                                           (set specials [remq have-sym specials cdr])
                                           ^((bindv ,have-bind.loc ,me.(get-dreg (car spec-sub))))))))))
  -                 (benv (if specials (new env up nenv co me) nenv))
  +                 (benv (if need-dframe (new env up nenv co me) nenv))
                    (btreg me.(alloc-treg))
                    (bfrag me.(comp-progn btreg benv body))
                    (boreg (if env.(out-of-scope bfrag.oreg) btreg bfrag.oreg))
There is now support in the printer for limiting the depth and length.

I added a derived hook to the OOP system: a struct gets notified when it is being inherited from.


"TXR Lisp programs are shorter and clearer than those written in some mainstream languages "du jour" like Python, Ruby, Clojure, Javascript or Racket. If you find that this isn't the case, the TXR project wants to hear from you; give a shout to the mailing list. If a program is significantly clearer and shorter in another language, that is considered a bug in TXR."

That section made me chuckle. Admirable if true.


I agree that the general-purpose programming language space is fairly crowded ... the Lisp dialect/user ratio especially so.

DSLs, OTOH, are in short supply. While awk or plain sed are great for shell programming, this is the only (open source) DSL I'm aware of targeting certain types of NLP-esque "munging". This space is mostly full of statistical approaches, which, while conceptually pure, don't allow the kind of flexibility that would be useful in many applications.

I wonder if, eventually, the DSL portion of TXR could be sheared off (possibly via metacircular evaluation of the TXR Lisp?) into something that's portable across Lisps, or at least to semi-standardized Scheme implementations.


N. Westbury has been cloning it in Java:

https://github.com/westbury/txr-java


>That section made me chuckle. Admirable if true

Mostly true for very high level languages like Lisp/Scheme, or ML/OCaml/F#/Haskell, when set against not-so-high-level languages like C, C++, or Java.

Against Racket, I wouldn't be so sure. Nor against Ruby.

Python and Javascript are high level languages but they are crippled by some bad design decisions.


I know that Python and Javascript have their warts (as do all languages in my experience), but what decisions in particular are you thinking of?


Lambdas are limited to a single expression. Threads don't work the way anyone would ever want threads to work because of the GIL.
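A rough sketch of the classic demonstration - CPU-bound work gains nothing from a second thread (exact timings vary by machine):

  import threading, time

  def burn(n):
      # pure CPU work; the GIL lets only one thread run Python bytecode at a time
      while n:
          n -= 1

  start = time.time()
  threads = [threading.Thread(target=burn, args=(10_000_000,)) for _ in range(2)]
  for t in threads: t.start()
  for t in threads: t.join()
  print(time.time() - start)  # about the same as running both counts back-to-back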

That said, homoiconicity (s-expressions) is not a feature I want. It makes reading code like reading a wall of text, compared to a nicely laid out magazine.


Since Python lambdas are themselves expressions, they cannot contain statements, due to Python adhering to the Algol-like statement/expression syntactic paradigm. If a lambda expression could contain statements, that would mean that almost any other kind of Python expression could also contain statements, by containing a lambda expression.

But, here above, I'm writing more from the angle of supporting multi-line lambdas that contain statements. Strictly speaking, you are only bemoaning the lack of multiple expression support, not multi-line lambdas.

Python could adopt something similar to the C comma operator. It almost has that in the form of list constructors, except that these return a list, instead of the rightmost value:

  [foo(), bar(), xyzzy()] #   foo, bar and xyzzy are called
Idea: a dummy function called progn could be used for this:

  >>> def progn(*rest):
  ...    if len(rest) > 0:
  ...       return rest[-1]
  ...    return None
  ... 
  >>> progn(1, 2, 3)
  3
  >>> progn(1)
  1
  >>> progn()
So now we can do:

  >>> x = lambda arg: progn(print(arg), print(arg), "done")
  >>> x(42)
  42
  42
  "done"
  >>>
There you go. Lambdas are (effectively) not limited to a single expression. If progn is too long, call it pg (Paul Graham) or pn (Peter Norvig).

Always have your Lisp hat on, even if you find yourself in Python land.

Maybe this is a common trick? I don't use Python; I hardly know anything about it. I once wrote a Python program which garbage-collects unreferenced files from a Linux "initramfs" image, making it smaller (thus reducing the kernel image size). This was in a Yocto environment, which is written in Python 3, so that choice of language made sense.

BTW does Python require left-to-right evaluation order for arguments? I would sure hope so; it would probably be "un-Pythonic" to plant such a bomb into the language as unspecified eval order.
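A quick experiment, at least with CPython, is consistent with left-to-right order, which the progn trick above depends on:

  >>> def trace(n):
  ...    print("evaluating", n)
  ...    return n
  ...
  >>> progn(trace(1), trace(2), trace(3))
  evaluating 1
  evaluating 2
  evaluating 3
  3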

BTW looks like a more idiomatic definition for progn is:

  >>> def progn(*rest):
  ...    return None if len(rest) == 0 else rest[-1]


> Maybe this is a common trick?

No. It's obvious and trivial, but you'd be on the verge of being called names if you tried to use it in a Python codebase. Lambdas in Python are limited to a single expression by convention - which in Python-land is scarily rigid and specific - rather than just by the language spec.

Before `... if ... else ...` was added to the language as an expression (I think around 2.5), people had to make do with some workarounds. The fact that `True` and `False` get automatically cast to ints and back allowed for writing something like `[val_if_false, val_if_true][condition]`. Or you could use an `and`/`or` combination as per usual. The official stance at the time was to never do this and use an `if` statement instead, but people still sometimes resorted to it. Then, the `if` expression was introduced specifically to combat the use of such workarounds. Now you'd be lynched if you tried to use one of them.
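Spelled out with hypothetical values (the indexing trick works because True == 1):

  >>> condition = True
  >>> ["val_if_false", "val_if_true"][condition]
  'val_if_true'
  >>> condition and "val_if_true" or "val_if_false"   # breaks if val_if_true is falsy
  'val_if_true'
  >>> "val_if_true" if condition else "val_if_false"  # the blessed 2.5+ form
  'val_if_true'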

"There should be one - and preferably only one - obvious way to do it" - from the Zen of Python[1].

In general, despite a lot of effort to eliminate them, there are still some creative ways to use the language. It will always be the case, obviously, as you demonstrate. However, that creativity is 100% rejected by the community, to the point that even mentioning inadequacy of some construct for some use case is frowned upon - because it could lead to people inventing creative workarounds. If you try to complain about something in the language, the general attitude is "write a PEP or GTFO". More often than not it results in the latter.

The saddest part of it all is that this apparently is one of the major factors that made Python as popular as it is. There are valid reasons and a lot of advantages to this strategy. Go is similar as far as I can tell. Among the dynamic languages with rich syntax, Python codebases tend to be stylistically very close to each other, and not because there is a lack of ways this rich syntax could be (ab)used, but because doing so is unpythonic.

Haaah, now I said it... I hope not many Python programmers read this thread; I can already see torches and pitchforks on the horizon...

Source: I've been writing Python for the last 12 years for pay.

[1] https://en.wikipedia.org/wiki/Zen_of_Python


My experience with Go is that the syntax is more rigid compared to Python, but the community is much less idealistic.

As long as your code is linted with gofmt and it compiles (and doesn’t abuse reflection), the community tolerates more creative uses of the syntax to get around some of the pitfalls of the language – but there is of course less opportunity for creative syntax than in a dynamic language like Python.


>Strictly speaking, you are only bemoaning the lack of multiple expression support, not multi-line lambdas.

Oh, I'm not bemoaning. I'm just answering GP about the likely limitations that the previous poster was complaining about.

>Maybe this is a common trick? I don't use Python; I hardly know anything about it.

I work around lambdas by naming internal functions. It's easier to read intent if I tell you what I'm trying to do.
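For example (names made up):

  def visible_users(users):
      def is_active(user):  # the name states the intent a bare lambda would hide
          return user.last_login is not None and not user.banned
      return [u for u in users if is_active(u)]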


Gotcha.

I do agree about homoiconicity. It's great for writing macros, and terrible for everything else. Of course, go too far in the other direction and you get perl, so... yeah.


I don't want to reopen this 50+ year old can of worms, but I have 2 questions:

1. Is it about homoiconicity in general or specifically s-expressions [EDIT: I see GP writes about s-exps specifically, missed it at first]? Prolog, Erlang, TCL, and Rebol (just some examples) are homoiconic, but not s-exps based. What do you think about them?

2. Do you often read code without syntax highlighting and proper indentation? Assuming the code is properly indented and colorized, what makes it so hard to read in your eyes? Take a look for example at snippets in: https://docs.racket-lang.org/quick/index.html#(part._.Local_... - what do you feel is wrong with them? Is it only the placement of parens, or is there something else?


1. It's homoiconicity in general. It's most obvious in arithmetic expressions, but homoiconicity (as far as I've seen) eschews semantic indicators besides function name and argument position. I find that keywords and symbols are almost indispensable for smooth eye-parsing of code.

2. No, I use syntax highlighting and indentation all the time. Trying to put words to my vague thoughts, I think the main issue is that, for example, in a let block the only indication of the meaning of all the items is the single function/macro name at the beginning of the block. Whereas in an algol-type language you have assignment operators in each item to indicate what they mean.


> I find that keywords and symbols are almost indispensable for smooth eye-parsing of code.

Sure, but that's a totally separate issue from homoiconicity. All the examples I mentioned are infix languages (although TCL requires you to opt-in for that):

https://github.com/adambard/learnxinyminutes-docs/blame/mast... (I couldn't find Rebol, but Red is very similar)

https://github.com/adambard/learnxinyminutes-docs/blame/mast...

https://github.com/adambard/learnxinyminutes-docs/blame/mast...

https://github.com/adambard/learnxinyminutes-docs/blame/mast...

(sorry for the links to blames, but it's impossible to link to a specific line otherwise)

Prolog lets you define your own operators and you have control over associativity and precedence. TCL `expr` works like `$(( 2 + 3 ))` in BASH (which also doesn't support mathematical operators normally) and by reimplementing it you can also add your own operators. Erlang also allows you to define new operators, although it's much more hassle there, as you need to write a parse transform (which is only possible to do so easily because of homoiconicity of the language). I'm not sure about Rebol/Red - I had only passing contact with it long ago.

In other words, homoiconicity itself has little to do with where the "keywords and symbols" are placed in the code, or whether they are used at all.

Incidentally, Lisps also support infix syntax just like TCL: for most Lisps you can grab a lib/package/source of the `infix` macro, which allows you to write infix arithmetic (and code in general) without problems.

> I think the main issue is that, for example, in a let block the only indication of the meaning of all the items is the single function/macro name at the beginning of the block. Whereas in an algol-type language you have assignment operators in each item to indicate what they mean.

Ok, that's one way of looking at this. On the other hand, one can also say that the repeated use of the operator, where the meaning of each line is already determined (by the fact that it's inside a `let` block), is useless cruft which actually hinders comprehension by forcing you to mentally parse one more element on every single line of the assignment block. In Lisp, it's actually trivial to extend the language to include this form of `let`:

  (let
      ((a = some_expr)         ;; could be `:=` `<-` `is` instead of `=` 
       (b = some_other_expr))
    some_statements
    ...)
In Scheme, the fundamental conditional construct called `cond` even has an operator-like construct built-in:

    ;; in Racket
    (cond
      [some_condition => a_function_called_on_the_result_of_condition_expr_if_its_not_false])
Yet, including this kind of additional syntax is not widespread in the community. Unless the additional syntactic element actually changes the meaning of the code, like in `cond` case, it is seen as superfluous and not needed.

Both opinions have some merit to them. How you feel about both styles is largely determined by what you're familiar with and what you're used to. In the first style, you have to learn to ignore some tokens in some places, as they don't add any meaning to the code. In the second, you need to be careful not to mistake one kind of block for another, and you need to learn what the relation is between subexpressions in each kind of block.

Both approaches require learning. The difference is that you already learned the first one, while you're not familiar with the other. But - and that's what I really wanted to say - there's no difference between readability of the two approaches once you learn them. In other words, Lisp code is as readable for Lisp programmers as JavaScript is for JS programmers. Further, all Lisps are equally readable to (a specific) Lisp programmers, just like all Algol-like languages are readable for JS programmers.

With a bit - and I mean a bit, like a few days; if you want a language you won't find readable after half a year, go for J - of practice you could read Lisp-like code without problems. There's no inherent unreadability to either Lisp- or Algol-like syntaxes - is what I'd like to convince you of :)


> ... also allows you to define new operators ... I'm not sure about Rebol/Red - I had only passing contact with it long ago

You couldn't in Rebol 2. But you can in Rebol 3 (Ren/C branch) and Red.

For example, let's take the Rebol/Red `multiply` function and provide an infix operator for it (i.e. replicating the `*` infix operator)...

Rebol 3 (Ren/C):

  >> multiply 2 4
  == 8
  
  >> x: enfix :multiply 
  
  >> 2 x 4
  == 8
Red:

  >> x: make op! :multiply
  == make op! [[
      "Returns the product of two values" 
      value1 [number! char! pair! tuple! ...
  
  >> 2 x 4
  == 8


>I see GP writes about s-exps specifically, missed it at first

My experience is with s-expressions being a wall of text. I haven't used Prolog (and Erlang uses much the same syntax) enough to have a beef with them. Perhaps I was being over general.


> My experience is with s-expressions being a wall of text.

That's sad, but unfortunately not that rare. Making a Lisp readable is not as easy as doing so in Python.

In my experience, Lisp code written by an experienced lisper who cares about readability can often be much more readable than well-written Python (assuming equal levels of proficiency in their respective languages). On the other hand, the number of ways you could destroy the readability of Lisp source is endless, they are always close by, and are even context-dependent. Beginners or people without the focus on readability use those ways rather liberally.

The situation is the exact opposite: Python defines the lower bound on the code readability ("Readability counts" -> PEP-8 -> linters -> (lately) `black`) while Lisps, in general, don't even have an authoritative PEP-8 equivalent.

On the other hand, Python also has an upper bound on how readable it can be: its syntax is rich, but if you find yourself in a place where it's not rich enough, you're on your own. You could use the expressive power of the language to hijack the syntax and beat it into a shape better suited to your problem, but it will almost certainly be seen as un-Pythonic.

On this point, Lisps have a huge advantage, because you can change the language parser on the fly easily, and you can add whatever syntax sugar you need in a couple of lines of a macro. In other words, Lisps give programmers tools for making their code as readable as they want (and are able to), while at the same time allowing them to write "walls of text" (and honestly, that'd be a very polite way of describing some of the Lisp code I've seen) which - in readability - ranks among the worst I've seen in any language.

So, what I want to say here is that it's possible - and not that hard - to write readable Lisp code. Unfortunately, a programmer has to both think of readability when writing and have the skills to turn their ideas on readability into reality.

In effect, yes, there's a lot of Lisp code which is hard to ingest. Some style guides exist or are being created, paredit helps a lot, and while I'm not aware of any linters yet, they should start appearing at some point. On the other hand, Lisp code skillfully crafted for readability rivals (and sometimes surpasses) Python at its best.

I'm not sure what Lisp code you've seen, but I assume it was all of the former kind. This is unfortunate. Without knowing what it was you were reading/working with, it's hard to recommend anything, but I found the examples in "How to Design Programs" quite readable: https://htdp.org/2019-02-24/part_five.html and there are also other books and Open Source projects with code worth reading, but I'd have to dig through my bookmarks, which I don't have the time for right now, sorry :(

TLDR: Lisps - Schemes, Racket, CL, PicoLisp, Emacs and TXR Lisps to name a few - can be used to write astonishingly readable and to-the-point code, but the languages do absolutely nothing to discourage using them to write the most unreadable mess under the heavens. As for the reasons for this - I've honestly no idea at all.


> Python defines the lower bound on the code readability

Ignoring ;... comments for a moment, if we squash a Lisp program into one line and remove all non-essential whitespace, it's possible to recover it into nicely formatted code by machine, more or less.
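A toy illustration (in Python, since that's the thread's other lingua franca; it handles only atoms and parens - no strings, comments or other read syntax - which is the "more or less"):

  import re

  def parse(text):
      # turn a one-line s-exp back into nested Python lists
      stack = [[]]
      for tok in re.findall(r'[()]|[^\s()]+', text):
          if tok == '(':
              stack.append([])
          elif tok == ')':
              done = stack.pop()
              stack[-1].append(done)
          else:
              stack[-1].append(tok)
      return stack[0]

  def pretty(form, indent=0):
      # short forms on one line; long forms broken up with two-space indentation
      pad = '  ' * indent
      if not isinstance(form, list):
          return pad + form
      if len(str(form)) < 40:  # crude width heuristic
          return pad + '(' + ' '.join(pretty(x).strip() for x in form) + ')'
      head, *rest = form
      lines = [pad + '(' + pretty(head).strip()]
      lines += [pretty(x, indent + 1) for x in rest]
      return '\n'.join(lines) + ')'

  for form in parse("(defun f (x) (let ((a (g x)) (b (h x))) (use a b)))"):
      print(pretty(form))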

> TLDR: Lisps - Schemes, Racket, CL, PicoLisp, Emacs and TXR Lisps to name a few - can be used to write astonishingly readable and to-the-point code, but the languages do absolutely nothing to discourage using them to write the most unreadable mess under the heavens. As for the reasons for this - I've honestly no idea at all.

This view is unbalanced without noting that Javascript, Rust, C, Perl, Java, Scala, Go, Kotlin, ... and a large number of other languages, have the flexible formatting that allows for unreadable code. Ruby, anyone? https://github.com/mame/quine-relay/blob/master/QR.rb

> As for the reasons for this - I've honestly no idea at all.

Bad formatting is a bug that is fixable in the actual code (greatly assisted by automation) and a minor social problem in programming that is treatable with education and experience. Therefore, it is nearly a non-issue.


You're right, of course, on all points (BTW, WTF is with the downvotes??), but they are technicalities of interest to lispers. I omitted these because I wanted to present a convincing argument that fnord123 simply had bad luck and encountered bad Lisp code. And that there's a lot of good, readable code written in s-exps out there.

The resistance to s-exps in the general population of programmers is bad "news" (if something 50 years old can be called that...) for Lisps. It's hard to fight it in general terms. Pointing out that C, JS, Perl, etc. are often much worse in terms of readability - while obviously true - doesn't really help in convincing someone to look at s-exps differently. This is why I chose Python for comparison and tried to present a positive argument, saying that you can write code "even more readable than Python at its best" in Lisp.

I ignored automatic formatting because it's not part of the language, but of tooling. The problem with tooling is that not everyone uses it. I've had the "pleasure" of working with a 50+ kloc Clojure code base written mostly by C programmers who didn't know or care about formatting tools - honestly, it was a nightmare. Of course, each file could be automatically reformatted into something sensible, but the fact that it was written the way it was and the language did nothing to prevent that still stands. In Python, you at least would get the indentation right.

Readability is a hard problem in general. You're right that it's also a matter of education in the community. You're right that it's almost negligible a problem for lispers themselves, as they know how to reformat the code automatically with a single key press. It is a problem, though, for people who come into contact with Lisp code for the first time. I wanted to convince fnord123 that it's not the syntax itself, but rather how it is used that's a problem - like with every other kind of syntax out there, by the way. I'd be extremely happy if he reconsidered and tried to read some of the better-written s-exps based code.


In Lisp languages, we don't use pure S-exps for everything; we have notations. We have 'X instead of (quote X), `(,A ,@B) instead of (list* 'A B), and numerous # notations.

In the area of arithmetic, although the basic operators are functions invoked using (f arg ...), we give them short names like +, -, * and /. Why? The obvious reason is that we would find it irksome to be writing (add ...) and (mul ...).

Lisp can have notations, and they can be had without disturbing the Lisp syntax. Notations that are related to major program organization have payoff.

In TXR Lisp there is a relatively small set of new notations, which all correspond to S-exp forms, the same way that 'X corresponds to (quote X).

  ;; slot access
  obj.x.y.z  -->  (qref obj x y z)
Of course, people are going to prefer this to something like:

  (slot-value (slot-value x 'y) 'z)
Then:

  ;; unbound slot access
  .x.y.z --> (uref x y z)

  ;; method call
  obj.x.(f a b)  --> (qref obj x (f a b))

  ;; x.f(blah).g(foo).xyzzy(x, y) pattern:

  x.(f blah).(g foo).(xyzzy x y)  ;; looks like this

  ;; sequence indexing, function calls (the "DWIM" operator)
  [array i]  --> (dwim array i)
  [f x y]    --> (dwim f x y)

  ;; ranges:
  a..b  --> (rcons a b)

  ;; slice
  [str 0..3] --> [dwim str (rcons 0 3)]

  ;; Python-like negative indexing:

  [str -4..:] --> [dwim str (rcons -4 :)] ;; : means "default value": one index past the end of the sequence (its length).

  ;; quasistrings -- recently appeared in JavaScript in strikingly similar form!
  `@a @b ...` --> (sys:quasi @a " " @b) --> (sys:quasi (sys:var a) " " (sys:var b))

  ;; word list literals
  #"a b c"  --> ("a" "b" "c")

  ;; quasi word list literals
  #`a @b c`  --> (sys:quasilist `a` `@b` `c`)

Some Lisp syntax is streamlined:

   (lambda (a b c : x y  . r) ...)  ;; a b c required, x y optional, r rest
Dot notation allowed without preceding atom:

   (. x)  ->  x
Improper lists can be function calls:

   ;; TXR: verbosity-free wrapping of a function.
   (defun wrapper (. args)
     (wrapped . args))

   ;; CL:
   (defun wrapper (&rest args)
     (apply #'wrapped args))
This works even if the thing in the dot position is a symbol macro expanding to a compound form. The reason is that the code walker/expander will recognize and transform (func ... . rest) into (sys:apply (fun func) ... rest) first, and then expand macros. (I.e. we can't work this into existing Lisps like CL implementations without going down to that level.)

   ;; the : symbol -- symbol named "" in keyword package:
   ;; used as a "third boolean" in various places

   (func 1 2 : 4)  ;; use default value for optional arg, pass 4 for the next one
                   ;; diminishes need for keyword args


   ;; built-in regex syntax
   #/a.*b/

   ;; C-like character escapes
   "\t blah \x1F3 ... \e[32a"

   ;; multi-line strings with leading whitespace control:

   "Four sc \
    ore
   \ and seven years ago"  -> "Four score and seven years ago"


   ;; Simple commenting-out of object with #;

   #; (this is 
         commented out)
Also, there is no programmable reader in TXR Lisp; no reader macros. I'm not a big fan of reader macros. They are only useful for winning "I can have any damn syntax in my language" arguments. Problem is, the whole territory of "any damn syntax" is a wasteland of bad syntax, not to mention mutually incompatible syntax.


Ah, yeah, I forgot to mention this explicitly earlier, although I had this in mind when I wrote "s-exps based" syntaxes and when I wrote this part:

> and you can add whatever syntax sugar you need in a couple of lines of a macro. In other words, Lisps give programmers tools for making their code as readable as they want (and are able to)

But, I didn't provide any examples, so big thanks for filling in this gap! :)

---

> The obvious reason is that we would find it irksome to be writing (add ...) and (mul ...).

Yet, in Scheme, we have `add1` and `sub1`, `expt` instead of `^` and so on. The former two are written `1+` and `1-`, respectively, in Emacs Lisp and Common Lisp. What I want to say is that evidently there are people who don't mind the longer names ;) (Personally, I don't mind either, I think)

> Lisp can have notations, and they can be had without disturbing the Lisp syntax.

Plus, you can introduce them without modifying the language implementation, ie. you can add notations to Lisp even if you're just a language user and not the language developer. This is impossible in 99% of other languages.

> Notations that are related to major program organization have payoff.

I'm seriously wondering if I could fit this on a T-shirt! :) It's very true, and when I wrote about "incredibly readable and to the point" code I was thinking exactly about crafting and using special notations and internal DSLs. The payoff of a well thought-out notation can be tremendous.

> In TXR Lisp there is relatively small set of new notations

I don't know about that :D At least TXR Lisp has more syntactic conveniences than Scheme, CL, raw Racket and Emacs Lisp. I think only Clojure comes close - at least object slots' access is written in a similar way. But I think there's no syntactic sugar for slices, nor for string interpolation in Clojure.

The syntax for "rest args" is similar in Schemes - `(define (f . args) ...)` - but they don't have a convenient way of calling a function which accepts a variable number of arguments, you need to use `apply` for that.

> (I.e. we can't work this into existing Lisps like CL implementations without going down to that level.)

So, if I understand correctly, it would be possible to add it to CL or Scheme via reader macros or reader extension respectively? Although I can't imagine the specific implementation right now, I think reader macros/extensions are executed during parsing, before the code is expanded, so it should work? With some caveats of course (some more tokens would be needed to identify where the `.` should be treated specially, I think).

Either way, use of the dot/improper lists for applying variadic functions is great, I love the symmetry between definition and application here. Makes me wonder why this isn't handled like that in other Lisps. It seems so natural once you see it...

Ah, regarding the slices, I, of course, have to ask - what is `dwim` and how is it implemented? If my guess is correct, `dwim` stands for "Do What I Mean", which would certainly fit in this case :D But it seems to work on everything collection-like. In Racket there's a sequence protocol you could use here (used to great effect in data/collection package[1]), in CL it would be a simple generic method with implementation for various kinds of collections, but how is it done in TXR Lisp?

---

Anyway - if there's ever anyone other than us reading this thread ;) - the TLDR would be that Lisps can, and do, have notations above and beyond bare s-exps. These notations tend to have a deep impact on both convenience during writing and on readability later. There are differences between Lisps in terms of support for additional notations, but you can introduce them (aka. steal from other Lisps ;)) yourself if you find a particular extension to be useful in your situation. An example of this would be the "threading macros", `->` and `->>`, which are now available in every Lisp out there, even though they were first implemented in Clojure only (I think - was there "prior art" somewhere else, maybe?) Anyway, that's part of what makes Lisps so incredibly powerful!

---

EDIT: reader macros. I understand your position, and most of the notations can be introduced as normal macros anyway, but... without them you're f+++ed if there's some notation you'd like to have, but which cannot be introduced without modifying the reader. Yes, compared to the number of possible syntax extensions the number of useful syntax extensions is frighteningly small. Yes, it's a major way of introducing incompatibilities. Clojure & Rich Hickey also excluded reader macros, adding a security argument to the previous one, ie. Clojure, when used as a data interchange format, should not behave differently in different environments because of loaded reader extensions, plus it's rather scary that a simple `read` can format your disk, kill your cat, steal your car and so on. I agree that in most cases they should not be used... but. That "but" still remains in my head :)

An example: there's a library for CL which allows you to alias package names. Writing them long-hand over and over again is less than ideal, to be honest. Due to how the CL package system works, it's impossible to achieve such aliasing with normal macros; you have to drop one level below. It certainly introduces an incompatibility, but it made writing code so much more pleasant!

So anyway: TXR Lisp and more or less Clojure have a lot of notations built-in, and maybe this is enough. But some other Lisps, without reader macros, would be stuck in the '80s as far as notational convenience goes. So, to me, even with all the downsides (and in the context of said Lisps), reader macros are an important tool which allows for language evolution and improvement over the years.

---

[1] https://docs.racket-lang.org/collections/collections-example...


Regarding 1+, add1 and so on, I called these succ and pred in TXR Lisp. This is inspired by an operator keyword in Pascal. There are also ssucc and sssucc, as well as ppred and pppred. That is inspired by our caddr friends, and also by Douglas Hofstadter's ..SSS0. Now (+ 1 x) is one character shorter than (succ x), and likely clearer to more people. But succ can be used as a higher order function without requiring partial application of the 1. Where we would have (op + 1) we can just use succ.

Can we have (func arg . rest) using just read macros in CL? It's not so simple, because we cannot blindly hijack the parenthesis and turn every dot notation into apply. It has to be the expander recognizing it. The TXR Lisp expander knows that it's dealing with a function call, because the symbol in the CAR position isn't a macro or special operator. Only then does it check for an improper list, and do the "dot apply transform". So this has to be worked into the code expander. (Thus if we work portably on top of CL, we are writing our own code walker and not relying on the CL one.)

> But some other Lisps, without reader macros, would be stuck in the '80s as far as notational convenience goes.

Right; that's because all the usual notational conveniences like quote, backquote and whatnot predate the 1980s by quite a bit, and there hasn't been anything new.

> An example: there's a library for CL which allows you to alias package names.

Are you talking about package local nicknames? That's an example of something being rolled into implementations. If everyone was on board with integrating the feature, programs wouldn't have to hack this.

Though it requires hooking into the reader, I don't believe that the feature changes syntax. If foo is a package local nickname, then foo:whatever is the same syntax.

So at least this will not break editing support.

Most uses of read macros break editor support. Syntax coloring and other features break.

Read macros lack print support. For instance, if the 'X syntax weren't built in, it couldn't be replicated completely with read macros. The input notation can be produced with read macros; however, Lisps also print the (quote X) object as 'X.

Almost all the junk I put into TXR Lisp prints, e.g.:

  1> '(qref a b (c) d)
  a.b.(c).d
  2> '(rcons a b)
  a..b
Even the singleton consing dot, at least when it's in a lambda:

  3> '(lambda x y)
  (lambda (. x)
    y)
  4> '(dwim a b c)
  [a b c]
> I love the symmetry between definition and application here

It's just good old "declaration follows use":

   void (*func)(int);

   (*func)(42);
:)

> The syntax for "rest args" is similar in Schemes - `(define (f . args) ...)`

But I don't think they have (lambda (. args) ...). That's lateral thinking. Why not have a consing dot with nothing in front of it such that (. x) is just x?

Here is the intellectual precedent for it, right in ANSI Lisp:

  $ clisp -q
  [1]> (append '(1) 2)
  (1 . 2)
  [2]> (append 2)
  2
  [3]> (list* 1 2)
  (1 . 2)
  [4]> (list* 2)
  2
I.e. the terminating atom of an improper list is itself an "empty improper list", identical to that atom itself, just like an empty proper list is equivalent to the atom nil.

Hence if () is the empty list, that being the terminator nil, then (. 3) is the empty improper list terminated by 3, which is just the terminator 3 itself.

> what is `dwim` and how is it implemented?

dwim is implemented half trivially, half not-so-trivially. The basic idea is that it is just a placeholder which displaces the operator to the second position of the form, where it is consequently evaluated as an expression, allowing for Lisp-1 style invocation without a funcall symbol. Well, of course dwim is that symbol, but it's hidden behind the [ ] notation.

The not-so-trivial part is that, dwim also applies a special treatment to those of its arguments which happen to be symbols. These are resolved in a single namespace that folds together variable and function bindings. This allows us to do things like [mapcar list '(1 2 3)], even though we are in what is fundamentally a Lisp-2. Why this is not-so-trivial is that it's deeply ingrained into the expander, evaluator and compiler, which understand the conflated namespace and implement it properly (in the face of lexical functions, macros, symbol macros and such).

All the semantics under dwim for objects being callable as functions is actually implemented in the function call mechanism, not in dwim.

By brief example:

  1> ["abcd" 2..3]
  "c"
  2> (call "abcd" 2..3) ;; no "dwim" here at all
  "c"
Simply, the semantics of calling a sequence with a single argument which is a range object is to perform a subsequence extraction.

Dwim differs from call in that it is an assignment place:

  1> (let ((c (copy "abcd")))
        (set [c 1..3] "X")
        c)
  "aXd"
But this is not any sort of magic; it's played out with macros and some run-time-support:

  2> (expand '(let ((c (copy "abcd")))
        (set [c 1..3] "X")
        c))
  (let ((c (copy "abcd")))
    (let ((#:g0073 c)
          (#:g0075 #R(1 3)))
      (sys:setq c (sys:dwim-set t #:g0073 #:g0075
                                "X"))
      "X")
    c)
There is a special sys:dwim-set run-time function which contains the range assignment semantics. It is allowed to mutate the object, but that's not always possible; therefore, the return value must be captured. The place expander framework takes care of it, generating the sys:setq assignment back to the c variable, just in case a new object is required. (The thing that newbie Lisp programmers forget to do when using functions like remove-if or append!)


>I do agree about homoiconicity. It's great for writing macros, and terrible for everything else.

I don't know; I've written code in C, C++, C#, Java, Python, Ruby, Pascal, Delphi, x86 assembler, TCL, Javascript and Common Lisp. Lisp codebases are the cleanest and clearest I've seen by far, although ReasonML/SML/OCaml might be as clean too.


Racket also has the GIL though. It's a problem in every dynamic language I've looked at. Last I checked it was true for Ruby too.


Common Lisp implementations often have no GIL, and, btw, are able to reach C speeds in certain cases.


Tcl does not have a GIL and is possibly one of the most dynamic languages that have seen nontrivial use. Guile does not have a GIL. Running multiple interpreters in different threads is a common notion in Tcl.


Running multiple sub-interpreters in different threads is easy in Tcl, but it doesn't stop there. If you want, you can run multiple threads in a single interpreter, using locks, mutexes, etc., if you're inclined to take on the attendant complexity and risks.

The ability to run a separate interpreter per thread was designed as a handy simplification, so you could sidestep the usual complexity of thread programming if you wanted.


That's true about Racket, but it has a quite original way of dealing with it: futures[1].

To be honest, I'm not sure about the details, but it should let other threads run truly in parallel as long as it's "safe" to do on the VM implementation level. So, if the code inside the future doesn't perform any "future unsafe" operations, it can execute within a separate OS thread without worrying about the main thread.

Examples of "future unsafe" actions were given as memory allocation, and JIT compilation. Further, it's mentioned that some simple (for the language users, at least) operations may be too complex internally to be "future safe". An example of this is using a generic number comparison operators - `<`, `>`, etc. Apparently, these have to handle the full numeric tower of Racket and in the process perform some future unsafe operations.

In the Mandelbrot function given as an example in the guide, simply replacing the generic comparisons with the ones specialized for work on floats specifically (and assuming that contract is not broken, which would immediately stop the future) allows the future to execute fully in parallel.

What is important to note here is that `set!` and friends, and so mutation of shared memory, is considered "future safe", ie. it's permitted to use them! (although then it's you who deals with the usual problems that brings).

I think it's worth mentioning here, because it's a novel strategy that seems to be between the two usual solutions (1. we've got GIL, live with it; 2. spawn more processes and get them to work - well, now you have many GILs...) and is showing some promising results. Plus they have a neat visualization tool!

Currently, it's limited and works best for purely numerical computations (which is also where you'd need it 99% of the time), but in some cases, it appears to work: the programmers of the language (not the implementation of the language) are given a tool to work outside the GIL in a structured manner plus a tool for closely inspecting low-level operations that happen in their code which would suspend or stop the future.

I'm not aware of any other language, dynamic or not, which has both a GIL and a nice, language-level tool for freeing it and running in parallel. Because what is considered "future unsafe" depends on the details of the implementation, I'm full of hopes for Racket-on-Chez, although I think I read somewhere that work on futures is not a priority at this time.

Also, to confirm the sibling comment, SBCL is happy to spawn truly parallel threads. There are other Scheme implementations (I think Chicken at least, but I'm not sure right now) that allow the same.

[1] https://docs.racket-lang.org/guide/parallelism.html#%28part....

https://docs.racket-lang.org/reference/futures.html


null and undefined being separate values


This seemed interesting, but when I went through the "Accepted Stack Overflow" links on the main page, I thought "how would I do this in an R tidyverse stack?" and set the goal that my responses should be shorter, clearer, or ideally both, favouring clearer answers over code golf - except that when posting to HN I collapse the code into a single line, while in R there would be linebreaks at each semicolon or after each pipe operator (%>%). Here are three examples:

"Customized sort based on multiple columns of CSV". In R, something like this: `library(tidyverse); read_delim("file.tsv", delim = "@") %>% arrange(.[[2]]) %>% group_by(.[[2]]) %>% arrange(match(.[[3]], c("arch.", "var." "ver.", "anci.", "fam.")), .[[3]]) %>% group_by(.[[2]], .[[3]]) %>% mutate(n = n()) %>% arrange(desc(n)) %>% ungroup() %>% select(1:4)`

"Extract text from HTML table". In R, something like this would suffice: `library(rvest); library(tidyverse); read_html(URL_GOES_HERE) %>% html_nodes("div.scoreTableArea") %>% html_table() %>% write_delim("out.csv", delim = "\t")`

"Get n-th Field of Each Create Referring to Another File". In R: `library(tidyverse); file1 = read_delim("file1.txt", delim = " ", col_names = FALSE); chunks = readChar("file2.txt", 999999) %>% str_split(";") %>% unlist() %>% map(function(x) { matches = str_match(str_trim(x), '^create table "(.)"([^(])\\(((.|\n)*)\\)$'); title = matches[, 2]; fields = matches[, 4] %>% str_split(",") %>% unlist() %>% str_trim(); return(tibble(table_name = rep(title, length(fields)), n = 1:length(fields), field = fields)) }) %>% bind_rows(); file1 %>% left_join(chunks, by = c("X1" = "table_name", "X2" = "n"))`

The third example trades off a little clarity for a little robustness by adding a regex instead of assuming the SQL table definition is one field per line.


There is no HTML parsing library in TXR, yet the code still looks good.

TXR Lisp has support for that type of functional transformation of structured data, with fairly tidy syntax. If a need for a full blown HTML parsing library arises, someone will come up with one; maybe me. It could end up integrated into the TXR flex/Yacc parser, which would make it fast.

In the "Get n-th Field" task, what we can do is snarf the data as a string, then remove all the commas and semicolons. It then parses as a TXR Lisp with the lisp-parse function, resulting in this:

  (create table (qref "def" something)
   (f01 char (10) f02 char (10) f03 char (10) f04 date)
   create table (qref "abc" something)
   (x01 char (10) x02 char (1) x03 char (10))
   create table (qref "ghi" something)
   (z01 char (10) z02 intr (10) z03 double (10) z04 char (10) z05 char (10)))
That seems to open an avenue to a solution. E.g. we can now partition it into pieces that start with the create symbol:

  28> (partition *26 (op where (op eq 'create)))
  ((create table (qref "def" something) (f01 char (10) f02 char (10) f03 char (10) f04 date))
   (create table (qref "abc" something) (x01 char (10) x02 char (1) x03 char (10)))
   (create table (qref "ghi" something) (z01 char (10) z02 intr (10) z03 double (10) z04 char (10) z05
                                         char (10))))
Now the (qref "def" something) parts are in fixed positions, followed by fixed-shape triplets.

Only problem with this type of solution is that it takes the example data too literally. The user's actual data might not cleanly parse this way.


> when posting to HN I collapse the code into a single line

If you just put two spaces of indentation on every line, you get a verbatim block in typewriter font,

  like
  this.


> The PDF rendition of the reference manual, which takes the form of a large Unix man page, is over 600 pages long, with no index or table of contents. There are many ways to solve a given data processing problem with TXR.

"Good luck, you're on your own!"


The "no index or TOC" isn't being touted as a feature, just that the page count is that without these (in documents like these, these features can contribute dozens to the page count). An index would be nice; patches welcome!

The HTML version that most people would be using has a TOC with two-way navigation to the section headings and is hyperlinked. Of course, man page reading allows easy searching.


I guess threads like this remind me why it's nice to have professional doc writers review my customer-facing text at work. ;) Congrats on your project getting some more attention! If you'll indulge a bit of bikeshedding, this particular miscommunication could probably be avoided in the future by changing the sentence to the short "The PDF rendition of the reference manual is over 600 pages long." Even if you add extra things to the PDF later the statement won't be incorrect and so you won't have to deal with nitpickers coming by next time with a comment like "But if you remove the index it's only 597 pages!"

Another edit preserving more of the original would be to replace the final "with no" with something like "even excluding any"...


Thanks; I fixed that.


I learned/used basic TXR some time ago. I had a text parsing problem that needed backtracking, and it seemed simpler to use TXR than to implement this in Python or Perl.

Basic TXR matching is really quite simple. Match some patterns, generate a report at the end. The patterns are interleaved with the matching text, so it's more like a more powerful version of regexps (but far more readable) than a normal programming language.

You can learn it quickly based on the provided examples.

It's just a few straightforward commands, although you have to wrap your mind around how the backtracking parser works.

Most of the manual is about the Lisp. I never used that part and I don't think it's really needed for 95+% of all text parsing/summarizing.


Well, the HTML version has a table of contents. 600 pages of documentation, with the information density I see in a quick skim, does not imply a "you are on your own" mentality to me.


This. I have never seen a programming language brag about being inaccessible and having bad documentation.


I read it as honesty and not bragging. Few people set out to create an inaccessible language with bad documentation, but given enough time and users, most languages become one. I'd prefer language maintainers and users have enough self-awareness to not believe it is still the elegant and simple language of 20 years ago.

Edit: 10 years ago in this case.


It would be interesting to have a DSL for data munging, but I am afraid TXR is not it. My requirements would be that the language should be functional and total.

Most transformations that we do on data do not require Turing completeness or recursion. I think it would be useful to write these down in a language with semantics that is easy to analyze.
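For instance, a Python sketch (file name and layout made up): the typical munging pipeline is just bounded maps, filters and folds over finite data, so a total language loses almost nothing:

  # hypothetical input: a headerless CSV of (region, amount) rows
  with open("orders.csv") as f:
      rows = [line.rstrip("\n").split(",") for line in f]

  totals = {}
  for region, amount in ((r[0], float(r[1])) for r in rows):
      totals[region] = totals.get(region, 0.0) + amount
  # every construct above walks a finite sequence exactly once - no unbounded
  # loops, no recursion - so termination is guaranteed by construction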


The funny thing is, I originally didn't intend the TXR pattern language to be recursive. It needed functional decomposition (pattern functions) to break up a big pattern match into simpler units. When those were implemented, I realized after the fact, hey we have a push-down automaton that can now grok recursive grammars.

I don't see why we would want to rule out a pattern function invoking itself (directly, or through intermediaries); if that hurts, then just don't do that.

(Though I understand that there are languages deliberately designed without unbounded loops or recursion, for justifiable reasons.)


I found in practice that arbitrary recursion depth is (even in languages with a formal recursive grammar) very rarely needed. And where it's needed, it can probably be implemented as a primitive in the language (map a total function over all the nodes) that can do a similar thing.


Then I think you will like https://tkatchev.bitbucket.io/tab/index.html

"It's statically-typed and type-infered.

It also infers memory consumption and guarantees O(n) memory use.

It is designed for concise one-liner computations right in the shell prompt.

It features both a mathematics library and a set of data slicing and aggregation primitives.

It is faster than all other interpreted languages with a similar scope. (Perl, Python, awk, ...)

It is not Turing-complete. (But can compute virtually anything nonetheless.)

It is self-contained: distributed as a single statically linked binary and nothing else.

It has no platform dependencies."

I am a little suspicious that you may be the author ;)


Looks very interesting, but I am not the author.


So XSLT then?


XSLT is Turing complete with the usual caveats about memory. Given its complexity it'd be very unlikely for it not to be, but there's clear proof too: someone has implemented a universal Turing machine in it.


From where I'm standing this looks like someone put a lot of effort into re-inventing Perl, minus the documentation and user community.


I've not studied this language yet, but if its syntax is in any way saner, that would still be a net gain.


Removing the perversion is a good goal IMHO.

(PERL = Perversion Excused by Random Lispiness)


I ashamedly had never heard of this before. Could anyone add any colour RE:

1. Parsimony.

2. Performance vs awk and friends.

3. Multi threading.

4. Ideal use cases.


4. My use case was: a somewhat fuzzy parsing problem that is harder than a single regexp and needs backtracking, and then generating a report from it.

For these things TXR is great.

If you want to do multi threading or best performance it's probably not the thing to use.


We already have this: it is R with tidyverse. What we need is a fully baked transpiler from R/tidyverse to SQL.


Yep. Seriously. R w/tidyverse is a ridiculously powerful data wrangling tool, especially when dealing with text files.

I tend to use Notepad++ when starting out on a data-wrangling adventure. It has an uncanny ability, unlike any other editor, to open hundreds of files at the same time and to perform regex operations on all of them without dropping dead. I use Notepad++ for initial manual exploration to get the lay of the problem, and then switch to R for the actual analysis.


The irony, of course, is that txr predates tidyverse.


>I tend to use Notepad++

I assume, then, that your file sizes are not so big. N++ is not good with big (>25% of your RAM) file sizes, refusing to open them.

Is R/tidyverse also limited on the size of the file it can handle? In my job I routinely work with up to 100GB files.


I guess it depends on what your definition of "big" is, I've never had to deal with 100GB files!


Confusingly, there's another language called TXL (https://en.wikipedia.org/wiki/TXL_(programming_language)) that's both obscure and neat.


Well, this looks great, but I'm not about to start digesting the self-admitted 600-page tome just to see if it's worth learning for the tasks I encounter - surely there's a "tutorial" somewhere?



Way off topic, but as someone who has recently switched to using a non-standard background color in my browser... that page is horrendous to read:

https://i.imgur.com/pvCnmSa.png

I can accept that doing something non-standard leads to some rough edges like this, but I'm not sure how many web developers know this is an issue. At least it has surprised me how many websites have this issue of assuming the default color is bright white.


Hi; try it now! I GIMP-ed the image such that the non-transparent pixels are pure red, and only slightly opaque, instead of 100% opaque pinkish white. It looks about the same on a white background. Thanks, again.

I tested it with a light grey background, as well as a heavy grey one.

This little experiment really made me notice HN's hard-coded light grey background box, BTW.


Yeah, looks great here. Really didn't expect you to dig into it at all let alone so quickly. Thanks very much :-)


Good heads-up. This background (if it is to exist at all) should be done properly as an alpha blend, not as a transparency with opaque off-white pixels, so that it works with various backgrounds. I will look into it.


Interesting lisp’y language. Off topic, but I find the domain name nongnu.org to be amusing for a GNU/FSF web site. “nongnu” to me reads as “not gnu”


Exactly. It's GNU hosting for non-GNU projects.


Oh, that makes sense. Thanks!


Of course, non-GNU projects that are licensed in such a way that they could be GNU projects.

From the registration page, the kind of software project that can be hosted on Savannah is [a] free software package that can run on a completely free operating system, without depending on any nonfree software. You can only provide versions for nonfree operating systems if you also provide free operating systems versions with the same or more functionalities. Large software distributions are not allowed; they should be split into separate projects.


Very interesting. I'm wondering why they didn't implement the Lisp version on top of CL with macros.


I can summarize this as follows. TXR is my research platform into various topics, including many Lisp topics. It contains numerous innovations. As a whole, that requires working at the implementation level, ground up.


Thanks Kaz! I had the same question.


Has anyone run any benchmarks of TXR against awk, R, python, or miller?



