Reimplementing TeX's Algorithms: Looking Back at Thirty Years of Programming (infoq.com)
91 points by sdesimone on Jan 12, 2015 | 65 comments



"According to Glenn, it is very hard to figure out what TeX code is doing, mostly due to its terseness and extreme optimization"

This quote surprises me. The TeX code is extremely well documented compared to just about any other piece of source code I have seen.

The literate programming style never caught on, but I think you see the beauty of it in the TeX source code. If it's hard to read due to the mix of code and comments, it can be turned into a book. Or into a compiled program.

I am also surprised that optimized Pascal from decades ago, which compiles on a wide variety of hardware and software platforms, outperforms a modern reimplementation.

The point being that if it were optimized for a specific CPU on a specific OS, making it really fast would make sense. But when writing code for "some", or in reality almost any, CPU (OK, that isn't quite true, but TeX runs on a lot of different hardware) and any operating system, optimizing the code is harder.

It does get compiled by a C backend these days though.


For a better sense of what he means, check out the video to get more detail about some of the complexities.

As a counter-point, a codebase can be well documented and yet be hard to understand if the abstractions are not meaningful to the reader. I can say this without attaching any normative judgment to the quality of the code.

If I recall correctly, Glenn found it difficult to reason about the code because many parts were tangled in complex ways. For example, subsequent processing steps often triggered previous steps to repeat in surprising ways.

To speculate a little bit, these parts of the code may be well documented down in the weeds, but they could still feel unintuitive in the broader context if they weren't consistent or predictable.


I find it a sad reflection on the "thirty years of programming" that we are a long way from making anything as a) stable, b) bug free, c) fast, or d) readable as TeX.

The entire pedagogical point of this exercise seems to have completely missed its goal, but then a hand is waved and we all learned something valuable anyway. But what? That for an industry that hates "technical debt," we are deeply indebted to those who wrote our platforms for all of the points listed above.


I see this claim a lot, but it's simply not true. TeX is a relic of another age, frozen in time.

a) TeX isn't stable, it's dead. It can't generate PDF, or PostScript, or HTML. It can't handle Unicode, it can't use OpenType, TrueType, or even Type 1 fonts. Without the need to conform to external standards, it's easy to declare a defect to be a feature. Everything which makes TeX usable in the modern world is a third-party program; there's an entire "life support" ecosystem written by others, with stability issues, and bugs, and warts in abundance. TeX is by definition "stable" because its maintainer does not wish to develop it further - and that's not good for TeX, which has now fragmented into various competing successors.

b) Much of the TeX ecosystem is buggy: CTAN packages frequently conflict with each other in ways which are very challenging to resolve. The minimal and low-level nature of TeX's macro language is the root cause of such bugs, because it's hard to write macros correctly, and even harder to debug them. By keeping the core of TeX unchanged and minimal, the responsibility for dealing with complexity is forced onto macro package authors.

c) As for the claim that it is fast, that is questionable. TeX is needlessly slow at resolving cross-references, requiring documents to be re-typeset up to three times in a row. It also spends a lot of time doing I/O on its many macro files - the end result is waiting 10 seconds or more to typeset a file which should have taken under half a second.

d) TeX is written in Knuth's own private language, WEB, which makes it very hard to read. WEB uses a complex system of macros which is highly unconventional by modern standards. The code is highly non-linear, with macros jumping all over the place. Everything is a global variable. No amount of explaining or literate programming can make this mess "readable". It requires the utmost effort and study to extract meaning from - as the OP explains so well.

If you want to praise any open source project of the past 30 years, I'd suggest looking at the Linux kernel. It's alive and well, for a start.


If we are just allowed to say things are true by adding the adverb "simply", then I will counter with: it is simply true. :) To the points, though.

a) I actually mostly agree with this concern, with the caveat that I have fewer problems producing PDFs of documents from TeX than with any other tool I have ever had the joy of using for the same purpose. I will also note that even just treating TeX as a "core" for LaTeX, it is still a good example of stability. Specifically in the sense that "I can still typeset any LaTeX document I have." This is far from true for opening pretty much any other file I have. Period. Hell, even just compiling old C programs is less guaranteed.

b) Applications have bugs. Pretty much period. While the overall ecosystem has unforeseen bugs, especially when you mix and match different contributions, I still know of pretty much no examples of a core as large as TeX that is essentially bug free. Care to provide any examples? And, to my point, I would wager there are more bugs in this Clojure rewrite than there are in the original.

c) This is pretty much only claimed because of the failure of the Clojure version to achieve similar speeds. I mean, if we are fine with setting contrived expectations ("should have taken under half a second"), then I don't understand why I have to wait at all for a fully typeset product to emerge. In the meantime, I have yet to see a product that competes in speed. Tons of claims for those that should be able to, but few actual examples.

d) Is it a language you are familiar with? Obviously not. Does it require effort and study to extract meaning? I'd actually agree that it does. I fail to see how that is a detriment, though. I have yet to see any solution to any non-trivial problem that does not require the same.

Specifically, to the complaint about the non-linearity of the code: code is pretty much always understood in a non-linear fashion. Same for many other topics, actually. One of the things I actually admire about WEB is that it is non-linear and allows the author to introduce a narrative into the process. Seriously, this is hugely beneficial once it is understood.


> Applications have bugs. Pretty much period. [...] I still know of pretty much no examples of a core as large as TeX that is essentially bug free. Care to provide any examples?

Linux. You can still run a statically linked executable from 20 years ago. Of course, Linux today has many bugs, but if we use TeX's definition of stability we are only allowed to use the features we had 20 years ago.

qmail. Bernstein's bet is still open.

sh, ed, vi, grep, sed, and more of the default UNIX tools provided you restrict them to their POSIX functionality (no Unicode etc).


sh, ed, vi, grep, sed and such are all good examples. But, much smaller than TeX. Also, probably much less understood at the source level.

Linux is actually a good example. Of course, one of Torvalds' main drivers is "never break user space." Which is essentially the stability I am talking about here.

It is also a great example in that it is not written in a way that academia approves of. It is famous for this, actually. (Among many other points, of course.)

I'll have to look up Bernstein's bet.

And, you'd be surprised how many statically linked binaries from 20 years ago won't run on modern setups.


Bernstein's bet: http://cr.yp.to/qmail/guarantee.html

Since IBM z-Series just hit the front page, you can add lots of COBOL and PL/I code which has run unchanged for the last 50 years. However, it is all proprietary and impossible to estimate.


Thanks for the link. That is a rather fun read. It is interesting to read someone who has "mostly given up on the standard C library." I think I understand and agree with the reasoning, to be honest, but it is still far, far from what is recommended in academia. Or industry, for that matter.

The COBOL and related code is an interesting data point. I would be interested to know just how bug free it all is, versus just being used in ways that don't trigger bugs. Still, it definitely counts as long-running software, if not highly ported.


a) I don't see how a piece of software which has essentially not changed for twenty years is a "good example of stability". Stability in the face of what? Real software occupies a constantly changing world, and TeX is sheltered from that by the surrounding third-party tools. TeX's stability would only be impressive if it were changing or growing while still retaining its original functionality. Anybody can refuse to update a program and declare it to be stable - there's no lesson for me as an engineer, beyond just refusing to write code.

b) I don't understand this claim that TeX is bug free - there have been 947 documented bugs in TeX, with 427 of them occurring after TeX82. There was at least one bug found in TeX every year until 1995, and over a dozen have been found since then - in a code base which hasn't changed. The number of bugs in TeX as a percentage of the number of lines doesn't seem to be unusually low, and given that very little development has been done in the past 26 years, there has been very little chance of introducing new bugs. Once again, I'm not sure what I'm supposed to be learning from TeX as an engineer other than refusing to write code.

c) I'm not talking about clojure, I'm talking about TeX. I agree that there are few examples of fast, high quality typesetting software. But TeX is still slow - it does so much file I/O because it assumes a tiny amount of RAM. A modern rewrite could be 10 times faster, I reckon.

d) By definition it's a language that almost nobody is familiar with - and that's a problem. After some careful study, I am now familiar with it, but it's still very hard to read a program which is structured to be read as a book. One cannot skim the code or get a good understanding of its fundamental structure as a program. Instead one is always trapped in Knuth's narrative. I don't want to read a book - I want to read a program, and WEB makes that practically impossible. The end result is neither a program nor a book, and is unwieldy to programmer and reader alike - indeed, literate programming never caught on.

The software I really admire is software which has reacted to change, modernised, improved. Even Microsoft Windows is a great example of stability in the long run. TeX isn't, because it hasn't done any of those things. It still compiles - and it's very good at typesetting - but as an exercise in writing code, it offers few relevant lessons to the modern programmer.


Stability in the sense that, if you write something against TeX and find bugs, the bugs are yours, not TeX's. There are plenty of programs which "have not changed" but retain bugs.

I did not say that the life of TeX was bug free. Just that its current status is essentially bug free. Now, I do grant that a large class of bugs, namely regressions, are rendered impossible by the development style. But, I personally think there may be lessons in that.

For your speed claim, you need more than "it doesn't work how I think it should." Numbers, or you are just pipe dreaming. Clojure is relevant here, because it is a recent attempt to modernize TeX. It is slower, by the author's admission. Any examples that aren't slow?

I mean, yes, I understand your point about I/O being somewhat of a red flag for speedups. I'm curious why this low-hanging fruit has never actually been picked.

For d), I just have to disagree. As someone that doesn't even know Pascal, I found it approachable. Are there parts that are tough? Sure, it is a full product. Try reading parts of git's source. (Granted, the parts that get heavy math in reasoning about are particularly tough, but I consider that my failing.)


What you're saying is that you don't mean "stable", you mean "bug-free". You listed them as two separate points originally - there is a difference between them, and conflating stable to mean bug-free isn't useful.

You did literally claim that TeX was "bug free". It's entirely unremarkable that, with no new features added to TeX in 26 years, eventually most of the bugs have been fixed - and there's been at least one bug almost every year since 1977. If anything, the lesson here is that TeX almost certainly still contains bugs.

TeX's slowness can be measured by looking at how much time it spends making I/O syscalls. You can pass debugging arguments to TeX so that it will log more information about which phase of processing it is in - TeX spends very little time typesetting and most of its time reading and writing auxiliary files. Because a document with cross-references must be re-typeset 3 times, and most of TeX's running time doesn't consist of typesetting, we can be fairly sure that TeX is almost 3 times slower than it needs to be in this case.

The I/O situation is difficult if not impossible to resolve without completely rebuilding and redesigning the TeX program from the ground up - it's a macro processing system, and it has many thousands of lines of macros which need to be processed for even a simple document. When TeX was designed it was impossible to do most of this in memory, so the entire design of TeX is built around processing streams of data with as little state as possible. TeX processes one page at a time, and all its global state is based on that assumption - certainly not low-hanging fruit.

d) Pascal is a great little language, once popular for teaching students. But TeX is written in WEB, which adds a fairly complex macro layer which is then utilised by Knuth in a very non-linear manner (with respect to the code - the narrative is linear). But it's not just the language which makes TeX tricky to read - it's how it was used: the entire program is one giant global state which is manipulated by a handful of giant functions which perform most of their logic using GOTOs. That was normal for the 1970s, but it's unquestionably very difficult to wrap one's head around. At least with git I can skim through the source code and understand its structure as a piece of code - even if the meaning is still difficult to comprehend.


Stable in that it is stable. Not only bug free, but not constantly shifting under your feet. I don't know of a better word than stable for this. And I do feel that "bug free" is a necessary condition for it. Just as I would feel that a foundation for a house is only stable if it is not moving and free of defects.

I claimed that TeX is literally bug free today. If you feel this is not the case, find a bug. I did not claim that TeX originated bug free. At least, I did not intend to make that claim. Apologies if it read that way.

We cannot be certain that TeX is 3 times slower than it needs to be in that case, in that nobody has produced a system that is 3 times faster. Seriously, I am a fan of incremental compiles and multipass processing of documents. I am also more and more a fan of empirical results. Claims that things should be better, without any hard evidence, bother me.

And I think we just have to agree to disagree regarding WEB. I have found TeX far more understandable because of WEB than many other pieces of software. Specifically, the linear narrative is far more of an aid to understanding than any of the newer abstractions that are promoted elsewhere. Sure, the code is rather nonlinear. Outside of almost academically simple programs, I have seen few pieces of software where this is not the case.


I read basically the whole of the TeXbook (the spiral-bound book documenting the TeX language and implementation) many years ago, and found it fun and quirky, and very educational regarding how typesetting is done, and how typesetting practices can be transported into a computer implementation.

It was, for example, where I learned what an "em" is, the difference between a hyphen and an em-dash, what a kern is, what italic correction is, what leading is, and on and on.

But I found the language description not very clear. There are passages in the TeXbook that refer to different stages in processing as the stomach, mouth, gullet, etc. I later read an offhand comment that Knuth did not use a conventional lexer/parser setup in implementing TeX, and that decision made TeX more ad hoc from a language point of view. This, in turn, may have made the macro expansion setup more complex, with its expandable vs. not expandable tokens, restricted modes, etc.

I wonder if others who have more experience with language design had the same reaction?


I think that there aren't many fans of TeX's design. (Well, I rather like it, but probably because I am not a language designer.) Seibel (http://lambda-the-ultimate.org/node/3613#comment-51120) quotes Knuth as saying (in "Coders at Work"—but the relevant page is not on Google books):

    Mostly I don't consider that I have great talent for language design.


Yes, I guess to put a finer point on it, lex and yacc were available in 1975, and the first version of TeX was started in 1977, with a 1982 rewrite.

(OT, I just learned that Eric Schmidt was a co-author of the 1975 version of lex.)


Some writers of real compilers (vs. school projects) even today would avoid lex and yacc. Stroustrup, for example, wrote that in hindsight he shouldn't have used yacc for his cfront; apparently it brought him more pain than benefit.

Yacc and lex (and even more modern versions of the parser generators) have good "educational" value, but are far from being the silver bullets in practice.


Yes. David Conroy who wrote the first couple of Mark Williams C compilers, said of yacc "It makes the hard part harder and the easy part easier".


To add my own example: for my first, simple compiler, used in a commercial application, I used yacc. The start looked simple; then I hated it. It seemed to me that I invested more energy in managing to do things "the yacc way" than in actually getting things done. For my more recent one, much more complex, I avoided yacc by using a variant of recursive descent and never got to the point of missing anything - quite the opposite.


The one production compiler I worked on used YACC because we thought it was a good idea, which was probably correct in that the PhD who did the parser part was a student of Hopcroft's. She actually published a paper about error correction based on the work there.

Any compiler I write these days would do what you are saying--recursive descent. There is a nice technique called "chart" that helps with this.


Thanks. Do you mean this: http://en.wikipedia.org/wiki/Chart_parser ? It seems it's something used more for natural language parsing and less for programming languages?


That turns out to not be a very good writeup. The published work is http://dl.acm.org/citation.cfm?id=801348 and Dick Vile did a lot of work on it years after that, using it to describe an in-house production language.


Keep in mind that Knuth invented LR parsing, upon which LALR is based.


Thanks, I did not know that.


TeX is weird. Really weird, if one expects it to work like a modern programming language.

The crux of it is that it's a macro-expansion language, which means that you pretty much need to do the kind of stream-processing approach TeX uses, since you don't know the expansion of a macro up-front, which in turn can very well change the meaning of tokens downstream of that again. For an (admittedly extreme) example of this, consider TikZ; it's basically an entire new programming language, implemented purely by TeX macros.
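Roughly, the control flow looks like this toy Python sketch (not TeX's actual implementation; the macro table and tokens are made up): an expansion is pushed back onto the front of the token stream, so everything downstream depends on it.

    from collections import deque

    # Toy sketch of macro expansion as stream processing; not TeX's real
    # data structures, just the shape of the problem.
    macros = {"\\greet": ["Hello", ",", " ", "\\name"],
              "\\name":  ["world"]}

    def process(tokens):
        stream = deque(tokens)
        out = []
        while stream:
            tok = stream.popleft()
            if tok in macros:
                # The expansion goes back on the *front* of the stream,
                # so it can change the meaning of whatever follows.
                stream.extendleft(reversed(macros[tok]))
            else:
                out.append(tok)
        return "".join(out)

    print(process(["\\greet", "!"]))   # -> Hello, world!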

I sometimes compare it to programming in Common Lisp, but where all you have available are the primitives (except lambda, I suppose) and defmacro.


Thanks, I think that's pretty much the issue. Because of the flexibility of the language (not just macros, but individual characters can change meaning using the \catcode mechanism), you can't have a fixed lexer. This makes the situation unlike the standard language setups I'm used to.
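To make that concrete, here is a toy sketch in Python (the category table is loosely modeled on TeX's defaults, but it is only an illustration, not TeX's real scanning rules): because the table can change mid-run, the same input tokenizes differently.

    # Toy scanner with a mutable character-category table, loosely
    # inspired by \catcode; not TeX's actual categories or algorithm.
    catcode = {"%": "comment", "$": "math-shift"}

    def scan(line):
        toks = []
        for ch in line:
            cat = catcode.get(ch, "other")
            if cat == "comment":      # rest of the line is ignored
                break
            toks.append((ch, cat))
        return toks

    print(scan("a%b"))       # '%' is a comment char: [('a', 'other')]
    catcode["%"] = "other"   # the "document" changes the table mid-run...
    print(scan("a%b"))       # ...and the same input now scans differently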


The more I work with languages, parsing, etc. the more I think that the lexer/parser setup is a bad idea. Many non-trivial languages are actually harder to implement clearly in the traditional lexer/parser setup due to required coupling between the lexer and parser.


Really? Why would they be coupled? I mean, maybe a little bit. For instance, maybe the parser could tell the lexer what kind of token it expects to see next, so that the lexer doesn't parse an int as a float by mistake, but that doesn't sound that bad to me. (I should point out that my experience here is a single class in college.)


The traditional theories about parsing and their implementations in tools like yacc, lex, antlr, etc. are not that important in practice. I also used them in some university courses, but after I encountered parsing problems in practice it seemed easier to just implement a recursive descent parser in the host language.

First, you have to learn the DSLs of these tools (which are non-trivial in detail). Second, you have to integrate them into your toolchain. Third - and that is my biggest complaint - you have to connect the generated parser's AST to your domain AST. The last thing is pretty amusing, as I consider it the main purpose of a parser.

In practice (hopefully after evaluating the need for a custom data format, as it might be possible to hijack existing languages and standards) I either use a combinator parser (if FP is available) or an ad-hoc recursive descent parser. Maybe I consider a lexer, but the representation of tokens is also a non-trivial decision.
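For what it's worth, the kind of ad-hoc recursive descent I mean is nothing fancy. A toy sketch in Python (a made-up expression grammar, not from any real project), where each grammar rule is just a function building the domain AST directly:

    import re

    # Minimal ad-hoc recursive descent parser for "1+2*3"-style expressions.
    def tokenize(src):
        return re.findall(r"\d+|[+*()]", src)

    def parse_expr(toks):            # expr := term ('+' term)*
        node = parse_term(toks)
        while toks and toks[0] == "+":
            toks.pop(0)
            node = ("add", node, parse_term(toks))
        return node

    def parse_term(toks):            # term := factor ('*' factor)*
        node = parse_factor(toks)
        while toks and toks[0] == "*":
            toks.pop(0)
            node = ("mul", node, parse_factor(toks))
        return node

    def parse_factor(toks):          # factor := number | '(' expr ')'
        tok = toks.pop(0)
        if tok == "(":
            node = parse_expr(toks)
            toks.pop(0)              # consume ')'
            return node
        return ("num", int(tok))

    print(parse_expr(tokenize("1+2*(3+4)")))
    # ('add', ('num', 1), ('mul', ('num', 2), ('add', ('num', 3), ('num', 4))))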


Most languages will have some sort of way of letting you extend them; inasmuch as this will require a new way of lexing, this will add some coupling.

One example: shell aliases. The shell syntax in POSIX is defined in terms of the standard lexer/parser divide (which is already a bit of a pain, as a significant amount of the parsing logic is needed to correctly tokenize command expansions). When you encounter an alias expansion (which, since it has to happen in the command position, doesn't happen until parse time) you expand the alias, and then need to re-lex the results.

In practice, many mature compilers and interpreters use a recursive-descent parser rather than any of the LL/LR/LALR things I learned in school.

Common Lisp actually defines its syntax in terms of a recursive-descent algorithm, and that's a language that largely came out of academia.


If you're interested in those details, Victor Eijkhout's TeX by Topic is very instructive as well: http://www.eijkhout.net/texbytopic/texbytopic.html


"Clojure implementation was of course far slower than TeX."

I think that sentence is the most important one in the article. I'd like to know how much; "far slower" sounds like it could be an order of magnitude.


Many of us, sometimes for good reasons, have a habit of thinking performance is the most important thing. However, in this case, focusing on that one sentence overlooks the spirit in which the talk was given: to educate.

There is some key context that you might not get from the InfoQ article. Glenn's talk, as presented at ClojureConj, emphasized the educational benefits of comparing the original procedural style with an adapted functional style.

Another message that came across is that we've lost a historical appreciation of our tools. It is very easy to frown upon the techniques that TeX used. In context, many of them were reasonable (if not state-of-the-art) given the constraints of the time.

It seems to me that it would be quite challenging to dig in and unpack the original intention of TeX and translate it to a modern functional language. I tend to believe that a Clojure version could be made almost as fast, if not faster, given more effort, without destroying understandability.


I found that bit slightly disappointing, and almost dismissive against the PL/compiler/optimization research done in the past 30 years or so. I don't believe that a modern clean rewrite would inherently need to be "far slower" than the original.


It seems, though, that nobody has improved on the results of even the short programs presented here:

http://benchmarksgame.alioth.debian.org/u32/compare.php?lang...

The slowdown for most of these is 2 to 9 times; only one is comparable to C.

And I can imagine the difference when reimplementing Knuth's code to be even bigger. For what Knuth did as early as 1960, see here:

http://ed-thelen.org/comp-hist/B5000-AlgolRWaychoff.html#7

He wrote the whole ALGOL compiler in 1960, alone. During the first week (!) he wrote the assembler. As written by one of the guys who developed another compiler, the Fortran one:

"Our compilers were both punched or cards and were the same size. We had written ours in STAR 0, the only assembler that Burroughs supported on the 205. It had been Dick Berman's first programming project. Our compiler took one hour and 45 minutes to assemble. The first week of don's project he spent in writing his own assembler. He could assemble his compiler in 45 minutes. We were green with envy. I am sure that don used only half the computer time that Lloyd and I used."

Btw, they had no terminals there back then, and all shared one computer:

"Our compilers were both punched on cards"

"There was only one 205 at the Pasadena Plant. It was primarily used to run the payroll. Lloyd and I were given top priority on the machine since real money was going to be given to Burroughs as soon as we successfully finished our compiler. Payroll had second priority and don was third."

TeX was written by Knuth between 1978 and 1989, working on it for ten years and starting almost twenty years after his 1960 feat.


Compared to something extensively hand-optimized by Donald Knuth, it doesn't seem that surprising, especially considering Vanderburg went with better abstractions in preference to optimization.


I find it hard to consider the abstractions "better" when they are so far behind the performance of the ones that were used. Higher level? Certainly. Better?


Glenn's goals are clearly tilted towards education and intelligibility. So yes, the abstractions are better by his definition.


I still reject that they are better; moreover, education should strive for better results. That is, if you define them as better with no argument allowed, then they are of course better. However, if you were to objectively compare them... I am unsure why they would be awarded the success. Again, higher level and/or easier to understand? Sure. Better?? By what metric?

This would be akin to declaring that Newtonian physics is "better" than quantum, because it is easier to understand.

Also consider that TeX is still one of the more widely read source programs in existence. And there have been frighteningly few bugs. There are undoubtedly more bugs in this rewrite with "better" abstractions than there are in what is being rewritten. So it loses stability. It is slower. So it loses speed.

What, exactly, is it truly better at?

As indicated in another post, I am unsure that this really succeeded at the pedagogical goals that were intended. If anything, it almost shines as an example of how we have gone awry in teaching developers.


I found his emphasis on history to be unusual and refreshing, but we certainly don't need to agree on someone else's goals for their project.

That said, I personally don't see a lot of value in being too critical of projects with different goals. (That feels a little bit like back-seat driving where you have a different destination in mind than the person behind the wheel.) If someone has the motivation and creativity to embark on a project, good for them.

If you or someone else wants to port TeX with different goals, that's also fine.

In any case, until Glenn's source code is available, neither one of us can say very much about the code he has written.


Oh, I am not against the exercise. I would even go so far as to applaud the work. What I do not care for is essentially sloppy science in how we declare successes.

I mean, this is essentially back-seat driving Knuth, of all people.

And... TeX has been widely ported as is. I am in the crowd that is somewhat unconvinced it needs a rewrite.


Was there a link anywhere to the source code? Would be interesting to take a look. I'm currently learning Clojure.

It's not on his GitHub account (https://github.com/glv).


Good point. I haven't found the Cló source code. To make it educational for others, I hope Glenn V. shares it.


The problem with TeX is that it gets most of its current power from the large library of packages written on top of it (or on top of other packages, e.g. LaTeX). People expect those to keep running on any new implementation of TeX. Which means that you need to be almost 100% compatible; you cannot really mess with the TeX language in any way. So the best you could hope for is a clean implementation (or whatever we perceive as clean these days) of a processor of an ugly (again by today's perception) typesetting language. It's not clear that that is worth the quite substantial effort.

You can of course aim for a new typesetting language and just take the typesetting algorithms from TeX, but then it's a completely different product, incompatible with existing TeX packages.


The packages are secondary; TeX stripped of all its accouterments already solves a really really hard problem, and is extremely fast and free to boot. The rate of bug discovery has slowed to about one a decade so there is really no need to mess with the internals at this point. It "just works". I think TeX will be with us for a very long time yet.


Hopefully someone will get so fed up with it that they just make a powerful replacement from scratch. (Maybe with a TeX export utility.)


I think that, as with standards, the problem isn't no replacement but too many: see Pandoc (http://johnmacfarlane.net/pandoc), Lout (http://savannah.nongnu.org/projects/lout), its (not yet existing) successor Nonpareil (http://sydney.edu.au/engineering/it/~jeff/nonpareil), Skribilo (http://www.nongnu.org/skribilo), and Pantoline (which I can't find with a quick Google search), for example. The problem is that TeX has so much inertia, and such an ecosystem behind it, that getting significant investment in any one of them is likely to be difficult to impossible.

I think compilation to TeX is probably always going to be less successful than compilation from TeX—that is, taking the same code and (slightly) improving the processing. See the pdfTeX (http://www.tug.org/applications/pdftex) and LuaTeX (http://www.luatex.org) projects.

As a remedy to too much ambition even when trying to build directly on TeX, see the fate of LaTeX3 (http://tex.stackexchange.com/questions/953/why-is-latex3-tak...).


Alternately, you could rely exclusively on a TeX export utility and make a clean, less verbose language that compiles to TeX. Then existing packages could be either wrapped similarly or inserted as TeX blocks.


On that note, check out the combination of https://github.com/softcover/softcover/ + https://github.com/softcover/polytexnic/ which is a very good first step toward that approach:

    markdown --> .tex ____ pdflatex --> .pdf
                    \
                     \___ tralics  --> ruby_scripts --> .epub


I think one reason most people wanting a next-gen-TeX project don't aim to compile to TeX is that some of the biggest pain points of TeX are baked into the core. So if you want to improve them, you need to change or replace at least some of the core layout algorithms, not just the front-end input language. For example a big wishlist item for many years has been some kind of improvement on TeX's quite frustrating figure placement, possibly with a more pluggable layout algorithm.


I think it would be great if there were a concerted effort to do such a reimplementation (maybe in Clojure, but maybe in another language, e.g. Python). This could be done in the style of NeoVim; I am sure a similar or even greater amount of funds could be raised if a project leader with sufficient credentials stepped up to do it.

The challenges are of course formidable, as detailed in TFA. Any such project would have to maintain almost full backwards compatibility, unless the huge amount of work that has gone into CTAN is to be lost.



What's TFA?


The {Featured,Fine,F??????} Article. The web page linked at the top, that we're supposed to be discussing here.


"The Fine Article" (if you're in polite company): the original posted link.


"The Fine Article". Originated from Slashdot, I believe.


That's a joke, right? :) I hope that's a joke... I remember RTFM and RTFA from usenet before the web existed.


The phrase existed long before, but as far as I know with a somewhat different meaning, due to the Usenet connotation of "article". I was referring specifically to the usage in the comments of a news post, applied to people who have only read the other comments but not the actual article.


There are many claims that I think are incorrect, misleading or unrelated.

> According to Glenn, it is very hard to figure out what TeX code is doing, mostly due to its terseness and extreme optimization, as outlined above

The TeX source code is written in a literate programming style, so it's not "terse" in the standard sense.

> there was no IEEE standard for floating point arithmetics;

I always supposed that the integer arithmetic was there to ensure 100% portability. Every system handles integers identically (if you avoid undefined behavior), but there are lots of small incompatibilities with floating-point numbers (for example, single, double, or extended precision).
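To illustrate the idea (only a toy Python sketch, not Knuth's actual scaled-arithmetic routines): TeX works with integer "scaled points", 65536 sp = 1 pt, so every machine with the same integer semantics produces identical results.

    # Rough sketch of the fixed-point idea: dimensions as integer
    # "scaled points", 65536 sp = 1 pt; integer arithmetic only.
    SP_PER_PT = 65536                     # 2^16 scaled points per point

    def scale(dimen_sp, num, den):
        # Multiply a dimension by the rational num/den using integers only,
        # truncating the way integer division does.
        return (dimen_sp * num) // den

    hsize = 345 * SP_PER_PT               # a 345 pt line width, in sp
    print(scale(hsize, 2, 3))             # two thirds of it: 15073280 sp
    print(scale(hsize, 2, 3) / SP_PER_PT) # = 230.0 pt, same on every platform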

> portability meant supporting almost 40 different OSes, each of them with different file system structures, different path syntax, different I/O and allocation APIs, character sets;

The original version of TeX was ASCII only, and using other character sets is always a problem. (For example, in LaTeX you must use a package like inputenc.)


I would be curious to know if the woven sources were used over the tangled ones. As probably the only program I own in printed form, the source is much more readable than I would have ever thought possible.


After reading the article, it sounds like a terrible idea to port that sort of program directly to Clojure (gotos, globals, and mutable state) without performing an intermediate port to a C-like high-level language first.

TeX may have fewer dependencies (given that it predates many libraries), so `Go` would probably be a good target for an intermediate port - imperative/procedural, globals, goto support, static compilation.

Modernize this program in Go, then port it to Clojure.


Perhaps Glenn has different goals than you might expect? The video goes into this better than the InfoQ article.

In short, I think Glenn embraced the differences and challenges of adapting TeX in Clojure from an educational perspective. I don't think he intended to port it quickly or verbatim.

Glenn wanted a head-to-head comparison to promote education. To explain: in Glenn's talk around 11:36 (https://www.youtube.com/watch?v=824yVKUPFjU) he says:

> While I'm glad that the source code of TeX is still available to study, I sure wouldn't point a new programmer toward it because we should not be doing programs today like we did then.

> But what might be really valuable is to have a modern reimplementation written in a modern functional style to study alongside [the original TeX] to see how much has changed and to appreciate what we have and the value of improved tools and techniques and the tradeoffs that have been made.

His slide on his educational goals can be seen at 12:06:

> Illustrate what's changed

> Demonstrate values of expressive code

> Provide real examples of expressing procedural algorithms in functional style

> Show how functional programs are easier to reason about

> Show different styles of optimization


As the article makes clear he isn't doing a direct port to Clojure. He wants to write it in idiomatic Clojure, so it's in a much more functional style than the original code. Obviously it won't use gotos (because Clojure doesn't have them) and I imagine globals and mutable state would be used very sparingly as well.


After hearing the talk, I certainly was struck by the complexity of TeX and the mental challenge of adapting it.

It seems unlikely that doing two ports (first to Go, second to Clojure) would result in less work overall. I would not expect that a procedural-to-functional adaptation would be made dramatically easier by porting to Go first.


It's written in an obsolete programming style, and I'd never write code like that today, but boy did I learn a lot from reading through the entire source code of TeX.



