"After this experience, it was hard to understand that the software engineering community did not recognize the benefits of adopting a high-level, type-safe language instead of C." -- N. Wirth
I suspect that the good programmers did notice the benefits of a strict, statically typed language, but the cost was probably deemed too high at the time. Because C was a "mid-level language" that stayed close to the metal, it could efficiently use the limited resources of pre-LSI (let alone VLSI) computers; in effect, C let you break the rules if you wanted or needed to. These days the performance difference is generally less than a few percent and can largely be ignored.
Every Pascal I ever worked with (with the exception of the Pascal I used in school) had extensions to make it work in the Real World. Unextended Pascal was a toy.
Apple was a Pascal shop in the 80s, but they extended it to the point that I just translated (in my head) the C code that I wanted to write into the variant of Pascal they'd implemented. Pascal had a couple of nice things that C didn't (nested procedures, for one), but they weren't used all that much by us.
Strings were still a pain. Str255 stunk, and everyone knew it, but nobody had anything that was dramatically better.
I disagree. Unextended Pascal is not a toy: it's a tool to teach computer science and structured programming. In a classroom environment in the 1970s, it's not relevant that strings have to have a defined, fixed length. I'm not 100% sure what you mean by I/O being broken, but I'd wager that's not a big deal in a classroom environment, either. In any case, evaluating unextended Pascal as anything other than a teaching tool is like complaining that a hacksaw isn't very good for screwing in screws. Talk about "wrong tool for the job!" :-)
True. But, "high-level, type-safe language" need not refer to unextended Pascal, and I see no evidence that it was intended to. He could have meant Algol W. Depending on when it was written, he could have meant Modula 2, Modula 3, or Oberon.
I've read quite a bit of Pascal's history. There are three main explanations:
* Lack of a complete standard. The language was initially used in academia, so when industry adopted it, numerous mutually incompatible extensions were developed. The standard C library may seem basic today, but it was nice by comparison.
* Virtual machines. Now they're the cool thing. Back then they were a shortcut to get a quick and dirty implementation for new hardware, instead of writing a compiler. The bad performance was associated with the language, not the implementation.
* Some implementations were of the "bondage and discipline" kind.
The Turbo Pascal compiler solved all three problems in the eighties, adding an IDE and a $50 price tag for good measure. But it was too late. Although very popular in Europe and with little shops in general, the big software firms in the USA had already adopted C.
One wonders where we would be if Pascal or one of its offspring, like Oberon, had actually caught on and challenged C. It would be nice to have an alternative systems programming language. C sometimes feels like a local maximum instead of a global one.
From the PDF:
"The clean solution given in Oberon [6] is the concept of type extension, in object-oriented languages called inheritance. Now it became possible to declare a pointer as referencing a given type, and it would be able to point to any type which was an extension of the given type. This made it possible to construct inhomogeneous data structures, and to use them with the security of a reliable type checking system. An implementation must check at run-time, if and only if it is not possible to check at compile-time.
Programs expressed in languages of the 1960s were full of loopholes. They made these programs utterly error-prone. But there was no alternative. The fact that a language like Oberon lets you program entire systems from scratch without use of loopholes (except in the storage manager and device drivers) marks the most significant progress in language design over 40 years."
Pascal was the primary high-level language used for development in the Apple Lisa, and in the early years of the Mac. Microsoft assumed Pascal would become the dominant application programming language, so the early Windows ABI used the Pascal calling conventions. That's why we're stuck with Windows' PASCAL/WINAPI/stdcall and cdecl ABI mess today.
IIRC the first (unusable) Windows version was written in Pascal.
I remember reading somewhere that the Pascal calling convention was more efficient: returning from a function and cleaning the parameters off the stack could be done with a single RET n instruction, while C calls needed both a RET and a separate stack adjustment in the caller, because C allows a variable number of parameters.
I don't know if it's still the case with recent processors.
C requires that variable-argument-list functions be called with a correct prototype in scope, so it is perfectly feasible to use a "callee cleans up" ("Pascal") calling convention for regular C functions and a "caller cleans up" ("cdecl") convention only for varargs functions.
The point is moot, though, because the fact that an operation can be represented with a single assembler mnemonic doesn't mean it's any faster than an alternative that uses several. A case in point is the "REP MOVS" style string functions, which until very recently were actually slower than open-coding the equivalents, since they trapped to microcode.
<quote>
One wonders where we would be if Pascal or one of its offspring, like Oberon, had actually caught on and challenged C. It would be nice to have an alternative systems programming language. C sometimes feels like a local maximum instead of a global one.
</quote>
I think we would be using safer systems, where buffer overruns and pointer exploits would be almost non-existent.
The hegemony of C has cost the industry millions of euros/dollars in software-correctness failures.
I'm always a bit hazy on why people bootstrap languages by writing compilers. I've always imagined it would be easier to write an AST-walking interpreter, write the compiler in the source language, and then interpret the compiler taking its source as input, to produce the first binary.
I guess I've never actually bootstrapped a compiler, but I've always found the interpreters I've written to be easier to reason about than the compilers I've written.
Bootstrapping a compiler is a good way to test how the language feels when writing a big project like a compiler. It also helps to find errors in the language/implementation.
In my proposed way to bootstrap a compiler, you still write the compiler in the language you wish to compile. But instead of writing a second compiler for the source language in some other language that already has a compiler, you write an interpreter in that other language. This means the amount of code you write to get the language up and running is smaller, since interpreters are easier to write than a whole second compiler.
So what you say is true--it is a good way to see how the language feels--but not relevant to the facet of language development I was musing about.
So they had three tools to choose among. One was broken, one was wildly unsuitable, and one was unpopular. They chose the second for (apparently) purely cultural reasons. I think it's not so surprising that they ran into problems.
Don't discard unsexy solutions out of hand; choose the right tool for the job.
Was assembly really the right tool for this either? It seems like the difficulty of translating Fortran to Pascal due to Fortran's lack of many of Pascal's higher-level features would still apply to assembly.
I think so. Wirth called out Fortran's lack of pointers, records, and recursion. You have (or can fake) all of these things in assembly. Recursion is particularly important for building a compiler, since much of it involves manipulating trees.
Also, consider the stated goal of the project. They wanted to write a throwaway compiler so that they could bootstrap a Pascal version. In this case, it's a bad choice to use a language (Fortran) that's so different from Pascal. On the other hand, assembly is very flexible, so you can use whatever idioms you think will apply in the Pascal code you later intend to write. This will make the translation much simpler.
Writing in assembler, it's easy to construct linked records, and recursion works correctly, since you have a stack.
It might take a while, but it doesn't seem like it would be that hard to macro up some higher-level routines for parsing and code generation for a simple PASCAL.
I'd expect that most Beta-machines (what we'd nowadays call virtual machines) would have been written in assembler back then, and there was lots of expertise running compiled Algol, amongst other things.
I think writing it in (perhaps a subset of) Pascal, then translating to assembler either manually or by interpreting the Pascal source by hand would have been the Right Thing To Do (tm). Writing in the high-level language makes it easier to get the algorithms and data structures right. Translating to a lower level language later is easier with that framework in hand -- the Pascal source would serve as a sort of spec for the ASM translation. I know that when I write code in Python and translate to C, it's frequently easier than starting off writing in C, partly for that reason.
Assembly allows you to use pointers and recursion, which were the specific problems with Fortran. You can do anything in assembly; it just takes a lot of lines. Programming languages, especially early ones, can place major limitations on how computations are expressed, even if they are nominally as Turing-complete as any other language.
I'm not a big Lisp fanboy, but I wonder why it wasn't considered an option, at least for bootstrapping the compiler. Even if no Lisp was available on their machines, I think hand-coding one in assembly and then using that would have been much faster than the other options. (Note: it's been 20+ years since my college Programming Languages class, where ironically I built a Lisp-1 in Pascal, so I may be misremembering my programming-language history here.)
Shoulda used Forth. It'd be fairly easy to go from assembler to an ultra bare bones Forth, and that could be built up to work like Pascal, until it looks like Pascal-written-backwards. Then write the compiler in that, and the translation and bootstrap would be trivial.
My first thought as well, but in 1969 Forth was still busy being born. Based on the notes here[1], many of the core concepts of a modern Forth like defining words and the structure of the dictionary were being actively hammered out.
I wonder which "high-level" features Pascal has that C lacks.
When I looked into Pascal, literally everything useful there came from C. All things originating in Pascal seemed useless.
That would have been a rather neat trick, given that Pascal predates C.
Furthermore, C itself evolved rather substantially over the years, so people who mostly experienced the ANSI/ISO C era may not realize what an appallingly messy language it used to be. By the time the original K&R got published, the situation had improved considerably (with the exception of function prototypes), but I found the C code in 6th edition UNIX a truly hair raising read: http://www.tom-yam.or.jp/2238/src/
Pascal has nested functions and supports a limited form of closures.
In Pascal, a function can be passed as an argument to another function, but cannot be stored in a variable or data structure, cannot be returned from a function, and cannot be created without being given a name. However, when a function is passed to another function and later called, it will execute in the lexical context it was defined in, so it is, in some sense, "closed over" that context.
I also liked that I could pass parameters by value or by reference without explicitly using pointers.
Nested routines in classical Pascal support downwards funargs; they are closures lite, i.e. actually more expressive than C functions. But we are talking classical Pascal; every practical commercial Pascal implementation supports function pointers directly, just like C does.
Having arrays indexed by enumerations can be nice.
type
  TFoo = (foOne, foTwo, foThree);
const
  FooStr: array[TFoo] of string = ('One', 'Two', 'Three');
There's no mucking around with implicit or explicit conversions between integers and enumerations. Similarly, you can enumerate through them easily with standard routines:
var i: TFoo;
{ ... }
for i := Low(TFoo) to High(TFoo) do { etc. }
Doing it in C relies on you adding fake enumeration members to stand for counts; and it gets worse in C++, because enumerations are more reluctant to decompose into integers.
Writing scanners (as in, compiler lexers) in Pascal is very convenient with set notation:
{ skip whitespace }
while cp^ in [#1..#32] do Inc(cp);
With a good debugger, it can be nice to use sets instead of bit flags, because you generally get a more reliable symbolic breakdown.
If you write a lot of code that breaks down in a procedural way, having nested routines is very nice. It can limit the scope of functions and procedures to just the code that needs them. In C, one can be tempted to cram it all into a single long function instead.
Practical Pascals like Turbo Pascal and Delphi have a real live module format that works fairly well for independent compilation. Changes to the interface exported by a unit do not necessarily need all dependent units recompiled. The Pascal linker associates versions with all exported symbols (basically, a hash of their signature), and only units whose import symbol versions don't match the export symbol versions need to be recompiled. This also prevents type mismatches that can (albeit rarely in practice) affect C, where if you change the signature of a function or the type of a variable, and fail to recompile clients, you won't get an error from the linker, because C linkers don't usually encode that info.
Pascal's pointer semantics are interesting: because it's possible to detect when a pointer is used before being initialized, the probability of unintentional nil-pointer dereferences is reduced.
Subranges are also a nice feature.
Those, and nested procedures are about all I can come up with.
I'm confused. He starts off by asserting there were only three options: assembly, Fortran, or Algol. But later on, an essential part of the process turns out to be "a syntax-sugared, low-level language, for which a compiler was available". So why wasn't that apparently anonymous language one of the base options?
I think that was the "compiler for a substantial subset of Pascal using Fortran" that they originally planned to translate to Pascal after implementing it in Fortran.