I'm so excited to see that the idea of creating new programming languages is getting more popular. There's definitely a lot of room left for creative exploration; we haven't even remotely begun to scratch the surface of what's possible!
I just wish more tooling existed that was language-agnostic so that it's easier to get off the ground with something “serious”. I'm talking debuggers, parse-tree-aware diffs, autocompletion like Intellisense, etc.
Hard agree. Even without going deep on a "serious language" there's a universe of DSLs that's mostly unexplored.
Debuggers are the outlier in your list, but there's not exactly a void for the other wishes. As just one slice, building a tree-sitter [1] grammar gives you the basis for good editor integration [2], formatters [3], structural diff [4], and other dev tools. Similarly, if you're expressing some form of program, targeting LLVM IR connects your creation to a fairly extensive compiler toolchain.
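For a sense of scale, a tree-sitter grammar is just a JavaScript file handed to the `grammar` DSL. A minimal sketch for a hypothetical toy language (the name and rules here are illustrative, not from any real project) might look like:

```js
// grammar.js -- hypothetical toy-language grammar, illustrative only
module.exports = grammar({
  name: "toylang",

  rules: {
    source_file: $ => repeat($.definition),

    // e.g. `def x = 42`
    definition: $ => seq("def", $.identifier, "=", $.expression),

    expression: $ => choice($.identifier, $.number),

    identifier: $ => /[_a-zA-Z][_a-zA-Z0-9]*/,
    number: $ => /\d+/,
  },
});
```

From that one file, `tree-sitter generate` produces the C parser that the editor integrations and structural-diff tools consume.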
Language agnostic tooling exists, but there still needs to be some abstraction layer and a mapping to that.
The advent of the [language server protocol](https://en.wikipedia.org/wiki/Language_Server_Protocol) made me really excited that more abstraction layers for language-specific tooling would pop up -- I feel like we haven't seen as much of that as I would like?
The lack of such tooling is kind of a feature. Writing your own language is a great way to learn, but almost all of them shouldn't be used for anything serious.
Isn't this a self-fulfilling prophecy? Make it nearly impossible to create a new language with a sufficient amount of backing tooling, and sure, you'll find that nearly all new languages should not be used.
I had a similar thought chain. I came to the conclusion, though: you're always going to need people to learn how to build compilers and parsers and lexers. But for a language to be taken seriously, you kind of need a group of people with a reputation from another language or languages. By analogy: if John Carmack became an indie game developer tomorrow, we'd probably all buy his games. But if I publish a game tomorrow, how many of us are buying it?
Now, imagine five semi-core devs of Python get together and make a compiled language.
edited because i use a local llm on my phone for speech-to-text (FUTO) and i was testing with nearly white noise (fans and aerated water...)
The reason indie languages never succeed is because without a strong corporate backer or other such commitment to longevity, the risk of doing anything serious in it only to have to rewrite the entire thing in the next new indie language is far too great. Or at least it should be for any seasoned engineer worth their paycheck.
Tooling helps prove that commitment. But a strong standard library and a proven track record count for so much more.
At some point you risk taking on maintenance of the project yourself, which may sound great to some people. I've worked places that bought a company outright to own a product we used because the maintenance contracts cost too much. I built a cloud SaaS "1 week free demo" platform for that product so our company could recoup the cost of bringing the maintenance burden in-house. I get the aversion to the risk of using new/untested/fringe products where you may end up being the only entity that can actually keep them running.
I'm not a "developer". I am a ham. I am not a hamster.
They absolutely have, but like the person you're responding to, I also believe there's way more to be done. It's hard to imagine things that don't exist yet (and it's not going to be me), but encouraging experimentation is key.
Sure, when everyone and his dog can publish their (soon to be unmaintained once the novelty feelings wear off) language, the world will be a better place.
This is a really awful way to look at artistic expression. I will note, though, that one of the complaints of the Roman empire in its twilight years was that everyone wanted to write a book, so I guess this 'old man shakes stick at enjoyment' thing has been around for quite a while.
This is one way to write a compiler. One of the big tradeoffs made is the extensive use of libraries on both the front and back ends. It's a practical choice, but it also means the compiler itself is likely somewhat slow. Targeting LLVM alone is a choice that will guarantee that code gen is slow (you pay for a lot of unneeded complexity in LLVM with every compile).
When you master the principles of parsing, it is straightforward to do by hand in any language with good performance. It is easy to write a function in any language that takes a range string, such as "_a-zA-Z", and returns a table of size 256 with entries 65-90, 95, and 97-122 set to 1 and the rest set to 0. You can easily build your parser on top of these tables (this is not necessarily optimal, but it is more than good enough when you are starting).
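A sketch of that table-building function in TypeScript (the name `buildCharTable` is mine, and I'm assuming byte-oriented input):

```ts
// Build a 256-entry membership table from a range string like "_a-zA-Z".
function buildCharTable(ranges: string): Uint8Array {
  const table = new Uint8Array(256); // all entries start at 0
  for (let i = 0; i < ranges.length; i++) {
    let lo = ranges.charCodeAt(i);
    let hi = lo;
    if (ranges[i + 1] === "-" && i + 2 < ranges.length) {
      hi = ranges.charCodeAt(i + 2); // a span like "a-z"
      i += 2;
    }
    for (let c = lo; c <= hi; c++) table[c] = 1;
  }
  return table;
}

// buildCharTable("_a-zA-Z") sets 65-90, 95 and 97-122 to 1; a scanner can
// then consume an identifier with a tight loop of table lookups.
const identChar = buildCharTable("_a-zA-Z0-9");
```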
For the backend, you can target another language that you already know. It could be JavaScript, C, or something else. This will be much easier than targeting LLVM.
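As a minimal sketch of what that looks like (the `Expr` shape and `emit` function are illustrative, not from the article):

```ts
// Lower a tiny expression AST to JavaScript source text.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "add"; lhs: Expr; rhs: Expr };

function emit(e: Expr): string {
  switch (e.kind) {
    case "num":
      return String(e.value);
    case "add":
      return `(${emit(e.lhs)} + ${emit(e.rhs)})`;
  }
}

// emit({ kind: "add", lhs: { kind: "num", value: 1 },
//        rhs: { kind: "num", value: 2 } })  =>  "(1 + 2)"
```

Every new construct in your language becomes one more case in `emit`, and you inherit the target language's runtime for free.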
Start with hello world and build up the functionality you need. In no more than 10k lines of code (and typically significantly less), you can achieve self hosting. Then you can rewrite the compiler in itself. By this point you have identified the parts of the language you like and the parts that you don't or are not carrying weight. This is the point at which maybe it makes sense to target llvm or write a custom assembly backend.
The beauty of this approach is you keep the language small from the beginning without relying on a huge pile of dependencies. For a small compiler, you do not need concurrency or other fancy features because the programs it compiles are almost definitionally small.
Now you have a language that is optimized for writing compilers that you understand intimately. You can use it to design a new language that has the fancy things you want like data race protection. Repeat the same process as before except this time you start with concurrent hello world and build in the data race protection from the beginning.
I probably won't create a "proper" programming language, but this topic fascinates me. As someone who never even took a compilers class in college, I was really happy with the content I found at pikuma.com. The course really helped me understand how a simple programming language works. I'm sure others might benefit from it too.
I wrote my own language last year[1], ending the year by doing Advent of Code in it, and then translated it to itself in early January (so it's now self-hosted). I wanted to see if I could learn how to write a dependently typed language, and I wanted it to be self-hosted and able to run in a browser.
It's perhaps not a "proper" language because I targeted JavaScript, so I didn't have to write the back half of the compiler. Since it's dependently typed, I had plenty of work to do with dependent pattern matching, solving implicits, a typeclass-like mechanism, etc.
Next I may do a proper backend, or I may concentrate on the front end stuff (experiment with tighter editor integration, add LSP instead of the ad hoc extension that I currently have, or maybe turn it into a query-based compiler). Lots of directions I could go in.
At the moment, I'm looking into lambda-lifting the `where` clauses (I had punted lambda lifting to JS), and adding tail call optimization. I lost Idris' TCO when I self-hosted, so I currently have to run the self-hosted version in `bun` (JavaScriptCore does TCO).
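For anyone unfamiliar with the constraint: ES2015 specifies proper tail calls in strict mode, but JavaScriptCore is the only major engine that ships them, so self-recursive output like this sketch (mine, not from the project) runs in constant stack under bun/JSC and overflows on V8 for large `n`:

```ts
"use strict";
// A loop lowered to a tail call, as a compiler without its own TCO might emit.
function loop(n: number, acc: number): number {
  if (n === 0) return acc;
  return loop(n - 1, acc + n); // call in tail position, so JSC reuses the frame
}
```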
I'd be interested to understand the design choices behind using protobuf as an interface with LLVM: in my reasoning, it may be more performant, but that serialization step is a very small part of compute, and the serialization format is unusable by humans. For debug purposes, it'd have been nice to have a more human-friendly format. Did the project have other constraints?
He most likely wants the type structure generated by Protocol Buffers, as opposed to parsing JSON.
The latter requires asserting in a million places that this key exists in this map with this type, which will require a million lines of crap code.
Not to mention packing and unpacking the serialized data and maintaining two separate sets of corresponding structures/records that have to be kept in sync.
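For illustration, the schema for a small IR might start like this (a hypothetical sketch, not Bolt's actual messages); the protobuf compiler then generates matching typed structures for both sides from this single source of truth:

```proto
// Hypothetical IR schema, illustrative only.
syntax = "proto3";

message Expr {
  oneof node {
    int64 int_lit = 1;
    BinOp bin_op = 2;
  }
}

message BinOp {
  string op = 1;  // e.g. "+", "*"
  Expr lhs = 2;
  Expr rhs = 3;
}
```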
I've actually tried serializing languages into protobufs. The main reason was that it enabled communication from some random programming language to Java in a consistent, structured way. Seems like that's just how they sent the IR from OCaml to C++. On either side you'll get the Bolt IR, so I don't think debugging suffers too much. But the extra step of serializing and deserializing is a bit of a bummer.
There was a guy on a large science team who wrote the "programming language" for that system. There was a system of dispatch for "verbs" in the system, and it was large; there were maybe 20 full-time engineers building other parts, all the time. The guy who wrote the "language" was a sports guy with a swim background. For years, five days a week, he would get up before dawn and train swimming, then arrive at work at 9am and work on the code system. Every day, for years. It was admirable in a way, but also slavish.
If you want to design your own language, you might want to start with IParse Studio, an interactive online parser that parses input against a grammar at every keystroke, returning a parse tree if the input can be parsed according to the grammar.
Once you have the grammar, you can use it with IParse itself, developed in C++, which produces an abstract parse tree.
IParse has a built-in scanner for C-like terminals, which are now used in many languages, and you can also implement your own scanner. IParse additionally has an unparser, which lets you generate pretty-printed output with just some annotations in the grammar.