I'm so excited to see that the idea of creating new programming languages is getting more popular. There's definitely a lot of room left for creative exploration; we haven't even remotely begun to scratch the surface of what's possible!
I just wish more tooling existed that was language-agnostic so that it's easier to get off the ground with something “serious”. I'm talking debuggers, parse-tree-aware diffs, autocompletion like Intellisense, etc.
Hard agree. Even without going deep on a "serious language" there's a universe of DSLs that's mostly unexplored.
Debuggers are the outlier in your list, but there's not exactly a void for the other wishes. As just one slice, building a tree-sitter [1] grammar gives you the basis for good editor integration [2], formatters [3], structural diff [4], and other dev tools. Similarly, if you're expressing some form of program, targeting LLVM IR connects your creation to a fairly extensive compiler toolchain.
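For a sense of scale, a tree-sitter grammar is just a JavaScript file handed to the `grammar` DSL. A minimal sketch for a hypothetical toy language (the name and rules here are illustrative, not from any real project) might look like:

```js
// grammar.js -- hypothetical toy-language grammar, illustrative only
module.exports = grammar({
  name: "toylang",

  rules: {
    source_file: $ => repeat($.definition),

    // e.g. `def x = 42`
    definition: $ => seq("def", $.identifier, "=", $.expression),

    expression: $ => choice($.identifier, $.number),

    identifier: $ => /[_a-zA-Z][_a-zA-Z0-9]*/,
    number: $ => /\d+/,
  },
});
```

From that one file, `tree-sitter generate` produces the C parser that the editor integrations and structural-diff tools consume.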
Language agnostic tooling exists, but there still needs to be some abstraction layer and a mapping to that.
The advent of the [language server protocol](https://en.wikipedia.org/wiki/Language_Server_Protocol) made me really excited that more abstraction layers for language-specific tooling would pop up -- I feel like we haven't seen as much of that as I would like?
The lack of such tooling is kind of a feature. Writing your own language is a great way to learn, but almost all of them shouldn't be used for anything serious.
Isn't this a self-fulfilling prophecy? Make it nearly impossible to create a new language with a sufficient amount of backing tooling, and sure, you'll find that nearly all new languages should not be used.
I had a similar thought chain. I came to the conclusion, though: you're always going to need people to learn how to build compilers and parsers and lexers. But for a language to be taken seriously, you kind of need a group of people with a reputation from another language or languages. By analogy: if John Carmack became an indie game developer tomorrow, we'd probably all buy his games. But if I publish a game tomorrow, how many of us are buying it?
Now, imagine five semi-core devs of Python get together and make a compiled language.
edited because i use a local llm on my phone for speech-to-text (FUTO) and i was testing with nearly white noise (fans and aerated water...)
The reason indie languages never succeed is because without a strong corporate backer or other such commitment to longevity, the risk of doing anything serious in it only to have to rewrite the entire thing in the next new indie language is far too great. Or at least it should be for any seasoned engineer worth their paycheck.
Tooling helps prove that commitment. But a strong standard library and a proven track record count for so much more.
At some point you risk taking on maintenance of the project yourself, which may sound great to some people. I've worked places that bought a company outright to own a product we used because the maintenance contracts cost too much. I built a cloud SaaS "1 week free demo" platform for that product so our company could recoup the cost of bringing the maintenance burden in-house. I get the aversion to the risk of using new/untested/fringe products where you may end up being the only entity that can actually keep them running.
I'm not a "developer". I am a ham. I am not a hamster.
They absolutely have, but like the person you're responding to, I also believe there's way more to be done. It's hard to imagine things that don't exist yet (and it's not going to be me), but encouraging experimentation is key.
Sure, when everyone and his dog can publish their (soon to be unmaintained once the novelty feelings wear off) language, the world will be a better place.
This is a really awful way to look at artistic expression. I will note, though, that one of the complaints of the Roman empire in its twilight years was that everyone wanted to write a book, so I guess this 'old man shakes stick at enjoyment' thing has been around for quite a while.
This is one way to write a compiler. One of the big tradeoffs made is the extensive use of libraries on both the front and back ends. It's a practical choice, but it also means the compiler itself is likely somewhat slow. Targeting LLVM alone is a choice that will guarantee that code gen is slow (you pay for a lot of unneeded complexity in LLVM with every compile).
When you master the principles of parsing, it is straightforward to do by hand in any language with good performance. It is easy to write a function in any language that takes a range string, such as "_a-zA-Z", and returns a table of size 256 with entries 65-90, 95, and 97-122 set to 1 and the rest set to 0. You can easily build your parser on top of these tables (this is not necessarily optimal, but it is more than good enough when you are starting).
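A sketch of that table-building function in TypeScript (the name `buildCharTable` is mine, and I'm assuming byte-oriented input):

```ts
// Build a 256-entry membership table from a range string like "_a-zA-Z".
function buildCharTable(ranges: string): Uint8Array {
  const table = new Uint8Array(256); // all entries start at 0
  for (let i = 0; i < ranges.length; i++) {
    let lo = ranges.charCodeAt(i);
    let hi = lo;
    if (ranges[i + 1] === "-" && i + 2 < ranges.length) {
      hi = ranges.charCodeAt(i + 2); // a span like "a-z"
      i += 2;
    }
    for (let c = lo; c <= hi; c++) table[c] = 1;
  }
  return table;
}

// buildCharTable("_a-zA-Z") sets 65-90, 95 and 97-122 to 1; a scanner can
// then consume an identifier with a tight loop of table lookups.
const identChar = buildCharTable("_a-zA-Z0-9");
```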
For the backend, you can target another language that you already know. It could be JavaScript, C, or something else. This will be much easier than targeting LLVM.
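As a minimal sketch of what that looks like (the `Expr` shape and `emit` function are illustrative, not from the article):

```ts
// Lower a tiny expression AST to JavaScript source text.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "add"; lhs: Expr; rhs: Expr };

function emit(e: Expr): string {
  switch (e.kind) {
    case "num":
      return String(e.value);
    case "add":
      return `(${emit(e.lhs)} + ${emit(e.rhs)})`;
  }
}

// emit({ kind: "add", lhs: { kind: "num", value: 1 },
//        rhs: { kind: "num", value: 2 } })  =>  "(1 + 2)"
```

Every new construct in your language becomes one more case in `emit`, and you inherit the target language's runtime for free.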
Start with hello world and build up the functionality you need. In no more than 10k lines of code (and typically significantly less), you can achieve self hosting. Then you can rewrite the compiler in itself. By this point you have identified the parts of the language you like and the parts that you don't or are not carrying weight. This is the point at which maybe it makes sense to target llvm or write a custom assembly backend.
The beauty of this approach is you keep the language small from the beginning without relying on a huge pile of dependencies. For a small compiler, you do not need concurrency or other fancy features because the programs it compiles are almost definitionally small.
Now you have a language that is optimized for writing compilers that you understand intimately. You can use it to design a new language that has the fancy things you want like data race protection. Repeat the same process as before except this time you start with concurrent hello world and build in the data race protection from the beginning.
I probably won't create a "proper" programming language, but this topic fascinates me. As someone who never even took a compilers class in college, I was really happy with the content I found at pikuma.com. The course really helped me understand how a simple programming language works. I'm sure others might benefit from it too.
I wrote my own language last year[1], ending the year by doing Advent of Code in it, and then translated it to itself in early January (so it's now self-hosted). I wanted to see if I could learn how to write a dependently typed language, and I wanted it to be self-hosted and able to run in a browser.
It's perhaps not a "proper" language because I targeted JavaScript, so I didn't have to write the back half of the compiler. Since it's dependently typed, I had plenty of work to do with dependent pattern matching, solving implicits, a typeclass-like mechanism, etc.
Next I may do a proper backend, or I may concentrate on the front end stuff (experiment with tighter editor integration, add LSP instead of the ad hoc extension that I currently have, or maybe turn it into a query-based compiler). Lots of directions I could go in.
At the moment, I'm looking into lambda-lifting the `where` clauses (I had punted lambda lifting to JS), and adding tail call optimization. I lost Idris' TCO when I self-hosted, so I currently have to run the self-hosted version in `bun` (JavaScriptCore does TCO).
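For anyone unfamiliar with the constraint: ES2015 specifies proper tail calls in strict mode, but JavaScriptCore is the only major engine that ships them, so self-recursive output like this sketch (mine, not from the project) runs in constant stack under bun/JSC and overflows on V8 for large `n`:

```ts
"use strict";
// A loop lowered to a tail call, as a compiler without its own TCO might emit.
function loop(n: number, acc: number): number {
  if (n === 0) return acc;
  return loop(n - 1, acc + n); // call in tail position, so JSC reuses the frame
}
```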
I'd be interested to understand the design choices behind using protobuf as an interface with LLVM: in my reasoning, it may be more performant, but that serialization step is a very small part of compute, and the serialization format is unusable by humans. For debug purposes, it'd have been nice to have a more human-friendly format. Did the project have other constraints?
He most likely wants the type structure generated by Protocol Buffers, as opposed to parsing JSON.
The latter requires asserting in a million places that this key exists in this map with this type, which will require a million lines of crap code.
Not to mention packing and unpacking the serialized data and maintaining two separate sets of corresponding structures/records that have to be kept in sync.
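For illustration, the schema for a small IR might start like this (a hypothetical sketch, not Bolt's actual messages); the protobuf compiler then generates matching typed structures for both sides from this single source of truth:

```proto
// Hypothetical IR schema, illustrative only.
syntax = "proto3";

message Expr {
  oneof node {
    int64 int_lit = 1;
    BinOp bin_op = 2;
  }
}

message BinOp {
  string op = 1;  // e.g. "+", "*"
  Expr lhs = 2;
  Expr rhs = 3;
}
```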
I've actually tried serializing languages into protobufs. The main reason was that it enabled communication from some random programming language to Java in a consistent, structured way. Seems like that's just how they sent the IR from OCaml to C++. On either side you'll get the Bolt IR, so I don't think debugging suffers too much. But the extra step of serializing and deserializing is a bit of a bummer.
There was a guy on a large science team who wrote the "programming language" for that system. There was a system of dispatch for "verbs" in the system, and it was large; there were maybe 20 full-time engineers building other parts, all the time. The guy who wrote the "language" was a sports guy with a swim background. For years, five days a week, he would get up before dawn and train swimming, then arrive at work at 9am and work on the code system. Every day, for years. It was admirable in a way, but also slavish.
If you want to design your own language, you might want to start with IParse Studio, an interactive online parser that parses input against a grammar at every keystroke, returning a parse tree if the input can be parsed according to the grammar.
Once you have the grammar, you can use it with IParse itself, developed in C++, which produces an abstract parse tree.
IParse has a built-in scanner for C-like terminals, which are now used in many languages, and you can also implement your own scanner. IParse additionally has an unparser, which lets you generate pretty-printed output with just some annotations in the grammar.