
A question for the LLVM experts here:

In the past when I've looked at LLVM from a distance, the biggest stumbling block I found was that it's written in C++, which isn't the language I'm using for my frontend.

How important is the C++ API in practice? Are the bindings for other languages usable? Is it possible to have my frontend emit LLVM IR in a text format, similarly to how you can feed assembly language source code to an assembler? Or should one really just bite the bullet and use C++ to generate the IR? I noticed that the compiler in this tutorial has a frontend in OCaml and a backend in C++, with the communication between them being done via protobufs.




I emit LLVM IR as text in my compiler. It’s painless, almost. Some points off the top of my head…

- I started out just printing strings, but found I wanted a bit more structure so I wrote a dozen or so functions to do the printing. Things at the level of call(name, args). Each one makes a single line of IR. It helps keep the typos down.

- just run your IR through clang to compile it. It has good enough error messages that you won’t go mad.

- you will need to make a bunch of DI metadata to get debug information into your programs. It took me about 8 hours to get enough that I have line number information and LLDB will show me functions and source lines on a backtrace. I should have done this much earlier than I did. I was getting by with line number comments in my IR, which is simple, but useless if you give yourself an access violation.

- learn to write little C programs and use clang to emit the IR. This will let you sort out how to do things. The IR manual is good, but there are concepts which are omitted because they were obvious to the authors, but won’t be to you.

- numbered local symbols (%0, %1, ...) have to be in strictly ascending numerical order with no skipped values. Screw that. Grab a dollar sign and make your own numeric local-ish symbols and don’t sweat reordering them.

- it doesn’t seem to care what order you emit things in, other than lines in a block of course, so just do what is easy for you.

- get a grip on getelementptr and phi. The first may not be what you think it is, learn what it is. The second is just part of the magic of SSA.
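
As a rough illustration of the "dozen or so functions" approach and the dollar-sign naming trick from the list above, here is a minimal Python sketch. The `FunctionEmitter` class, its method names, and the `@twice` function are all hypothetical, invented for this example; the only LLVM-specific facts assumed are that `$` is a legal character in named identifiers and that named locals are exempt from the sequential numbering rule.

```python
import itertools


class FunctionEmitter:
    """Collects the lines of one LLVM IR function body as plain text."""

    def __init__(self):
        self.lines = []
        self._ids = itertools.count(1)

    def fresh(self, hint="t"):
        # Named locals such as %t$1 are not checked for sequential
        # numbering the way unnamed ones (%0, %1, ...) are.
        return f"%{hint}${next(self._ids)}"

    def emit(self, text):
        # Every helper funnels through here, one line of IR per call.
        self.lines.append("  " + text)

    def binop(self, op, ty, lhs, rhs):
        dst = self.fresh()
        self.emit(f"{dst} = {op} {ty} {lhs}, {rhs}")
        return dst

    def call(self, ret_ty, name, args):
        # args is a list of (type, value) pairs.
        arg_text = ", ".join(f"{ty} {val}" for ty, val in args)
        dst = self.fresh()
        self.emit(f"{dst} = call {ret_ty} @{name}({arg_text})")
        return dst

    def ret(self, ty, value):
        self.emit(f"ret {ty} {value}")


fe = FunctionEmitter()
total = fe.binop("add", "i32", "%a", "%b")
doubled = fe.call("i32", "twice", [("i32", total)])
fe.ret("i32", doubled)

ir = "\n".join(
    ["declare i32 @twice(i32)", "", "define i32 @f(i32 %a, i32 %b) {"]
    + fe.lines
    + ["}"]
)
print(ir)
```

The printed text can be saved to a .ll file and handed to clang as described above. Keeping every helper down to one line of IR means a typo only has to be fixed in one place.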


> learn to write little C programs and use clang to emit the IR.

Highly recommend this. I used this to get an understanding of how to implement the IR for my language.


The command to use is `clang -S -emit-llvm -O1 foo.c`

It'll write it out to a foo.ll file.

(I use -O1 so it cleans up a bit of the messy parts of the IR).


Can you print variables in lldb with the debug information?

One compiler that I use which emits LLVM IR added support for debug information recently and it's now possible to set breakpoints in gdb, but you can't print out any stack variables or anything, so it's not useful other than for figuring out which code paths execute.

I'd like to learn more about this. Maybe contribute to the compiler and fix this issue.


> I'd like to learn more about this. Maybe contribute to the compiler and fix this issue.

You need to create a call to the `llvm.dbg.declare` intrinsic that connects the stack variable alloca to a DILocalVariable metadata node, and place the call after the stack variable alloca. The rest of LLVM will handle tracing the stack location through the compiler to the output, including the alloca being promoted.

See: https://llvm.org/docs/SourceLevelDebugging.html#debugger-int...
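
For a frontend that emits textual IR, the two pieces described above might be produced along these lines. This is only a sketch: the helper names and the metadata node numbers are made up, the scope, file, type, and location nodes (!4, !1, !7, !9 here) are assumed to be emitted elsewhere, and the `ptr` spelling assumes an LLVM version with opaque pointers. Newer LLVM releases may print this as a debug record instead of an intrinsic call, so it is worth comparing against what `clang -g -S -emit-llvm` produces for your version.

```python
def di_local_variable(node_id, name, scope_id, file_id, line, type_id):
    """Format a DILocalVariable metadata node describing one source variable."""
    return (
        f'!{node_id} = !DILocalVariable(name: "{name}", scope: !{scope_id}, '
        f"file: !{file_id}, line: {line}, type: !{type_id})"
    )


def dbg_declare(alloca_name, var_id, loc_id):
    """Format the intrinsic call that ties an alloca to its DILocalVariable."""
    return (
        f"call void @llvm.dbg.declare(metadata ptr {alloca_name}, "
        f"metadata !{var_id}, metadata !DIExpression()), !dbg !{loc_id}"
    )


# The call is placed right after the alloca it describes.
body = [
    "%x = alloca i32, align 4",
    dbg_declare("%x", var_id=8, loc_id=9),
]
# The matching metadata node goes with the rest of the module's metadata.
metadata = [di_local_variable(8, "x", scope_id=4, file_id=1, line=2, type_id=7)]
print("\n".join(body + metadata))
```

The module also needs the line `declare void @llvm.dbg.declare(metadata, metadata, metadata)` once at top level.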


Julia has an interesting split here: it does the lowering into SSA form in pure Julia and then has a codegen step that translates the SSA form into LLVM IR, but for that second step we do use the C++ API. We have very robust bindings to the C-API, but it forever feels just a bit incomplete and less cared for. The C-API is very stable, whereas the C++ API does change quite a bit.


But you cannot use the C-API for symbols/methods. You need a C++ callback for that.


I would avoid any text, but LLVM has mature bindings in OCaml and Haskell, for example. The textual representation isn't stable IIRC, and it adds a step in between you and your already lumbering backend.

Ultimately the C++ API isn't too difficult to use but LLVM mandates a fairly hardcore level of C++ knowledge to play with its internals.

Quick Tip: if you're thinking "Holy shit how do I get from [complicated] all the way down to IR instructions", lower the big thing to something simpler so you can reuse the code to generate IR from that. For example, a foreach loop is expressible as a for loop under the hood, so now you only have to compile for loops. This would usually be done in the AST itself.
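
To make the foreach example concrete, here is a small Python sketch of such an AST-level lowering. All node classes (`ForEach`, `For`, `Let`, `Index`, and so on) and the `len` builtin are hypothetical, made up for this illustration. It also assumes the sequence expression is cheap and side-effect free, since it ends up duplicated in the lowered loop.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Var:
    name: str

@dataclass
class Int:
    value: int

@dataclass
class Call:
    fn: str
    args: list

@dataclass
class Index:
    seq: object
    idx: object

@dataclass
class Let:
    name: str
    value: object

@dataclass
class ForEach:
    var: str
    seq: object
    body: list

@dataclass
class For:
    var: str
    start: object
    stop: object
    body: list


_counter = itertools.count(1)


def lower(stmt):
    """Rewrite ForEach nodes into counted For loops, recursing into bodies."""
    if isinstance(stmt, ForEach):
        # '$' keeps the hidden index from clashing with user variables,
        # assuming the source language does not allow '$' in names.
        idx = f"$i{next(_counter)}"
        element = Let(stmt.var, Index(stmt.seq, Var(idx)))
        body = [element] + [lower(s) for s in stmt.body]
        return For(idx, Int(0), Call("len", [stmt.seq]), body)
    if isinstance(stmt, For):
        return For(stmt.var, stmt.start, stmt.stop, [lower(s) for s in stmt.body])
    return stmt


loop = ForEach("x", Var("xs"), [Call("print", [Var("x")])])
lowered = lower(loop)
print(lowered)
```

After this pass the IR generator only needs a case for `For`, never for `ForEach`.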


Regarding interface stability: Indeed, the textual representation is not stable, things like added types in the representation of some instructions can happen when upgrading to a new version. However, to be entirely honest, in the last few years of updating LLVM-based research tools to newer LLVM versions, changes in the C++ API that required me to (sometimes just slightly) change my code happened a lot more often than changes in the textual representation...


I'm not an expert, but there are C bindings: I was able to play around with a toy compiler[1] in Lua using lualvm[2]

I also know of at least one compiler[3] that actually emits textual IR, and then builds and links .obj files from that with the LLVM toolchain...but I think that's just a bunch of work, would be hard to debug, and generally just a bad idea.

1: https://github.com/chc4/solar/blob/master/src/jit.lua

2: https://github.com/gilzoide/lualvm which I actually had to fork to https://github.com/chc4/lualvm for a small bugfix

3: https://github.com/FeepingCreature/fcc/blob/master/llvmfile.... by 'feepingcreature


Hey, author of the post here. Do I think the C++ API is important? For most languages no. The OCaml bindings in my case were almost sufficient, but I planned to do some memory fences and other operations in my language that the OCaml bindings didn't have.

In hindsight, it's probs better to choose OCaml bindings and then link in any special instructions you need from C++ if you need to.


Regarding this post in particular, I chose to document everything in terms of the C++ API as that's the native API. You can use any of the other bindings, and just translate the syntax across to your language.


> Are the bindings for other languages usable?

Yes, I've done several compiler projects using Haskell and LLVM.

That said, not all the bindings were always maintained and up to date and together with LLVM's lack of API stability, there was a significant amount of churn work related to updating from one LLVM version to another. I had to build and install an older version of LLVM that would work with the bindings, several times.

Note: this was years ago, situation may have improved.

I understand that the LLVM API doesn't change significantly between versions any more, so the work required to update the bindings to a newer version shouldn't be huge for the maintainers of the bindings. But for an end user like me there were quite a lot of manual steps to get my project and the dependencies building.

There's always the option of emitting LLVM IR by writing text, but that doesn't give you the ability to do JITting and a REPL and so on.


I'm currently taking a compiler construction course with LLVM at university.

Generating it in text form is really simple. Doing just that with Rust. Actually, writing it by hand is too.

You can use llvm-as to convert the text form to bitcode and then lli to interpret it (or use one of the other tools to compile it).
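
A minimal sketch of that llvm-as plus lli round trip, driven from Python. The helper names, the file name `hello.ll`, and the hello-world module are invented for the example. It assumes `llvm-as` and `lli` are on PATH and that the LLVM version is recent enough to accept the opaque `ptr` type.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# A tiny hand-written module; "\\00" becomes the IR escape \00 in the file.
HELLO_IR = """\
declare i32 @puts(ptr)

@msg = private constant [6 x i8] c"hello\\00"

define i32 @main() {
  %r = call i32 @puts(ptr @msg)
  ret i32 0
}
"""


def tool_commands(ll_path):
    """Return the command lines: assemble text to bitcode, then interpret it."""
    bc_path = ll_path.with_suffix(".bc")
    return [
        ["llvm-as", str(ll_path), "-o", str(bc_path)],
        ["lli", str(bc_path)],
    ]


def run_ir(ir_text):
    """Write the IR to a temporary .ll file and run it through both tools."""
    with tempfile.TemporaryDirectory() as tmp:
        ll_path = Path(tmp) / "hello.ll"
        ll_path.write_text(ir_text)
        for cmd in tool_commands(ll_path):
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only attempt the run when both tools are actually installed.
    if shutil.which("llvm-as") and shutil.which("lli"):
        run_ir(HELLO_IR)
```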


I've had success with the llvmlite Python binding. It has a Python-native utility to help build the intermediate code, and then it internally emits text and uses the llvm C api for codegen.

Both the text format and the C API are allegedly more stable than the C++ API, so this may be a usable pattern in general. The text format is very well documented in my experience.

One downside to using text is an extra emit-and-parse-back step, but unless your code is huge, it's more than fast enough (and it falls away against optimization anyway).



