A question for the LLVM experts here: In the past when I've looked at LLVM from ...

jws · on Dec 26, 2020

I emit LLVM IR as text in my compiler. It’s painless, almost. Some points off the top of my head…

- I started out just printing strings, but found I wanted a bit more structure so I wrote a dozen or so functions to do the printing. Things at the level of `call(name, args)`, each of which makes a single line of IR. It helps keep the typos down.

- just run your IR through clang to compile it. It has good enough error messages that you won’t go mad.

- you will need to make a bunch of DI metadata to get debug information into your programs. It took me about 8 hours to get enough that I have line number information and LLDB will show me functions and source lines on a backtrace. I should have done this much earlier than I did. I was getting by with line number comments in my IR, which is simple, but useless if you give yourself an access violation.

- learn to write little C programs and use clang to emit the IR. This will let you sort out how to do things. The IR manual is good, but there are concepts which are omitted because they were obvious to the authors, but won’t be to you.

- unnamed (numbered) local symbols such as %0, %1 have to be in strict ascending numerical order with no skipped values. Screw that. Grab a dollar sign and make your own numeric local-ish symbols and don’t sweat reordering them.

- it doesn’t seem to care what order you emit things in, other than lines in a block of course, so just do what is easy for you.

- get a grip on getelementptr and phi. The first may not be what you think it is, learn what it is. The second is just part of the magic of SSA.
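A minimal sketch of the "one helper per line of IR" idea, including the dollar-sign local names and a phi. The `Emitter` class, its method names, and `emit_max` are hypothetical, not taken from the compiler described above; the sketch only assumes that `$` is a legal character in LLVM identifiers, so `%$1` counts as a named value rather than a numbered one.

```python
import itertools


class Emitter:
    """Collects textual LLVM IR; each helper appends exactly one line."""

    def __init__(self):
        self.lines = []
        self._ids = itertools.count(1)

    def fresh(self):
        # %$1, %$2, ... are *named* values because of the '$',
        # so they do not have to appear in sequential order.
        return f"%${next(self._ids)}"

    def line(self, text):
        self.lines.append(text)

    def value(self, rhs):
        # Emit "dst = rhs" and hand back the destination name.
        dst = self.fresh()
        self.line(f"  {dst} = {rhs}")
        return dst

    def label(self, name):
        self.line(f"{name}:")

    def icmp(self, pred, ty, lhs, rhs):
        return self.value(f"icmp {pred} {ty} {lhs}, {rhs}")

    def call(self, ret_ty, name, args):
        # args is a list of (type, value) pairs.
        arg_text = ", ".join(f"{ty} {val}" for ty, val in args)
        return self.value(f"call {ret_ty} @{name}({arg_text})")

    def br(self, target):
        self.line(f"  br label %{target}")

    def cond_br(self, cond, if_true, if_false):
        self.line(f"  br i1 {cond}, label %{if_true}, label %{if_false}")

    def phi(self, ty, incoming):
        # incoming is a list of (value, predecessor block) pairs.
        pairs = ", ".join(f"[ {val}, %{blk} ]" for val, blk in incoming)
        return self.value(f"phi {ty} {pairs}")

    def ret(self, ty, val):
        self.line(f"  ret {ty} {val}")


def emit_max():
    """Emit a function returning the larger of two i32 arguments."""
    e = Emitter()
    e.line("define i32 @max(i32 %a, i32 %b) {")
    e.label("entry")
    a_bigger = e.icmp("sgt", "i32", "%a", "%b")
    e.cond_br(a_bigger, "take_a", "take_b")
    e.label("take_a")
    e.br("done")
    e.label("take_b")
    e.br("done")
    e.label("done")
    # The phi picks a value based on which block control came from.
    result = e.phi("i32", [("%a", "take_a"), ("%b", "take_b")])
    e.ret("i32", result)
    e.line("}")
    return "\n".join(e.lines) + "\n"
```

Writing `emit_max()` to a `.ll` file and handing it to clang, as suggested above, is a quick way to check that the output parses.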

mrathi12 · on Dec 26, 2020

> learn to write little C programs and use clang to emit the IR.

Highly recommend this. I used this to get an understanding of how to implement the IR for my language.

mrathi12 · on Dec 26, 2020

The command to use is `clang -S -emit-llvm -O1 foo.c`

It'll write it out to a foo.ll file.

(I use -O1 so it cleans up a bit of the messy parts of the IR).

broken_symlink · on Dec 26, 2020

Can you print variables in lldb with the debug information?

One compiler that I use which emits LLVM IR added support for debug information recently, and it's now possible to set breakpoints in gdb, but you can't print out any stack variables or anything, so it's not useful other than figuring out which code paths execute.

I'd like to learn more about this. Maybe contribute to the compiler and fix this issue.

jmorse2 · on Dec 26, 2020

> I'd like to learn more about this. Maybe contribute to the compiler and fix this issue.

You need to create a call to the `llvm.dbg.declare` intrinsic that connects the stack variable alloca to a DILocalVariable metadata node, and place the call after the stack variable alloca. The rest of LLVM will handle tracing the stack location through the compiler to the output, including the alloca being promoted.
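For a compiler that emits IR as text, the pieces described above could be generated roughly like this. It is a sketch under assumptions: `declare_local` and its parameters are hypothetical, the metadata node numbers are made up, and the scope, file and type nodes (plus a `declare void @llvm.dbg.declare(metadata, metadata, metadata)` line) are assumed to exist elsewhere in the module. The exact textual syntax depends on the LLVM version: older releases want a typed pointer such as `i32*` where this defaults to `ptr`.

```python
def declare_local(name, ir_type, line, *, var, loc, scope, file, type_,
                  ptr_type="ptr"):
    """Return (instruction lines, metadata lines) for one stack variable.

    var/loc/scope/file/type_ are metadata node numbers chosen by the caller.
    """
    slot = f"%{name}.addr"
    body = [
        # The stack slot itself.
        f"  {slot} = alloca {ir_type}",
        # Placed right after the alloca: ties the slot to the
        # DILocalVariable node.
        f"  call void @llvm.dbg.declare(metadata {ptr_type} {slot}, "
        f"metadata !{var}, metadata !DIExpression()), !dbg !{loc}",
    ]
    metadata = [
        f'!{var} = !DILocalVariable(name: "{name}", scope: !{scope}, '
        f"file: !{file}, line: {line}, type: !{type_})",
        f"!{loc} = !DILocation(line: {line}, scope: !{scope})",
    ]
    return body, metadata
```

A call like `declare_local("x", "i32", 3, var=11, loc=12, scope=7, file=1, type_=10)` returns two instruction lines for the function body and two metadata lines to append at the end of the module.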

See: https://llvm.org/docs/SourceLevelDebugging.html#debugger-int...

vchuravy · on Dec 26, 2020

Julia has an interesting split here: it does the lowering into SSA form in pure Julia and then has a codegen step that translates the SSA form into LLVM IR, but for that second step we do use the C++ API. We have very robust bindings to the C-API, but it forever feels just a bit incomplete and less cared for. The C-API is very stable, whereas the C++ API does change quite a bit.

rurban · on Dec 26, 2020

But you cannot use the C-API for symbols/methods. You need a C++ callback for that.

mhh__ · on Dec 26, 2020

I would avoid any text, but LLVM has mature bindings in OCaml and Haskell, for example. The textual representation isn't stable IIRC, and it adds a step in between you and your already lumbering backend.

Ultimately the C++ API isn't too difficult to use but LLVM mandates a fairly hardcore level of C++ knowledge to play with its internals.

Quick Tip: if you're thinking "Holy shit how do I get from [complicated] all the way down to IR instructions", lower the big thing to something simpler so you can reuse the code to generate IR from that. For example, a foreach loop is expressible as a for loop under the hood, so now you only have to compile for loops. This would usually be done in the AST itself.
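A rough sketch of that kind of AST-level lowering. The node classes and `lower_foreach` are invented for illustration, not taken from any real compiler. It also cuts corners: the sequence expression is re-evaluated on every iteration, and nested loops would need a unique counter name each.

```python
from dataclasses import dataclass


@dataclass
class Var:
    name: str

@dataclass
class Num:
    value: int

@dataclass
class Len:
    seq: object

@dataclass
class Index:
    seq: object
    idx: object

@dataclass
class Less:
    lhs: object
    rhs: object

@dataclass
class Let:            # name = value
    name: str
    value: object

@dataclass
class Incr:           # name = name + 1
    name: str

@dataclass
class For:            # for (init; cond; step) { body }
    init: object
    cond: object
    step: object
    body: list

@dataclass
class ForEach:        # foreach item in seq { body }
    item: str
    seq: object
    body: list


def lower_foreach(node, counter="__i"):
    """Rewrite a ForEach as the equivalent counted For loop.

    foreach item in seq { body }
      becomes
    for (__i = 0; __i < len(seq); __i = __i + 1) { item = seq[__i]; body }
    """
    return For(
        init=Let(counter, Num(0)),
        cond=Less(Var(counter), Len(node.seq)),
        step=Incr(counter),
        body=[Let(node.item, Index(node.seq, Var(counter)))] + node.body,
    )
```

After this pass the code generator only needs a case for `For`; `ForEach` never reaches it.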

ritter2a · on Dec 26, 2020

Regarding interface stability: Indeed, the textual representation is not stable, things like added types in the representation of some instructions can happen when upgrading to a new version. However, to be entirely honest, in the last few years of updating LLVM-based research tools to newer LLVM versions, changes in the C++ API that required me to (sometimes just slightly) change my code happened a lot more often than changes in the textual representation...

chc4 · on Dec 26, 2020

I'm not an expert, but there are C bindings: I was able to play around with a toy compiler[1] in Lua using lualvm[2]

I also know of at least one compiler[3] that actually emits textual IR, and then builds and links .obj files from that with the LLVM toolchain...but I think that's just a bunch of work, would be hard to debug, and generally just a bad idea.

1: https://github.com/chc4/solar/blob/master/src/jit.lua

2: https://github.com/gilzoide/lualvm which I actually had to fork to https://github.com/chc4/lualvm for a small bugfix

3: https://github.com/FeepingCreature/fcc/blob/master/llvmfile.... by 'feepingcreature

mrathi12 · on Dec 26, 2020

Hey, author of the post here. Do I think the C++ API is important? For most languages no. The OCaml bindings in my case were almost sufficient, but I planned to do some memory fences and other operations in my language that the OCaml bindings didn't have.

In hindsight, it's probs better to choose OCaml bindings and then link in any special instructions you need from C++ if you need to.

mrathi12 · on Dec 26, 2020

Regarding this post in particular, I chose to document everything in terms of the C++ API as that's the native API. You can use any of the other bindings, and just translate the syntax across to your language.

exDM69 · on Dec 26, 2020

> Are the bindings for other languages usable?

Yes, I've done several compiler projects using Haskell and LLVM.

That said, not all the bindings were always maintained and up to date and together with LLVM's lack of API stability, there was a significant amount of churn work related to updating from one LLVM version to another. I had to build and install an older version of LLVM that would work with the bindings, several times.

Note: this was years ago, situation may have improved.

I understand that the LLVM API doesn't change significantly between versions any more, so the work required to update the bindings to a newer version shouldn't be huge for the maintainers of the bindings. But for an end user like me there were quite a lot of manual steps to get my project and the dependencies building.

There's always the option of emitting LLVM IR by writing text, but that doesn't give you the ability to do JITting and a REPL and so on.

cube2222 · on Dec 26, 2020

I'm currently on a compiler creation course with LLVM at university.

Generating it in text form is really simple. Doing just that with Rust. Actually, writing it by hand is too.

You can use llvm-as to convert text form to bitcode and then lli to interpret it (or use one of the other tools to compile it).
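To give an idea of how little text a complete module needs, here is a small sketch that writes a hand-written module to disk. The file name `answer.ll` and the `write_module` helper are arbitrary choices for the example.

```python
from pathlib import Path

# A complete module: one function, one block, two instructions.
MODULE = """\
define i32 @main() {
entry:
  %sum = add i32 40, 2
  ret i32 %sum
}
"""


def write_module(path="answer.ll"):
    """Write the hand-written IR to disk and return the path."""
    Path(path).write_text(MODULE)
    return path
```

From there, `llvm-as answer.ll` writes `answer.bc`, and `lli answer.bc` runs it; the process exit status should be whatever `main` returned.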

tijsvd · on Dec 26, 2020

I've had success with the llvmlite Python binding. It has a Python-native utility to help build the intermediate code, and then it internally emits text and uses the llvm C api for codegen.

Both the text format and the C API are allegedly more stable than the C++ API, so this may be a usable pattern in general. The text format is very well documented in my experience.

One downside to using text is an extra emit-and-parse-back step, but unless your code is huge, it's more than fast enough (and it falls away against optimization anyway).