Emacs Bytecode Internals (2014)

_ph_ · on Dec 27, 2016

Nice article, but I was surprised to read that array accesses are not range checked for speed concerns. The emacs byte code interpreter is not known to be a very fast one, I doubt that the range checks would make a significant difference in run time. If anything, byte codes should bring high levels of safety.
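To make concrete what such a range check amounts to, here is a toy stack interpreter sketched in Python. It is not Emacs's actual interpreter (which is written in C); the opcode names and the `checked` flag are invented for illustration only.

```python
def run(code, constants, checked=True):
    """Execute a list of (opcode, argument) pairs on a toy stack machine."""
    stack = []
    for op, arg in code:
        if op == "const":
            # The range check under discussion: one comparison per access.
            if checked and not 0 <= arg < len(constants):
                raise IndexError(f"constant index {arg} out of range")
            stack.append(constants[arg])
        elif op == "add":
            # The analogous check for the operand stack.
            if checked and len(stack) < 2:
                raise IndexError("stack underflow")
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "return":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise ValueError("code ended without a return")


good = [("const", 0), ("const", 1), ("add", None), ("return", None)]
bad = [("const", -1), ("return", None)]

print(run(good, [2, 3]))                # 5
print(run(bad, [2, 3], checked=False))  # 3: the bad index silently reads the last slot
```

With `checked=True` the second program raises `IndexError` instead. In Python the unchecked negative index merely wraps around; in a C interpreter the same omission would read memory outside the array.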

gus_massa · on Dec 27, 2016

I think this is a difference in point of view between the Java/C# world and the Lisp/Elisp/Scheme/Racket world.

In the Java/C# world, you are expected to get some libraries or shared code distributed as bytecode, without being able to see the source code. The bytecode must therefore ensure safety, so that an application can run untrusted code.

[I'm more familiar with Racket.] In the Lisp/Elisp/Scheme/Racket world you usually don't get shared code as bytecode, you get the source code. (An executable can actually be some bytecode packaged conveniently with the interpreter/JIT, but an executable can do whatever it likes, like formatting your hard disk.) So it's enough to ensure safety in the source code.

throwaway161220 · on Dec 27, 2016

There's a JIT branch in the official upstream emacs repository. Just check out nick.lloyd-bytecode-jit after cloning emacs from git and run `./configure --with-jit` once you have libjit (originally a part of GNU dotNET) to give it a try. libjit doesn't install a .pc file, so if you install libjit from source you'll have to explicitly set the LIBJIT lib and C flags for ./configure.

smindinvern · on Dec 28, 2016

If you want to try out that branch, do note that lisp functions are either explicitly JIT compiled with the `jit-compile' function or JIT can be enabled globally by setting `byte-code-jit-on' to non-nil.

In my own testing I've found that global JIT seems to not help very much, and may actually be slower because of repeated compilations. Selectively compiling specific functions can give a decent speedup, though.

Also, check out Burton Samograd's emacs-jit[1], which uses a very similar technique.

[1] https://github.com/burtonsamograd/emacs-jit/

throwaway161220 · on Dec 30, 2016

This branch being in the official repository, I took it to be favored. Do you know what's going on?

__s · on Dec 27, 2016

Python bytecode can segfault too: it doesn't detect stack underflow/overflow (and you can probably write to weird places with LOAD_FAST/STORE_FAST too).

abecedarius · on Dec 27, 2016

Yup, and you can create your own bytecode from within Python and immediately call it. Since the bytecode is imperfectly documented, I ran into some 'fun' problems debugging my Python-in-Python compiler.
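A minimal sketch of that, assuming CPython 3.8 or later (where code objects have a `replace()` method); the function names are made up for the example. It only swaps a constant, which is harmless. CPython does not verify code objects, so replacing `co_code` with malformed instructions is where the crashes mentioned above can come from.

```python
import types


def greet():
    return "from the compiler"


original = greet.__code__

# Swap one constant; the instruction bytes in co_code are left untouched.
patched_consts = tuple(
    "from hand-edited code" if c == "from the compiler" else c
    for c in original.co_consts
)
patched = original.replace(co_consts=patched_consts)

# Wrap the edited code object in a function and call it immediately.
edited = types.FunctionType(patched, globals())

print(greet())   # from the compiler
print(edited())  # from hand-edited code
```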

rurban · on Dec 27, 2016

The performance hit is typically 20%. It is significant.

khanan · on Dec 27, 2016

Article might need a 2014-tag.

dang · on Dec 27, 2016

Added. Thanks!

mmrezaie · on Dec 27, 2016

"People do not write byte-code; that job is left to the byte compiler. But we provide a disassembler to satisfy a cat-like curiosity."

How dare they! I will use that disassembler with my dog-like curiosity.

macintux · on Dec 27, 2016

As long as it doesn't involve dog-like comprehension[1]. Or butt-sniffing.

[1]: https://www.flickr.com/photos/sluggerotoole/153603564/

gravypod · on Dec 27, 2016

"Byte-code compilation is an underdocumented — and in the case of the recent lexical binding updates, undocumented — part of Emacs"

"People do not write byte-code; that job is left to the byte compiler. But we provide a disassembler to satisfy a cat-like curiosity."

If I were an emacs fan I'd be wary of this. What happens if the devs are hit by buses? What happens to emacs? No documentation means very little stability.

barrkel · on Dec 27, 2016

Bytecode is not a novel idea and I wouldn't expect it to be difficult at all to map the well-known concepts to the particulars of the Emacs implementation.

Documentation of source code is often overrated IMO. Most software isn't difficult to understand if it's at least somewhat well-structured - and when the structure is poor, documentation doesn't help much.

It's usually the application domain that's hard to understand, because that's where global invariants and assumptions live.

(Don't get me wrong: documentation of module boundaries is great, particularly if there are many users of the module, up until you get to APIs, where documentation is essential for a decent experience. Documentation of the innards of software, not so much.)

imglorp · on Dec 27, 2016

It was probably a novel application when it was first implemented in emacs 1985-ish (going by the bytecomp.el header comment by JWZ). First editor bytecode? First lisp bytecode? First interpreter bytecode?

O-code predates it by a couple decades for general purpose compilation: https://en.wikipedia.org/wiki/O-code

abecedarius · on Dec 27, 2016

Peter Deutsch published a paper on a compact Lisp bytecode in the 70s, http://www.softwarepreservation.org/projects/LISP/interlisp-.... IIRC the Smalltalk-80 bytecode was pretty similar, descended from Smalltalk-76.

kazinator · on Dec 28, 2016

The cell counts seem to be off by one in the Appendix examples (REVERSE and SUBST lambdas). I count 35 and 40 conses, respectively. :)

barrkel · on Dec 27, 2016

The context of my comment is of course in maintaining the source - the comment I was replying to was concerned with bus numbers.

By the time it was implemented for emacs, many, if not most CS students would have been familiar with p-code as used for UCSD Pascal etc.

db48x · on Dec 27, 2016

This page is actually pretty good high-level documentation for the system. Between it and the source code you'll have little difficulty.

Of course, if you have Emacs you'll presumably also have the manuals. Chapter 16 of the Elisp manual seems pretty thorough.

wangchow · on Dec 27, 2016

Source code is the ultimate documentation. ;)

gnuvince · on Dec 27, 2016

I've started doing literate programming these past two months, and I have to say that I believe this statement less and less.

ffggvv · on Dec 27, 2016

That's true for a lot of software. Do you use any SaaS? That's the worst.

gravypod · on Dec 27, 2016

The only SaaS I use personally are websites I visit. I host my own ownCloud, file storage, contact backups, and when I move into my new place I'll be setting up my own email server as well.

Edit: If you consider DNS as SaaS then I also use that, but I don't need it internally for my network to function, so I won't call it a dependency.

espadrine · on Dec 27, 2016

Statistically, death is not the danger. I can't think of a single case where a heavily relied-on project caused major issues from a handful of deaths.

On the other hand, developers losing will… The loss of Gmane, for instance, caused major link breakage for months.

GFK_of_xmaspast · on Dec 27, 2016

That was one of the arguing points in some of the recent emacs-devel drama about the object dumper.