Emacs Bytecode Internals (2014) (nullprogram.com)
120 points by noch on Dec 27, 2016 | 25 comments



Nice article, but I was surprised to read that array accesses are not range checked, for the sake of speed. The Emacs bytecode interpreter is not known to be a very fast one, so I doubt that the range checks would make a significant difference in run time. If anything, bytecode should bring a high level of safety.
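
To make the trade-off concrete, here is a toy stack machine in Python. It is not Emacs's actual interpreter; the opcodes `CONST`, `ADD`, `RET` and the `run` function are invented for illustration. It only shows where a range check on an operand would sit. Python lists are always bounds-checked, so the unchecked path misbehaves only mildly here (a negative index silently wraps around); in an interpreter written in C the same omission would read whatever memory happens to be there.

```python
# Hypothetical opcodes for a toy stack machine (not Emacs opcodes).
CONST, ADD, RET = 0, 1, 2


def run(code, constants, check_bounds=True):
    """Execute a list of (opcode, operand) pairs and return the result."""
    stack = []
    for op, arg in code:
        if op == CONST:
            # The operand is an index into the constants vector.
            # This is the range check the comment is talking about.
            if check_bounds and not 0 <= arg < len(constants):
                raise IndexError(f"constant index {arg} out of range")
            stack.append(constants[arg])
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == RET:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    raise ValueError("program ended without RET")


constants = [40, 2]
good = [(CONST, 0), (CONST, 1), (ADD, None), (RET, None)]
bad = [(CONST, -1), (RET, None)]  # operand outside 0..len(constants)-1

print(run(good, constants))                     # 42
print(run(bad, constants, check_bounds=False))  # 2, the index wrapped around
```

With `check_bounds=True` the `bad` program raises `IndexError` instead of quietly returning the wrong constant. The check costs one comparison per `CONST` instruction, which is the overhead being argued about.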


I think this is a difference in point of view between the Java/C# world and the Lisp/Elisp/Scheme/Racket world.

In the Java/C# world, you are expected to get some libraries or shared code distributed as bytecode, without being able to see the source code. So the bytecode must ensure safety, so that an application can run untrusted code.

[I'm more familiar with Racket.] In the Lisp/Elisp/Scheme/Racket world you usually don't get shared code as bytecode; you get the source code. (An executable can actually be some bytecode packaged conveniently with the interpreter/JIT, but an executable can do whatever it likes, like formatting your hard disk.) So it's enough to ensure safety in the source code.


There's a JIT branch in the official upstream Emacs repository. Just check out nick.lloyd-bytecode-jit after cloning Emacs from git and run `./configure --with-jit` once you have libjit (originally a part of DotGNU) to give it a try. libjit doesn't install a .pc file, so you'll have to explicitly set the LIBJIT library and C flags for ./configure if you install libjit from source.


If you want to try out that branch, do note that lisp functions are either explicitly JIT compiled with the `jit-compile' function or JIT can be enabled globally by setting `byte-code-jit-on' to non-nil.

In my own testing I've found that global JIT seems to not help very much, and may actually be slower because of repeated compilations. Selectively compiling specific functions can give a decent speedup, though.

Also, check out Burton Samograd's emacs-jit[1], which uses a very similar technique.

[1] https://github.com/burtonsamograd/emacs-jit/


This branch being in the official repository, I took it to be favored. Do you know what's going on?


Python bytecode can segfault too: the interpreter doesn't detect stack underflow/overflow (and you can probably write to weird places with LOAD_FAST/STORE_FAST too).
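
For anyone who wants to see what those opcodes look like, the standard `dis` module lists them. This is a small sketch assuming CPython; exact opcode names vary between versions (newer releases emit variants such as `LOAD_FAST_BORROW`), and `ident` is just an example function.

```python
import dis


def ident(x):
    return x


# Each instruction carries an opcode name and a numeric argument.
# For the LOAD_FAST family the argument is a slot index into the
# frame's local variables, which co_varnames lists in slot order.
instructions = list(dis.get_instructions(ident))
opnames = [ins.opname for ins in instructions]
local_names = ident.__code__.co_varnames

for ins in instructions:
    print(ins.opname, ins.arg, ins.argrepr)
```

The compiler only ever emits slot indexes that exist in `co_varnames`. The concern in the comment is about bytecode that did not come from the compiler, where nothing guarantees that.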


Yup, and you can create your own bytecode from within Python and immediately call it. Since the bytecode is imperfectly documented, I ran into some 'fun' problems debugging my Python-in-Python compiler.
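
A minimal sketch of that, assuming CPython 3.8 or later (where code objects have a `replace` method). It takes the safe route of swapping a constant rather than editing raw opcodes, since the opcode bytes themselves change between versions. `greet` and `patched` are illustrative names only.

```python
import types


def greet():
    return "hello"


code = greet.__code__

# Build a new constants tuple with one entry swapped out.
new_consts = tuple(
    "patched" if const == "hello" else const for const in code.co_consts
)

# code.replace() returns a copy of the code object with the given fields changed.
new_code = code.replace(co_consts=new_consts)

# Wrap the modified code object in a function and call it right away.
patched = types.FunctionType(new_code, globals(), "patched")

print(greet())    # hello
print(patched())  # patched
```

Passing `co_code=...` to `replace` works the same way and is where the crashes come from, because the new bytes are not checked against the stack size or the number of local slots.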


The performance hit is typically 20%. That is significant.


Article might need a 2014-tag.


Added. Thanks!


"People do not write byte-code; that job is left to the byte compiler. But we provide a disassembler to satisfy a cat-like curiosity."

How dare they! I will use that disassembler with my dog-like curiosity.


As long as it doesn't involve dog-like comprehension[1]. Or butt-sniffing.

[1]: https://www.flickr.com/photos/sluggerotoole/153603564/


"Byte-code compilation is an underdocumented — and in the case of the recent lexical binding updates, undocumented — part of Emacs"

"People do not write byte-code; that job is left to the byte compiler. But we provide a disassembler to satisfy a cat-like curiosity."

If I were an Emacs fan I'd be wary of this. What happens if the devs are hit by buses? What happens to Emacs? No documentation means very little stability.


Bytecode is not a novel idea and I wouldn't expect it to be difficult at all to map the well-known concepts to the particulars of the Emacs implementation.

Documentation of source code is often overrated IMO. Most software isn't difficult to understand if it's at least somewhat well-structured - and when the structure is poor, documentation doesn't help much.

It's usually the application domain that's hard to understand, because that's where global invariants and assumptions live.

(Don't get me wrong: documentation of module boundaries is great, particularly if there are many users of the module, up until you get to APIs, where documentation is essential for a decent experience. Documentation of the innards of software, not so much.)


It was probably a novel application when it was first implemented in Emacs around 1985 (going by the bytecomp.el header comment by JWZ). First editor bytecode? First Lisp bytecode? First interpreter bytecode?

O-code predates it by a couple decades for general purpose compilation: https://en.wikipedia.org/wiki/O-code


Peter Deutsch published a paper on a compact Lisp bytecode in the 70s, http://www.softwarepreservation.org/projects/LISP/interlisp-.... IIRC the Smalltalk-80 bytecode was pretty similar, descended from Smalltalk-76.


The cell counts seem to be off by one in the Appendix examples (REVERSE and SUBST lambdas). I count 35 and 40 conses, respectively. :)


The context of my comment is of course in maintaining the source - the comment I was replying to was concerned with bus numbers.

By the time it was implemented for Emacs, many, if not most, CS students would have been familiar with p-code as used for UCSD Pascal etc.


This page is actually pretty good high-level documentation for the system. Between it and the source code you'll have little difficulty.

Of course, if you have Emacs you'll presumably also have the manuals. Chapter 16 of the Elisp manual seems pretty thorough.


Source code is the ultimate documentation. ;)


I've started doing literate programming these past two months, and I have to say that I believe this statement less and less.


That's true for a lot of software. Do you use any SaaS? That's the worst.


The only SaaS I use personally are websites I visit. I host my own ownCloud, file storage, contact backups, and when I move into my new place I'll be setting up my own email server as well.

Edit: If you consider DNS as SaaS then I also use that, but I don't need it internally for my network to function, so I won't call it a dependency.


Statistically, death is not the danger. I can't think of a single case where a heavily relied-on project caused major issues from a handful of deaths.

On the other hand, developers losing will… The loss of Gmane, for instance, caused major link breakage for months.


That was one of the points of argument in some of the recent emacs-devel drama about the object dumper.



