Unexecute (emacshorrors.com)
195 points by ingve on July 28, 2016 | 116 comments



Saving initialized data structures into an executable was the traditional way to build large Lisp systems, and was a built-in capability of PDP-10 operating systems in the 1970s. When I was a student at Utah porting PSL (Portable Standard Lisp) to VAX Unix around 1981, we noticed that there was no such capability available. For a while our workaround was to dump a core file by sending SIGQUIT (^\) to the process, then start (resume) our system in a debugger. Spencer Thomas, who was also a student at Utah at the time, wrote the function he named "unexec()" to give us a more sensible path to the same functionality: exec() takes a file and turns it into a process, unexec() takes a process and turns it into a file. This code served our needs very nicely at the time, allowing us to load compiled Lisp code into a bare interpreter and save a complete system. Later, this code was incorporated into GNU Emacs for essentially the same purpose.

At the time, building these systems took several minutes, so it really wasn't feasible to expect users to just load everything they needed on startup. It is highly non-portable, of course, and has caused headaches for Lisp builders ever since. Amortizing startup time over a larger amount of work is still the only portable solution I know, along with keeping initialized application state in databases rather than in-memory data structures.


And really, this is just (a hackish implementation of) an image-based runtime, à la Smalltalk. All Elisp is missing is a big list of the globals it needs to care about saving and restoring (so it doesn't also save all the random memory garbage it happens to still be holding onto), a serialize()/deserialize() pair of functions to run those globals through that produce a standard on-disk representation, and a boot strategy that deserializes those structs back into memory.

If you want to be fancy, you can make the on-disk VM-image format a database (SQLite, LevelDB, whatever) so as to avoid writing it all out every time. Then it becomes cheap enough to write out a differential state that you can make the runtime do it automatically at intervals, after certain operations, manually with a sync(1)-equivalent call, etc.
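To make the non-fancy version concrete, here's a toy sketch in C rather than Elisp (all names made up, error handling kept minimal): a registry of the globals worth saving, plus save/restore functions that walk it. A real image format would also have to handle heap objects and the references between them, which this ignores.

    /* Toy sketch: an explicit registry of globals plus save/restore.
       Hypothetical names; a real image would also serialize heap objects. */
    #include <stdio.h>
    #include <string.h>

    static long scroll_margin = 2;            /* pretend these are the */
    static long fill_column   = 70;           /* interpreter's globals */

    struct global_entry { const char *name; long *slot; };

    static struct global_entry registry[] = {
        { "scroll-margin", &scroll_margin },
        { "fill-column",   &fill_column   },
    };

    static void save_image(const char *path)
    {
        FILE *f = fopen(path, "w");
        if (!f) return;
        for (size_t i = 0; i < sizeof registry / sizeof *registry; i++)
            fprintf(f, "%s %ld\n", registry[i].name, *registry[i].slot);
        fclose(f);
    }

    static void restore_image(const char *path)
    {
        FILE *f = fopen(path, "r");
        char name[64];
        long value;
        while (f && fscanf(f, "%63s %ld", name, &value) == 2)
            for (size_t i = 0; i < sizeof registry / sizeof *registry; i++)
                if (strcmp(name, registry[i].name) == 0)
                    *registry[i].slot = value;
        if (f) fclose(f);
    }

    int main(void)
    {
        restore_image("image.db");    /* boot: load whatever was saved */
        fill_column = 80;             /* mutate some state */
        save_image("image.db");       /* "dump" before exit */
        return 0;
    }

The appeal of unexec is that it skips the explicit registry and serialization entirely by dumping the raw heap, which is also exactly where its portability pain comes from.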


> If you want to be fancy, you can make the on-disk VM-image format a database (SQLite, LevelDB, whatever) so as to avoid writing it all out every time.

I took this approach in a game engine I developed at one point. Common Lisp has a very general meta-object protocol that allows you to do things like this transparently (see e.g. [1]). I believe I used Berkeley DB as the backing store, which supports in-memory caching of objects. With this approach, I didn't need an explicit save-file format; everything was just "there" on disk, automatically. As far as that was concerned, it was pretty cool.

Unfortunately, it was not fast. At one point, I did an experiment where I ripped out the DB and replaced it with an in-memory hash-map implementation. This was about 10x faster (despite the supposed in-memory caching at the DB layer). I got an additional similar speedup when I ripped out the meta-classes for the objects.

Turns out, these abstractions are expensive. Writing nice sequential code on compact in-memory data structures has substantial benefits (if you want performance).

[1]: https://common-lisp.net/project/elephant/


Was the DB cache a write-through cache or a write-back cache? Memory-canonical persistent databases (e.g. Redis) and disk-canonical persistent databases (e.g. SQLite) have very different persistence strategies. Only the memory-canonical type can really be used sensibly to persistently back (or, really, partially crash-restore) an OLTP process's "hot spots." Basically, you want the same characteristics for such persistence that you want for a logging engine—nonblocking behavior being first and foremost.

EDIT: "There are currently three different data stores that support the Elephant API: Berkeley DB, Postgresql via the postmodern library, and any database supported by the CLSQL library including SQLite3." — so, write-through, then.


I had a similar realisation recently. I had to learn Smalltalk (for my new job, believe it or not!), and Smalltalk really does strike me as image-based programming done right. My previous exposure was Common Lisp, where the image and the source code getting desynchronized was a recurring pain. After deleting a function but missing some of its uses, for example, the code might work fine until you reloaded the source into a clean image. In Smalltalk, that doesn't happen, because the image is the code.


That's because Lisp isn't image based at all. It merely has a live environment, much to the frustration of anybody who wants to dump the running state of their lisp system to disk.


Yeah, that's true. Nonetheless, Smalltalk really feels like an improved version of working with Lisp.


That's true. Although I still say you'll have my sexprs when you pry them from my cold, dead hands :).

Although, I wonder if I could hack together a lisp on top of the cog vm. That would be cool.

Anyways, what kind of awesome job do you have where you get to write st for a living? :)


The Lisp Machine keyboard had dedicated open and close parenthesis keys, so you could hold a hefty bag of nitrous oxide in your other hand while you typed s-expressions.


Wait, what are you doing with that nitrous oxide? Should I start worrying?


Reflecting on the S-expressions I just typed!


Well, when you're done with the nitrous oxide, just toss whatever you wrote into the obfuscated code contest.


There's a reason the paren keys had really fast auto-repeat!


I've been sort of thinking of fiddling with a Lisp on the cog vm for a while. That vm does a bunch of things I want (including the images people are talking about here).


All that and more was done in the Interlisp-D system.


I ported unexecute to GNU Make about ten years ago.

I was working on a project in which a "make" would load a huge tree full of rules scattered in sub-makefiles, and take a full 30 seconds to evaluate before kicking off the first incremental build, really putting a damper on the edit-compile-test incremental cycle.

I got sick of this and so took the unexecute code from GNU Emacs into GNU Make, and added an option to do "make --dump" to dump an image after loading the rules. The restarted make image would kick off a build recipe almost instantly.


If you're ever in such a situation again, Ninja (https://ninja-build.org/) was designed to have semantics similar to Make but load build files much faster (~seconds even for megabytes of build files).


FAKE [1], the F# build tool, uses a similar technique on the .NET VM.

[1] http://fsharp.github.io/FAKE/


Do you happen to have a link to the code in FAKE that implements this for .NET?


Do you have this patch somewhere? Sounds useful, even if only for ELF.


It's pretty sad to see someone trashing a process checkpoint and restart facility because they've never seen one before. As other commenters have added, checkpoint-and-restart is a pretty useful facility for a variety of reasons; it's a shame that Unix doesn't support it better.

I imagine if the author of this article looked at a Unix kernel they would start arguing that we should remove context switches because they don't understand them.


You can kind of simulate checkpoint/restart using memory-mapped files; I've seen some pretty large late-90s/early-2000s-era systems do this for their "in memory" databases.
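Roughly, a minimal (hypothetical) sketch of the idea, with most error handling omitted: keep the "in memory" state in a file-backed shared mapping, so a restarted process finds the previous run's state already on disk. Real systems also have to deal with internal pointers, which only survive if the region is mapped at a fixed address or stores offsets instead of pointers.

    /* Sketch: state lives in a file-backed mapping and survives restarts. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct app_state {                /* hypothetical application state */
        long counter;
        char note[64];
    };

    int main(void)
    {
        int fd = open("state.img", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, sizeof(struct app_state)) != 0)
            return 1;                 /* size the backing file */

        struct app_state *st = mmap(NULL, sizeof *st,
                                    PROT_READ | PROT_WRITE,
                                    MAP_SHARED, fd, 0);
        if (st == MAP_FAILED)
            return 1;

        st->counter++;                /* mutate the "checkpointed" state */
        snprintf(st->note, sizeof st->note, "run %ld", st->counter);
        printf("%s\n", st->note);

        msync(st, sizeof *st, MS_SYNC);   /* flush the checkpoint to disk */
        munmap(st, sizeof *st);
        close(fd);
        return 0;
    }

Each run picks up where the previous one left off, with no explicit save-file format.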


I have wondered before if emacs could use CRIU on Linux to do this ( https://criu.org/Main_Page ).


I've looked into CRIU. It's usable, but not as good as unexec, and not yet properly supported by packagers, even though only the build step would need it. And it doesn't produce a single binary, which is OK, I guess, for Emacs.

Easier, IMHO, is to keep maintaining a proper malloc implementation. glibc was never able to update to ptmalloc3 anyway, and now they want to destroy ptmalloc2 even further. I wouldn't trust them.


Great idea, but please show some respect. Emacs was first designed in 1976, when startup times were a very big deal.

"My .emacs is older... than most engineers..."


I fucking love the .emacs community.

To keep pushing a project from the 70's and keep it current is quite a feat.

Rebuilding from scratch is its own challenge, but to keep maintaining something at this scale and have such quotes be realized by hundreds of people is awe-inspiring.


Even though I'd rather use IDEs, Emacs was eventually my rescue in the '90s, after I failed to find anything on UNIX that could somehow resemble the Borland IDEs I enjoyed using.

Those were the days when fvwm was considered new.

So even today, if my IDEs aren't around, I get to use my old friend Emacs.


My short-lived experience with Java and Android made me wary of IDEs. It taught me an important lesson: if a language's IDE can autogenerate 90% of the code in "hello world" for you, without knowing anything about the program you are about to write, then that language has far too much boilerplate.

I've never loved IDEs, because they increase the amount of time between having an idea and starting to code it, whereas with emacs, I can just start writing code.

The only IDE that is any good is the Smalltalk IDE, because it's not so much an IDE in the Java sense as it is a realtime window into the soul of your application and environment. I mean, if you thought a modern Lisp's realtime interaction was good...

But yeah, even then, I start to miss emacs. And I'm not even a serious emacs user. Which is to say, my config can probably fit on only a few pages, and I don't know the 100+ set of basic keyboard shortcuts yet.


I used Emacs for around 10 years, so I do know what it is capable of.

Regarding Lisp, many in the FOSS camp who never experienced commercial Common Lisp IDEs should give them a try; the REPL is only the tip of the iceberg. Maybe Racket is the closest one can get without paying.

Back in the mid-90s I couldn't even get Emacs to do what Borland gave me with their tools, or even what I later learned Xerox environments were capable of.

Energize C++ with their custom Emacs was probably the closest thing that one could get, if the company could afford it.


>Regarding Lisp, many in the FOSS camp who never experienced commercial Common Lisp IDEs should give them a try; the REPL is only the tip of the iceberg.

So you've said. However, I am unsure whether any Lisp implementation outside of the Lisp Machine truly allowed for programming inside a live environment, such that there is no distinction between the live environment and the code on disk, where the image IS your environment.

Although being able to serialize your live environment to disk goes a long way. As a schemer, I'm still drooling over that particular feature.


The bigger Lisp IDEs have a different idea. They use the running image as an information system about all the sources it has seen, meaning everything it has loaded and/or compiled: location of source, arguments, who calls whom and from where, documentation snippets, ... When you load and compile, this data is continuously updated.

See: http://lispm.de/genera-concepts

LispWorks or Allegro CL are in that direction.

It's not as radical as the Interlisp-D system, which is in many ways similar to Smalltalk; in fact many ideas in Smalltalk come from Interlisp and the earlier BBN Lisp, including part of the runtime technology that enables dumping/restarting images.


Here is how it used to be back in the day on the Mac.

http://basalgangster.macgui.com/RetroMacComputing/The_Long_V...

Allegro Common Lisp now has an express edition:

http://franz.com/products/allegrocl/acl_ide.lhtml

http://franz.com/downloads.lhtml

Also for a bit of time travel, there are quite a few documents from Interlisp-D at Xerox PARC available:

https://archive.org/details/bitsavers_xerox?and[]=subject%3A...

On Xerox, images are called symbolic files.


'the lisp machine' did not do that. There were different attempts. The MIT Lisp Machine used Lisp code in files, with some tool support.

The Xerox Lisp Machine (different hardware, different Lisp, different OS) used a different approach, with fully managed source code in the image and managed files as a way to persist sources.


Ah, apologies. And yes, I know there were multiple lispms.


With Android it's not just the boilerplate code but the undocumented tooling involved as well. It might be too late for you now, but I broke some of it down into something more understandable a while ago:

http://flukus.github.io/2014/08/19/2014_08_19_Android-Take-b...


Well, yes, there is that. Great article, by the way. But the awful boilerplate had a lot to do with me dropping it. If I build an android app, it will be either PhoneGap/Cordova, React Native, or some other tooling.


In what way can't you start coding straight away in an IDE? I always start coding straight away in IDEA or Visual Studio.


shrugs

There's a different workflow. I always feel like I'm fighting the IDE, the instant I do anything that is even slightly off the beaten path. And they're complicated: everything about the IDEs that I've used seems to be an overcomplicated mess of settings, options, and menus, whereas in emacs, I could just put a single config option into my init.el to do what I want instead of jumping through all those hoops just to get to the setting. This is a problem compounded by the fact that Java, the only language I've used in an IDE, has a complicated development environment and multiple complex build systems, meaning you HAVE to fiddle with things to a fairly large extent.

To further things, it's rare for an IDE to be as programmable and configurable as emacs, or even close. It's considered above and beyond the ordinary to write extensions for most IDEs: the job of an extension is to add some big feature, like support for a new programming language. My first emacs "extension" was a simple function that wrapped around some actions I found myself constantly using in sequence, so I could type one command sequence instead of six. It wasn't something I wanted to dedicate a macro to, I just wanted a command to exist that would do what I wanted.

And then, 5 minutes later, I had one.


So I burnt out and tried sticking with a Scala class on Coursera. But to be honest, the Scala IDEs were the problem this time, not the tooling everyone jokes about.

I have run back to ENSIME and sbt-mode. I have an old ThinkPad, but with 8GB of memory. The class flipped between IntelliJ and Eclipse depending on platform. I find them exhausting. They consume a ton of resources (as IDEs go, Emacs and piped-in tools can be 8 megs, swap all damn day, and still cut through them in performance and speed).

There is just too much going on. Despite having nice auto-complete and error-checking defaults, they struggled to pick up Scala compiler bin paths, even at defaults, and there are just so many damn menus. What did Emacs, and later Helm, perfect? M-x and searching all the damn options. Even every OS on the desktop imitates this stuff now! I know IntelliJ, and to a lesser extent Eclipse, can do that, but they are so damn busy with GUI tiles and buttons and UI choices that I cannot be bothered to wade through the chunkiness of it all, even if it means I will have the potential for superior coding. I feel it is the lazy way out.

I am slowly eking my way toward intermediate Emacs use, and I keep forcing myself, because all the GUIs and lack of consistency cannot make up for the minimalist, trim-your-own-topiary, combined-tool bliss that is Emacs for me!


I'd give Intellij another try. It is heavy, but you gain a lot from having your IDE understand your static language. And it's really easy to set it up to look noise-free; I almost never touch the menu.

Double shift (to lookup every project symbol) and meta-shift-a (to lookup every IDE function, like M-x) gets me almost everywhere, aside from navigating along definitions/implementations/usage sites which is the reason to use an IDE. Toggle distraction free mode (via meta-shift-a) to get rid of everything beside the open file.

I think the typical way to set it up is to manage the Scala runtime itself (maybe via SBT? I don't even know off-hand), so having it pick up an existing installation is kind of non-standard. Can't remember when I've ever manually downloaded Scala.


I used it for a Java class. I like to build up instead of strip down. I will stick to Emacs, thanks.

Emacs can do those things; I just have to selectively decide what, how, and when. I also do not like depending on proprietary software. Yes, I know there is a free copy, but when they try cool new UI or feature X and I must suffer through it, I yearn for the consistency Emacs has had for three decades, for a reason. I have been doing Windows sysadmin for approaching a decade. W10 imposing changes I hate on me was the last straw: at home, open source or nothing. Haha.

Yes, re SBT. I tried both methods (system-level install of SBT and Scala with pacman and ~/.sbt or whatever install), it went so-so.


For what it's worth, IDEA Community Edition is free as in speech; Apache 2.0 license.


Embarrassing. I stand corrected, then!


Just a tiny little thing: Once you have SBT there's no need to actually install Scala. (Though, obviously, you may want to for other reasons if e.g. ENSIME requires a standalone Scala install. I wouldn't know, being a happy IDEA user :))


To be honest, I have to thank Google's work on Android Studio and Gradle for making me enjoy Eclipse again.

As for Scala I don't know how good the support really is, as I never managed to use anything besides Java for production code on the JVM.


Android Studio isn't based on Eclipse...


I didn't say it was; do I need to explain the satirical remark?


...And the light dawns.

I'm an idiot.


I started coding using IDEs some years ago, but after the quick Emacs tutorial, I never went back to GUIs for programming. I like the simplicity of working with text (a terminal) to create text (code). I feel like I can automate anything in the world in Emacs.


I am very visual in terms of computing, so I'd rather use IDEs, but I was a daily Emacs user between 1996 and 2005.


> "My .emacs is older... than most engineers..."

:-)

Does this quote have a source?


- asah

Until proven otherwise. >.<


Respect really isn't Schneidermann's thing.


This isn't specific to Emacs; TeX is another notable example. Generally, this was a popular technique to dramatically improve start-up time on DEC PDP-10 and -20 machines. There was support from the OS as well as the language runtimes to make it work (for instance, saving an image of a running program that had fds open still had to account for the fact that they wouldn't be open by the time the saved image was restarted).


I worked with this recently. This feature is not a nightmare just because the author doesn't understand what a linker does, or the necessary separation of old and new dynamic memory.

Rather, it's a stable and very useful feature, which only recently came under attack because glibc doesn't want to maintain malloc_get_state() / malloc_set_state() anymore. XEmacs has a portable dumper, pdump, which is a hack compared to Emacs's unexec.

See https://lwn.net/Articles/673724/ and esp. https://lwn.net/Articles/673815/

I recently re-added unexec support (i.e. native compilation) to perl5 in my cperl fork, but I haven't got it stable yet. Super trivial on Solaris, but not so easy on ELF, Darwin, and Windows, with their various compilers and the different ways they treat their segments. But it's still the easiest way to do it, compared to pdump, a separate compiler, or CRIU, which is still not in the kernel and not in Debian. They have been saying it's unstable for 2 years, when it's been stable for 1 year already.

Self-dump via CRIU is, besides unexec, the most stable variant, but it needs either a service or root permissions, and first of all a package, and then it's not so attractive because it produces many files instead of just one binary.

If glibc removes malloc_get_state(), even while Darwin still has a similar API, I'll happily build with a static ptmalloc3, which is the better variant of glibc's ptmalloc2 anyway; they were never able to update to it (much faster, but needs a bit more memory for housekeeping).

https://github.com/perl11/cperl/issues/176

https://github.com/perl11/cperl/commits/feature/gh176-unexec

https://criu.org/Main_Page


It's not really valid to critique a compiler for being too system dependent.

Being used to this layer of abstraction being hidden in ld(1) doesn't mean that reimplementation of it is wrong, just perhaps an ill-advised maintenance burden.

An appeal to ASLR is a bit fallacious - that technology was developed for C's deficiencies, including ISAs tailored for it. There are likely better ways than using an untyped language that begs attackers to forge object handles, and then kludging around that by making attackers guess.


Apparently someone looked into the FreeBSD temacs commentary on the status report on the HN landing page today ...

https://www.freebsd.org/news/status/report-2016-04-2016-06.h...

https://news.ycombinator.com/item?id=12178766

I love Lisps, but to an amateur with rudimentary infosec coursework this does scream scary.

I LOVE THIS SITE. Hello early weekend entertainment reading, emacshorrors.com ...


All Lisp systems basically work in a manner similar to this, by dumping their 'boot image' to disk so it can be loaded and worked with, as an interactive image. Even systems like Smalltalk do similar things. A setup like this is not particularly unusual in concept, although some of the exact specifics may be different than e.g. SBCL.

ASLR is also a weak defense by itself as I noted in that FreeBSD thread. Any ASLR-enabled application is, essentially, about 1 infoleak away from being no different than a non-ASLR application. If you want to stop exploits, invest in real mitigation tech, not cheap defenses, and suddenly you won't need to worry about this so much. (Feel free to add randomization on top of working defenses as an extra layer -- just not by itself.)

People always hem and haw over how this makes ASLR not work for XYZ, and thus is a 'security nightmare'. But then, very strangely, we still find that it's very possible to write all kinds of exploits that bypass usable ASLR anyway in a variety of applications, with only a single infoleak, coupled with a vulnerability, at only marginally higher work effort. ASLR does not actually eliminate a class of vulnerabilities, it only adds an extra step in the process of exploitation. It can only truly prevent a narrow class of exploits, under very specific constraints.

So, it does seem like there's a real security nightmare happening, but it's almost certainly not because random XYZ thing lacked ASLR at compile time. It's because we invest our 'faith' in stop-gap, mostly futile defense mechanisms that are obsoleted without much extra effort, normally.


One big thing that ASLR mitigates is non-interactive exploits. In a lot of applications, you can only send a payload once, and can't modify the payload after the fact (for example, vulnerabilities in image file processors). This is a common point of entry, and ASLR makes exploiting the underlying bugs much harder.

So I wouldn't call ASLR a weak defense - it closes off a lot of exploitation avenues by itself, and it can make exploiting interactive situations quite a bit harder. Finding that second infoleak bug isn't always quite so trivial.


I can tell from experience that the Linux implementation of ASLR, however, is completely worthless. Why? Because the executable you launch isn't itself randomized. The executable must be completely trivial for there not to be enough usable gadgets to defeat ASLR.

The Windows implementation is actually better in this regard, since executables are randomized as well as libraries. However, the randomization is the same for all processes and only changes at boot (because libraries on Windows usually use relocations rather than PIC, so the pages wouldn't be shareable if they were randomized per process), so an infoleak in one process can be used to attack another.


I do get that, as an amateur Lisper.

I have played with CCL and SBCL on and off for a while. Ironically, the AUR package I used Clozure Lisp with does not even exist anymore on Arch AUR repos.

https://aur.archlinux.org/packages/ccl-bin/?comments=all (That won't work)

I get the trade off, but if you read SBCL dev blogs, like PVK, you will recognize really competent programmers and elegant solutions.

https://pvk.ca/

The problem is these gods (I revere them) cannot possibly encode all the hacks and knowledge. They have to code, and the nuances of such "hacks" are internalized by them and forgotten. Ironically, a lot of shit-talk about Lisp transpires here, and it is one of the most sophisticated, developer-productive tools I have seen, with many laudable features other toolchains brag about; Lisps had them for years. This is not to be an elitist jackass; the cultural shift to obsolescence (sadly, my view) is why only diehards know it, while others avoid its implementation details and functionality: we don't use it, so we don't care.

Either way, dead code is scary, especially in such scenarios with binary image dumping and bit fiddling hacks. I was ironically aware of such issues with SBCL when I stumbled on Gentoo Hardened list info when people explained how Lisps have trouble in such constrained environments.

https://archives.gentoo.org/gentoo-hardened/message/d2fb14f1...

I worry later on such methods leave blobs of weak executable routines and other items of interest to get a foothold once the unsung heroes of now move on.

On the other hand, it is painfully obvious to me that higher-level languages, with or without memory control, allow more flexibility, depth, and alternatives to avoid buffer overflows and the other counterpoints made in other threads here. The devil is in details that fly over my head at the speed of light.

Life is a tradeoff. The fact that the world uses FreeBSD (WhatsApp) and ASLR is not fully implemented means, yeah, it is complicated.

It is scary to me, a guy who shits himself at macros and could not code a rudimentary one-pass compiler if you held a gun to my head.


Why are people so afraid of self-modifying code? It enables some cool hacks, it is no more and no less secure than non-self-modifying code (as long as it is properly contained), and, well, this is what makes computers and programming interesting rather than limited to boring table lookups and finite state machines.


> as long as it is properly contained

Why are people so afraid of C buffers? They enable great performance, and they are no more and no less secure than bounds-checked buffers (as long as the array indices are properly contained), and well, this is what makes computers and programming run fast rather than being forced into slow interpreted execution.



Self-modifying code is difficult to understand because, by definition, the code you are looking at might change without you observing the change. So the source code is no longer the "source of truth" we are all accustomed to, because another line might modify the one you're looking at. This problem is similar to heavy use of monkey patching, or OOP patterns where "everything happens somewhere else". I'm not arguing for or against self-modifying code, but it is important to note that there is a cost associated with it.


I usually pay homage to Einstein and call this "spooky action at a distance". Apparently I'm not the only one to have seen the connection:

https://en.wikipedia.org/wiki/Action_at_a_distance_%28comput...

"In computer science, action at a distance is an anti-pattern (a recognized common error) in which behavior in one part of a program varies wildly based on difficult or impossible to identify operations in another part of the program."


It wasn't even self-modifying the last time I looked at the unexec code. It was also fairly easy to port to a new architecture.


I have no opinion about who's right or wrong on this, but the article does give multiple reasons to dislike the code -- calling it an "unportable hack" that "few people... will be able to fix when faced with problems", stating that it doesn't seem to have much value, and suggesting two simpler ways to achieve the same desired effect.

In my reading, the author doesn't even complain that it's a security issue, just that it will likely fail on systems with certain security measures. And the author never says it's bad simply because it's self-modifying; rather, that the self-modification is a "platform-specific reimplementation of a linker".


An unportable hack that was ported to 11 operating systems.

A maintenance burden that is over 20 years old still working fine.


The emacs community might think it's "working fine". The glibc community-- which had to forego many bugfixes and optimizations for years, solely because of emacs-- might disagree.


Oh, cut me a break. The glibc "community" consists of a cabal of developers who won't fix what's broken for good reasons, break userland for bad reasons all the time, admit no wrong, and bring gaping security holes to entire platforms every couple of years. Ulrich Drepper was the tip of the iceberg. Nothing changed.

unexec is hairy, but it's mature, greybeard hair. It is no worse than any JIT, for starters.


The glibc community has to maintain an API first and improve the implementation second. They were not able to improve ptmalloc2 to ptmalloc3 over many years, because they were afraid of adding memory footprint for better performance. But they happily added more and more debugging hooks, which made malloc even slower, for easier development.

But now they want to deprecate malloc_{g,s}et_state(), without a plan to improve ptmalloc2? They already failed.

Deprecating an API for no good reason is failure, not an improvement. It not only breaks Emacs, it breaks other software too. unexec is used in perl5 as well, btw, just not in the official perl5 packages.


People fear what they don't understand.


When it comes to programs, and unless you're doing research, you should fear what you don't understand. If I have to maintain a program you put into production without understanding it, I will be more than happy to share the fear after I learn your address.


Then recuse yourself from maintaining it.


I love writing self-modifying code. I just hate everybody else's self-modifying code.


I guess the daily updates on the CVE database have something to do with it.


Self-modifying code does not play nice with the advance of CPU and compiler technologies in the past 20 years or so.


I don't get it either; the article sounded like the guy was in way over his head to me.


The fetch-decode-execute cycle is table lookup.

When you have table lookup, and the table is dynamically modified in a way that is itself influenced by the lookups, that is as "exciting" as self-modifying code.


> Why people are so afraid of self-modifying code?

Because it's several orders of magnitude more incomprehensible than code that fails a cyclomatic complexity check, is why.


People don't like "go to" either:-)


Code that modifies other code but not itself isn't self modifying code.


Self-modifying code is fine. Dumping the process memory in order to modify it is not. Even more so if the majority of the original program was written in a Lisp dialect.


Some previous discussion here: https://news.ycombinator.com/item?id=11001796


Supposedly this sort of thing is also why Microsoft Word documents were so hard to parse at first - they were just a memory dump of the process and not text at all.



From your link:

>Applies to: Office 2007 | Office 2010 | Open XML | Visual Studio Tools for Microsoft Office | Word | Word 2007 | Word 2010

I feel as though the gp comment is referring to far older versions, although without clarification, it's hard to be sure.


The older versions are also not literal dumps. They're binary "dumps" of the object tree in memory, yes, in the sense that you walk the tree and write it out. This is bad because your in-memory object tree then effectively defines the format, and it isn't specced otherwise, which makes portability that much harder, especially for a closed-source application where you can't see the code. But it's a very different problem.

FWIW, old Office documents were actually CFBF (Compound File Binary Format) files - think of it as FAT-in-a-file, allowing for multiple independent streams inside, with transactions. This was very commonly used on Windows in the OLE/COM era, because it was the underlying format for OLE Structured Storage. It's what allowed a Word document to embed another arbitrary document in an extensible way. The underlying data in the streams within CFBF was a loose object graph dump.

It all makes a lot of sense when you have your OLE glasses firmly on - it's basically a natural design that follows if your world consists of OLE objects and interactions between them. Look up IStorage and IStream to see what I mean.

The side effect of all this, however, is that the data inside an old Office file is not laid out in a logical way - streams consist of non-sequential interleaved blocks in a seemingly random order (depending on what was written when), some blocks may contain garbage data, and so on. So it's very difficult to reverse engineer, which is why it took so long back in the day, and the results were often unreliable.


> FWIW, old Office documents were actually CFBF (Compound File Binary Format) files

That's actually the "new" binary formats. The usage of CFBF seems to have been introduced in Office 4.2 (at least Excel 5.0 is the first Excel version to use them, it's hard to find information about the old Word document file formats).

> The side effect of all this, however, is that the data inside an old Office file is not laid out in a logical way - streams consist of non-sequential interleaved blocks in a seemingly random order (depending on what was written when), some blocks may contain garbage data, and so on. So it's very difficult to reverse engineer, which is why it took so long back in the day, and the results were often unreliable.

I don't believe the OLE compound file format has ever been much of an effort to reverse engineer. But the CFBF-based Office documents are also basically just blobs of the older binary formats saved in a more structured way. The issues with Office documents have always been a question of their sheer complexity combined with their tight coupling to the internals of the Office programs. This still shines through in the OOXML formats, which contain lots of stuff like "position something the way it was done in Word 5.0".


If you think of emacs as a text editor which happens to have a LISP interpreter, then this is silly. But if you think of it as a LISP interpreter which happens to have a text editor, then it makes a lot more sense.


Like they say, "Emacs is a great operating system, lacking only a decent editor".


Thanks to evil-mode it now has a decent editor too.


> But loading these files with it to “dump” the Emacs binary? The only time I’ve heard of that was when reading a discussion about creating “executables” with Common Lisp which is apparently achieved by serializing the program’s current state to disk.

The very first Lisp implementation did that already: it could dump and read memory images to/from tape. From then on, most Lisp implementations, not just Common Lisp, have done it. Some have extensive capabilities in this area (like tree-shaking, or generating shared libraries which can be included in programs).



Great follow up to this article - thanks!


> I hope at least Guile Emacs will try getting rid of it.

Guile Emacs is dead, isn't it? No activity in over a year: http://git.hcoop.net/?p=bpt/emacs.git


> Guile Emacs is dead, isn't it?

Guile Emacs is working today. That's a first after many, many years of talk and proof-of-concept-stage efforts.

I hope the development will pick up speed once guile 2.2 and emacs 25.1 are out. Both projects underwent some big changes/improvements lately.

One sign of life is that bpt's improvements to Guile Elisp were rebased on a recentish Guile master. This branch lives in the main repository now.

http://git.savannah.gnu.org/cgit/guile.git/log/?h=wip-elisp

Also, I have seen some recent commits from emacs developers to guile. So the two projects are talking to each other.

Now, IMHO, the success of Guile Emacs strongly depends on whether they manage to share the load across several developers. A more open development culture surely would help.


By Emacs standards, it's not dead, it's just sleeping. In fact, it thinks it'll go for a walk...


This article is more than a year old.

> 28/02/2015


I've spent a bit of time looking at this code. It's not really that bad, though it might be a bit of a surprise to find it if you weren't expecting it.

The most serious problem I had with it is that there's something wrong with the makefile dependency checking; for certain types of change you have to do a full rebuild. But I'm pretty certain autotools is scarier than any memory dump, so I just put up with this.


What's a good way for a program to trigger its own coredump? I had the issue on Linux and QNX embedded systems that I was unable to live-debug, and I wanted to be able to retrieve a coredump at various stages of execution. Unfortunately, all I found were CLI utilities that just called GDB, which wasn't available on my target.

I never felt confident enough to implement the feature directly :\


If you can change the code, you can simply fork() and then abort() in the child. If you attach that as a handler to SIGUSR1/2, you can have core dumps on demand.
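For what it's worth, here's a minimal sketch of that trick (hypothetical, and assuming core dumps are enabled, e.g. ulimit -c unlimited): the handler forks, the child immediately aborts, and the kernel writes a core of the copied address space while the parent keeps running.

    /* Sketch: on SIGUSR1, fork a child that aborts so the kernel dumps
       a core of the (copy-on-write) address space; the parent continues. */
    #include <signal.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static void dump_core_on_signal(int sig)
    {
        (void)sig;
        pid_t child = fork();
        if (child == 0)
            abort();                 /* child dies with SIGABRT -> core file */
        else if (child > 0)
            waitpid(child, NULL, 0); /* reap the child; parent keeps going */
    }

    int main(void)
    {
        signal(SIGUSR1, dump_core_on_signal);
        for (;;)
            pause();                 /* kill -USR1 <pid> to get a core on demand */
    }

fork(), waitpid(), and abort() are all nominally async-signal-safe, so calling them from the handler is reasonably OK.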


Nice! Thanks!


Admittedly I only glanced at this, but why can't the process image in question be dumped to a C string (or .S file if you really want) and then linked with the normal linker?


That's what needs to be done on Windows. The other dumpers can do better by just dumping the segments and loader commands, taking special care with base-address adjustments and restoring the old malloc'd heap structures so they don't get mixed up with the new heap. A normal linker will still need to reload the shared libs and then adjust the external symbols there.

Your solution needs several seconds, unexec just a few milliseconds.


Sorry, maybe I wasn't clear. I was suggesting doing all of this at build time, not runtime.

Oh wait does this all get serialized to disk every time you exit emacs? Is that what I'm missing? If so, why not run the linker at emacs exit time, to optimize the loading sequence?


unexec is only done once, at build time, to convert temacs, with a bunch of loaded libraries, into emacs, which already includes those libraries. So it doesn't need to search for them on disk and compile them.

It's never serialized. It's just dumped, like a core file but with proper headers, sections, and segments, so that it can be executed: a proper COFF/ELF binary. A core file has all the segments but lacks the headers.


Ok, so why can't it just be dumped to a .S file, so the regular toolchain can handle creating the headers, sections, and segments? .S files basically let you create arbitrary ELF images without making you implement the ELF format yourself. I'm really not getting what the custom linker/toolchain in unexec is buying you.
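To be concrete, I'm imagining something like this at dump time (hypothetical names, and a generated C array instead of a .S file for brevity), with the regular compiler and linker turning the bytes into an initialized data section of the next build:

    /* Sketch of the dump-time side: write the region to preserve out as a
       generated C file that the ordinary toolchain links into the binary. */
    #include <stdio.h>

    static void emit_c_dump(const unsigned char *region, size_t len, FILE *out)
    {
        fprintf(out, "const unsigned long preserved_heap_len = %zu;\n", len);
        fprintf(out, "const unsigned char preserved_heap[] = {");
        for (size_t i = 0; i < len; i++)
            fprintf(out, "%s0x%02x,", (i % 12 == 0) ? "\n  " : " ", region[i]);
        fprintf(out, "\n};\n");
    }

    int main(void)
    {
        unsigned char demo[4] = { 1, 2, 3, 4 };   /* stand-in for the heap */
        emit_c_dump(demo, sizeof demo, stdout);
        return 0;
    }

Though I suppose the catch, per the earlier reply, is that raw bytes aren't enough: the pointers inside the dumped heap still need base-address adjustments at startup, which is exactly the linker-ish work unexec is doing.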


Question: is unexec() more or less portable than CRIU? If CRIU is the more portable of the two (or if they're equivalent in both just basically working only on Linux), could unexec() be dropped in favor of just generating a CRIU process snapshot, and giving emacs a launch wrapper that restores from that snapshot?


Emacs unexec is mature (by decades), ported to 11 operating systems. CRIU is Linux-only, more or less experimental, and not included in mainstream stable distros. Does that answer your question?


This would be really useful to speed up Atom/Electron startup time.


[flagged]


I am a diehard Vim fan. In every IDE and editor I've had to use, the first thing I do is find and install a Vi plugin. I use Vi motions everywhere, system-wide. There are cons: I can't work on someone else's machine, and I've completely lost the ability to do even the simplest one-handed manipulations on my own keyboard. Also, I can't stand watching anyone try to open a file using a mouse or trackpad. It looks especially pathetic when someone does that during a tech talk, on a big screen. Five seconds to open a file, or move or resize a window. Totally pathetic. I love Vim. It makes me feel empowered. One day, though, I woke up with a shocking realization: Emacs is a better Vim than Vim.


> Emacs is a better Vim than Vim.

That's a big payload to drop as the last sentence of a post; surely it deserves some justification. Why do you feel that it is so?


You can't sell Vim to non-Vim muggles just by talking about how awesome Vim is. Vim's power needs to be discovered, learned, and appreciated by an individual on his/her own. The same can be said about Emacs. For a very long time I was totally convinced I wouldn't be able to find every single feature of Vim (that I love so much) in Emacs. I started using it anyway - I was curious. For about a year I had Vim, Emacs, and IntelliJ (with the IdeaVim plugin) all running together. Building my own Evil-driven config was difficult. And then I found Spacemacs. The next thing I remember, I pledged to stay in Emacs and work without switching back to IntelliJ and Vim, even if it took days to find the right solution. By the time JetBrains changed their licensing model and my IntelliJ license expired, I couldn't care less anymore. And then I got a new machine, and for the first time ever I broke my ritual of preparing a new machine: instead of installing Vim and setting up the right config, I started with Emacs. It's been six months. My Vim config is still pristinely default. And I don't even care.


Vim has its own share of horrors lurking in the codebase.



