Echo – Assembly program that prints the first positional argument to stdout

aiur3la · on Jan 2, 2017

Serious question: why is this on HN front page? Am I missing something?

pavlov · on Jan 2, 2017

I feel it's kind of interesting as a minimal Unix program that does something useful without linking to the C library, just with syscalls.

Even for echo, this one is extremely minimalist: first argument only, and a maximum of 255 characters.

gens · on Jan 2, 2017

I don't understand why it is limited to 255 chars. The kernel copies the string(s) into the program's memory, so it would be a kernel bug if the program got a non-null-terminated or too-long string.

More importantly this program has a bug in that it doesn't check if there is an argument passed to it at all.

Good effort, but it can improve a lot. I would praise the documentation, but it is rather imprecise. All in all, I wouldn't put it on the front page of HN yet.
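To make the two points above concrete, here is a minimal C sketch (not the author's assembly) that checks whether an argument exists before touching it and relies on the NUL terminator instead of a fixed 255-byte cap. The function name `echo_first_arg` is made up for illustration, and printing a bare newline when no argument is given is an assumption modelled on what the usual echo does.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write argv[1] (if present) and a newline to fd.
 * Returns 0 on success and 1 if a write fails. */
int echo_first_arg(int fd, int argc, char *const argv[])
{
    /* Only touch argv[1] when the caller actually passed an argument. */
    if (argc >= 2 && argv[1] != NULL) {
        /* The string is NUL-terminated, so no fixed length cap is needed. */
        size_t len = strlen(argv[1]);
        if (len > 0 && write(fd, argv[1], len) < 0)
            return 1;
    }
    return write(fd, "\n", 1) == 1 ? 0 : 1;
}
```

From `main` this would be called as `return echo_first_arg(STDOUT_FILENO, argc, argv);`.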

kelseyhightower · on Jan 2, 2017

This is great feedback, which I plan to use to improve the echo program. I'm just learning (on my own), and I figured I would just post my progress and I would get some feedback; it worked!

echo is far from finished, and it's safe to say "I don't know what the hell I'm doing", but hey, I gotta start somewhere.

unixbhaskar · on Jan 2, 2017

That's the spirit, Kelsey! Keep it up.

pavlov · on Jan 2, 2017

Maybe the 255-char limit is a feature? If this "fast echo" is meant to be used in a script that writes entries to a log where you wouldn't want long text anyway, or something like that... So having a known upper bound for the output size can be useful.

cyphar · on Jan 2, 2017

> I don't understand why it is limited to 255 chars. The kernel copies the string(s) into the program's memory, so it would be a kernel bug if the program got a non-null-terminated or too-long string.

But you can also pass arguments to execve(2) which are not null-terminated.

gens · on Jan 2, 2017

The kernel copies the strings you pass in the array of pointers. (I haven't checked, but it is better than the alternative of not copying and dealing with the mess.)

cyphar · on Jan 3, 2017

The memory mapping is the same before and after execve(2), so I don't think it needs to be copied. I'll take a look though.

gens · on Jan 4, 2017

Maybe only a few pages remain as programs don't inherit memory from their parents. It could be done for those strings but consider that mappings are in 4k pages (so the rest of the page would have to be cleared to 0).

Bladtman · on Jan 2, 2017

Serious answer: Because many developers regard assembly as some sort of deep magic only understood by elder gods. This, of course, comes from some vague (and not entirely correct) understanding of "assembly" running beneath everything else, and thus being fundamental, yet not immediately useful to a large category of developers today. Hence it seems important but archaic. Archaic + difficult = elder knowledge.

userbinator · on Jan 2, 2017

I've actually had a few coworkers think I'm some sort of elder god when I find the root cause of subtle bugs that would've either required deep knowledge of the C++ standard, or not-as-deep knowledge of Asm. These are bugs that others have spent many hours staring at the source and stepping through in a debugger without any better idea of why they occur, but are solved in minutes by a glance at the Asm. IMHO if you are working with native code at all, it's a very useful skill to have.

agumonkey · on Jan 2, 2017

Even though it was a bit of a "sufferance", I enjoy having come full circle somehow. I started with Java OOP in college, then went lisp maniac [1], then ml/FP, which were all somehow further away from the machine, in a way. But at the same time the lisp model seems a fairly thin layer over raw asm. And you realize that the primitives of computing (arithmetic, logic, iteration...) are very similar whatever the language or paradigm. I then learned a bit about continuations, non-determinism, and compilation, and now I'm almost free. A language is mostly an encoding. Most of them speak about the same things but in different clothing.

Not 100% free; I think I need to finish my compiler training and forth bootstrapping before I can claim that.

I can't really suggest others to follow the lisp, ml, prolog road though, so I'll just state what I wrote above.

[1] SICP especially, with its gradual pedagogy. From substitution, to environment, to register machines. You can see the relationships up close.

ycmbntrthrwaway · on Jan 2, 2017

Everyone who works with native code, not just C or C++, should at least understand how linkers and loaders work.

ycmbntrthrwaway · on Jan 2, 2017

Once you are tired of being praised, teach them some valgrind. It will solve most of their problems.

userbinator · on Jan 2, 2017

I'm pretty sure the bugs I found would not have been valgrind-able as they were unrelated to memory errors.

ycmbntrthrwaway · on Jan 2, 2017

From my experience, most of the hard-to-trace errors come from uninitialized variables, and they are usually valgrindable. It is VM-based, so it can catch jumps and other conditions that depend on uninitialized vars via taint analysis.

unixbhaskar · on Jan 2, 2017

Cool! Perfect explanation.

cshep · on Jan 2, 2017

Yes. From experience, many developers, namely newly-graduated college students from not-so-rigorous programs, have little idea of Assembly. The same applies with theoretical computer science (Turing Machines, FSMs, PDAs etc.), algorithmic analysis and fundamentals of computing hardware (flip-flops, half/full adders, basic CPU design).

activatedgeek · on Jan 2, 2017

I think this is a pretty interesting piece of code. Something trivial done via a non-trivial set of calls (something that we don't do every day).

ZeljkoS · on Jan 2, 2017

I agree, nothing special. For comparison, "colpinsky" draws a color-changing Sierpinski triangle in only 16 bytes, less than this echo :D

https://www.youtube.com/watch?v=Qw5WLk9IeX0

https://www.pouet.net/prod.php?which=62079

__michaelg · on Jan 2, 2017

It looks like Kelsey didn't write a lot of assembly before. There are quite a few things you either wouldn't do -- like `cld` for no reason -- or most people (and compilers) would do otherwise -- e.g., `xor ebx, ebx` instead of `mov ebx, 0`.

Besides, 32 bit (⊙＿⊙')

kelseyhightower · on Jan 2, 2017

Yes, this is my first assembly program. I had to look up every instruction and it took me hours to understand even the basics, but it was worth it. I have a much better understanding of x86 assembly and plan to write larger programs to continue learning in 2017.

I went with 32 bit because all the examples were 64 bit, so I was forced to learn the nasm and ld flags to get my program to compile, link, and run. I also learned a lot about the different registers available to 32 and 64 bit programs.

userbinator · on Jan 2, 2017

What caught my attention was the segment register use --- besides the fact that Linux runs processes in flat mode, the more common way to set es = ds is `push ds; pop es`.

That said, it does look better than compiler output and distinctly has the style of hand-written Asm; the 3 pops at the beginning, for example, would be something no compiler I've seen can do. (Minor "optimisation" --- rethinking your register use can eliminate some superfluous moves.)

jamesfisher · on Jan 2, 2017

The instruction `repne scasb`[1] stood out. `repne X` means "while (not equal) { X; }". How is `repne` implemented? Is `repne scasb` assembly shorthand for a `scasb` then a `jne`? Or is `repne` some fancy higher-order instruction which takes another instruction as its argument?

[1]: https://github.com/kelseyhightower/echo/blob/53d84ea4e79db3d...

Stratoscope · on Jan 2, 2017

The latter. The various REPxx prefixes cause a string instruction like SCASB to be repeated until some condition is satisfied.

These date back all the way to the 8086/8088. They were the fastest way to do string operations on those early CPUs, but I don't think this is the case on modern CPUs.

https://www.google.com/search?q=intel+rep+prefix

http://wiki.osdev.org/X86-64_Instruction_Encoding#REPNE.2FRE...

https://courses.engr.illinois.edu/ece390/archive/spr2002/boo...

tptacek · on Jan 2, 2017

They're still the x86-64 implementation for things like strlen on a bunch of platforms. They're not always the fastest, but have code size advantages.

http://agner.org/optimize/optimizing_assembly.pdf

Stratoscope · on Jan 2, 2017

That's a really interesting document - I am going to spend some time studying it, thanks!

Good point about the code size. I imagine there are likely to be cases where that would let some algorithm run faster overall because it fits in the instruction cache, even if the string operation considered on its own is slower.

exDM69 · on Jan 2, 2017

> They were the fastest way to do string operations on those early CPUs, but I don't think this is the case on modern CPUs.

Modern CPUs still have a fast path for string operations, and I recall hearing that they even had some improvements not too long ago (in Sandy Bridge or other recent arch).

CPUs may be smart enough to detect a memcpy done with a loop, but REPxx is the preferred way - even on modern CPUs.

pvitz · on Jan 2, 2017

I also remember them being called "software machine gun" by Duntemann in his assembly book. Using REP STOSW could save you 5 instructions.

userbinator · on Jan 2, 2017

> How is `repne` implemented?

It's a microcode loop. Here is the appropriate pseudocode (this is extracted from the Intel manuals):

http://qcd.phys.cmu.edu/QCDcluster/intel/vtune/reference/vc2...
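For readers who don't want to dig through the manual, a rough C model of what `repne scasb` does may help. This is a sketch, not the linked pseudocode: it assumes the direction flag is clear (so `edi` counts upward), models registers as plain variables, and the function names `repne_scasb` and `strlen_scasb` are invented for illustration. The second function shows the classic strlen idiom (`ecx = -1`, scan for 0, then `not ecx` / `dec ecx`); a smaller initial `ecx` would cap the scan length.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of `repne scasb` with DF = 0.
 * al: byte to search for; *edi: scan pointer; *ecx: remaining count.
 * Returns the final ZF: 1 if the last compared byte equalled al.
 * (On real hardware the flags are left untouched when ecx starts at 0;
 * this model simply returns 0 in that case.) */
int repne_scasb(uint8_t al, const uint8_t **edi, uint32_t *ecx)
{
    int zf = 0;
    while (*ecx != 0) {
        zf = (**edi == al);   /* scasb: compare al with the byte at [edi] */
        (*edi)++;             /* scasb: advance edi */
        (*ecx)--;             /* rep: decrement the counter */
        if (zf)               /* repne: stop once the bytes are equal */
            break;
    }
    return zf;
}

/* The strlen idiom built on top of the model above. */
size_t strlen_scasb(const char *s)
{
    const uint8_t *edi = (const uint8_t *)s;
    uint32_t ecx = 0xFFFFFFFFu;       /* effectively "no limit" */
    repne_scasb(0, &edi, &ecx);       /* scan for the NUL terminator */
    return (size_t)(~ecx - 1);        /* not ecx; dec ecx */
}
```

So the prefix behaves like a counted loop around a single string instruction, with the exit test on ZF built in.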

bdcravens · on Jan 2, 2017

I'm curious as to the why. Kubernetes doesn't keep Kelsey busy enough? :-)

burke · on Jan 2, 2017

Based on his twitter it looks like it was just an interesting thing to learn over the holidays or as a 2017 thing.

mistaken · on Jan 2, 2017

I guess it's because the echo command included in the shell was slow.

pm215 · on Jan 2, 2017

The cost of fork+exec of a separate binary will make even the most efficient possible external echo slower than the shell builtin, I suspect. (This is why echo is a builtin in the first place, though there's no requirement for it to be so.)

jkmcf · on Jan 2, 2017

Kelsey exists purely to make the rest of us feel lazy :)

whym · on Jan 2, 2017

I wonder if the couple dozen lines of assembly code could be trivial enough to be public domain. Assuming a straightforward implementation, surely there is far less freedom in expressing the simplest version of the echo program in ASM compared to, say, C?

userbinator · on Jan 2, 2017

> surely there is far less freedom in expressing the simplest version of the echo program in ASM compared to, say, C?

I'd say it's the opposite, since it is often the case that more instructions (and thus ways to select and arrange them) are required to express an operation in Asm compared to an HLL like C. This implies that there is room for more creativity when e.g. writing a "Hello world" in Asm vs. C.

exDM69 · on Jan 2, 2017

I'm guessing that the author wants to be sure not to get in trouble with their legal department.

My contract has a similar clause (all copyright assigned to employer) but it's void because my local (non-US) legislation overrides it. Not that I want to go head to head with our legal dept to test whether it holds.

wfunction · on Jan 2, 2017

Doesn't echo print all the arguments?

kelseyhightower · on Jan 2, 2017

It will as I continue to work on echo until it's "finished". This is the result of 4 hours of learning and writing my first assembly program.

eriknstr · on Jan 2, 2017

Indeed it should.

tlholaday · on Jan 2, 2017

echo returns nonzero if it cannot write ...

touch foo.txt; chmod 400 foo.txt; echo ouch > foo.txt; echo $?

... but it appears this asm always returns zero.
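A sketch of what checking the write could look like, in C rather than assembly. The helper name `write_all` is hypothetical; the idea is just to turn a failed or short `write(2)` into a nonzero exit status instead of ignoring the return value.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all len bytes to fd, retrying on short writes and EINTR.
 * Returns 0 on success and 1 on failure, suitable as an exit status. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;     /* interrupted before writing: try again */
            return 1;         /* real error, e.g. ENOSPC or EBADF */
        }
        buf += n;             /* skip past what was written */
        len -= (size_t)n;
    }
    return 0;
}
```

On Linux, redirecting to `/dev/full` is one way to exercise the failure path, since writes to it fail with ENOSPC.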

tomjakubowski · on Jan 2, 2017

How do "raw" system calls pass back error information on Linux? errno is strictly a C/POSIX abstraction, right?

dom0 · on Jan 2, 2017

They just return the negative error number.

E.g. -EINVAL.

JoshTriplett · on Jan 2, 2017

And for syscalls that also have a meaningful return value, the ABI requires that valid errno values fall in the range -1 to -4095, to disambiguate them from any possible return value. Those values can't conflict with valid userspace pointers (since they'd point into kernel space), and syscalls must not allow them to conflict with valid numeric return values.
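As a small illustration of that convention, here is a C sketch of how a caller of a raw syscall might classify the value that comes back in the return register. The helper names are made up; `MAX_ERRNO` is defined locally to match the 4095 bound mentioned above.

```c
#include <errno.h>

#define MAX_ERRNO 4095   /* local definition of the bound described above */

/* Nonzero if a raw syscall return value lies in the error range -4095..-1.
 * The unsigned comparison checks both ends of the range in one test. */
int raw_syscall_failed(long ret)
{
    return (unsigned long)ret >= (unsigned long)-MAX_ERRNO;
}

/* Positive errno value encoded in a raw return, or 0 if it was a success. */
int raw_syscall_errno(long ret)
{
    return raw_syscall_failed(ret) ? (int)-ret : 0;
}
```

For example, a raw return of `-EINVAL` decodes to `EINVAL`, while 0 or a large pointer-like value from something like mmap is treated as success.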

userbinator · on Jan 2, 2017

...which I think is a far superior system to errno. I commented on this before somewhat recently: https://news.ycombinator.com/item?id=13062421

In fact, it's puzzling to me why the errno mechanism was even conceived, as it seems to offer no advantages over returning errors directly (does anyone happen to know?)

andreiw · on Jan 2, 2017

Now rewrite in IR =))