I don't understand why it is limited to 255 chars. The kernel copies the string(s) into the program's memory, so it would be a kernel bug if the program got a non-null-terminated or too-long string.
More importantly this program has a bug in that it doesn't check if there is an argument passed to it at all.
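In C terms, the missing check amounts to never reading `argv[1]` unless `argc > 1`. A hypothetical sketch (the function name `echo_args` and the buffer-based interface are mine, not the author's):

```c
#include <string.h>

/* Hypothetical sketch of the missing check: join the argument vector
 * with spaces into `out`, handling the no-argument case gracefully.
 * argv[i] is only valid for i < argc (argv[argc] is NULL). */
static void echo_args(int argc, char **argv, char *out, size_t cap)
{
    out[0] = '\0';
    for (int i = 1; i < argc; i++) {
        strncat(out, argv[i], cap - strlen(out) - 1);
        if (i + 1 < argc)
            strncat(out, " ", cap - strlen(out) - 1);
    }
}
```

With no arguments the loop body never runs and the output is empty, instead of reading past the argument vector.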
Good effort, but it can improve a lot. I would praise the documentation, but it is rather imprecise. All in all, I wouldn't put it on the front page of HN yet.
This is great feedback, which I plan to use to improve the echo program. I'm just learning (on my own), and I figured I would just post my progress and I would get some feedback; it worked!
echo is far from finished, and it's safe to say "I don't know what the hell I'm doing", but hey, I gotta start somewhere.
Maybe the 255-char limit is a feature? If this "fast echo" is meant to be used in a script that writes entries to a log where you wouldn't want long text anyway, or something like that... So having a known upper bound for the output size can be useful.
> I don't understand why it is limited to 255 chars. The kernel copies the string(s) into the program's memory, so it would be a kernel bug if the program got a non-null-terminated or too-long string.
But you can also pass arguments to execve(2) which are not null-terminated.
The kernel copies the strings you pass in the array of pointers. (I haven't checked, but it's better than the alternative of not copying and dealing with the mess.)
Maybe only a few pages remain, since programs don't inherit memory from their parents. It could be done for those strings, but consider that mappings come in 4K pages (so the rest of each page would have to be cleared to 0).
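The contract being discussed can be sketched as follows: `execve(2)` takes NULL-terminated arrays of pointers to NUL-terminated strings, and the kernel copies the strings themselves onto the new program's stack. A minimal demonstration (assumes a Linux-like system with `/bin/echo` present; the wrapper `run_echo` is a name I made up):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of the execve(2) contract: argv and envp must each end with
 * a NULL pointer, and every string must be NUL-terminated. Forks so
 * the caller survives; returns the child's exit status. */
static int run_echo(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        char *argv[] = {"/bin/echo", "hello", "world", NULL}; /* NULL sentinel */
        char *envp[] = {NULL};
        execve("/bin/echo", argv, envp);
        _exit(127);          /* execve only returns on error */
    }
    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Dropping the NULL sentinel from `argv` would make the kernel walk off the end of the array, which is exactly the kind of mess copying the strings avoids.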
Serious answer: Because many developers regard assembly as some sort of deep magic only understood by elder gods.
This, of course, comes from some vague (and not entirely correct) understanding of "assembly" running beneath everything else, and thus being fundamental, yet not immediately useful to a large category of developers today. Hence it seems important but archaic. Archaic + difficult = elder knowledge.
I've actually had a few coworkers think I'm some sort of elder god when I find the root cause of subtle bugs that would've either required deep knowledge of the C++ standard, or not-as-deep knowledge of Asm. These are bugs that others have spent many hours staring at the source and stepping through in a debugger without any better idea of why they occur, but are solved in minutes by a glance at the Asm. IMHO if you are working with native code at all, it's a very useful skill to have.
Even though it was a bit of a "sufferance", I enjoy having come full circle somehow. Starting with Java OOP in college, then I went Lisp maniac [1], then ML/FP. Which were all somehow further away from the machine, in a way. But at the same time, the Lisp model seems a fairly thin layer over raw asm. And you realize that the primitives of computing: arithmetic, logic, iteration.. are very similar whatever the language or paradigm. I then learned a bit about continuations, non-determinism, compilation, and now I'm almost free. A language is mostly an encoding. Most of them speak about the same things but in different clothing.
Not 100% free, I think I need to finish my compiler training and Forth bootstrapping before I can claim that.
I can't really recommend that others follow the Lisp, ML, Prolog road though, so I'll just state what I wrote above.
[1] SICP especially, with its gradual pedagogy. From substitution, to environment, to register machines. You can see the relationships up close.
From my experience, most of the hard-to-trace errors come from uninitialized variables, and they are usually valgrindable. Valgrind is VM-based, so it can catch jumps and other conditions that depend on uninitialized vars via taint analysis.
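The class of bug meant here, as a hypothetical sketch (`branch_on` and `buggy_demo` are made-up names): the branch reads whatever garbage happens to sit in a freshly malloc'd block, so behaviour can differ between runs, which is why such bugs are hard to trace from source. Memcheck's taint tracking reports "Conditional jump or move depends on uninitialised value(s)" at exactly that branch.

```c
#include <stdlib.h>

/* Branch whose outcome depends on the pointed-to value. */
static int branch_on(const int *p)
{
    return *p ? 1 : 0;
}

/* BUG on purpose: the block is never initialised, so branch_on()
 * reads an indeterminate value. Running this under valgrind flags
 * the conditional jump inside branch_on(). */
static int buggy_demo(void)
{
    int *flag = malloc(sizeof *flag);   /* never initialised */
    int r = branch_on(flag);
    free(flag);
    return r;
}
```

Compiled normally the program "works" most of the time, which is the trap; under `valgrind` the report points at the offending line immediately.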
Yes. From experience, many developers, namely newly-graduated college students from not-so-rigorous programs, have little idea of assembly. The same applies to theoretical computer science (Turing machines, FSMs, PDAs etc.), algorithmic analysis, and the fundamentals of computing hardware (flip-flops, half/full adders, basic CPU design).
It looks like Kelsey didn't write a lot of assembly before. There are quite a few things you either wouldn't do -- like `cld` for no reason -- or most people (and compilers) would do otherwise -- e.g., `xor ebx, ebx` instead of `mov ebx, 0`.
Yes, this is my first assembly program. I had to look up every instruction and it took me hours to understand even the basics, but it was worth it. I have a much better understanding of x86 assembly and plan to write larger programs to continue learning in 2017.
I went with 32-bit; because all the examples were 64-bit, I was forced to learn the nasm and ld flags to get my program to compile, link, and run. I also learned a lot about the different registers available to 32- and 64-bit programs.
What caught my attention was the segment register use --- besides the fact that Linux runs processes in flat mode, the more common way to set es = ds is push ds; pop es.
That said, it does look better than compiler output and distinctly has the style of hand-written Asm; the 3 pops at the beginning, for example, would be something no compiler I've seen can do. (Minor "optimisation" --- rethinking your register use can eliminate some superfluous moves.)
The instruction `repne scasb`[1] stood out. `repne X` means "while (not equal) { X; }". How is `repne` implemented? Is `repne scasb` assembly shorthand for a `scasb` then a `jne`? Or is `repne` some fancy higher-order instruction which takes another instruction as its argument?
The latter. The various REPxx prefixes cause a string instruction like SCASB to be repeated until some condition is satisfied.
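As a rough C model of what the prefix does (a sketch of the architectural semantics, not of the actual microcode; the function names are mine): `repne scasb` compares AL against the byte at ES:DI, advancing DI and decrementing CX each step, and stops early when a byte matches (ZF set) or CX hits zero. With AL = 0 and CX = maximum, this is the classic hand-written strlen idiom.

```c
#include <stddef.h>

/* Rough model of REPNE SCASB with DF clear: scan up to `count` bytes
 * at `s` for `needle` (the byte in AL), stopping on the first match.
 * Returns the number of bytes consumed, including the matching byte. */
static size_t scan_not_equal(const unsigned char *s, unsigned char needle,
                             size_t count)
{
    size_t n = 0;
    while (count != 0) {
        count--;                 /* CX decrements each iteration */
        n++;                     /* DI advances by one byte */
        if (s[n - 1] == needle)
            break;               /* ZF set: REPNE stops */
    }
    return n;
}

/* strlen via the same idiom: scan for NUL, subtract the NUL itself. */
static size_t asm_style_strlen(const char *s)
{
    return scan_not_equal((const unsigned char *)s, 0, (size_t)-1) - 1;
}
```

So it is neither an assembler macro expanding to `scasb; jne` nor a general higher-order instruction: it's a one-byte prefix that only combines with the string instructions.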
These date back all the way to the 8086/8088. They were the fastest way to do string operations on those early CPUs, but I don't think this is the case on modern CPUs.
That's a really interesting document - I am going to spend some time studying it, thanks!
Good point about the code size. I imagine there are likely to be cases where that would let some algorithm run faster overall because it fits in the instruction cache, even if the string operation considered on its own is slower.
> They were the fastest way to do string operations on those early CPUs, but I don't think this is the case on modern CPUs.
Modern CPUs still have a fast path for string operations, and I recall hearing that they even got some improvements not too long ago (in Sandy Bridge or another recent arch).
CPUs may be smart enough to detect a memcpy done with a loop, but REPxx is the preferred way - even on modern CPUs.
The cost of fork+exec of a separate binary will make even the most efficient possible external echo slower than the shell builtin, I suspect. (This is why echo is a builtin in the first place, though there's no requirement for it to be so.)
The source code includes this notice: "Copyright 2017 Google Inc. All Rights Reserved."
I wonder if the couple of dozens of lines of assembly code could be trivial enough to be public domain. Assuming a straightforward implementation, surely there is far less freedom in expressing the simplest version of the echo program in ASM compared to, say, C?
> surely there is far less freedom in expressing the simplest version of the echo program in ASM compared to, say, C?
I'd say it's the opposite, since it is often the case that more instructions (and thus ways to select and arrange them) are required to express an operation in Asm compared to an HLL like C. This implies that there is room for more creativity when e.g. writing a "Hello world" in Asm vs. C.
I'm guessing that the author wants to be sure not to get in trouble with their legal department.
My contract has a similar clause (all copyright assigned to employer) but it's void because my local (non-US) legislation overrides it. Not that I want to go head to head with our legal dept to test whether it holds.
And for syscalls that also have a meaningful return value, the ABI requires that error returns fall in the range -4095 to -1, to disambiguate them from any possible valid return value. Those values can't conflict with valid userspace pointers (since they'd point into kernel space), and syscalls must not allow them to conflict with valid numeric return values.
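That range check is what libc-style wrappers rely on when turning a raw syscall return into the familiar `-1` plus `errno`. A sketch of the usual logic (the function name `syscall_result_to_errno` is mine):

```c
#include <errno.h>

/* Sketch of the glibc-style convention: a raw Linux syscall returns
 * -errno on failure. Since values in [-4095, -1] can never be valid
 * pointers or lengths, the wrapper disambiguates purely by range. */
static long syscall_result_to_errno(long raw)
{
    if (raw >= -4095 && raw <= -1) {
        errno = (int)-raw;   /* e.g. raw == -2 becomes errno == ENOENT */
        return -1;           /* caller sees -1 and consults errno */
    }
    return raw;              /* success: pass the result through */
}
```

So a `read(2)` that returns 4096 bytes and one that fails with `-EINTR` are distinguishable even though both are "just a long" at the syscall boundary.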
In fact, it's puzzling to me why the errno mechanism was even conceived, as it seems to offer no advantages over returning errors directly (does anyone happen to know?)