Many of the reviews on Amazon bemoan the choice of "High Level Assembly" in that book. Does this edition use HLA? How does it compare?
Any HNers who've read the second edition care to comment on it? The author responded in the Amazon reviews, and his defense seemed reasonable to me: many students won't become professional assembly language programmers, so why make them struggle more than necessary?
While you're at it, any alternative recommendations?
I'm still a novice at writing assembly (more interested in code gen and instruction set design than coding assembly by hand), but I've found these helpful:
* http://asm.sourceforge.net/resources.html
* http://www.intel.com/products/processor/manuals/
* http://news.ycombinator.com/item?id=1662430 (good resource, link to HN discussion)
Also, there's a tutorial focused on DOS/8086 that's really unique: it's very interactive, and it provides you with a fully functional assembly interpreter and 8086 simulator.
Lisp allows inline assembly -- but that doesn't make it a low-level language. C and Forth can do the same, yet no one would confuse them with assembly language either.
Are there any Assembly programmers here? What is still done in ASM? Is it mostly for writing device drivers and such?
It's been YEARS since I've worked in it. I used to use it for some practical things like controlling the mouse cursor in DOS and some routines for buffering graphics, stuff I don't need it for at all anymore.
I'd also love to know some practical things I could use assembly language programming for in today's world.
Assembly language is very important in the fields of reverse engineering and malware analysis.
In both fields the researchers rarely have the source code for the target software at hand, and so they usually have to disassemble the binary or run it under a debugger and step through the raw assembly language code to understand it.
Malware itself is often crafted in assembly language. And this is another reason why understanding assembly is essential for malware analysis.
Assembly is prominent in the demoscene, where squeezing every last bit of performance out of a system and doing so in a limited amount of space (nowadays often a self-imposed limitation, at least on a typical desktop) is important.
Assembly is also almost always the language used to write the bootloaders for operating systems. A number of operating systems have even been written entirely in assembly language.
It's also relatively popular on many embedded devices and microcontrollers, where you might not have the luxury of using a high level language, and where size and speed considerations are paramount.
Even in programs written in a high level language, critical sections are often rewritten in assembly language for the sake of getting maximum performance.
The importance of small and fast languages like assembly is likely to increase now that Moore's Law (arguably) no longer applies, and now that devices with small computers in them are starting to proliferate.
I'm not sure how popular assembly is in the modern demo scene on "normal" platforms (modern PCs, not retro platforms or embedded/mobile stuff). The modern demo scene is mostly about cool GPU tricks written in HLSL or GLSL, which have huge advantages for both performance and space (see rendering the world with 2 triangles - http://www.iquilezles.org/www/material/nvscene2008/nvscene20...). It would be cool if the demo scene started focusing more on pushing CPUs instead of GPUs, but that would require adoption of really exotic graphics techniques that for some reason run better on a multicore CPU than on a GPU, or a focus on something other than graphical effects, like simulation or audio processing. I agree that if you go back 10 or 15 years a lot of work was done in ASM, but GPUs have severely reduced that, if not killed it outright.
Game engines still use a little assembly, sometimes a lot on the PS3, but often it's written with compiler intrinsics in C or C++. It also tends to be low level SIMD math code for physics, particles, audio or animation, which is stuff your average game programmer never touches.
Professional audio apps use a lot of SIMD, but almost always via compiler intrinsics; it's rare that somebody will actually drop down into raw assembly. Thanks to mobile you can no longer afford to assume that x86 is the only instruction set that matters, so an extra shim via a library like iOS's Accelerate makes a lot more sense than raw asm.
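To make the "shim library" point concrete, here's a minimal sketch assuming Apple's Accelerate/vDSP API (the function name mix_add and the scenario are mine, not from any particular app): the same C call runs on x86 and ARM, with the library choosing SSE or NEON underneath.

    #include <Accelerate/Accelerate.h>  /* Apple platforms only */
    #include <stddef.h>

    /* Element-wise float add via vDSP; no per-architecture asm needed. */
    void mix_add(const float *a, const float *b, float *out, size_t n)
    {
        vDSP_vadd(a, 1, b, 1, out, 1, (vDSP_Length)n);  /* out[i] = a[i] + b[i] */
    }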
About 8 years ago I worked on porting NetBSD to a custom MIPS platform. Most of the kernel work I did was in C, but just two things had to be done in assembler:
1. Putting the CPU into 'idle' mode when no processing needed doing.
2. Putting the CPU+memory into 'standby' mode when running on batteries and it hadn't been used for a while.
This second feature was slightly challenging: the memory had to be put into standby mode first, then the CPU, but of course you couldn't access main memory once it had been put into standby, yet you still had to keep executing instructions to put the CPU into standby. The solution was to ensure that all the code required to put the CPU into standby mode (and, after wake-up, bring main memory back out of standby) was in the CPU cache, so it didn't need main memory to execute.
I've also recently used a tiny amount of assembler to implement computed gotos when using a C compiler that didn't natively support them.
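For reference, this is what a computed goto looks like where the compiler does support it (GCC/Clang labels-as-values); the point above is that the same indirect jump had to be hand-rolled in assembler when the extension was missing. The opcode names below are made up for illustration.

    /* Tiny bytecode dispatcher using GCC's labels-as-values extension.
       Hypothetical opcodes: 0 = halt, 1 = increment, 2 = decrement. */
    int dispatch(const unsigned char *ops)
    {
        void *table[] = { &&op_halt, &&op_inc, &&op_dec };
        int acc = 0;

        goto *table[*ops];                       /* the "computed goto" */
    op_inc:  acc++; ops++; goto *table[*ops];
    op_dec:  acc--; ops++; goto *table[*ops];
    op_halt: return acc;
    }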
I'm fairly sure that people who require insane speed in tight loops also code in assembler, but I've never needed that myself.
> The solution was to ensure that all the code...was in the CPU cache,
Can you give some details for the idly curious (i.e. me)?
My naive implementation would be something like:
        ; do stuff
        jmp A
    B:  STANDBY
        ; bring memory back up
        jmp C
    A:  ; shutdown memory
        jmp B
    C:  ; continue as normal
Also, I'm surprised a processor with standby capabilities doesn't do all this in hardware rather than relying on the programmer to write an implementation. You call STANDBY, and the processor itself shuts down memory and then shuts itself down. Comments?
Putting the memory and CPU into standby was done by setting/clearing various bits in various (memory-mapped) control registers. The code to do this was in a loop that executed twice, with a CPU register set to zero the first time round and to 0xffffffff the second time round, to use as a mask.
All control register changes were performed inside the loop, masked by that register, so no changes were actually made the first time around the loop; it just loaded all the code into the CPU cache. The second time around, the changes took effect and the memory and CPU were put to sleep.
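A rough C rendering of that two-pass trick might look like the following. This is my sketch, not the original code (which was MIPS assembly, where you can actually arrange for the loop body to sit in the I-cache); the register addresses and bit names are made up.

    #include <stdint.h>

    #define MEM_STANDBY_BIT 0x1u                  /* hypothetical */
    #define CPU_STANDBY_BIT 0x1u                  /* hypothetical */
    static volatile uint32_t * const MEM_CTRL = (volatile uint32_t *)0xB0000000;  /* hypothetical */
    static volatile uint32_t * const CPU_CTRL = (volatile uint32_t *)0xB0000010;  /* hypothetical */

    void enter_standby(void)
    {
        uint32_t mask = 0;                 /* 0 on pass 1, 0xffffffff on pass 2 */
        for (int pass = 0; pass < 2; pass++) {
            /* Pass 1: mask == 0, so none of these writes change anything,
               but the whole loop body gets pulled into the cache.
               Pass 2: the writes take effect and the CPU sleeps. */
            *MEM_CTRL |=  (MEM_STANDBY_BIT & mask);   /* memory into standby    */
            *CPU_CTRL |=  (CPU_STANDBY_BIT & mask);   /* CPU into standby here  */
            /* ...execution resumes here after wake-up... */
            *MEM_CTRL &= ~(MEM_STANDBY_BIT & mask);   /* memory out of standby  */
            mask = 0xffffffffu;
        }
    }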
The MIPS CPU was quite nice, because when it woke up after this all state was preserved, so it just carried on as if nothing had happened. And memory contents were preserved through standby as well, so it was all fairly simple.
This is in contrast to some ARM CPUs I've used where when they wake up after sleep, they jump to the reset address, so the boot code has to be aware of sleep/wake operations, which was a bit of a pain.
It can be done in software, which removes the need to implement it in hardware. Most probably nothing is preserved across standby inside the CPU, and when it comes back online it needs to be set up from scratch.
Intel, ARM, and other chip designers dedicate huge chunks of their chip to vector SIMD processors for a reason. SSE, Altivec, and NEON assembly implementations of DSP functions can easily be up to 40 times faster than a naive scalar version.
And even if you can't take advantage of SIMD, a competent programmer can almost always beat the compiler on small critical functions and loops. Doubly so on architectures where compilers are typically terrible, and instructions are complex and individually very powerful (ARM). Triply so in cases where the compiler simply won't "do the right thing" no matter how much you nudge it, such as aliasing and unpredictable branches.
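On the aliasing point, a small illustration of my own (not the commenter's code): without some promise that the pointers don't overlap, the compiler often won't vectorize the loop at all, which is exactly the kind of case where people drop to intrinsics or asm.

    /* With 'restrict' the compiler may vectorize this loop; without it, it
       must assume dst and src might overlap and will usually emit scalar code. */
    void scale(float *restrict dst, const float *restrict src, float k, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = k * src[i];
    }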
Vector extensions are terrible: they're too high-level an abstraction because they don't expose individual instructions, and SIMD instructions are "weird" enough that you can't get reasonable performance by expecting the compiler to magically do the right thing. There's no gcc vector equivalent of psadbw, for example, and I doubt the compiler is even capable of generating it.
Intrinsics are passable, but still have two core problems. First of all, the compiler is atrocious at register allocation: it will often spill far more than necessary, and even when it doesn't, it will usually spill the wrong things. Secondly, they're way harder to write than straight assembly, far less flexible, and are nigh-unreadable, so why bother?
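For concreteness, this is roughly what the intrinsics route looks like for psadbw: a minimal SSE2 sketch of mine using _mm_sad_epu8, which maps to PSADBW (not code from either commenter).

    #include <emmintrin.h>   /* SSE2 */
    #include <stdint.h>

    /* Sum of absolute differences of two 16-byte blocks. PSADBW produces two
       64-bit partial sums (one per 8-byte half), which are added at the end. */
    uint32_t sad16(const uint8_t *a, const uint8_t *b)
    {
        __m128i va  = _mm_loadu_si128((const __m128i *)a);
        __m128i vb  = _mm_loadu_si128((const __m128i *)b);
        __m128i sad = _mm_sad_epu8(va, vb);
        return (uint32_t)_mm_cvtsi128_si32(sad)
             + (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(sad, 8));
    }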
I'm not sure you looked at what I posted. __builtin_ia32_psadbw is right there on the list of builtins. I've used __builtin_ia32_psadbw128 in GCC myself. It compiles directly to PSADBW instructions. Perhaps you confused what I was talking about with GCC's auto-vectorization?
edit: Just realized that you're the x264 guy and it's unlikely you misunderstood me. Still I think my point about psadbw stands.
> __builtin_ia32_psadbw is right there on the list of builtins. I've used __builtin_ia32_psadbw128 in GCC myself.
Those aren't gcc vectors, those are intrinsics. Vectors use something like this:
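(A minimal illustration of GCC's vector_size extension:)

    /* GCC vector extension: declare a vector type and use ordinary operators;
       the compiler chooses the SIMD instructions. */
    typedef int v4si __attribute__((vector_size(16)));   /* four 32-bit ints */

    v4si add4(v4si a, v4si b)
    {
        return a + b;    /* compiles to a single paddd with SSE2 */
    }

You only get the operators the front end exposes, which is what the "no vector equivalent of psadbw" complaint above is about.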
Taking full advantage of many of the specialized instructions (SSE) in an Intel chip requires assembly language.
One of my recent hobby projects rearranges the bits in a large set of data using the SSE instructions, and it's an order of magnitude faster than doing it in C.
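Not the parent's project, but one classic example of the kind of bit rearrangement SSE makes cheap is PMOVMSKB, which gathers the top bit of each of 16 bytes into a 16-bit mask in a single instruction. A sketch via intrinsics:

    #include <emmintrin.h>
    #include <stdint.h>

    /* Split 16 bytes into 8 bit-planes: planes[b] holds bit b of every byte.
       Each pass grabs the current MSBs with PMOVMSKB, then shifts left by one. */
    void bytes_to_bitplanes(const uint8_t in[16], uint16_t planes[8])
    {
        __m128i v = _mm_loadu_si128((const __m128i *)in);
        for (int bit = 7; bit >= 0; bit--) {
            planes[bit] = (uint16_t)_mm_movemask_epi8(v);
            v = _mm_slli_epi16(v, 1);   /* move the next-lower bit up to the MSB */
        }
    }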
"Assembly language" is a bit of a misnomer. What we're talking about is writing hardware instructions and laying out bytes in memory directly. Learning the syntax takes about 5 minutes.
- Low-level processor setup: MMU, TLB, caching, etc.
- Early stage boot code.
- Using instructions that are not normally accessible in C: for example, SIMD instructions, count leading zeroes, some bitwise operations (see the sketch after this list).
- Interrupt handlers, interrupt masking.
- Entry to and exit from low-power modes.
- Entry to and exit from hypervisor and other virtualized modes.
- Sequences to turn the MMU on and off.
- System calls.
- Locks for mutual exclusion, critical sections.
- Instruction level optimization in some algorithms.
- Anything that requires stack control or setup (packing arguments, green threads, etc).
- Machine code generation in compilers.
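A small sketch of the "instructions not normally accessible in C" item above, using count leading zeroes as the example. The builtin is real GCC/Clang; the inline-asm variant assumes an ARM target and GCC-style asm syntax, and is mine, not from the comment.

    /* Count leading zeroes without writing a loop. */
    unsigned clz_builtin(unsigned x)
    {
        return x ? (unsigned)__builtin_clz(x) : 32;   /* builtin is undefined for 0 */
    }

    #if defined(__arm__)
    unsigned clz_asm(unsigned x)
    {
        unsigned n;
        __asm__("clz %0, %1" : "=r"(n) : "r"(x));     /* single CLZ instruction */
        return n;
    }
    #endif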
Some people prefer inline assembly, and a lot can be achieved with C macros plus inline assembly. Personally I prefer naked assembly functions in a .s file; it's more readable and requires fewer tricks.
I rarely see assembly being used for performance. Most of the time the use of assembly is limited to hardware-level interaction that isn't normally possible from C.
Even though writing assembly is not that common, the need to read it is:
- Debugging on-target code with hardware debugger without having symbols and source available.
If you work in the embedded space, there are often sections of code in asm: boot, stack manipulation, OS hosting, etc.
Even in fairly high level application code, I've read disassembly to locate and work around compiler bugs as recently as a year ago. How else do you know the compiler is doing the right thing?
Re: my HLA question above, here's the Amazon review I mentioned: http://www.amazon.com/review/R3C06U180STE19/ref=cm_cr_rdp_pe...
Edited to add: I answered my own question; it looks like this edition does use HLA. http://homepage.mac.com/randyhyde/webster.cs.ucr.edu/www.art...