
Heh... That takes me back.

I remember a puzzle in an old demoscene document: Assuming EAX contains all zeros except for a byte value in AL, what is the smallest number of instructions needed to copy AL's value to the other three bytes in EAX?

The puzzle was hard not because the task was particularly difficult but because the audience had spent so long optimizing assembly for speed that the hopelessly inefficient one-instruction answer would not occur to them.

(Answer: http://home.sch.bme.hu/~ervin/codegems.html#17)
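For anyone who doesn't want to follow the link: as far as I can tell the one-instruction answer is a multiply by 0x01010101 (treat that as my reading of it, not a quote from the page). A rough C sketch of the same idea, where broadcast_byte is just a name I made up for illustration:

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the linked page.
 * Assumes only the low byte of eax is non-zero, as in the puzzle.
 * Multiplying by 0x01010101 adds the byte shifted left by 0, 8, 16
 * and 24 bits, so it lands in all four byte positions. */
static uint32_t broadcast_byte(uint32_t eax)
{
    return eax * UINT32_C(0x01010101);
}
```

With that assumption, broadcast_byte(0xAB) comes out as 0xABABABAB. If the upper bytes are not already zero the partial products overlap and the result is garbage, which is why the puzzle states the precondition.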




According to the Intel IA-32 Optimization Reference Manual, integer multiplication has a 3 cycle latency. What's "hopelessly inefficient" about 3 cycles?

( Source: http://download.intel.com/design/processor/manuals/248966.pd... )


On earlier-model processors, it could be as bad as 40 cycles. You really didn't want to touch multiplication (or division!) back then. I recall programmers went as far as calculating memory offsets for a 320x200 screen as (x + (y << 8) + (y << 6)) instead of (x + y * 320).

http://home.comcast.net/~fbui/intel_i.html#imul
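The shift version works because 320 = 256 + 64, so y * 320 = (y << 8) + (y << 6). A small C sketch of the two forms side by side, assuming a linear 320x200 framebuffer with one byte per pixel (the function names are made up for illustration):

```c
#include <stdint.h>

/* Hypothetical helpers for a 320-pixel-wide linear framebuffer. */

/* Straightforward version: one multiply per pixel address. */
static uint32_t offset_mul(uint32_t x, uint32_t y)
{
    return x + y * 320u;
}

/* Multiply-free version: 320 = 256 + 64 = (1 << 8) + (1 << 6). */
static uint32_t offset_shift(uint32_t x, uint32_t y)
{
    return x + (y << 8) + (y << 6);
}
```

Both return the same offset for every on-screen coordinate. These days a compiler will usually pick whichever form is cheaper for the target on its own, so writing the shifts by hand is mostly of historical interest.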



