
The inner loop of those comparisons is indeed the spot where you can still gain speed, as noted in the last part of the post. The kind of optimizations you describe are extremely effective, but they qualify as 'micro-optimizations', and I expressly left them out because they hurt readability considerably. But you're right: if that's what it takes, then so be it, and readability would have to suffer in deference to the last couple of % of speed. By my estimate, the maximum gain from this optimization is about 20% of the final runtime. (The inner loop would step 8 bytes at a time, but would need more instructions per iteration.)
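For anyone wondering what stepping 8 bytes at a time looks like, here is a rough sketch. It assumes the inner loop scans a buffer for a single byte value; the name find_byte_swar and its signature are made up for illustration and are not the code from the post. It uses the well-known "does this word contain a zero byte" bit trick and falls back to a plain byte loop to pin down the exact position and to handle the tail.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the post): returns the index of the
 * first byte equal to c in buf[0..n), or n if there is none. */
static size_t find_byte_swar(const unsigned char *buf, size_t n, unsigned char c)
{
    const uint64_t ones    = 0x0101010101010101ULL;
    const uint64_t highs   = 0x8080808080808080ULL;
    const uint64_t pattern = ones * c;      /* c repeated in all 8 bytes */
    size_t i = 0;

    /* Word-at-a-time loop: skip 8-byte words that cannot contain c. */
    while (i + 8 <= n) {
        uint64_t w;
        memcpy(&w, buf + i, 8);             /* avoids unaligned-access UB */
        uint64_t x = w ^ pattern;           /* bytes equal to c become 0x00 */
        if ((x - ones) & ~x & highs)        /* nonzero iff x has a zero byte */
            break;
        i += 8;
    }

    /* Byte loop: locates the match inside the flagged word, or scans
     * the remaining tail of fewer than 8 bytes. */
    while (i < n && buf[i] != c)
        i++;
    return i;
}
```

This is exactly the readability cost in question: the simple version is just the final two-line byte loop.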



I concur that doing this directly in your code is extremely ugly; but note that you can get this speedup by just dropping in a call to glibc's memchr(), hiding the ugliness behind a well-known interface.
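As a minimal sketch of the drop-in, assuming the loop in question is looking for one particular byte (count_byte is a hypothetical helper, not something from the post), the hand-written scan collapses to a standard memchr() call. Whether that call is vectorized or word-at-a-time underneath is up to the libc.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts occurrences of c in buf[0..n),
 * leaving the fast scanning to the C library's memchr(). */
static size_t count_byte(const char *buf, size_t n, char c)
{
    size_t count = 0;
    const char *p = buf;
    const char *end = buf + n;

    /* memchr returns a pointer to the first match, or NULL if none. */
    while (p < end && (p = memchr(p, c, (size_t)(end - p))) != NULL) {
        count++;
        p++;                                /* resume just past the match */
    }
    return count;
}
```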


Good point about memchr, I missed an opportunity there.

edit: because it's clearer and faster, that's a no-brainer. I'm updating the blog post with a remark to that effect and a link to the parent comment.



