I think you hit the nail on the head. Picking the right data structure matters far more than throwing more threads at the problem. Someone could write beautiful lock-free code, but if they choose a ring buffer (a lock-free queue) where the workload really needs a concurrent set, it's all for naught.
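
To make that concrete, here is a rough single-threaded sketch of the access-pattern mismatch. It assumes the workload is "have I seen this key before?", and it uses Python's `deque` and `set` as stand-ins for a ring buffer and a concurrent set (neither is lock-free; only the lookup cost is being illustrated). `CountingKey` and `count_comparisons` are hypothetical helpers made up for this example.

```python
from collections import deque


class CountingKey:
    """Hypothetical key type that counts how often __eq__ is called."""
    comparisons = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        CountingKey.comparisons += 1
        return isinstance(other, CountingKey) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


def count_comparisons(container, key):
    """Return (found, number of __eq__ calls) for `key in container`."""
    CountingKey.comparisons = 0
    found = key in container
    return found, CountingKey.comparisons


N = 10_000
keys = [CountingKey(i) for i in range(N)]
ring = deque(keys, maxlen=N)  # bounded FIFO, standing in for a ring buffer
seen = set(keys)              # hash set, standing in for a concurrent set

# Equal to the last element stored, but a distinct object,
# so the containers cannot short-circuit on identity.
probe = CountingKey(N - 1)

ring_found, ring_cmps = count_comparisons(ring, probe)  # linear scan
set_found, set_cmps = count_comparisons(seen, probe)    # hash lookup

print(f"ring buffer: found={ring_found}, comparisons={ring_cmps}")
print(f"set:         found={set_found}, comparisons={set_cmps}")
```

The queue has to walk its elements to answer a membership question, while the set hashes straight to the right bucket. No amount of lock-free cleverness in the queue changes that scan, which is the point above.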