Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Firstly, let me say you're most likely correct. I do find finger trees useful in functional languages as well and for creating immutable structures with them.

I want to add some general comments about anyone thinking about using a finger tree or anything else.

Give any potential data structure a try with real data, not contrived stuff and benchmark on the actual target hardware and in a real code path if possible. Further, try it with real-world access patterns like adding, removing, ordering, etc. at different times at different sizes - the results can be interesting. Even if something does well for one use-case, the other use-cases you need it for it may perform awful, so sometimes it's a compromise. Data structure performance is quite often not performance for a single operation, but a function of several operations. This makes it even more important to test it with real data when possible and access it different ways.

Moreover, too many times I see positive or negative benchmarks where someone is filling a data structure with exactly the same integer, or the same instance of an object or something like that. In practice, the underlying data can introduce other considerations based on the size, pointer chasing, memory allocations (especially if the data structure can dynamically resize) and so on in just about every language. Factor in hardware, and the results can change even further. Also ensure your benchmarking itself isn't flawed - we've been there and discussed it ad-nauseam and yet I continue to see bad benchmarking in the wild (particularly on the JVM or CLR).

As a related example, I've found that even the famed link list can suck in real-world. I've always been a "choose the right data structure for the job" kind of guy, and yet sometimes it's more deceptive than it seems. You might say, oh, my access patterns are exactly a linked list, and yet a simple array with lots of helper cruft code will smoke it in practice. Likewise, I've seen shocking results replacing something with a skip list or other data structure that seems like it should perform worse, but depending on the data, size, hardware, and implementation, I get interesting results. I've seen plenty of stupid things like someone who picked a hashtable for fast lookups, only to iterate through it in some other code path for some other use-case accessing the same data, completely murdering overall application performance due to selecting based on a single criterion. If you're using a language where you can create your own allocators, it can become even more interesting than you think - i.e. there are gains or losses to be had.

The moral is simple: "just try it." This comes up a lot in game programming for example and I've seen it in embedding programming have huge effects as well. I can't even begin to explain how much faster I've made things just by changing a data structure because of real-world conditions vs. lab conditions. Don't just assume that picking what the textbook says in the Big O table is what actually will dictate performance, positive or negative. There's a reason most good data structure books also contain explanation regarding behavior and performance, though even books don't and can't explain the real-world nuances when working with non-contrived data.



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: