
I have two nearly-identical tables, both shaped like (foreign_key some_data_type, name varchar2, value varchar2). (Before you say "that means you should use NoSQL": this is a staging environment for loading data into a claims handling system, which is built in a language that comes with a relational DB built in.)

They both have about 50M rows, with statistically identical data. I was running near-identical large queries on both, with the same execution plan (nested-loop join with index lookups), and getting vastly different timings.

This being a staging environment, these tables are re-populated by truncating them and running an ETL tool. What turned out to be happening is that in one case the source query on the ETL tool was sorted by the foreign key, and in the other it wasn't. So in one case the rows for each key were stored next to each other, and all those fetch-by-index-lookup operations added up to essentially a partial table scan. In the other case they added up to what you'd expect: blocks fetched in random order, and probably re-fetched after falling out of cache.
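To make the effect concrete, here is a toy simulation, not the actual database. It assumes a fixed number of rows per block and a small LRU block cache, and it counts how many physical block reads the same series of index lookups costs against a sorted load versus a shuffled one. All names and sizes (count_block_reads, demo, 100 rows per block, a 20-block cache) are made up for illustration.

```python
import random
from collections import OrderedDict


def count_block_reads(row_keys, rows_per_block, cache_blocks, lookups):
    """Count cache misses for index lookups against a table.

    row_keys is the foreign key of each row in physical (load) order.
    Every lookup fetches each matching row's block through an LRU cache.
    """
    # Toy "index": foreign key -> block number of every matching row.
    index = {}
    for pos, key in enumerate(row_keys):
        index.setdefault(key, []).append(pos // rows_per_block)

    cache = OrderedDict()  # block number -> True, oldest entry first
    misses = 0
    for key in lookups:
        for block in index.get(key, ()):
            if block in cache:
                cache.move_to_end(block)  # mark as recently used
            else:
                misses += 1  # physical read
                cache[block] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict least recently used
    return misses


def demo(seed=0):
    """Return (reads_with_sorted_load, reads_with_shuffled_load)."""
    rng = random.Random(seed)
    n_keys, rows_per_key = 2000, 10  # assumed sizes, far below 50M rows
    keys = [k for k in range(n_keys) for _ in range(rows_per_key)]

    sorted_load = sorted(keys)  # ETL source query sorted by foreign key
    shuffled_load = keys[:]  # ETL source query in arbitrary order
    rng.shuffle(shuffled_load)

    lookups = range(n_keys)  # the same nested-loop probes for both tables
    return (
        count_block_reads(sorted_load, 100, 20, lookups),
        count_block_reads(shuffled_load, 100, 20, lookups),
    )


if __name__ == "__main__":
    sorted_reads, shuffled_reads = demo()
    print("sorted load:", sorted_reads, "block reads")
    print("shuffled load:", shuffled_reads, "block reads")
```

With the sorted load, each of the 200 blocks is read exactly once. With the shuffled load, the rows for one key are scattered across many blocks, so most row fetches miss the cache and the read count comes out far higher, even though the "execution plan" is identical.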


