
aaand here we go again.

DB guy with 25+ years experience. Summary: it depends.

> joins are never cheap

It depends: on table size, on indexes, and on how expensive the alternative is. Always!

> tables with billions of rows crossed with millions of rows just to find a single row with data is not something i would call cheap

indexes

> more often than not it is better to avoid joining large tables if you can live with duplicate data

1E9 x 1E6 = 1E15 (at worst anyway). A join via an index will save you colossal amounts of IO (though as ever, it depends).
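
To make the index point concrete, a minimal sketch (all table and column names here are invented for illustration):

   -- normalised: big_tbl references small_tbl by key, and that key is indexed
   create index idx_big_small on big_tbl (small_id);

   select b.payload, s.attrs
   from small_tbl s
   join big_tbl b on b.small_id = s.id
   where s.some_code = 'XYZ';
   -- the filter picks a few small_tbl rows, the join probes idx_big_small for each:
   -- a handful of index lookups, nothing like 1E9 x 1E6 comparisons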

Problem here isn't this mostly clueless advice (discarding/archiving unnecessary data is the only good idea here, and it's not used as often as it should be). Problem is strong opinions put forth by someone who doesn't have the necessary experience, or understanding of what's going on under the hood. Denormalising is a useful tool that IME rarely gains you more than it loses you, but this 'advice' is just going to lead people down the wrong alley, and I'm tired of suchlike n00b advice strongly (and incorrectly and arrogantly) expressed on HN.

(edited to fix maths error)



There's also the possibility of filtering each source table first, then doing an inner join, which can VASTLY cut down on computation. I assume GP was thinking of doing an outer join first, then filtering.

But those are details for the database engine to handle. And, as you said, indexes.
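
Roughly this shape, with made-up names:

   -- filter each side first, then inner join the already-small results
   select o.id, c.name
   from (select * from orders    where order_date >= '2024-01-01') o
   join (select * from customers where region = 'EU') c
     on c.id = o.customer_id;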


FYI for others, such filtering is called predicate pushdown (I believe it's also sometimes called predicate hoisting). An example (trivial, but for illustration):

   select * from (select * from tbl) as subqry where subqry.col = 25
would be rewritten by any halfway decent optimiser to

   select * from (select * from tbl where tbl.col = 25) as subqry
(and FTR the outermost select * would be stripped off as well).

Good DB optimisers do a whole load of that and much more.


Yeah, I had to get quite well acquainted with query execution plans and the like a few years ago (and have forgotten most of it by now) while diagnosing a SLOW query.

Joining onto either table a or table b is something that REALLY trips optimizers up.
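
If I remember the shape of it right, it was roughly this (names invented):

   -- an OR in the join condition tends to defeat index use and force a scan
   select *
   from events e
   join parties p on p.id = e.buyer_id or p.id = e.seller_id;

   -- rewriting as two branches usually lets each one use its index
   -- (not exactly equivalent when a row matches both sides, but it shows the idea)
   select * from events e join parties p on p.id = e.buyer_id
   union all
   select * from events e join parties p on p.id = e.seller_id;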


Wow, this comment comes across as being incredibly arrogant while providing zero value. nOOb lol


I thought I was being informative. I can't give hard-and-fast rules because (drumroll)... it depends. So there are trade-offs to consider, and indexes got mentioned.

How could I have posted better? Honest question.


Because you didn’t actually refute anything the GP said, and gave bad advice, all while being incredibly negative and arrogant.

> this mostly clueless advice

> strong opinions put forth by someone who doesn't have the necessary experience, or understanding of what's going on under the hood

> I'm tired of suchlike n00b advice strongly (and incorrectly and arrogantly) expressed on HN

You continue to just say it depends without giving any actual scenarios. You make it sound like magic, but it’s not: “under x and y, do z except when u” is better than “it depends, I’m sick of all these noobs”.

Also, your main points are against denormalization and against avoiding large-table joins, which are 100% rational arguments under certain workloads.


I refuted what he said by pointing out that 1E9 x 1E6 = 1E15. A billion row table denormalised with a million row table = 1000 trillion row table. How big's your disk array? How are you going to ensure correctness on update?
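
Concretely, with invented names, the update problem looks like this:

   -- normalised: the fact lives in one row, so one row changes
   update customers set email = 'new@example.com' where id = 42;

   -- denormalised: the same fact has been copied onto every related row,
   -- so the "same" change touches (and locks, and logs) millions of rows,
   -- and quietly misses any copy you forgot about
   update orders set customer_email = 'new@example.com' where customer_id = 42;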

His was stupid advice and it should not have been given.

> You continue to just say it depends without giving any actual scenarios

It depends. Use your common sense and then use a stopwatch; that's a good start. There are entire shelves of books on this; I won't repeat them.
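
The "stopwatch" in practice is usually the engine's own plan timing; for example, something like this in Postgres (other engines have their own equivalents, and the names below are just placeholders):

   explain (analyze, buffers)
   select b.payload, s.attrs
   from small_tbl s
   join big_tbl b on b.small_id = s.id
   where s.some_code = 'XYZ';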

> You make it sound like magic, but it’s not:

absolutely true!

> “under x and y, do z except when u” is better than

It's a multidimensional problem, including memory size, disk size, the optimiser, the sizes of the particular tables joined, where the hotspot is, the cost of updates to non-normalised tables, etc. I can't give general advice from here.

> Also, your main points are against denormalization and avoiding large table joins which are 100% rational arguments under certain workloads.

I said "Denormalising is a useful tool that IME rarely gains you more than it loses you,"

I don't accept your criticism.


That’s not what denormalize means. How long have you been doing this again?


True, you normalise/denormalise data, not tables as such; tables pop out of a normalisation process and denormalisation collapses them back together. Perhaps if I'm still wrong you could put me right. And don't just point at the wiki article on it; please be specific.
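
A toy sketch of what I mean by "collapsing them together" (names invented):

   -- normalised: the customer's name is stored exactly once
   create table customers (id int primary key, name text);
   create table orders (id int primary key,
                        customer_id int references customers(id),
                        total numeric);

   -- denormalised: the name is copied onto every order row
   create table orders_denorm (id int primary key,
                               customer_id int,
                               customer_name text,
                               total numeric);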

To your question: probably longer than you, but I've always more to learn.



