Doing significant data manipulation outside of SQL is often asking for trouble.
SQL queries are compiled into very efficient operations which would take a lot longer to get right imperatively. Not only that, but database engines are improving all the time, so the same code you wrote which declares your desired transformations tends to get faster over time and there is nothing to update or refactor. The transformations you write are sort of timeless because they are strongly decoupled from the implementation and hardware.
Lots of data transformation steps in imperative languages require persisting an intermediate calculation (e.g., an np.ndarray). SQL databases only do this when the query planner deems it absolutely necessary, and when they do, it shows up in your query plan as a "Materialize" step.
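As a minimal sketch (the table and column names are hypothetical), here is the kind of transformation that would need several intermediate arrays in NumPy but is a single declarative statement in SQL; the engine decides whether anything intermediate ever gets materialized:

```sql
-- Hypothetical schema: loans(loan_id, origination_date, amount),
-- payments(loan_id, paid_on, amount).
-- Repayment ratio per origination month, in one statement; the subquery's
-- result is materialized only if the planner decides that it pays off.
SELECT date_trunc('month', l.origination_date) AS vintage,
       count(*)                                AS loans_originated,
       sum(p.total_paid) / sum(l.amount)       AS repayment_ratio
FROM loans l
JOIN (
    SELECT loan_id, sum(amount) AS total_paid
    FROM payments
    GROUP BY loan_id
) p USING (loan_id)
GROUP BY 1
ORDER BY 1;
```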
The EXPLAIN feature of SQL is the most useful performance debugging tool. It also quickly alerts me to a potential logic flaw when the proposed plan looks insane. I have personally replaced several analytical programs that a very large bank used to monitor loan originations and performance.
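The workflow is roughly this (the query and table names here are assumptions, not the bank's actual schema):

```sql
-- Show the chosen plan with real timings and buffer usage. A surprise
-- Materialize node, a huge row-count misestimate, or an unexpected
-- sequential scan usually means either a missing index or a flaw in my logic.
EXPLAIN (ANALYZE, BUFFERS)
SELECT l.loan_id, sum(p.amount) AS total_paid
FROM loans l
LEFT JOIN payments p USING (loan_id)
GROUP BY l.loan_id;
```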
I don't use any special SQL language features to this day. The most sophisticated it typically gets is lots of subqueries, joins, and window functions. The real skill is distilling the convoluted imperative mess into its essence. I got really good at this.
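A typical shape of the distilled query, again with made-up names, is a window function picking the latest record per loan:

```sql
-- Latest status row per loan. The window function replaces what is often
-- a hand-rolled loop over dictionaries in the imperative version.
SELECT loan_id, status, status_date
FROM (
    SELECT loan_id, status, status_date,
           row_number() OVER (PARTITION BY loan_id
                              ORDER BY status_date DESC) AS rn
    FROM loan_status_history
) ranked
WHERE rn = 1;
```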
Half the effort is usually spent socializing the changes I have to make to their logic to get it working right in SQL, since those changes alter the behavior of their existing application. Oftentimes I find the client's imperative code attempting a logical operation such as a join, but implementing it incorrectly.
Their existing imperative code frequently produced subtly wrong results, or it depended on the order of the rows returned from the database (which is undefined without an ORDER BY). Yikes.
What they actually wanted was provably implemented incorrectly or relied on undefined behavior, and both the mistakes and the proper resolution in SQL could be verified with pen and paper on a sample set of loans if necessary to drive the point home.
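A common instance of this, sketched with hypothetical names: the imperative code fetched loans and payments as two separate result sets and matched them up by position, which only works if the database happens to return both in the same order. The SQL resolution is an explicit join key, plus an explicit ORDER BY wherever order actually matters:

```sql
-- Instead of zipping two unordered result sets together by index,
-- state the relationship and the ordering explicitly.
SELECT l.loan_id, l.amount, p.paid_on, p.amount AS payment_amount
FROM loans l
JOIN payments p USING (loan_id)
ORDER BY l.loan_id, p.paid_on;
```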
> Not only that, but database engines are improving all the time, so the same code you wrote which declares your desired transformations tends to get faster over time and there is nothing to update or refactor. The transformations you write are sort of timeless because they are strongly decoupled from the implementation and hardware.
Only true to a certain extent. Counterexample: performance regressions due to changes in the query planner or its input statistics. Changes aren't always positive, and logically equivalent plans can have very different perf characteristics.
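One way to see this on Postgres (a sketch, not a tuning recommendation): refresh the statistics and compare plans, or temporarily disable a join strategy to watch a logically equivalent plan perform very differently:

```sql
-- Refresh planner statistics for a table, then look at the plan again.
ANALYZE payments;
EXPLAIN ANALYZE
SELECT loan_id, sum(amount) FROM payments GROUP BY loan_id;

-- Steer the planner away from one strategy for this session only,
-- to compare a logically equivalent plan.
SET enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT l.loan_id, sum(p.amount)
FROM loans l JOIN payments p USING (loan_id)
GROUP BY l.loan_id;
RESET enable_hashjoin;
```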
Fully agree. I am mostly on the analytical side. When my client uses Snowflake it's usually smooth sailing because it's so automated and performant. When I have my own analytical Postgres instance on my local machine I tune planner cost and memory consumption parameters in postgresql.conf, but I only rarely run into major gotchas. If my client uses IBM... I go for a walk on the beach or head out to lunch when I launch my query.
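The knobs in question are the usual planner-cost and memory parameters; a sketch of the kind of settings I mean (the values are illustrative, not recommendations):

```sql
-- Per-session tweaks while iterating on a heavy analytical query.
SET work_mem = '512MB';        -- more room for sorts and hash tables
SET random_page_cost = 1.1;    -- closer to reality on SSD/NVMe storage

-- Or persist them: ALTER SYSTEM writes postgresql.auto.conf,
-- which overrides postgresql.conf after a reload.
ALTER SYSTEM SET work_mem = '512MB';
ALTER SYSTEM SET effective_cache_size = '12GB';
SELECT pg_reload_conf();
```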
Your point that equivalent plans can have very different perf characteristics is very true. I always try to review the query plan with EXPLAIN if my query takes more than a minute, and rewrite the logic if necessary.
Very cool. So you write it with solid, declarative SQL and you can trust that it will be rock solid and optimized. I need to learn SQL instead of just doing NoSQL all the time. Thanks for the explanations.
I load JSON into Postgres all the time these days to analyze data for a government client, and I use Postgres's JSONB operators to untangle and index it. I use JSONB because I don't care about preserving the original textual representation of each record (string values within fields are still completely intact).
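A sketch of what that looks like, with a hypothetical single-column table of records and made-up keys:

```sql
-- Pull fields out of a JSONB column and index the ones you filter on.
SELECT record->>'case_id'                  AS case_id,
       (record->>'filed_on')::date         AS filed_on,
       record->'parties'->0->>'name'       AS first_party
FROM raw_records
WHERE record @> '{"status": "open"}';

-- A GIN index makes containment queries (@>) cheap;
-- an expression index covers one specific hot key.
CREATE INDEX ON raw_records USING gin (record jsonb_path_ops);
CREATE INDEX ON raw_records ((record->>'case_id'));
```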
Although I lean heavily on psql with \COPY and stdin/stdout pipes to a compressor like zstd (psql can reliably pipe CSV to and from any program, even on Windows), I found loading JSON records this way extremely frustrating.
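For the CSV side, the pattern is roughly this (psql meta-commands; the file and table names are made up, and the compressor runs on the client machine):

```sql
-- Stream a query out through zstd, and load it back while decompressing.
\copy (SELECT * FROM loans) to program 'zstd -q > loans.csv.zst' with (format csv, header)
\copy loans_staging from program 'zstd -dc loans.csv.zst' with (format csv, header)
```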
Whatever you do, NEVER use pipes in PowerShell. They don't stream; they buffer everything in RAM until your computer crashes. Microsoft is insane.
Since you use NoSQL, you can write a very short Python program that uses psycopg2 directly to load a list of dicts as JSONB rows into a single Postgres table with one column (I call mine "record").
At that point you can basically use Postgres as a NoSQL database and, if you want, structure the records using a VIEW.
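On the Postgres side the whole setup is tiny (the table and view names here are purely illustrative); the Python loader just inserts each dict into the single JSONB column:

```sql
-- One table, one JSONB column; every incoming dict becomes a row.
CREATE TABLE raw_records (
    record jsonb NOT NULL
);

-- Structure emerges later, as a view over the JSON, only if you want it.
CREATE VIEW records_structured AS
SELECT record->>'id'                          AS id,
       (record->>'created_at')::timestamptz   AS created_at,
       record->>'status'                      AS status,
       record                                 AS raw
FROM raw_records;
```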
We're in the process of documenting their use of JSON for financial record keeping as a design defect.