Hacker News | chimerasaurus's comments

I’m never leaving Seattle.

iykyk


Disclaimer - I am James on this[1] blog.

Yesterday we announced Polaris specifically so (1) customers don't get locked into a catalog; (2) people know Snowflake works with AWS, Azure, Confluent, etc.

1: https://www.snowflake.com/blog/introducing-polaris-catalog/


This [1] says Snowflake was also bidding to buy Tabular.

[1] https://www.cnbc.com/2024/06/04/databricks-is-buying-data-op...


An awful lot of "coming soon"s in that article


No doubt. Ask my team how thrilled I am whenever we say "coming soon."

Narrator: It made him die inside.


I’d take issue with the “Iceberg is slow” theme that Databricks in particular has tried to push.

If that were true, Snowflake would not be as fast on Iceberg/Parquet as its native format. The engine makes something fast or slow, not the table format.

Disclaimer - am at Snowflake.


Back when we were choosing between the three formats about 1.5 years ago, Iceberg was definitely the slowest. If the situation has changed since then, I would love to see an updated comparison.

We tested all three of them using Spark batches that converted a stream of changes into SCD2.


I agree with the points you make above.


It’s written by a group -really- trying to make one of them a thing, even though it’s in decline, so just have that lens for anyone reading it.


Disclaimer - work at Snowflake. Two quick points to mention.

1. Snowflake has always used blob stores + file data + metadata. Architecturally it’s actually always been very Lakehouse-y

2. Parquet and Iceberg should be equivalent in performance and features. It’s more than playing nicely - it’s more choose your own adventure where all things are equal.


As a note, Iceberg also supports Avro in addition to Parquet (and ORC).


I talk to customers basically all day about table formats. Only one customer has really brought up Hudi in a meaningful way. IMO, Hudi is basically out of contention for 95%+ of people looking at table formats.


I’ll just point out on the Snowflake side, we’ve been very public saying we want Iceberg/Parquet to be at or as close to parity as possible with our native format. The value add is the platform, not lock in. That also forces us to be the best on open formats, which IMO is also a good thing for everyone.

Disclaimer: I work at Snowflake literally on this with my team. :)


> we’ve been very public saying we want Iceberg/Parquet to be at or as close to parity as possible with our native format

That's great to hear. Would this mean that external Iceberg tables would have the same performance as native tables? My impression of the parent comment was that eventually there would be no such thing as a 'native format'. I'm really interested to see public statements by Snowflake to that effect, and would love to share them with my team.



CNN is not relevant for anything

