> You seem to underestimate how heavily ClickHouse is optimized (e.g. compressed storage).
Is it any more compressed than Apache Hive's ORC format (https://orc.apache.org)? Because that's increasingly accepted as a storage format in a lot of these analytical systems.
Yes, looks like it. According to these posts, ORC only uses Snappy or zlib compression, while ClickHouse also offers specialized codecs such as DoubleDelta, Gorilla, and T64.
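For anyone curious what a codec like DoubleDelta buys you on time-series columns, here's a rough Python sketch of the idea (my own illustration, not ClickHouse's actual implementation): regularly spaced values collapse to a stream of zeros, which then bit-packs or compresses to almost nothing.

```python
def double_delta(values):
    """Encode a sequence as: first value, first delta, then deltas-of-deltas."""
    if len(values) < 2:
        return list(values)
    deltas = [b - a for a, b in zip(values, values[1:])]
    dod = [b - a for a, b in zip(deltas, deltas[1:])]
    return [values[0], deltas[0]] + dod

# Timestamps arriving once per second: the encoded stream is almost all zeros,
# which a generic compressor (or bit-packing) shrinks dramatically.
ts = [1700000000 + i for i in range(10)]
print(double_delta(ts))  # [1700000000, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```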
ORC or Parquet are file storage formats so without context their performance can be almost anything. Where is the data stored? S3? HDFS? Local ram disk?
Clickhouse manages the whole distributed storage, ram caching, etc. thing for you.
In my experience, a unified single purpose vertically integrated solution will be faster than a bunch of kitchen sink solutions bolted together.
Of those, it looks like only Presto is open source and free, so maybe it's a Presto versus ClickHouse comparison, which explains why so many choose ClickHouse (it's one of only two options in its class).
Presto is mostly an engine that runs on top of other databases, although it does have its own query execution engine.
The basic idea behind Presto is that it federates other databases, and supports doing joins across them. From what I understand, the problem that it solved at Facebook is bridging the gap between different teams; if a team has MySQL and another has files stored on HDFS, it doesn't really matter because all you do is query Presto and it'll query both under the covers. The alternative is setting up data pipelines, and dealing with the ongoing issues of maintaining those data pipelines.
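A toy illustration of that federation idea (nothing to do with Presto's real connector API): two mock "catalogs" backed by completely different storage, joined in one engine so the caller never touches either system directly.

```python
# Hypothetical mock sources standing in for a MySQL table and files on HDFS.
mysql_users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
hdfs_events = [{"user_id": 1, "event": "login"},
               {"user_id": 1, "event": "click"},
               {"user_id": 2, "event": "login"}]

def federated_join(left, right, left_key, right_key):
    """Hash join across two 'catalogs', the way a federated engine would."""
    index = {}
    for row in left:
        index.setdefault(row[left_key], []).append(row)
    return [{**l, **r} for r in right for l in index.get(r[right_key], [])]

joined = federated_join(mysql_users, hdfs_events, "id", "user_id")
print(len(joined))  # 3 joined rows, no data pipeline in sight
```

The point is purely architectural: the join happens in the query engine, so neither source system needs to know about the other.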
How well do those work on a single 8GB node? Because ClickHouse works very well at that scale, with a single C++ executable.
There are large complexity and cost overheads to Hadoop solutions, and not everyone has actual big-data problems. ClickHouse hugely outperforms on query patterns that would devolve into table scans in a row store, while working at row-store volumes of data without a bunch of big nodes.
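To make "devolves into a table scan in a row store" concrete, here's a toy comparison (my own illustration, not a benchmark): a row store has to walk every field of every row to aggregate one column, while a column store only reads the one contiguous array it needs.

```python
# Row layout: each record is stored together, so summing one column
# still drags every other field through memory.
rows = [{"user_id": i, "country": "DE", "revenue": i * 0.5, "ua": "x" * 50}
        for i in range(1000)]
row_sum = sum(r["revenue"] for r in rows)  # touches whole rows

# Column layout: the same data as one array per column; the aggregation
# only ever scans the 'revenue' array (and it compresses far better, too).
revenue_col = [r["revenue"] for r in rows]
col_sum = sum(revenue_col)  # touches exactly one column

assert row_sum == col_sum
```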
Snowflake doesn’t really keep up with Clickhouse (in my experience) and it costs money.
Databricks is essentially Spark, and I shouldn’t need a whole Spark cluster just to get database functionality. It also costs money.
Unless I’m mistaken, Presto is just a distributed query tool over the top of a separate storage layer, so that’s two things you have to set up.
I have no experience with BigQuery, but I’ve heard good things about it and Redshift; however, if the rest of your infra isn’t on GCP/AWS then that will probably be a blocker.
ClickHouse is open source and comes with convenient clients in a bunch of languages as well as an HTTP API. It’s outrageously fast, makes the right trade-offs for its use-case, and has some cool features: a large range of supported input/output formats, built-in Kafka support, and replication and sharding that are reasonably straightforward to set up.
I don't think it's fair to say "A is faster than B" like in the above comments based on the order they appear in a list that mixes results from GPU clusters and laptops. The author of the benchmark does nothing wrong deontologically, but the results table seems to be ordered by time, and some people jump to quick conclusions or use it as a way to rank performance when that's not appropriate.
For example, aside from the lack of transactions, ClickHouse is designed for insertion. There's an INSERT statement, but no UPDATE or DELETE statements. You can rewrite tables (there's ALTER TABLE ... UPDATE and ALTER TABLE ... DELETE), but they're intended for large batch operations, and the operations are potentially asynchronous: they complete right away, but you only see the results later.
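A conceptual sketch of what "completes right away, results visible later" means (purely illustrative, not ClickHouse internals): the ALTER only queues a mutation, and rows actually disappear once a background merge rewrites the data parts.

```python
class ToyTable:
    """Toy model of ClickHouse-style asynchronous mutations."""
    def __init__(self, rows):
        self.rows = list(rows)
        self.pending = []          # queued mutations, not yet applied

    def alter_delete(self, predicate):
        # Returns immediately; the delete is only *scheduled*.
        self.pending.append(predicate)

    def background_merge(self):
        # Later, a merge rewrites the parts and applies queued mutations.
        for pred in self.pending:
            self.rows = [r for r in self.rows if not pred(r)]
        self.pending.clear()

t = ToyTable([{"id": i} for i in range(5)])
t.alter_delete(lambda r: r["id"] < 3)
print(len(t.rows))   # still 5: the mutation hasn't run yet
t.background_merge()
print(len(t.rows))   # 2: results only visible after the rewrite
```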
ClickHouse has many other limitations. For example, there's no enforcement of uniqueness: You can insert the same primary key multiple times. You can dedupe the data, but only specific table engines support this.
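The dedupe-on-merge behaviour alluded to here (e.g. the ReplacingMergeTree engine) can be sketched like this: duplicate primary keys are accepted on insert and only collapsed when parts merge, keeping the latest version. This is a simplification of the idea, not the real merge algorithm.

```python
def replacing_merge(rows, key="pk", version="ver"):
    """Keep only the highest-version row per key, roughly what
    ReplacingMergeTree does when it merges data parts."""
    latest = {}
    for r in rows:
        k = r[key]
        if k not in latest or r[version] > latest[k][version]:
            latest[k] = r
    return list(latest.values())

inserted = [{"pk": 1, "ver": 1, "v": "old"},
            {"pk": 1, "ver": 2, "v": "new"},   # same key inserted twice
            {"pk": 2, "ver": 1, "v": "only"}]
merged = replacing_merge(inserted)
print(sorted(r["v"] for r in merged))  # ['new', 'only']
```

Note the caveat this implies: until a merge happens, queries can still see both versions, which is why deduplication there is eventual rather than guaranteed at insert time.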
There's absolutely no way anyone will want to use ClickHouse as a general-purpose database.
I should have phrased that differently: if something is good enough in some key metric, it extends to other uses - even if it makes a poor fit.
So I insist: everyone will WANT to use ClickHouse as a general-purpose database, and will create ways to make it so (e.g. copy the table with the unwanted columns filtered out, drop the original, rename the copy).
It is just too fast and too good for many other things, so it will expand from these strongholds to the rest.
A personal example: I am migrating my cold storage to clickhouse, because I can just copy the files in place and be up and running.
I know about INSERT and the like, and I have a great existing system - but this lets me simplify the design and deprecate many things. Fewer moving parts is generally better.
After that is done, there is a database where I would benefit from things like alter tables or advanced joins, but keeping PostgreSQL and ClickHouse side by side, just for this? No. PostgreSQL will go. Dirty tricks will be deployed. Data will be duplicated if necessary.
There's been a lot of community interest in both topics. Merge join work is largely driven by the ClickHouse team at Yandex. Object storage contributions are from a wider range of teams.
That said I don't see ClickHouse replacing OLTP databases any time soon. It's an analytic store and many of the design choices favor fast, resource efficient scanning and aggregation over large datasets. ClickHouse is not the right choice for high levels of concurrent users working on mutable point data. For this Redis, PostgreSQL, or MySQL are your friends.
Sure - but the comment you're replying to made no mention of NoSQL. It just said Clickhouse lacks OLTP by design, that doesn't mean it won't be widely used, just that it will perhaps be limited to analytics workloads.
If you need deletes and transactions, look elsewhere, but Clickhouse seems to be great for what it's been designed for.
For sure Rust has a better design than Go, but I don't know of any actively developed web framework in Rust. Go may be a better choice because of its ecosystem.
Is it possible to find a contract for a dev who has no permit to work in the UK? For instance, I'm skilled in Scala & Python and can easily visit the UK as a tourist/businessman. What's the way into this market? (Scala User Group meetups, anything else?)
It'd be in violation of your visa/visa waiver. I have to pay UK corporation tax, something you won't be able to do without a National Insurance number (right to work in the UK) or a work visa or something.
I have American friends who have come and worked in the UK, but all for companies; none have set themselves up independently, so I'm not sure if it's possible.
It shouldn't be a violation if I visit the UK only to get new contracts and work on them remotely. I also have my own LLC in the EU, so I can act on its behalf, making taxation easier for the customer.
Work from home, and you'll be sorted, even if you're coming in from week to week. Tax laws target _where the work is done_. Then you're just coming to visit a client.