RethinkDB, SageMath, Andreessen-Horowitz, Basecamp and Open Source (2016) (sagemath.blogspot.com)
78 points by tosh on July 21, 2019 | 15 comments



Interesting to see RethinkDB being re-posted on here again, even if it is a 3-year-old article. We use it extensively for the background worker processing in our SaaS and it has worked magnificently for years. I even wrote a small, fun HN-related project with it a couple of years back [0].

I haven't kept up with how it is travelling as an open source project, but I am hoping that community interest is still strong and that the project will be around for a while yet.

[0]- https://hackernoon.com/tophn-a-fun-side-project-built-with-v...


> it has worked magnificently for years.

It certainly did the job, but unfortunately my experience has been different.

- It's very space-inefficient. One particular thing I've discovered is that it's very inefficient at storing numbers; strings are much more space-efficient (https://github.com/rethinkdb/rethinkdb/issues/6304). However, there must be something else as well. I'm slowly moving to PostgreSQL (experimenting with CockroachDB as an alternative), and maybe that's just me, but similar datasets with similar indices require significantly less disk space there.

- It dies under heavy load. That could be a tolerable issue, since I can throttle the updates my app generates, but there's no way to control replication speed. Trying to add replicas to some of the larger tables reliably kills the destination node for me.

- There's no way to limit memory, and optimizing for this is hard. A node with 64GiB RAM lives okay, but the one with only 16GiB gets periodically OOM-killed (the cache size on that node is set to a mere 6GiB).

- Sometimes it has weird cluster discovery delays, where restarted nodes just sit and don't connect to their peers for minutes, even though they're reachable and there are no network issues. This is not a problem for large clusters, but I have only 3 nodes (cost constraints), where the loss of one puts the system into a partially degraded state, as replicating all tables is too wasteful. Tolerable, but not fun.

> how it is travelling as an open source project

Unfortunately, I'd say it is dead. There is some discussion activity, but almost no code activity.


I'm launching a scalable open source cryptocurrency payment system using RethinkDB.

It's amazing how well it works and scales on Kubernetes at the click of a button. There is a video of it on my project website (https://crypticle.io/). I intend to keep using it for as many future projects as possible.

AFAIK, the business did not survive because the product was too good and didn't have enough problems to charge support for.

Actually this is a big problem in open source and that's why I'm involved in cryptocurrency now.


> It's amazing how well it works and scales on Kubernetes at the click of a button

Oh, just try to scale it down ;) https://github.com/rethinkdb/docs/issues/958

Deleting from the system table is okay, but messing with server tags (when the setup is heterogeneous) is not exactly fun.

To be fair, that's a very minor issue. Perfectly manageable, just a slight inconvenience when you figure things out.


It's not that difficult: you can use the reconfigure query to put all the shards on one server, and then scale down the other servers.
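A rough sketch of what that reconfigure step might look like from Python. The helper `single_server_layout`, the server tag "keep", and the table name "jobs" are made up for illustration. The keyword names (`shards`, `replicas`, `primary_replica_tag`, `dry_run`) are the ones ReQL's `reconfigure` accepts, but the driver call is left commented out since it needs a live cluster, and the import style depends on your driver version.

```python
def single_server_layout(server_tag: str) -> dict:
    """Build keyword arguments for a reconfigure call that keeps a table
    as one shard with one replica on servers carrying `server_tag`."""
    return {
        "shards": 1,                        # collapse the table to a single shard
        "replicas": {server_tag: 1},        # one replica, only on the tagged server(s)
        "primary_replica_tag": server_tag,  # required when replicas is given per tag
    }


# "keep" is a hypothetical tag assigned to the server that should remain.
layout = single_server_layout("keep")
print(layout)

# With a live cluster and the Python driver (not run here; assumes driver 2.4+):
#   from rethinkdb import RethinkDB
#   r = RethinkDB()
#   conn = r.connect("localhost", 28015)
#   r.table("jobs").reconfigure(dry_run=True, **layout).run(conn)
```

Passing `dry_run=True` first should show the proposed layout without applying it, which seems prudent before removing the other servers.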


We've used it for years as well, and (for us) the absolute worst thing about it is that it is a crazy memory hog. Even with no accesses at all, it needs something like 32 GB RAM on our nodes for a 0.5 TB DB (with a bunch of indexes). I guess this is by design and is a trade-off that requires it, but it makes it very expensive to deploy on AWS, where RAM is the real cost differentiator (actually storing the table on EBS is super cheap compared to the RAM required...). It's also very difficult to analyse where the RAM is going; this is one thing at least that could be improved. I would not use it for any new projects unless this was fixed.


Anyone wanna tell us what's happened to RethinkDB since then, and what's happened to SageMath since then?


RETHINKDB:

- There is a recent discussion about whether or not RethinkDB is dead here: https://github.com/rethinkdb/rethinkdb/issues/6747

- Judging by new PRs and commits, RethinkDB development slowed down substantially in Jan 2016.

SAGEMATH:

- SageMath development is still very active; maybe more so, probably due to strong European grant support (OpenDreamKit): https://github.com/sagemath/sage/graphs/contributors

SAGEMATHCLOUD: What this blog post is "really" about.

- We renamed it "CoCalc" -- https://cocalc.com

- I rewrote everything that depended on RethinkDB using PostgreSQL. Though it was months of work, this was obviously a VERY good decision in retrospect!

- I rewrote all the realtime sync functionality to not depend on the database at all (just use a custom protocol over websockets), which is obviously a much more scalable approach to this problem. RethinkDB (and anything like it) is totally the wrong tool for implementing realtime sync for file editing.

- Development on CoCalc is very active: https://github.com/sagemathinc/cocalc/graphs/contributors

Disclaimer: I wrote this blog post and started Sage and CoCalc. I didn't repost this here, and just happened to notice it while scanning HN.


The GitHub “is this dead” issue epitomizes RethinkDB to me:

https://github.com/rethinkdb/rethinkdb/issues/6747#issuecomm...

“I wanted a task queue and wanted to build it on X”

Doesn’t matter that a job queue makes no sense on X. When X looks like it’s not active they build it on Y.

Neither X nor Y has any property that’d make it good for a job queue, but a popularity contest drove adoption, not technical merit.


I found your blog post fascinating.

Are there any decisions you wrote about here that you now feel differently about (beyond RethinkDB, obviously)?

Your take on open source and debt and lifestyle businesses is so interesting and I'm curious if almost four years has changed anything for you.


More specifically, has any of the business advice you got turned out to be surprisingly wrong?


Expect a couple of releases soon, and then, probably in a fork, a replacement of the clustering backend, which will either be open source or non-free, depending on whether funding for an open source-up-front implementation can be had. I'm working on it actively right now.




RethinkDB was truly an inspiration in a reactive way of thinking and a simplified solution for distributed sharding.

Change Streams in MongoDB are inspired by RethinkDB.



