Running 10M PostgreSQL Indexes in Production (And Counting)

pradeepchhetri · on Nov 30, 2016

Great blog!

Some questions:

1. Is the sharding logic (based on customer) written in the application, or does it live somewhere else?

2. What does your PG cluster look like? What is the read/write workflow?

3. Is there a way to rebalance the tables after adding nodes to the PG cluster?

malisper · on Nov 30, 2016

Hi, author here.

1) We use software called Citus[0], which sits on top of Postgres and handles all of the read-time sharding logic for us. Citus could handle the write-time logic too, but we wrote our own because we are able to perform several optimizations for our specific use case that don't work in general.

2) Right now, we have 40 vanilla PostgreSQL instances running on ZFS on Linux and a single Citus master node. We perform writes by sending them directly to the proper vanilla PostgreSQL instance (which is why we don't use Citus for the write path). We perform reads by sending a vanilla SQL query to the Citus master, which automatically figures out what data it needs from each worker, sends queries to fetch that data, aggregates the results, and returns them.

3) We wrote our own code for handling resharding and rebalancing different shards across our cluster.
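The write/read split described in 2) can be sketched roughly as follows. This is an illustrative toy, not the production code: the hash-based `worker_for_customer` function, the in-memory `Worker` and `Cluster` classes, and the row fields are all assumptions made up for the example, and the real system talks to PostgreSQL and Citus rather than Python lists.

```python
import hashlib
from collections import Counter

NUM_WORKERS = 40  # from the comment: 40 vanilla PostgreSQL instances


def worker_for_customer(customer_id, num_workers=NUM_WORKERS):
    """Map a customer to a worker index with a stable hash (assumed scheme)."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


class Worker:
    """In-memory stand-in for one vanilla PostgreSQL instance."""

    def __init__(self):
        self.rows = []

    def insert(self, row):
        self.rows.append(row)

    def count_by_event(self):
        # Partial aggregate computed locally on this worker.
        return Counter(row["event"] for row in self.rows)


class Cluster:
    """Hypothetical wrapper showing the two paths side by side."""

    def __init__(self, num_workers=NUM_WORKERS):
        self.workers = [Worker() for _ in range(num_workers)]

    def write(self, row):
        # Write path: the application picks the worker and sends the row
        # to it directly, bypassing the coordinator.
        idx = worker_for_customer(row["customer_id"], len(self.workers))
        self.workers[idx].insert(row)
        return idx

    def count_events(self):
        # Read path: a coordinator fans the query out to every worker,
        # then merges the partial results into one answer.
        total = Counter()
        for worker in self.workers:
            total.update(worker.count_by_event())
        return dict(total)
```

Because the hash is stable, every row for a given customer lands on the same worker, which is what lets the write path skip the coordinator entirely.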

[0] https://www.citusdata.com/
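The comment does not say how the custom rebalancing in 3) works, so the following is only one plausible approach, not the author's method: a greedy planner that, after nodes are added, moves shards from the fullest node to the emptiest until shard counts differ by at most one. The function name `plan_rebalance`, the shard and node names, and the use of shard count as the load metric are all assumptions.

```python
from typing import Dict, List, Tuple


def plan_rebalance(placement: Dict[str, str], nodes: List[str]) -> List[Tuple[str, str, str]]:
    """Return (shard, source, target) moves that even out shard counts.

    `placement` maps shard name -> current node. Every node that appears in
    `placement` must also be listed in `nodes` (new nodes start empty).
    """
    by_node = {node: [] for node in nodes}
    for shard, node in sorted(placement.items()):
        by_node[node].append(shard)

    moves = []
    while True:
        src = max(nodes, key=lambda n: len(by_node[n]))
        dst = min(nodes, key=lambda n: len(by_node[n]))
        # Stop once the fullest and emptiest nodes are within one shard.
        if len(by_node[src]) - len(by_node[dst]) <= 1:
            return moves
        shard = by_node[src].pop()
        by_node[dst].append(shard)
        moves.append((shard, src, dst))
```

A real planner would likely weigh shards by size or query load rather than count, and would have to copy data and switch writes over safely; this sketch only produces the list of moves.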

pradeepchhetri · on Nov 30, 2016

Very cool. Thank you for the answers.