Hacker News new | past | comments | ask | show | jobs | submit login
Redshift Pitfalls and How to Avoid Them (heapanalytics.com)
6 points by drob on March 16, 2017 | hide | past | favorite | 3 comments



Hi, author here. This post covers some of the surprising things we ran into getting started with Redshift. Happy to answer any questions.


Very interesting.

I have to ask - with 80 clusters, do you set up a cluster per customer? Or do you put multiple customers on the same cluster?

Can you talk a bit more about how you've set up your workload queues in the Redshift WLM? I can spot at least three different workloads in your post:

1) loads into Redshift 2) transformations once data is in Redshift 3) customer ad-hoc queries

And then what node type do you use for your 80 clusters? I'm asking because the smaller dense-compute clusters offer more I/O, which helps a lot when you're loading data into the cluster.


We offer our customers the option of managing their own cluster or having us manage it for them. About 80% of our customers choose the first option. In this case, cluster sizing and queue configuration is up to them.

When we manage the cluster, we set up a new cluster for each customer. The node type we choose depends on the customer - for larger customers we'll typically use the DS type nodes (with hard disk storage). The slower I/O isn't a huge deal for us, since getting the data from our database to S3 is usually the bottleneck in the process, rather than copying from S3 into Redshift.




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: