Redshift Pitfalls and How to Avoid Them

jhk727 · on March 16, 2017

Hi, author here. This post covers some of the surprising things we ran into getting started with Redshift. Happy to answer any questions.

scapecast · on March 18, 2017

Very interesting.

I have to ask - with 80 clusters, do you set up a cluster per customer? Or do you put multiple customers on the same cluster?

Can you talk a bit more about how you've set up your workload queues in the Redshift WLM? I can spot at least three different workloads in your post:

1) loads into Redshift 2) transformations once data is in Redshift 3) customer ad-hoc queries

And then what node type do you use for your 80 clusters? I'm asking because the smaller dense-compute clusters offer more I/O, which helps a lot when you're loading data into the cluster.

jhk727 · on March 23, 2017

We offer our customers the option of managing their own cluster or having us manage it for them. About 80% of our customers choose the first option. In this case, cluster sizing and queue configuration is up to them.

When we manage the cluster, we set up a new cluster for each customer. The node type we choose depends on the customer - for larger customers we'll typically use the DS type nodes (with hard disk storage). The slower I/O isn't a huge deal for us, since getting the data from our database to S3 is usually the bottleneck in the process, rather than copying from S3 into Redshift.