Anyone else finding their S3 bill consisting of mostly PUT/COPY/POST/LIST querie...

jakozaur · on Nov 22, 2016

I hit that scenario, when we create tons of small files (e.g. < 10KB ones). In that use case it is often cheaper and easier just to use database such as DynamoDB.

See my other comment, it got link to article about S3 costs optimizations which got more detailed recommendation.

cyberferret · on Nov 22, 2016

What app/site are you using to upload to S3? I use a combination of CloudBerry Backup and Arq Backup on my Macs/PCs here and the requests aren't that high (on average about 30Gb of data per machine in around 300K files).

I am guessing it comes down to the algorithm used to compare and upload/download files. I believe the two solutions above use a separate 'index' file on S3 to track file compares.

scrollaway · on Nov 22, 2016

It's more that we have a pretty high throughput system, using Lambda.

Users authenticate with an API gateway endpoint, we do a PUT to store a descriptor file, send a presigned PUT URL back so they can upload their file, we then process the file and do a COPY+DELETE to move it out of the "not yet processed" stage and finally do another PUT to upload the resulting processed file.

Despite a lot of data, the storage bill is barely scratching $40, but we're at almost $700/mo on API calls.

ranman · on Nov 22, 2016

Heyo, sounds not quite right if you wanna shoot me an email randhunt@amazon.com I'd be happy to try to figure it out. Your API calls shouldn't be that much more than the storage cost without a really strange access behavior. I don't know the answer off the top of my head but I'm down to try to find it out. GET's cost money and outbound bandwidth cost money but PUTS/POSTS should be neglible.

scrollaway · on Nov 22, 2016

Thanks! I'll shoot an email.

Edit: Sent!

cyberferret · on Nov 22, 2016

Ah, thanks for the extra info. We have several web apps that take user uploads and store it to S3 buckets here too, but still we don't see an adversely high request load. Not sure if the handshaking involved in getting a pre-signed URL will be upping your count?

We just use the AWS SDK on our Ruby back end. The user file is first uploaded to the (EC2) app server, then we use the SDK call to transfer it to the S3 bucket. Our storage and request costs are about equal at this stage.

Using Lambda/Node, I guess that the SDK is not an option and you have to use the pre-signed URL method? Or else use Python and the SDK library?

zwily · on Nov 22, 2016

You can generate pre-signed URLs without making an API call.