New Amazon S3 Storage Class – Glacier Deep Archive (amazon.com)
85 points by nnx on March 28, 2019 | 42 comments


It's a little irritating that this is only available via the S3 API rather than something I can just move my archive to. I guess it's time to upload to S3 and finally retire my Glacier archive after ~5 years.

Reading between the lines, I wonder if they're really just trying to deprecate the Glacier API and move to Glacier being a different storage tier in S3. Which is probably what it should have been in the first place. AWS will likely never actually retire the Glacier API (much like SimpleDB has never actually been retired); it'll just hang on, not receiving new features.


> [...] move to Glacier being a different storage tier in S3

I am almost certain they are playing catch-up here, given that GCP has had this feature forever (the "coldline" storage class). Last time I checked it was more expensive, though.


Glacier's standard archive pricing has been competitive with Coldline all along, and S3 has been able to transition objects to Glacier via lifecycle rules since early in Glacier's existence, so it's effectively been possible to use the S3 API with Glacier, just with various provisos.
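
For example, the lifecycle wiring looks roughly like this with boto3 (a sketch; the bucket name and transition windows are hypothetical):

    import boto3  # AWS SDK for Python

    s3 = boto3.client("s3")

    # Transition objects to Glacier after 30 days and to the new
    # Deep Archive class after 180 days (both windows made up here).
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-archive-bucket",
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "archive-tiering",
                    "Status": "Enabled",
                    "Filter": {"Prefix": ""},  # apply to every object
                    "Transitions": [
                        {"Days": 30, "StorageClass": "GLACIER"},
                        {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                    ],
                }
            ]
        },
    )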

Glacier here seems to be adding an even cheaper option than they had before, provided you're happy with even longer retrieval times. Which for my purposes is perfectly fine.


Don't forget to access your archive over the S3 BitTorrent endpoint.


Call me silly, but this seems like a great option for backing up nonessential personal data like a movie collection. It's even cheaper than Backblaze B2, and the limitations (180 day minimum storage, 12 hour retrieval time) don't seem like bad tradeoffs.


Just the thing I was looking for to back up my NAS offsite. I'm not sure how to test the backup cost-effectively, though, so I'm treating it as separate from my 3-2-1 strategy for now.
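
Uploading straight into the new class is just a storage-class flag on a normal S3 put; a minimal boto3 sketch (bucket and file names hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Upload a NAS backup blob directly into DEEP_ARCHIVE,
    # no lifecycle transition needed.
    s3.upload_file(
        Filename="/mnt/nas/backup-2019-03.tar.gz",
        Bucket="my-nas-backups",
        Key="backup-2019-03.tar.gz",
        ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},
    )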


Bandwidth is between 5x and 9x more expensive compared to B2, though.


For backing up something personal, why don't you just use actual Backblaze?


Backblaze is not necessarily optimal for this use case, since "files are expunged from the servers after 30 days if they're removed from a computer" [1]. For files stored for archival purposes, such as old movies or CDs, you might remove something due to user error and only notice when you actually need the content.

[1] https://help.backblaze.com/hc/en-us/articles/217664898-What-...


That sounds like their backup client software... B2 is dead cheap and there are a lot of third-party clients that give you complete control over retention/versioning.


> B2 is dead cheap

B2 is 400% more expensive than Deep Archive. Whether the lower transfer costs result in savings depends on your use case. For B2 the cost of transfer is 5x a month of storage; for Deep Archive it's 90x.

GCS is also cheaper than S3, with even higher transfer costs, but instant access irrespective of storage class.
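
A rough break-even sketch in Python, assuming approximate 2019 list prices (my figures, not quotes: B2 at $0.005/GB-month storage and $0.01/GB egress, Deep Archive at $0.00099/GB-month and $0.09/GB egress; exact rates vary by region):

    # Back-of-envelope comparison; all prices are assumptions.
    B2_STORE, B2_EGRESS = 0.005, 0.01    # $/GB-month, $/GB
    DA_STORE, DA_EGRESS = 0.00099, 0.09  # Deep Archive, us-east

    def total_cost(store, egress, gb, months, restores=1):
        # storage for `months` months plus `restores` full downloads
        return store * gb * months + egress * gb * restores

    for months in (6, 12, 36):
        b2 = total_cost(B2_STORE, B2_EGRESS, 1024, months)
        da = total_cost(DA_STORE, DA_EGRESS, 1024, months)
        print(f"{months:2d} mo, 1 TB, one restore: B2 ${b2:.2f} vs DA ${da:.2f}")

With a single full restore, B2 wins over short horizons and Deep Archive wins once the data just sits there for years, so it really does depend on your use case.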


It's also not some weird-ass tape system that requires you to build your entire workflow to accommodate it, and then get surprise bills on retrieval because it wasn't as cheap as you thought.


“Today [November 2016] we are replacing the rate-based retrieval fees with simpler per-GB pricing.”

https://aws.amazon.com/blogs/aws/aws-storage-update-s3-glaci...


Not OP, but: no Linux support. I use it on my Windows machines but need something for Linux.


What about rclone? Or are you looking for only official clients?


I may have misread, but I read "actual Backblaze" as their $5/mo per-machine backup service.

I do indeed use B2 for my personal Linux backups


In addition to the other response, regular Backblaze doesn't support Linux.


1 USD per month per TB. Finally a storage tier that allows me to back up my media collection for what’s almost loose change.


But every time you download your 1 TB of media, you pay about $100 in transfer and request fees. At that cost you could nearly buy a 1 TB SSD.


But that drive won't have redundancy, won't be living in a highly protected and secure data center and won't be looked after by a team of skilled professionals.


The transfer fee for 1 TB seems to be about $5 when stored in EU (Frankfurt), for bulk retrieval (24 hrs). Standard retrieval is $24 (12 hrs). If you compress all your files into one or a couple of zip files, you'll pay a tiny amount for the requests.


"Data Transfer OUT" or egress pricing is in addition to bulk retrieval, and it's high. "From Amazon S3 To Internet" $0.09/GB, or $90.81 according to their calculator: https://calculator.s3.amazonaws.com/index.html If you have enough TB to transfer, Snowball should be cheaper: https://aws.amazon.com/snowball/pricing/


I’m not really planning to download it unless there’s a DR event with my local set-up.


Excluding API call charges, retrieval fees, and transfer out charges:

If my math is correct, for N. California, 100 TB in Deep Archive is $204.80 per month, vs. $512 per month in regular Glacier.

For other non-Gov US regions, 100 TB in Deep Archive is $101.376 per month, vs. $409.60 per month in regular Glacier.

[Edit: Addl. pricing info]
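
For anyone checking the arithmetic, the same math as a quick Python sketch (per-GB monthly prices are my reading of the March 2019 price list; treat them as assumptions):

    TB = 1024  # GB
    prices_per_gb_month = {
        "Deep Archive, us-east":   0.00099,
        "Deep Archive, N. Calif.": 0.002,
        "Glacier, us-east":        0.004,
        "Glacier, N. Calif.":      0.005,
    }
    for name, per_gb in prices_per_gb_month.items():
        # 100 TB * price per GB-month
        print(f"{name}: ${per_gb * 100 * TB:,.3f}/month")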


This pricing is finally at the point where I can start considering this for some of the research labs I support.


If you have to retrieve all of it, what is the cost?


I may be missing something (determining which components of AWS pricing apply is not trivial), but the cost I'm getting for retrieval + outgoing bandwidth is about $90/TB. So for the 100 TB example, that would be $9,000.


The retrieval cost itself isn't too extravagant. For 100 TB, it would cost about $250 for "bulk" retrieval (48 hours), or $2,000 for "standard" retrieval (12 hours).

But that only accounts for the cost of retrieving it within AWS. If you want to download the data to your own server, you have to pay the internet egress rate which would add an additional $7,800.
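
The egress tiering, as I understand the 2019 rate card (treat the tier boundaries as assumptions), works out roughly like this:

    # Tiered "To Internet" egress: first 10 TB at $0.09/GB, next 40 TB
    # at $0.085/GB, next 100 TB at $0.07/GB (ignoring the free first GB).
    TIERS = [(10 * 1024, 0.09), (40 * 1024, 0.085), (100 * 1024, 0.07)]

    def egress_cost(gb):
        cost = 0.0
        for tier_gb, rate in TIERS:
            chunk = min(gb, tier_gb)
            cost += chunk * rate
            gb -= chunk
            if gb <= 0:
                break
        return cost

    print(f"${egress_cost(100 * 1024):,.2f}")  # ~$7,987 for 100 TB

Which lands in the same ballpark as the figure above.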


Depends a bit on whether you use Snowball.


Snowball ought to be quite a bit cheaper if you have enough TB to recover: https://aws.amazon.com/snowball/pricing/


It's about half that cost in the us-east regions.


Thanks for that! Updated. Oregon is the same (N. California/us-west-1 diverging from the rest of the non-Gov US regions is pretty much the norm).


Off topic, but did anyone else press play to listen to Polly’s TTS of this article? They seem to have added “inhales” to ostensibly make it more natural, but the pauses and inhales (let alone the quality of the speech) are so off the mark it just sounds really odd. If your voice sounds robotic already, a fake breath only makes it worse.


Ya, weird. The breathing isn't really integrated with the speech. It sounds like someone on a breathing machine, like Christopher Reeve when he would pause as his machine took a breath for him.


I imagine Polly suffers from halitosis.


Isn't Glacier the product that seems very cheap if you only look at the storage cost but is extremely expensive when retrieving the data? I remember reading an article by someone who had to pay something like 2,000 USD to restore his data, which wasn't even in the terabytes. Although I might be mistaken here.


Amazon says the original retrieval pricing matched their cost, but was hard to grok and easy to run up a huge bill if you didn't use it carefully and extract data patiently.

They realized that was a mistake and significantly streamlined the pricing, and with this Deep Archive product it doesn't look like they're even supporting the original, somewhat opaque Glacier API, just S3.

If you're patient and can wait 48 hours for data, bulk retrieval is cheap at $0.0025/GB and $0.025 per 1,000 requests. The standard AWS $0.09/GB egress is the really big cost, but if you have enough data you can mitigate that with a Snowball. That's not a big issue if you're recovering from a catastrophe that destroyed all your local backups; it looks great as insurance for those of us with modest time-to-recovery needs.
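
Kicking off a bulk restore through the S3 API looks roughly like this with boto3 (bucket and key names hypothetical):

    import boto3

    s3 = boto3.client("s3")

    # Request a bulk-tier restore of an archived object; the restored
    # copy then stays readable in S3 for the given number of days.
    s3.restore_object(
        Bucket="my-archive-bucket",
        Key="backup-2019-03.tar.gz",
        RestoreRequest={
            "Days": 7,
            "GlacierJobParameters": {"Tier": "Bulk"},
        },
    )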


Is this the service that is rumoured to be using Blu-ray disc libraries?


Or maybe the newer Blu-ray based Archival Disc, https://en.wikipedia.org/wiki/Archival_Disc which ups per-side storage to 150 GB and uses both sides. But the pricing and retrieval scheduling strongly suggest LTO tape.


Perhaps this is the case, but I remember digging deep one night and seeing a picture of a StorageTek tape system on what I thought was an AWS page. I can't remember the URL. Oracle is trying to compete in this space, so maybe I'm misremembering.


I don't see Amazon locking themselves into an Oracle product given the bad blood between them.


I'm struggling to see the value proposition there. I would imagine you'd get way higher data density per rack using even plain old hard disk drives than you would with Blu-ray disc libraries.

The only way to get close to that would be to have some insanely complicated automated jukebox/storage mechanism, because you'd be relying on essentially piling up disks in a rack.

Just rough back-of-napkin figures: a Blu-ray disc is 4.75" in diameter and 0.05" thick. A 42U rack is 78" x 42" x 24" (more or less).

Assuming you just pack them all in, in great big tall piles: you could get 5 x 8 columns, each 1,560 discs tall.

5 x 8 = 40 columns. 40 x 1,560 = 62,400 discs in a rack. At 150 GB a disc, that's a total of 9,360,000 GB (9.36 PB).

Of course, you probably need to cut that in half at the very least to provide some kind of mechanism for extracting and removing the discs, and for safe storage. I'd consider that generous, but it's a good figure to work from.

And of course you have to consider that any jukebox for doing the writing/retrieval is effectively wasted space. The number of discs you could have in transit there isn't likely to be sufficient to be worth bothering with, so at the very least halve it again. That's also assuming you'd have one jukebox per storage rack.

So... 9.36 / 2 / 2 = 2.34 petabytes average rack density.

Current Backblaze Pods come in at 480 TB per 4U: https://www.backblaze.com/blog/open-source-data-storage-serv... The rack I used for disc storage is the standard 42U. I'm going to assume you'd lose 6U for top-of-rack switches, power, etc., so 9 servers per 42U. 9 x 480 TB = 4.32 petabytes.

So even just with Backblaze Pods you'd get nearly double the data density, and it would all be online, with nearly instantaneous retrievals. Plus you'd be dealing with tried-and-true technology that is likely to be far more reliable and have far fewer unknowns, consisting of fairly easy-to-replace storage media, versus relatively new technology with less certain supply lines and specialized devices associated with it.

edit: I see below that Blu-ray archive discs are reaching 150 GB per side, so 300 GB total. That brings the rack density to about 4.68 PB, just slightly over that of the hard disks, but my final point still stands: choosing a newer technology over well-known hard disks and supply chains would be crazy unless it offered a strong advantage, and I don't see 0.36 PB per rack as that significant.
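
For anyone who wants to poke at these numbers, the same back-of-napkin math as a Python sketch (all dimensions and the halve-twice fudge factor come from the estimate above):

    # Rack-density estimate for a hypothetical Blu-ray cold-storage rack.
    DISC_DIAM, DISC_THICK = 4.75, 0.05   # inches
    RACK_H, RACK_D, RACK_W = 78, 42, 24  # inches, 42U rack

    cols = int(RACK_W // DISC_DIAM) * int(RACK_D // DISC_DIAM)  # 5 x 8 = 40 stacks
    per_stack = round(RACK_H / DISC_THICK)                      # 1,560 discs high
    discs = cols * per_stack                                    # 62,400 discs

    for gb_per_disc in (150, 300):  # single- vs double-sided Archival Disc
        raw_pb = discs * gb_per_disc / 1_000_000
        usable_pb = raw_pb / 4  # halve twice: access mechanism + jukebox space
        print(f"{gb_per_disc} GB/disc: raw {raw_pb:.2f} PB, usable ~{usable_pb:.2f} PB")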



