We’re about to start a project to build an LTO-9 based in-house backup system. Any suggestions for DIY Linux based operation doing it “correctly” would be appreciated. Preliminary planning is to have one drive system in our primary data center and another offsite at an office center where tapes are verified before storage in a locked fireproof storage cabinet. Tips on good small business suppliers and gear models would be a great help.
If you are having trouble getting 10TB of disk space from IT, you have bad IT. Not saying that's uncommon or anything, but 10TB fits on one external hard drive for $300 from Best Buy, or less than $1000/mo using EBS on AWS if you need some better guarantees and are all-in on the cloud.
> Tapes are fun. You can fit a petabyte of data in a bankers box!
Yes, though those ultra-thin M.2 NVMe drives could probably top that now.
> Do we have a drive that can read this tape?
Don't let this be a problem in the first place: buy 4 tape drives and keep 2 of them in your cold/offsite/airgapped storage site (2 in case 1 fails, so you can use the remaining drive to transfer everything to a newer format).
> Do we have server we can connect it to?
Significant hardware should not be (and is not) necessary: the LTO-7/8/9 drives I see for sale right now seem to be using either USB 3.0, Thunderbolt, or SAS connections; USB and Thunderbolt can be handled by any computer you can find at a PC recycler today, while any old desktop can handle SAS with an $80 HBA card.
> Do we have storage we can extract it to? (go ask your internal IT team for 10TB of drivespace...)
10TB isn't a good example (case-in-point: I have a 3-year-old stack of unused 12TB WD drives less than 3 feet away from me).
That said, if you're enterprise-y enough for a $4000 LTO-9 drive, then you probably also have a SAN that's chock full of drives, so being able to provision a 10TB+ LUN should be implicit.
> What program did we create this tape with? Backup Exec, Veritas, ArcServe, SureStore
Ideally, none of those; instead, good ol' Perl and `dd`.
> You have the encryption keys, right?
I don't encrypt my backups to avoid this problem. My old data archives have little exploitable value for any potential attacker; and I imagine I'd store backup tapes this important in a fireproof safe in my parents' house or something. I'd only encrypt the entire tape if the tape were to leave my custody.
I appreciate that this is not for everyone, and it's probably illegal for some people/orgs to not encrypt backups too anyway (HIPAA, etc).
> How much of this data already exists on the previous months backup?
Incremental/Differential backups are still a thing.
> Who's going to pay for the storage to move it to Glacier/etc?
No-one should. Cold data backups should/must always be in the custody of a designated responsible officer of the company.
BUUUT, I guess there's nothing wrong with storing an encrypted copy in Cloud storage (as in S3/Glacier/AzureBlobs - not OneDrive...). I actually do this right now thanks to the smooth and painless integration in my Synology NAS. It costs me about $15/mo to store all these TBs in S3.
> How long is it going to take to upload?
Consider that it's 2024 - a company with LTO-9 and SAN is probably going to have a metro-ethernet IP connection at 10Gbps or even faster. At home I have a 10Gbps symmetric connection from Ziply (it's $300/mo and they give you an SFP module, which I put into my Ubiquiti UDM): so the limiting factor here is not my upload speed, but my drive read speed (LTO-9 drives seem to read at about 2-3Gbps raw/uncompressed?)
Make sure the bandwidth exists to keep up with the write speed of the LTO drive. For instance, the native write speed for LTO-6 (which I own as a hobbyist) is around 160MB/s, and more with compressible data, but line speed of gigabit Ethernet is about 100MB/s. Translate those numbers to LTO-9 and make sure that the NAS, network, or local storage can keep up. It's not a deal-breaker to underrun the drive, but it causes the tape to stop, rewind, and re-buffer (called shoe-shining), which takes more time and causes unnecessary wear on the drive and cartridges.
Nothing is fire proof. Is the cabinet "fire suppression system liquid" proof?
> Tips on good small business suppliers and gear models would be great help.
Hiring an auditor would be my advice. Every business is different.
I am, just now, having flashbacks of when I was in a SOX environment and had to regularly contract with them... and while the experience can be somewhat unpleasant I've often found good auditors to be extremely knowledgeable about solutions and their practical implementation considerations.
They're not fully sealed. There are two shafts which connect the King's chamber to the exterior of the pyramid. The lower ritual congregation area is not fully sealed off from the upper chambers either. Which means bats are a constant problem in pyramids.
Archival is more of a process than only a question of media. First you must create a proper database of your archived data.
Maybe you want to do 3 copies, not two. Maybe you want to use two different archival formats such as tar and LTFS, just in case. Maybe you want to source your media from both available producers (Sony and Fuji) because in the long run, maybe one or the other may grow some funky error mode or corruption problem. Etc.
LTO-9 tapes can be easily found on Amazon in many countries, made by IBM, HP, Quantum or Fuji.
The vendor does not matter, whichever happens to be cheaper at the moment is fine.
For the tape drives, the internal drives can be cheaper by around 10%, but I prefer the tabletop drives, because they are less prone to accumulate dust, especially if you switch them on only when doing a backup or a retrieval. The tape drives have usually very noisy fans, because they are expected to be used in isolated server rooms.
I believe that the cheapest tape drives from a reputable manufacturer are those from Quantum. I have been using a Quantum LTO-7 tape drive for about 7 or 8 years and I have been content with it. Looking now at the prices, it should be possible to find a tabletop LTO-9 drive for no more than $5000. Unfortunately, the prices for tape drives have been increasing. When I bought an LTO-7 tabletop drive many years ago it was only slightly more than $3000.
The tapes are much cheaper and much more reliable than hard disks, but because of the very expensive tape drive you need to store a few hundred TB to begin to save money over hard disks. You should normally make at least two copies of any tape that is intended for long-term archiving (to be stored in different places), which will shorten the time until reaching the threshold of breaking even with HDDs.
Even if there are applications that simulate the existence of a file system on a tape, which can be used even by a naive user to just copy files on a tape, like copying files between disks, they are quite slow and inefficient in comparison to just using raw tape commands with the traditional UNIX utility "mt".
It is possible to write some very simple scripts that use "mt" and which allow the appending of a number of files to a tape or the reading of a number of consecutive files from a tape, starting from the nth file since the beginning of a tape. So if you are using only raw "mt" commands, you can identify the archived files only by their ordinal number since the beginning of the tape.
This is enough for me, because I prepare the files for backup by copying them in some directory, making an index of that directory, then compressing it and encrypting it. I send to the tape only encrypted and compressed archive files, so I disable the internal compression of the tape drive, which would be useless.
I store the information about the content of the archives stored on tapes (which includes all relevant file metadata for each file contained in the compressed archives, including file name, path name, file length, modification time, a hash of the file content) in a database. Whenever I need archived data, I search the database, to determine that it can be found, for instance in tape 63, file 102. Then I can insert the corresponding cartridge in the drive and I give the command to retrieve file 102.
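The lookup step described above can be sketched in a few lines of shell. This is only an illustration, not the poster's actual setup: a tab-separated text file stands in for the database, and the column layout and the function name `find_on_tape` are assumptions made up for this example.

```shell
#!/bin/sh
# Hypothetical flat-file index, one line per archived file, tab-separated:
#   tape_number  tape_file_number  path  size_bytes  mtime  content_hash
# A real setup would more likely use a proper database; this is a stand-in.

# Print "tape N, file M" for every index entry whose path equals $2.
find_on_tape() {
    index_file=$1
    wanted_path=$2
    awk -F '\t' -v p="$wanted_path" \
        '$3 == p { printf "tape %s, file %s\n", $1, $2 }' "$index_file"
}
```

With an index line recording tape 63 and file 102 for a given path, `find_on_tape index.tsv /that/path` prints `tape 63, file 102`, which tells you which cartridge to insert and which tape file to retrieve.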
I consider the "mt" utility of FreeBSD much better than that of Linux. The Linux magnetic tape utilities have seen little maintenance for many years.
Because of that, when I make backups or retrievals they go to a server that runs FreeBSD, on which the SAS HBA card is installed. When a tabletop drive is used, the SAS HBA card must have external SAS connectors, to allow the use of an appropriate cable. I actually reboot that server into FreeBSD for doing backups or retrievals, which is easy because I boot it from Ethernet with PXE, so I can select remotely what OS to be booted. One could also use a FreeBSD VM on a Linux server, with pass-through of the SAS HBA card, but I have not tried to do this.
My servers are connected with 10 Gb/s Ethernet links, which does not differ much from the SAS speed, so they do not slow much the backup/retrieval speed. I transfer the archive files with rsync over ssh. On slow computers and internal networks one can use rsync without ssh. I give the commands for the tape drive from the computer that is backed up, as one line commands executed remotely by ssh.
The archive that is transferred is stored in a RAMdisk before being written on the tape, to ensure that the tape is written at the maximum speed. I write to the tape archive files that have usually a size of up to about 60 GB (I split any files bigger than that; e.g. there are BluRay movies of up to 100 GB). The server has a memory of 128 GB, so I can configure on it a RAMdisk of up to 80 GB without problems. This method can be used even with a slow 1 Gb/s or 2.5 Gb/s network, but then uploading a file through Ethernet would take much more time than writing or reading the tape.
There is one weird feature of the raw "mt" commands, which is poorly documented, so it took me some time to discover it, during which I have wasted some tape space.
When you append files to a partially written tape, you first give a command to go to the end of the written part of the tape. However, you must not start writing, because the head is not positioned correctly. You must go 2 file marks backwards, then 1 file mark forwards. Only then is the head positioned correctly and you can write the next archived file. Otherwise there would be 1 empty file intercalated at each point where you have finished appending a number of files and then you have rewound the tape and then you have appended again other files at the end.
A lot of very interesting details in your reply - thanks. I have this question:
If you weren’t budget constrained today and had to set it all up again, what would you do?
While I’m a Linux guy, I’ll happily run BSDs when appropriate, like for pfSense, and if it really has better mt tools or drivers for LTO-9 drives due to the culture/contributors being more old school, then I’d just grab a 1U server to dedicate to it, run a BSD on it, and attach the drive to that.
You seem to have extensive practical hands on experience and while I was doing tapes 20 years ago this will be first time I’m hands on again with it since then. So I need to research most reliable drive vendors and state of kernel drivers and tools, just as you are alluding to.
Pretend you have $50K if needed (doubt it). 2PB existing data, 1PB/year targeted rate, probably 10-20%/year acceleration on that rate. with a data center rack location, 20Gb/s interconnect via bonded 10Gb NICs to storage servers (45drives storinators) and then an office center cabinet/rack/desk (your choice) and will put a tape drive holding at least 8 tapes in data center, planning for worst case of 100TB a month and data center visits to swap in new tapes shouldn’t be too frequent. Any details on what you would do would be interesting.
Like I have said, it is not necessary to dedicate a full-time FreeBSD server for this, you can use either a Linux server that is rebooted temporarily in FreeBSD or a FreeBSD virtual machine on the Linux server.
Around $5000 to $6000 should be enough for an LTO-9 tabletop tape drive plus a suitable SAS HBA card and SAS cable. The card must have matching SAS connectors and SAS speed with the tape drive.
More money will not bring anything extra until a much higher amount is reached, which would be enough to buy a tape autoloader/library, which would eliminate the necessity for a human to insert and remove the cartridges into the tape drive when needed. I am not sure if $50K is enough for a tape autoloader.
Tape autoloaders/libraries are worthwhile only for very big organizations where the amount of data that is continuously written or read to or from the tapes is very large. For a small business or for an individual a tape autoloader is certainly not worthwhile, because the tape drive will be in use at most a small fraction of every day.
1 PB/year is less than 3 TB/day. This can be written on a single tape in a little more than 2 hours. Even with a simple non-pipelined implementation of the file uploading with the writing on the tape, the backup can be done in less than 4 hours. Even writing 2 copies can be done in less than 8 hours. The backup can be done mostly or completely overnight.
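The arithmetic above can be checked with a few lines of shell. This is only a sketch: the 400 MB/s used in the usage note is an assumed sustained native LTO-9 write rate that is not stated in this thread, and the function names are made up for illustration.

```shell
#!/bin/sh
# Rough backup-window arithmetic with integer math and decimal units.

# Average daily volume in GB for a yearly volume given in TB.
daily_gb() {
    yearly_tb=$1
    echo $(( yearly_tb * 1000 / 365 ))
}

# Whole minutes needed to stream $1 GB at an assumed sustained $2 MB/s.
write_minutes() {
    gb=$1
    mb_per_s=$2
    echo $(( gb * 1000 / mb_per_s / 60 ))
}
```

`daily_gb 1000` gives 2739 GB per day for 1 PB/year, and `write_minutes 3000 400` gives 125 minutes for the rounded-up 3 TB, which matches "a little more than 2 hours" under that assumed rate.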
For a much bigger amount of data one could buy several tape drives, before starting to think about an autoloader. Also it is possible to pipeline the network transfers with the tape writing, for a backup speed higher by around 50%.
If money were not a problem and if the data needs to be archived for a long term, so that multiple copies are desirable, I would buy 2 tape drives, to be able to write 2 copies simultaneously.
This would also halve the time for archiving the initial 2 PB of existing data, which will take several months, so a speed-up would be desirable. Having 2 drives will also increase the reliability, as the system will continue to work if one becomes defective.
With only 3 TB written per day, an LTO-9 tape, which has a capacity of 18 TB, will be enough for 6 days.
So unless a backup must be restored, the operator would need to change the tape only once per week.
This is a moderate amount of data, easy to handle with a single drive, even if two are preferable for redundancy and for higher speed.
I do not understand your reference to "a tape drive holding at least 8 tapes in data center". If you mean an autoloader, from what you describe it does not seem that the very big expense for an autoloader would be justified.
The LTO tapes are best stored in suitcases that can contain 20 cartridges, i.e. when using LTO-9 that is 360 TB. Therefore 3 suitcases store more than 1 PB, i.e. a year of data according to your example. The suitcases should be stored in a secure safe or cabinet. They are usually made to be stackable.
I have assumed that your 1 PB is of already compressed data. If the data is compressible then the requirements for the usage time of the drives and for the storage volume would be much smaller.
I have forgotten to mention that after I compress and encrypt the archived files, I add redundancy with a Reed-Solomon code, e.g. with the par2 program. If I choose e.g. a redundancy of 5%, then a file retrieved from the magnetic tape could have defects of up to 5% of its size, while the original data could still be extracted from it.
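A minimal sketch of that step with par2cmdline, assuming the `par2` binary is installed. The function names and the default of 5% are illustrative choices; the thread does not show the poster's real script.

```shell
#!/bin/sh
# Add Reed-Solomon recovery data to an already compressed and encrypted
# archive, and check it again after retrieval from tape.

# Create recovery files for archive $1 with $2 percent redundancy (default 5).
add_recovery_data() {
    archive=$1
    percent=${2:-5}
    par2 create -r"$percent" "$archive.par2" "$archive"
}

# Verify a retrieved archive; attempt a repair only if verification fails.
check_or_repair() {
    archive=$1
    par2 verify "$archive.par2" || par2 repair "$archive.par2"
}
```

The generated `.par2` files have to be stored alongside the archive for the repair to be possible later.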
Excellent help. To clarify a few items:
- yes I mean drives with autoloader. for example: https://www.backupworks.com/qualstar-Q24-LTO-9-SAS-Library.a...
it’s basically a hard requirement as we don’t have staff time to enter data centers frequently. we are a bit unusual in being certainly not big, but not really a small business either when looking at budgets available. unless there is something wrong with qualstar product linked above perhaps autoloaders are cheaper than you believed?
- understood your rebooting trick. however being fully automated (apart from blank tape rotations) is also a requirement. it’s a production infrastructure. if FreeBSD provides significant value it seems safer to spec a dedicated 1U server to use for backups. there is a management node currently that might work, though that has to run Linux as it currently does and I need to check if the SAS on it can be used. It has a bunch of SAS ssd drives currently and I would have assumed there is a way to cable up the qualstar drive … but again I’m still early in researching. and the SAS compatibility issue you raise is a perfect example of stuff I need to figure out.
- love par2cmdline and our burner with mdisc for IP backup uses that on git repo files and then seqbox as an outer container for data to guard against potential fs metadata corruption issues. there was a newer low level tool (rust rewrite I think) with many bitrot protection features whose name I can’t recall currently and which isn’t immediately coming up in my notes, but I know it exists and have been meaning to look into it. it has a newer erasure encoding like raptorq and also block metadata like seqbox, so I think it can replace the par2 seqbox combo we are currently using on MDISC physical backup for IP. I don’t trust a 100% cloud as one can imagine somehow getting all accounts hacked and deleted.
- yes on compressed. the 2PB is already highly highly compressed. so it means 18TB/tape.
Do you have any vendor/distributors you can recommend? I always recommend 45drives to people and I was planning to ask them about LTO when we order next storinator which is coming up soon also.
There is this interesting blog post from a couple of years ago that probably was the seed of my plan to embark on LTO. Our monthly backblaze invoice is totally out of control. But we need a full backup of our data as it’s simply not replaceable and at the heart of the business.
If you would use the full configuration with 2 tape drives, the cost of the system might be around $15k, which is very reasonable for a tape library with autoloader.
I think that this autoloader is a good choice, especially if the price includes "1 x IBM LTO-9 SAS Tape Drive Installed".
As I have said, I believe that it is better to choose the option of also including the second tape drive.
For the tapes, there is no reason to worry about specific distributors. I have always bought them from Amazon, but shops that are specialized in storage products should be OK, unless they charge a premium price over what can be found at Amazon or Newegg. While the tapes are made by Fuji or Sony, they are usually easier to find and at lower prices as IBM, HP or Quantum branded tapes.
The prices vary, so whichever vendor is cheaper when you buy a batch of tapes should be fine. An LTO-9 cartridge should be only slightly over $100. In time the prices of LTO-9 cartridges should drop. For now they are more expensive than the older cartridges, because they are still relatively new.
You must check the tape drive requirements for the SAS HBA PCIe card that must be installed in the server, which must have compatible connectors, and you must buy an appropriate SAS cable. I believe that the LTO-9 drives require the newer 12 Gb/s SAS standard and also the newer variant of the external SAS connectors (perhaps SAS HD SFF-8644 connectors).
If you already have a 12 Gb/s SAS HBA that has only internal connectors for SSDs, it is possible to reuse it by buying a SAS internal to external adapter of the appropriate connector types, which must occupy one of the empty expansion slots of the server case and which plugs into the internal connectors, while providing external connectors. Such adapters can also be used with server motherboards that have on-board SAS controllers. If you have a SAS HBA card that has external connectors, but different from those on the tape drive, e.g. SAS SFF-8088, there are cables with mixed SAS connectors that can connect the tape drives. The HBA cards usually have at least 2 external SAS connectors, suitable for 2 tape drives.
With the autoloader, it should be easy to make the backup or retrieval process completely automatic, so that an operator should not have to visit the tape autoloader more often than at a few months interval, except for the initial phase when you would have to write 2 PB on almost 120 tapes (or a double number for improved redundancy, beyond the redundancy added per each archive file; 2 copies can be stored in 2 different geographic locations, to avoid the catastrophic loss of all tapes), so you would want to keep the tape autoloader in an easily accessible place for that time.
The initial cost for writing 2 copies of 2 PB of data, i.e. 4 PB of data, would be not much less than $30k for the tapes. This, together with the autoloader with 2 tape drives, HBA card, cases, cables and maybe adapters, would be in the range of $45k to $50k, so within your estimated budget.
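The tape count and media cost can be estimated with a short shell sketch. The $130 per cartridge used in the usage note is an assumption standing in for "slightly over $100", and the function names are hypothetical.

```shell
#!/bin/sh
# Media estimate for the initial migration, integer math only.

# Cartridges needed for $1 TB of data on tapes of $2 TB each, rounded up.
tapes_needed() {
    data_tb=$1
    tape_tb=$2
    echo $(( (data_tb + tape_tb - 1) / tape_tb ))
}

# Total media cost in dollars for $1 cartridges at $2 dollars each.
media_cost() {
    cartridges=$1
    price_each=$2
    echo $(( cartridges * price_each ))
}
```

`tapes_needed 2000 18` gives 112 cartridges for one copy of 2 PB and `tapes_needed 4000 18` gives 223 for two copies; `media_cost 223 130` then comes to 28990, in line with "not much less than $30k". Writing the two copies as separate tape sets would need 2 x 112 = 224 cartridges instead.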
As I have said, it is convenient to have a database with the metadata (including content hashes, made e.g. with BLAKE2b-512 or with BLAKE3-256) of all the files that have ever been archived, which shall be used whenever information must be retrieved and which can also be used for deduplication (for which the content hashes are handy), to check whether a file is already present in some earlier archive, so there is no need for its backup.
I want to add that when you start testing the tape drives, one of the first things that you need to do is to measure the exact capacity of an 18 TB LTO-9 tape cartridge.
For instance, I write the tapes with "dd bs=131072 if="$file_name" of=/dev/nsa0". This means that I am using 128 kB blocks. I have measured that a 6 TB LTO-7 tape cartridge has a capacity of 45905860 such 128 kB blocks.
The position of the read/write head, measured in blocks from the beginning of the tape, can be obtained with "mt rdspos". After you choose some block size, e.g. 128 kB, you should forever stick with it in all your write commands and on all your tapes, so that you will always get consistent information about the position of the read/write head.
The tape capacity can be measured by writing files, preferably of the same size that you will typically use for archives (in order to write a similar number of file marks), until you get a write error.
With the capacity of the tape known exactly, after any writing of a new file you get the current position and you compute the remaining free space on the tape, to know whether you can still append data or you must change the tape.
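The free-space check can be sketched as follows. It assumes the capacity and the current position are both expressed in 128 kB blocks and are passed in as plain numbers, for example the position reported by "mt rdspos" copied by hand; parsing that output is left out, and the function names are made up for illustration.

```shell
#!/bin/sh
# Free-space bookkeeping in tape blocks. This is an approximation: it
# ignores the space taken by file marks and any short final block.
BLOCK_SIZE=131072   # bytes per block, matching "dd bs=131072"

# Bytes still free on a tape of $1 blocks when the head is at block $2.
remaining_bytes() {
    capacity_blocks=$1
    position_blocks=$2
    echo $(( (capacity_blocks - position_blocks) * BLOCK_SIZE ))
}

# Succeeds if a file of $3 bytes still fits on a tape of $1 blocks
# whose head is at block $2; fails otherwise.
tape_has_room() {
    [ "$(remaining_bytes "$1" "$2")" -ge "$3" ]
}
```

For example, with the measured LTO-7 capacity of 45905860 blocks, `tape_has_room 45905860 "$position" "$archive_bytes" || echo "change the tape"` tells the script when to stop appending.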
The position in blocks can also be used to verify that the tape drive works OK. For example when after rewinding the tape you go to the end of the written part, to append new files, you must see the same position as after your last write. Or when writing a copy of a tape, you must see the same positions on both tapes for any file.
For retrieving files, the position in blocks does not matter, but only the ordinal number of a file. You position the read/write head to the beginning of a file with "mt rewind; mt fsf $file_number". Then you read the file, possibly in a loop if you want to read multiple consecutive files.
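The retrieval steps above can be wrapped in a small function. This is a sketch under assumptions: the drive is the non-rewinding device /dev/nsa0 used in the dd example earlier, the block size is the same 131072 bytes, and the name `tape_read_file` is hypothetical.

```shell
#!/bin/sh
# Read one archive from tape by its ordinal number.
TAPE_DEV=${TAPE_DEV:-/dev/nsa0}   # assumed non-rewinding device node
BLOCK_SIZE=131072                 # same block size as used for writing

# Rewind, skip forward $1 file marks, then copy that tape file to path $2.
tape_read_file() {
    file_number=$1
    out_path=$2
    mt -f "$TAPE_DEV" rewind &&
    mt -f "$TAPE_DEV" fsf "$file_number" &&
    dd if="$TAPE_DEV" of="$out_path" bs="$BLOCK_SIZE"
}
```

Calling `tape_read_file 102 /restore/archive_102` after inserting the right cartridge runs the same sequence as the commands quoted above; each step only runs if the previous one succeeded.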
For going to the end of the written part of a tape, to append new files, you must use "mt locate -e; mt bsf 2; mt fsf", as I have mentioned in a previous posting. The explanation of why this is needed is buried in the documentation about how tape marks and head positioning really work.
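Combined with the dd write command shown earlier, the append sequence might look like the sketch below. It assumes FreeBSD's mt (for "locate -e"), the /dev/nsa0 device, and a tape that already holds at least one file; the function name `tape_append_file` is made up for this example.

```shell
#!/bin/sh
# Append one archive after the last file already written on the tape.
TAPE_DEV=${TAPE_DEV:-/dev/nsa0}   # assumed non-rewinding FreeBSD device node
BLOCK_SIZE=131072                 # keep the same block size on every tape

# Go to end of data, step back 2 file marks, forward 1, then write $1.
tape_append_file() {
    src=$1
    mt -f "$TAPE_DEV" locate -e &&
    mt -f "$TAPE_DEV" bsf 2 &&
    mt -f "$TAPE_DEV" fsf &&
    dd if="$src" of="$TAPE_DEV" bs="$BLOCK_SIZE"
}
```

On a blank tape the backward skip has nothing to step over, so the first file on a cartridge would be written with the plain dd command instead.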
Whenever I start using the tape drive, I use "mt comp off; mt status" and I check the status output to be as expected.
The tape is ejected with "mt -f /dev/esa0 rewind" (on FreeBSD, the /dev/esa0 device node ejects the cartridge when it is closed).
At the currently advertised reduced price of $7226, the Quantum SuperLoader 3 would be a good choice.
I would buy 2 of them, which together with all the other items and with 4 PB of tapes for the migration of the existing data would not exceed your estimated budget of $50k.
I assume that for this price you might get the 8-slot version. Quantum SuperLoader 3 can be extended to 16 slots, but I assume that for this you must buy an additional 8-cartridge removable active magazine. You should check the price for that.
Because I had good experience with the reliability of my Quantum tabletop tape drive, I would recommend this Quantum autoloader. Moreover, its datasheet includes all the expected information about reliability parameters, so they are tested by the manufacturer.
I consider the included backup software as useless. You should write your own backup scripts. You might need a few days for this, depending on the previous experience and on the support provided by the utilities specific to the file systems that you happen to use, but then you can be worry-free for years, unlike when you depend for all your precious data on a black box proprietary program, which cannot be trusted to do the right thing, and which might write data in a format that cannot be recovered with any other tool (without an extensive reverse engineering work).
Regarding Linux's "mt", there are two versions: the horrible, primitive version that comes with cpio and is almost certainly the one that's installed by default, and "mt-st", the actually usable one.
Great post. You might be able to replace the RAM disk with the "mbuffer" command. My script uses a combination of dd | pv | mbuffer | mt. I omitted the options because I don't remember any of them. I personally use dd of an ext4 filesystem-on-file that is exactly the size of what will fit on tape. This was simply because I couldn't figure out how to reliably advance the tape head or how to continue a write from one tape to another.
The advice I got long ago from an IT guy was: if you wait long enough, tape will be on top again.
That was a long time ago but I’ve peeked in at backup systems in the intervening years and it does seem to hold true over time.
But it really depends how much data you have. My ex dropped a single HDD in a safety deposit box at CoB, N times per week and fetched back the oldest disk. I don’t think she ever said how many were in there but I doubt it was more than three. I think the CTO took one home with him once per week.
The silly thing about most of this set up is that the office, the bank, and the data center were all within half a kilometer of each other. If something bad happened to that part of town they only had the infrequent offsite backup.
Every time I've looked, tapes were on top "again" for large scale archival. And I've been looking for ~20 years by now.
I don't get where people get the impression that X was at the top right before tapes got that last innovation (where X here is most often HDDs, but not always). But that's always the impression, and tapes are always on top.
People also have been working with 3D phase change drives since the 90s. Those always promise to replace tapes. But nobody ever got them robust enough to leave the labs.
Tape might win for large-scale systems, but it's basically dead for home-office scale.
You used to be able to get modestly priced tape units for home use from the old "connects to the floppy controller" units with capacities in the tens of megabytes, up to some late-gen SCSI/IDE/parallel/early USB models that would be a couple of gigabytes, but still at home-friendly prices. What's today's answer? An enterprise-grade device that might put 10TB+ on a tape, but comes with a four-figure price tag and isn't really sold at Best Buy.
If I want to back up the house today (maybe 4 active PCs, 5-6TB of total space), affordable choices are pretty much disc-based. I could choose a cheap NAS (ended up doing that with an old fanless Atom machine and a used 12TB datacentre drive) or get a USB-attached external drive. Even if I used BD-XL media, even my modest needs would be dozens of discs, plus getting a writer in a shrinking market. There are plenty of datahoarders with much bigger needs, but even for them, tape is completely outside the addressable market.
Amount of data makes it less realistic. We have around 2PB data currently and expect to grow around 1PB next year with maybe 10-30% annual growth rate.
$50K if needed but it doesn’t look to need that. 2PB initial data. predicted 1PB/year with around 10-30%/year rate of growth of rate of growth (acceleration?)