huntaub · 2024-11-19T06:52:24 1731999144

Hey, thanks for reaching out. The caching layer does return success before writing to S3 -- that's how we get good performance for all operations, including those which aren't possible to do in S3 efficiently (such as random writes, renames, or file appends). Because the caching layer is durable, we can safely asynchronously apply these changes to the S3 bucket. Most operations appear in the S3 bucket within a minute!
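To make the "acknowledge first, apply to S3 later" idea concrete, here is a minimal write-back cache sketch. It is not Regatta's implementation: `WriteBackCache`, `drain`, and the dicts standing in for the durable cache and the S3 bucket are all hypothetical names chosen for illustration, and a real cache layer would persist and replicate data before acknowledging.

```python
import queue
import threading


class WriteBackCache:
    """Toy write-back cache: acknowledge after the cache write, flush later."""

    def __init__(self, backing_store):
        self.backing_store = backing_store  # dict standing in for an S3 bucket (assumption)
        self.cache = {}                     # dict standing in for the durable cache layer
        self.pending = queue.Queue()        # keys waiting to be applied to the backing store
        worker = threading.Thread(target=self._flush_loop, daemon=True)
        worker.start()

    def write(self, key, data):
        # The caller gets success as soon as the cache holds the data.
        self.cache[key] = data
        self.pending.put(key)
        return True

    def read(self, key):
        # Recent writes are served from the cache; anything else comes from the bucket.
        if key in self.cache:
            return self.cache[key]
        return self.backing_store[key]

    def _flush_loop(self):
        # Background thread that applies queued writes asynchronously.
        while True:
            key = self.pending.get()
            self.backing_store[key] = self.cache[key]
            self.pending.task_done()

    def drain(self):
        # Block until every queued write has reached the backing store.
        self.pending.join()
```

Calling `write` returns immediately, `read` sees the new data right away, and the bucket catches up once the background thread (or an explicit `drain`) has run.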

mbrt · 2024-11-19T07:17:39 1732000659

Very nice, I like the approach. I assume data is partitioned and each file is handled by an elected leader? If data is replicated, you still need a consensus algorithm on updates.

How are concurrent updates to the same file handled? Either only one client can open the file for writing at any one time, or you need fencing tokens.

huntaub · 2024-11-19T07:38:51 1732001931

Without getting too much into internals which could change at any time, yes. You have to replicate, partition, and run consensus over the data to achieve high durability and availability.

For concurrent updates, the standard practice for remote file systems is to use file locking to coordinate concurrent writes. Without locks, NFS makes no guarantees about the ordering of WRITE operations. If you're talking about concurrent writes to the same file from NFS and S3 simultaneously, that leads to undefined behavior. We think this is okay as long as we do a good job of detecting it and alerting the user when it occurs, because we don't think any applications are currently written to do this kind of simultaneous data editing (Regatta didn't exist yet).
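As a rough illustration of the file-locking approach, here is a sketch using POSIX advisory locks through Python's standard `fcntl` module. The helper name `locked_append` is made up for this example, and advisory locks only help if every writer takes them; nothing here is specific to Regatta.

```python
import fcntl
import os


def locked_append(path, data):
    """Append bytes to `path` while holding an exclusive advisory lock."""
    with open(path, "ab") as f:
        # Blocks until no other process holds a conflicting lock on the file.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            written = f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data out before releasing the lock
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    return written
```

Two clients that both go through `locked_append` take turns, so their appends do not interleave inside a single call.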

mbrt · 2024-11-19T08:00:51 1732003251

Thanks for the details!

Consistency for an individual file can be guaranteed this way, but I don't think this works across multiple files (as you need a global total order of operations). In any case, this is a pragmatic solution, and I like the tradeoffs. Comparing against NFS rather than Spanner seems the right way to look at it.

huntaub · 2024-11-19T15:34:22 1732030462

This is also interesting, in that I don't think the file system paradigm requires a global total ordering of operations (and, in fact, many file systems don't provide one). I know that sounds like snapshots wouldn't be valid, but I think that applications which really care about data consistency (such as databases) are built specifically to handle this (with things like write-ahead logs).
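For anyone who hasn't seen one, this is roughly how a write-ahead log lets an application recover without relying on the file system for ordering. It is a generic sketch, not how any particular database does it: the record format (length plus CRC32 header) and the names `wal_append` and `wal_replay` are assumptions for illustration.

```python
import os
import struct
import zlib

# Each record is: payload length, CRC32 of the payload, then the payload itself.
HEADER = struct.Struct("<II")


def wal_append(log_path, payload):
    """Append one record and fsync before reporting the write as committed."""
    with open(log_path, "ab") as f:
        f.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        f.flush()
        os.fsync(f.fileno())


def wal_replay(log_path):
    """Return every intact record, stopping at the first torn or corrupt one."""
    records = []
    with open(log_path, "rb") as f:
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                break  # clean end of log, or a header cut off mid-write
            length, crc = HEADER.unpack(header)
            payload = f.read(length)
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # partial or corrupted record: ignore it and everything after
            records.append(payload)
    return records
```

After a crash, the application replays the log and only trusts records that pass the checksum, so a snapshot that catches a half-written record is still recoverable.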

ignoramous · 2024-11-19T20:22:07 1732047727

Is Regatta a write-through cache for the S3 bucket under its supervision? I guess external changes to that bucket are a no-no, then?

Any plans to expand to other stores, like R2 (I ask since unlike S3, R2 egress is free)?

huntaub · 2024-11-19T20:41:43 1732048903

Hey there, that's sort of the correct way to think about it -- notably that our caching layer is high-durability, so we can keep recent writes in the cache safely. External changes to the bucket are okay! Lots of customers need to (for example) ingest data into S3, then process it on a file system, and that totally works. The only thing that isn't supported is editing the same file from both S3 and the file system simultaneously. We think this is a super rare case, and probably doesn't exist today (because there isn't anything that bridges S3 and file semantics yet).

We support all S3-compatible storage services today, including R2, GCS, and MinIO.

ignoramous · 2024-11-20T06:37:09 1732084629

I actually asked about R2 to see if Regatta's pricing is any different as there's no egress fee. I should have been clearer.

btw, thanks a bunch for answering my Q & everyone else's too (except for parts where you couldn't talk about the implementation, understandably so). Appreciate it. Wishing the best.

huntaub · 2024-11-19T06:22:08 1731997328

Wow, thanks for coming out! I hope that you're heartened to see the number of people who immediately think of JuiceFS when they see our launch. I totally agree with you, storage is such an interesting space to work in, and I'm excited that there are so many great products out there to fit the different needs of customers.

huntaub · 2024-11-19T05:57:25 1731995845

Thanks for the question. Full disclosure, I'm grabbing this response from another comment:

I have mutual friends with some of the Nasuni folks, and I have a lot of respect for what they do. In particular, Nasuni stores data in a proprietary block format in your S3 bucket, so you can't connect it to existing data sets or use that data directly from S3 out the other side. Whereas with Regatta, we store data in its native format in S3 so you can do these things.

huntaub · 2024-11-19T05:56:25 1731995785

We chose NFS purely because it's the fastest way to get broad compatibility with most operating systems (NFSv3, for example, is supported on both Linux and Windows). However, I have great news for you! We're simultaneously working on a custom protocol (over FUSE today) that is going to solve the small file problem for things like cloning the Linux kernel git repo. You can actually see in our demo video (https://youtu.be/xh1q5p7E4JY?feature=shared&t=170) that we untar the Linux kernel on Regatta in under 12 seconds. We're hopeful that this performance makes file storage useful for a broader set of workloads.

amitizle · 2024-11-21T19:27:04 1732217224

Great! Thanks!

huntaub · 2024-11-19T05:53:37 1731995617

Hey, thanks for the question. From what I can tell (and I could be wrong), s3ql uses S3 as a block layer. Regatta, on the other hand, allows you to read and write files in their native format. I agree that it's harder to implement than just using S3 for block storage, but I think that it unlocks a lot of potential use cases for customers. With Regatta, we make these semantics performant, which is a huge improvement on the prior art.

huntaub · 2024-11-19T05:22:59 1731993779

Hey there, thanks for your note. I think that the answer here (as with all good questions) is "it depends".

I agree with you, object storage excels at making the storage interface super simple to use (POSIX is incredibly complex). However, that doesn't change the reality that nearly all software still reads and writes data through a local file system interface.

Whether or not a translation layer will save you money depends largely on what you're comparing it to. If you have an EBS volume that's 20% full, then I guarantee you that Regatta's storage costs will be cheaper than EBS, even if you don't ever tier to S3. Tiering is just a cherry on top for workloads which may have unpredictable access patterns and don't want all of their data to be hot when not in use.
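The arithmetic behind the "20% full" point can be sketched like this. The function name and the price unit are placeholders (no real EBS or Regatta prices are used); the only assumption is that a provisioned volume bills for its full size regardless of how much of it holds data.

```python
def cost_per_used_gb(price_per_provisioned_gb, utilization):
    """Effective price per GB of data actually stored on a provisioned volume."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # You pay for the whole volume, so the unused space inflates the price of the used space.
    return price_per_provisioned_gb / utilization


# A volume that is 20% full costs about 5x its list price per GB of real data.
multiplier = cost_per_used_gb(1.0, 0.20)
```

So a pay-for-what-you-store service comes out ahead of that volume as long as its per-GB price is under roughly five times the provisioned price.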

huntaub · 2024-11-19T03:36:06 1731987366

Thanks for your note! We're really hopeful that our "local-like performance" is part of the story that distinguishes us from other file system solutions. I envision a world where people don't have to overprovision block storage volumes, and can just use this instead -- with the ability to easily grab their data from S3.

huntaub · 2024-11-19T03:35:02 1731987302

Thank you! We have docs at https://docs.regattastorage.com. There is an architecture page which might answer your questions. If you have deeper questions, feel free to ask in the thread or shoot me an email at hleath [at] regattastorage.com.

huntaub · 2024-11-19T03:34:29 1731987269

Yes! I don't know of any reason why that wouldn't work. I've worked with lots of customers who need simple, low-cost storage for database backups.

huntaub · 2024-11-19T03:32:58 1731987178

Thanks for the note, great to hear from you! I think that what ClickHouse does is great, and I expect that more applications will want to take advantage of the low prices of S3 cold storage without needing to build their own application-level abstractions. I'm hopeful that this allows more of these next-generation serverless data products to exist.