memset's comments

Super interested in your approach to error wrapping! It’s a feature I haven’t used much.

I tend to use logs with line numbers to point to where errors occur (but that only gets me so far if I’m returning the error from a child function in the call stack.)


Simply wrap with what you were trying to do when the error occurred (and only that, no speculating what the error could be or indicate). If you do this down the call stack, you end up with a progressive chain of detail with strings you can grep for. For example, something like "processing users index: listing users: consulting redis cache: no route to host" is great. Just use `fmt.Errorf("some wrapping: %w", err)` the whole way up. It has all the detail you want with none of the detail you don't need.

So hand rolled call stack traces? I just don’t understand why this is better than exceptions.

I’ve been working on a different way to automate. Basically a script that does the renewal and then knows how to install to any destination.

https://github.com/poundifdef/certmaster


I used ULIDs for a time until I discovered Snowflake IDs. They are (“only”) 64 bits, but incorporate timestamps and randomness as well. They take up way less space than ULIDs for this purpose and offer acceptably rare collisions for the things I’ve worked on.


The original Snowflake ID developed at Twitter contains a sequence number, so IDs should never collide unless you manage to overflow the sequence number in a single millisecond.
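For reference, the classic Twitter layout packs everything into a signed 64-bit integer: 1 sign bit, a 41-bit millisecond timestamp (since a custom epoch), a 10-bit machine ID, and a 12-bit sequence. A rough Go sketch (the helper name is mine; real generators also track the epoch offset and guard against clock drift):

```go
package main

import "fmt"

// Bit widths from the commonly documented Twitter Snowflake layout.
const (
	machineBits  = 10
	sequenceBits = 12
)

// makeSnowflake is a made-up helper name, shown only to illustrate the packing.
func makeSnowflake(ms int64, machine, seq uint16) int64 {
	return ms<<(machineBits+sequenceBits) |
		int64(machine)<<sequenceBits |
		int64(seq)
}

func main() {
	id := makeSnowflake(1234567890, 5, 7)
	// The fields unpack with simple shifts and masks.
	fmt.Println(id>>22, (id>>12)&0x3FF, id&0xFFF) // prints: 1234567890 5 7
	fmt.Println(1 << sequenceBits)                // prints: 4096
}
```

The 12-bit sequence allows 4096 IDs per machine per millisecond, which is the overflow case mentioned above, and the whole thing fits in a single signed 64-bit integer.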


Also, you can store them as a BIGINT, which is awesome. So much smaller than even a binary-encoded UUID. IIRC the spec reserves the right to use the sign bit, so if you’re concerned, use BIGINT UNSIGNED (natively in MySQL, or via extension in Postgres).

I wish more people cared about the underlying tech of their storage layer – UUIDv4 as a string is basically the worst-case scenario for a PK, especially for MySQL / InnoDB.


You could do that, yes!

There are a few different ways to solve this problem. What would be the easiest flow for you?


I’d like to learn more! Clickhouse benchmarks don’t show Pinot favorably by comparison.

Also, do you have thoughts on Starrocks?


> Clickhouse benchmarks don’t show Pinot favorably by comparison

Looks like they don't configure any indexes for Pinot in their benchmarks, which is one of Pinot's main selling points on the performance front - https://github.com/ClickHouse/ClickBench/issues/37.


Confirmed. Also, ClickBench is working from a batch-loaded data set, which is kind of antithetical to a real-world, real-time analytical database workload.

[Disclosure: I work at StarTree, and we're powered by Apache Pinot.]

We are currently considering / evaluating different methodologies to benchmark more realistic situations for real-time analytics. Potential considerations for your own benchmarking / POCs, or for a future industry benchmark spec:

1. Some sort of "freshness" (data latency) measurement: time for streaming ingestion / indexing / data ready for query. Is it consistent, or are there pauses in ingestion?

2. Some sort of "ingestion scaling" measurement: how many objects per second can you get to before you choke IO? What happens to ingested objects at different payload sizes? (This interacts with "freshness" above; i.e., you might [or might not] be able to throttle ingestion to improve freshness.)

3. Query concurrency/throughput: does your query capacity scale linearly or non-linearly? What happens at 10 QPS? 100 QPS? 1000 QPS? 10000 QPS? 100000 QPS? (Or when does it top out?)

4. Data volume: Are you querying against 1TB? 10TB? 100TB? 1 PB? (More?) This interacts with query concurrency, because driving 100 QPS against 1 PB is a totally different case than driving 100000 QPS against 1 TB.

5. Data storage type: Are you running against local NVMe, EBS, or S3 buckets? (Maybe even HDD?) Is it all uniform storage, or is it in a tiered storage topology? If tiered, what's the % mix of the different storage types? This is just an increasing reality all vendors need to deal with. Customers want to optimize their spend per use case.

6. Query complexity: Before talking simple "latencies," you have to understand what sort of queries you're running. These aren't simple atomic row CRUD operations like a Redis or a ScyllaDB. How are you doing aggregates? Are you running queries against denormalized data in a single table, or are you doing single JOINs or multiple table complex JOINs?

7. Indexing: As pointed out by shadow28, indexes are vital for best performance. Which type of index was used? (Apache Pinot supports about a dozen different types).

And my personal favorite to throw into the mix:

8. Cluster resilience: Great. All of the above worked on a fully-provisioned, stable cluster. Now knock out a node. Do it. See what happens. How long before the cluster rebalances and quiesces? What happens to your QPS and latencies during the rebalance, and then after quiescence? Measure that. Now knock out a second node. Maybe a third. How many nodes can you bring down before performance goes non-linear, or the cluster is rendered utterly unreliable?

This latter I call the "Torpedo test," and I've been preaching about it for years[1]. How many "torpedoes" can your cluster take before it sinks beneath the waves? It's not specific to real-time OLAP. You can use this kind of methodology to test the resilience of any distributed system. And you should probably do it before you hit production.

[1] https://www.slideshare.net/slideshow/what-we-learned-about-a...


Is there a relationship between revenue for the restaurant? Or total tips earned?

If low-privacy tipping decreases loyalty, is it overall positive from the business’ perspective?


This is honestly the coolest thing I've seen coming out of YC in years. I have a bunch of questions which are basically related to "how does it work" and please pardon me if my questions are silly or naive!

1. If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more that could be cached locally?) Would I immediately see degradation, or thrashing, at the 10 GB mark?

2. Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?

3. I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?

4. Is the idea that I could literally run Clickhouse or Postgres with a regatta volume as the backing store?

5. I have to ask - how do you think about open source here?

6. Can I mount on multiple servers? What are the limits there? (ie, a lambda function.)

I haven't played with this yet, so maybe doing so would help answer my questions. But I'm really excited about this! I have tried using EFS for small projects in the past but - and maybe I was holding it wrong - I could not for the life of me figure out what I needed to do to get faster bandwidth, probably because I didn't know how to turn the knobs correctly.


Wow, thanks for the nice note! No questions are silly, and I'll also note that we now have a docs site (https://docs.regattastorage.com) and feel free to email me (hleath [at] regattastorage.com) if I don't fully address your questions.

> If I had a local disk which was 10 GB, what happens when I try to contend with data in the 50 GB range (as in, more that could be cached locally?) Would I immediately see degradation, or thrashing, at the 10 GB mark?

We don't actually do caching on your instance's disk. Instead, data is cached in the Linux page cache (in memory) like a regular hard drive, and Regatta provides a durable, shared cache that automatically expands with the working set size of your application. For example, if you were trying to work with data in the 50 GiB range, Regatta would automatically cache all 50 GiB -- allowing you to access it with sub-millisecond latency.

> Does this only work in practice on AWS instances? As in, I could run it on a different cloud, but in practice we only really get fast speeds due to running everything within AWS?

For now, yes -- the speed is highly dependent on latency -- which is highly dependent on distance between your instance and Regatta. Today, we are only in AWS, but we are looking to launch in other clouds by the end of the year. Shoot me an email if there's somewhere specifically that you're interested in.

> I've always had trouble with FUSE in different kinds of docker environments. And it looks like you're using both FUSE and NFS mounts. How does all of that work?

There are a couple of different questions bundled together in this. Today, Regatta exposes an NFSv3 file system that you can mount. We are working on a new protocol which will be mounted via FUSE. However, in Docker environments, we also provide a CSI driver (for use with K8s) and a Docker volume plugin (for use with just Docker) that handles the mounting for you. We haven't released these publicly yet, so shoot me an email if you want early access.

> Is the idea that I could literally run Clickhouse or Postgres with a regatta volume as the backing store?

Yes, you should be able to run a database on Regatta.

> I have to ask - how do you think about open source here?

We are in the process of open sourcing all of the client code (CSI driver, mount helper, FUSE), but we don't currently have plans to open source the server code. We see the value of Regatta in managing the infrastructure so you don't have to, and even if we released the server via open source, it would be difficult to run on your own.

> Can I mount on multiple servers? What are the limits there? (ie, a lambda function.)

Yes, you can mount on multiple servers simultaneously! We haven't specifically stress-tested the number of clients we support, but we should be good for O(100s) of mounts. Unfortunately, AWS locks down Lambda so we can't mount arbitrary file systems in that environment specifically.

> efs performance

Yes, the challenge here is specifically around the semantics of NFS itself and the latency of the EFS service. We think we have a path to solving both of these in the next month or two.


Do I understand correctly that the data gets decrypted at your Regatta AWS instances, before the data ends up in the customer's S3 bucket? It sounds like the SSL pipe used for NFS is terminated at Regatta servers. Can customers run the Regatta service on their own hardware?

Or does Regatta only have access to filesystem metadata -- enough to do POSIX stuffs like locks, mv, rm -- but the file contents themselves remain encrypted end-to-end?


This is correct, we encrypt data in-transit to the Regatta servers (using TLS), and we encrypt any data that the Regatta servers are storing. Of course, when Regatta communicates with S3, that's also encrypted with TLS (just like using the AWS SDK). However, we don't pass the encrypted data to S3, otherwise you wouldn't be able to read it from the bucket directly and use it in other applications!


Thank you for the detailed answers! Honestly, this project inspires me to work on infrastructure problems.

So you are saying that regatta's own SaaS infrastructure provides the disk caching layer. So you all make sure the pipe between my AWS instance and your servers are very fast and "infinitely scalable", and then the sync to S3 happens after the fact.


That's exactly right!


So Regatta has an in-memory cache? Does the POSIX disk write only succeed when the data is in more than one availability zone?


Hey there! Today, we are replicating cache data within a single availability zone, but we’re working on a multi-availability zone product. If you have a need for multi-AZ, please shoot me an email at hleath [at] regattastorage.com. I’d love to learn more.


Are you planning to support android? How? AFAIK android doesn't have FUSE or NFS.


I don't think I'm planning support for Android. Did I mistakenly mention it somewhere?


Plain Old Recipe: takes online recipes and removes the cruft.

https://www.plainoldrecipe.com

https://github.com/poundifdef/plainoldrecipe

Things I want to do:

1. Improved print-friendly format

2. Ability to format to arbitrary sizes (for example, format for index cards)

3. Smarter layouts. For example, if a recipe says "add the chicken stock" in a step it would be great if it could identify how much ("1 cup") like some apps do.
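Item 3 could start as a simple lookup from ingredient lines to quantities. A toy Go sketch (the regex, function name, and sample data are all mine; a real recipe parser would need to handle far more unit and phrasing variants):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Matches a leading quantity + unit ("1 cup", "2 tbsp"), capturing the rest
// of the line as the ingredient name. Deliberately minimal.
var qtyRe = regexp.MustCompile(`^([\d/.\s]+(?:cup|cups|tbsp|tsp|oz|lb)s?)\s+(.*)$`)

// quantityFor returns the quantity for an ingredient mentioned in a step,
// or "" if no ingredient line matches.
func quantityFor(ingredients []string, mention string) string {
	for _, ing := range ingredients {
		if m := qtyRe.FindStringSubmatch(ing); m != nil {
			if strings.Contains(strings.ToLower(mention), strings.ToLower(m[2])) {
				return strings.TrimSpace(m[1])
			}
		}
	}
	return ""
}

func main() {
	ingredients := []string{"1 cup chicken stock", "2 tbsp olive oil"}
	fmt.Println(quantityFor(ingredients, "add the chicken stock")) // prints: 1 cup
}
```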


I have wanted to do something like this for news websites for a while now. I tried the recipe site: my first URL from Lifehacker wasn't supported, and the second attempt from the curated list gave an error loading the site. Some suggestions: make the links clickable, have a top 10 ready to go (cached) at the bottom of the landing page, and write a bot to periodically check the list of supported sites and indicate how recently that check was done. Then I might spend more time. Good luck!


It's also a little weird that they lie and claim "Usually we're able to parse recipes from {{domain}}" no matter what domain you put in. This really rubbed me the wrong way.

https://github.com/poundifdef/plainoldrecipe/blob/39868809c1...


I love this project! I immediately made an iPhone shortcut so I can convert any page I'm currently on to a plainoldrecipe. Sharing in case it is useful for anyone else: https://www.icloud.com/shortcuts/86bfd549ae6c421ca04b5a99320...


Just The Recipe is a similar service: https://www.justtherecipe.com/


I would like to have some sort of paid version. Maybe an API, or something, to help pay for ongoing server costs and maybe some pizza money too.

Any ideas on what would be worthwhile for “premium” access?


Does not appear to be open source.


Never seen this before, but it's such a simple and brilliant idea. Thank you!


Thanks for plainoldrecipe! Such a handy tool.


SmoothMQ: a drop-in replacement for SQS. https://github.com/poundifdef/smoothmq

I am looking to build a few main things:

1. Better compatibility with SQS' different endpoints

2. Sharding: I want users to be able to add/remove a node to a cluster and have the system automatically rebalance

3. Replication

The project is written in Go, and the UI just uses HTML and Go templates.


I'm interested in this, but I see long-standing issues and very few PRs (the last one from August 3rd). Is it because things are stable now, or is the project not getting traction?


I run https://www.plainoldrecipe.com

Had no idea about some of these other formats!

