dloreto's comments

I have a similar view to yours: as soon as you need variables, imports, functions or any other type of logic ... the existing "data-only" formats break down. Over time people either invent new configuration languages that enable logic (e.g. cue or jsonnet), or they try to bolt some limited version of these primitives onto their configuration.

My personal take is that at some point you are better off just using a full programming language like TypeScript. We created TySON https://github.com/jetpack-io/tyson to experiment with that idea.


Thanks for your comment! This is now the second time I'm coming across Jetpack.io (the first time was when I found your devbox project) and this time, too, I come away thinking that you're magically reading my mind. :) Thank you for your work!

May I ask you, what exactly is Jetpack.io? It sounds like a blend between startup and open-source organization, given the prominent links to Github & Discord on your home page, the lack of a hiring page etc. I mean, I had to browse your website quite a while to find out you're actually selling a product(?) :)

Anyway, back to the topic at hand: The TySON README says

> The goal is to make it possible for all major programming languages to read configuration written in TypeScript using native libraries. That is, a go program should be able to read TySON using a go library, a rust program should be able to read TySON using a rust library, and so on.

YESSSS. In my wet dreams I sometimes even go one step further: How great would it be if every language could import constants[0] from any other language?


JSON + types + functions using TypeScript syntax. Makes it possible to use TypeScript as a configuration language for applications written in `go`, and soon `rust` and other major languages.


A follow up:

1. We've now implemented pretty thorough testing: https://github.com/jetpack-io/typeid-go/blob/main/typeid_tes...

2. I clarified the prefix in the spec

Thanks for the feedback!


I hear you ... and I debated using either base58 or base64url. I do like the more compact encoding they provide.

Ultimately I ended up leaning towards a base32 encoding, because I didn't want to pre-suppose case sensitivity. For example, you might want to use the id as a filename, and you might be in an environment where you're stuck with a case insensitive filesystem.

Note that TypeID is using the Crockford alphabet and always in lowercase, *not* the full rules of Crockford's encoding. There are no hyphens allowed in TypeIDs, nor multiple encodings of the same ID with different variations of the ambiguous characters.
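
To make the "alphabet only, not the full rules" distinction concrete, here is a minimal Python sketch of a strict, lowercase-only base32 codec for a 128-bit UUID. It is not taken from typeid-go; the function names and the fixed 26-character length (26 * 5 = 130 bits, so the top two bits are zero) are assumptions for illustration.

```python
import uuid

# Crockford's base32 alphabet in lowercase: digits plus letters without i, l, o, u.
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"

def encode_suffix(u: uuid.UUID) -> str:
    """Encode the 128 bits of a UUID as 26 lowercase base32 characters."""
    n = u.int
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits, so the top 2 bits are always zero
        chars.append(ALPHABET[n & 0x1F])
        n >>= 5
    return "".join(reversed(chars))

def decode_suffix(s: str) -> uuid.UUID:
    """Strict decode: exactly 26 characters, no uppercase, hyphens, or aliases."""
    if len(s) != 26:
        raise ValueError("expected 26 characters")
    n = 0
    for ch in s:
        idx = ALPHABET.find(ch)  # -1 for uppercase, hyphens, and i/l/o/u
        if idx < 0:
            raise ValueError(f"invalid character {ch!r}")
        n = (n << 5) | idx
    if n >> 128:
        raise ValueError("value exceeds 128 bits")
    return uuid.UUID(int=n)
```

Because the decoder rejects anything outside the 32 lowercase characters, each UUID has exactly one accepted string form, which is the property described above.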


Thanks for the feedback!

We have tests for the base32 encoding which is the most complicated part of the implementation (https://github.com/jetpack-io/typeid-go/blob/main/base32/bas...) but your point stands. We'll add a more rigorous test suite (particularly as the number of implementations across different languages grows, and we want to make sure all the implementations are compatible with each other)

Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?


There are no tests.

There is just a single test. Which only tests the decoding of a single known value. No encoding test.

Go has infrastructure for benchmarking and fuzzing. Use it!

Also, you took code from https://github.com/oklog/ulid/blob/main/ulid.go which has "Copyright 2016 The Oklog Authors" but this is not mentioned in your base32.go.


We've now implemented pretty thorough testing: https://github.com/jetpack-io/typeid-go/blob/main/typeid_tes...

Thanks for the feedback!


> We have tests for the base32 encoding which is the most complicated part of the implementation

I didn’t look into it much but it seems like a great encoding even outside of this project. Predictable length, reasonable density, “double clickable” etc. I’ve been annoyed with both hex and base64 for a while so it’s pretty cool just by itself.

> Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?

Yeah, the worry is almost entirely “subtle deviations across stacks”, which is usually due to ambiguous specs. It’s so annoying when there are minor differences, compatibility options etc (like base64 which has another “URL-friendly” encoding - ugh).


My personal favorite encoding is base58 aka Bitcoin address encoding. It uses characters [A-Za-z0-9] except for [0OIl]. It is almost as dense as base64, "double clickable", but not (as) predictable in length as base32.

It was chosen to avoid a number of the most annoying ambiguous letter shapes for hand-entry of long address strings.

https://en.bitcoin.it/wiki/Base58Check_encoding


Reminds me that Windows activation keys used to exclude a broader set of characters to avoid transcription errors: looking it up again: 0OI1 and 5AELNSUZ


Is there a good reason why Base63 (nopad) doesn't exist? Ie Base64 minus the `-`, so that you almost get the density of base64 (nopad) but the double click friendly feature.

I was reviewing encodings recently and didn't want to drop all the way down to base32, but for some reason the library i was using didn't allow anything beyond base32 and base64 variants, despite having a feature where you can define your own base.

I thought maybe it was performance oriented. An odd prefix length like base63 would mean .. i think, a slightly more computationally demanding set of encoding instructions?

Either way i basically want base58 but i don't care about legibility, i just wanted double click and url friendly characters.


>Is there a good reason why Base63 (nopad) doesn't exist? Ie Base64 minus the `-`, so that you almost get the density of base64 (nopad) but the double click friendly feature.

Yes, the reason is that you need 64 characters if you want each character to encode 6 bits as log2(64) == 6. If you only have 63 characters in your alphabet then one of your 6-bit combinations has no character to represent it.

Base32 can represent 5 bits per character because log2(32) == 5. Anything in between 32 and 64 doesn't buy you anything because there is no integer between 5 and 6.
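
The arithmetic can be checked with a few lines of Python. This is only a sketch, and `chars_needed` is a hypothetical helper: it finds the smallest number of characters whose combinations cover every 128-bit value. Power-of-two alphabets map a fixed number of bits to each character, so they can be encoded with shifts and masks; other sizes have to treat the input as one large integer and divide repeatedly.

```python
def chars_needed(base: int, bits: int = 128) -> int:
    """Smallest n such that base**n >= 2**bits."""
    n = 0
    capacity = 1
    while capacity < 2 ** bits:
        capacity *= base
        n += 1
    return n

for base in (16, 32, 58, 62, 63, 64):
    print(f"base{base}: {chars_needed(base)} characters for 128 bits")
```

For a 128-bit ID this gives 32 characters for hex, 26 for base32, and 22 for each of base58, base62, base63 and base64, so at this size the alphabets between 58 and 64 all produce the same length.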


Is that "just" a performance concern though? Ie why is there a base58 and base62 but no base63?

Now you've got me curious on the performance of base58 to base64 hah. Down the rabbit hole i go. Appreciate your reply, thanks :)


It’s not too difficult to write your own encoding. Probably 10 lines of code or less if you hard-code your encoding alphabet.
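
As a rough illustration of that claim, here is a Python sketch with a hard-coded 62-character alphabet (digits and letters only, so it stays double-clickable). The names `encode` and `decode` are made up for this example, and the decoder takes the expected byte length because leading zero bytes are not preserved by the integer conversion.

```python
import string

# Hard-coded alphabet: 0-9, A-Z, a-z (62 characters, no punctuation).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)

def encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = []
    while n:
        n, rem = divmod(n, BASE)
        out.append(ALPHABET[rem])
    return "".join(reversed(out)) or ALPHABET[0]

def decode(text: str, length: int) -> bytes:
    n = 0
    for ch in text:
        n = n * BASE + ALPHABET.index(ch)  # ValueError on characters outside the alphabet
    return n.to_bytes(length, "big")       # OverflowError if the value does not fit
```

Swapping in a different alphabet string (base58, for instance) changes nothing else in the code, since the base is derived from the alphabet's length.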


What does “double clickable” mean?


Whether "double click" selects the whole id.


> Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?

It would be great if you add suggestions for compound types (like “article-comment”) in README as OP stated as well.


It seems that's not allowed currently, if I'm reading it right. I'm not sure I like `-` very much. The reason why I don't like it is because of how double-click to select and line breaking works for the dash. Maybe allowing `_` in the typename, and then having the rightmost `_` serve as the separator, might be more consistent.

But also, I'm bike-shedding and it's only an ID


I like using "." for this case, because type definitions typically belong to a package or module, which commonly uses "." as a separator.


The checksum idea is interesting. I'm considering whether it makes sense to add it as part of the TypeID spec.


What value does the checksum provide? I think I'm missing something because I really don't see a benefit.


The benefit is that you can reject bad requests to an API more easily.

For one application I used a base58-encoded value. Part of it was a truncated HMAC, which I used like check digits. This meant I could validate IDs before hitting the DB, since an attacker or script kiddie could otherwise try a resource exhaustion attack.
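
A minimal Python sketch of that idea, using the standard `hmac` module. The secret, the 8-hex-character tag length, and the function names are all assumptions for illustration, and the tag is hex rather than base58 to keep the example short.

```python
import hashlib
import hmac

SECRET = b"example-secret-key"  # hypothetical; load from configuration in real use
TAG_HEX_CHARS = 8               # 32-bit tag, an arbitrary choice for illustration

def _tag(raw_id: str) -> str:
    digest = hmac.new(SECRET, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:TAG_HEX_CHARS]

def issue(raw_id: str) -> str:
    """Append the truncated HMAC to the id before handing it to clients."""
    return raw_id + _tag(raw_id)

def is_plausible(public_id: str) -> bool:
    """Cheap check that can run before any database lookup."""
    if len(public_id) <= TAG_HEX_CHARS:
        return False
    raw_id, tag = public_id[:-TAG_HEX_CHARS], public_id[-TAG_HEX_CHARS:]
    return hmac.compare_digest(_tag(raw_id).encode("utf-8"), tag.encode("utf-8"))
```

Unlike a plain checksum, the tag depends on a server-side secret, so a client cannot compute valid-looking IDs on its own; guessed IDs are rejected without touching storage.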

So in the age of public internet-facing APIs and app URLs, I think built-in optional check digit support is a good idea.


I struggle to see how 10 bits of check data will help much. I guess if the extra bits aren’t persisted to storage it doesn’t hurt so why not?


Storage can get corrupted, columns can be truncated. For the applications I tend to touch, correctness and the ability to detect errors and tampering are more important than a couple of bytes per row. But every application and domain is different.


Checksums facilitate error detection. For typed UUIDs, checksums help detect errors introduced by changing the prefix/type or changing a “digit”.


The CLI tool will support encoding/decoding any valid UUID, whether v1, v4, or v7. We picked v7 as the definition of the spec, because we need to choose one of them when generating a new random ID, and our opinion is that by default, that should be v7.

We might add a warning in the future if you decode/encode something that is not v7, but if it suits your use-case to encode UUIDv4 in this way, go for it. Just keep in mind that you'll lose the locality property.


It's based on UUIDv7 (in fact, a TypeID can be decoded into a UUIDv7). The main reasons to use TypeID over "raw" UUIDv7 are: 1) for the type safety, and 2) for the more compact string encoding.

If you don't need either of those, then UUIDv7 is the right choice.


That's how the type is encoded as a string, but type-safety ultimately comes from how the TypeID libraries allow you to validate that the type is correct.

For example, the PostgreSQL implementation of TypeID would let you use a "domain type" to define a typeid subtype, thus ensuring that the database itself always checks the validity of the type prefix. An example is here: https://github.com/jetpack-io/typeid-sql/blob/main/example/e...

In Go, we're considering making it easy to define a new Go type that enforces a particular type prefix. If you can do that, then the Go type system would enforce that you are passing the correct type of id.
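
The same idea can be sketched in Python, purely as an analogy: this is not the typeid-go API, and the class names, the `parse` method, and the `_` separator handling are assumptions for illustration. Each ID kind gets its own class, so a static checker such as mypy can flag a mix-up, and parsing rejects strings with the wrong prefix at runtime. The suffix itself is not validated here.

```python
from dataclasses import dataclass
from typing import ClassVar

@dataclass(frozen=True)
class TypedID:
    suffix: str
    prefix: ClassVar[str] = ""

    @classmethod
    def parse(cls, text: str):
        # Split on the rightmost "_" so the suffix is everything after it.
        prefix, sep, suffix = text.rpartition("_")
        if not sep or prefix != cls.prefix or not suffix:
            raise ValueError(f"expected a {cls.prefix!r} id, got {text!r}")
        return cls(suffix)

    def __str__(self) -> str:
        return f"{self.prefix}_{self.suffix}"

class UserID(TypedID):
    prefix = "user"

class OrderID(TypedID):
    prefix = "order"

def load_user(user_id: UserID) -> str:
    # A type checker would reject load_user(OrderID(...)) at this call site.
    return f"loading {user_id}"
```

With this shape, `UserID.parse("order_...")` raises a `ValueError`, and two IDs with the same suffix but different classes do not compare equal.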


The stated downsides come from poor data locality when using mostly random UUIDs; but you can keep most of the benefits of a globally unique identifier, and retain locality, by using UUIDv7.

At jetpack.io we've been doing exactly that via TypeIDs: https://github.com/jetpack-io/typeid and there's a PostgreSQL implementation available. TypeIDs are UUIDv7 with additional type information, so you also get type-safety in your IDs.
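
To show where the locality comes from, here is a hand-rolled Python sketch of a UUIDv7-shaped value. It follows the RFC 9562 layout as I understand it (48-bit Unix millisecond timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits); `uuid7_sketch` is a hypothetical name, not a library function, and a real generator would also deal with clock behavior within the same millisecond.

```python
import os
import uuid

def uuid7_sketch(unix_ms: int) -> uuid.UUID:
    """Build a UUIDv7-shaped value with the timestamp in the most significant bits."""
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits; 74 are kept
    rand_a = rand >> 68                            # top 12 bits
    rand_b = rand & ((1 << 62) - 1)                # low 62 bits
    value = (unix_ms & ((1 << 48) - 1)) << 80      # 48-bit millisecond timestamp
    value |= 0x7 << 76                             # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)

# IDs created at increasing millisecond timestamps.
ids = [uuid7_sketch(1_700_000_000_000 + i) for i in range(5)]
```

Because the timestamp occupies the leading bits, IDs created later compare greater, so new rows land near each other in an index instead of being scattered the way fully random UUIDs are.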

