I have a similar view to yours: as soon as you need variables, imports, functions or any other type of logic ... the existing "data-only" formats break down. Over time people either invent new configuration languages that enable logic (e.g. CUE or Jsonnet), or they try to bolt some limited version of these primitives onto their configuration.
My personal take is that at some point you are better off just using a full programming language like TypeScript. We created TySON https://github.com/jetpack-io/tyson to experiment with that idea.
Thanks for your comment! This is now the second time I'm coming across Jetpack.io (the first time was when I found your devbox project) and this time, too, I come away thinking that you're magically reading my mind. :) Thank you for your work!
May I ask you, what exactly is Jetpack.io? It sounds like a blend of a startup and an open-source organization, given the prominent links to GitHub & Discord on your home page, the lack of a hiring page etc. I mean, I had to browse your website quite a while to find out you're actually selling a product(?) :)
Anyway, back to the topic at hand: The TySON README says
> The goal is to make it possible for all major programming languages to read configuration written in TypeScript using native libraries. That is, a go program should be able to read TySON using a go library, a rust program should be able to read TySON using a rust library, and so on.
YESSSS. In my wet dreams I sometimes even go one step further: How great would it be if every language could import constants[0] from any other language?
JSON + types + functions using TypeScript syntax. Makes it possible to use TypeScript as a configuration language for applications written in `go`, and soon `rust` and other major languages.
I hear you ... and I debated using either base58 or base64url. I do like the more compact encoding they provide.
Ultimately I ended up leaning towards a base32 encoding, because I didn't want to presuppose case sensitivity. For example, you might want to use the id as a filename, and you might be in an environment where you're stuck with a case-insensitive filesystem.
Note that TypeID uses the Crockford alphabet, always in lowercase, *not* the full rules of Crockford's encoding. There are no hyphens allowed in TypeIDs, nor multiple encodings of the same ID with different variations of the ambiguous characters.
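To make the "alphabet only, strict rules" point concrete, here is a minimal sketch of a lowercase Crockford base32 codec for a 128-bit value. It assumes the UUID is treated as one big-endian integer and left-padded to 26 characters; `encode_suffix` and `decode_suffix` are illustrative names, not the TypeID libraries' API.

```python
import uuid

# Crockford base32 alphabet, lowercase only (no i, l, o, u).
ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def encode_suffix(u: uuid.UUID) -> str:
    """Encode 128 bits as exactly 26 lowercase base32 characters."""
    n = u.int
    chars = []
    for _ in range(26):
        chars.append(ALPHABET[n & 0x1F])  # take the lowest 5 bits
        n >>= 5
    return "".join(reversed(chars))


def decode_suffix(s: str) -> uuid.UUID:
    """Strict decode: no uppercase, no hyphens, no aliases for i/l/o."""
    if len(s) != 26:
        raise ValueError("suffix must be exactly 26 characters")
    n = 0
    for ch in s:
        idx = ALPHABET.find(ch)
        if idx < 0:
            raise ValueError(f"invalid character {ch!r}")
        n = (n << 5) | idx
    if n >> 128:
        # 26 characters carry 130 bits; the top two must be zero.
        raise ValueError("suffix overflows 128 bits")
    return uuid.UUID(int=n)
```

Because the decoder rejects anything outside the 32-character alphabet, every ID has exactly one valid spelling, which is the property described above.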
We have tests for the base32 encoding which is the most complicated part of the implementation (https://github.com/jetpack-io/typeid-go/blob/main/base32/bas...) but your point stands. We'll add a more rigorous test suite (particularly as the number of implementations across different languages grows, and we want to make sure all the implementations are compatible with each other).
Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?
> We have tests for the base32 encoding which is the most complicated part of the implementation
I didn’t look into it much but it seems like a great encoding even outside of this project. Predictable length, reasonable density, “double clickable” etc. I’ve been annoyed with both hex and base64 for a while so it’s pretty cool just by itself.
> Re: prefix, is the concern that I haven't defined the allowed character set as part of the spec?
Yeah, the worry is almost entirely “subtle deviations across stacks”, which is usually due to ambiguous specs. It’s so annoying when there’s minor differences, compatibility options etc (like base64 which has another “URL-friendly” encoding - ugh).
My personal favorite encoding is base58 aka Bitcoin address encoding. It uses characters [A-Za-z0-9] except for [0OIl]. It is almost as dense as base64 and "double clickable", but not as predictable in length as base32.
It was chosen to avoid a number of the most annoying ambiguous letter shapes for hand-entry of long address strings.
Reminds me that Windows activation keys used to exclude a broader set of characters to avoid transcription errors. Looking it up again: 0OI1 and 5AELNSUZ.
Is there a good reason why Base63 (nopad) doesn't exist? Ie Base64 minus the `-`, so that you almost get the density of base64 (nopad) but the double click friendly feature.
I was reviewing encodings recently and didn't want to drop all the way down to base32, but for some reason the library I was using didn't allow anything beyond base32 and base64 variants, despite having a feature where you can define your own base.
I thought maybe it was performance oriented. An odd alphabet size like base63 would mean, I think, a slightly more computationally demanding set of encoding instructions?
Either way I basically want base58 but I don't care about legibility, I just wanted double-click and URL-friendly characters.
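On the performance guess: when the alphabet size is not a power of two, the usual approach is to treat the whole value as one big integer and divide repeatedly, instead of slicing off fixed groups of bits. A rough sketch, where `BASE63` is a hypothetical alphabet (URL-safe base64 minus `-`) and `encode_int`/`decode_int` are illustrative names. It encodes integers only and skips Bitcoin's leading-zero-byte handling.

```python
import string

# Bitcoin's base58 alphabet: no 0, O, I, or l.
BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
# Hypothetical "base63": the URL-safe base64 characters without "-".
BASE63 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "_"


def encode_int(n: int, alphabet: str) -> str:
    """Encode a non-negative integer by repeated division by the base."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return alphabet[0]
    base = len(alphabet)
    out = []
    while n:
        n, rem = divmod(n, base)  # big-integer division on every step
        out.append(alphabet[rem])
    return "".join(reversed(out))


def decode_int(s: str, alphabet: str) -> int:
    """Inverse of encode_int; raises ValueError on unknown characters."""
    base = len(alphabet)
    n = 0
    for ch in s:
        n = n * base + alphabet.index(ch)
    return n
```

The division loop touches the whole number for every output character, which is the extra cost compared with shifting 5 or 6 bits at a time.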
>Is there a good reason why Base63 (nopad) doesn't exist? Ie Base64 minus the `-`, so that you almost get the density of base64 (nopad) but the double click friendly feature.
Yes, the reason is that you need 64 characters if you want each character to encode 6 bits as log2(64) == 6. If you only have 63 characters in your alphabet then one of your 6-bit combinations has no character to represent it.
Base32 can represent 5 bits per character because log2(32) == 5. Anything in between 32 and 64 doesn't buy you anything because there is no integer between 5 and 6.
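The arithmetic above can be checked with a few lines. This sketch assumes a 128-bit ID and compares two things: how many whole bits each character carries in a fixed-width (chunked) encoding, and how many characters are needed if the ID is instead encoded as one big integer, which is how base58 is normally done. The helper names are made up for illustration.

```python
def bits_per_char_chunked(alphabet_size: int) -> int:
    """Whole bits per character when each character maps to a fixed bit group."""
    return alphabet_size.bit_length() - 1  # floor(log2(alphabet_size))


def chars_needed(alphabet_size: int, bits: int = 128) -> int:
    """Smallest k with alphabet_size**k >= 2**bits (big-integer encoding)."""
    k, capacity = 0, 1
    while capacity < 2**bits:
        capacity *= alphabet_size
        k += 1
    return k


for size in (16, 32, 58, 63, 64):
    print(size, bits_per_char_chunked(size), chars_needed(size))
```

In the chunked view, 58 and 63 symbols still only carry 5 whole bits each, the same as base32. Encoded as a single integer, a 128-bit value needs 22 characters in base58, base63, and base64 alike, versus 26 in base32, so for this ID size a base63 would not come out shorter than base58.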
It seems that's not allowed currently, if I'm reading it right. I'm not sure I like `-` very much. The reason I don't like it is how double-click to select and line breaking work for the dash. Maybe allowing `_` in the typename, and then having the rightmost `_` serve as the separator, might be more consistent.
The benefit is that you can reject bad requests to an API more easily.
For one application I used a base58-encoded value. Part of it was a truncated HMAC, which I used like check digits. This meant I could validate IDs before hitting the DB; otherwise an attacker or script kiddie could try a resource exhaustion attack.
So in the age of public internet-facing APIs and app URLs, I think built-in optional check digit support is a good idea.
Storage can get corrupted, columns can be truncated. For the applications I tend to touch, correctness and the ability to detect errors and tampering are more important than a couple of bytes per row.
But every application and domain is different.
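A minimal sketch of the truncated-HMAC idea, under stated assumptions: the tag is appended as 8 hex characters rather than base58 (to keep the example short), and `SECRET`, `sign_id`, and `is_plausible` are made-up names. It is not the scheme from the application mentioned above, just the general shape.

```python
import hashlib
import hmac

SECRET = b"demo-secret-change-me"  # hypothetical server-side key
TAG_LEN = 8  # hex characters kept from the HMAC (32 bits)


def _tag(body: str) -> str:
    """Truncated HMAC-SHA256 of the ID body, used like check digits."""
    digest = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return digest[:TAG_LEN]


def sign_id(body: str) -> str:
    """Append the truncated tag to the ID body."""
    return body + _tag(body)


def is_plausible(candidate: str) -> bool:
    """Cheap check to run before any database lookup."""
    if len(candidate) <= TAG_LEN:
        return False
    body, tag = candidate[:-TAG_LEN], candidate[-TAG_LEN:]
    # Compare as bytes in constant time.
    return hmac.compare_digest(_tag(body).encode(), tag.encode())
```

A request whose ID fails `is_plausible` can be rejected without touching storage. A shorter tag saves bytes but makes random guesses more likely to pass.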
The CLI tool will support encoding/decoding any valid UUID, whether v1, v4, or v7. We picked v7 as the definition of the spec, because we need to choose one of them when generating a new random ID, and our opinion is that by default, that should be v7.
We might add a warning in the future if you decode/encode something that is not v7, but if it suits your use-case to encode UUIDv4 in this way, go for it. Just keep in mind that you'll lose the locality property.
It's based on UUIDv7 (in fact, a TypeID can be decoded into a UUIDv7). The main reasons to use TypeID over "raw" UUIDv7 are: 1) For the type safety, and 2) for the more compact string encoding.
If you don't need either of those, then UUIDv7 is the right choice.
That's how the type is encoded as a string, but type-safety ultimately comes from how the TypeID libraries allow you to validate that the type is correct.
For example, the PostgreSQL implementation of TypeID would let you use a "domain type" to define a typeid subtype, thus ensuring that the database itself always checks the validity of the type prefix. An example is here: https://github.com/jetpack-io/typeid-sql/blob/main/example/e...
In Go, we're considering making it easy to define a new Go type that enforces a particular type prefix. If you can do that, then the Go type system would enforce that you are passing the correct type of id.
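The same idea can be sketched in Python instead of Go: a distinct static type per prefix, plus a parser that is the only way to obtain it. This assumes the `prefix_suffix` layout with `_` as the separator and a 26-character lowercase suffix; `UserID` and `parse_user_id` are hypothetical names, not part of any TypeID library.

```python
from typing import NewType

# A distinct static type; a checker such as mypy flags a plain str passed here.
UserID = NewType("UserID", str)

ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"


def parse_user_id(raw: str) -> UserID:
    """Validate the type prefix and suffix shape before handing out a UserID."""
    prefix, sep, suffix = raw.rpartition("_")
    if sep != "_" or prefix != "user":
        raise ValueError(f"expected a 'user_' id, got {raw!r}")
    if len(suffix) != 26 or any(ch not in ALPHABET for ch in suffix):
        raise ValueError(f"malformed suffix in {raw!r}")
    return UserID(raw)


def load_user(user_id: UserID) -> str:
    """Stand-in for a lookup that should only ever receive user ids."""
    return f"loading {user_id}"
```

With this shape, passing an order id where a user id is expected fails either at type-check time (wrong static type) or at parse time (wrong prefix).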
The stated downsides come from poor data locality when using mostly random UUIDs; but you can keep most of the benefits of a globally unique identifier, and retain locality, by using UUIDv7.
At jetpack.io we've been doing exactly that via TypeIDs: https://github.com/jetpack-io/typeid and there's a PostgreSQL implementation available. TypeIDs are UUIDv7 with additional type information, so you also get type-safety in your IDs.
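For anyone wondering where the locality comes from: UUIDv7 puts a 48-bit Unix millisecond timestamp in the most significant bits, so IDs created later sort later. A minimal sketch of that layout (48-bit timestamp, 4-bit version, 12 random bits, 2-bit variant, 62 random bits); `uuid7_sketch` is an illustrative helper, not a library function.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None) -> uuid.UUID:
    """Build a UUIDv7-shaped value: timestamp first, random bits after."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 used
    rand_a = (rand >> 62) & 0xFFF                 # 12 bits
    rand_b = rand & ((1 << 62) - 1)               # 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80     # bits 127..80: timestamp
    value |= 0x7 << 76                            # bits 79..76: version 7
    value |= rand_a << 64                         # bits 75..64
    value |= 0b10 << 62                           # bits 63..62: variant
    value |= rand_b                               # bits 61..0
    return uuid.UUID(int=value)
```

Two IDs generated in different milliseconds compare in creation order, both as integers and as strings, which is what keeps inserts close together in a B-tree index. Within the same millisecond this sketch makes no ordering guarantee.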