I don't think I'll ever understand why YAML was chosen as the language of choice...

lukeschlather · on Dec 15, 2023

I don't really think YAML is the problem, it's the string-based templating. I'd like to see Emrichen or something like it become more common. And Emrichen is format-agnostic, you can write your stuff in json or YAML and it looks and works pretty similarly. (Although I stick to yaml.)

https://github.com/con2/emrichen

If JSON had comments I might lean toward json.

kevincox · on Dec 15, 2023

100%. Target-unaware templating is always a mistake. It is unfortunately common in our industry because it is easy and works most of the time. But this is also why SQL injections and XSS are among the most common vulnerabilities. SQL injections are getting better because people are more often using parameterized queries which never need to actually encode the values into the "template" and XSS is getting better because most big frameworks now have target-aware templating that properly serializes values. But these are both hugely common issues. Look up a tutorial about how to use SQL or make your first website and more likely than not you will see examples that have vulnerabilities. Our whole industry is teaching the wrong thing by default, then hoping to fix it later.
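A minimal sketch of the difference, assuming Python's built-in `sqlite3` module and a made-up `users` table (neither comes from the thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Hypothetical attacker-controlled input.
hostile = "nobody' OR '1'='1"

# Target-unaware: the value is pasted into the SQL text and changes its structure.
unsafe_rows = conn.execute(
    "SELECT name FROM users WHERE name = '" + hostile + "'"
).fetchall()

# Parameterized: the value travels separately and is never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(unsafe_rows)  # the injected OR clause matches every row
print(safe_rows)    # no user has that literal name, so nothing matches
```

The second query never has to escape anything, because the value is never part of the "template" in the first place.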

String concatenation may have been a mistake. The concept of a string may have been a mistake. Every sequence of bytes has some structure, and in order to mash two "strings" together they need to be serialized in the correct way. Even when building error messages it would be nice if you can reliably identify the "chrome" from the "content".

Also look at terminal escape sequences. When we print text to a terminal we should probably be replacing non-printable characters with some sort of encoding so that 1. the reader can understand that these are from the content, not the application, and 2. the content can't do stuff like delete the output line to trick the reader.
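As a rough sketch of what that could look like, assuming Python and a hypothetical helper named `show_untrusted` (keeping newlines as-is is a choice made here, not a rule):

```python
def show_untrusted(text: str) -> str:
    """Replace non-printable characters with visible backslash escapes."""
    out = []
    for ch in text:
        if ch.isprintable() or ch == "\n":
            out.append(ch)
        else:
            # e.g. ESC becomes the four visible characters \x1b
            out.append(ch.encode("unicode_escape").decode("ascii"))
    return "".join(out)


# ESC[2K erases the current line and \r returns to its start,
# so a raw print would hide "all good" behind the fake message.
hostile = "all good\x1b[2K\rFAKE: everything is fine"
print(show_untrusted(hostile))
```

With the escapes made visible, the reader sees that the control sequence came from the content rather than from the application.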

Every time you put two strings together you should think about how you need to properly encode one into the other. Output unaware text templating almost always fails this because it relies on the user to do this for every single interpolation, and that is doomed to fail.

I wrote an even longer rant about this on my blog a while ago: https://kevincox.ca/2022/02/08/escape-everything/

jauntywundrkind · on Dec 15, 2023

JSON5, JSONC, and others have comments.

Also, there's the old dirty hack:

  {"item": "this is a comment",
  "item": "//so is this but more obvious",
  "item": "because in 99.99% of implementations, last value wins:",
  "item": 42}
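A quick check of the "last value wins" behavior, assuming Python's standard `json` module as the implementation; `reject_duplicates` is a hypothetical hook for anyone who would rather fail loudly:

```python
import json

doc = """
{"item": "this is a comment",
 "item": "//so is this but more obvious",
 "item": "because in 99.99% of implementations, last value wins:",
 "item": 42}
"""

# Default behavior: repeated keys are silently collapsed to the last one.
parsed = json.loads(doc)
print(parsed)


def reject_duplicates(pairs):
    """object_pairs_hook that refuses objects with repeated keys."""
    keys = [key for key, _ in pairs]
    dupes = sorted({key for key in keys if keys.count(key) > 1})
    if dupes:
        raise ValueError(f"duplicate keys: {dupes}")
    return dict(pairs)


try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as err:
    strict_error = str(err)
print(strict_error)
```

Note that the "comments" are gone after parsing, so they do not survive any tool that reads and rewrites the file.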

bvrmn · on Dec 15, 2023

It's kinda important to keep comments on automated transformations.

tbrownaw · on Dec 15, 2023

It's "declarative" which is supposed to be good, and you can represent complex data structures without any annoying closing braces (like json) or tags (xml).

Plus, well, there's a disturbingly high number of people who think that semantic whitespace is a positive.

yeetcode · on Dec 15, 2023

When did the universe decide braces were so bad? Makes formatting so much easier.

tbrownaw · on Dec 15, 2023

Encoding the same information in both indentation and braces is redundant, and so is a violation of DRY, and so is bad.

(Of course the reality is that braces are how you write scoping information and indentation is how you read it, and CQRS is actually a good thing.)

lucasyvas · on Dec 15, 2023

This is ridiculous. Indentation is a visual concern for humans and braces are a semantic concern for a machine. Code only needs the latter to actually execute.

You can prefer them be coupled for taste reasons but they are fundamentally different. Code minification is a practical example where you can save many characters by omitting whitespace.

This is like tabs vs spaces. The white spaces crowd prefer the trade offs, and the tab crowd don't. But the white space crowd can't argue the perks of tabs - they exist.

disclaimer: Am braces/whitespace camp.

happymellon · on Dec 15, 2023

The tabs/spaces war was fought because of IDE choices that restricted your ability for sane tab rendering defaults, and simple overriding of tab size.

Now that terrible text editors won, I have to watch a 2 Vs 4 space war.

If only we could have a character that could represent indentation and people could set the rendering so they can visualise it in their own preferred way.

And folks who want to remove braces should be pointed to the Apple certificate snafu.

pas · on Dec 15, 2023

if only whitespace was treated as such visually, and the IDE would serialize to whatever the repository uses, and similarly for YAML-JSON.

> Now that terrible text editors won

yes. but not all hope is lost, maybe AGI helps/wipes us out!

happymellon · on Dec 15, 2023

Considering that we lost tabs because some people's text editors and web browsers rendered them as 8 spaces, and drew the wrong conclusion, I don't have much hope for your plan.

However I would be completely on board if we could agree on a formatting configuration file that people could check into a repo for IDEs to pick up.

turboponyy · on Dec 15, 2023

I think we've mostly agreed on editorconfig[1], though it doesn't concern itself with language-specific formatting - it's even built into a lot of editors.

For language-specific formatting, you can run formatters in pre-commit-hooks and CI. Treefmt[2] can help with that if you want to cover a lot of languages in your repo.

1. https://editorconfig.org/

2. https://github.com/numtide/treefmt

happymellon · on Dec 15, 2023

I'll have to check out EditorConfig.

It looks like IntelliJ supports it, so that could cut down on the amount of work required to get it in a project.

the_gipsy · on Dec 15, 2023

But you see that indentation is always written in the source plaintext, together with the braces, right?

lucasyvas · on Dec 15, 2023

100%

I see it, I understand the argument for coupling them. But the argument makes an assumption I am not comfortable with - it says they are there for the exact same reason, which is not actually true. It is often incidentally true in practice depending on the needs of the language - but it is not universally true across all needs in all languages.

It's the same thing with semi-colons. Having a statement separator provides practical benefits in several languages. In many of these languages, they are also optional.

If you were to say that all languages should parse both styles to get the best of both worlds, that wouldn't be completely unreasonable. But it makes the parsing more complex than necessary to support both, so only one is often supported - which is fair.

the_gipsy · on Dec 15, 2023

That's not true, the argument does not "say they are there for the same exact reason", that's your strawman. The argument is just: "they always come together thus one is redundant and we obviously can't remove indentation". The difference in reason, syntax/compiler vs. legibility, is irrelevant.

I mean, you can argue for having both for a myriad of reasons. It's redundant and no big deal, a lot of languages do just that. And some languages do fine with only indentation and no braces. But there is no language that does braces without indentation, or at least stylewise the code is always indented.

wizerdrobe · on Dec 15, 2023

I’m always amazed at coworkers with a 100 WPM typing speed and IntelliSense griping about how hard it is to type.

jauntywundrkind · on Dec 15, 2023

Inbound links from HN blocked I think, but Tom Macwright's proclamation that typing is not the problem has long stuck with me. https://macwright.com/2015/01/19/typing-is-not-the-problem

These optimizations for some perceived ergonomic win almost always make terrible tradeoffs versus using a good well established data format. And especially systems which favor human consumption but create extreme difficulties for machine handling, those are the worst!

Yaml being such a non-Context Free Grammar is a huge pain. There's so much state in the parser. It only gets worse from there. Yaml has all kinds of wild crazy capabilities. References, a variety of inline content blocks, and weird ways to invoke stuff?? GitHub yesterday did a code review of Frigate, an enormously popular surveillance video analysis tool that's heavily downloaded, and found, oh yes, a huge glaring yaml bug allowing remote execution, because executing arbitrary code is just built right in to yaml amid 3000 other crazy hacks & who would have known to go look for & disable that capability?! https://news.ycombinator.com/item?id=38630295 https://github.blog/2023-12-13-securing-our-home-labs-frigat...

Typing is not the problem (even though I see so many people just terrible beyond words at navigating project structures or the command line... Improve! Some day!).

mieubrisse · on Dec 15, 2023

I do think there's a power to the readability that makes it more approachable (but which eventually burns you). We were rewriting our AST at Kurtosis last year, and the default choice we were going to go with was of course YAML. But we came across a GitHub issue from DroneCI (who also started with YAML) that said something like, "we started with YAML, and we learned that you always eventually want to add more complex logic on top. Go with a Turing-complete language to begin with, else you'll be in the CircleCI trap inventing a language via YAML DSL."

We decided to go with Starlark as the base language for our DSL, and we've had a consistently great experience. Users report that it's very approachable, and the starlark-go library is very pleasant to deal with.

kevincox · on Dec 15, 2023

I totally agree that string-templating into a data serialization format is a mistake. But you can make life dramatically easier on yourself by doing `{{ something | toJson }}`. In fact write a linter that every single substitution is followed by `| toJson` and you will save yourself a lot of headaches.

The main issue is that it makes it more difficult to mix hardcoded and inserted values.

    labels:
        - mylabel
        - {{ extraLabels | indent 4 }} # toJson doesn't work here.

Also the small technical concern that YAML isn't actually a superset of JSON. (But you are far less likely to hit these cases than other escaping bugs).
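To illustrate the `| toJson` idea for a single scalar, here is a small sketch in Python, using `json.dumps` as a stand-in for the template filter. The `region` key and the value are made up for illustration:

```python
import json

# A value that happens to contain YAML-significant characters.
value = "prod: east # primary"

# Target-unaware interpolation: the value's ": " and " #" leak into the YAML,
# so a YAML parser would not read this line as a single string value.
naive = "region: " + value

# Serialized interpolation: the value arrives as one double-quoted scalar.
encoded = "region: " + json.dumps(value)

print(naive)    # region: prod: east # primary
print(encoded)  # region: "prod: east # primary"
```

The quoted form relies on YAML accepting JSON-style double-quoted strings, which holds for ordinary cases like this one even though YAML is not a strict superset of JSON.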

mdaniel · on Dec 15, 2023

As far as I know the way helm thinks about that problem is putting any such literals into a temp copy _then_ serializing https://masterminds.github.io/sprig/lists.html#append-mustap...

  labels: {{ append .extraLabels "myLabel" "myOtherLabel" | toJson }}

your commentary talks about json but your code snippet is in yaml, so it's possible one or both of us are solving the wrong problem
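The same "combine as data, then serialize once" approach can be sketched outside of helm. This is a Python illustration with a hypothetical `render_labels` helper, not helm's or sprig's actual implementation:

```python
import json


def render_labels(extra_labels):
    """Merge hardcoded and inserted labels as data, then serialize once."""
    labels = ["mylabel"] + list(extra_labels)
    # A JSON array on one line needs no indentation bookkeeping.
    return "labels: " + json.dumps(labels)


line = render_labels(["team: infra", "x # not a comment"])
print(line)  # labels: ["mylabel", "team: infra", "x # not a comment"]
```

Because the merge happens on the list itself, there is no extra `-` to get wrong and no `indent 4` to count.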

kevincox · on Dec 15, 2023

YAML is (almost) a superset of JSON. So it is much easier to serialize data into JSON than to worry about getting the indentation right.

mdaniel · on Dec 15, 2023

I think you are lobbying the wrong person about the relationship between yaml and json; I was trying to point out that you had

  thing:
  - item1
  - {{ foo | toJson }} # <-- is not going to do what you expect

unless you quite literally wanted

  thing:
  - alpha
  - - alpha1
    - alpha2

kevincox · on Dec 15, 2023

Ah, I did have a typo. But you also copied it wrong. I wrote this

    labels:
        - mylabel
        - {{ extraLabels | indent 4 }} # toJson doesn't work here.

Which has an extra `-` and as you pointed out would produce a nested list.

But I meant this which would merge the lists.

    labels:
        - mylabel
        {{ extraLabels | indent 4 }} # toJson doesn't work here.

nucleardog · on Dec 15, 2023

The other popular option would be JSON.

No comments and having to escape.... well, practically everything you'd commonly be entering makes JSON suck at this sort of task.

taspeotis · on Dec 15, 2023

Yelling At My Laptop

Too · on Dec 15, 2023

Yaml has a lot of problems, this is not one of them.

This more shows that one shouldn’t be using string templating to create data structures in the first place.

bbkane · on Dec 15, 2023

What would you recommend instead? Personally, I think Typescript makes a great config language

jpgvm · on Dec 15, 2023

Dedicated config languages are the best usually. Jsonnet/Cue/friends.

Failing that if you actually need/want a procedural/non-pure language then I think Kotlin or Ruby take the cake. Both have extremely strong support for DSLs which IMO is key to reaching a modicum of usability.

sakjur · on Dec 15, 2023

Starlark is nice as well, it’s syntactically based on Python, and behaves a lot like regular procedural languages, but it’s meant to provide a pure and safe environment for configuration and be embeddable.

mieubrisse · on Dec 15, 2023

Big +1. We switched to Starlark for our DSL last year and have been very pleased. Users who've never used Starlark before come in with some 'what, another language?' trepidation, and end up pleasantly surprised.

jpgvm · on Dec 15, 2023

Since using Bazel a bit I have grown an appreciation for Starlark also. The big thing is that list and dict comprehensions are a really nice fit for these types of tasks.
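A tiny sketch of the kind of thing I mean. It is written so that it runs as plain Python, and it sticks to constructs that I believe are also valid Starlark (comprehensions, `enumerate`, `%` formatting); the service names and ports are invented:

```python
regions = ["us-east", "eu-west"]
base_port = 8000

# One config entry per region, generated instead of copy-pasted.
services = [
    {"name": "api-%s" % region, "port": base_port + i}
    for i, region in enumerate(regions)
]

# A dict comprehension to index the same data by name.
ports_by_name = {svc["name"]: svc["port"] for svc in services}

print(ports_by_name)  # {'api-us-east': 8000, 'api-eu-west': 8001}
```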

10000truths · on Dec 15, 2023

Lua is fantastic for the config-as-code use case. Easy to read and write, with a lightweight embeddable interpreter, and it has almost universal library support across different languages/environments.

ishigoemon · on Dec 15, 2023

It's a shame that helm v3 didn't move forward with the lua engine[0]. I don't imagine ~=/1-based arrays were a worse timeline... And here we are 5 years later.

[0] https://github.com/helm/helm/issues/5084

ljm · on Dec 15, 2023

I dunno - using cdk8s at one place was a miserable experience. Adding runtime evaluation just made everything more complex than it should be.

In that case I preferred to describe infrastructure rather than program it.

maximus-decimus · on Dec 15, 2023

ohhh, that's why there are "indent 4". it just never clicked for me I guess. I thought it was just some dark magic incantations.

smsm42 · on Dec 15, 2023

YAML itself is not too bad, especially with a good IDE. But YAML and its ws-sensitivity combined with Go templating is horrible. Every component by itself looks kinda reasonable, but when they come together it makes an unholy mess.