I don't think I'll ever understand why YAML was chosen as the language of choice for all things DevOps, or why so many startups have sought to augment YAML with syntactical hacks.
I mean, as soon as you template it and need to do `{{ something | indent 4 }}` or some shit to make the template work you know you're on a bad track.
I don't really think YAML is the problem; it's the string-based templating. I'd like to see Emrichen, or something like it, become more common. Emrichen is also format-agnostic: you can write your stuff in JSON or YAML and it looks and works pretty similarly. (Although I stick to YAML.)
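To make the "template the structure, not the string" idea concrete, here is a minimal Python sketch. It is not Emrichen's actual syntax or API: placeholders live inside the parsed data as a hypothetical `{"$var": "name"}` node, values are substituted into the tree, and a real serializer runs once at the end, so no value can break indentation or quoting. JSON is the output format only because Python's standard library has no YAML emitter.

```python
import json
from typing import Any

def render(node: Any, values: dict[str, Any]) -> Any:
    """Replace hypothetical {"$var": name} placeholder nodes with values[name]."""
    if isinstance(node, dict):
        # A dict whose only key is "$var" is a placeholder, not data.
        if set(node) == {"$var"}:
            return values[node["$var"]]
        return {key: render(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [render(child, values) for child in node]
    return node  # plain scalars pass through unchanged

template = {
    "metadata": {"name": {"$var": "name"}},
    "spec": {"args": {"$var": "args"}},
}
values = {
    "name": "web",
    # A value that would trip up naive text templating: a colon, a hash, and a list.
    "args": ["--greeting", "hi: there # not a comment"],
}

rendered = render(template, values)
# Serialize once, at the very end, with a real serializer.
print(json.dumps(rendered, indent=2))
```

Because substitution happens on data, the list lands as a list and the awkward string stays one string, with no `indent` filter in sight.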
100%. Target-unaware templating is always a mistake. It is unfortunately common in our industry because it is easy and works most of the time. But this is also why SQL injection and XSS are among the most common vulnerabilities. SQL injection is getting rarer because people more often use parameterized queries, which never need to encode the values into the "template" at all, and XSS is getting rarer because most big frameworks now have target-aware templating that properly serializes values. But both are still hugely common. Look up a tutorial on using SQL or making your first website, and more likely than not you will see examples with vulnerabilities. Our whole industry teaches the wrong thing by default, then hopes to fix it later.
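A minimal sketch of the parameterized-query point, using Python's built-in `sqlite3` module and a made-up `users` table: the concatenated query lets the input rewrite the SQL, while the `?` placeholder keeps the value out of the query text entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a1"), ("bob", "b2")])

user_input = "nobody' OR '1'='1"

# Target-unaware: the input is spliced into the SQL text and changes its structure.
unsafe_sql = "SELECT name FROM users WHERE name = '" + user_input + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Parameterized: the value is sent separately and is never parsed as SQL.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (user_input,)).fetchall()

print(unsafe_rows)  # every row leaks
print(safe_rows)    # no user is literally named "nobody' OR '1'='1"
```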
String concatenation may have been a mistake. The concept of a string may have been a mistake. Every sequence of bytes has some structure, and in order to mash two "strings" together they need to be serialized in the correct way. Even when building error messages, it would be nice if you could reliably distinguish the "chrome" from the "content".
Also look at terminal escape sequences. When we print text to a terminal, we should probably replace non-printable characters with some sort of encoding, so that (1) the reader can tell these characters came from the content, not the application, and (2) the content can't do things like erase the output line to trick the reader.
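A rough sketch of that idea in Python, using a hypothetical `sanitize_for_terminal` helper: anything `str.isprintable()` rejects (ESC, carriage return, zero-width characters, and also newlines) is shown as a visible escape code, and literal backslashes are doubled so the content can't fake an escape. The sample input tries to use `\r` plus the `ESC[2K` erase-line sequence to overwrite what was already printed.

```python
def sanitize_for_terminal(text: str) -> str:
    """Make untrusted text inert before writing it to a terminal (sketch)."""
    out = []
    for ch in text:
        if ch == "\\":
            # Escape the escape character too, so content can't spoof "\x1b" and friends.
            out.append("\\\\")
        elif ch.isprintable():
            out.append(ch)
        else:
            # Control and format characters become visible codes such as \x1b or \r.
            out.append(ch.encode("unicode_escape").decode("ascii"))
    return "".join(out)

malicious = "build ok\r\x1b[2Kbuild FAILED"
print(sanitize_for_terminal(malicious))
```

Escaping newlines is deliberate here; a real tool would probably let them through for multi-line content.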
Every time you put two strings together, you should think about how to properly encode one into the other. Output-unaware text templating almost always fails at this because it relies on the user to do it for every single interpolation, and that is doomed to fail.
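One way to picture "encode one into the other" is a table of encoders keyed by target, so the encoding is chosen once per output format instead of at every interpolation site. This is a hypothetical Python helper built from standard-library functions (`html.escape`, `shlex.quote`, `json.dumps`, `repr`); `repr` stands in for the error-message case, where quoting marks exactly where the content starts and ends.

```python
import html
import json
import shlex

# Each target format gets its own encoder; the content never decides how it is embedded.
ENCODERS = {
    "html": html.escape,
    "shell": shlex.quote,
    "json": json.dumps,
    "message": repr,  # error messages: quotes separate content from "chrome"
}

def interpolate(target: str, template: str, **values: str) -> str:
    """Format `template`, encoding every value for `target` first (hypothetical helper)."""
    encode = ENCODERS[target]
    return template.format(**{key: encode(value) for key, value in values.items()})

print(interpolate("shell", "ls -l {path}", path="a b; rm -rf ~"))
print(interpolate("html", "<p>Hello, {who}</p>", who="<script>alert(1)</script>"))
print(interpolate("message", "file not found: {path}", path="a\nb"))
```

The caller still picks the target, but it can no longer forget to encode a single value.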
It's "declarative", which is supposed to be good, and you can represent complex data structures without any annoying closing braces (like JSON) or closing tags (like XML).
Plus, well, there's a disturbingly high number of people who think that semantic whitespace is a positive.
This is ridiculous. Indentation is a visual concern for humans and braces are a semantic concern for a machine. Code only needs the latter to actually execute.
You can prefer that they be coupled for taste reasons, but they are fundamentally different. Code minification is a practical example: you can save many characters by omitting whitespace.
This is like tabs vs. spaces. The spaces crowd prefers the tradeoffs and the tabs crowd doesn't, but the spaces crowd can't argue away the perks of tabs; they exist.
The tabs/spaces war was fought because of IDE choices that took away sane tab-rendering defaults and any simple way to override the tab size.
Now that terrible text editors have won, I have to watch a 2 vs. 4 space war.
If only we could have a character that could represent indentation and people could set the rendering so they can visualise it in their own preferred way.
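For what it's worth, "render the indentation character at whatever width you like" is exactly what Python's `str.expandtabs` does; here it stands in for an editor's tab-width setting, while the source text itself never changes.

```python
# One source snippet, indented with tab characters only.
source = "\tif ready:\n\t\tstart()"

# Each reader picks a rendering width; the stored text stays the same.
for width in (2, 4, 8):
    print(f"--- tab width {width} ---")
    print(source.expandtabs(width))
```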
And folks who want to remove braces should be pointed to Apple's `goto fail` certificate-validation snafu.
Considering that we lost tabs because some people's text editors and web browsers rendered them as 8 spaces, and people drew the wrong conclusion, I don't have much hope for your plan.
However I would be completely on board if we could agree on a formatting configuration file that people could check into a repo for IDEs to pick up.
I think we've mostly agreed on EditorConfig[1] (it's even built into a lot of editors), though it doesn't concern itself with language-specific formatting.
For language-specific formatting, you can run formatters in pre-commit hooks and CI. Treefmt[2] can help with that if you want to cover a lot of languages in your repo.
I see it, and I understand the argument for coupling them. But the argument makes an assumption I am not comfortable with: it says they are there for the exact same reason, which is not actually true. It is often incidentally true in practice, depending on the needs of the language, but it is not universally true across all languages.
It's the same thing with semicolons. Having a statement separator provides practical benefits in several languages, and in many of those languages it is also optional.
If you were to say that all languages should parse both styles to get the best of both worlds, that wouldn't be completely unreasonable. But supporting both makes parsing more complex than necessary, so often only one is supported, which is fair.
That's not true; the argument does not "say they are there for the same exact reason". That's your strawman. The argument is just: "they always come together, thus one is redundant, and we obviously can't remove indentation". The difference in reason (syntax for the compiler vs. legibility) is irrelevant.
I mean, you can argue for having both for a myriad of reasons. It's redundant but no big deal, and a lot of languages do just that. Some languages do fine with only indentation and no braces. But there is no language that uses braces without indentation, or at least, stylistically, the code is always indented.
These optimizations for some perceived ergonomic win almost always make terrible tradeoffs compared with using a good, well-established data format. Systems that favor human consumption but make machine handling extremely difficult are the worst!
YAML not being a context-free grammar is a huge pain. There's so much state in the parser, and it only gets worse from there. YAML has all kinds of wild, crazy capabilities: references, a variety of inline content blocks, and weird ways to invoke stuff?? Yesterday GitHub did a code review of Frigate, an enormously popular, heavily downloaded surveillance video analysis tool, and found, oh yes, a huge, glaring YAML bug allowing remote code execution, because executing arbitrary code is just built right into YAML amid 3000 other crazy hacks, and who would have known to go look for and disable that capability?!
https://news.ycombinator.com/item?id=38630295
https://github.blog/2023-12-13-securing-our-home-labs-frigat...
Typing is not the problem (even though I see so many people who are just terrible beyond words at navigating project structures or the command line... Improve! Some day!).
I do think there's a power to the readability that makes it more approachable (but which eventually burns you). We were rewriting our AST at Kurtosis last year, and the default choice we were going to go with was, of course, YAML. But we came across a GitHub issue from DroneCI (who also started with YAML) that said something like: "we started with YAML, and we learned that you always eventually want to add more complex logic on top. Go with a Turing-complete language to begin with, or you'll be in the CircleCI trap, inventing a language via a YAML DSL."
We decided to go with Starlark as the base language for our DSL, and we've had a consistently great experience. Users report that it's very approachable, and the starlark-go library is very pleasant to deal with.
I totally agree that string-templating into a data serialization format is a mistake. But you can make life dramatically easier on yourself by writing `{{ something | toJson }}`. In fact, write a linter that checks that every single substitution ends with `| toJson`, and you will save yourself a lot of headaches.
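A naive sketch of such a linter in Python, assuming Helm/Go-template `{{ ... }}` actions. It flags any value-emitting action whose pipeline doesn't end in `toJson`, skips control actions, variable declarations, and comments, and does not understand `|` inside string literals, so treat it as a starting point rather than a real template parser. The regexes and the `unencoded_substitutions` name are hypothetical.

```python
import re

# A Go-template action: {{ ... }}, with optional {{- / -}} whitespace-trim markers.
ACTION = re.compile(r"\{\{-?\s*(.*?)\s*-?\}\}", re.DOTALL)
# Actions that control flow, declare variables, or comment, rather than emit a value.
NON_EMITTING = re.compile(r"(if|else|end|range|with|define|break|continue)\b|/\*|\$\w+\s*:?=")

def unencoded_substitutions(template: str) -> list[str]:
    """Return the body of every value-emitting action not ending in `| toJson`."""
    offenders = []
    for match in ACTION.finditer(template):
        body = match.group(1)
        if NON_EMITTING.match(body):
            continue
        # Naive: splits on every "|", even one inside a quoted string.
        last_stage = body.split("|")[-1].strip()
        if last_stage != "toJson":
            offenders.append(body)
    return offenders

chart = """
name: {{ .Values.name | toJson }}
image: {{ .Values.image }}
{{- if .Values.debug }}
debug: {{ .Values.debug | toJson }}
{{- end }}
"""
print(unencoded_substitutions(chart))  # ['.Values.image']
```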
The main issue is that it makes it more difficult to mix hardcoded and inserted values.
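The mixing problem in a few lines of Python, with `json.dumps` standing in for `toJson` and made-up image values: wrapping hardcoded text around an encoded value puts quotes in the middle of the scalar. The workaround is to build the whole value first and encode it once; in a Go template that would look something like `{{ printf "%s/%s" .a .b | toJson }}`.

```python
import json

registry, name, tag = "ghcr.io/acme", "web", "1.2.3"

# Hardcoded text around an encoded value: quotes end up inside the scalar.
mixed = f"image: {registry}/{json.dumps(name)}:{tag}"

# Workaround: assemble the complete value first, then encode it exactly once.
value = f"{registry}/{name}:{tag}"
whole = "image: " + json.dumps(value)

print(mixed)  # image: ghcr.io/acme/"web":1.2.3
print(whole)  # image: "ghcr.io/acme/web:1.2.3"
```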
Also the small technical concern that YAML isn't actually a superset of JSON. (But you are far less likely to hit these cases than other escaping bugs).
Dedicated config languages are usually the best: Jsonnet, CUE, and friends.
Failing that, if you actually need or want a procedural, non-pure language, then I think Kotlin or Ruby take the cake. Both have extremely strong support for DSLs, which IMO is key to reaching a modicum of usability.
Starlark is nice as well, it’s syntactically based on Python, and behaves a lot like regular procedural languages, but it’s meant to provide a pure and safe environment for configuration and be embeddable.
Big +1. We switched to Starlark for our DSL last year and have been very pleased. Users who've never used Starlark before come in with some 'what, another language?' trepidation, and end up pleasantly surprised.
Since using Bazel a bit, I have also grown to appreciate Starlark. The big thing is that list and dict comprehensions are a really nice fit for these kinds of tasks.
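A small illustration of why comprehensions suit config generation. This is plain Python with made-up service names; Starlark uses the same comprehension syntax, though it is a restricted dialect, so don't read this as a Bazel-specific example.

```python
# Hypothetical service list; the comprehensions use syntax Starlark shares with Python.
SERVICES = ["api", "worker", "scheduler"]
BASE_PORT = 8000

# Dict comprehension: give each service a consecutive port.
ports = {name: BASE_PORT + i for i, name in enumerate(SERVICES)}

# List comprehension: one target definition per service.
targets = [
    {"name": name, "port": port, "replicas": 3 if name == "api" else 1}
    for name, port in ports.items()
]

for target in targets:
    print(target)
```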
Lua is fantastic for the config-as-code use case. Easy to read and write, with a lightweight embeddable interpreter, and it has almost universal library support across different languages/environments.
It's a shame that Helm v3 didn't move forward with the Lua engine[0]. I can't imagine `~=` and 1-based arrays would have been a worse timeline... And here we are, 5 years later.
YAML itself is not too bad, especially with a good IDE. But YAML's whitespace sensitivity combined with Go templating is horrible. Every component by itself looks kinda reasonable, but together they make an unholy mess.