…which does a nice job of starting from the problem, and building up from "how could we improve JSON for use as a config language" to arriving at CUE.
(Explanations of that type happen to resonate with me: understanding the "why" first, at least for me, helps the mechanics and details later make more sense and feel more intuitive…)
type Person = {
  age: number
  hobbies: string[]
}

const john: Person = {
  age: 29,
  hobbies: [
    "physics",
    "reading",
  ],
}
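For comparison, a rough CUE equivalent might look like this (a sketch; CUE definitions are written with a leading `#` and are closed by default):

```cue
#Person: {
	age:     int & >=0 // a type and a constraint can sit together
	hobbies: [...string]
}

john: #Person & {
	age: 29
	hobbies: ["physics", "reading"]
}
```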
* How do constraints relate to the rest of the language? Specifically, why a limited set of ad-hoc operators (`>=18`, `=~ regex`) rather than (Scala-style) lambda forms (`_ >= 18`, `_ =~ regex`)?
* Where is the line between a data language and a full-fledged programming language? For example, the list package, while (intentionally?) missing map / fold, indicates a strong demand for rich list processing capabilities. https://pkg.go.dev/cuelang.org/go@v0.4.0/pkg/list
1. In CUE, types and values are equivalent and both live within the same lattice. You have a spectrum from the abstract to the concrete. See https://cuelang.org/docs/concepts/logic/ for a good overview. There are also definitions and structs, which have different semantics for closedness (ability to add fields). See https://cuetorials.com/deep-dives/closedness/ for an overview. One really interesting aspect of CUE is subsumption which can tell you if your types or API are backwards compatible.
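A small illustration of that spectrum, where each later line is more specific and all of them unify:

```cue
a: number      // abstract type
a: int         // narrower type
a: >=1 & <=100 // constraint
a: 42          // concrete value; unifying all four lines yields 42
```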
2. Generally, CUE is a Turing-incomplete language, which means you cannot write programs in it: no user functions or lambdas, by design. It is not possible to maintain the guarantees of the language with user-defined functions. There are a number of utilities provided in the stdlib, which are idempotent. There are list and field comprehensions, which are like a map, and folds can be simulated with comprehensions. A proper fold could be added at some point.
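For instance, a comprehension standing in for map/filter, with a stdlib helper covering a common fold (a sketch):

```cue
import "list"

nums: [1, 2, 3, 4]

// comprehension as map + filter: double the even elements
doubledEvens: [for n in nums if mod(n, 2) == 0 {n * 2}]

// common folds come from the stdlib rather than user functions
total: list.Sum(nums) // 10
```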
1. The subsumption rule is elegant, but doesn't answer the pragmatic question: for what use-cases is a naive structural type system (familiar to a large population of developers) insufficient? Logic theory has type towers, but for 99% of the use-cases values and types are interesting, rarely kinds. Not even sure if 'type of kinds' has a name by itself. Possibly constraints are an answer, but the constraint system feels ad-hoc.
To rephrase, how would one describe Cue type system as a generalization of a naive structural type system?
2. Not sure what you mean by 'idempotent', perhaps 'pure' or 'no side-effects'? There is plenty of non-turing complete computations that can be done with pure functions.
I don't fully have my head around it either, but in my understanding you can have two mutually independent files that define `john: <constraint>` where <constraint> differs in each file, and in Cue these will be unified: the `john` value must satisfy both constraints.
For example, one file might have the constraint "john is an int" and the other "john is between 1 and 7". If the constraints clash (e.g., john is an int and john is a string), then it's an error.
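A minimal sketch of that scenario (the two file names below are illustrative):

```cue
// a.cue
john: int

// b.cue
john: >=1 & <=7

// Evaluating both files together unifies the constraints, so a
// third file saying `john: 3` is fine, while `john: "three"`
// fails: int & string has no common value (bottom, _|_).
```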
That said, I think my disconnect is "when is this useful?" or "what can I do with this property?".
"When is this useful?" I can't comment on specific CUE design decisions, and speaking generally about data languages and not specifically about typing thereof.
A canonical use-case for data languages is a cloud deployment consisting of many almost identical items, e.g. k8s pods, described as a series of data templates with the property that the cost of overriding any (shared) configuration parameter over the entire deployment is O(1).
In a standard function-based language this can be done by rewriting function arguments at runtime, i.e. some form of monkey patching. I haven't seen a good theoretical explanation on how data languages (GCL, jsonnet, CUE, etc.) avoid the monkey patching trap as opposed to embracing it with good terse monkey patching syntax & semantics.
Perhaps there is a there there, but until the basic evaluation semantics is clarified, I have little hope of grokking higher level concerns like typing.
I guess I don't have a good sense of the value of overriding a configuration parameter. If I were generating config in Starlark or Python, I would just provide the configuration as parameters to functions. If there's some function `def foo(): bar(x=4)` and I want to allow someone to change the value that is passed to bar.x, I would simply factor it out: `def foo(x=4): bar(x=x)`.
> In a standard function-based language this can be done by rewriting function arguments at runtime, i.e. some form of monkey patching. I haven't seen a good theoretical explanation on how data languages (GCL, jsonnet, CUE, etc.) avoid the monkey patching trap as opposed to embracing it with good terse monkey patching syntax & semantics.
I'm not entirely sure. I know you can arbitrarily layer on more constraints (e.g., you can impose your own naming conventions onto a Kubernetes Deployment "name" field by adding a `Deployment{name: <regex>}` constraint to your own files). But it's not clear to me how you would do other kinds of patches.
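The constraint-layering mentioned above might look like this (a hypothetical sketch; it assumes a #Deployment schema imported from the Kubernetes definitions):

```cue
import "strings"

// Tighten the name field beyond what upstream requires:
// lowercase, DNS-label-ish, at most 40 runes.
#Deployment: metadata: name: =~"^[a-z][a-z0-9-]*$" & strings.MaxRunes(40)
```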
It's called unification and exists in many languages, Prolog is a well known one. CUE is in the logical language family. It uses Typed Feature Structures which originated in NLP before deep neural networks became capable.
It is specifically not inheritance where overrides are allowed.
> CUE [...] makes an effort to incorporate lessons learned from 15 years of GCL usage.
The main lesson learned by many people at Google, including the operators of many mission-critical services, was that GCL was far more trouble than it was worth. They used an alternative more mainstream configuration language instead.
Maybe Javascript? A lot of web tools support Javascript config files. There's this nice-looking effort to provide a hermetic execution environment for them: https://github.com/jkcfg/jk and if you use Typescript you get an extremely good static type system too. Plus the language is already very well known with loads of tool support and documentation.
Starlark has taken over the build system (which it was originally built for) and a small number of other systems at Google. There are a couple of other internal languages that found success in specific domains. But GCL remains the most common config language at Google, unfortunately.
I hear the GCL lambdas make for a bad day. Marcel, who created CUE, helped develop Borg, borgcfg, GCL. He talks about the "unfortunate" sentiment in the last video on this page https://cuetorials.com/videos
Funny detail: I really did not understand GCL until I read about CUE and also JSonnet, which I believe are both by post-GCL ex/Googlers. Once I understood those languages, I started to get a handle on GCL. "GCL is like a fucked up JSonnet" is a better angle than "JSonnet is like GCL if it wasn't torture"
Jsonnet got rid of "up" and a few other details and suddenly things are much less likely to blow up in your hands.
curiosity: I once wrote a GCL function that computed a PNG data url (the thing that looks like '<img src=data:image/png;base64,iVBORw...' and can be natively opened by your browser) whose content was a rendering of the mandelbrot set. It took several GB of RAM to evaluate. Didn't port it to jsonnet yet...
- "parse, don't validate" has proven to me to be a far better approach for all my use-cases.
I guess validation that is language/consumer-agnostic is more useful in a big-tech context? Or maybe the advantage is to put constraints on data that you are sending to a system out of your control?
Keen to hear opinions of people that worked with similar tools.
Putting constraints on configuration is a big use case. CUE is much nicer to write than Yaml or JSON, and then you can output those formats for the programs which use the "old" formats. There is a vet command if you want to validate Yaml or JSON before moving on to generating them. It has helped me catch errors, and I use CUE instead of Helm for a lot of things now. I'm looking into generating Terraform JSON rather than HCL. If there has to be a language, CUE is preferable to HCL.
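A minimal sketch of that vet workflow (the schema and field names are made up):

```cue
// schema.cue — validate existing YAML without generating anything:
//   cue vet services.yaml schema.cue
services: [string]: {
	name: string
	port: int & >0 & <65536
}
```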
Another thing I have been working on is a tool to diff, mask, upsert, and transform config or data files in bulk using CUE. https://github.com/hofstadter-io/cuetils
I agree that it’s probably nicer than YAML, but that’s a low bar. I’m more interested in whether it’s nicer than something like Starlark (especially a hypothetical statically typed Starlark).
CUE aims to be turing-incomplete, so no programming. This is based on Marcel's experience writing borgcfg and GCL while at Google.
The quote I really like is "wrap code in data, not data in code"
It has a lot to do with readability, maintainability, and being able to write tooling to work with config or data. Think about building a query tool that can find everything in your config using a particular label. With CUE, it is likely to be a one-liner (once the query proposal is implemented).
I agree, but I don't understand the value of the difference.
> CUE aims to be turing-incomplete, so no programming
You could satisfy this property with an imperative language that lacks recursion, unbounded loops, etc; however, I'm not convinced this property is very useful in practice. I've never worked anywhere where this was a real problem.
> Think about building a query tool that can find everything in your config using a particular label.
In an imperative configuration language, I would just `jq` the generated JSON. At least I think that's analogous to your suggestion?
Marcel does a great job of arguing the value of Turing incompleteness. I've collected his talks here: https://cuetorials.com/videos I cannot say it better than he has.
jq is close to how CUE would solve this. Both differ from an imperative config language, where you would recurse over the data. For both jq and CUE, you would provide a pattern-matching "mask" of sorts. With CUE, you can also have a schema built into that mask and ensure that any transformations are still valid. You can also do the same with Yaml, so you get one tool and language to replace jq, yq, and jsonschema.
Cue might actually get bounded recursion; the idea has been thrown around. It ought not to invalidate the theory, because it can be proved all the way down. You can simulate it today with comprehensions. I've built some structural helpers on the pattern here: https://github.com/hofstadter-io/cuetils
True, though one may turn that around to say Cue's scope is too large while Jsonnet's is more "pure" while still enabling the larger scope.
Schema that describes data is itself just data and Jsonnet can describe both. While the various Jsonnet compilers do not "know" about any specific meta-schema in which to express schema nor of course any way to apply that schema to data for validation, one can create such systems with Jsonnet as the language and its compilers as a component.
True, CUE has quite the inspirational scope right in the name. It is actually what draws me to the language. I also like the theoretical foundations in being turing-incomplete. Functions and computation make configuration harder to understand and reason about.
I mostly generate yaml and json now that I've been using CUE. The cool thing about CUE is that it's really doing both at the same time. Types and values are just points along a spectrum of specificity, going from types to constraints to concrete values.
I found CUE because I needed something better than Yaml for generating code, or declarative application code. (https://github.com/hofstadter-io/hof). It was similar to when I found Go and replaced a bunch of C++. I gained way more functionality in my application while shedding more than 50% of my LOC.
Not OP, but "parse, don't validate"[0] says that data (including config, etc.) should go through a "parsing step" (no "shotgun parsing"[1]) which outputs specific data-types for the main processing code to deal with ("invalid states should be unrepresentable"[2]).
For example, we might have a "parsing step" which takes in Bytes (e.g. '[{"email": "chriswarbo@example.com"}]') and outputs a 'NonEmptyList[ContactDetails]' (e.g. 'NonEmptyList(Email(NonEmptyString('c', "hriswarbo"), NonEmptyString('e', "xample.com")), Nil)').
Compare this to a "validate" approach, e.g. parsing the bytes to a 'JsonArray', and using a boolean function to check whether it's empty, and whether its entries match the 'contact details' schema. That's problematic, since (a) it leads to "boolean blindness"[3] (the function tells us 'attempting to get those fields should work', whereas the parsing approach actually gives us those fields) and (b) such a boolean function is actually redundant, since deleting all of its call sites (or forgetting to write them to begin with) would give us a program that still compiles and runs (whereas the parsing approach would give a type error: 'Given JsonArray, expected NonEmptyList[ContactDetails]').
CUE's approach seems to be similar to passing JSON values around, and occasionally performing validation checks (if we remember to); i.e. validating, not parsing.
I like Cue and Jsonnet and Starlark and so on. But all of these have very low mindshare (though Starlark has the most momentum thanks to Bazel), and who knows if they will be dead by next year.
Being an early adopter is difficult both in terms of the immaturity of the tooling — Cue, for example, only has a Go implementation at the moment — and in terms of the risk of betting on an evolutionary dead end, which can cause a lot of unnecessary churn when you want to standardize on something across an entire organization.
As a concrete example, I'd love to replace Kubernetes's use of YAML with something like the above. But the tooling is immature, and almost nobody is using any of it. For example, there's Isopod [1], which is a nice-looking tool to use Starlark with Kubernetes. But it might go the same way as Ksonnet.
> My recommendation is to make your choice based on the merits of the options, rather than trying to keep up with the latest trend.
It's not a matter of keeping up with the latest trend, it's a matter of moving a whole org to a technology or project that gets killed off. There are real costs to making the wrong choice.
> If nothing else, it will make you smarter, and that will make you more conscious, and that will make you more alive.
Which is great if you're talking about a personal project but if you're setting strategy for an org or team then you need a better metric than "how alive does it make you feel."
CUE is something that can augment your existing tools. You don't have to replace them. You can adopt incrementally to add confidence where you find repeated configuration mistakes. If you want to validate your configuration with concise schemas, then you can use it like JSON Schema, even importing your existing schemas and making them readable.
CUE will make you think about the structure and taxonomy of your configuration / data. If anything, this is a good exercise that can make your code better even if you don't end up adopting CUE.
Yes, but if you move all your stuff to Cue, you've committed to it for a while. My company has tons of apps on Kubernetes: Currently around 2,200 resources totaling more than 100,000 lines of YAML. And we're not a big organization. But once migrated, even an org our size is going to find it hard to migrate to something else.
Being snarky, I'd say you won't look back if you did migrate. Again, you could just place CUE next to Yaml to validate it.
You can always write the yaml version to disk from CUE. CUE will also read and import your Yaml so that you don't have to rewrite it by hand. It would certainly be harder to migrate away from something like Starlark or Pulumi where config is wrapped in code. 3rd party tooling is much harder to write under that paradigm.
Marcel, the creator of CUE, wrote the prototype that became Borg and eventually Kubernetes. He was also on the teams that wrote the config languages for these systems. Your use case has been central to CUE's design.
> Which is great if you're talking about a personal project but if you're setting strategy for an org or team then you need a better metric than "how alive does it make you feel."
I don't think GP meant you base your strategy on "how alive Cue makes you feel". They simply meant the concepts are a good read and having read the Cue documentation before, I agree with it.
There are plenty of well supported Jsonnet libraries for Kubernetes. Have a look at [1] as a starting point. I've used it in production with great success, with tanka[2] as the glue.
I use them to manage infrastructure running in Kubernetes. One of my coworker's previous companies actually funded the CUE support in IntelliJ. For Kubernetes, it actually introspects the structs and interfaces to create the validation bits. It has a high learning curve but is surprisingly nice once you get the hang of it. My configs have a lot less duplication because of it.
We are trying an interesting use-case at Overseed (https://www.overseed.io/). We recently replaced JSON with CUE as our data configuration language. To summarize, our application allows users to describe their fake data's attributes and behavior and then output that data as a file or stream API.
At the moment, we mainly use CUE for the structure, variables, and validation. It replaced our custom JSON and validation logic and cleaned up a lot of our code! We are still exploring moving more of our specifications to CUE style configuration and validation.
We are still early-stage users (a few weeks). So far, we love the concise syntax, strong typing, and validation. I plan to spend some time this week checking out the querying, scripting, and external tools.
This looks super promising, but I'm a coding and Cuelang newbie. I'm confused as to how to use it.
Let's say I'm building a tool for people to upload a lot of data, in the form of CSVs and JSON files, POSTing them into my node API. I'm writing those files to a document database.
I'm trying to figure out how to use Cue in this process. I'm planning on using JSON Schema definitions, and Ajv as the validator — would I purely be using Cue to generate and maintain my JSON schemas, which I use to validate with Ajv?
Or can Cue be used more directly in my node API, e.g. using some npm validation tool that lets me use Cue to validate a bunch of CSV files converted to JSON?
I think Cue is more aimed at configuration file management.
Think Kubernetes, Ansible, Puppet, Chef, and Salt. There you have tens to hundreds to thousands of YAML/JSON files that describe your server configuration and there's no way to validate any of it out of the box.
Even more broadly, most configuration files in general don't have any validation. For example things in the /etc/ directory on a Linux server. That fits under configuration file management too.
I'm not sure about your specific use case, but if you haven't considered something like Protocol Buffers -- I'd highly recommend at least looking into it. JSON sucks, sometimes.
Execute is in the name (Configure, Unify, Execute). There is a DAG engine (tool/flow) that automatically calculates dependencies. It powers the scripting layer and https://dagger.io
In theory you could use CUE to validate requests and insert defaults and implied data, replacing Ajv.
1. Get the CUE value representing the schema endpoint
2. Convert the JSON request into a CUE value
3. Unify the request with the schema
If unification fails, return 400 Bad Request.
If it succeeds, deserialize the resulting third CUE value into a programming language value.
CUE will fill in fields missing from the JSON, either by using the default values from the schema or by inferring fields that have only one possible value (which may depend on all the other fields of the request and schema).
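A minimal sketch of that default-filling behavior (the schema and field names are made up):

```cue
#Request: {
	user:  string
	limit: *10 | int & <=100 // *10 marks the default
}

// Unifying a request that omits `limit` picks up the default:
req: #Request & {user: "alice"} // req.limit is 10
// A request with limit: 500 would fail unification -> 400 Bad Request.
```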
I have not used Starlark, but to me it looks like a cleaner competitor to HCL. It can be used to generate verbose configs. It doesn't seem to enforce types and constraints on the fields. Surely there must be a way to do that using Starlark..?
But in cue types and constraints are just another form of values.
Cue is a nicer JSON Schema competitor, in which valid JSON is also valid Cue.
I’ve done a lot with Starlark, but my big beef is the lack of types. Cue aspires to be declarative which I never find myself reaching for, but maybe that’s just for lack of experience with the declarative category of tools.
The biggest difference is probably that CUE is Turing-incomplete, which means no programming or functions. CUE is logical and proves that what you have written is valid and correct.
CUE is a superset of JSON, so all JSON is valid CUE.
The other main thing is the value lattice where everything from abstract types to concrete data lives.
I haven't actually used Dhall, the syntax rubs me the wrong way or something.
There is no noise, and the size of the configs is close to optimal. Cross-cutting concerns are easy to express. The quality of the configs is very high, similar to well-typed code with an algebra on types; the idea that values are types, together with lattice-based unification, is very intuitive.
...and we are generating manifests for "older, more production-tested systems", i.e. docker compose for Docker VMs / Swarm services; it's really great for that.
Looks nice. I guess my question is how easy it would be to integrate this into the providers directly? Reminds me of OPA, which I believe does something similar already.
Depends on what you mean by providers. There is only a Go implementation at present.
The more complicated thing is that it supports imports and modules. A lot of programs may need some adjustment for this concept. For example, what do you put in a k8s ConfigMap? Should a container have an init step to fetch modules?
Grafana and Sigstore are making Cuelang first class. There are some other CUE native applications developing in the community.
I played with Cue last year and while I think the concept is interesting, I've been struggling a bit with actually finding a good application for the technology. I'd be more interested in using it for validation as a less brain-dead alternative to e.g. json-schema but the documentation talks mostly about manifest generation and Kubernetes-oriented workflows, which already has Kustomize and Jsonnet and templating solutions like Helm.
There are interesting use cases for ETL. I have written a tool that generates full-stack code using CUE as the declarative input for your types. (https://github.com/hofstadter-io/hof) For CI/CD from source to production, see https://dagger.io by the creators of Docker. That is a super cool project.
I no longer use Helm now that I use CUE; at least, I no longer write charts. We do need a Helm-like workflow tool built on CUE to replace the text/templating of Yaml.
Once "cue mod" is managing deps, there is potential for increased and better sharing of module configuration, much like Helm is for kubernetes applications.
>CUE is a bit different from the languages used in linguistics and more tailored to the general configuration issue as we've seen it at Google. But under the hood it adheres strictly to the concepts and principles of these approaches and we have been careful not to make the same mistakes made in BCL (which then were copied in all its offshoots). It also means that CUE can benefit from 30 years of research on this topic. For instance, under the hood, CUE uses a first-order unification algorithm, allowing us to build template extractors based on anti-unification (see issue #7 and #15), something that is not very meaningful or even possible with languages like BCL and Jsonnet.
Not related to Cue, but "All is good now" simply shows how to shoot yourself in the foot with yaml and apply a nice patch to a wound.
That's not good at all.
More generally speaking, CUE has overlap with propagator networks.
Given

    x = y + 1
    y = x - 1

you can specify either x or y and the other will be solved for. A more powerful PN system can be built on CUE, especially given the DAG solver in the code base.
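A minimal CUE sketch of that behavior (this mirrors the cycle-handling example in the CUE spec):

```cue
x: y + 1
y: x - 1

// Making either field concrete resolves the cycle:
y: 5 // now x evaluates to 6, and y: x - 1 still checks out
```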
Potentially naive question - If I had a non-user-facing CRUD admin backend, why should I not use this for validating the data while creating or editing records?
If you already have validation, hand-tuned and correct, it will perform better; validating in CUE adds overhead during the encoding step. That may not be a concern, and CUE has performance goals that should eventually make it a non-issue as well.
A Starlark interpreter is typically embedded within a larger application, and the application may define additional domain-specific functions and data types beyond those provided by the core language
Pretty much what Lua provides. I settled for Lua many years ago.
While the concept of bottom is not going away, the lexical symbol may be replaced with the word. I'm particularly amused that bottom in parentheses looks like the bottom of a human... (_|_)
You will end up invested in building out the workflow and toolchain yourself.
Barely anyone uses it, so have fun training up new hires.
Some folks just straight up left the company rather than get sucked into a new DSL
Frankly, the AWS SDK works great. The documentation includes working examples, is available in familiar syntax, I avoid CF’s opinions, and state is stored in the release branch.
I can successfully rebuild any version of our infra by checking out a git hash.
Meanwhile, month-old Terraform and CUE setups break because a middleman had a brilliant idea at 1 AM or some reason.
IMO devops as an idea jumped the shark almost immediately. Folks decided developer habits were to be added to ops habits, but on the ground it just meant “use source control”. No other wisdom, like don’t repeat yourself and write less code, came along.
I think SRE was meant to fix that but the same people and perspectives just took that over with DSL hell too
I don't burn out on tech work. I burn out on tech-worker bullshit.
https://bitfieldconsulting.com/golang/cuelang-exciting