Good programmers worry about data structures and their relationships (softwareengineering.stackexchange.com)
234 points by rbanffy 11 months ago | 130 comments


Looks like that substack just copied a bunch of quotes from this Stack Exchange post:

https://softwareengineering.stackexchange.com/questions/1631...


I've changed the URL from https://read.engineerscodex.com/p/good-programmers-worry-abo... now. Urgh spammers.

There was a wave of this substack spam a couple months ago and I suppose this is the same bad actor starting up again.


Sorry about that. Next time I’ll do more research.


Thanks rbanffy - and thanks for being a great HN contributor.

It would be good (not just for you but for any of us) to do some googling for language drawn from substacks to see if it's lifted from other sources - because I think this spammer has been specializing in creating substack sites for a while.


> Thanks rbanffy - and thanks for being a great HN contributor.

I love this community and this is the least I can do for it.


Didn't even bother to change the order of the quotes. That's bold.


And that paragraph at the end came out of nowhere:

> It’s why one of the Senior Engineer (L5) requirements (at least for FAANG) generally involves writing higher-level design docs for more complex systems (which includes driving team planning and building good roadmaps for medium-to-large features).


I wonder if he'll file a bug report with whatever AI ghostwriter he's using.


All gen AI is unattributed plagiarism, but not all unattributed plagiarism is gen AI.


Sorry, I was being flippant. The author, of course, could be an old school plagiarist.


"Show me your flowcharts [code], and conceal your tables [schema], and I shall continue to be mystified; show me your tables [schema] and I won't usually need your flowcharts [code]: they'll be obvious." -- Fred Brooks, "The Mythical Man Month", ch 9.


I once wrote to John Carmack as a Quake-obsessed kid, asking if he had any advice for an aspiring programmer and any favourite books. To my surprise he wrote back a really thoughtful response, including the following:

"Read The Mythical Man Month. I remember thinking that a book that old can't say anything relevant about software development today, but I was wrong."


I came here to share this quote because it's so true.

Except when the effort to change the database schema becomes significantly greater than the effort to change the code, and then application developers start abusing the database because it's faster and they have things to do.


Data structures are not the same thing as types. Data structures are bit patterns and references to other bit patterns (pointers or relationships). Types (as they are used in programming languages) place some constraints on those bit patterns, but can also encode many other language features.

Creating an elaborate type hierarchy with unnecessary abstractions is not what is meant by "worrying about data structures", and that tendency is one of the most common failure modes for otherwise smart engineers.


I think this is a subtle and important point. Types are a potentially useful tool to restrain and specify the schema of data structures. But concern for types is very different from concern for the data structures.


In OOP, the border between types and data structures is much less visible.


Exactly, which is why I think this is a subtle and important point.


Types are data structures that the language is aware of. This allows the tooling to do checks that they can't do on plain old data structures.


Data structures are algorithms at rest. They shuffle and move things each operation, but mostly sit still, like a Turing machine that people only crank once in a while.

Types are the bits on disk.


Good catch.

Equating data structures to types is an oversimplification that misses the core point.

I think the original call here is to simply think harder about the problem and avoid picking structures that'll burn you later.

For example, take Unix pipes, see how far they've traveled, how many domains, how many use cases. It's a brilliant way to visualize system building while respecting the constraints of minds and machines.

And it took Ken and others quite a while to realize something like pipes could make sense in Unix. It was not an insight easily obtained but required a bit of hustle and followup and obsession with finding the right building blocks for a system.


The same data structure can be assigned different types; that's what a type declaration in Pascal does (Pascal uses name equivalence, whereas C's typedef only creates an alias).
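A minimal TypeScript sketch of the same idea (names hypothetical): plain type aliases share structure and stay interchangeable, while a "brand" recovers the nominal distinctness a Pascal type declaration gives you.

  // Structural typing: these aliases describe the same data structure
  // and the checker treats them as interchangeable.
  type Meters = number;
  type Feet = number;

  // Branding makes the same underlying bits nominally distinct:
  type AccountId = string & { readonly __tag: 'AccountId' };
  type SessionId = string & { readonly __tag: 'SessionId' };

  const a = 'abc' as AccountId;
  // const s: SessionId = a; // compile error: same structure, different type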


Linus always has a great way of summarizing what others might be thinking (nebulously). What's being said in the article is really mirrored in the lost art of DDD, and when I say "lost" I mean that most developers I encounter these days are far more concerned with algorithms and shuttling JSON around than figuring out the domain they're working within and modelling entities and interactions. In modern, AWS-based designs, this looks like a bunch of poorly reasoned GSIs in DDB, anemic objects, and script-like "service" layers that end up being hack upon hack. Maybe there was an implicit acknowledgement that the domain's context would be well defined enough within the boundaries of a service? A poor assumption, if you ask me.

I don't know where our industry lost design rigor, but it happened; was it in the schools, the interviewing pipeline, lowering of the bar, or all of the above?


I’d argue software design has never been taken seriously by industry. It’s always cast in negative terms, associated with individuals seen as politically wrong/irrelevant, and brings out a ton of commenters who can’t wait to tell us about this one time somebody did something wrong, therefore it’s all bad. Worse, design commits the cardinal sin of not being easily automated. Because of this, people cargo-cult designs that tools impose on them, and chafe at the idea that they should think further on what they’re doing. People really want to outsource this thinking to The Experts.

It doesn’t help that it isn’t really taught, but is something you self-teach over years, and that it is seen as less real than code (ergo, not as important). All of these beliefs are ultimately self-limiting and keep you at the advanced-beginner stage in terms of what you can build, however.

Basically, programmers collectively choose to keep the bar as low as possible and almost have a crab-like mentality on this subject.


I can see a swing finally starting. It isn’t “huge” by any stretch, but at the same time

“deVElOpErS aRe MoRE EXpEnSivE tHaN HArDwaRE”

Commenters are no longer just given free internet points. This is encouraging as these people controlled the narrative around spending time on thinking things through and what types of technical debt you should accept for like 20 YEARS.

I think maybe people are finally sick of having 128 gigs of RAM being used by a single 4 KB text file.


There is some truth to the idea that developer time is expensive, and can dwarf the monetary gains gotten through micro-optimization.

I agree that some people took the idea to mean "what's a profiler?" and that is why our modern machines still feel sluggish despite being mind-bogglingly fast.


This might be driven by the cost per computation being vastly lower while the benefit having remained mostly constant. There is little incentive for making a text editor that runs in 10k of memory because there is no benefit compared to one that runs in 10 megabytes or, soon, 10 gigabytes.

I spend a lot of my day in VScode and PyCharm and the compute resources I consume in an hour are more than what the Apollo program consumed over its full existence. Our collective consumption at any given decade is most likely larger than the sum of computing resources consumed up until that point in our history.


> most developers I encounter these days are far more concerned with algorithms and shuttling JSON around than figuring out the domain they're working within and modelling entities and interactions

The anemic domain model was identified as an anti-pattern quite a long time ago[1]. It usually shows up along with Primitive Obsession[2] and results in a lot of code doing things to primitive types like strings and numbers, with all kinds of validation and checking code all over the place. It can also result in a lot of duplication of code that doesn't look obviously like duplication because it's not syntactically identical, yet it's functionally doing the same thing.

1 https://martinfowler.com/bliki/AnemicDomainModel.html

2 https://wiki.c2.com/?PrimitiveObsession
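To make the Primitive Obsession point concrete, a minimal TypeScript sketch (names hypothetical): validation scattered across raw strings vs. a small value object that validates once at construction.

  // Primitive Obsession: every caller re-validates the raw string.
  function sendWelcome(email: string): void {
    if (!email.includes('@')) throw new Error('invalid email');
    // ... send ...
  }

  // Value object: validate once; afterwards the type carries the guarantee.
  class EmailAddress {
    private constructor(readonly value: string) {}
    static parse(raw: string): EmailAddress {
      if (!raw.includes('@')) throw new Error('invalid email');
      return new EmailAddress(raw);
    }
  }

  function sendWelcomeTyped(email: EmailAddress): void {
    // no re-checking needed here or anywhere downstream
  }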


The industry predominately rewards writing code, not designing software.

I think the results of bad code aren't as obvious. A bad bridge falls down; bad code has to be... refactored/replaced with more code? It goes from one text file that execs don't understand to a different text file that execs don't understand.

And once something works, it becomes canon. Nothing is more permanent than a temporary hack that happens to work perfectly. But 1000 temporary hacks do not a well-engineered system make.

I believe that maturing in software development is focusing on data and relationships over writing code. It's important to be able to turn the data model into code, but you should go in that direction - not turn code that happens to work into a data model.


> The industry predominately rewards writing code, not designing software.

The sad part of this is that code is absolutely a side-effect of design and conception: without a reason and a reasonable approach, code shouldn't exist. I really think that the relative austerity happening in the industry right now will shine a light on poor design: if your solution to a poorly understood space was to add yet another layer of indirection in the form of a new "microservice" as the problem space changed over time, it's likely there was a poor underlying understanding of the domain and a lack of planning for extensibility. Essentially, code (bodies) and compute aren't as "cheap" as they were when money was free, so front-loading intelligent design and actually thinking about your space and its use cases becomes more and more important.


> The industry predominately rewards writing code, not designing software.

This also stems from the fact that most of the code being written at any given moment solves problems we already solved before, doing or supporting mundane tasks that are completely uninteresting from a software design point of view.


> anemic objects

I have yet to come across a compelling reason why this is such a taboo. Most DDD functions I have seen also are just verbose getters and setters. Just because a domain entity can contain all the logic doesn't mean it should. For example, if I need to verify if a username exists already, then how do I go about doing that within a domain entity that "cannot" depend on the data access layer? People commonly recommend things like "domain services," which I find antithetical to DDD because now business logic is being spread into multiple areas.

I quite enjoy DDD as a philosophy, but I have the utmost disdain for "Tactical DDD" patterns. I think too many people think Domain-Driven Design == Domain-Driven Implementation. I try to build rich domains where appropriate, which is not in all projects, but I try not to get mired in the lingo. Is a "Name" type a value object or an aggregate root? I couldn't care less. I am more concerned about the bounded contexts than anything else. I will also admit that DDD can sometimes increase the complexity of an application while providing little gain. I wouldn't ever dare say it's a silver bullet.

I will continue to use DDD going forward, but I can't help but shake this feeling that DDD is just an attempt at conveying, "See? OOP isn't so bad after all, right?" Of which, I am not sure it accomplishes that goal.


If you replace the Object-Oriented mechanism for encapsulation with some other mechanism for encapsulation then there's probably no reason for this taboo.

But in 99.999999% of real-world projects, anemic object-oriented code disregards encapsulation completely, and so business logic (the core reason why you're building the software in the first place) gets both duplicated and strewn randomly throughout the entire project's code.

Or in many cases, if the team disregards encapsulation at the type level then they're likely to also disregard encapsulation at the API/service/process level as well.


Ok, I see where you are coming from, and I agree. However, I would like to add that poorly implemented DDD can be just as awful.


With decades of exponential growth in CPU power, and memory size, and disk space, and network speed, and etc. - the penalties for shit design mostly went away, so you could usually get away with code monkeys writing crap as fast as they could bang on the keyboards.


It’s so interesting because I started doing professional engineering AFTER doing day-to-day data and statistical analysis in systems like MATLAB, R and early Python.

So my view of engineering has always been based on managing two things: functional state and data workflows

After doing software engineering professionally for a decade now I can tell you that:

1. Most “scientific” engineers back to Minsky, Shannon etc… describe the world of computing in terms of state management, data transformation and computing overhead management. All of the big figures and pioneers in software cared A LOT about data and state; basically that’s all computing was at the beginning, and it was expected to be the pattern moving forward

2. There’s absolutely no consistency in which assumptions of engineering system design are foundationally important and always true, such that everyone follows them - and the ones people do follow are fads at best

3. Business timelines dictate engineering priorities and structures much more than robustness, antifragility, state management etc… in the vast majority of production software

4. Professional organizations like guilds, unions, etc… are almost universally rejected by software engineers. Nobody actually takes IEEE seriously because there’s no downside if you don’t. This ensures there’s no enforcement or self-regulation in engineering practices the same way there are in eg Civil and biomedical engineering. Even then those are barely utilized.

Overall the state of software development is totally divorced from its exceptionally high-minded and philosophical roots, and is effectively led by corporations that are prioritizing systems that make money for people with money.

So what is “good” has very little to do with what is incentivized


“Show me your flowcharts [code] and conceal your tables [data structures], and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.”

-- Fred Brooks


I think this quote misses that there can (and arguably should) be differences between your persistence model and your actual data structures. I'd argue that keeping things 1:1 with your underlying tables is incredibly restrictive and leads to models that miss out on the expressiveness that's available in modern languages.


I think the brackets were simply suggesting that flow charts are analogous to code and tables are analogous to data structures in that quote. Not that your tables and data structures in a concrete system will be the same.


This is essentially the point of view of functional programming and category theory.

You have some data object whose structure provides constraints on how it can be transformed. And then the program logic is all about the structure-preserving transformations.

The transformations become simpler and easier to reason about, and you're basically left with a graph where the transformations are edges and the structures are nodes. And that's generally easier to reason about than an arbitrary imperative program.


>This is essentially the point of view of functional programming and category theory.

No, it isn't. This is the point of every language philosophy; you will find OOP and procedural people arguing exactly this. Correctly defining your data types is important and applicable in every language and every paradigm.

The view of functional programming is that objects shouldn't be transformed and that mutation should be avoided. That is unrelated.

The point of category theory is that different patterns of relationships are common across mathematical fields. Which is totally unrelated and has nothing to do with anything discussed here. Maybe you meant type theory? But that also has no relation.


Nope I meant category theory

> Maybe you meant type theory? But that also has no relation.

Hmm? Category theory and type theory have a lot of close ties. For example see the Nlab page on their connections https://ncatlab.org/nlab/show/relation+between+type+theory+a...


OP is saying neither of those relate to the problem. Not that they don't relate to each other.


A conclusion I reached a while ago: all the work we do in code is likely to be far shorter-lived than a single good decision we make in data.

https://www.swyx.io/data-outlasts-code-but


The good decisions are invisible. The bad decisions just seem to live forever.


This principle also applies at the business level. I’m constantly dealing with business analysts who obsess with processes (code) but don’t first take time to understand the entities and their relationships (data). The result is that when it comes time to build something they cannot communicate with the developers about what the data model should look like. The processes get implemented and the data model is put together on-the-fly rather than being carefully designed.


I think of code vs. types as analogous to the function vs. form argument in design. If a website needs to be shipped ASAP, I should prioritize types less. If hundreds of engineers rely on some code, I care about types.

Language also influences how important types are, regardless of function. Haskell is strict; LISP is less so. Python, being closer to LISP in syntax but surfacing powerful C (closer to Haskell) primitives, has proven that valuing function over form can be empowering.

Premature modeling of a domain in verbose types (e.g. struct vs. any) can slow down rapid iteration in understanding what is valuable in the data or how users may actually use the code. Someone might need not just one but infinite cat pictures in their file upload, while the code _and the types_ treat this as a single value. Another example is using JSONB columns in your RDS initially and normalizing fields into columns when needed. A more flexible type system saves time in early iteration cycles.
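A small TypeScript sketch of that iteration path (names hypothetical): start loose while the domain is unclear, then tighten once real usage is understood.

  // Early iteration: keep the payload loose while requirements shift.
  type UploadV0 = { kind: string; payload: Record<string, unknown> };

  // Later: lock the shape down once you know what users actually send.
  type UploadV1 =
    | { kind: 'cat-pictures'; pictures: string[] } // many, not one
    | { kind: 'document'; url: string };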


Correct, nothing improves code quality and performance more than having the right data structures.

This is also something which I learned far too late; my programming education focused very much on algorithmic thinking. That is important, but only helpful if you have already chosen the right data structures. Many times I have found that the code I was writing was confusing, and only a small part of it had to do with solving the actual problem. If this ever happens to you, you should rethink your data structures and consider whether they were chosen correctly.

Also, when reading code for the first time you should be looking at the data structures before anything else.


There are some grave problems with this article. I agree with the basic premise 100%, but the article oversimplifies the idea to focus on data. It isn't just about data, data structures, or even relationships. It is about organization in general, and most people cannot perform at that level.

To be clear: Good programmers worry about the organization and cleanliness of their code. They worry that their code is reduced to the smallest of forms, consistent in expression, and exceptional in measure.

The limitation here is personality and not intelligence and there is a lot of data on this.

The personality metric of concern is conscientiousness, which is how a person perceives the world outside themselves. This one thing is responsible for self-discipline, concepts of organization, initiative, half of empathy, and much more. People at the extreme high end of this lean more towards things like authoritarianism, obligation, duty, healthy living, and social alignment. These people find joy in putting things into order and discerning relational structures.

People on the low end tend to be free spirits, are more likely to experiment with drug use, can't clean their rooms or pick up trash even if you put a gun to their heads. Concepts of work effort and self-reliance are almost entirely unimaginable. These people cannot organize anything and they require absurd rewards to accomplish the smallest tasks, and even still the output of their efforts is fleeting and temporary. They simply cannot see abstract relational concepts and cannot be compelled so.

Strangely, low scoring people struggle to discern value from a thing as they cannot perceive separations of vanity from functionality. Yet, they have no problem selling things in full awareness that if they cannot perceive value then neither can most other people. High scoring people don't do this and thus tend to make less effective merchandisers.

High scoring people tend to perceive low scoring people as slobs, sloths, and an anchor on social progress. Low scoring people tend to perceive high scoring people as perfectionists, prudes, and unnecessarily distracted on trivialities far outside their imagination.

The common assumption is that people who are brilliant at abstract organization and industriousness must be more intelligent. This makes sense because these people tend to be more successful in all aspects of life other than careers in entertainment. That assumption is completely wrong, though. Conscientiousness is negatively correlated to intelligence at -0.27, according to various studies.


When I make a web application, the first step in that process for me is designing the relational database model with a pencil, eraser and piece of paper. It makes the code a lot less messy when you have all the data sorted out before you get into it. I also find that it really helps me to understand what I'm building and how I need to build it. And it's a hell of a lot easier to change code than how data is being stored, so it's something I really try to get right and properly normalized the first time.

I don't even attempt to do types at this point. It's really just about how the structure is going to look.


Even in the code I always figure out the types/schema first. If you get the shape wrong, everything that follows is also wrong


I've argued this for a long time, yet so many devs insist on MongoDB and other similar schemaless data stores.


I've found that a lot of junior developers are so scared of making a wrong database schema in SQL that they'd rather pick a schemaless database. Which of course, if you're not careful, leads to multiple conflicting ad-hoc implicit schemas in your NoSQL database. I.e. maintenance headaches.

Ideally do a meaningful MVP schema and then iterate on it. Schema migrations can be done. And it saves you so many headaches later on to have a reliable schema for your data.


This is the principle behind “How to Design Programs” [1]: build your data structures first, and then the form of your functions on those structures should correspond more or less exactly to them.

[1]: https://htdp.org/2023-8-14/Book/index.html


> When I read this quote, I actually was able to recognize countless examples in the past of this. I once worked on a project where we spent quite a while optimizing complex algorithms, only to realize that by restructuring our data, we could eliminate entire classes of problems. We replaced a 500-line function with a 50-line function and a well-designed data structure. Not only was the new code faster, but it was also much easier to understand and maintain. (Of course, then the problem also shifted “down the stack” to where the majority of toil was in restructuring existing data.)

This is really a preference, then. I encountered almost this exact sort of problem in my last project. I wanted a simpler database design and more complex querying/code, they wanted a significantly more complex database design that was harder to understand (for everyone but the guy who spent all of one weekend designing it) but simpler querying/code (that was also more plentiful as a result). The question really is, where do you prefer your complexity to go? Do you want to lean on the database, or your code?

Simple example: you have a portfolio of stock that constantly changes in composition and value over time. Do you:

1) Only store the current model of the portfolio in a "portfolios" table and the current prices of stocks in a "stock_prices" table, and use a separate history table for both (with stored procedure triggers to automatically copy all changes to it) to store all previous versions that can then be queried separately if needed, OR

2) Store each change in both quantity and price across multiple tables, with no separation of what is "current" vs. what is "historical" other than the relationships that are (properly, hypothetically) set up via an "intent_versions" table at the top level, requiring a bunch of joins to actually determine the state of the portfolio both now and at any point in the past?

I opted for the former because I have no fear of complex queries, the center of thought-mass of the team leaned towards the latter. WWYD?
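For concreteness, the two shapes could be sketched like this (TypeScript stand-ins for the tables; all names hypothetical):

  // Option 1: current state lives in one place; history is copied aside
  // (by triggers) and queried separately when needed.
  interface Portfolio { id: string; holdings: Record<string, number> }
  interface PortfolioHistory {
    portfolioId: string;
    holdings: Record<string, number>;
    validFrom: Date;
    validTo: Date;
  }

  // Option 2: append-only versioned rows; "current" is derived by finding
  // the latest version, and point-in-time state requires joins/filters.
  interface HoldingVersion {
    portfolioId: string;
    symbol: string;
    quantity: number;
    price: number;
    versionedAt: Date;
  }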


A major caveat for folks in this line of thinking, though, is to avoid falling into the "one true schema" trap. Data can and will be duplicated in your system. A large part of the "consistency" battle people should be having is about how long it takes for that duplication to converge as expected - not making sure it is never inconsistent.

That is, it is easy to see many junior efforts stall out during schema design, thinking that you can solve all issues with a fancy method of storing the data. What is important about your data isn't so much the schema as where different updates to it are known first, and what needs to travel with them.


Such advice is dangerous in assuming that there is only one type of problem in programming - one type of domain: applications related to computer science, data science, and computing infrastructure.

While Git might be particularly about data structures at its core, might I suggest you don't try to model your next complex payroll, insurance quotation, supply-chain, or billing system as a composable set of lists, stacks, queues, and trees, modified by code that grows over time to look increasingly like a big ball of mud.


There are two types of applications: ones where you know your data model from the beginning and ones where you don't. Static types work exceptionally well when you're modeling something you understand pretty well, especially to the point where it is not expected to change significantly. On the other hand, a lot of programs find their data model while being made. This is fine too; what is expected from a program can change, sometimes a lot. I've built both types of applications in both statically and dynamically typed languages.

What your team knows matters more than either of these.


For those interested in escaping the Hacker Prison where "weeks of coding can save hours of thinking", I strongly recommend William Kent's book "Data and Reality".


Speaking of data structures, I am curious what you guys think of the Entity Attribute Value model.

I worked on an e-commerce "platform" that used EAV and I always struggled to write queries to find anything I needed.

https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80...


It’s basically the standard. How else do you add arbitrary key/values to entities? You either normalize into EAV, use a map column, or use a pair of array columns.
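Roughly, the three options as row shapes (a TypeScript sketch; names hypothetical):

  // Normalized EAV: one row per (entity, attribute) pair.
  interface EavRow { entityId: string; attribute: string; value: string }

  // Map column: one row per entity, attributes in a single JSON-ish column.
  interface MapRow { entityId: string; attributes: Record<string, string> }

  // Paired arrays: two parallel columns where keys[i] matches values[i].
  interface ArrayPairRow { entityId: string; keys: string[]; values: string[] }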


That's why modern languages that encourage type-driven development, like Rust and Gleam, are a godsend. Just look at the kinds of things you can encode in the type system. It can prevent issues like mixing up numbers in different units:

https://blog.hayleigh.dev/phantom-types-in-gleam
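The same guarantee can be approximated in TypeScript with "branded" types (a minimal sketch of the units example; Gleam's phantom types are the principled version of this):

  type Ms = number & { readonly __unit: 'ms' };
  type Sec = number & { readonly __unit: 's' };

  const ms = (n: number) => n as Ms;
  const sec = (n: number) => n as Sec;

  function setRequestTimeout(t: Ms): void { /* ... */ }

  setRequestTimeout(ms(1500));  // ok
  // setRequestTimeout(sec(2)); // compile error: seconds aren't milliseconds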


TIL that it was named phantom types.

I use them in C via struct/union. It is a little more verbose to use, but offers type safety when needed without any runtime penalty.

That, and a good naming scheme: always encode the unit in the variable name. Always.

timeout_ms is much better than timeout, unless timeout is a human readable string. Same for any config file parameter.


I would like this to be clarified to: information + the relationships among its components, including how they get transformed by computation.


RDBMS theory bored me to tears when I first studied it, but it’s an invaluable way to look at data and structures.


So basically [engineering] design is more important than implementation details.

I would say the "engineering" part of the design is also optional, as product design is also another lever of higher influence than code optimization.


I'd argue that when your entire approach is experimental, there is no need to fret over structures. If you are convinced that your approach can work, that's the time to design well.


Can a "class", eg in Python or Java, be considered an example of the "data structure" Linus and others are talking about here?

Or are they only talking about tables in databases and such?


Yes, classes are data structures. A data structure is the construct by which you organize, process, retrieve, store, etc data. A traditional "struct" or "table" is how people think of a data structure, as a sort of dumb layout of 1s and 0s in a file or memory somewhere. But how that data is separated, accessed, what constitutes "valid" data, how it is presented to you, what you can do with it, etc is all part of what constitutes the "structure" of the data, because the "structure" enforces what that data looks like, how it behaves, etc. A class fits that purpose aptly.

If water is data, then a straw, a vase, a kettle, a pool, a water pump, even a hydroelectric dam, are all data structures. They enforce the form of the water, what goes in, how you access it, how it comes out, etc. You might say "but doesn't a function do the same thing?", and, well.... yes. A data structure is kind of like a tightly-bound collection of functions that take in data and put out data in particular ways.

A data structure in memory or on disk, without program logic attached to it to enforce the structure, is a "data format". The hope is that a program will do the right thing with the data according to its intended structure.


The class interface (interface contract) falls under the general point being made, but not the class implementation (implementation details).


IMO, “wrong question”, but thanks for asking.

IMHO - a “class” definition is really just a data structure (abstractly, a possibly-nested hash) with associated functions, and that’s it.

This is extremely apparent in Perl, where that’s literally what a “class” is - a hash data structure “blessed” with a class name. JS is almost identical (a class is a hash where some of the values are functions), and “under the hood” both Ruby and Python are similar (although in very different ways - consider that the first arg in a member function in Python is the instance).

Database schema is (usually/mostly) just that structure without those functions; then you maybe use an ORM to provide them again.

So, IMO, on a theory and design level, both the database table and the data part of a class definition are “data structures” (note that you should also consider the relationships between data structures)
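A quick TypeScript/JS illustration of that "hash with functions" view (a sketch, not tied to any particular codebase):

  // A plain object: data plus attached functions...
  const point = {
    x: 1,
    y: 2,
    norm() { return Math.hypot(this.x, this.y); },
  };

  // ...which is essentially what the class syntax packages up.
  class Point {
    constructor(public x: number, public y: number) {}
    norm() { return Math.hypot(this.x, this.y); }
  }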


To take this a step further, IMO a big issue with how OOP is taught & practiced is that people approach it from the wrong direction.

As classes provide both data and functionality, it's better to first model your data and then think about the functionality to provide on those data. But people tend to think about relationships & actions (i.e. functionality) first and model their data to that.

The two approaches will tend to produce very different data structures. I'd agree with the sentiment from the OP that being data-first is ideal.


Yes, but I'd read it with less emphasis on the behaviours (which are what make a class). So, more like a struct.


"you should actively seek ways to shift complexity from code to data."

but also somehow we're supposed to write everything to read/write flat text...

Thanks UNIX!


"flat text" meaning ASCII/UTF encoded streams of bits/bytes... It's all 0s and 1s under the hood, and encodings make that much nicer to deal with without excessive abstraction. Give me UNIX everything-is-a-file / byte-stream paradigm over PowerShell's complicated object ontology.


Linus Torvalds’ git is the perfect case in point: wonderful data structure wrapped with adequate tooling.


The best code is no code. Which is why the best programming language (Lisp) expresses code as data.


This is how I think about most of my coding, so it must be true.


I don't understand what it means to “move complexity into data”.


Complicated algorithm over an array vs simple algorithm over trees and tables and sets and records.
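One way to picture it (a small TypeScript sketch): branching logic at every call site vs. a data table the code merely looks up.

  // Complexity in code: the rules live in control flow.
  function shippingCostIf(region: string): number {
    if (region === 'EU') return 10;
    if (region === 'US') return 12;
    if (region === 'APAC') return 15;
    return 20;
  }

  // Complexity in data: one table, trivial code, and adding a region
  // is a data change rather than a code change.
  const SHIPPING: Record<string, number> = { EU: 10, US: 12, APAC: 15 };
  const shippingCost = (region: string): number => SHIPPING[region] ?? 20;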


> The actionable tip here is to start with the data. Try to reduce code complexity through stricter types on your interfaces or databases. Spend extra time thinking through the data structures ahead of time.

This is why I love TS over JS. At first it feels like more work up front, more hurdles to jump through. But over time it changed how I approached code: define the data (& their types) first, then write the logic. Type Driven Development!

Coming into TS from JS, it might feel like an unnecessary burden. But years into the codebase, it's so nice to have clear structures being passed around, instead of mystery objects mutated with random props through long processing chains.

Once the mindset changes, to seeing data definition as a new first step, the pains of getting-started friction are replaced by the joys of easy future additions and refactors.
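A before/after sketch of that mindset shift (hypothetical names):

  // JS habit: a mystery object whose shape emerges from mutation.
  // const order = {}; order.items = [...]; order.total = 99; order.paid = true;

  // TS habit: declare the data first, then write logic against it.
  interface OrderItem { sku: string; qty: number; unitPrice: number }
  interface Order { items: OrderItem[]; paid: boolean }

  function orderTotal(o: Order): number {
    return o.items.reduce((sum, i) => sum + i.qty * i.unitPrice, 0);
  }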


(tangential) In theory I like TS. But in practice, unless I'm the one writing it and can KISS, it can quickly turn into an unmaintainable nightmare that nobody understands. TS astronauts, left unchecked, can make extremely complex types using generics, conditionals, and all the esoteric features of TS, resulting in extremely obtuse code.

For example, I doubt anyone could explain this "type" without studying it for several hours:

https://github.com/openapi-ts/openapi-typescript/blob/main/p...

In this case, the "type" is really an entire program.


I must be a part of the problem because reading that type isn't too difficult.

I also think types like this aren't innately problematic when they live in libraries. They should be highly focused and constrained, and they should also be tried, tested, and verified to not get in the way, but they can absolutely be a huge benefit to what we do.

Maybe it's mostly problematic when type astronauts litter the application layer with types which are awful abstractions of business logic, because types are far less intuitive as programs than regular JavaScript or common data structures can be. Just type those in the dumbest way possible rather than wrap the definition of them up into monolithic, unnavigable, nested types and interfaces.

If a library allows me to define valid states related to events which drive a data store or something narrow like this, that's awesome (assuming it's intuitive and out of the way). I like this kind of type-fu. If it's trying to force coworkers to adhere to business logic in unintuitive ways, in a domain that's not unlikely to shift under our feet, that's a huge problem.


> I must be a part of the problem because reading that type isn't too difficult. I also think types like this aren't innately problematic when they live in libraries.

Despite the star count on the repo (which, if you aren't paying attention to the 0.X versioning, might lead you to believe it's a well tested "library" type), that particular type I linked to has a ton of bugs with it that are currently documented in at least half a dozen open issues, some of which are super esoteric to solve:

https://github.com/openapi-ts/openapi-typescript/issues/1778...

In this case ^ the problem was due to "behavioral differences based on the strictNullChecks ... compiler option. When that option is false (the default), undefined is considered to be a subtype of every other type except never"

Maybe I'm old school, but as long as we are using metaprogramming to solve a problem, I'd rather codegen a bunch of dumb types vs. implement a single ultra complex type that achieves the same thing. Dumb types are easy to debug and you won't run into strange language corner cases like when `undefined extends` has different behavior when strict mode is on or off.

I guess my point is, maybe you find it easy to read, but apparently it's a nightmare to maintain/test; otherwise there wouldn't be so many bugs with it:

- https://github.com/openapi-ts/openapi-typescript/issues/1769

- https://github.com/openapi-ts/openapi-typescript/issues/1525

I'm pretty sure I could fairly easily implement `openapi-fetch` by code generating dumb types and it would avoid all of these bugs, and maybe I should as a reference implementation just for comparison purposes in the future for discussions like this.


I'm not trying to say all types in libraries are okay. There are tons of awful ones there, too. One of my favourite libraries actually has some of the worst typing issues I've encountered, and like you're saying, code generation is the perfect solution for the problems they're facing. They actually had a code generator for a previous version of the library, but significant API changes in the latest version caused the code generator to break.

It's imperative that the crazy astro types actually are good; otherwise they really are just going to get in the way. I think my point about libraries though is that if they're hyper-focused on solving a single problem, there's a better chance that the typing will stay relevant, stable, and improve over time. In an application this seems to be less true, leading to all kinds of clever and/or verbose type definitions trying to solve this and one million other problems at once. It's brutal.

After looking closer at that type you linked to, there's this one embedded type called `MaybeOptionalInit`, haha. MaybeOptional. I guess it's optional, sure, and maybe it won't be provided at all (hence the `never` condition), but... Why is that MaybeOptional and not just Optional? That is a bit weird. I see what's happening but I'm not crazy about how it's implemented.


> I'm pretty sure I could fairly easily implement `openapi-fetch` by code generating dumb types and it would avoid all of these bugs, and maybe I should as a reference implementation just for comparison purposes in the future for discussions like this.

FFR: I ended up doing just that: https://github.com/RPGillespie6/typed-fetch


> TS astronauts, left unchecked, can make extremely complex types using generics, conditionals, and all the esoteric features of TS, resulting in extremely obtuse code.

Disclaimer: I guess I'm a fellow TS astronaut.

Most of the time TS astronauts will stick to your methodology of keeping things simple. Everyone likes simple, I think.

However, type-astronautics is necessary once you need to type some JS with really strange calling conventions/contracts (think complex config-object inputs, or heterogeneous outputs that end up with _not quite the same_ call signatures, using large object trees for control flow, etc.; basically really shit code that arises from JS being a very dynamic language) without modifying the interfaces themselves. Sure, you can be a bit lenient, but that makes the code unsound and creates a false sense of security until the inevitable runtime bug happens.

The correct solution would be to refactor the code, but that's not always possible. Especially if your manager was the author of said magnum anus—apologies, I meant magnum opus—and sabotages any attempts at refactoring.

I guess the moral hiding in this anecdote is that I should be looking for a new job.


I will agree that some TS libraries have insanely complicated types, and compared to other programming languages I have used (e.g. Clojure), it takes a longer time to understand library code.

But the example provided here doesn't seem too bad. Here is my attempt after skimming it twice.

  Paths extends Record<string, Record<HttpMethod, {}>>
I assume the

  Record<HttpMethod, {}>
is a hacky (but valid) way to have a map where the keys must be HttpMethod and the values contain arbitrary map-like data. e.g. maybe it describes path parameters or some other specific data for a route.

Moving on.

  Method extends HttpMethod
  Media extends MediaType
These seem self-explanatory. Moving on.

  <Path extends PathsWithMethod<Paths, Method>, Init extends MaybeOptionalInit<Paths[Path], Method>>(
    url: Path,
    ...init: InitParam<Init>
  ) => Promise<FetchResponse<Paths[Path][Method], Init, Media>>
Looks like we have two generic parameters: Path should be a type satisfying PathsWithMethod<Paths, Method>. That's probably just requiring a choice of path and associated HTTP method. As for Init, that looks like it's to extract certain route-specific data, probably for passing options or some payload to fetch.

Lastly,

  Promise<FetchResponse<Paths[Path][Method], Init, Media>>
Taking everything I have just guessed, this represents an async HTTP response after fetching a known valid path -- with a known valid method for that path -- together with Init parameters passed to fetch and possibly Media uploaded as multi-part data.

I probably got some details wrong, but this is what I surmised in about 15 seconds of reading the type definition.


Not defending that code - and I agree with you that wild TS code gets nightmarish (I usually call it a “type explosion”) but

Waaaay back when in my C++ days, starting to get into template metaprogramming, the “aha!” moment that made it all much easier was that the type definition could be thought of as a function, with types as input parameters and types as output parameters

Recentlyish, this same perspective really helped with some TS typing problems I ran into (around something like middleware wrapping Axios calls).

It’s definitely a “sharp knife” if you overuse it, you screw yourself, but when you use it carefully and in the right places it’s a super power.
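For anyone who hasn't seen it, that "types as functions" view in TypeScript looks like this (a tiny sketch):

  // A generic type is a function from types to types.
  type Unwrap<T> = T extends Promise<infer U> ? U : T;

  type A = Unwrap<Promise<string>>; // "call" with Promise<string>, get string
  type B = Unwrap<number>;          // non-promises pass through unchanged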


I'd be interested in reading that Axios-wrapper if it's openly available.


It isn’t :/ I’d be down to recreate it if you can point me at an open-source project to do it in! :)

Basically - we had some custom framework-ish code to do things like general error handling, reference/relationship/ORM stuff, and turning things into a React hook.

I rewrote what we had to use function passing, so that you could define your API endpoint as the simple Axios call (making it much easier to pass in options, like caching config, on a per-endpoint basis).

So you’d define your resource nice and simple, then under the hood it’d wrap in the middleware, and you’d get back a function to make the request (or a React hooks doohickey, if you wanted).

But typescript doesn’t really play nice with function currying, so it took some doing to wrap my head around enough of the type system to allow the template type to itself be a template-typed function. That nut cracked when I remembered that experience with C++ typing; in the end it actually came out pretty clean, although I definitely got Clever(TM) in some of the guts.


Back when I used to work in plain JS I saw very complicated structures, but there were no type annotations. It's worse. The horrible part is where the type changes in different cases, so you have to trace everything to know if what you are changing is safe.


You don't have to understand it necessarily to use it. I'm sure there's plenty of library level code that people don't understand. But that's the point. Typescript will tell you if you screw up. A lot of this has to do with generics, and if you're using it and typescript can infer the generic types to use, it'll be a lot simpler and you'll know exactly what's breaking.

And for libraries like this, you'll unfortunately be limited to TypeScript ninjas to maintain them, but there's no real alternative. I guess you could use JavaScript without types, which doesn't remove the dependencies or complexity, just hides them away, and who knows what happens at run time.


> And for libraries like this, you'll unfortunately be limited to Typescript ninjas to maintain, but there's no alternative really.

The alternative (in this case, at least) is to generate a dumb fetch interface from the OpenAPI spec. You have to generate the OpenAPI spec types anyway; just take it a step further and generate a dumb fetch interface as well, and then you don't need complex generics - you just call dumb typed functions in the generated fetch interface.


Oh this is so funny. That exact type was my introduction to typescript! I came over from Python a few months ago for a solo web project, and I struggled with that type mightily!

In the end it took me a few tries to land on something idiomatic and I actually just ended up using inferred types (which I think would be the recommended way to use it?) after several iterations where I tried to manually wrap/type the client class that type supports. Before I found a functional pattern for inferring the type correctly, I was really questioning the wisdom of using typescript when probably several whole days were spent just trying to understand how to properly type this. But in doing so I learned essentially everything I currently know about TS (which admittedly is still very limited) so I don’t think it was wasted time.


Yes, I find myself in type hell with some regularity. TBH it happens with my own codebase too when libraries I want to use are authored by these type astronauts.


I worked in fully typed Java for 8 years before jumping to fully untyped Ruby.

Having leaned hard on the Java type system for many years, I was terrified of the type anarchy.

But it turned out not to be a problem at all. For me at least, being ambitious with writing tests made me not miss types at all. In practice, a good test suite catches pretty much any problem typing handles, and then some!

This is only my experience. I'm not saying everyone should or could work that way, or that I'm better than you etc.


My own experience is that working in typed languages, then going to untyped ones... your sense of types and the problems they address is already fairly high. Your 'untyped' code likely still avoids a lot of the problems you might otherwise encounter, just because of whatever habits you may have picked up in the typed system. Going the other way - untyped to typed - tends to present a lot of ... tough moments along the way, because you're having to put a lot more thought about things that you didn't have to before.


Nitpick: Ruby isn't untyped, it's dynamically typed.

Forth and assembly are untyped, as these languages truly lack distinctions between different kinds of data.


>For example, I doubt anyone could explain this "type" without studying it for several hours

From skimming it for about a minute it seems like it's just a strongly typed way to generate HTTP requests? It really doesn't look too complicated


Well it is. You won't discover how deep the rabbit hole goes with that type until you start trying to debug it. For example, try fixing this issue which is the result of a problem with that type:

https://github.com/openapi-ts/openapi-typescript/issues/1769


Interesting, I wonder how much of that is due to poor implementation by the authors vs. issues with TS vs. issues inherent to building a typed language on top of the mess that is JavaScript?

Most languages with strong type systems (I'm thinking at least as strong as Java or C#, maybe stronger) wouldn't have those same sort of footguns. In C# I've run into other kinds of fun nightmares with types, like trying to use interfaces with Entity Framework Core. But I think that's more EF Core's fault than C#'s.


For your linked example…

It has documentation

> This type helper makes the 2nd function param required if params/requestBody are required; otherwise, optional

The type here is the implementation, not the documentation. I guess we are so used to types being the documentation, which they are for value/function-level programming, but not in type-level programming.

I think maybe you are disappointed in the tooling? I do think the docs here should be attached to the type so that they appear in the IDE as well.


Honestly, that type you linked looks like a dream to use (I've never used OpenAPI). I love APIs that enforce any necessary restrictions at the type level.


It would be... if it wasn't so bug-ridden at the moment.


side note: I don’t have much experience with TS, but the overuse of extends is also common in “enterprise” Java/C# apps


Eh, not anymore. Arbitrary inheritance chains are frowned upon in C# and people get mad when you do so. You also occasionally run into everything sealed by default because of overzealous analyzer settings or belief that not doing so is a performance trap (it never is). Enterprise does like (ab)using interfaces a lot however.


Type-driven development has been a big win for me as well, specifically when writing web front ends. Whether it's client-side rendering (sometimes a necessary evil) or on the server with a tool like Astro, I try really hard to start by defining types and UI components totally separately.

I'll actually build out the full data flow and UI components in complete isolation, leaving the glue code for the final step. It's kind of a weird pattern from what I've seen - I have gotten some interesting code reviews over the years - but it really is nice focusing on one concern at a time. At the end it's also fun watching a bit of glue code wire up the entire app.


I think this is definitely the best way. It feels like you're violating DRY, but you're not.


Agree overall (yay data structures, and that types prompt thinking about them), but I don’t think you need types to make good data structures, and just because you have types doesn’t mean you end up with good data structures.

But yes, definitely - working in a typed language encourages that mindset, and it’s the application of that mindset that yields the benefits (imo).


To a decent first approximation, and especially given TypeScript's erasure and generally very opt-in design approach, types are just tests the compiler doesn't allow you to not run.

That's not a good way to think about them forever. But it might be a good way to start thinking about them, for those as yet unfamiliar or who've only had bad experiences.

(I've had bad early experiences with a lot of good tools, too, when learning to use them fluently required broadening my perspective and unlearning some leaky prior intuitions. TypeScript was one such tool. I don't say that's the only reason someone would bounce, but if that's the reason you-the-reader did so, you should consider giving it a more leisurely and open-minded try. You may find it rewards your interest more generously than you expected it might.)


When you've really got your data structures and the rules for manipulating them pinned down, and you've built a good interface on top of it, the result is usually something that's so simple and easy to understand that it kind of doesn't matter anymore whether you're working in a static or dynamically typed language.

IOW, I think that the value in static typing (speaking only about this specific issue!) isn't that it makes you do things well; it's that it puts a limit on how poorly you can do them. But I also sometimes worry that it puts a limit on how well people can do, too. I've met way too many people who tacitly believe that all statically typed domain modeling is automatically good domain modeling.


Strict types are a great way to paint yourself into a corner. Good design should only impose strict types within a single module, with very loose coupling outside the module (meaning loose types).

Having a well defined data model is important, but you often can't really know what that data model should be until you've banged on a prototype. So the faster (in the long run), "better" way is to first prototype with very loose types and find what works, and then lock it down, within the scope of the above paragraph.
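A minimal TypeScript sketch of that boundary discipline (assumed shape): accept loose input at the edge, validate once, and keep strict types inside.

  interface Config { retries: number; endpoint: string }

  // Loose at the boundary: the outside world hands you unknown...
  function parseConfig(raw: unknown): Config {
    if (typeof raw !== 'object' || raw === null) throw new Error('invalid config');
    const o = raw as { retries?: unknown; endpoint?: unknown };
    if (typeof o.retries !== 'number' || typeof o.endpoint !== 'string') {
      throw new Error('invalid config');
    }
    // ...strict inside: everything past this point sees only Config.
    return { retries: o.retries, endpoint: o.endpoint };
  }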


> Strict types are a great way to paint yourself into a corner.

I've never really understood this stance. It's all code. It's not like you can't change it later.

> So the faster (in the long run), "better" way is to first prototype with very loose types and find what works, and then lock it down, within the scope of the above paragraph

I think this depends on the programmer, and what they're experienced with, how they like to think, etc. For example, as counterintuitive as it might seem, I find prototyping in Rust to be much quicker than in Python.


> I've never really understood this stance. It's all code. It's not like you can't change it later.

Actually, you often can't :) Ask Microsoft how easy it is for them to change some code once it's been shipped.

The "new thinking" is that you should teach your users to upgrade constantly, so you can introduce breaking changes and ditch old code, sacrificing backwards compatibility. But this often makes the user's life worse, and anyone/anything else integrating with your component. In the case of a platform it makes life hell. For a library, it often means somebody forks or sticks with the old release. For apps it means many features the users depend on may stop working or work differently. It basically causes problems for everyone except the developer, whose life is now easier because they can "just change the code".

In many cases you literally can't go this route, due to technical issues, downstream/upstream requirements, contractual obligations, or because your customers will revolt. This affects almost all codebases. As they grow and are used more often, it makes changes more problematic.


> Actually, you often can't :) Ask Microsoft how easy it is for them to change some code once it's been shipped.

My understanding is that the OP was talking about prototyping. Once code is in a public interface in the wild, it's hard to change either way. I don't see how dynamic typing will save you there. In fact, stronger typing can at least help you restrict the ways in which an interface can be called.


> Once code is in a public interface in the wild, it's hard to change either way.

Yes! Definitely for the final version (when the prototype becomes production, which is the moment a customer first uses it) everything should be locked down.

> I don't see how dynamic typing will save you there. In fact, stronger typing can at least help you restrict the ways in which an interface can be called.

Strong typing isn't inherently bad here, but it's often associated with strong coupling between components. Often people strongly type because they're making assumptions about (or have direct knowledge of) some other component. That's death. You want high cohesion and loose coupling, and one way to get that is to just not depend on strong types at the interface/boundary.

To recap:

  1. When prototyping, loose types everywhere, to help me make shitty code faster to see it work
  2. When production, loose types at the component boundaries, and strict types within components
https://en.wikipedia.org/wiki/Cohesion_(computer_science) https://en.wikipedia.org/wiki/Loose_coupling


> Often people strongly type because they're making assumptions (or direct knowledge) about some other component.

I'm not really sure why strong typing would have that effect. It seems like an orthogonal concern to me.

In fact, strong static types can potentially help make it easier to see where things are loosely or strongly coupled. Often with dynamic typing it's difficult to tell where implicit assumptions are inadvertently causing strong coupling.


>> Strict types are a great way to paint yourself into a corner.

> I've never really understood this stance. It's all code. It's not like you can't change it later.

Try maintaining a poorly designed relational database. For example, I am dealing with a legacy database where someone tacked on an innocent "boolean" column to classify two different types of records. Then years later they decided that it wasn't actually boolean, and now that column is 9-valued. And that's how you get nonsense like "is_red=6 means is_green". Good luck tearing the entire system apart to refactor that (foreseeable) typing/modeling error. The economical path is usually to add to the existing mess: "is_red=10 means is_pink".
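In TypeScript terms, the foreseeable fix is to model the closed set up front instead of a boolean (a sketch; the column names come from the comment above):

  // What the column pretended to be:
  //   is_red: boolean
  // What the domain actually was; a union forces additions to be explicit:
  type Color = 'red' | 'green' | 'pink'; // extending this is a visible change
  interface LegacyRecord { id: number; color: Color }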


And if this was an untyped blob of JSON it would somehow be better?

I'd argue that dynamic typing makes it easier to paint yourself into a corner.


The stuff you are complaining about is caused exactly by lack of type strictness.

Nobody fixes it because nobody has any idea on what code depends on that column. With strict types you just ask the computer what code depends on it, and get a list a few minutes later.


> So the faster (in the long run), "better" way is to first prototype with very loose types and find what works, and then lock it down, within the scope of the above paragraph

Disagree. You can still prototype and refactor with strict types. I don't find working with loose types to be faster at all. Once a program reaches non-trivial complexity, loose types make iterative development significantly more difficult and error prone.


Why loosen the types on the API, when the type system is fully capable of encoding higher-level restrictions? That then locks you into that API as a contract. If you instead lock the API down to the strict set, you are free to expand it over time.


I've started by learning Python and Java during high school, but Python really stuck.

Now, as I work on my degree, I've started trying to learn C for reverse engineering and low-level development. While I do understand some things, it's a big leap in skill from Python. I love how flexible and fast it is. Recently I started a new challenge based on shodfycast's most recent video (https://www.youtube.com/watch?v=M8C8dHQE2Ro) and currently am just focusing on single-thread performance, using array structures. Then I realized my random numbers aren't truly random, and I'm debating whether my PRNG is sufficient. I also debated generating all the numbers at once into an array, throwing it into CPU cache, then doing the logic for the rolls using the faster memory so I'm not waiting for numbers to generate. Single-core laptop time is like 56 minutes using WSL.

I'm tempted to try this on my dual-socket system using Tiny Core Linux, so I can shave off some time from useless overhead and use some debug tools to find the slow spots.

Unsure how much time I should sink into this though.


>Unsure how much time I should sink into this though.

You should stop immediately once you start learning less and are fixating on hyper specific problems.

>I also debated generating all the numbers at once into an array, throwing it into CPU cache

You don't control the cache. I recommend that you treat the CPU as a black box, strictly until you can no longer do so. If you are learning C and aren't even writing multi-threaded code, you should not fixate on the specifics of how the CPU handles memory.

Please pretend that is true. Manipulating cache is difficult. And you should worry about other things. This will become important when you are doing multi threading.


The funny thing is that...this is why I like dynamic languages. Because I do all of this with the database and when I'm using something like Rails, ActiveRecord handles all of the types automatically based on what I've already defined in the database.

For web apps at least, about 90% of the data coming into and out of the application is just being stored in or retrieved from the database - converted from string to datatype in the DB and then converted back to string as it's being returned. Enforcing types strictly in this layer, which is largely pass-through, can create so much additional boilerplate with minimal benefit, since it's already handled where it matters.

For a NoSQL DB, sure, I get it. You NEED the application to define your types because your database isn't.

And then there are people who feel very strongly about having it everywhere, all the time and can't imagine working without it.

I like that we work in a field where we can have options.


I totally agree, though this process isn't without its faults. One thing I've had to learn to be mindful of is "type-crastination" -- if I'm not careful, I can really overdo it and spend way too much time and energy on defining types.


You can think about data structures in a fully dynamic language like JS or python. But you have to write software in a way, which utilizes these types and acknowledges that they exist.

Thinking about what your data structures are is important in any language. Strict typing helps you in pinning them down and communicating them, but the approach is not exclusive to strict typing.

Once your software is about passing anything-objects around, you have already lost, and proper thinking about data structures becomes impossible. I agree that stricter typing helps you avoid that trap.


That’s why I really like to write some C++ in my free time after a week of JavaScript at work.


From JS to C++?! Come on... For sure there is at least one thing JS and C++ have in common - a crappy standard library.


I feel similarly about FP. It can be more work up front, but what a gift it is to your future self when f(x) == f(x).



