
One of the goals of any code base of significant size should be to reduce the cognitive load (not lines of code per function).

Assume the developer working here just had a 45-minute sync, has a meaningless repeat meeting about vapourware in 10 minutes and a two-hour town hall meeting after lunch... and still has to deliver that mess of a requirement you made him promise to deliver before any of these interruptions were even on his calendar!

- Always aim for declarative programming (abstract away the 'how'),

- limit the depth of the function calls (rabbit hole developers...),

- separate business logic from framework internals and

- choose composition over inheritance.

- Also avoid traits and mixins (run from the dark magic)

- don't over-document the language or framework; comment only the eyebrow-raising lines, the performance-sensitive stuff, and the context surrounding the file if it isn't obvious.

- name stuff to be easily grepable

Easy rules to go by (there are probably more). They can make or break your ability to work, so that you can get interrupted 12 times an hour and still commit work.

I don't find these in books, just decades of sweating and pulling my hair "why does it have to be so hard!?" I have met plenty of senior devs who naturally do the same thing now.

The code size fallacy is a prime example of the wrong way to look at it. Plenty of extremely large code bases in C++ are far more manageable than small JavaScript apps.

Mixing boilerplate framework junk with your intellectual-property algorithms (what makes your software yours) is a sure way to hinder productivity in the long term.

You write code 3-4 times. You read it 365 times a year.

One last thing I recommend if you deal with a lot of interruptions and maybe multiple products, various code bases... keep a

    // @next reintegrate with X
The @next comment 'marker' is a great way to mark exactly which line of which file you were at before you left this project, for a meeting, for lunch, for the day, etc. And it allows you to jump back into context by searching for @next. Also, since it's visual and located, your brain has a much better time remembering the context, since we're good with places and visual landmarks.

It's far more efficient than roughly remembering what I was doing, looking at the last commit, or scrolling endlessly through open files. Don't commit @next though :)




Nice list. I would add, near the top, "don't be clever unless it makes the code significantly easier to read for others".

For example, if your code involves a lot of linear algebra, then operator overloading the sensible operators is probably a good thing. But don't use operator overloading just to save some keystrokes.
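
Roughly the difference I mean, as a minimal sketch (a hypothetical Vec3 in Python; the same idea applies to C++ operator overloading in a renderer):

    # Hypothetical Vec3: overload only the operators that read like the math.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Vec3:
        x: float
        y: float
        z: float

        def __add__(self, o):           # v + w
            return Vec3(self.x + o.x, self.y + o.y, self.z + o.z)

        def __mul__(self, s):           # v * scalar
            return Vec3(self.x * s, self.y * s, self.z * s)

    # reads like the formula it implements: p = origin + direction * t
    p = Vec3(0, 0, 0) + Vec3(0, 0, 1) * 2.5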

Over-abstraction is another such case. Try to avoid adding abstraction layers until you need them.

I'd also add "make the code as self-documenting as possible". That means being verbose at the right times, such as writing meaningful identifier names.

And of course "avoid global variables". I've seen people use singletons with a read/write member, which is as much a global variable as any other.


Yes, "avoid global variables".

I forgot to write that one. In fact, I'd say, try to write everything as stateless as possible. That means no shared variables, functional style programming whenever possible (and when it's logical, not just for the sake of it).

A code base that passes functions what they need, instead of relying on global variables, is far easier to reason about when reading the code, and also easier to write tests for. Very good point.

State is evil, but of course you need some otherwise your software doesn't do anything real.


> no shared variables, functional style programming whenever possible

One problem is that saying to write in a "functional style" is not a reliable way to communicate intent today. You must qualify it with an explanation that includes/emphasizes the "no shared variables" part. Otherwise it's ambiguous, because it's not uncommon to come across programmers who make heavy (over)use of closures and refer to it as functional. It's the "literally" vs "figuratively" of the programming world; people say one thing that turns out to be the exact opposite of what it means. This should be called PF ("pseudo-functional") to contrast it with FP (what you're talking about).
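
A contrived sketch of the distinction (Python, hypothetical names): both lean heavily on functions, but only the second avoids shared mutable state.

    # "Pseudo-functional": closures everywhere, but the closed-over variable
    # is shared, mutable state.
    def make_counter():
        count = 0
        def bump():
            nonlocal count
            count += 1
            return count
        return bump

    # FP in the "no shared variables" sense: state goes in, new state comes out.
    def bump(count):
        return count + 1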


We love to hate state, but isn't caching data the key to performance at scale?


> We love to hate state, but isn't caching data the key to performance at scale?

It seems you're mixing things up. "State" in this context doesn't mean caching results. "State" in this case means that your code has internal state that you can't control, thus your components behave unpredictably depending on factors you can't control.

Picking up your caching example, what would you say if your HTTP cache either returned your main page or some random noise in an apparently random way, even though you're always requesting the exact same URL?


You may be correct vis-a-vis OP, but I'll stick to my guns as far as 'state' being any piece of information that has to be managed.

If I'm not writing pure functions, then there is some piece of information whose lifecycle suddenly requires care and feeding, especially if there are environmental factors making that state more 'interesting' than the code in view.


The advice is that state is hard to deal with, so you should try to isolate it from the rest of your program. You can even organize a stateful component like a cache so that the stateful part stays isolated.

- Make your stateful components "dumb". The cache should just have dumb put, get, delete methods.

- Make your "smart" components stateless. Your component deciding which items to cache or remove or invalidate should be passed all the state they need to make their decisions as immutable parameters. This will make your lifecycle management code much easier to test or log.

- Keep the part gluing the dumb state and the smart functions small

In the end you still have a stateful component, but as a whole it should be easier to work with. For example, it's much easier to test time-based lifecycle decisions if you pass in the current time than if the component grabs the current time from your system clock.
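
A minimal sketch of that split (Python, hypothetical names): the cache just stores things, the eviction decision is a pure function, and the clock gets passed in rather than read from the system.

    # Dumb, stateful component: it just holds data.
    class Cache:
        def __init__(self):
            self._items = {}               # key -> (value, stored_at)

        def put(self, key, value, now):
            self._items[key] = (value, now)

        def get(self, key):
            return self._items.get(key)

        def delete(self, key):
            self._items.pop(key, None)

        def entries(self):
            return dict(self._items)       # snapshot for the decision code

    # Smart, stateless component: a pure function, trivial to test with any "now".
    def keys_to_evict(entries, now, max_age):
        return [key for key, (_, stored_at) in entries.items()
                if now - stored_at > max_age]

    # Small glue layer: the only place where state and decisions meet.
    def evict_expired(cache, now, max_age):
        for key in keys_to_evict(cache.entries(), now, max_age):
            cache.delete(key)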


Programming maxims inevitably get compromised by the real world. We just try to do our best.


Using a cache (short of the language tools doing automatic memoization, or memoization hints on a function or procedure maybe) before you have a correct-but-slow version of the code is one of the greatest hallmarks of premature optimization. Remember Knuth said "less than 4 percent of a program typically accounts for more than half of its running time".

Write some simple code that does the thing. Then debug it. Then profile it. Then swap in more efficient algorithms where necessary in the hotspots. Then look at micro-optimizations. In any case, don't spend more time optimizing than it will save in runtime across the lifecycle of the code.
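
In Python terms, the "memoization hint" escape hatch looks something like this (toy example): write the simple version first, and only reach for the hint once profiling shows the function is actually hot.

    from functools import lru_cache

    # Simple and obviously correct first.
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    # Added only after profiling flags it as a hotspot: the language's own
    # memoization hint, rather than a hand-rolled cache.
    @lru_cache(maxsize=None)
    def fib_cached(n):
        return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)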


Typically caching is not supposed to affect correctness, 'only' performance behavior.


hey seriously, i've been grappling with this: what better candidate to be a global variable than a shared cache that otherwise has to be passed through dozens of function calls (which imo adds cognitive load at many places through the stack).


I don't think passing state through function calls adds cognitive load: it just makes it visible. Global state has the same cognitive load, except hidden.

If a state is passed through dozens of function calls, this may indicate that it is shared by too many parts of the code, or that it is managed at the wrong layer of the architecture. When using global state, these potential problems are difficult to spot.
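
A small sketch of what I mean (Python, hypothetical names): the cognitive load is the same, but the explicit version states the dependency in the signature.

    # Hidden dependency: the signature doesn't tell you quote() needs the cache.
    PRICE_CACHE = {}

    def quote(product_id):
        return PRICE_CACHE.get(product_id, 0.0) * 1.2

    # Visible dependency: same logic, but testable with a throwaway dict.
    def quote_explicit(product_id, price_cache):
        return price_cache.get(product_id, 0.0) * 1.2

    print(quote_explicit("sku-1", {"sku-1": 10.0}))   # no global involved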


> (...) which imo adds cognitive load (...)

Choosing to ignore input data does not reduce cognitive load. You are not supposed to ignore the core function of your work, and call it a good practice.


It's also famously one of the hard problems.


> For example, if your code involves a lot of linear algebra, then operator overloading the sensible operators is probably a good thing.

Operator overloading can be totally useless in linear algebra applications, as the routines for basic linear algebra operations (read: BLAS) aggregate multiple algebraic operations (axpy, gemv, gemm).
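
Roughly the mismatch, as a pure-Python sketch (not real BLAS): the operator spelling of y = a*x + y evaluates as separate passes with a temporary, while an axpy-style routine does the whole update in one fused pass.

    # Operator spelling: a*x builds a temporary vector, then + makes a second pass.
    def scale(a, x):
        return [a * xi for xi in x]

    def add(u, v):
        return [ui + vi for ui, vi in zip(u, v)]

    y_new = add(scale(2.0, [1.0, 2.0]), [10.0, 20.0])

    # axpy-style: one fused in-place pass, which is what BLAS actually exposes.
    def axpy(a, x, y):
        for i, xi in enumerate(x):
            y[i] += a * xi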


True. I was thinking mostly about things like 3d renderers and such, where you do a lot of simple operations on vectors and matrices.

I've written renderers with both operator overloading and without, and the readability of the former is, to me, vastly better.


grug write clever code once, grug need to be clever every day forever.

grug write readable code instead. readable code maybe need clever now, but no need clever in future


> Always aim for declarative programming (abstract away the 'how')

No, no, no! Do not do this because then I have to go read all of the abstractions to actually figure out what the hell my computer is actually doing.

It seems nice in theory, but I’ve spent way too many hours stepping through over-abstracted declarative code with my debugger, getting frustrated at the opacity of it all.


As I see it, being declarative and limiting depth of function calls go hand-in-hand. Also very, very closely related is locality.

It's really hard to give a good example (because it's not an easy problem), but I really want to see a high-level procedure that describes (declares) every important thing that is going on (from a business standpoint). Then ideally, I only have to dive into each function once to get enough context on the main procedure. Any further dives are library/util-type calls (again, I'm talking super ideally here).

And even though I used the term "procedure" twice there, I'm talking about either FP or OO styles.

And of course (oh boy) the context can change everything.
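
Concretely, the shape I have in mind looks something like this (hypothetical order-processing names, just a sketch): the top level declares the business steps, and each helper is at most one dive deep before you hit util territory.

    def fulfil_order(order):
        items = reserve_stock(order["items"])
        receipt = charge_customer(order["customer"], order["total"])
        label = create_shipping_label(order["address"], items)
        return {"receipt": receipt, "label": label}

    def reserve_stock(items):
        return [{"sku": sku, "reserved": True} for sku in items]

    def charge_customer(customer, total):
        return {"customer": customer, "amount": total, "status": "paid"}

    def create_shipping_label(address, items):
        return {"address": address, "parcels": len(items)}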


I agree so much with this. I prefer that all lines in a function are at the same level of abstraction w.r.t your problem domain. One line can't be about customer accounts and the very next one about the file system!

I also feel that Local Reasoning is a very powerful capability in code. In fact, Martin Odersky, the creator of Scala, pointed to an article that headlined Local Reasoning as one of the clearest explanations of why FP matters (https://twitter.com/odersky/status/1271182333467602945?lang=...).

Any abstraction (here I include complex types) is a trade-off against local reasoning. Anything that makes the reader of the code drift far away from what they're currently reading on the screen is a hit against local reasoning. So they have to be introduced judiciously.


I really don't know about this, I'm writing audio & media effects in a fairly declarative style with https://github.com/celtera/avendish and I'm so much more productive that it's not even funny - I can rewrite entire effects from scratch in the time that it used to take me to find a bug somewhere


> I’ve spent way too many hours stepping through over-abstracted declarative code, getting frustrated at the opacity of it all.

Then it’s not declarative


True from the reader's perspective. The writer begged to differ!


Arguably, the writer confused "abstract" for "generic".

> The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise. (Dijkstra)


That sounds akin to the OO tendency to refactor everything to the point of one executable line of code per class.

Maybe that is the juice that makes Java the 800lb enterprise gorilla that Java is, but that juice seems hardly worth the squeeze of stack traces that go on for multiple screens.


If your abstraction leaks it’s a nightmare, but if not it’s a dream. When was the last time you had to debug a problem in the Django model code itself?


All abstractions eventually leak the higher you go. It's just about whether you want to ignore it.

These leaks are usually performance related.


As someone who works on performance-sensitive code with great regularity, I would note that Moore's law has beaten most attempts to be smart about performance for the last ~20+ years. If code is written correctly, with correct algorithms, in a reasonably performant language, no amount of micro-optimization will beat Moore's law.


Sure. How does this connect with what I mentioned? Not being snarky, but I think I'm missing something because this remark seems to be completely random.


An abstraction that happens to harm performance is rarely a long-term problem. Eventually the hardware catches up. More software has been retired due to crappy abstractions than poor performance.


So how does this contradict what I said? The leak is still there. As I said... people just choose to ignore it.

IO has not been following Moore's law. So you can see high level abstractions around IO leak like crazy.

And also, there is no "retiring" going on. Retiring something because of a bad abstraction almost never happens, whether it's because of performance or not.

The bad abstractions stay forever. You just don't notice it. You just master the abstraction and keep on using it.

Learning about the leaks and learning how to manipulate the high level abstractions to get around the leaks becomes an intrinsic part of the culture. It becomes so intrinsic that most people can't even wrap their mind around the flaw in the abstraction. They get defensive if you talk about it. It's invisible, but it's still there.

SQL is the most leaky abstraction I have ever seen (in terms of performance). Learning to deal with it and all its leaks is the norm now. People don't even refer to a query planner as a leak. The technical debt is so entrenched within our culture that it's actually considered to be a great abstraction. And who knows? Maybe mastering SQL, leaks and all, is not a bad way to go. But mastering SQL involves dealing with a declarative language and manipulating it in ways that leak into the underlying implementation. No amount of optimization has fixed this.

SQL isn't the only thing with this issue. For SQL the crappy abstraction is just performance related. But you see other abstractions continue to be used simply because it's too hard to retire them: JavaScript, CSS, HTML, C++, Python 2 (finally, this one has been retired). Basically it's everywhere.

Abstractions are rarely retired. Abstractions with leaky performance problems are also never retired.


Sorry, IO is the wrong term. Memory has not been following Moore's law.


I have found this to be very far from the truth, with most compute-bound or memory-bound tasks getting 10x speedups from minimal work by stuffing more work into each instruction and retiring more instructions per cycle without really changing the algorithm.


Yeah but why don’t we have both?

(Actually I think 99% of micro optimisations are harmful, it’s the higher level algorithmic optimisation that gives the best gains.)


Do you mean micro-optimizations done by programmers or by the compiler? Because if the former, I agree (especially regarding old “wisdoms” like left shifting instead of multiplying by two, when that’s the first thing any compiler will do. But it is true of even more “complex” tricks; if anything they hurt performance).


Why is simdjson faster than json parsers that do not contain any simd intrinsics if using pshufb and movemask for string parsing hurts performance?


The former, absolutely. The compiler can do whatever the heck it wants as long as the result is the same.


Some abstractions leak more than others though, and I don’t think that has that much to do with the level at which the abstraction is. ORM-mappers are very leaky and you have to regularly read what SQL they generate and sometimes even read their source code. Most programmers will never have to read assembly generated by compilers and even less compiler source code.


Yeah, you don't have to read assembly. But with a higher-level abstraction over a non-compiled language like Python, the leakage still occurs, and it shows up as performance.


Yes, as said below, that implies you have a leaky abstraction.

In that case, I'm with you, I'd rather see a pile of imperative code...


Every abstraction leaks and that is not a problem at all. Does IP leak under HTTP? Sure it does. But it is still a good abstraction used everywhere. Also, see the quote in a sibling comment.


It's bizarre to me how many people seem to be averse to any form of abstraction when literally anything you build is already atop an unbelievably large tower of abstraction.

We should as an industry be teaching people how to build good abstractions, not avoiding them at all costs.


Isn’t the venerable spreadsheet a popular, working example of a declarative functional environment?


the OP should preface the abstraction bulletpoint with a caveat that the abstraction must be well formed and logical.

spreadsheets are a great abstraction on calculating values based on tabular data.

ORM is not such a great abstraction on top of SQL.


> ORM is not such a great abstraction on top of SQL.

I agree wholeheartedly with that. I'm not sure that's the direction an ORM should be targeting the abstraction in the first place. It seems more productive to abstract the code away from the point of view of the database than to abstract the database away from the point of view of the code.


Or the ORM should only be abstracting the slightly different syntax of different database vendors' SQL implementations, not the actual mapping of objects or fields to database values.

I think the database's query results should be exposed directly, so that the developer can control the DB properly and directly (such as getting access to cursors, etc.).


It wouldn't really be an ORM at that point, but there's definitely a point to be made that an ORM is the wrong level of abstraction altogether when dealing with data-heavy applications. I actually really hate seeing SQL embedded in a web application serving things to customers, spread out across a bunch of routes and sometimes different route servers accessing the same database. Having a data API service for your data, or one for each major kind of data (for a store let's say customers, products, orders, etc) that talks to the DB and another layer with the application logic that reads and writes through that layer works well for performance and maintenance. It puts your SQL and your grants in a more controlled realm. The user-facing application and the data API can more easily be maintained, especially if you need a new schema or to further shard your data.


> Assume the developer working here just had a 45-minute sync, has a meaningless repeat meeting about vapourware in 10 minutes and a two-hour town hall meeting after lunch... and still has to deliver that mess of a requirement you made him promise to deliver before any of these interruptions were even on his calendar!

If I have meetings loaded I am telling whoever is planning the sprint that I am reducing my load for meetings. We, as software engineers, cannot allow this nonsense to continue. If I am interrupted 12 times a day I am bringing it up at the next sync. If it doesn't get fixed I am leaving. No one pays me to sit in meetings all day. If you want code delivered, leave me alone or burn through young guns that don't know what they want yet.

While your post has a lot of good meat it is undermined by this single statement. It's far easier to work when you don't need to proactively reduce cognitive load to caveman levels because developers are by-and-large overworked.


I think the point of that statement was that we should assume the worst case, for the sake of making code easier to deal with for everyone. If the code is easy to deal with for the meeting-ridden hypothetical person, it will definitely be easy to deal with for a non-meeting-ridden hypothetical person.

I understand and agree with your frustration, but I don't agree that the statement undermines the parent comment at all.


I write the @next comment not as a proper comment, but as straight up text, guaranteed to give me a syntax error the next time I compile. No need to remember to search for anything!


I do that as well!

Or this

   Line of code <<<<<<<<<<<<<<<
With pointing, which breaks it.

@next is a nice way to formalise a break though.


> I don't find these in books

Check out A Philosophy of Software Design, 2nd Edition https://a.co/d/dZ8PJpt


I am a student and have a very long way to go, but Jeremy Howard's fastai (part 2 of the deep learning course) really helped me understand why code needs to be understandable. This has given me a lot of appreciation for the art of writing understandable code.


+1 for this list. Choosing composition over inheritance is generally a good idea, but if the language doesn't support declarative/automatic delegation, building good compositions becomes tedious. Still, inheritance, when used for code reuse, breaks LSP (https://en.wikipedia.org/wiki/Liskov_substitution_principle), and the problems typically manifest months or years down the line, when you try to extend such areas of code.
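
A small sketch of that trade-off (Python, toy Stack example): composition exposes only what a stack promises, at the cost of writing the forwarding methods yourself when the language won't generate the delegation for you.

    # Inheritance for reuse: Stack gets the whole list interface for free,
    # including insert() and slicing, which no longer matches what callers
    # of a "Stack" should be able to rely on.
    class StackByInheritance(list):
        def push(self, item):
            self.append(item)

    # Composition plus manual delegation: tedious, but the boundary is explicit.
    class StackByComposition:
        def __init__(self):
            self._items = []

        def push(self, item):
            self._items.append(item)

        def pop(self):
            return self._items.pop()

        def __len__(self):                # the tedious forwarding part
            return len(self._items)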


Regarding the @next, I found the best way to get back into a session after being interrupted was watching yourself working on the code.

https://github.com/microsoft/codetour

Basically, record the last 5 minutes of your work and replay it as a code tour.


This seems like overkill, and unless it has a Shadowplay-like mode where you're persistently recording and can just dump the last few minutes to the hard drive, I don't think I'd even know (or remember) when to hit record. (So actually, maybe just use Shadowplay instead?)

One other alternative is Idea/Rider's Local History feature which is basically a parallel repo of your entire project that automatically saves a snapshot of your changes every time you save a file. Super useful for all kinds of backtracking, and you don't even need to remember to use it until you actually need it.


> Also avoid traits and mixins (run from the dark magic)

I've seen the word "trait" used to mean different things in different language communities. Do you mean it like in Rust, in Ruby, in Scala, or in C++? (Since you mention mixins, I'm guessing like in Ruby?)


> I don't find these in books, just decades of sweating and pulling my hair "why does it have to be so hard!?"

You've not been reading good books then.

Stuff like "separate business logic from framework internals" is one of the main messages of any book on software architecture, including Bob Martin's classic Clean Architecture.

Then "choose composition over inheritance" is Object-oriented programming 101.

Then the suggestions to not over comment and "name stuff to be easily grepable" are key takes of Bob Martin's Clean Code.

They are all good takes, but hardly unknown.


> Always aim for declarative programming

> limit the depth of the function calls

These two are at odds with each other. The more declarative your code is, the deeper your function calls will be. At the extreme, you will have DSL-like code with an intractable stack trace.


> Also avoid traits and mixins

Too true - I remember working on a PHP project where traits weren't allowed, and there were maybe two times that traits would have made life easier, but they were easily worked around.


Abstraction basically means creating your own (domain-specific) language and an interpreter for it. It can greatly help code-understanding if you can grok that domain-specific language easily.


The @next is a good idea - gonna snatch that



