Maybe the spaghetti code conjecture is false (nickdrozd.github.io)
111 points by nickdrozd on Sept 25, 2021 | 47 comments


This reminds me of the distinction between entropy and complexity. A hot cup of water has high entropy; a cold ant has high complexity. Another example I've seen describes the transition between the low-entropy state of two different-temperature bodies of fluid in contact with each other and the high-entropy state once they've equalized: complexity arises during the transition, in the form of convection and mixing.

Maybe spaghetti code is closer to being like entropy, and we should expect busy beavers to be more analogous to complexity. Busy beavers sit at the transition between terminating (low-entropy?) and non-terminating (high-entropy?) programs, and complexity arises temporally between low and high entropy... Not a perfect analogy, I know, and I'm not sure how to describe it precisely.


Entropy itself is somewhat dubious, as Carlo Rovelli gets into with the idea of relative entropy and coarse-graining. Entropy only exists with regard to a representation / set of observable operators. When you change your basis, you get a new entropy. So the cold ant looks like it has low entropy because you've used the temperature observable. Under structural/integrated-information observables, it would have a much higher entropy.

Read more here: Is Time’s Arrow Perspectival? - Carlo Rovelli

https://arxiv.org/pdf/1505.01125.pdf


This doesn't work for the ant case. Temperature is defined in terms of entropy change per unit of energy added to the system. But if you burn the ant up, entropy would go down under your definition, so you would have negative temperature? That doesn't make sense.

So in physics, the temperature view will always be correct. You can use a different basis for it, but it will always be a temperature view: everything disintegrates into plasma if you add enough energy, so that plasma needs to have higher entropy than whatever you started with, and your "structural" interpretation of physical entropy doesn't work.


The amount of complexity that an intelligent being can cause is far greater than the momentary temperature of that being. You are stuck on local slices of a momentum-based basis. Think in broader terms: the entropy of the human race, thermodynamically, is tiny, and yet we could potentially grow to change the entire galaxy, let alone the entire universe. Entropy based on coarse-graining of momentum states is only one possibility, and a severely handicapped one, especially once one realizes the universe may be cyclic. Clearly, if it is, the observables that lead us to believe entropy is increasing are something like Shepard tones [1].

https://en.wikipedia.org/wiki/Shepard_tone

Also, I should add: a photon of the lowest energy we can currently detect still generates the same fidelity of interference and quantum effects that a high-energy photon does. This is immensely applicable to low-energy quantum computing. It means you can get a wavefunction of enormous complexity at the final gate of a quantum computer, with almost no relation to energy magnitude. This relationship between the double-slit interference fidelity of low-energy and high-energy photons was used to check for a screen-door effect, to see if we might be in a discretized simulation that allocated compute based on energy magnitude. We found no such pixelation/screen-door effects. Quantum computation throws a huge wrench into classical thermodynamic entropy.


Sure, entropy depends partially on how you choose to describe your space, but there are limits to how much two definitions can differ.

In particular, the set of low-entropy states must be quite small, so even if two definitions don't agree, they must still agree on quite a large part of the high-entropy states.

A cold ant is simpler because the molecules don't move in complicated ways. It's more complex if you ignore the movement of molecules, but then you're changing more than just how you define entropy; you're leaving out bits of physics.


What about the entropy of a black hole, S = k_B c^3 A / (4ħG)? (A = horizon surface area, ħ = reduced Planck constant, G = gravitational constant, k_B = Boltzmann constant.) That seems very well defined.


But aren’t black holes essentially injecting large amounts of energy into their domain of influence? Entropy would decrease in the same way it would decrease for a star system.


Ascribing qualities of being well structured or spaghetti code to programs as tiny as ~40 bits seems an exercise in futility.

However, we can prove some precise results if we assume, for instance, that the number of well structured n-bit programs is at most c^n for some constant c < 2.

In that case no busy beaver beyond a certain fixed size can be well structured, as that would allow for compressibility.

So busy beavers, by virtue of being incompressible, cannot have much structure.
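
To spell out the counting step (a sketch, using the c^n assumption above): an index into a family of at most c^n programs takes only

    \log_2(c^n) = n \log_2 c < n

bits when c < 2, so a well structured n-bit program could be named by a shorter-than-n-bit index, i.e. compressed.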


> Ascribing qualities of being well structured or spaghetti code to programs as tiny as ~40 bits seems an exercise in futility.

I disagree. There are two programs given in the post, each requiring just 32 bits to describe, and the first one is definitely well-structured and the second one is definitely spaghetti. You can see the difference in the control flow graphs.


Reposting this comment (with some expansion) from elsewhere in the thread, but I disagree that the second one is spaghetti.

B is once again a while(input == 0) loop, that exits to D. D and C together form a while(input == 1) loop, that exits to A. And A is again a switch between two routes, although in this case, rather than dispatching to one of two independent routes, it switches between a full route (go to B) and a partial route (jump straight to D).
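
For concreteness, here is that reading rendered as C-like pseudocode (a sketch of the claimed shape only; read(), step_B(), step_DC(), and branch_at_A() are invented placeholders for the machine's actual actions, and halting is elided):

    for (;;) {                      /* node A: switch between two routes */
        if (branch_at_A()) {        /* full route: go to B */
            while (read() == 0)     /* node B: while(input == 0), exits to D */
                step_B();
        }                           /* partial route: jump straight to D */
        while (read() == 1)         /* nodes D and C together, exits to A */
            step_DC();
    }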


Suppose you are at some node. Here are two questions you might ask:

1. How did I get here?

2. Where will I go next?

In the first program, those questions are pretty easy to answer. Node D has two entry points and node A has two exit points, and in all other cases there is only one possible answer. If you are at nodes B or C, you know exactly how you got there and exactly where you will go next. If you are at node A, you just came from D, and if you are at node D then you will go to node A next.

But in the second program, three of the four nodes have both two entry points and two exit points. If you are at node C, how did you get there? You could have gone the A-B-D-C route, or you could have come directly from A. And where will you go next? Maybe you will go to A, or maybe you will go to D, and from there...possibly back to C? Or maybe on to A.

The difficulty in answering these basic questions is what makes the second program spaghetti.


But you can logically group C and D, and then it's not so complex, I would say.


GIT (Gödel's First Incompleteness Theorem) says that for every consistent formal system there is some statement about the natural numbers which can't be proved in it. This article made me think that maybe any statement about the naturals could be proved in F^n(M), where F is some sort of "strengthening" operation on the formal system M (probably Peano arithmetic) and n is an ordinal number. The problem of proving statements about integers then reduces to inventing notations for expressing ordinal numbers. It's then easy to believe that there might be diminishing returns on expressing larger ordinals, and therefore GIT doesn't matter much in practical reality.
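
Written out, the iteration being gestured at looks something like this (a sketch; Con(T) is the arithmetized statement that T is consistent):

    F^0(M) = M, \qquad F^{\alpha+1}(M) = F^{\alpha}(M) + \mathrm{Con}(F^{\alpha}(M)), \qquad F^{\lambda}(M) = \bigcup_{\alpha < \lambda} F^{\alpha}(M) \text{ for limit } \lambda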

I haven't made the above into a formal conjecture. Is there any truth to the above? Have I missed something obvious? Is there a result in the literature to this effect? Cheers.


See https://xorshammer.com/2009/03/23/what-happens-when-you-iter... Basically, you run into problems once you get past the finite ordinals.


Many people fret about spaghetti code. Personally, I also fear lasagna code - too many layers, often with poor separation.


You don't want pasta in your code, and there's more than spaghetti code: there's lasagna code, ravioli code, etc.

https://wiki.c2.com/?PastaCode


Describes my workplace. Too pedantic on layers and abstractions. Endless code reviews over nonsense. Before anyone says this - we have linters. The problem is nits about how the code reviewer would do things.

The pedants also can't figure why nothing gets done.


I blame object orientation. It is too easy to create useless classes. Functional programming data types, on the other hand, are much more to the point, which makes it easier to spot useless layers.


It happens everywhere, not just OO. It's just human nature.

I worked on C code where there were 8 layers of calls, where each function was f() { return next_layer_f(); }. The thing is, sometimes layers serve a function and sometimes they don't. Then there's an ever-present pressure to look "professional", and so: tons of layers, lots of details and checklists. The 4 lines of code that actually do everything you need don't cut it. And what will you do the rest of your day?
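
Condensed, the kind of layering being described looks like this (names invented for illustration):

    /* Eight layers in miniature: each function adds a call but no behavior. */
    int do_the_actual_work(void) { return 42; }
    int layer3(void) { return do_the_actual_work(); }
    int layer2(void) { return layer3(); }
    int layer1(void) { return layer2(); }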

The funny/sad thing is that no matter how hard you try to fight human nature, it doesn't seem to work. I love how the Go developers thought `go fmt` would end the formatting nitpicks. Guess what: it doesn't. We just found other things to nitpick. Or "Agile" or whatnot. Python says "A Foolish Consistency is the Hobgoblin of Little Minds"... well, try that on your co-worker in your next Python code review ;) Functional programming doesn't help either. In very small teams/companies sometimes the stars align and magic happens, but it's really hard.

EDIT: Also nothing to do with the article ;) though maybe we can dream up some connection to the human brain.


I used to be balls deep into OOP. I was in the rabbit hole. For me it was _the_ way programs had to be written. Whenever I started a new project, it would start out as a bunch of interfaces and abstract classes (with the god abstract class which everything extended from present and accounted for). At the time there was also the transition from PHP 4 to PHP 5, which unlocked a treasure trove of new OOP features, and obviously every one of those features had to be used. However, when things went wrong, it would often turn into a nightmare: hours spent chasing down obscure bugs with endless print_r's to follow state changes. Over time I became better at managing this, keeping the number of abstractions lower and just doing generally better OOP design, but I was still afraid that whenever things went south it would mean hours of debugging, as was often the case. At one point or another I started asking questions about OOP design, its nature and origin, and searched the internet for answers. Along the way I started realizing that many of the problems I was having were intrinsic to OOP. Whenever I would search for solutions to the complexity of OOP, the answer would always be: you're doing it wrong.

Along the way I transitioned to OOP-less programming (I'm calling it OOP-less because I don't want to start a fight over "functional" vs "procedural" vs "..." etc.), not with the intention of doing away with OOP but just to play with it. I rewrote some earlier, simple programs, and what became rapidly obvious was how easy the code was to debug. The data structures I had written before were way too complex, with too many levels and useless wrappers to make everything fit into the OOP paradigm. I rapidly became annoyed with the complexity of OOP and realized I was in a rabbit hole, and I had dug in _deep_. Even now, when I'm writing bigger programs OOP-less, I still find them much easier to debug. I'm still afraid debugging sessions will get rough, but as time passes this seems more like a PTSD effect from my 15-year-long OOP stint.

By now I've mostly done away with OOP programs, but it's still annoying to have to interface with OOP-designed libraries. I also changed my coding style from camelCase to snake_case, and the latter is much more legible to me, but as a lot of libraries use camelCase or PascalCase, it doesn't make the code look pretty.

So yes, long story short: I share your assignment of blame. Complexity just seems to be the nature of the beast.


This weekend I did an experiment: writing a toy project in procedural PHP instead of OOP. This was because of a discussion on reddit where someone claimed that procedural was much better than OOP, but without giving any proof or showing any code.

Procedural PHP has of course always existed, but I wanted to experiment with how a modern take using the latest PHP 8.1 features might look. There were both pluses and minuses. I will publish it to GitHub soon.


snake_case is better

"Although, no difference was found between identifier styles with respect to accuracy, results indicate a significant improvement in time and lower visual effort with the underscore style. "

http://www.cs.kent.edu/~jmaletic/papers/ICPC2010-CamelCaseUn...

Personally I can't decide what to use, so there is usually a mix of snake_case and camelCase, even within the same project, depending on what it is applied to.


>spending hours chasing down obscure bugs with endless print_r's to follow state changes

Isn't it better to chase that with an interactive debugger and expression breakpoints (pause execution when some variable/property changes)?


I've used interactive debuggers but didn't feel like they made debugging any easier. This article https://buttondown.email/geoffreylitt/archive/starting-this-... mirrors my feelings about debuggers.


Code reviews are misunderstood. They're not about code quality, but about programmer quality. Yes, juniors get to learn tricks from the old monkeys (I'm one btw). But conversely, old monkeys should learn to accept variance in code where the implementation is not absolutely critical.


The company I worked for got some new management, and they decided to add 3 more layers to our stack. THREE new layers. I talked them out of one of them, but it wasn't enough. I'm no longer working there, so I hope it works out for them.


Code should be judged by inputs and outputs only! I hate code reviews. We end up rewriting well-structured code anyway, simply because it is always easier to rewrite than to do surgery.


That's a bit too extreme for me (and it's already covered by unit tests). Code reviews are good mostly for asking questions, not for dictating changes.


I do agree, but too often they turn into "well, I would do it this way".

If it is some very big and very critical system, then yes, code reviews can be beneficial compared to the alternative. But for systems that are innovative and rapidly going through iterations as it is, it's better to judge coverage of test cases (and adequate, if not optimal, performance).


What about performance? E.g. you don't want to implement an O(n) algorithm if an O(log(n)) equivalent exists, and merely testing inputs and outputs won't reveal that.
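
To illustrate: the two functions below agree on every input (sorted arrays of distinct keys), so black-box testing of inputs and outputs can't tell the O(n) version from the O(log n) one; only a review or a benchmark can.

    #include <stddef.h>

    /* O(n): scan every element until the key is found. */
    int find_linear(const int *a, size_t n, int key) {
        for (size_t i = 0; i < n; i++)
            if (a[i] == key) return (int)i;
        return -1;
    }

    /* O(log n): binary search over the same sorted array. */
    int find_binary(const int *a, size_t n, int key) {
        size_t lo = 0, hi = n;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (a[mid] < key)      lo = mid + 1;
            else if (a[mid] > key) hi = mid;
            else                   return (int)mid;
        }
        return -1;
    }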


I do agree, but it depends on the criticality of performance and the level of change the system goes through by default.

For us, we are always implementing complex new logic for not very many users, so there's not much benefit to some clever data structure that we are all too dumb to understand. If I get a ticket that involves inserting something into the logic you wrote, and I can't understand how to operate your current solution, I will be hacking up an n log n solution as long as it is adequately performant.


I kind of wonder if there is some overlap between this problem, or some variation of it, and the (as I understand it) still-open question of how life manages to evolve as quickly as it does.


I don't think the "living nightmare" control flow graph is that bad?

B is once again a while(input == 0) loop, that exits to D. D and C together form a while(input == 1) loop, that exits to A. And A is again a switch between two routes, although in this case, rather than dispatching to one of two independent routes, it switches between a full route (go to B) and a partial route (jump straight to D); basically you could consider B and C to exist together in an if-block of sorts.


It's possible to write the first one straight out in C using only whiles, with no breaks and no continues. Can the same be done with the second? Maybe, but I couldn't figure out how.


For me, this article presumes too much foreknowledge of what is being talked about.

Does anyone have a pointer to what I could read to understand this?


Here is a good starting point: https://webusers.imj-prg.fr/~pascal.michel/

I read about busy beavers a while ago but have only just started digging into the subject. It's really fascinating!


It's amazing how many people jumped in to comment without even a minimal glance at the article.

Please folks, before jumping in to comment, at least make sure the article is about what you think it's about.


I think titles get used more as a discussion topic starting point.


[flagged]


The quote is: "The Spaghetti Code Conjecture says that Busy Beaver programs ought to be, in some sense, as complicated as possible."

It takes real gymnastics to misquote that into:

"programs ought to be, in some sense, as complicated as possible"


What a bizarre misquote... busy beaver programs are a specific type of algorithm, in the sense that you want to extract the most possible output, barring programs that generate output forever. It makes a lot of sense that the programs should be as complicated as you can possibly make them, from the conceptual level down to the explicit coding, because the problem space is densely complex and most of the simple stuff has already been iterated over.

It's a statement about number theory, not general software design best practice.


I was about to write a rant about how this author must be out of his mind, or simply a junior engineer lost in narcissism...

How wrong I have been!

https://en.m.wikipedia.org/wiki/Busy_beaver

The busy beaver game is to design a Turing machine that produces the most output while still halting. And this conjecture is about a general property of the programs that can produce the most output.
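
As a concrete taste of the game, here is a minimal simulator for the classic 2-state, 2-symbol champion (standard transition table A0->1RB, A1->1LB, B0->1LA, B1->1RH); it halts after 6 steps with 4 ones on the tape:

    #include <stdio.h>

    int main(void) {
        /* Transition table: write[state][symbol], head move, next state. */
        const char write[2][2] = {{1, 1}, {1, 1}};
        const int  move [2][2] = {{+1, -1}, {-1, +1}};  /* +1 = right */
        const int  next [2][2] = {{1, 1}, {0, -1}};     /* -1 = halt  */

        char tape[64] = {0};
        int pos = 32, state = 0, steps = 0, ones = 0;

        while (state != -1) {
            int sym = tape[pos];
            tape[pos] = write[state][sym];
            pos      += move [state][sym];
            state     = next [state][sym];
            steps++;
        }
        for (int i = 0; i < 64; i++) ones += tape[i];
        printf("halted after %d steps with %d ones\n", steps, ones);
        return 0;
    }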


Do you mean that you were about to write such a rant?


Spaghetti code: using 1000 lines of code where 100 will do.


That's not what people mean when they say "spaghetti code". When people complain about spaghetti code, they're complaining that the code is difficult to follow; perhaps the control flow goes all over the place, for example.

That's how the spaghetti code conjecture got its name. It's a conjecture about the control flow structure of busy beaver machines. In this conjecture, you're comparing different programs of the same size (thinking of size as the number of states) whose control structures are connected in different ways.
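
A toy illustration of the difference (an invented example, not from the article): both halves below print 0 through 9, but in the first the control flow jumps around, while the second reads straight down.

    #include <stdio.h>

    int main(void) {
        /* Spaghetti: control flow stitched together with gotos. */
        int i = 0;
        goto test;
    body:
        printf("%d\n", i);
        i++;
    test:
        if (i < 10) goto body;

        /* Structured: the same behavior as one loop. */
        for (int j = 0; j < 10; j++)
            printf("%d\n", j);

        return 0;
    }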


Maybe, but at some point, at least in my country, anything that wasn't a rigid bunch of classes, interfaces, and whatnot following all that Java EE / design patterns / SOLID / OOP stuff was deemed "spaghetti code" and made you "a bad developer", regardless of the language used. Granted, this was in the mid-2000s, before people "rediscovered" the merits of functional programming. Today, procedural C-style coding or functional programming is more accepted in dev shops. I don't think I've heard about UML in the last 5 years anywhere I've worked, for instance. Young programmers are lucky they avoided that era...


I don't know about that, code golf often produces spaghetti.


Ironically, the definition of the busy beaver function is in some sense code golfed! If the same output could be achieved in fewer bits, it would correspond to a BB of a smaller argument.



