Some take "line of code" literally. Others see it as a stand-in for complexity. Because one can certainly make a short program that's far more of a liability than a long one.
I'm wondering if there's a quality term for a "unit of complexity" in code? Like when a single line has 4 or 5 "ideas" in it.
There are many metrics for measuring the logical complexity of code. One of the earliest and best-known is Cyclomatic Complexity [0]: the more possible execution paths, the higher the score. You can find more at Software Metric [1].
If there is a useful metric, then I haven't found it yet. The data on cyclomatic complexity[1] does not convince me, and in practice, the linters I've seen have complained about boring code like this:
switch (foo) {
case KIND_1: return parseObjectOfKind1();
case KIND_2: return parseObjectOfKind2();
case KIND_3: return parseObjectOfKind3();
...
case KIND_15: return parseObjectOfKind5();
case KIND_16: return parseObjectOfKind16();
}
There are 16 paths and yet this code is easy to follow. There is no substitute for human judgement (code reviews).
Their point seems to be in the mismatch of names. Imagine seeing this sequence, with names instead of numbers:
switch (HTTP_METHOD) {
case PUT: processPut(...); break;
case POST: processPost(...); break;
case GET: processDelete(...); break; // wtf
case DELETE: processGet(...); break; // mate?
}
To the reader that appears to be an error even if it is precisely the thing you want to happen.
This is a basic DRY violation. The number is duplicated, and the copies can get out of sync (as in the example).
I think everyone has coded this way out of expedience and some of us have eventually messed it up too. But you could, for example, use a macro in C to only list the number once. It might not be a trade-off worth making though.
Personally, I've used reflection to do this kind of mapping of enums to functions in a language that supports reflection.
The code was inspired by a piece of Java code that rendered geometric primitives. It was basically case "box": return parseBox(); as you suggested in your other post.
Turning "box" into "parseBox" using string operations and then using reflection to call the right method is an approach I'd consider in Ruby, but not in Java. It breaks IDE features like "Find Usages" and static analysis, and the code to dynamically invoke a Java method is more annoying to review than the boring repetition in my posted snippet.
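To make the trade-off concrete, here is roughly what that name-derived reflective dispatch looks like in Java (class and method names are invented for illustration, not from the original code):

```java
import java.lang.reflect.Method;

public class ReflectiveParser {
    // Hypothetical per-kind parsers; the real code had one per primitive.
    public String parseBox() { return "box parsed"; }
    public String parseSphere() { return "sphere parsed"; }

    // Derive "box" -> "parseBox" with string operations, then invoke it
    // reflectively. Note that nothing here names parseBox at the call site,
    // which is exactly why "Find Usages" and static analysis go blind.
    public String parse(String kind) {
        String name = "parse"
                + Character.toUpperCase(kind.charAt(0))
                + kind.substring(1);
        try {
            Method m = getClass().getMethod(name);
            return (String) m.invoke(this);
        } catch (ReflectiveOperationException e) {
            // A typo in a kind string only surfaces here, at runtime.
            throw new IllegalArgumentException("no parser for " + kind, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(new ReflectiveParser().parse("box"));
    }
}
```

The try/catch and cast are the "more annoying to review" part: the boring switch fails at compile time, this fails at runtime.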
> This is a basic DRY violation. The number is duplicated, and the copies can get out of sync (as in the example).
I think it's pretty clear that the number is a placeholder for something reasonable, e.g. making an association between two distinct sets of concepts. You'll still be vulnerable to copy-paste or typo issues.
> Personally, I've used reflection
Now you have two problems (and still have to maintain an association between two sets of concepts).
I don't think most enum-to-function mappings are distinct sets of concepts.
In my own code, I have used the enum name to map to a set of functions related to that enum value. The association is implicit in the name of the enum value and the name of the functions. No way to mess that up like this.
Depends on your language, but you could use the type system instead; e.g. the function is chosen by the compiler based on the type of the value.
In more dynamic languages, you could probably use introspection.
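To sketch the type-system option in Java (the Shape/Box/Sphere names are made up): each variant carries its own behavior, so virtual dispatch picks the right method and there is no enum-to-function table to get out of sync.

```java
// Each variant implements its own parse(), so the dispatch decision is
// made by the type system, not by a hand-maintained switch or map.
interface Shape {
    String parse();
}

class Box implements Shape {
    public String parse() { return "parsed box"; }
}

class Sphere implements Shape {
    public String parse() { return "parsed sphere"; }
}

public class TypeDispatch {
    public static void main(String[] args) {
        Shape s = new Box();
        System.out.println(s.parse()); // virtual dispatch, no switch
    }
}
```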
Lastly, this does not alleviate the association issue, but I prefer the alternative of declaring the associations in an array/map somewhere, and using map[enum_value]() instead of the switch.
In a lot of ways, I think the map solution is actually worse. There isn't an easy way to figure out what code is used for a given enum value with the map alone. The searchability depends on how the map is initialized. Not to mention a map will always introduce null or option types that you have to handle.
A map can be a good solution though, particularly if it's something like mapping enum values to strings that is constructed via EnumClass.values().map... or something so that you know the map is a total function.
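One way to get that totality guarantee in Java (names hypothetical) is an EnumMap plus a startup check against values(), so an unmapped enum value fails fast at class load instead of producing a null at lookup time:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Supplier;

public class TotalMap {
    enum Kind { BOX, SPHERE }

    // Hypothetical per-kind parsers.
    static String parseBox() { return "parsed box"; }
    static String parseSphere() { return "parsed sphere"; }

    static final Map<Kind, Supplier<String>> PARSERS = new EnumMap<>(Kind.class);
    static {
        PARSERS.put(Kind.BOX, TotalMap::parseBox);
        PARSERS.put(Kind.SPHERE, TotalMap::parseSphere);
        // Verify the map is total: adding a Kind without a parser fails
        // immediately at startup, not on some rare code path later.
        for (Kind k : Kind.values()) {
            if (!PARSERS.containsKey(k)) {
                throw new IllegalStateException("unmapped: " + k);
            }
        }
    }

    static String parse(Kind k) {
        // No null handling needed here: totality was checked above.
        return PARSERS.get(k).get();
    }

    public static void main(String[] args) {
        System.out.println(parse(Kind.BOX));
    }
}
```

It still isn't a compile-time check the way an exhaustive switch can be, but it narrows the window for the null/option problem considerably.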
Using a map also completely sidesteps the point of cyclomatic complexity because there are no code paths to count anymore; now they're data. And even though the function does the same as before, the linter will congratulate you on the reduced complexity.
And you'd have essentially replaced statically understood control flow with the equivalent of a computed goto. It's not like it's a bad approach, but from a code path standpoint, it's unequivocally worse, not better. (Imagine if that map was mutable!)
> Not to mention a map will always introduce null or option types that you have to handle.
It is the same with the switch, at least in C/C++. You either have a default, or must list all the possible cases and still return something "at the bottom".
I mean, it's a funny point, but I'm not sure it's valid. Clearly he's numbering things here for illustration; in the real world you're far likelier to see a switch over types or some non-numeric value than literally numbered enums.
I mean, this can happen with any code by that logic if you're cut and pasting.
I think the real issue is that types aren't included in the example. I work in much higher-level languages, but if you are passing strongly typed objects through and your switch dispatches on those types, it's probably going to be illegal in your type system to return certain results.
If your type system doesn't validate your results, then you'll be prone to the class of error you are discussing. Maybe that's common in C.
It kind of encourages you to use a lookup table or polymorphism, which isn't far off from what the compiler is likely to do with that switch statement.
As for correlation to number of defects, another important factor is maintainability (velocity of new features in growing programs, keeping up with infrastructure churn in mature programs). If reducing perceived complexity improves maintainability, and if measuring cyclomatic complexity despite its flaws helps reduce perceived complexity, then it's still a useful metric.
Feels like one of those classic, "no one tool gives you the one correct answer. They all give you data and you come to an informed conclusion having utilized one or many of them."
> It kind of encourages you to use a lookup table or polymorphism
And I think that is a problem. switch()ing over an enum will give you a warning about unhandled cases in some languages. But if you start building your own lookup tables, you will be just as prone to typos like mine, except with more code, and less static analysis.
Yet the cyclomatic complexity drops from 16 to 1. I don't think that's a healthy incentive.
I think that's a fair criticism of a specific failing of a specific complexity metric. A reasonable response, especially if switch statements or pattern matches in your language are better at detecting completeness than other approaches, might be changing the metric to reward switch statements that completely cover the input range. It also makes it clear why any one metric should not be a target with no exceptions. I don't think it invalidates the concept of measuring complexity.
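As a concrete example of "better at detecting completeness" (hypothetical Kind enum): in Java 14+, a switch *expression* over an enum with no default must cover every constant or it won't compile, so a metric could plausibly treat this whole form as a single increment.

```java
public class ExhaustiveSwitch {
    enum Kind { BOX, SPHERE, CONE }

    // No default branch: the compiler verifies every Kind is handled.
    // Adding a new Kind turns a silent runtime gap into a compile error.
    static String parse(Kind k) {
        return switch (k) {
            case BOX -> "parsed box";
            case SPHERE -> "parsed sphere";
            case CONE -> "parsed cone";
        };
    }

    public static void main(String[] args) {
        System.out.println(parse(Kind.CONE));
    }
}
```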
There are many things that the complexity calculator could take into account: the specifics of the language, the libraries that are used (which could start threads or contain hidden conditionals), the kind of project (following a spec vs. explorative programming), and so on.
I think it's easy to come up with an algorithm that's better than a coin flip or counting LOC. But to be useful, the algorithm has to be on par with the judgement of the developers who check in the code and review it. Otherwise the false positives will take time away from other bug-reducing activities like code reviews and writing tests.
Of course I don't have data to prove it, but I'm convinced that every complexity checker I've encountered in the last 15 years has been a (small) net negative for the project. Maybe machine learning will improve the situation.
That's what I was thinking about when I used "ideas". To me there's one main idea in all that code: "we are returning a parsed object of a kind"
Not that I'm trying to sell "ideas". I don't even know. But it's this very loose concept that floats around my mind when writing code. How many ideas are there in a stanza? Ahh too many. I should break out one of the larger ideas to happen before.
Speaking of measuring "ideas", the holy grail of complexity metrics is Kolmogorov complexity: theoretically, the inherent complexity of a piece of data (including code) is the length of the shortest program that generates it. For example, 20 million digits of pi, or a routine that uses 20 branches to return the same object, has low Kolmogorov complexity because it's easy to write a generator for it, while a parser is genuinely more complex.
But it's only a mathematical construction and is uncomputable, just like the halting problem. In real life, for some applications a good compression algorithm like LZMA is sufficient to approximate it. But I'm not sure if it's suitable for measuring computer programs - it would still have a strong correlation to the number of lines of code.
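As a toy illustration of the compression-as-approximation idea (a rough proxy only, not a real metric), java.util.zip can show how cheaply a repetitive "20-branch" switch compresses relative to its raw length:

```java
import java.util.zip.Deflater;

public class CompressedSize {
    // DEFLATE-compressed size as a crude stand-in for Kolmogorov
    // complexity: highly regular code compresses to very little.
    static int compressedSize(String s) {
        Deflater d = new Deflater(Deflater.BEST_COMPRESSION);
        d.setInput(s.getBytes());
        d.finish();
        byte[] buf = new byte[s.length() + 64];
        int n = 0;
        while (!d.finished()) {
            n += d.deflate(buf); // count output bytes until done
        }
        d.end();
        return n;
    }

    public static void main(String[] args) {
        // Generate a switch body much like the one discussed above.
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 20; i++) {
            sb.append("case KIND_").append(i)
              .append(": return parseObjectOfKind").append(i).append("();\n");
        }
        String code = sb.toString();
        System.out.println("raw: " + code.length()
                + " compressed: " + compressedSize(code));
    }
}
```

The 20 near-identical cases shrink to a small fraction of their raw size, which matches the intuition that they carry one idea, not twenty.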
>A `switch` and all its cases combined incurs a single structural increment.
>Under Cyclomatic Complexity, a switch is treated as an analog to an `if-else if` chain. That is, each `case` in the `switch` causes an increment because it causes a branch in the mathematical model of the control flow.
>But from a maintainer’s point of view, a switch - which compares a single variable to an explicitly named set of literal values - is much easier to understand than an `if-else if` chain because the latter may make any number of comparisons, using any number of variables and values.
>In short, an `if-else if` chain must be read carefully, while a `switch` can often be taken in at a glance.
Great paper! I particularly liked how they treat boolean expressions. It kind of has a mathematical justification as well since a series of only && or || is trivial for a SAT solver.
I disagree with them on assuming method calls being free in terms of complexity. Too much abstraction makes it difficult to follow. I've heard of this being called lasagna code since it has tons of layers (unnecessary layers).
Maybe the complexity introduced by overabstraction requires other tools to analyze? It's tricky to capture with cyclomatic complexity or cognitive complexity since it is non-local by nature.
You jest, but I had a manager who loved Cyclomatic complexity as a measure of programmer ability. I tried in vain to convince him that CC was measuring the complexity of the underlying business logic, not my programming ability.
I wasted a lot of brainpower cutting down CC in my code by doing terrible things like storing function names as strings in hashmaps.
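Something along these lines (reconstructed from memory, all names invented):

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

public class GamedComplexity {
    // Hypothetical handlers the if-chain used to pick between.
    static String handleCreate() { return "created"; }
    static String handleDelete() { return "deleted"; }

    // Action name -> method name. The dispatch below is one reflective
    // call, so the complexity checker sees zero branches -- even though
    // every original path through the code is still there.
    static final Map<String, String> HANDLERS = new HashMap<>();
    static {
        HANDLERS.put("create", "handleCreate");
        HANDLERS.put("delete", "handleDelete");
    }

    // "throws Exception" instead of try/catch, to save branches.
    static String dispatch(String action) throws Exception {
        Method m = GamedComplexity.class.getDeclaredMethod(HANDLERS.get(action));
        return (String) m.invoke(null);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(dispatch("create"));
    }
}
```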
That could replace several 'if' checks, since function calls weren't counted as branches. And of course no exception handling, because you could save branches by declaring throws Exception.
To this day, I wonder if Cyclomatic complexity obsession helped make "enterprise Java" what it is today.
The worst part is that this doesn't even reduce the cyclomatic complexity of the code. You replaced conditional branches with table lookups, indirect function calls, and exceptions, but you still have the same number of paths through the code—or perhaps more with the extra code for the table lookups. In the end you're just hiding the real complexity from the tools.
Minimizing cyclomatic complexity might actually be a reasonable approach, if the complexity were accurately measured without these blind spots. For example, any indirect calls should be attributed a complexity based on the number of possible targets, and any function that can throw an exception should be counted as having at least two possible return paths (continue normally or branch to handler / re-throw exception) at each location where it's called.
The real measure you want is the cyclomatic complexity divided by the complexity required by the problem. The problem with line counting (or cyclo counting) is that it assumes a constant ratio.
It is not trivial to understand the necessary complexity of a problem. In SICP, there's an early practice problem that asks you to turn a recursive number pattern generator into an iterative generator. What the problem doesn't tell you is that the generator, obscured by complex recursion, is actually creating a Tribonacci number sequence. This hidden insight turns a difficult problem into a trivial one.
I certainly didn't mean to imply it was trivial (or even possible). I'm just pointing out that assuming increasing LoC implies an increase in necessary complexity is just wrong.
Developers have tried to build metrics for "complexity" over the years. Cyclomatic Complexity is often pointed to as one of the more successfully deployed: https://en.wikipedia.org/wiki/Cyclomatic_complexity
As with any metric, it's something to take as useful "advice" and becomes a terrible burden if you are optimizing for/against it directly. (Such as if you are incentivized by a manager to play a game of hitting certain metric targets.)
It's also interesting to note that a complete and accurate complexity metric for software is likely impossible due to it being an extension/corollary of the Halting Problem.
I'm definitely not encouraging writing "clever" short code. Always strive to write clear code. You write it once, but it will be read (and potentially changed) many, many times.
As for complexity metrics, this is a contentious topic. It's hard to quantify "ideas" or "concepts". I think this is the part where technical design reviews and good code reviews help keep in check.
My IDE spits out a warning when I reach a certain level of complexity in a method, using McCabe's Cyclomatic Complexity as a measure. It roughly maps to "too many if statements" in my case.