If you want even better escaping safety: use different types for safe vs unsafe strings.
You should not ever be crossing the streams. So make it a compile-time failure. And anywhere you truly do know better, you can just use a transformation func like `safe = iKnowWhatImDoingCompilerSTFU(unsafe)`
And keep in mind that "escaping" is not a binary. Something that works in an HTML node's body may not work in an attribute or a request header or a URL. This is the primary reason that most things suggest escaping at output time: you cannot safely escape something without knowing where it will be used.
> my example above where we decided that us meant “unsafe string” and s meant “safe string.” They’re both of type string. The compiler won’t help you if you assign one to the other and Intellisense won’t tell you bupkis. But they are semantically different; they need to be interpreted differently and treated differently and some kind of conversion function will need to be called if you assign one to the other or you will have a runtime bug. If you’re lucky.
This is a big advantage of Rust with repr(transparent) and Haskell with newtype. With them, you can make safe and unsafe string types so that the compiler will make sure you don't mix them up, with no effect on the generated code.
You don't need to reach for Haskell or Rust, you can wrap any unsafe string in an UnsafeString type that can't be mistaken for a plain string. I was hoping the post would get to saying 'make wrong code into errors'.
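To make the idea concrete, here is a minimal sketch of such a wrapper in Python (names like `UnsafeString` and `escape_for_html_body` are my own invention, not from any framework). The wrapper refuses to be used as a plain string, so the only way out is through an explicit, context-named escaping method:

```python
import html

class UnsafeString:
    """Wraps raw user input so it cannot be mistaken for a plain str."""
    def __init__(self, value: str) -> None:
        self._value = value

    def __str__(self) -> str:
        # Refuse to be used as a plain string by accident.
        raise TypeError("escape this UnsafeString before using it")

    def escape_for_html_body(self) -> str:
        # Escaping is context-specific: this one is only valid in an HTML body.
        return html.escape(self._value)

us_comment = UnsafeString("<script>alert(1)</script>")
safe = us_comment.escape_for_html_body()
```

In a dynamic language this is a runtime tripwire rather than a compile-time error, but it still turns "forgot to escape" from a silent bug into a loud failure.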
You can't always encode every semantic mismatch in the type system (or have it checked by the compiler in general). Although with a sufficiently powerful type system, at least the mismatches that are reflected in variable naming should be catchable.
For example, the convention of naming your 4x4 matrix transforms AfromB leads to the following kind of composition:
vecInD = DfromC * CfromB * BfromA * vecInA
Here, you can very clearly spot any mismatch (e.g. if someone put BfromC in the middle instead). With the right language, you could also have each matrix source and destination reference frames encoded in the type system if you so desired.
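As a sketch of the "encode the frames" idea (the `Transform` class and frame strings here are hypothetical, and the actual matrix data is omitted), composition can check that adjacent frames agree:

```python
class Transform:
    """A transform tagged with destination and source frames (matrix data omitted)."""
    def __init__(self, dst: str, src: str) -> None:
        self.dst, self.src = dst, src

    def __mul__(self, other: "Transform") -> "Transform":
        # DfromC * CfromB only makes sense when the inner frames agree.
        if self.src != other.dst:
            raise ValueError(
                f"frame mismatch: {self.dst}from{self.src} * {other.dst}from{other.src}")
        return Transform(self.dst, other.src)

DfromC, CfromB, BfromA = Transform("D", "C"), Transform("C", "B"), Transform("B", "A")
DfromA = DfromC * CfromB * BfromA  # frames line up: D <- C <- B <- A
```

In Python this check happens at runtime; in a language with phantom type parameters the same mismatch could be rejected at compile time.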
Where it gets harder to produce errors is higher-level patterns that transcend adjacent names:
if (atomicVariable.read()) { /* something that assumes atomicVariable is true */ }
This should invite suspicion - it is clearly concurrent code, so what prevents atomicVariable from changing right after entering the if?
Well, you kind of underline my point: Rust does not encode the difference between "is true" and "was, at least briefly, true" in the type system. You can reach for other types/abstractions (and whatever data you touch inside the if might also have the borrow checker up in arms), but that above code as written is suspect.
This thread seems surreal to me. Rust's type system does encode exactly what the later comments are complaining isn't encoded. And you state which code cares about the variable keeping its value through a code block, between a pair of braces.
What I meant was *For atomics*, rust does not encode the difference between "is true" and "was, at least briefly, true" in the type system. Obviously, rust has many ways to track this kind of distinction in a compile-time manner (and yes, the borrow checker is one of them). That was not in question.
Hum, ok. You don't consider the borrow checker to be part of the type system.
Well, it's perfectly fine to disagree on name definitions, but that is some fuzzy and dizzy line. Lifetimes interact with the static type features all the time.
My point is that you can still have a race condition/logic bug, even if all your atomics are accessed atomically and all your borrows are fine (i.e. no data races). Note that I specifically wrote "[code] that assumes atomicVariable is true", not "code that accesses a shared resource under the assumptions that it has unique ownership because atomicVariable is true".
Yes, this situation is much harder to produce by accident in Rust - the language significantly raises the number of things that are encoded/checked at compile time and strongly guides you toward safer designs. But I fail to see the "the equivalent Rust code will not compile" part.
Here is a pretty stupid example including exactly the form of my original snippet: https://godbolt.org/z/6e18K85hT - You might get 2 or 3 printed after "see?". Or you might not get the first print at all (unlikely). And if you say "well nobody would write code like that" then that's exactly the point: It looks suspicious (even though it compiles).
Reading ad-hoc is fine if you just want to observe the value at some point in time.
That is why there is room for both a “peek” function, maybe should be called “unsafeRead”, and a “withValue” method that takes a function with the actual value as argument and locks the atomic variable while executing the closure. This prevents the user from ever forgetting to write the lock.
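A rough Python sketch of that two-method design (the names `unsafe_read` and `with_value` follow the comment above; this is an illustration, not any particular library's API):

```python
import threading

class GuardedValue:
    """A shared value with an ad-hoc peek and a locked with_value."""
    def __init__(self, value) -> None:
        self._value = value
        self._lock = threading.Lock()

    def unsafe_read(self):
        # A snapshot only: it may be stale the moment this returns.
        return self._value

    def set(self, value) -> None:
        with self._lock:
            self._value = value

    def with_value(self, fn):
        # The lock is held while fn runs, so set() cannot slip in
        # between the read and whatever fn does with the value.
        with self._lock:
            return fn(self._value)

g = GuardedValue(2)
doubled = g.with_value(lambda v: v * 2)
```

The naming does the "wrong code looks wrong" work: any call site containing `unsafe_read` invites exactly the suspicion described above, while `with_value` can't forget the lock.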
> Reading ad-hoc is fine if you just want to observe the value at some point in time.
There are such cases, indeed. Double-checked locking comes to mind. But if the code inside the if does assume the atomic to have a given value, it is almost surely wrong (absent other synchronization mechanisms, such as in double-checked locking).
Thank you, I had the same fuzzy thought in the back of my mind while reading this but couldn’t quite put it into words. The idea of depending on developers to understand “usFoo” means this is an unsafe string and “sFoo” means this is a safe string seems error prone. Those rely on the developer knowing what “us” and “s” represent.
A much better idea would be to give that information to the compiler, so the developer gets a useful error if they try to do something unsafe. We should turn these into compiler errors as much as possible, so that if a developer does something bad, it won't even compile. This seems like a much more foolproof way to accomplish what the author is trying to do.
100% true. It’s a sign of evolving understanding of software engineering — in 2005, it was still common to believe that the best way to prevent errors was to help fallible human beings learn to detect them better. Now, well, it’s still common but we’re overall getting smarter about letting computers protect us from ourselves.
(And the kind of technique described here still matters because you should understand why the guardrails are where they are.)
What's nice about Rust here is that the coercion can be silent but it isn't by default. So open("file") works even though not all file names are strings, but matrix + filename is a compile error rather than doing something nobody wants.
That results in a cultural difference (the newtype idiom) that makes UnsafeString viable, while in some languages you could do it but you wouldn't.
Strong type support is not unique to niche languages. Some very mainstream languages support strong types, too. Notably, C++. As in Haskell or Rust, you are not obliged to use them. But you don't need to wait until you get to use Rust or Haskell at work before you start.
We used to say you can code FORTRAN in any language. C, too. Coding in Rust does not force you into sane types. You have to choose to use them. So, choose them now.
I think this feels like the wrong place to solve the problem. I don't think you should be building HTML by concatenating strings like this.
If you build a model and render it separately using a template, whether a string gets entity-encoded or not is a concern for the template, and a sane template system entity-encodes by default unless explicitly told not to do so.
In this specific case you're right, but that was just an example. Generally speaking, I think using naming conventions is a very poor solution, types are better.
C isn't very good at type correctness. In a lot of languages, if I call F(some_array) it passes the array; in C it passes... a pointer to the first element of the array. C programmers, like a dog whose master kicks it when he's drunk, are used to this, but it's not actually good.
For this behaviour to be ergonomic you need consistent type correctness. A language which thinks 5 is true is already shaky about types, let alone one which thinks arrays are pointers.
I don't disagree with what you're saying, but whatever (little) type safety C has can still be put to good use for the purposes of the article. The problem he's outlining is that you have two kinds of strings: "clean" strings and "tainted" strings, and you can only get a "clean" string from some kind of escaping function that returns "clean" strings. That's the typical problem you solve with types, and it's been true forever, even in old versions of C.
The idea in the article is to make wrong code look wrong, so it's even more important to make it easier for programmers to check code visually for errors in weakly typed languages such as C, where wrong code can easily get past the compiler. Even in strongly typed languages such as Haskell or Ada, it's still a good idea as it's useful self-documentation, and it also means you're less likely to get compilation errors.
While I completely agree that the implicit conversions between arrays and pointers are very bad, in decades of using C and many other programming languages I have seen neither any case when 5 being true is undesirable nor any case when a distinct "Boolean" type is useful.
I've seen plenty of cases of it being undesirable for 0 to be false, in languages which aren't C.
In C, and any other language where the memory model is based on bits and bytes, it's well defined; there's no meaningful way for a value to be 'falsey', so I'd agree that a distinct boolean type doesn't do a lot.
I could make a case that it lets the compiler choose the width of a boolean, but that's not necessarily good.
Having a distinct boolean type prevents this error (fortunately detectable with linters, so pay attention to your linter output):
if(n = 5) { ... }
When the author probably meant:
if(n == 5) { ... }
In languages where the assignment statement can also be used in expressions (as it evaluates to the value being assigned) the above can create very subtle errors in a program. Having a distinct boolean means that the type system will reject that. Of course, it wouldn't reject this:
if(b = true) { ... }
But that's something you generally shouldn't write anyways (and, again, your linter will probably catch).
The possibility of writing an assignment by mistake in a condition test is something strictly specific to C and derived languages, and this is without doubt one of the greatest errors in the design of C.
In any language that keeps the Algol notation, i.e. where "=" is a relational operator and ":=" is assignment, this kind of mistake is extremely difficult to make and easy to notice when it happens nonetheless.
So the fact that a programming language can have a bad notation for equality that leads to mistakes cannot be used to justify unnecessary complications in different kinds of Boolean expressions.
Many languages with distinct boolean types do not require that the condition expression in an if statement evaluates to that type, and hence do nothing to prevent that error.
Modern C actually has a boolean type, named _Bool (aliased bool if you ask for it explicitly) and yet sure enough despite this it figures '0', 100 and 0.1 are all truthy.
Yes, what's critical here is that the only value that should be boolean true is the boolean value true, and your language is letting you down when it insists strings, or integers, or some other type entirely should be "truthy" because it was easier than writing what you meant. Sooner or later this type laziness will bite you.
I think this is more obvious in languages with pattern matching where the boolean if condition is just the world's least interesting pattern match, clearly (69, "Nice", true) doesn't match (true, true, true) as a pattern and is instead a type error.
>Look for coding conventions that make wrong code look wrong.
Or code more readable, and more obviously right or wrong.
A few I use :
- always suffix with the unit, for quantities for which we are likely to not always use SI units (ex.: time ("double timeS" for physics, "long timeNs" for low level scheduling, etc.), angles (latRad, latDeg, etc.))
- lists/sets of Foo named fooList/fooSet by default (or xxxFooList, etc)
- Map<Foo,Bar> : barByFoo
- Foo[] : fooArr
- return value : ret (some use "result", but it's longer to type/read, "ret" is more obviously related to "return", and a result can just be intermediate and not something to return)
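As a small sketch of the unit-suffix convention (function names here are invented for illustration), the suffixes force every conversion to be written out where it happens:

```python
import math

def sleep_budget_ns(time_s: float) -> int:
    # The suffixes make the S -> Ns conversion explicit at the boundary.
    return int(time_s * 1_000_000_000)

def north_component(lat_deg: float) -> float:
    # latDeg -> latRad: the conversion is visible right at the use site.
    lat_rad = math.radians(lat_deg)
    return math.sin(lat_rad)
```

A bare `time` or `lat` parameter would hide whether a conversion is needed; `time_s * 1_000` assigned to something named `time_ns` looks wrong on sight, which is exactly the point of the convention.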
Heck yeah. One of the few things in this godforsaken industry/craft that I consider to be unambiguously true.
for quantities for which we are likely to not always use SI units
Do you mean "likely to not always use SI base units?" Second and millisecond are, for example, both SI units.
Regardless, I think it's worth suffixing all variables with unit type regardless of SI base units or not. This is particularly true for large and/or long lived projects with multiple developers.
Developers are almost always the limiting factor in any development effort.
Therefore, always prioritize reduction of developers' cognitive load unless there is some specific reason not to (ie unrolling a loop for performance reasons, etc)
The only arguments against longer identifier names are non-issues:
1. "They're harder to type." If you're using a code editor without autocomplete in 2022, that's on you.
2. "If I use long identifier names I can't fit as much code on a line." Bit of a coding style smell if this is an issue, IMO.
Yes, SI base units: for example the second, not powers of ten of it, but we just call it SI. For linear positions, no suffix means meters (x, y, z, etc.); likewise vx is in m/s, etc.
Autocomplete is nice but in my IDE it requires hitting ctrl+space and then you might have a choice to make.
All the domain revolves around things moving around in the real world, there is no pixel.
Also, at some point, to prevent unit problems, someone tried to add methods for unit handling (set value and unit, get value for a unit, or get value and unit separately), but it was a train wreck: people set values with fancy units, people got the raw value assuming SI, and when the unit was properly specified to the getter, much time was spent in the object-oriented, allocation-addicted units framework, and the code was bloated with unit considerations. So we got rid of that (which could surely be done better, but we thought it was just better without; we just have a few utility functions on the side for the rare cases where we need unit conversions).
That says to me you should be using types with strongly-typed units built in (using templates/generics etc). I wouldn't expect to able to call "Set unit" on a quantity though, just a function to convert it and return the new differently typed quantity. Nor would you need "get unit" as it's known at compile time.
We don't want to be too abstract, to get a feel for what we are doing, especially for collections and their different and possibly touchy algorithmic complexities, but if it makes sense we can use fooColl to reference a set as a general collection, for parts of the code that work well with any collection.
Fun fact: Rust's return keyword was originally spelled "ret" (because, IIUC, Rust designer Graydon wanted all keywords to be <= 5 characters). Enough people complained that "ret" was too ugly and unnecessarily breaking precedent from other languages, so it was changed to "return".
I don't like "ret" because makes me think of "retch" :) but you make a good argument for using it for return values. Mozilla's coding style typically uses "rv".
I’ve done this a few times. Boy do some people get mad.
In one particular case we had two implementations of some complex code with security implications. It wasn’t working with a new feature, and the author gave up and put me on the case. One implementation took an array, the other a single instance, a couple blocks were in different order and some variable names were different.
So I diff them against each other, got them identical except an outer loop, then fixed his bug.
His lead found me the next day, “why did you put this duplicate code here?” I didn’t. I just revealed it. “Oh”
Three hours later, "I removed that duplicate code." I did not steeple my fingers and say "excellent" except inside my own head.
And that is how I learned to use nerd sniping for good, not evil.
Note the context. Joel is writing his examples in VBScript. VBScript was a dynamically typed offshoot of Visual Basic, but did not originally give you the ability to define your own types.
In any modern language you would just use a HtmlString type to wrap html, and let the compiler take care of ensuring you don't mix up encoded and unencoded strings. But absent any kind of custom types (and no HtmlString in the standard library), you had to rely on something like Hungarian notation and very carefully eyeball to code.
Making wrong code look wrong is better than nothing, but making wrong code impossible is far better.
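A rough sketch of what such an `HtmlString`-style type could look like (the class and method names here are hypothetical): the only doors into the type are an escaping constructor and a loud opt-out, and concatenation refuses anything that isn't already safe.

```python
import html

class SafeHtml:
    """HTML that is already encoded; the only ways in are escape() and raw()."""
    def __init__(self, encoded: str) -> None:
        self._encoded = encoded

    @classmethod
    def escape(cls, text: str) -> "SafeHtml":
        return cls(html.escape(text))

    @classmethod
    def raw(cls, markup: str) -> "SafeHtml":
        # The loud, greppable opt-out for markup you wrote yourself.
        return cls(markup)

    def __add__(self, other: "SafeHtml") -> "SafeHtml":
        if not isinstance(other, SafeHtml):
            raise TypeError("can only concatenate SafeHtml with SafeHtml")
        return SafeHtml(self._encoded + other._encoded)

    def render(self) -> str:
        return self._encoded

page = SafeHtml.raw("<p>Hello, ") + SafeHtml.escape("<b>Joel</b>") + SafeHtml.raw("</p>")
```

Mixing in a plain string fails immediately instead of producing an injection bug, and every use of `raw()` is an obvious review target.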
Yes exactly, any language with zero/low-cost type definitions and natural use of custom types (they don't look second-class with, e.g., operators) would automate what Joel calls Apps Hungarian (the original Hungarian notation). In fact, good custom types would make Apps Hungarian, the useful Hungarian notation, as useless as Systems Hungarian.
This is a natural direction in programming language research: to make as much of a program's constraints and custom logic as possible automatically checkable and enforced.
I definitely thought, "man this is out-of-date" when I looked at this article. I'm a big fan of analysis and linting tools to catch what the language cannot, as well. You can always disable that stuff with a magic comment when the circumstances call for it.
I only read TFA around the code examples, so please ignore if this point is addressed.
The problem I see with this strategy is that it relies on the programmer or the reviewer paying attention. If I learned something about writing programs in C, assembly, or Forth, it is that this is not enough. Code reviews are great but are part of a "defense in depth" strategy, meaning they have a good cost/benefit ratio at the cost of reliability (the ideal solution being formal proofs, but not every shop can afford that).
Before relying on naming conventions, and when the language won't help you, there's another strategy to be tried: to not let the mistake happen. There's no general solution, though.
In this case I would mostly prohibit the use of "dangerous" functions, like this Request(). With C, teams sometimes make it so that dangerous functions such as strcpy trigger an error at compile time; same idea, and there are most certainly tools for that. Request() would then appear in only a few places, so that it would draw the attention of the reviewer.
So I would have e.g. a GetField() function that would do Encode(Request(param)) and that would be one of the few functions in the program where Request() is used.
Next, there are cases where you need both the "safe" (encoded) and "unsafe" (raw) version of the data. In this case, I would try my best to "jail" the unsafe version in a lexical scope, most certainly a function.
This way you don't let the "poison" (the unsafe string) circulate in your system, just like you perform parameter validations at the top of the function, rather than at each use; that would be the equivalent of relying on Hungarian notation, in my eyes.
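Here is a tiny sketch of that "jail" pattern (the `request` function below is a stand-in for the framework's Request(), not a real API):

```python
import html

def request(param: str) -> str:
    # Stand-in for the framework's Request(): returns raw, tainted input.
    return {"name": "<Bobby>"}[param]

def get_field(param: str) -> str:
    # One of the very few places request() is allowed to appear;
    # the raw value never escapes this function's scope.
    return html.escape(request(param))

greeting = "Hello, " + get_field("name")
```

Everything outside `get_field` only ever sees encoded data, so there is nothing for a reviewer to eyeball except this one short function.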
This post had a big influence on me back in the day. When I first read it, I think it was discussed here in the context of Go, which was relatively new -- people were complaining about its use of capitalization to indicate visibility, its lack of operator overloading, and the tedium of its error handling compared to exceptions. Joel makes a good case for all three of those design choices here.
I'd hope the article would present something that scaled better than users needing to remember what to do with u/s-prefixed variables. Response handlers needing to worry about unsafe/safe strings lowers their cohesion and hurts readability. Newtypes are an obvious solution, or a DSL that sits above the actual Response API would also be good.
Luckily, since this was written, most web frameworks use some sort of templating system which probably eliminates 99% of the ad-hoc HTML from response handlers.
On Hungarian notation:
> Somebody, somewhere, read Simonyi’s paper, where he used the word “type,” and thought he meant type, like class, like in a type system, like the type checking that the compiler does. He did not. He explained very carefully exactly what he meant by the word “type,” but it didn’t help. The damage was done.
This is the pop culture of programming: insightful advice stripped of nuance, elevated to Best Practice, and blindly repeated with the source material completely forgotten about. Said practice eventually rots away because it isn't as universal as once touted.
I've always liked this article and 17 years later it's still really relevant. It's just good advice.
It also gave me an appreciation of proper Hungarian notation, because what Joel describes makes sense. It's kind of like embedding comments in the code, and it's type agnostic: if you switch from integers to floats, a prefix like dx still tells you the value is a horizontal distance.
> Simonyi mistakenly used the word type in his paper, and generations of programmers misunderstood what he meant.
Simonyi actually distinguishes between two meanings of "type":
(1) A set of valid operations for a value
(2) The storage representation of a value
For example, he suggests the "sz" prefix for a pointer to a zero-terminated string and "st" for a pointer to a length-prefixed string. Both are represented as the same storage type (a pointer), but obviously different operations are valid on them.
Hungarian notation is a way to represent types-1 in a language which only supports types-2. It is obviously bad to mix up the two kinds of string pointers, but since the type system cannot represent the distinction, we have to use Hungarian notation and then very carefully review the code to make sure the operations conform to the type-1 type.
But the types which Simonyi encoded in the variable names is clearly what would be represented as types in modern languages.
Type checking with mypy is a good way to accomplish the "unsafe/safe" string case without having to pollute the variable names. It's a really good use of types in a language that will let you blast through them if you want.
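A minimal sketch of how that could look with `typing.NewType` (type and function names here are illustrative, not from any real codebase):

```python
from typing import NewType
import html

# mypy treats these as distinct types; at runtime both are plain str.
UnsafeStr = NewType("UnsafeStr", str)
SafeStr = NewType("SafeStr", str)

def encode(us: UnsafeStr) -> SafeStr:
    return SafeStr(html.escape(us))

def write_response(s: SafeStr) -> str:
    return s

name = UnsafeStr("<Bobby>")
body = write_response(encode(name))
# write_response(name) would be flagged by mypy as a type error,
# even though it would run fine without the checker.
```

This is exactly the "blast through if you want" property: the checks cost nothing at runtime, and an explicit `SafeStr(...)` cast is the escape hatch.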
There's something to be said for writing code that is "obviously correct". List/dict/set comprehensions are super underrated for this.
x = [str(y) for y in my_list if y > 5 and y not in other_list]
You can look at that and know it's correct. Versus:
x = []
for y in my_list:
    if y not in other_list and y > 5:
        x.append(y)
Easy enough to check in this case but it requires checking 4-5 different common mistakes (is x initialized, append vs. extend). Also I got so caught up writing the loop I forgot the point was to cast y to a str.
My main take-away is that I'm still convinced that exceptions are the best invention since the chocolate milkshake and don’t even want to hear any other opinions.
Yes. The universal complaint, among exception complainers, is "I don't know if an exception might happen here." Easy: yes, it might; code accordingly. Cleanup code goes in destructors, where it gets well exercised on every run, exception or no. If you need a type to have the destructor for, make the type.
Clean code in management's eyes is anything at all delivered COB whose customer-deleting latent bugs trigger six months after the author left to work elsewhere. But with 100% code coverage!
Almost all Joel postings are exercises in "Spot the Howler".
Joel's a very smart guy, but as for most very smart guys who know they're very smart, they are better at fooling themselves than anybody else could be. His justifications for fooling himself are always very, very good. It can be hard to identify the exact spot where his argument veers into the weeds. The road bends a bit, and he doesn't, and things get very bouncy and scrapy.
In Joel's case, it usually is right at the point where he mentions How We Did It at Microsoft. Anything he used to do when he was at Microsoft gets a free pass against re-examination to consider whether it was totally bonkers, and repeating it would leave you with, well, Microsoft code.
So when you read a Joel posting, pay attention. Usually, between lots of very reasonable stuff at the beginning, and a totally bonkers end, there is just one spot where he says something that reads a lot like the reasonable stuff before it, but fatally taints everything after. Sometimes there is more than one, but by that point it is too late anyway.
Others here have already identified it, for this posting. Actually, them, because there are more. But he has a lot of other postings. And lots of other very smart people have lots more postings, often with howlers too.
Instead of silly prefixes for "unsafe" strings, have a template language for generating HTML which has the safe interpolation built in.
make_html("<p>Hello, %1</p>", userName)
The % interpolating notation can support an operator for the rarer situation when you want to opt out of HTML escaping.
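A toy sketch of such a template function in Python (the `make_html` signature and the `Raw` opt-out marker are hypothetical, invented to match the example above):

```python
import html
import re

class Raw(str):
    """Marker type for the rare opt-out from escaping."""

def make_html(template: str, *args) -> str:
    # %1, %2, ... interpolate the arguments, escaping by default.
    def substitute(m):
        arg = args[int(m.group(1)) - 1]
        return str(arg) if isinstance(arg, Raw) else html.escape(str(arg))
    return re.sub(r"%(\d+)", substitute, template)

page = make_html("<p>Hello, %1</p>", "Joel <script>")
```

Escaping-by-default means the dangerous path, not the safe one, is the one that requires extra typing.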
This is still textual: get the string wrong and you get wrong HTML. You really want structural templating. It could use HTML syntax, but as a HTML quasiquote that is parsed.
Not to give it away, but another example is when Joel says that, to know whether dosomething() might throw an exception, you have to read all down through its call tree, you know you are off in the weeds. Obviously it might throw. Even if it doesn't today, it might tomorrow. So you don't need to guess or explore whether cleanup(), there, is wrong. You already know it's wrong. Cleanup code goes in destructors. Period. Destructors always run.
Other people posting here have noted that using our language's type system intelligently is how we make trouble fail to compile. Counting on our and our colleagues' finely-tuned sense of rightness to keep out bad code is a recipe for having Bad Code. It suffices to explain Microsoft.
> I don’t like exceptions because they are, effectively, an invisible goto
This is supposed to invoke dread because everyone knows gotos are evil, right? But if any jump to a different place in the code is a goto, then if/else/while are also gotos, and so are function calls and returns.
It is the "invisible" bit that bothers Joel. But of course nothing is invisible: there is a perfectly visible function call operator "()" right there, that he should assume an exception may bubble out of, and act accordingly.
I guess you could call exception an "invisible return", since it is not obvious that "foo()" may exit the current function. But "invisible return" sounds rather less nefarious than "invisible goto".
I like Rusts "foo()?" syntax which explicitly indicates that the function call may exit with an error.
But exceptions do invoke dread in me. Sometimes a goto (or 11) will skip an array bounds check. - This comment is mostly a joke, look at our names. But I did see this happen at least once
I'm not sure you're giving that argument all it deserves.
Control structures and function calls are a very statically well-behaved kind of control flow: unlike with gotos, when you see a jump you know exactly where its destination is (runtime polymorphism for functions/methods is... an exception, and that's why some people don't like it), whereas a goto's destination might be a dynamic address. The return is the only modern goto with a dynamic destination, but even then the destination is not completely arbitrary (stack discipline), and actually the destination is irrelevant if the function is a well-named, non-leaky abstraction: the function simply returns to its "caller", wherever that may be.
Furthermore, traditional control flow is Structured, it "owns" its jump labels. Nobody can jump to an else clause of a conditional except that conditional, nobody can jump to a loop start except the code immediately before it or the loop end (breaks and continues are, again, limited exceptions to this, and that's why some people don't like them), only the Returns of a function can jump back to the callsite of that function. As utterly trivial and basic as that sounds, it's a marvel. Gotos are nothing like this, they jump to public labels, anybody can jump to the label, it's a communism of control flow.
Exceptions break all of that. You don't statically know what the destination of a throw is, and unlike a return, the destination is not irrelevant: the program is disrupted, and you really need to know where that throw is going to end up. Throws don't own their catches either; any throw can jump to the same catch.
In a very real sense, Exceptions are more dangerous gotos than ifs and whiles.
Exceptions are like returns: they go back up the call stack. You can't statically determine where a return will end up either, since it depends on where the function was called.
> unlike a Return, the destination is not irrelevant, the program is disrupted, you really need to know where that Throw is going to end up.
I don't follow your reasoning here. A function should not know where it returns to and a throw should not know where the exceptions will be caught. That is the point of those constructs. If you know exactly at the point of throwing how you want the exception to be handled, then you wouldn't need to throw an exception in the first place.
Yes exactly, exceptions are much, much nastier returns. Returns only go up exactly one frame, and the call/return pairs are balanced no matter what. Exceptions can unwind the entire stack, and the throw/catch pairs aren't balanced automatically. The returns of a single function own their call sites: nobody jumps back to a call site except them. But a single catch accepts every throw from arbitrarily deep in the stack.
Exceptions are an entire call/return system hidden beneath the vanilla call/return system, with less thought in the design and more foot guns.
>I don't follow your reasoning here
Okay forget it, it was a somewhat subjective point, my main point is as above, exceptions are extremely spicy returns, they interact dangerously with the regular call/return system and scoping. That's essentially all of what objections to exceptions boil down to, and it's a legitimate reason to hate and avoid them.
Exceptions interact exactly deterministically with the call/return system. There is no scope for surprises.
In particular, destructors run, absolutely reliably. Rely on them. Using modern C++, you rarely need to code your own destructor; usually the language provides one that is guaranteed to be correct.
> the call/return pairs are balanced no matter what.
I don't get that. A function can have multiple returns and a return can return to multiple different places depending on where the function was called from. I don't see how they are "paired" or why that would even be a desirable property?
Ultimately all arguments against exceptions, as against any modern feature, come down to "don't make me think about how I code". The actual objections expressed are not meant to be examined closely.
It is funny that people using Rust would code almost identically if the language had exceptions. Its lack might end up being what keeps Rust's performance from catching up to and passing C++'s. The compiler might come to quietly use exception machinery for most call chains, as an optimization. That seems hard, but a 15% performance boost is worth a lot of work.
That's a very cliche and unnuanced point of view. If you like thinking about your code, you have assembly; I think x86 is a very nice flavor for you, with all its 40+ years of special cases. There are Turing machines, for really testosterone-heavy programmers; there you don't even have registers, registers are for weaklings, am I right? I think you will like them very much: just an FSM and an infinite tape, the sky is your limit, so much thinking.
If you want even more kickass bragging rights, you can program in the lambda calculus, where every single thing you try to do with the language will force you to think about it. More thinking! Yaaay?
Generally speaking, yes, the entirety of good design (in general, in all walks of life) and modern PL research comes down to "don't make me think about how I code", because good programmers don't think about the code, they think about the problem the code is trying to solve. As Alfred Whitehead (a mathematician, so very much a fan of thinking) put it: "Civilization advances by extending the number of important operations which we can perform without thinking about them." So it is with programming language research: it advances by letting you forget, as much as possible, that you're even programming a computer. A good language fades into the background; ideally you're not even programming, you're simply stating the problem to be solved.
This is hampered by leaky and unreliable abstractions like exceptions. They are a step backward in error handling: no static guarantees, no static type checking, awful synchronization between use sites.
I understand how exceptions work, apparently much better than you, if your other comment is anything to judge by.
If you're insecure enough that my different opinions about what good code is and how it should be written leads you to claim that I don't understand things without evidence, remove yourself from the conversation right now. It's much better for you.
Call/Return Pairs are balanced at runtime, not lexically. From all possible returns in a function, only a single one runs, and terminates the whole function. Every call site is paired, at runtime, with a single return that will return control to it, and the entire set of candidate returns are known at compile time, and the compiler checks all of this to report dumb mistakes.
This is not how Exceptions work, you can throw more than you can catch, and you can catch more than what can ever be thrown (this is generally harmless, just garbage code that never runs). Neither the number nor the types of throw/catch sites are kept in sync statically.
>a return can return to multiple different places depending on where the function was called from
Every single call site of a function is paired with the returns of that function. The returns of a different function are paired with the call sites of that function, not the first. Unlike exceptions, where every catch is a jump destination for every single throw of the same (dynamic) type across every single function beneath the catch on the call stack. That's a vastly larger set than the set of all the returns of a single function.
>why that would even be a desirable property
It's desirable because it gives static guarantees.
Given a throw, you can't even know how much of the stack it will unwind; it can potentially unwind the entire stack if no compatible catch is above it on the call stack.* Exceptions can break through their abstraction boundaries and unwind the stack of a calling component that doesn't know about them. Exceptions are fundamentally dynamically typed returns that you can forget about. That's as big a footgun as you can get without deliberately being a troll.
Given a catch, you can't even know all the places control can jump to that catch from (unlike a call site). Oh, there is the trivial answer, of course: every single function called in the try{...}, and every single function those functions call, etc. Again, an intractable thing to reason about. So people don't, but they have to: the throw/catch pair is a single logical feature, and its uses must be reasoned about and kept in sync in assumptions and consequences.
None of this is how a return works. If Structured Programming advocates saw Exceptions, they would be horrified. Even C's gotos don't allow you to jump out of a function.
* : (which, of course, is a dynamic property: in "if (...) { try { Do_Something(); } catch (...) {...} } else {...}", only one branch will actually catch; the other branch can crash the program, and none of this is visible to the compiler at write time)
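To make the "a catch is a destination for any throw beneath it" claim concrete, a minimal Java sketch (all names invented):

```java
// A catch block is a potential jump target for every throw anywhere
// in the call tree under its try block.
public class DeepThrow {
    static void leafA() { /* might throw, might not */ }
    static void leafB() { throw new RuntimeException("from leafB"); }
    static void mid()   { leafA(); leafB(); }

    public static String run() {
        try {
            mid(); // the catch below is a candidate destination for every
                   // throw in mid(), leafA(), leafB(), and anything they call
        } catch (RuntimeException e) {
            return e.getMessage();
        }
        return "no throw";
    }

    public static void main(String[] args) {
        System.out.println(run()); // from leafB
    }
}
```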
> Given a throw, you can't even know how much of the stack it would unwind
No, that is the entire point of exceptions. They unwind the stack up until the level of a matching catch block, if any. Why would you want to know how many stack levels a throw will unwind? When you call a function you don't know how many functions it will call in turn, and when you return a value you don't know if the caller will return it another level up. You shouldn't want to, since this would create an undesired tight coupling that would make the program hard to modify. It's a feature, not a bug.
> If Structured Programming advocates saw Exceptions, they would be horrified.
I'm sure they would, just as they would be horrified by having multiple returns in the same function, never mind horrors like "break" and "continue". The dogma was "single entry, single exit". Exceptions do adhere to the "only lexical blocks and function calls" paradigm, but throwing is a form of multiple return, so it breaks with the dogmatic structured paradigm. But I prefer simple and maintainable code to blind dogma.
> Even C's gotos don't allow you to jump out of a function.
Goto actually has its uses in C, for example for resource cleanup. I think "goto considered harmful" has become a mantra where people use the words but forget the underlying reasoning. Dijkstra's problem with goto was not that it jumped to a different place in the program; his problem was that it didn't adhere to lexical structure and the call stack. Exceptions actually allow resource cleanup in a structured way (using "finally" blocks).
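A minimal sketch of that structured-cleanup point in Java (names invented), with `finally` playing the role of C's goto-to-cleanup-label:

```java
// try/finally as the structured counterpart of C's goto-based cleanup:
// the release step runs no matter how the block is exited.
public class FinallyCleanup {
    public static String run() {
        StringBuilder log = new StringBuilder();
        try {
            try {
                log.append("acquire;");
                throw new RuntimeException("fail mid-way");
            } finally {
                log.append("release;"); // always runs, throw or no throw
            }
        } catch (RuntimeException e) {
            log.append("caught:" + e.getMessage());
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // acquire;release;caught:fail mid-way
    }
}
```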
Which is why they're bad design. The comparison to function calls is misleading :
- Function calls are named: given a function name you can always go to the function definition and see for yourself how many functions it calls on all return paths. You don't have to, but you can, and this is crucial in many situations, not least debugging (e.g. "oh no, the program ran into an infinite loop; what was the last function called? I'll go look at its definition").
- Function calls can be statically type-checked, given a function that returns X type, the compiler will ensure that every reference that holds its return result is X or one of its subtypes.
But in exceptions :
- Catches (the equivalent of calls) are unnamed; the set of places a catch can catch from grows exponentially with the depth of the call tree under its try{...} block. For example, to debug from a catch, you must recursively inspect every single function call in the try{...} block to know which function threw the exception. And no, the 50-line stack traces don't make things any better; if anything they just rub in how many functions you need to inspect to find where the problem came from.
- Throws are not statically type-checked: given a function that throws E1, the compiler will not ensure that it is called in contexts where E1 is caught; you're completely on your own. It is this that makes exceptions free program-crashing coupons. Only Java tried to solve this, as far as I know, and the solution is still full of holes.
>I'm sure they would, just as they would be horrified by having multiple returns in the same function - never mind horrors like "break" and "continue".
You say this tongue-in-cheek, but as a matter of fact plenty of imperative languages and style guides restrict "break" and "continue" for exactly these reasons (they are a huge source of loop bugs), and many style guides have problems with multiple returns and regulate them heavily if not banning them outright.
Regardless, the comparison is disingenuous; I just spent ~1000 words arguing why exceptions are so utterly and fundamentally different from ordinary control flow. No return statement unwinds the stack arbitrarily, possibly to its very end, with absolutely zero static guarantees.
If you think this is okay, we won't reach any useful consensus, and that's okay. But your original claim that Joel engages in faulty reasoning when he rejects exceptions as a form of goto is wrong; there is very good reasoning why exceptions are actually a much more dangerous goto than most modern gotos, and plenty of much smarter people than you or me (or Joel) follow and accept this reasoning. Those people include those responsible for some of the most safety- and/or performance-critical software, like games and operating systems.
>Dijkstras problem with goto was not that they jumped to a different place in the progam,
I read Dijkstra's paper, and I'm pretty sure his problem was in fact exactly that. Unrestricted gotos make control flow an arbitrarily complex graph; the fundamental reason is that jump labels are public, and exceptions share that property in a way nothing else does: their catches are public to every single throw beneath them on the call stack. Throws are gotos that search for their labels at runtime, possibly failing, crashing the program, or unwinding it beyond recognition.
There are certainly valid criticisms of exceptions. But the original claim was that exceptions are like gotos, and that is just not true. A goto is a jump to a statically defined location (a label or line number) in the code, independent of any lexical structure or call stack. Exceptions are just not like that at all.
If it is any consolation, you have to be pretty intelligent to tie yourself into such knots of falsehood. Getting yourself out of such knots takes more than other people need. Take a hint.
It is not often I encounter so many cases of falsehood and wholesale confusion in one posting.
First one call matches exactly one return. Then the same call matches a bunch of returns. Which is it?
Then, a catch catches innumerable throws? When I throw an exception, it is only ever caught exactly once. One throw, one catch. Unwinding an unknown number of call stack frames, on the way, is the whole point of exceptions. You don't get to count that as a flaw.
Exceptions are not, in fact, dynamically typed. Each one thrown has exactly one type, its whole life long. All throws from that place will have exactly that type, and none other. Any catch block that doesn't lexically match that type is skipped over.
Look up "exponentially". It does not mean what you seem to imagine -- if in fact you can be said to mean anything at all.
Spilling out your entire mess of incomprehension does not only make you look silly, it is also rude.
When I'm done explaining how it's you, actually, who is the one who doesn't understand basic programming language theory and terms, and how you keep embarrassing yourself with hilariously false objections, you will probably need to apologize to me for this embarrassing and childish temper tantrum of yours.
>First one call matches exactly one return. Then the same call matches a bunch of returns. Which is it?
There is usually a very basic distinction in Programming between 2 things : the runtime and the compile time. At compile time, the expression 5/0 is a perfectly valid integer expression, it returns an integer that can be used in all subsequent calculations. At runtime, it's a hardware trap, and the entire program will probably abort because of it. Integer or hardware trap? Which one is it? Both.
At compile time, a call site has a finite (and obvious) set of returns it's associated with, usually a relatively small set. At runtime, however, a single return out of this set ends up returning control to the call site. So a call site is associated at compile time with a small set of very obvious returns, and at runtime with a single return drawn from this set. Anyone who doesn't understand this probably hasn't programmed long enough to have a worthwhile opinion.
>Then, a catch catches innumerable throws?
Yes indeed. Every single type-compatible exception that can possibly be thrown on the call stack beneath a catch is catchable by it. This is, contrary to what you imagine, an exponential set, and I do really mean exponential:
If 2 functions are called in a try{} block, and each of those 2 functions calls 2 more functions, and each of those calls another 2, then the set of possible throws catchable by your original catch is all the throws in all of those functions, a set that grows as 2^depth. Catch candidate sets are exponential in the depth of the call tree under the try{} block they guard. Unlike an ordinary call site, which can resume control from a very small set of known locations, a catch can resume control from a huge and intractable set of jumps.
>it is only ever caught exactly once
Actually, it's at most once; an uncaught exception is never caught at all, it just terminates the program.
>You don't get to count that as a flaw
I do and I did. Exceptions are not some family members of yours to be this defensive and upset because I criticized them. It's okay, people can hate things that you (unhealthily) love, learn to deal with it. It gets better.
>Exceptions are not, in fact, dynamically typed.
This was a subtle point in my comment, so of course you didn't understand it and went on to angry ranting as usual. Here's a more detailed explanation of what I meant by that:
When you call a function foo() and assign the result to an int, the compiler of any decent language will ensure that every single return in foo returns an int. Alternatively and equivalently, if you declare foo() as returning int, every variable that holds the result of foo must be an int, or the compiler complains.
This doesn't happen in exceptions.
In exceptions, you can throw a FileNotFoundException in a function that only catches NetworkTimeoutException, and the compiler won't complain. You can then call that function from a function that only catches DictKeyNotFoundException, and the compiler won't complain about the missing catch for FileNotFoundException. You can keep doing this forever, always wrapping the throwing function in incompatible catches, and the compiler will never once complain.
It's in this sense that throwing an exception is like a dynamically typed return: it throws an object, i.e. returns data, but the compiler never bothers to check for a matching reception of that object in the calling code, as it does for ordinary returns.
Java tried to solve this with its checked exceptions, but it also introduced unchecked exceptions that anybody can use, which greatly reduced the benefit and use of checked exceptions.
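A small Java sketch of that split (both exception classes are invented): the checked one forces a `throws` declaration on pain of a compile error, while the unchecked one crosses any number of frames with no compiler involvement.

```java
public class CheckedVsUnchecked {
    static class ParseFailure extends Exception {}         // checked
    static class ConfigMissing extends RuntimeException {} // unchecked

    // Checked: this does not compile without the throws clause.
    static void mustDeclare() throws ParseFailure { throw new ParseFailure(); }

    // Unchecked: no declaration and no catch required anywhere on the path.
    static void silent() { throw new ConfigMissing(); }

    public static String run() {
        try {
            silent(); // compiles fine even if no caller ever catches ConfigMissing
            return "no throw";
        } catch (ConfigMissing e) {
            return "caught only because we happened to choose to";
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```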
>Look up "exponentially"
Try thinking about something for 5 minutes before you hit the keyboard. It will pay off greatly.
>Any catch block that doesn't lexically match that type is skipped over.
Look up "lexically". It does not mean what you seem to imagine.
A "lexical" thing, in programming language theory, is one that can be inferred from the program text alone, without running it. Catches don't work lexically because they respect subtyping (i.e. inheritance), and with subtyping comes runtime polymorphism.
Specifically, if
- Cat and Dog are both subtypes of Animal, an abstract class/interface, and
- I'm catching a Cat,
Then if I throw an exception through a reference of static type Animal, the question "will my catch catch that exception?" can't be resolved lexically, from the program text alone. The catch fires if the object behind the Animal reference is a Cat; it doesn't if it's a Dog. Seriously, go run it and tell me if I'm wrong.
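That claim is easy to check. A minimal, runnable Java version of the Cat/Dog scenario (class names as in the comment above):

```java
public class SubtypeCatch {
    static class Animal extends Exception {}
    static class Cat extends Animal {}
    static class Dog extends Animal {}

    // Whether catch (Cat c) fires depends on the runtime type behind the
    // Animal reference, not on anything visible in the program text here.
    static String tryCatchCat(Animal toThrow) {
        try {
            throw toThrow;
        } catch (Cat c) {
            return "caught";
        } catch (Animal a) {
            return "not caught by Cat handler";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCatchCat(new Cat())); // caught
        System.out.println(tryCatchCat(new Dog())); // not caught by Cat handler
    }
}
```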
Again, that is allowed. But lecturing in such a condition is foolish. Worse, it wastes the time of any readers who might look for sense in what you posted, which is rude.
"If Structured Programming advocates saw Exceptions being used for regular logical flow, they would be horrified".
Agreed. Unfortunately Java and in some cases C# tend to encourage this because of the behaviour of library functions, e.g. those that parse numbers and/or dates (a string not being in a parseable format rarely justifies being considered an exception, at least at a library level).
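For instance, the JDK's `Integer.parseInt` reports a malformed string only by throwing `NumberFormatException`, which pushes callers into exception-as-control-flow (the helper name below is invented):

```java
public class ParseFlow {
    // The only stock way to ask "is this an int?" is to try and catch.
    static boolean looksLikeInt(String s) {
        try {
            Integer.parseInt(s);
            return true;
        } catch (NumberFormatException e) {
            return false; // a perfectly ordinary outcome, handled as an "exception"
        }
    }

    public static void main(String[] args) {
        System.out.println(looksLikeInt("42"));    // true
        System.out.println(looksLikeInt("forty")); // false
    }
}
```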
On the other hand, as an advocate of structured programming, I find code that's full of ifs, multiple return statements, and global error-state variables to handle truly exceptional conditions, where code is unable to carry out its primary function, to be horrifying.
There is no distinction between "regular" and "error" control flow when it comes to reasoning and understanding, they are both control flow, they both need to be planned and synchronized in the programmer's head.
The argument I gave is for why throw/catch is a terrible control flow construct: a throw is an extremely dynamic goto that searches the call stack for its label (not guaranteed to be there, by the way), and a catch is an extremely dynamic call that can resume control from any point beneath it on the call stack. I don't know about you, but that sounds to me like a disaster waiting to happen. Java tried to tame it by incorporating exceptions into the type system, but the result is half-baked and comes with a big glaring hole called unchecked exceptions.
Errors or no errors, a bad control flow construct is a bad control flow construct. Error control flow deserves the same treatment as regular control flow.
>I find code that's full of ifs and multiple return statements and global error state variables to handle truly exceptional conditions
Two wrongs don't make a right. Ad-hoc error handling is awful, but so is ad-hoc exception handling. There are plenty of error handling mechanisms (either built into languages or as an architecture over boring code) that are neither.
Code I've worked with over the last 25 years that doesn't use exceptions almost invariably has worse error handling than code that does. YMMV.
(As far as regular vs exceptional flow goes, think about what sort of acceptance criteria are written into story requirements. I've never seen the expected behaviour for out of memory or even critical network failures written up as part of a user story. In many cases it's decided for you by the libraries used anyway)
You manifestly don't care what the destination of an exception is. Aggressively so.
The only choice to make is: here, or not here. If here is at a high enough level to do something intelligent about it, catch it. If not, it is not your problem: there is nothing to be done, so you do that.
Generally, a good program will have very few places where exceptions are caught, and the code there is easily exercised and well tested. The destructors that run on the way there are also well exercised because they run with or without exceptions. Beware code that only ever runs in exceptional cases. It grows bugs when what was correct becomes wrong as the code changes around it.
For some of these reasons, I prefer to write this:
if (condition) {
    throw ...;
} else {
    <rest of code>
}
Rather than:
if (condition) {
    throw ...;
}
<rest of code>
The justification, which I found in this presentation (https://youtu.be/SFv8Wm2HdNM) is that if you only looked at the block structures, could you tell that the rest of the code might never run?
The counter-argument is that you can get arbitrarily deeply nested code that way. The second style uses what is known as guard clauses. The way to think about it is that it establishes a precondition that the rest of the code can then assume is met. It is similar to an assert statement, or to precondition checks in a design-by-contract language. You don’t want to indent the regular-case code just because it is preceded by a guard. If you have multiple such conditions, it’s like going through a checklist of what you have to check before continuing with the actual operation.
Moreover, if you have exceptions, any line of code containing a function call can potentially throw, so that the subsequent lines at the same level are not executed. (So you can’t see from the block structure if code is always executed anyway, when the language uses exceptions.) This is as if you’d factored out the guard clause into a separate function that you invoke at the location where you previously had the guard clause. In the if-then-else alternative, you can’t factor out the if-then separately from the else. The guard-clause style thus also affords more flexibility regarding code organization.
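A minimal sketch of the guard-clause style being discussed (the method and its checks are invented):

```java
public class GuardClauses {
    // Guard-clause style: each early return/throw establishes a precondition,
    // and the main logic stays at one indentation level.
    static String describe(int[] xs) {
        if (xs == null) throw new IllegalArgumentException("null array");
        if (xs.length == 0) return "empty";

        // From here on we may assume: xs != null and xs.length > 0.
        int sum = 0;
        for (int x : xs) sum += x;
        return "sum=" + sum;
    }

    public static void main(String[] args) {
        System.out.println(describe(new int[] {1, 2, 3})); // sum=6
        System.out.println(describe(new int[] {}));        // empty
    }
}
```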
I came across the awkwardness this introduces to guard clauses a while back, and found that it could act as a smell. Instead of drip-feeding the validation, do it all at once if you can.
I do see what you mean with your second paragraph, though. It's a good argument against.
Doing it all at once suggests factoring out all checks into a separate function, in which case it also makes sense to throw from that function, in order to give good exception messages depending on the specific error, and possibly including associated data in the exception. Which means that at the call site you won’t have any indented block structure differentiating the validation call from the rest of the function.
When programming with exceptions, it is the common case that a function consists of a sequence of steps each of which can fail and thus abort the function via an exception. Simple validation steps are not a special case in that respect.
Of course the same argument applies to return statements. And breaks.
Some of us use a convention of a blank line after a block that ends with return, throw, break, or continue, to save on indentation.
Of course that can help only if you are not splattering blank lines everywhere, communicating only that you think the reader doesn't deserve to see much of your code at once.