
That's all fine and dandy, but conceptually it's wrong to say the average of an empty array is 0, and it can and will lead to wrong results in a variety of cases. I'm sure you can think of a lot of these cases yourself. I think in the history of computer science we programmers have found that there are a lot of convenience shortcuts that make sense in many cases but bite our asses in others. Implicit is fast and fun, but it's nice to have your seatbelt on when the car crashes. Going back to the average case: if you want an average function that returns 0 on empty arrays, fine. But that's not the average function, and you shouldn't call it that. Names matter; call it averageOrZero or something like that.
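A minimal sketch of the distinction (hypothetical TypeScript; the name averageOrZero is just the one suggested above, not any library's actual API):

    // Strict average: refuses empty input instead of silently returning 0.
    function average(xs: number[]): number {
      if (xs.length === 0) {
        throw new Error("average of an empty array is undefined");
      }
      return xs.reduce((sum, x) => sum + x, 0) / xs.length;
    }

    // Explicitly named variant for callers that want the zero convention.
    function averageOrZero(xs: number[]): number {
      return xs.length === 0 ? 0 : average(xs);
    }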


Why is it conceptually wrong to say the average of an empty array is zero? My undergrad degree is in pure math and my grad degree is in mathematical statistics, and I’ve never heard the idea that defining the mean of an empty array as zero is “conceptually wrong.”

You bring up the history of CS, but even there you have debates about what convention to use when defining 1/0, e.g. for function totality in theorem provers.

There’s no aspect of pure math derivation of number systems on up through vector spaces that definitively makes a zero mean for an empty array ill-defined. Whatever choice you make, positive infinity, undefined, 0, or any finite values, etc., any such choice is purely down to convention that depends precisely on your use case.


> Why is it conceptually wrong to say the average of an empty array is zero?

It’s not conceptually wrong; it just means the “mean” you’re referring to calculates a different value than the “mean” we’re taught in school. So the underlying assumptions about this different “mean” should be communicated wherever it’s used.


Sure, I agree they should be communicated. Like, in the docs for “standard” mean functions, and not pushed into “specialized” mean functions, since needing this particular convention is not remotely special, and is rudimentary and expected in 99% of linear algebra and data analytics work, which are the largest drivers of these types of statistical functions.


And it's not like any other language you can choose from for working in the browser is perfect.


The irrationality studied by Dan Ariely et al. is not comparable _at all_ with Freudian irrationality. The former brings to the forefront our faulty heuristics and biases. The latter is about how culture and our psyche actively make us ignorant of parts of ourselves that are inconvenient, and that can come out in bursts of irrational, unstable, "crazy" behavior.


You're right that they are very different takes on irrationality. Whether they are comparable "at all" is obviously just a matter of context. If I'm interested in research on loss aversion, looking at Freud and Ariely together would be silly. If I'm interested in the history of ideas (which my point was addressing), then looking at them alongside each other makes perfect sense.

You're saying "apples vs oranges" and I'm saying, "sure, but I'm talking about fruits."


But maybe bringing up fruits is irrelevant in some contexts, depending on what we're talking about. This is one of those contexts. The very same phrase you quoted from the article clarifies that it's not talking about "perfect" rationality.

In the history of psychology/sociology, Freud and Ariely don't make sense with each other.


Agree with the attention to details. Details are crucial.

But in my experience there is a kind of so-called "perfectionist" that won't work well in complex environments where:

1. they actually have to weigh the importance of details (and the cost of dealing with them)

2. details aren't always obvious and are easy to miss

So this kind of perfectionist might end up very focused on obvious details that don't matter that much and aren't worth taking care of in terms of cost-benefit, while ignoring not-so-obvious details that might actually matter more.


> If you need to 'chill' with the original researcher to get critical details, that's not science anymore.

Certainly there is always the possibility that some detail was misunderstood, that something needs to be clarified, that there's a printing error, etc. Your "that's not science anymore" statement seems highly exaggerated. People are not supposed to communicate only via papers.


Sure, it's inevitable that, in some cases, eventually someone will have to go back to check notes because some factor mattered when no one thought to write it down.

But there's a difference between that and "we expect that someone will have to make personal contact with the original researchers in order to replicate it".

If you're explaining away replication failures by such non-contact (as the quote is), that's confirmation of a problem (in keeping with the standards of science), not a vindication of the results.

There's an additional danger of making it so that "you can't replicate until you have social contact with the original researchers". That way lies favoritism: it's harder to criticize someone as you get closer to them socially, and they can withhold the capability of criticism by not engaging the critics.


Here's an example from my field, which I think is informative because it gives a concrete idea of how this plays out in a public arena.

tl;dr for this wall of text: 1) Authors A describe an algorithm, 2) Author B publishes a counter-example to show where #1 fails, 3) Authors A say it wasn't wrong, but that the author of #2 "misunderstood" and that B should have contacted them first; and in any case, here are the missing details, 4) Author B points out that paper #1 should have said those details were missing, 5) Authors C point out that Authors A misunderstood many things in their own publications; Authors A can't complain about others not contacting them first when they don't do it themselves.

"Canonical Numbering and Constitutional Symmetry" (1977), DOI: 10.1021/ci60010a014 describes an algorithm.

"Erroneous Claims Concerning the Perception of Topological Symmetry" (1978), DOI: 10.1021/ci60014a015 points out examples where the algorithm from the first paper, and from another paper, don't work.

The authors of the first paper followup with "On the Misinterpretation of Our Algorithm for the Perception of Constitutional Symmetry" (1979), DOI: 10.1021/ci60017a012 .

> A recent paper in this journal contained critical comments on two methods for the perception of topological symmetry. Carhart’s claim that our algorithm does not correctly perceive topological symmetry and fails with certain structures is the result of a misinterpretation of our algorithm.

> Unfortunately, the author did not contact us directly to help him clarify his misunderstanding. This failure is unusual and difficult to understand. Thus, it was not until we received the recent issue of this journal that we learned of this misinterpretation.

> In our paper we were particularly aiming at catching the interest of the organic chemist for the problems of uniquely numbering the atoms of a molecule. Therefore, we put particular emphasis on the criteria for determining priorities among atoms to enable the chemist to manually number the atoms of molecules according to our procedure. We restrained from giving all small details of the algorithm to keep the paper concise, working under the assumption that persons interested in the details would contact us directly. It is astonishing that Carhart at the point where we did not fully elaborate on the details works with the premise that we misconceived the problem. Initially one should rather assume that other people, too, understand a problem. Only if explicit errors are found should one digress from this conviction.

Carhart followed up with a letter to the editor, "Perception of Topological Symmetry" (1979) DOI: 10.1021/ci60017a600 :

> I am delighted to see that my critique appearing in this Journal has encouraged C. Jochum and J. Gasteiger to present previously unreported steps in their algorithm for the canonical numbering of chemical graphs. They refer to these steps as “small details”, but in fact they are the very essence of any routine which reliably finds unique numberings for, ...

> However, I did not misunderstand their previous article (unless lack of clairvoyance can be classed as misunderstanding); I simply took it at face value. My critical comments, and the counterexamples I presented, were completely appropriate in the context of that article. In contrast with their latest offering, Jochum and Gasteiger’s previous paper did not present a sound and accurate definition of constitutional symmetry, nor did it indicate in any way that crucial steps had been omitted. I am sympathetic with the problems of describing a complex algorithm in the limited space of a journal article, but if space limits the development of a fundamental concept, it is the responsibility of the author to say so, and to indicate that a reader must obtain additional information before he tries to implement the described procedure.

It ended with a letter from still other people writing another letter to the editor, "Canonical Numbering" (1979), DOI: 10.1021/ci60019a600 :

> We have been following with some interest the controversy appearing in this Journal regarding canonical numbering and various types of ... The first article by Jochum and Gasteiger contains a number of incorrect and misleading statements about both their work and the work of those who preceded them. ...

> Jochum and Gasteiger also strongly implied that they had a “simple” algorithm which gave complete partitioning, eliminating the need for a comparison step. Carhart correctly pointed out that this was not the case. Subsequent publication of the details of Jochum and Gasteiger’s ... indicated that it does contain a comparison step ...

> On a more general level Jochum and Gasteiger complain that Carhart did not contact them “directly to help him clarify his misunderstanding”. Yet it is obvious from the large number of misinterpretations and/or misrepresentations which appear in their work that they made no attempt to clarify their misunderstandings by discussing such matters with the original authors. Publishing last on a particular subject accords one considerable power, power that carries with it the responsibility to treat the preceding work with fairness and objectivity.


> People are not supposed to communicate only via papers.

A paper and its supplementary materials are supposed to be enough to reproduce the experiment. In practice, this often fails, but that is a fault in the scientific process. Science isn't just about empirical knowledge, it's about public and redundant empirical knowledge, as opposed to losing important knowledge of the natural world when the original investigator gets hit by a bus.


Wouldn't those problems in the scientific process get corrected more easily if you contacted the original author to see if there are any details that were missed and then publish those details with your results instead of just publishing a paper that says "Nope, couldn't reproduce"?


No, you publish the results that you cannot reproduce.

Then maybe the next generation of researchers documents their work better.

Or maybe the original researcher publishes a v2 edition of their paper.


> People are not supposed to communicate only via papers.

That we use written communication that persists through generations is the basis of science and society in general. If we cannot communicate sufficiently via papers, we're in a world of trouble.


I used to hold this opinion, but my experience with academic research changed my mind. Much of the scientific knowledge we have is passed from generation to generation by mentoring. The amount of knowledge is so vast, and our means of searching the written literature for relevant facts so poor, that when I want to learn something or solve a specific problem there is no substitute for a discussion with an expert in the field.

The core problem is that human communication is very difficult. It becomes even more difficult when we try to communicate ideas without interaction, as we do when writing a book and expect someone to read and understand it. If I read a paper and I can't understand a sentence, it might take me days to figure out what's going on by myself, whereas asking an expert might yield an answer in less than an hour (sometimes minutes). The difference is really orders of magnitude.

There are whole fields that have effectively died because no one works on them any more. That knowledge doesn't live in anyone's mind. All the literature is there, but actually acquiring that knowledge by reading the literature is incredibly challenging and time consuming.

I have come to believe that the main purpose of hiring scientists in academia is to keep knowledge alive and have it passed on to future generations. Advancing research is of secondary importance. In fact I would say that most new research I see probably has no intrinsic value. I include my own research in this category. We have researchers solving esoteric problems of no value to anyone besides their own personal entertainment. Except, working on such research keeps our neurons firing and keeps knowledge alive. It is a well known phenomenon that taking a break from research very quickly leads to a sort of decay of memory. Our learned ideas and the connections between them wither away without constant reinforcement. In order to keep knowledge alive we have to engage in research, even if it seems pointless.


> I have come to believe that the main purpose of hiring scientists in academia is to keep knowledge alive and have it passed on to future generations.

Then these scientists should be devoted to producing textbooks and courses which can then be taught to non-research students. Yes, all knowledge above the scale of what a single individual knows (and keeps on their shelves, hard drives, etc.) is embodied in communities and traditions, but we still get far greater redundancy of that knowledge from teaching it as undergraduate or master's-level coursework than from passing it down only via research mentoring.

If 25% of the population gets an undergraduate degree, 11% or so gets a postgraduate degree, and only about 1.7% get a PhD, then we need to be embodying society's knowledge among the larger cohorts for that knowledge to survive. We can't afford to live in a world where only 1.7% know how things work.


> Then these scientists should be devoted to producing textbooks and courses ...

Textbooks and courses exist for everything but the most cutting-edge stuff (which is still in flux anyway), but they are a very inefficient way of transferring knowledge. I would say they are practically useless without expert guidance. At the most basic level, there are so many of them that an expert has to tell you which ones are both good and relevant to what you want to learn. I once saw a student waste months of his life studying a book he thought was relevant, only to discover that the book wasn't building towards the sort of knowledge he needed in that subject. The book was about the correct subject, but was focused on somewhat different aspects than the ones he was interested in. There was no way for him to know this in advance without guidance.

So we don't know how to organize existing books. Also, even the books that exist are usually pretty bad at conveying knowledge. Or perhaps humans are just pretty bad at learning things from books. Either way, no one knows how to write textbooks and courses that are much better than what we have today. I really don't know of a better way to preserve knowledge than the current one. Perhaps technology can improve the situation by making access to knowledge more interactive. But I suspect this would require a real breakthrough.

> We can't afford to live in a world where only 1.7% know how things work.

Why not?


I have a concrete counterexample. Let's say I write a paper presenting a model, plus some numerical results of large simulations. The code is based on gluing together various pieces of open source code. All these codes are typical scientist codes that are held together with duct tape. My paper is short, but I spent a lot of effort munging things together, and I'm fairly certain nobody can reproduce my results without my source code (preferably the whole environment) unless they spend a lot of time on trial and error like I did.

The tweaks I did to glue things together have no theoretical value and don't belong in the paper. As a practical matter, I can't fit a lot of source code into a short paper format.

What do?


Open source the code and supporting data.


It's not that simple. What if some of it is proprietary? What if I'm not allowed to submit code because I need to be anonymous so reviewers can maintain impartiality? What happens when one of the upstreams updates and breaks my code? Do I need to keep it updated? Forever?


At my institute at least, scientists are required to maintain everything that is necessary to reproduce a result for at least ten years. That includes all the data and the software used to produce the results. It's not an easy job, but it's important.


If your institute also mandates making the data/software publicly available, then that's definitely the exception rather than the rule. It also must be hideously expensive.

It almost never happens that a paper I read actually comes with usable source code.


Then your results are not reproducible and your conclusion is suspect.


A lot of thought has gone into such questions. For example, see the guidelines at https://www.epsrc.ac.uk/about/standards/researchdata/


> Let's say I write a paper presenting a model, plus some numerical results of large simulations. The code is based on gluing together various pieces of open source code. All these codes are typical scientist codes that are held together with duct tape. My paper is short, but I spent a lot of effort munging things together, and I'm fairly certain nobody can reproduce my results without my source code (preferably the whole environment) unless they spend a lot of time on trial and error like I did.

Then leave out the results since they are just an anecdote. If you want to include experimental results then it has to be done in a scientific fashion.


The usefulness of a paper that doesn't stand on its own is rather limited, though.


All papers should start with a dictionary? No, clearly not, so there's always going to be some assumed knowledge - words change their meaning and mean different things to different people, so we're already onto a loser just with the medium we're using.

So, the possibility of things like, say, a researcher not mentioning something that is standard practice in their lab that later is found to be a crucial part of the setup for an experiment seems high. But just like you don't want to provide a dictionary of standard terms with a paper you don't want to provide a list of the chemicals used to mop the floor, or a list of the lumen and colour temperature ratings of lights in the fume cupboards, or ...

IMO if a paper is not reproducible then yes it should be published but also the original team producing the paper should be challenged to reproduce the results. It's not a fight, we're all on the same team - work with them and try to find the reason for the lack of reproducibility.


> So, the possibility of things like, say, a researcher not mentioning something that is standard practice in their lab

I'd suggest a different formulation: "standard practice in their field"

Standard practice in general cooking? That's ok. Standard practice in my kitchen? That's a problem.

The research is IMO like a meal recipe that a knowledgeable chef should be able to reproduce.

Though it is understandable why one would forget to mention something. Especially if they thought it was general practice to do something their way.


Maybe a paper does stand on its own, with the large list of citations at the end. But, maybe some of those citations are journals that your institution doesn't subscribe to, or are historical and in another language, or et cetera.

There is a page limit to publications in high impact journals, and generally it's not great practice to utilize the limited space on the details of hurdles overcome.

I would argue that some of the most important papers in science don't really stand on their own... they need context and expertise that the paper can't and shouldn't cover.


Agreed. I guess everything is subjective, but when I read the article state:

> While the [ctrl-r] method is more powerful, when doing some redundant task, it’s much easier to remember !22 than it is to muck with ctrl-r type searches or even the arrow keys.

how?!


Let me tell you a story. When I do frontend nowadays, I mostly develop with angular js. People like to complain about angular and about having to know so many libraries and frameworks, when the past was simple.

But what is simple, really? Let's look at some past experiences:

1. At my very first gig, when I was taking my first little steps in this software thing, I had to build a website prototype. Pure static HTML pages inside a folder, linked to each other; it didn't need to be deployed, it was just a demo. It didn't use CSS. It didn't use a template engine. Really simple. But every change was a PITA: the website had a header, so every change meant you had to open each file and rewrite the whole header... in all files, manually!

2. In another moment of my past, I had to make some changes in an app that was built with jQuery. jQuery is a library, but it's pretty simple. Select DOM elements, apply changes, and hook up events. Simple.

Only, following and understanding the flow of the app was hard as hell. Each page had a lot of code and it wasn't obvious at all what was happening. Complex transformations of the DOM were daunting. Small changes would break things in intractable ways.

3. PL/SQL stored procedures + tables. Pure SQL + structured programming is simpler than objects, and way simpler than using ORMs. Of course we had no tests, so less stuff to look at. I'm not even going to describe the downsides, because you all surely know them.

My point is, let's always look at the other side of things, and really weigh all the tradeoffs. I get what you mean, but when I read a line like:

> I want to get back to focusing on building logical models that fit the domain, solving problems, and simulating things.

I feel that it's really a little _naive_. I mean, of course you focus on that kind of thing. You need to work on the backend, solving certain kinds of problems. But for some other problems, like, for example, visual interfaces, you better be reusing shit.


1. been there, done that. 2. been there, done that. 3. Considered that in a case where multiple systems were hitting the same database but doing inconsistent things - putting the logic in the database would at least have made it consistent, but we went with a REST API instead to solve the problems.

I'm not entirely against code reuse, sure, reuse bits like visual interface if they fit, but the idea that it can solve everything by just sticking existing bits together is a false lead. It takes much longer than expected and it makes an unmaintainable mess.


Curious. What do you suggest?


I think those are valid names in the right context. E.g., if you have a function like "substring", then you can name the parameter "str" and the variable where you accumulate the result "result". Any more semantic name for "result" would be "substring", i.e. the name of the function itself.
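For instance, a hypothetical sketch of what that might look like (TypeScript, names chosen purely for illustration):

    // Inside "substring", generic names read fine: the only more
    // "semantic" name for the accumulator would repeat the function name.
    function substring(str: string, start: number, end: number): string {
      let result = "";
      for (let i = start; i < end && i < str.length; i++) {
        result += str[i];
      }
      return result;
    }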


Probably because the author doesn't use them and thus has no idea of what their problems are, so they seem "perfect" from a distance.

