[flagged] The problem is not plagiarism, but cargo cult science (unfashionable.blog)
78 points by xyeeee on Jan 18, 2024 | 54 comments



I estimate, conservatively, that about 90% of papers published outside the STEM fields can be categorized as Cargo Cult Science.

I like the idea that he calls this a "conservative" estimate. That sounds much better than "factoid I pulled out of my butt". An unfathomably vast number of papers are published in every mainstream field of academia, STEM and not, every year. The reason we only have "conservative" "estimates" is that nobody can possibly read all of them.

Pick a field and take it down if you can. But trying to build a case against all of social science using the example of someone notoriously elevated despite a lackluster career as a publishing academic doesn't get you anywhere!


DARPA is running a relatively large-scale effort to see whether social science papers replicate, and how easy it is to look at a paper and guess whether it will replicate

the preliminary estimation phase predicts like 50% rates of replication

let's be real, the average social science paper is not good

edit: there are lots of garbage papers in every field, the culture of academia is deeply flawed, the low standards are just even lower in some fields


The problem is that replication studies aren't done hard enough.

And by replication studies I mean "take this paper, try to replicate, write a paper on it".

Any single paper can be wrong; that's why you're supposed to replicate them. That's at least what was taught in psychology, which many consider a "social science of less rigour" - and yet it was probably the course in my education most hardcore about experimental design and the ability to replicate a study...


>The problem is that replication studies aren't done hard enough.

>And by replication studies I mean "take this paper, try to replicate, write a paper on it".

You don't necessarily need to perform a straight-up "replication study" in order to do some useful sanity checking on published results. It can be done as part of another novel study that builds upon prior work. That's basically what happened with the fraudulent UCLA political persuasion study several years ago [1]. Some Berkeley researchers wanted to extend the ideas in that earlier published paper, reused similar methods under the assumption that the UCLA folks were correct, and couldn't get anything reasonable within the ballpark of what was published.

[1] https://fivethirtyeight.com/features/how-two-grad-students-u...


My experience of "representative" studies in psychology (University of Berne) was that they'd come up with something, gather a group of "volunteer" psychology students - 80% women - who need to participate in x studies as a requirement to pass, publish it as representative, serious research, and call it a day.


Are they not done or are they done and nobody wants to publish them?


Nobody CAN publish them. An actual criterion for most top journals is literally "novelty". Reviewers will immediately level "this has been done before" as a damning critique in peer review. There is no replication because you can't currently publish replications, let alone sustain a career on them.


Yes, there’s a serious issue, but the thrust of the previous comment was that it’s not productive to criticize poor research using poor research.

Conservatively 90%? Bullshit. The real numbers are already terrible, so why use unintentional hyperbole to make the argument? It just weakens it.


I don't think there's any objective way to say that a majority of papers are bullshit, even if they are. Failure to replicate is "easy" to show, but doesn't say anything about the utility or quality of the science done, beyond the absolute basics of a paper describing what a scientist did in detail and not lying about the results. A "sky is blue" paper will replicate, but I'm not sure how you'd find a "fair" way to call it useless.

Anyway, jumping from "50% of papers will probably fail an absolute bare minimum test" to "90% of papers are garbage" is subjectively reasonable to me.

If 90% seems shocking to someone, I suggest they skim a bunch of papers and develop their own feel for how bad the situation is.


Possibly you and the OP would be better served by saying “I wouldn’t be surprised if some claim about x% is true”, or “I would guess it’s true that”.

Back to the main subject, Veritasium has a related video:

Is Most Published Research Wrong https://youtu.be/42QuXLucH3Q?si=p9M0ZOAdHuowD30n


I read a lot of history and some linguistics, and it seems really fine to me. Linguistics especially: they work on _a lot_ of data, so you don't have the "tested on 15 subjects" effect from biology or medicine.

Even if I agreed with your "subjectively reasonable" idea, the issue is that non-STEM science encompasses a lot of different fields.


What's non-STEM science? The incentives of the publishing and funding systems are similar across most fields (publish papers in the cool journals, get as many citations as possible, leverage that for more funding). I would be astonished if there were fields substantially exempt from bad science.

I feel like my position is fair for any field publishing papers with falsifiable claims based on evidence?


Also, Feynman was very clear that the field of physics is not immune to the underlying dynamics, citing both the historical reluctance to correct the Millikan oil drop experiment and sloppy methods being used in contemporary particle accelerator research.


He might be surprised to learn how many papers published in STEM are cargo cult science.


I'd say conservatively ;-) about 75%


And yet, this is the kind of figure that possibly could be proven, if you either assign a lot of people to read / analyze the papers and mark them "cargo cult" or "not cargo cult", or automate it with modern-day text analysis tooling. Getting all the papers in one place would be a logistical challenge though.


Even just 100 or 200 random papers would be okay for a rough indication. There must be someone who has done this?

That some papers are bollocks is not a controversial notion, but I genuinely have no idea if it's 10%, 50%, or 90% that's bollocks. Even just a rough indication of "about 20%" or "about 80%" would be hugely helpful. It's also unclear that non-STEM is more bollocky than STEM – Oxman's thesis is STEM, and also bollocks. STEM word salad papers don't make the news because the conclusions are rarely interesting, whereas sociology and psychology papers do, because the conclusions tend to be more interesting.


I've probably read 100 ML papers. They're pretty bad. Even the most cited ones like Adam, Attention, and Latent Diffusion are messy, unclear, or just straight up wrong at times.

Adam - Their algorithm is inefficient. Their paper can be summarized as "use the signal-to-noise ratio," but the key idea is hidden in a page-long paragraph halfway through the paper.
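
(Rough numpy sketch of the buried idea, my paraphrase rather than their pseudocode: the step is the smoothed gradient divided by the root of its smoothed square, i.e. roughly a signal-to-noise ratio.)

    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # exponential moving averages of the gradient and of its square
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad ** 2
        # bias-correct the zero-initialized averages (t starts at 1)
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        # step ~ m_hat / sqrt(v_hat): smoothed gradient over smoothed magnitude
        return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v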

Attention - I probably read "keys, query, values" a dozen times before I realized the key and query matrices multiply with the same vector (where the "self" comes in).
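
(For anyone else stuck on the same point, a stripped-down numpy sketch of where the "self" comes from: the query, key, and value matrices are all projections of the same input x. No multi-head, masking, or batching; just the core idea.)

    import numpy as np

    def self_attention(x, Wq, Wk, Wv):
        # x: (seq_len, d_model); Q, K, and V all come from the same x
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])       # scaled dot products
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w = w / w.sum(axis=-1, keepdims=True)         # softmax over positions
        return w @ V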

Latent Diffusion - Their theoretical grounding was just completely wrong. It's not some stochastic diffeq, or Markov process, or w/e terms they threw around. It's just finding the gradient of log-likelihood using finite differences (the error they add).


Leaving aside the vast semantic gulf between what Feynman was talking about and things being "bollocks", how do you evaluate the truthiness of novel arguments? The typical solution here is domain experts, aka peer review. The author's point is that they don't trust peer review, so you need some other method that's both practical and universal. I can't imagine what that could possibly look like.


SciHub provides an index of their DOIs [1] as well as torrents for the actual content. I presume one could cross-reference citation counts and pull the top few hundred papers in different fields to get a pretty good sample over time and research fields.

[1] https://sci-hub.se/database#download


You could also do a random sampling.
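
(A minimal sketch of that, assuming the DOI index has already been dumped to a plain text file with one DOI per line; "dois.txt" is a made-up filename.)

    import random

    with open("dois.txt") as f:
        dois = [line.strip() for line in f if line.strip()]

    random.seed(0)                     # reproducible sample
    for doi in random.sample(dois, 200):
        print(doi)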


The fact that so many papers are published rather works against your argument. The more papers are published in a given field, the more likely they are to be bullshit.

Of course, that applies for STEM fields as well as the social sciences. However, if we do comparisons across fields, such as looking at how many citations a typical paper in some field will receive, we find that STEM tends to do a lot better than the social sciences[1].

>Pick a field and take it down if you can. But trying to build a case against all of social science using the example of someone notoriously elevated despite a lackluster career as a publishing academic doesn't get you anywhere!

The longer such fraud goes uncorrected, the more rewarded a person is for their fraud, and indeed the more lengths other people go to cover up such fraud (as was argued by the OP) the more likely such fraud is normalized in a particular field[2]. The Claudine Gay story suggests that, if anything, XKCD was wildly optimistic about the social sciences[3].

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5446096/ [2] https://danluu.com/wat/ [3] https://xkcd.com/451/


How does that follow? Is your premise that a static number of people are producing ever-increasing numbers of papers? That's not what's happening at all. There are lots of papers because there are lots of people doing research. As the number of vulnerability researchers increased drastically in the 2000s, the number of vulnerabilities went up --- there was a point where people were tracking new vulnerabilities, across all platforms, by hand! But the median quality of a vulnerability research result didn't fall; in fact, while the variance increased, the peak quality of vulnerabilities improved.


I do not assume a static number of researchers, nor do I see why you would infer that from what I said. Of course there are more researchers going into these fields. The more people enter into a field, the more likely the typical person entering that field is going to be a mediocre researcher, and the more likely that their work is going to be crap.

Sure, in the case of a very small and new field with promise of intellectually stimulating work and structural advantages for eliminating bullshit, adding more people and papers is likely going to be a net benefit. But realistically many established disciplines can only add people by lowering standards.


As an aside, why is this the thing you chose to focus on in responding to me? While I do think it is true and I am willing to defend it, it's not really the central issue.

What I understand as the central point of your original comment was that we cannot condemn all of the social sciences (or 90% of the work in them) on the basis of the Claudine Gay story. I refute that with this sentence:

>The longer such fraud goes uncorrected, the more rewarded a person is for their fraud, and indeed the more lengths other people go to cover up such fraud (as was argued by the OP) the more likely such fraud is normalized in a particular field. The Claudine Gay story suggests that, if anything, XKCD was wildly optimistic about the social sciences.

Put another way, in the spirit of Scott Alexander's recent essay "Against Learning From Dramatic Events", the Claudine Gay story represents a significant deviation from what would be predicted by a model that assumed the social sciences had a more robust level of intellectual integrity.


As a STEM prof, all my papers are cargo cult science.


I happen to tell people I’m a statistician—because it’s true, and because I didn’t learn from the first 10,000 times someone heard that, turned around, and left without a word. They always run away, except in one case… If they stay, I know they are

a. a Ph.D. Student and

b. having a very bad time and therefore

c. about to ask me pointed questions about their methodology section, what a t-value and a p-score are, why there are two xis, and whether it’s too late to prevent Bonferroni, Hochberg, Holm, Benjamini, and Šídák from ever publishing their terrible ideas. (Yes.)

I’ve been confronted with utter confusion in every field of knowledge, including art history and literature. It’s a miracle we make any progress in science, and I wholeheartedly blame *how we teach statistics*. It’s, at best, a dark cult that requires sacrifices to obscure gods, and most likely three cargo cults in a trench coat. Conditioning scientific discussion on passing under the Caudine Forks of Greek-letter incantations isn’t helping us. We need a reform.

There is no problem with people looking at black holes, counting cells in a Petri dish, or trying to define inflation. If it hurts when you touch your forehead, your thigh, your elbow, and your knee, it means your finger is broken. There is one thing in common between all those people: the three-hour elective on stats taught by someone who had the social skills of a door hinge, the clarity of Vantablack, and the patience of a toddler on cocaine.

There are amazing statisticians who explain things beautifully. Get them to teach, and people will finally grok uncertainty. They’ll see how clumsy most of the methodologies are that reviewers pre-approve just because they were used in the same journal before. We ask the most critical minds of our generation to challenge the status quo. They know how. They just need to know what the status quo actually means.


I'd push that it isn't just statistics. It is finding a way to overcome any learned experience and expectations. I remember people having a hard time understanding that the reason things fall behind the car when you toss them out the window is 100% air friction. To the point that many in my college classes would get that basic question wrong. Or, for fun, show an image of someone spinning a rock on a string overhead and ask them to draw an arrow for where the rock will go when they let go of the string.

Never mind how many of us (myself included) have a hard time with the idea that .9999... equals 1. Even more amusing when you consider that few have trouble with .3333... * 3 doing so.

Yes, statistics can be particularly difficult to really internalize. I'm not entirely convinced it is uniquely difficult.


Probably.

I just happen to have had the most incredible physics teachers, and I know elite engineering schools do too: Feynman, that Eastern European lady who is way too loud and excited about science, the “let’s see if conservation of momentum works, or if that bowling ball will crush my skull” guy. Every school had more than one. Steve Mould is a gift from above, but not a unique one…

On the other hand, what Randall Munroe from xkcd and Grant Sanderson from 3Blue1Brown have alluded to felt like witchcraft, unlike anything I’ve seen in class.

I went to what should be one of the most demanding schools for statistics and it took me years to make sense of it. Re-opening my notes after things clicked was earth-shaking.

And it wasn’t specific to the school: we also had economics teachers, and they were incredible: hilarious, terrifying, unforgettable. “Our task is to forecast inflation. We have tried many approaches, mainly macroeconomics and bone reading. Bone reading was more accurate, but the minister didn’t think it involved enough suffering.” How do you forget that?


Love the suffering joke!

I do think that most physics benefits from the empirical nature of being able to test ideas. Statistics is tough because you can't really empirically test a lot of distributions. And then a ton of us (again, myself included) have a hard time seeing the subtle differences introduced with some framings. The Monty Hall one is hilarious for just how upset so many people get on being wrong on it.

Note that I think with modern computers, you can start to test more distributions in simulations than you could dream of in the past. Has some of its own problems, of course, and it is a shame that we lose a lot of symbolic manipulation in the process.
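
For example, a toy Python sketch of the Monty Hall case mentioned above (just an illustration of the "simulate it" approach, nothing more):

    import random

    def monty_hall(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            car, pick = random.randrange(3), random.randrange(3)
            # host opens a door that is neither the contestant's pick nor the car
            opened = next(d for d in range(3) if d != pick and d != car)
            if switch:
                pick = next(d for d in range(3) if d != pick and d != opened)
            wins += pick == car
        return wins / trials

    print(monty_hall(switch=False))    # ~0.33
    print(monty_hall(switch=True))     # ~0.67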

So, I still largely think it is about getting people past their experiences. I think that is ultimately compatible with your point. I'm just calling into question how rosy the rest of the learning landscape really is.


I think the .999 repeating thing has more to do with how we teach limits; people struggle to understand how two things that look definitionally different can be the same when the rationale isn't well motivated. I had a professor who framed the whole process as an argument, which was much easier to grasp, and I can easily imagine most people just getting a lecture or two on the equations and the implications.

Plus phrasing it as 'equals' or 'are the same' sort of puts the challenging part right in the front. Focusing on the difference more naturally leads you to the same conclusion.


It isn't a limit thing, though? It is a textual representation thing. You are not confused that 01 is the same as 1. Similarly, you are ok that .33... is the same as .3... In fractions, you are ok with the idea that 1/2 is the same as 2/4. And that you are often supposed to work in reduced form, such that you will probably never put 2/4 down as an answer to any problem. We work with "regular" numbers for so much of our lives that it is odd to have to think about how we write them at that level.


It's not a textual representation thing; there are other repeating sequences that do not converge.

I mention limits because that was the context in which I learned about this; others have pointed out you can demonstrate this with just algebra (which makes me wonder if I just wasn't paying attention when that lesson came up).


It can be proven that it equals 1 using basic algebra.
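
The usual argument, for reference (it quietly assumes ordinary arithmetic carries over to the infinite representation, which is exactly what the reply below pokes at):

    x = 0.999...
    10x = 9.999...
    10x - x = 9.999... - 0.999... = 9
    9x = 9, so x = 1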


Sure, but if your "basic algebra" includes the popular inconsistent axiom that "numbers have a unique decimal representation (up to leading and trailing zeros)" you can equally well just disprove any of the rules of algebra that you used to reach the conclusion.

When people have (accidentally) been taught that, and then they're presented with a case where it's false, it's perfectly reasonable that they wonder whether it's that thing that they were taught, or something else that they were taught, that is wrong. You don't have to look very far to find cases where it was the other laws of algebra that they believed were wrong, for example the popular proof that 1 + 2 + 3 + ... = -1/12.

Even a mathematically sophisticated alien who comes from a culture that somehow never used the rationals or reals the way we do might at first think that this is a fact that leads to a proof that no object that obeys your rules of algebra exists, rather than a proof that 0.999... = 1.

Most people, even most university students, don't have the benefit of a formal mathematical education that actually clarifies these kinds of riddles.


Relevant video ("Feynman on the social sciences"): https://www.youtube.com/watch?v=zkFPCTwPlkU


The article points out that Oxman's definition of Euclidean distance is jumbled nonsense. I took a quick look at a related Oxman paper [1] and it is a mess. I found a different and even more jumbled definition of distance:

"Euclidean distance √ (d = x2 + y2, also called the L2-norm)"

I don't know how you'd get something this messed up if you understood it at all. The non-superscript 2 is an understandable typo, but the square root is in completely the wrong place. And is it supposed to be applied to all the text in parentheses?
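
For reference, the intended textbook definition is presumably the standard one, with the root over the whole sum:

    d = √(x² + y²), i.e. the L2 norm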

I also note that the definition of tessellation is completely confused:

"The breaking up of self-intersecting polygons into simple polygons is called tessellation, or more properly, polygon tessellation."

This definition (most of the paragraph is cut-and-paste from Wolfram MathWorld [2]) is a very special case of tessellation since normally you're not dealing with self-intersecting polygons.

(For completeness, I'll mention that the paper uses "ray" when it should use "line segment" but that is kind of quibbling.)

[1] https://papers.cumincad.org/data/works/att/acadia09_122.cont...

[2] https://mathworld.wolfram.com/Tessellation.html


Oxman's dissertation was submitted to the architecture department. It's likely that none of her reviewers were actually equipped or motivated to know which aspects of it were nonsense.

Architecture as an academic discipline suffers from having practitioners with science-envy, but who don't have the training, the disposition, or the incentives to practice real science. The real end product of Oxman's labor is not a paper that advances science in some way: it's a beautiful piece of sculpture that will be acquired by MoMA. She's actually good at this part of the job! Unfortunately she (along with many others within the field) feels compelled to use science-y jargon to dress up what she's doing.


These guys have a systematic methodology for doing meta-analysis in health

https://www.cochrane.org/

The process starts with doing a literature search and then culling low-quality results, which is almost always most of them, because most studies in health have too small a sample size, lack controls, or have other basic flaws. I saw one the other day where they found 80 studies but could only include two in the analysis.

Physics has its own hall of shame. It's quite likely that a large fraction of high energy physics is "not even wrong" in that there's not any evidence that the universe has 10 dimensions or that supersymmetry exists. In my field we used to make a plot on log-log paper, draw a line on it, and say we'd discovered a power law, so we published a lot of junk papers like this one

https://arxiv.org/abs/cond-mat/9512055

Mark Newman was at our institute at the time, and after he finally got tenure a decade later he wrote a paper that put the smackdown on it, but it was in a statistics journal, so for all I know people are still doing it wrong.

One of the reasons why I never really found my voice as a scientist was that I never found reconciliation between the ideal of scientific truth and the reality that there is a lot of bullshit (e.g. if you don't publish some bullshit you'll perish)


As someone who worked my tail off for my PhD dissertation for years, the notion that someone just copied and pasted random sections of random works irks me.


I think the OP did a poor job of explaining what he meant by "Cargo Cult Science," and the single quoted paragraph from the Feynman essay which supposedly coins the term is inadequate.

As best I can tell, the issue is adopting some of the forms and jargon of the physical sciences without fully understanding them, in a couple of particular ways, such as failing to:

> try to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgment in one particular direction or another.

...or failing to repeat previous experiments or to control for all variables.

Here's the original essay: https://calteches.library.caltech.edu/51/2/CargoCult.htm.

That essay mentions that the root cause of the problem is that many of these ideas are never explicitly taught (at least at the time of its writing).

I wonder how much of this is due to the (historically) excessive prestige of the physical sciences leading to "physics envy", forcing other areas of activity to contort themselves into a distorted shape to fit a physical-science-shaped hole.


The definition I came away with is that Cargo Cult Science is anything that is trying to look like science while not embracing the core scientific principle of actively trying to _disprove_ the claims you are making.

Science can't prove anything (deductive vs inductive), it can only fail to disprove. While no attempts to disprove can ever be 100% thorough, real science makes good faith attempts to disprove claims, and won't actively avoid obvious potential routes to such disproving.

Cargo Cult Science does not do this. It makes claims, shows evidence for those claims, but makes zero (or at best token) effort to come up with alternate explanations or find ways in which the claims might be incorrect.


The author points out the core problem: there is no method for changing the culture of the professional bureaucracy. Even if you change the head of the org, the org does not change.

Change in orgs is very healthy... Passionate entrepreneurs slowly turn into businesses, which eventually devolve into corrupt scams. Bankruptcy is the mechanism for getting rid of the old rot in the business world. How do we get rid of it in the academic and political bureaucratic worlds?


That stuff about Voronoi tessellations looks pretty orthodox to me.


Other than the typo in the equation, the Voronoi snippet is pretty straightforward and easily understandable.

It's probably the blog post author who doesn't understand this subject.


I would just say "cargo cult thinking" or "cargo cult scholarship" instead of cargo cult science, as it's more accurate in general, and avoids addressing whether political science and media science (the fields of the people used as examples in the article) are considered science per se. Just as plagiarism cuts across disciplines, including non-science fields, so does cargo cult thinking. So, it's a better term.


I didn’t look too deeply into the scandal about plagiarism (because seeing people I admire fight like kids makes me sad), but I saw one glimpse: one accusation was comparing the Literature sections of two papers by a PI and their Ph.D. student, and finding that the first two paragraphs matched.

Dude, that’s the “You must read this before taking the class” segment of the paper. It’s a human-legible context for the Reference list (and occasionally, the opportunity for some minor pettiness against the competing lab). No one reads it (except Master’s students trying to catch up), and of course, it’s copy-pasted between two papers on the same topic.

After that, I kind of skipped the topic.


Actually, no, people don't generally copy-paste parts of each others' literature review sections.


I’m convinced it’s not done in many areas of science. I have seen it done extensively in others.

The point is: I’m not sure doing it should be qualified as plagiarism.


I mean, I guess you could call Claudine Gay a cargo culter because she used a particular method without fully understanding its assumptions, but if you set the bar that high then almost nothing is going to pass. If you've ever been in academia, you know Jose Ortega y Gasset was right when he wrote: "it is necessary to insist upon this extraordinary but undeniable fact: experimental science has progressed thanks in great part to the work of men astoundingly mediocre, and even less than mediocre. That is to say, modern science, the root and symbol of our actual civilization, finds a place for the intellectually commonplace man and allows him to work therein with success." Most of us, most of the time, work with "recipes" that we think will produce good science, and often that works, except when it doesn't.

As for Neri Oxman, the kind of word salad she spit out is definitely reminiscent of cargo cult behavior, but the motivation is different: if you're in a cult, you're a believer, whereas it's not clear to me that Oxman believes her own nonsense, and in any event she seems to tend to borrow stuff from other fields and does not presume to contribute to those fields. Consequently, her research doesn't really qualify as the kind of mimicking without understanding that Feynman was talking about.

So I guess I must ask: is Sven Schnieders (the author) cargo culting about cargo cult science?


From Andrew Gelman: "Any sufficiently crappy research is indistinguishable from fraud" https://statmodeling.stat.columbia.edu/2016/06/20/clarkes-la...


"I estimate, conservatively, that about 90% of papers published outside the STEM fields can be categorized as Cargo Cult Science."

Pot, meet kettle.

How many times have we had room-temperature superconductivity and cold fusion breakthroughs again?


How long did it take to disprove, though?

That's the whole point. It's very hard not to make mistakes when dealing with novel and complex problems like science. But in (at least parts of) STEM, things get disproved relatively quickly, while in the social sciences they almost never are.


The criticism of Oxman's research jibes with the general skepticism I remember towards the Media Lab.

Very prestigious, selective, and status driven. Based on making deep science accessible to the outside world in a way that the rest of the institute wasn't willing or able to. Lots of projects based around the intersection between previously unrelated fields of study, with an eye towards catchy aesthetics.

At an institute where researchers were expected to know a single field of study to a mind-boggling level of detail, there was an assumption that mixing and matching completely disparate fields required either a talent that surpassed the world-class talent in other departments, or a willingness to accept a much more superficial understanding of the constituent fields. The TED-talk/Malcolm-Gladwell shine on so many Media Lab researchers certainly made people assume it was the latter.

I've only read this blog post, the abstract of Oxman's paper, and the referenced section. Drawing together the aesthetic similarity of Voronoi diagrams, phase diagrams, and metallurgy structures feels exactly on brand. It's the kind of insight porn that makes the Media Lab accessible to fairly intelligent outsiders who can fund it rather than bothering with a post-doctorate career of their own.

The paper is inherently about design, with the core thesis seeming to be "hey, what if we had designs emerge from the materials instead of designing first and then picking whatever materials we want later?" I'm not sure how deep of an understanding of the sub-fields is necessary. I'd expect an understanding of materials science, since the thesis is about materials, and a material scientist was on the review committee. But sprinkling in other fields without knowing them at a PhD level would seem very expected to me within the Media Lab's culture.

I appreciate a post looking at both pieces of work, given the politically-driven attacks in either direction. That being said, her section on Voronoi diagrams and Delaunay triangulations seems a bit over-explained for a PhD dissertation, but it certainly isn't "technical mumbo-jumbo" or particularly confusing. If he's complaining that she's fluffing up her word count I totally get it, but if he's saying it's GPT-style nonsense I have to disagree.

And if we have to choose between someone defending their MIT PhD (a) not knowing the equation for the hypotenuse of a triangle or (b) messing up the LaTeX and putting a radical in the wrong place... I'm personally going with "b."

The plagiarism itself speaks to the cargo cultism of dissertations themselves, and the need to write long boilerplate sections summarizing prior work and contextualizing the original research. That seems to be where people keep getting caught with verbatim or near-verbatim copying (my first experience with this was my first high school summer program in a lab, where the researcher pointed out a "nice sentence" I should grab from someone else's paper for the intro section).

The very concept of plagiarism in a scientific dissertation is a bit odd, almost a category error when not applied to the actual original research itself (research in the "did something new" sense, not in the "read a bunch of other papers" sense). Why is a PhD candidate supposed to write an entirely original summary of the state of the field? I think it is very, very common to start with an existing summary and modify it for these sections. And I'm not sure that I find that unethical.



