I have encountered something similar to this for a submission that I reviewed for a scientific journal. I will not list any names or give much detail past those generalities, but I pointed out that the authors were misusing a particular technical term. In my review I defined the term and explained it briefly, and I asked the authors to revise their submission accordingly. The paper was not bad, but the authors did not know English very well, so it was quite difficult to read. That was its main problem. However, when I received the revised submission, I noticed that the authors had plagiarized my definition and explanation almost word for word (from my confidential review). I pointed this out to the editors, and they said to just reject the paper with the stated reason being plagiarism, which I did. The journal ended up rejecting the article, but I discovered it a few years later in a different journal. The plagiarized section remained, but the authors had swapped out a lot of my phrases for these kinds of "tortured phrases".
That said, the authors did not fabricate their research (as far as I can tell). They just did not know English well, so it was easier to just copy things that you know are phrased well than to learn to write English well. As the saying goes, do not attribute to malice what can be explained by ignorance or laziness. That does not excuse it but it makes it more understandable.
I agree with the article that this is probably just the tip of the iceberg. There are likely many more lesser evils being committed with similar tools that are just much more difficult to spot. I would not have noticed my particular example if I were not a reviewer for the paper, for example. It makes me wonder how big the problem really is.
As a native English speaker, but not an academic researcher, I would have naively done the same.
To preface: I am speaking in ignorance of the mechanisms of credit and advancement in your field, so I'm undoubtedly overly harsh. I'm not even trying to be fair, because I admittedly don't have the knowledge to do so.
I know academia is a different world from private sector industry, but I would think if the point of a confidential review is for you to do anonymous work to improve someone else's credited work, and their work was improved by incorporating your feedback, you would be happy with the outcome, or else why are you participating in a confidential review process in the first place? There are different mechanisms for publishing words that you want credit for.
When someone incorporates my feedback on code or documentation word-for-word, I might in the worst case be suspicious that they are trying to get my approval without engaging with my criticism, but in most cases I'm flattered that they respect my idea enough to put their name on it. Although, in my world, putting your name on something is more about responsibility than credit. The command is called "git blame" and not "git credit", after all.
Wanting them to incorporate your feedback, but also wanting them to make some change to the wording to avoid plagiarism, smacks of how freshman essays are graded, not how real work gets done.
Like I said, I'm not trying to be fair and don't have the right background to be fair to you. I only speak up because I grew up in an academic family and know that there is a presumption that academic work is more idealistic, more altruistic, and less mercenary than private sector work, and I think it's worth pointing out when the reverse is true.
I almost replied that I had the same view after reading your comment but before opening the article in question. After reading the first bit of the article, I think the GP is possibly saying something else when taken in context: that this is exactly the behavior that would be expected from someone who is plagiarizing something else in general using the technique from the article, and all the criticism did was help them produce a better plagiarized paper.
I can't speak for GP as to whether this was true, or if there are different norms for that sector, or the author was slightly aggressive in asserting their rights, or the assumptions we're making about the content of the criticism are off, but given the article brought a new dimension to it, I thought that worth mentioning.
> I would think if the point of a confidential review is for you to do anonymous work to improve someone else's credited work
This is one outcome of peer review, and is an explicit goal of (good) peer reviewers. But it's definitely not the main goal of peer review. The main goal of peer review is assessment. Improvement is something that you can strive for as a secondary outcome, but it's not the main point.
This is a significant and important contrast with code review. Peer review is NOT analogous to code review!
NB: The code review style of improvement-focused review also happens in academia! But it happens within groups rather than between groups. I.e., advisors or post-docs in single research group or university critiquing one another's work will behave more like a code review. Peer review is different.
The same is true in industry, btw. Think of peer review as the thing that happens when a regulator reviews the code from a medical device. They're not interested in making pull requests and doing your work for you. Although they do in principle want you to succeed in your goals, and might provide some feedback along those lines, they're making a yes/no decision. Different function.
> their work was improved by incorporating your feedback, you would be happy with the outcome
There are really two concerns here:
1. confidentiality, and
2. plagiarism.
The first item is probably more important than the second. The typical expectation is that reviews are anonymous and private unless stated otherwise (e.g. OpenReview). It's incredibly bad form to publish a private correspondence without first asking for permission.
Copying a reviewer verbatim without permission is plagiarism, but it's not really something that anyone actually cares about, per se. I've sometimes asked for permission to incorporate components of reviews verbatim into my papers, and never received anything except enthusiastic "of course". I assume the same would be true in the case above. But even in those cases I say something like: "as helpfully observed by an anonymous reviewer of this paper, [insert quote]".
The key point is that I don't go around publishing explicitly confidential correspondences without permission.
> or else why are you participating in a confidential review process in the first place?
Exactly. Confidential.
Again, surely you can see how publishing someone's confidential words is quite rude, even if the person wouldn't mind those words being published if simply asked.
> Like I said, I'm not trying to be fair and don't have the right background to be fair to you. I only speak up because I grew up in an academic family and know that there is a presumption that academic work is more idealistic, more altruistic, and less mercenary than private sector work, and I think it's worth pointing out when the reverse is true.
Academic work is all of those things in terms of goals, not necessarily in terms of process. I don't know anyone who has a passing familiarity with Academia and doesn't realize that it is incredibly competitive.
I guess I don't understand what kind of "confidentiality" was violated. To me a violation of confidentiality would be revealing your name and your role in the process, or publishing sensitive information they got via the process, maybe publishing some idea or data you shared with them that you intended to publish yourself later. But they didn't use your name, and you didn't share any sensitive information with them, so I don't get it.
Plagiarism I guess I can see, though the cynic in me says if it was so obvious that their native language wasn't English, it was in the best interest of readers for them not to do the dance of paraphrasing away the plagiarism.
> I don't know anyone who has a passing familiarity with Academia and doesn't realize that it is incredibly competitive.
Yeah, my family who are in academia talk all the time about how petty and cutthroat it is, and I'm sure that played a role in scaring me away from trying it myself, but I can tell they think business must somehow be worse. I get the feeling that deep down they believe they’re seeing a version of human behavior somewhat elevated by the ideals of academia, and however bad it is inside academia, outside, in environments ruled by cruder values, it must be worse.
In your review, did you suggest the definition and explanation that they used? In this situation, would an acknowledgment at the end have been enough? In my mind, it seems like you all had a conversation and the authors took up your suggestions as the reviewer.
No, I did not suggest the definition and explanation as content for them to use. I was trying to explain a concept that they discussed incorrectly multiple times in the paper. It is an advanced concept that might not even appear in graduate-level courses on the subject, so I can understand why they did not understand it fully. That said, I did not give them permission to copy my words there. If there are any particular changes I want the authors to make, I put them in quotes. This wasn't in quotes. It was an explanation for their own benefit, so that they could correct the mistakes in the paper (by re-writing it).
Once I re-read the submission I wanted to reject it immediately, but I realized that I should get a second opinion first. So I contacted the editors, who agreed that it was blatant plagiarism. Hence, they rejected the paper once I recommended rejection in my second review. So this wasn't just a conversation where I made some suggestions and the authors used them. Even the editors thought it was plagiarism once they looked at it.
An acknowledgment would be impossible because the review was single-blind. The reviewers knew the identities of the authors but not the other way around. What the authors should have done was just re-phrase where they used the term in the paper. They didn't even need to copy my explanation, to be frank. The paper would have worked fine without the paragraph they copied. If they had just re-phrased the relevant parts, no other changes would have been needed and this whole thing could have been avoided.
In the absence of an explicit directive or request from you, and given that the authors are from a different culture, how do you expect them to have known what was required of them?
I don't mean to be snarky or accusative. Your comment was thoughtful, articulate and detailed, which tells me you are a sophisticated communicator.
It's a fair question. They were foreigners submitting to an American journal, so there is always the possibility for some sort of cultural misunderstanding in addition to any language difficulties. Nonetheless, the journal's submission process provides authors with a page listing ethical standards they have to follow, and it says that plagiarism of any form is not allowed. In fact, this journal's particular set of standards even mentions that authors cannot copy anything obtained during the peer review process without the "explicit permission" of the reviewer. So I just expect them to follow the rules that they were told about when they submitted the paper.
So, I understand how it's plagiarism, but I'm still not following why your suggestion, with the goal of helping them get their paper accepted, wouldn't be acceptable to copy/paste. It was to them and only to them, so it's not like it's a piece of substantial work from another team. It seems to be an extreme form of following the letter of the rule, and not the spirit of the rule. But I'm not an academic, so I don't really understand this sort of lack of discretionary allowance.
I'm fully on board with fighting plagiarism down to that level.
But that said, I've oftentimes wondered if this requirement of having to "rewrite in your own words" may do a lot of harm too. It obfuscates the fact that the things people are talking about are actually exactly the same, or makes it fuzzy what the exact differences are.
In a particular academic CS area, I've witnessed people reproduce the essentially identical description of setting and assumptions again and again, but, being afraid of plagiarism accusations, they re-formulate things over and over, which makes it non-obvious that they are the same as in other authors' work, or even in their own earlier work.
My understanding is that something like the following happened:
1. Authors submit a paper with expository sections about (e.g.) some materials being flammable and others inflammable.
2. Reviewer tries to explain that they have incorrectly understood the meaning of the terms, explains the meaning carefully, and maybe suggests the terms they might mean.
3. Authors copy in the explanation and maybe replace incorrect usages with weird tortured phrases.
4. Rejection.
Obviously this description reads a little bit silly and things were probably more nuanced in practice. I think I’m probably also being uncharitable towards the authors in the example.
Acknowledging anonymous reviewers is common in my (erstwhile) field. “An anonymous reviewer suggests the following definition of…” I have to say that it seems odd to me to regard this as plagiarism.
Not 100% sure but I believe the word confidential implies that the review should only have been read by the editor(s) and not passed on to the authors.
A review is the written feedback authors receive from the journal reviewer. The reviewer can recommend that the authors revise and resubmit, based on the review comments. Usually the review is not published with the final piece, which is what was meant by “confidential review”.
It's a common technique/first layer of plagiarizing a text to translate it from English to e.g. Spanish and then from Spanish back to English, to get rid of the unique words the author used.
The parent post to mine was theorizing that the reason the English was so mangled was because it had been translated from English, then back to English. I was replying that, if the researchers didn't use English as a first language, it's ridiculously more likely that they were translating from their native tongue into English. You're misunderstanding where the notion of translating it twice came from.
And sure, there are reasons to translate a phrase from English, to another language, and back to English. This will be familiar to most people who've studied abroad, or done technical conferences on foreign soil, things of that nature. Let's say you're from Bolivia, attending a lecture at an English university, and are planning on referencing some of the content in a paper you're writing, in English. You speak passable English. The lecturer gets into the meat of the topic, and you realize you don't quite understand the context of what they're saying. Some of the conjugations are unfamiliar, so you just write it down as best you can and move on. Later, when writing the paper, you need a way to untangle the phrasing. A simple way is to put it into a translation application, translate to Bolivian, then try to parse it in native tongue. However, you know you have to explain and discuss this section in English; by translating it back, you'll get the English words, but some of the context and grammar structure will be from familiar Bolivian.
My wife was a professional translator and I did my master's thesis on the topic. With modern translation engines, there is no way you'll end up with "haze figuring" or "arbitrary timberland" translating one way from any source language into English. I also doubt you could get a very specific word like "timberland" from repeated translation of "forest", intentional synonym replacement is much more likely.
> there is no way you'll end up with "haze figuring" or "arbitrary timberland" translating one way from any source language into English.
Good to know; I've always wondered at the peculiarities of different translation engines, but never really dug into them, as most modern ones seem like neural network black boxes to me. I was pointing out that there are some realistic use cases for doing round trip translations. I've used this technique at a few conferences to help straighten out my hazy understanding of a complex idea in a language I spoke quite poorly. And I do agree it is bad form to use this directly in an academic paper.
> Also, they speak Spanish in Bolivia ;)
To be fair they speak Bolivian Spanish, along with many other native languages! I chose Bolivia as a random target without doing any research, so thanks for the pedantic push to go learn something new; things like this are why I do love HN!
I chose Bolivia because I know next to nothing about it and it seemed neutral; I should have used a notional country, like the Republic of United Swiss Emirates.
You've never had to do that before? Maybe I run into it a lot due to the nature of the conferences I attend. I know just enough of the language for functional conversation, but as soon as a complex idea is put forward, I need to be able to contextualize the familiar scientific portions of it quickly, and the round trip translation usually helps enough that I can parse it correctly.
Not really, but I came across some teammates in college who didn't seem to be able to follow a conversation yet they seemed to have quite good writing skills. This might be the reason why ;)
I find it unlikely a machine translator would spit out phrases like "counterfeit conscience" and "haze figuring" for AI and cloud computing in one pass. Plagiarism via multiple pass-throughs seems much more likely.
Ironically this is exactly what I would have expected from machine translation 15 (edit: 20?) years ago. My friends and I used to get a kick out of running phrases through several rounds of machine translation in different languages and finally back into English, and then playing them with the Mac OS 9 text-to-speech system.
Maybe unlikely between two Indo-European languages with closely related sentence structures and vocabularies, but plausibly likely for others. DeepL just gave me "potter's screw" for "pan-head screw" in my language, for example.
What was that phrase again? Oh yeah, innocent until proven guilty. Whew, almost forgot it. Why would you assume bad faith when this could much more easily be explained by non-English speakers just using a machine translator?
There is 0 probability that an academic author who wants to write about artificial intelligence does not know the English term. Just from the fact that a properly written paper requires the author to know/cite the relevant literature, which at least to some non-0 percentage is in English. Same goes for the referees.
So you couldn't ask them to attribute this to you as an anonymous reviewer? And instead wanted them to spend their effort using "their own words" because "inexcusable" and what not? Man, are some people stuck up deep in their own ass. Yeah, a copy-paste isn't nice of them, sure, but you are one fragile snowflake.
But why did you call them out for plagiarizing a private note, a confidential review? I mean... In the end they just used your wording to improve their paper, for a concept they already mastered. And you said the paper was good otherwise. IMHO they acted ethically.
To sum up, the authors did not just use my wording for a short portion. They copied an entire paragraph of my review nearly verbatim. Both the journal and I thought that the authors acted unethically. They violated one of the ethical standards of the journal regarding plagiarism, and these ethical standards were made available to them when they submitted the paper. Those were the rules that I had to evaluate them with for the review, so my hands were tied in some sense. I would also quibble with saying that they mastered the concept, because I really have no way to gauge their understanding if they just copy my own words.
> the authors plagiarized my definition and explanation almost word for word (from my confidential review).
Is there any way the authors could have kept your definition, and somehow credited you, even anonymously? Because rephrasing definitions is the pinnacle of wasted effort, and leads to confusion - you are asking them to say what you said, but without using your words.
That is a good question that I do not have a good answer for, unfortunately. The review process for this journal is supposed to be blind, so crediting me would only reveal me as a reviewer. An anonymous acknowledgment would be better than nothing if the authors had only copied a short definition without my permission, but they copied an entire paragraph from my review without my permission. That's just inexcusable. I can understand to some degree why they did not understand the concept well, since you may not encounter it even in a graduate-level course on the subject, but what they did showed really poor judgment.
I don’t understand what the problem here is actually. If you plagiarize on an assignment for school, that’s bad, because the goal of assignments is to test your knowledge and plagiarism makes it a less effective test. But here the goal (I think) is to produce an informative and accurate paper. You told them the definition - what would have been gained by rephrasing the definition you told them? Especially if they’re nonnative english speakers, it seems likely that any rephrasing would have made the definition less understandable and possibly less accurate.
I guess the fear is that they don’t actually understand the definition, and anyone reading the paper will incorrectly believe that they do, improving their reputation in a way they don’t deserve?
As I have said in another comment, the authors did not just copy my short definition but my entire paragraph-length explanation of the concept. I might have let the short definition slide but an entire paragraph is just inexcusable.
I agree that this is not a school assignment, so the nature of plagiarism is a bit different. But you hit the nail right on the head. We should be worried that they are pretending to know something that they really do not. They copied my explanation nearly word-for-word. Doing that does not prove that they actually understand the concept. It only proves that they can copy and paste. Now maybe they did spend some time learning it and looking into it, but there is no way to know for sure. The only way to prove that they really understand the concept fundamentally is to make them explain it themselves in their own words. And that is precisely what they should have done.
"We thank the anonymous reviewers for constructive feedback that we used to improve the article." Standard phrase, credit where credit is due, no need to make some definition worse just to not trip off a plagiarism filter.
Unless this very phrase does the tripping off. Having to rephrase it would be pretty ridiculous though (and illustrates your point).
No, it's important to gauge how competent the researchers are, i.e. how trustworthy the remainder of the paper is. The problem with review process is that the reviewers typically cannot reproduce the work itself -- take a physics experiment for example. So they can only do a smell test, and getting basic concepts wrong is a rather bad smell.
Nature (and some others) now have started the IMHO awful practice to allow reviewers to be named after the paper is accepted. Beside the obvious question of bias etc., this also blurs the lines between author and reviewer.
> Is there any way the authors could have kept your definition, and somehow credited you, even anonymously?
It's not uncommon for the acknowledgements section of a paper to thank an anonymous reviewer for, e.g., suggesting the authors further investigate a detail that turned out to be important. But in this case, where the authors couldn't write their own explanation of a technical term, maybe it's a bit premature for them to be writing technical papers.
I agree with this. You sound expert and provided a definition. I don't think we should expect serious professionals to mess around altering the words to make it look like it didn't come from the source that it did come from. In fact wouldn't that itself be plagiarism? The usual approach here is to use a phrase like "as suggested by one of our reviewers".
I think this is the cause of some of these weird terms that this HN post is discussing. I have a PhD and found it incredibly frustrating to write research papers because there was an expectation in my field to add a ton of background. That meant I had to spend a lot of time to rephrase bits and pieces of other papers where the authors had worked hard to word something very well. The professors didn’t like me quoting from other papers. I had to come up with my own way to say something very specific.
I write papers too and hate finding a new way to say "my X is a Y that does Z". Especially if it is your tenth paper on the topic and it arguably should sound just like the previous nine times.
But about three sentences into the introduction (where you explain all the background) you start going into "there are also Y's that do Z backwards". Which Y's you compare and connect with is important and says a lot about how you think about your X. It might even be a new way of looking at it. So telling other people how you think of it can be important.
And another five sentences in, you start referencing previous work on the topic. At this point you are crediting others, and you get to choose whom to credit and how much, with the benefit of hindsight. You refer to papers that are useful to people new to the study of capital letters. What you write here helps them much more than a mere list of papers or a Google (well, Google Scholar or ADS or PubMed or whatever) result list, because you can provide a good order to read them in, or point out which aspect of X's is best explained where. You also name papers that might be useful to practitioners in the field because they contain a particular technique or a good explanation of it.
So it is very much worthwhile to provide the background that others expect at the beginning of your paper, even if it requires rewriting that first paragraph several times.
I was pondering the same thing until I realized the person you are replying to is somewhat stretching the meaning of "definition" to mean a multiple-paragraphs-long explanation of a complex concept, which was, as I understand it, lifted verbatim.
Even if you only copied a typical one-sentence dictionary definition verbatim without attribution, that would still be a clear-cut case of plagiarism...
Hardly. Plagiarism is passing off someone else's idea or work as your own. Most terms, especially technical ones, have a precise, agreed-upon definition. Calling quoting them plagiarism is stretching the notion to its limits. Thankfully, there is no philosophical conundrum to be had when people are lifting complex explanations.
The "precise agreed upon definition" is not mapped to a precise agreed upon sequence of words.
I find it hard to think of any technical terms which have a fixed, well-specified phrase for the definition, much less ones which, if re-used, don't require attribution.
I mean, there are definitional terms like "one meter is the length of the path traveled by light in a vacuum in 1/299 792 458 of a second" or "the discriminant of the quadratic equation is b^2-4ac". Re-use those quoted definitions and no one will blink.
But, what's "evolution", or "electron spin", or "aromaticity"?
Even something as well-defined and concrete as "cosine similarity" has many different variations:
Wikipedia: a measure of similarity between two non-zero vectors of an inner product space. It is defined to equal the cosine of the angle between them, which is also the same as the inner product of the same vectors normalized to both have length 1
SciKit-learn: the normalized dot product of X and Y: K(X, Y) = <X, Y> / (||X||*||Y||)
towardsdatascience.com: the cosine of the angle between the two non-zero vectors
statology.org: For two vectors, A and B, the Cosine Similarity is calculated as: Cosine Similarity = Σ(AᵢBᵢ) / (√(ΣAᵢ²) √(ΣBᵢ²))
uchicago.edu: For vectors, it is the cosine of the angle between those vectors.
datadriveninvestor.com: Cosine similarity of two vectors is just the cosine of the angle between two vectors
While certainly equivalent, these definitions show some creative choice in how they are worded, and thus if copied, should be cited.
(I agree that the creativity level is quite low for some of these, and I believe several people might come up with the same description, but that's a different issue than re-using someone else's definition without attribution.)
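As a side note, the equivalence of the "normalized dot product" and "cosine of the angle" phrasings is easy to check numerically. Here is a minimal Python sketch of both (the function names are mine, not taken from any of the sources quoted above):

```python
import math

def cosine_similarity(a, b):
    # "The normalized dot product of X and Y: <X, Y> / (||X|| * ||Y||)"
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def cosine_similarity_unit(a, b):
    # "...the inner product of the same vectors normalized to both have length 1"
    na = math.hypot(*a)  # length of a
    nb = math.hypot(*b)  # length of b
    return sum((x / na) * (y / nb) for x, y in zip(a, b))
```

Both return 1.0 for parallel vectors, 0.0 for orthogonal ones, and agree to floating-point precision on everything in between, which is exactly the point: the underlying quantity is fixed even though the English descriptions of it vary.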
> (I agree that the creativity level is quite low for some of these, and I believe several people might come up with the same description, but that's a different issue than re-using someone else's definition without attribution.)
It's funny, because I actually find your examples to be supporting my point more than yours.
All of these sentences are translations into plain English of the absolutely, perfectly defined and commonly accepted definition of cosine similarity. SciKit-learn is even just writing the formula.
uchicago.edu and datadriveninvestor.com even use exactly the same words. I mean, if you came to see me complaining that someone was plagiarising for writing "For vectors, cosine similarity is the cosine of the angle between those vectors.", I would find that laughable.
Yes, I deliberately chose something that is just above trivially simple, to show that there was diversity of expression even at that level.
That is, even given a technical term with a precise agreed upon definition, the description of that term (eg, in English) does not have a precise agreed-upon form.
Incorrect use of the latter may imply plagiarism, and this thread appears to concern that aspect.
Most definitions are not as simple as "cosine similarity".
Your statement, if true, would mean that most dictionaries would use exactly the same words to describe a given, well-specified scientific concept, yes?
What's "Frame dragging" in general relativity?
Wikipedia: the effect on spacetime caused by a rotating mass ... Frame-dragging is an effect on spacetime, predicted by Albert Einstein's general theory of relativity, that is due to non-static stationary distributions of mass–energy.
doi:10.1126/science.aax7007 : the mass-energy current of a rotating body induces a gravitomagnetic field, so-called because it has formal similarities with the magnetic field generated by an electric current (1). This gravitomagnetic interaction drags inertial frames in the vicinity of a rotating mass. (quoting from the preprint at https://arxiv.org/abs/2001.11405 ).
einstein-online.info: a mass’s rotation influences the motion of objects in its neighbourhood
doi:10.3390/universe7020027 : The term "frame-dragging" usually refers to the influence of a rotating massive body on a gyroscope by producing vorticity in the congruence of world-lines of observers outside the rotating object.
doi:10.1142/9789812564818_0002 : A major consequence of General Relativity and related theories of gravity is that all inertial frames are local. These local frames are accelerated, warped and stretched, and rotated with respect to each other due to the surrounding mass-energy distributions. While only their relative rotations are typically called frame dragging effects, this phrase describes a broader range of gravitational influences on inertia.
Very different definitions, because it's hard to express that concept in English. And I think re-using a few of the more extensive definitions, without attribution, is a minor form of plagiarism. (Reusing any follow-up explanation is, as you've agreed, definitely plagiarism.)
You might recall that atrettel wrote "the authors were misusing a particular technical term".
If you dig into the papers on frame dragging, you'll note similar complaints, like https://arxiv.org/abs/gr-qc/0509025 : "Many accounts of these experiments have been in terms of frame-dragging. We point out that this terminology has given rise to much confusion and that a better description is in terms of spin-orbit and spin-spin effects."
> I would find that laughable
So would I, which is why I commented 'that's a different issue than re-using someone else's definition without attribution'.
> Your statement, if true, would mean that most dictionaries would use exactly the same words to describe a given, well-specified scientific concept, yes?
No, it definitely doesn't, unless you stretch what I said in a very uncharitable way to reach that point.
> Most definitions are not as simple as "cosine similarity".
But plenty are. As I said previously, you are going to be hard pressed to establish plagiarism on pure definitions unless you go towards extensive, paragraph-long ones, which are more akin to explanations than to what I would call a definition, as you did in the comment I am replying to. Actually, if you reread what you just wrote, you are yourself using the word "explanation", and I definitely agree that explanations can be plagiarised, as could very unorthodox and original forms of definition.
But when I read "definition", what comes to mind is something akin to the cosine similarity example, where even publications reuse mostly the same sentences while applying minor modifications to the subjects or adding an adverb. Hence my puzzlement at the close proximity of the words "definition" and "plagiarism" in the original comment, until I realized the whole thing was actually about an explanation, which triggered my reply to someone who shared my initial confusion.
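For what it's worth, the cosine similarity definition really is compact enough that independent write-ups converge on near-identical wording; the whole concept fits in a few lines (a generic sketch, not taken from any particular paper):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 1]))  # ≈ 0.7071
```

There just aren't many distinct ways to phrase "the dot product divided by the product of the norms", which is the point: overlap on a definition this short proves nothing.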
> Is your view that re-use of those definitions cannot be plagiarism? If so, why not?
Could you stop pretending not to understand my point, considering your first example nicely underlines it and you yourself admitted it would be laughable to call that plagiarism?
I was never arguing that copying anything you might define, even tenuously, as a definition never constitutes plagiarism. That's a complete strawman. I'm going to stop wasting my time here.
I've been trying to argue that definitions can be plagiarized, with examples which are not "tenuous" but ones which are drawn directly from publications.
If you want to quote a dictionary's definition, you should cite the dictionary you're quoting. If you're passing the dictionary's definition off as your own, word for word, that is plagiarism.
I enjoyed finding "counterfeit consciousness" for artificial intelligence. To me it evokes a kind of science fiction that's shown up occasionally on HN[1].
It sounds like something you'd find in 30s, 40s, 50s sci-fi for sure! Like “visiplate” (E.E. “Doc” Smith, Heinlein) for a computer display screen. (Along with ticker tape printouts and tape reels in the far future of course.)
Really highlights that the actual phrases don't make any more sense than the tortured versions, other than the fact that we've been hearing them for years, so they now sound normal.
Reading this list, it almost seems like these were created by looking up each individual word in a thesaurus, which of course destroys much of the meaning.
E.g.,
Signal -> flag
To -> to
Noise -> clamor
…and…
Data -> information
Warehouse -> stockroom
This would be a lot easier than running through multiple translation steps (as proposed elsewhere here).
This exact process has been used by spammers for a long time now. It's called spinning, and it is basically the kind of thesaurus replacement you're describing here. When I read the OP, my impression was that these authors were running plagiarized portions of their articles through a similar kind of spinner.
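A naive spinner of the kind described can be sketched in a few lines; the synonym table below is invented for the example, while real spinners pull from full thesaurus databases (which is exactly what destroys the meaning):

```python
import re

# Toy synonym table for illustration only; real spinners use
# full thesaurus databases.
SYNONYMS = {
    "signal": "flag",
    "noise": "clamor",
    "data": "information",
    "warehouse": "stockroom",
    "random": "arbitrary",
    "forest": "timberland",
}

def spin(text):
    # Replace each word with its listed "synonym", word by word,
    # with no regard for the compound term it belongs to.
    def swap(match):
        word = match.group(0)
        return SYNONYMS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, text)

print(spin("signal to noise ratio"))  # flag to clamor ratio
print(spin("random forest"))          # arbitrary timberland
```

Because the substitution is per word, fixed technical phrases like "random forest" are torn apart exactly as in the article's examples.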
If I were in a situation where I had to write on occupational health and safety in forestry, I would shamelessly appropriate "mean square blunder" and "arbitrary timberland"; those are superbly above the mean square!
These sounded Chinese to me, so I threw this comment into Google Translate to figure out what the typical forward translations are and what the reverse translations of the tortured versions would be. Bingo. These are clearly translated from Chinese. Not sure whether they are some form of plagiarism, as clamored, or just the bad translation software prevalent in Chinese research communities, though.
From top to bottom: 信噪比 (signal-to-noise ratio), 個人數字助理 (personal digital assistant), 雲計算 (cloud computing), 數據倉庫 (data warehouse), 中央處理單元 (central processing unit), 語音識別 (speech recognition), MSE (均方誤差, mean squared error), 隨機訪問 (random access), 隨機森林 (random forest), 隨機值 (random value), 情感分析 (sentiment analysis)
This is not a machine translation issue. The deep-fried versions in Chinese make about as little sense as they do in English. Again, this is most likely caused by word-by-word thesaurus substitution done in English.
Thanks for pulling those out. I enjoyed them. It reminds me that even before AI we had these kinds of tortured phrases in, for example, product descriptions of Chinese products. As a child I was given a clone Rubik's cube labelled the "Turrible Tetrahedron".
I'm very tempted to introduce tortured phrases at work for occasional humor. For example, who needs "continuous integration" when you have "ceaseless incorporation"? Sometimes it's nice to see if anyone reads my notes.
In all seriousness though, I've experienced something similar before, at a Japanese-run American corporation as far back as the 90's. The problem was that Japanese executives and executive assistants who didn't know American tech jargon often accepted mangled suggestions from the spell-checker. A notorious example was the "Data Whorehousing" presentation, which somehow made it through several reviews and rehearsals before being presented to the entire American IT department at an all-hands meeting.
Clearly this made an impact as I remember it 23(ish) years later!
I often wonder while reading an academic paper how the writing could be as hopelessly bad as it is.
This type of manipulation and plagiarism may be partially to blame, but the academic writing style has also gone completely off the rails to the point that half the journal articles being published today read as if written by some kind of paper writing AI robot even when I am quite certain that that isn't the case. And no, I am not talking about cases where the author is writing in a non-native language.
I have a theory that it may have to do with imposter syndrome and a need to sound smart. The author, fearing that they don't really belong and will be found out at any moment, never making tenure, starts jamming academic-sounding words where they don't belong and stretching sentences with commas and semicolons until the whole thing is just as insufferable to read as it was to write.
There is also the possibility that there are just a lot of terrible writers out there.
I am sure this was not your intention or meaning, but please be aware that it is virtually impossible for a non-native speaker to write perfect English. English is a language you have to intuit. In contrast to other languages, it has very few fixed rules. Writing elegantly in English is most certainly an art form.
Of course, writing good science is hard enough for native speakers. It is very difficult for the vast majority of people on the planet - no matter how good their research.
And just so we are clear: Not everyone can afford professional editing services at every point in their career.
We meet in English under the premise that it allows for universal communication.
In this, we accept that English natives are almost infinitely more privileged in writing, speaking, conferencing and networking. We also have to accept that levels of English proficiency vary, and that English, especially, is easy to learn but difficult to master.
> it is virtually impossible for a non-native speaker to write perfect English. English is a language you have to intuit. In contrast to other languages, it has very few fixed rules. Writing elegantly in English is most certainly an art form.
Learning to write well in any language is difficult. English is not exceptional as a language. Its influence in economic activity is what gives it prevalence.
Oh, that is certainly not true. English is a fuzzy merchant pidgin, but that's precisely what makes it almost trivial to learn. Plus, the CIA in its infinite wisdom has seen fit to encourage the production of a deluge of entertainment media, making it easier to immerse oneself in English content than in any other language.
The issue here is not bad English in the sense you'd expect from a learner or someone who just isn't fluent. Nobody minds that, although you say not everyone can afford professional editing services: that's what journals are theoretically for!
The actual problem here is fluent English that is written in a totally bizarre style only found in academic papers. I've found that academic-ese is less of a problem in good computer science papers (like the one this article is about), but it crops up in some fields a lot. A trivial and not very important example is the way minor things are routinely described as "novel", a word you rarely find in everyday English, but in the research literature everything is "novel".
There used to be bad writing contests for academics. One of the famous winners was Judith Butler's timeless[1]:
The move from a structuralist account in which capital is understood to structure social relations in relatively homologous ways to a view of hegemony in which power relations are subject to repetition, convergence, and rearticulation brought the question of temporality into the thinking of structure, and marked a shift from a form of Althusserian theory that takes structural totalities as theoretical objects to one in which the insights into the contingent possibility of structure inaugurate a renewed conception of hegemony as bound up with the contingent sites and strategies of the rearticulation of power.
A friend submitted a paper to a journal in humanities. The reviewer said "his English is informal". In other words, these reviewers are asking for stilted English.
I also get this feedback on my papers. E.g. saying that it's written "more like a blog post".
Of course, they're not wrong. It is written more like a blog post. Because the writing style used in blog posts is hands down better than the writing style used in scientific papers. Blogs talk about the real reasons you worked on something, they go through simple examples, and they mention where you struggled and what you found confusing and what you tried that didn't work. All of these things are very useful for understanding, and in my experience almost entirely lacking from papers. Or at least, in my experience they're lacking from modern papers. I think in papers from 100 years ago the authors tended to talk more about their worries and their excitement e.g. [1].
"There is also the possibility that there are just a lot of terrible writers out there. "
Surely there are, and writing in a way that is easy to read and understand is an art in itself.
But I would agree that the main reason is probably the intention to sound smarter than one is. Whole scientific disciplines seem to live by that standard.
This is not limited to science, though. I recall that a German poet (I think Heinrich Heine) said of his fellow poets:
You only fly so high like the swallow, that no one can actually hear your singing.
Good essay by Orwell that touches on this sort of thing https://www.orwellfoundation.com/the-orwell-foundation/orwel... I used to be guilty of writing this way and one of my high school English teachers recommended I read it. I've tried to take the message to heart ever since.
This is anecdotal evidence at best, but it is worth considering. I know of several individuals who were able to complete their entire Master's thesis using a combination of AI-generated content (GPT-3) and a paraphrasing tool.
The generated text was well over 50 pages, completely bypassed all known content/plagiarism checks, and was even included in the university's "exemplary examples". To this day, it is still there.
This is of significant concern, as some of these GPT-3-based tools are now integrated within MS Word itself. Word 2021 allows for "add-ons", among which I have noticed several third-party content-generation and paraphrasing tools.
I really doubt you can computer-generate a Master's thesis. Completing a Master's thesis at an accredited institution is a heck of a lot of work, and even a cursory reading of a thesis by an examiner, supervisor, opponent, or other interested party would give the generated content away. Maybe if you got your degree from a diploma mill you could get away with it, but then your degree wouldn't be worth toilet paper anyway.
I've heard similar stories about generated PhD theses, and they are even more implausible. The reason is that writing a thesis is much more than just producing a hundred pages or so of prose. Any university student can poop that out in a few weeks. The main job of a thesis is coming up with a research question, conducting an experiment or a study, and describing the results and how they fit into whatever niche of the scientific world you are working in.
I agree that in most cases it would be very difficult to do. But I can imagine some specific circumstances where it could be pulled off, possibly with some manual modifications: soft sciences like sociology (you can't imagine the amount of bs I've read during my college years), the subject matter being very different from the area your supervising prof specializes in, the topic that allows for arbitrary speculation, an underfunded university branch with profs having a more lax attitude.
That is different. People who submit computer-generated papers submit them to hundreds of journals. A few of them are bound to have so lax editorial standards that they are let through. They also risk nothing, while the student who is caught computer-generating their thesis will be thrown out. Furthermore, the amount of peer review-like process a thesis or dissertation goes through is an order of magnitude greater than the amount of peer review an article gets.
How are dissertations getting to you like that? When I did my PhD, no one would have allowed a PhD student to start writing a dissertation without first having sufficient research questions and then completing appropriate statistical analyses.
Please include a link to these theses, because as it stands this anecdote sounds extremely implausible. I don't know what university you were at, but I've been at a few in Europe, and at every one of them Master's theses were evaluated from start to finish by several humans. GPT-3 is unable to produce even two pages of coherent text, let alone 50 pages good enough to be accepted as a Master's thesis in any discipline at any university I can think of (even the worst ones).
I can imagine that plagiarists use paraphrasing software quite extensively, though, and that it is a problem.
It was not all automated; a fair bit of manual intervention was needed. I understand your concerns, and they are valid, which is why I prefaced my statement with "anecdotal evidence". What I wrote is most certainly not the entire story, and a fair bit of detail has been left out.
It should be known that this is widespread across multiple industries and this will only become more of an issue in the future.
I don't doubt this at all, and I have no doubt that GPT-3 with a bit of human editing can spit out something better than the lower third of masters students at corn row colleges.
Masters degrees are cash cows, which is why no one in unregulated industries cares about them. People in regulated/unionized industries also don't actually care; even educators, who at least nominally see intrinsic value in education, go to borderline diploma mills to get that union-mandated raise at minimal effort.
> Masters degrees are cash cows, which is why no one in unregulated industries cares about them. People in regulated/unionized industries also don't actually care; even educators, who at least nominally see intrinsic value in education, go to borderline diploma mills to get that union-mandated raise at minimal effort.
I don't mean this rudely, but it is attitudes like this which cause the CS interviewing process to be 100X more painful than the interviewing process in any other field: "I don't trust your credential so I demand you prove your competence to me on the spot and let's do 5 rounds of interviews just to be sure."
I've found very limited correlation between credentials and relevant skills. In CS, the actually important skills are often self-taught or gained through experience.
But that happens because of the common experience interviewers have of interviewing someone with a PhD or (even worse) various corporate credentials and discovering they can't actually create and compile a program that loops over an array.
The name originated as a pejorative for small, tuition-dependent, non-research teaching colleges. Those colleges mostly catered to pastors, teachers, etc., and were located in small towns. The historical reasons that these institutions are now "in the corn fields" provide an interesting topic for historical inquiry. Perhaps many are in old railroad or factory towns that have since languished, while schools that were similar at the time of founding and didn't die are in industrial and post-industrial hubs where they attracted the attention needed to thrive. Who knows. The point is that they are small, inconsequential institutions that are predominately located in rural and semi-rural towns.
The name now includes small state schools -- usually branch campuses with lower enrollment and no major (R1) research output.
(NB: corn row colleges are also by definition non-elite, so small liberal arts colleges with billion dollar endowments which might otherwise count, don't).
Many such institutions have since started offering graduate (or at least non-bachelors) degrees and certificates that are somehow even more worthless than their undergraduate programs.
Apparently the name has a lot of different meanings these days -- see sibling comments -- but it has DEFINITELY never been meant as a racial pejorative. If anything, exactly the opposite, since most of those "crap-tier midwestern/southern colleges" cater to 99.99% WASP social networks (the P is even explicit).
As I understand it, the Northwest Ordinance provided for the funding of schools, resulting in the "land grant" college system that still exists today. The most well known land grant colleges are of course the "flagship" universities of the Midwest states. But the states also chartered many smaller, regional, and specialized schools.
I live in Wisconsin, and the state university system is chartered to serve the needs of the state. There are too many students to send them all to UW in Madison, so there are a number of smaller regional universities, many of which now offer graduate degrees, plus an even larger number of "commuter" and "satellite" schools, and an elaborate technical college and trade school system. Not everybody can get a degree at a residential college. Life gets in the way.
We can debate the relative prestige of these colleges, but I've worked with people who attended the regional schools, including many engineers and computer programmers. All I can say is, send me more.
The colleges that catered to pastors were largely private, and in my home state, there was one in every town. Some of them emerged as full service 4-year colleges with additional programs. My undergraduate college was nominally "Christian" but I got a secular science education there, and it was well ranked in science. It also adjoined a seminary where I never set foot.
Yes. As far as I understand, "corn row colleges" wasn't originally a reference to branch campuses of state schools. "corn row college" referred exactly to those tiny private places.
I agree with your assessment that most of Wisconsin's land grants are quite good, btw. YMMV in other states, unfortunately.
> The point is that they are small, inconsequential institutions that are predominately located in rural and semi-rural towns.
Wow, coastal elitism much?
There are surely many degree mills and garbage universities, but to conflate their worth with their location is both injurious to discourse, and incorrect.
I (non-American, not a native English speaker) thought it was a pejorative reference to rural universities; ("hick" / "rube") state universities of Midwestern states etc.
The hairstyle is named after its resemblance to the agricultural arrangement, not the other way 'round, and the name of the hairstyle only makes sense if you're aware of how fields are planted. You have to look really hard to make a racial slur out of it.
> Masters degrees are cash cows, which is why no one in unregulated industries cares about them. People in regulated/unionized industries also don't actually care...
People don't care about master's degrees in engineering, law, business, art, etc., etc.? Try applying for many jobs without one, or with one from a lower-ranking college.
The Chronicle of Higher Education article recently on the HN front page said that master's degrees in some fields, they give the example of 'positive psychology', are indeed cash cows. But in that example, the degree was not part of the actual Department of Psychology, which is taken very seriously.
Law degrees are called Juris Doctor but are professional degrees, like MBAs. You aren't required to publish original research (afaik) and in the US they were formerly Bachelor of Laws (LL.B.) and then renamed (as I understand it).
The doctorate is Doctor of Juridical Science (J.S.D.). You can also get a Master of Law (LL.M.).
No one cares about MBAs. The networks can be helpful, but, unlike JDs/PharmDs/etc., an MBA from a no-name college with a weak alumni network isn't worth the paper it's printed on.
Depends. University of Phoenix awards doctorates that take 3-4 years (a HUGE red flag -- the best and brightest PhD students might get out in 4 years if everything goes perfectly; an "expected time to graduation" of anything less than 5 years almost certainly means a worthless degree).
Those doctorates don't require much more than taking some coursework and paying a boatload in tuition. Basically an expensive and lengthy online master's program. Not worth the paper they're printed on, unless you're employed by the government or in a union job that mandates raises for educational attainment.
As a general rule of thumb, PhDs from R1 universities that are paid for by the university through research or teaching assistantships are generally a good signal of at least minimal training in research skills.
Another good rule of thumb is that paying for a PhD -- beyond perhaps some MD/PhDs or maybe nursing PhDs, stuff like that -- is a reliable sign of someone who has both a meaningless degree and poor reasoning/research skills.
But anyways, real doctorates outside of a few fields (e.g., pure math) usually come with a non-trivial publication record that speaks for itself. You don't even need to know that the person has a doctorate; you can just read their papers and a rec letter from an advisor describing the student's role in each paper.
(I'm excluding discussion of professional degrees like JDs, PharmDs, etc. which are technically doctorates but sort of their own class.)
Length of PhD programs is an indicator that should be considered in context. UK Universities, for example, often have research PhD programs that take 3 years to complete and they are legitimate.
NB: I'm only speaking about the math doctorate as it currently stands in the United States.
Due to the current market saturation of math doctorates, any pure mathematics PhD worth the paper it's printed on will also probably come with a non-trivial publication record. The exceptions I can think of are high-risk high-reward areas like cutting-edge number theory (I had a friend go eight years without publishing, which, yikes, but his thesis was semi-revolutionary (or so I'm told)) or, I guess, suitably abstract category theory (though the people I follow in this area seem to publish lots of interesting papers, like the Baez school or the homotopy type theory people; your mileage may vary).
It's really too bad. One wonders why we can't simply ax the entire advisor-candidate system (with all its myriad opportunities for physical, emotional, and even sexual abuse) and certify new candidates by saying: "You're a doctor of mathematics when you get five professors to sign off on 3-5 papers you've had published."
> certify new candidates by saying: "You're a doctor of mathematics when you get five professors to sign off on 3-5 papers you've had published."
Or one big one.
Basically, take the "honorary doctorates" some universities give out retrospectively to people who have made major contributions to their fields; do it more often; and then make it the only path to getting a doctorate, such that they're no longer "honorary" at all.
> I know of several individuals who were able to complete their Master’s thesis utilizing…
Doesn’t it stay published forever? It might be a shame for someone later in their career.
On the other hand, even a chapter of Mein Kampf was accepted in 20 journals after replacing old words with newer versions. Human review is hard. Maybe we should put computers in charge of reviewing papers; they’d recognize the work of an AI quicker?
> I know of several individuals who were able to complete their entire Master's thesis utilizing a combination of AI generated content (GPT-3) and a paraphrasing tool.
I searched the keyword "SEO" and didn't find any match in the comments here, I'm surprised.
Anyone who has been a webmaster will immediately recognize this as an extremely common technique in the blackhat SEO scene, used by content farms everywhere for decades. One just copies articles from somewhere else, replaces all the words with dictionary synonyms to evade the search-engine penalty, and fills the resulting websites with spam.
Perhaps it's not as popular in the English-speaking world, but it's common in China, and it's a standard tool included in all blackhat SEO software. And no, it doesn't work well; the output is gibberish too, in spite of the language differences. Oh, and the article says:
> A high proportion of these papers came from authors in China.
Exactly what I expected. The spammers found a new market, apparently. It's sad to see that some scientific papers and journals are literally becoming blackhat SEO spam and content farms.
I noticed this happening in other areas a few years ago, but with faked blogs. The titles and subjects would sound interesting, but then when you tried to read them, you'd need a specialized decoder to get through the utterly baffling word replacements. But they already got their ad revenue by the time you notice the article is complete gibberish.
The first one I found was about dog illnesses. They kept referring to dogs with phrases like "Your domesticated canine," and it was quite a chore trying to figure out most of the symptoms that they were listing. "Heart worms" was translated to "love snakes," which I thought was delightful.
Yes, this may be a specific example of a more widespread phenomenon. There's certain websites out there that republish articles from well-established publications (e.g., New York Times) almost word for word, except that they are rife with synonym swaps that may or may not make sense in context, presumably to escape some kind of automated copy detection. Results can be amusing. For example, the copied article said "“Drukqs” acquired a blended essential response..." where the original said "“Drukqs” received a mixed critical response...".
I've seen this kind of things in articles posted to social media sites (Quora and Facebook, but it probably exists elsewhere).
I don't have a specific example at hand, but it's typically an article a few paragraphs long with really strange phrasing, so strange that it's not explainable by the author not knowing English well.
In a handful of cases, I've managed to find the original source. Common phrases are systematically replaced by ill-fitting synonyms.
I suspect the motivation is to avoid accusations of plagiarism (though I don't know what benefit the posters derive from doing this).
Nowadays too many real blogs are padded with weird phrasing and sentences which don't really mean anything.
In this case, sometimes you get lucky and can actually find meaningful information between the padding. But sometimes you just read an article that takes 5 paragraphs and 500 words to say "we don't know".
I ran into something like this in an Amazon review once. I was looking for a book of transcriptions for the instrument I play, and two of the handful of reviews used the same awkward phrase: "music goals". I scratched my head and then realized what had probably happened. They weren't native English speakers, they were being paid to write reviews, and they had picked the wrong synonym. "music goals" was supposed to be "music scores".
Back in 2004 or so, I was building a distributed CMS with the goal of creating artificial "link pyramids" with the purpose of SEO, which was a rather new thing at the time.
Content generation was one of our bottlenecks, and as Google was already rather successful at detecting duplicate content, we were looking for a way to "uniqify" posts that would be used to stuff sites intended for googlebot, but not humans.
One of the methods that worked was taking the source English content, running it through Babelfish (the AltaVista translator) into French, Spanish, or German, and then using the same method to translate it back to English.
This resulted in texts that did not make much sense to humans, were full of precisely such "tortured phrases" but which were considered unique by Google.
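The round-trip trick can be sketched like this; `translate` here is a stand-in for whatever machine-translation service is available (the original pipeline used Babelfish), and the toy word-level translator below exists only to demonstrate the mechanics of the information loss:

```python
def round_trip(text, translate, pivot="fr"):
    # English -> pivot language -> English; a lossy translator yields
    # text that looks "unique" to a duplicate-content detector.
    return translate(translate(text, src="en", dst=pivot), src=pivot, dst="en")

# Toy stand-in translator: word-level dictionaries that deliberately
# lose information on the way back, mimicking real MT of the era.
EN_FR = {"data": "données", "warehouse": "entrepôt"}
FR_EN = {"données": "information", "entrepôt": "stockroom"}

def toy_translate(text, src, dst):
    table = EN_FR if (src, dst) == ("en", "fr") else FR_EN
    return " ".join(table.get(word, word) for word in text.split())

print(round_trip("data warehouse", toy_translate))  # information stockroom
```

The output diverges from the input word by word, which is exactly why a shingling-based duplicate detector stops matching it against the source.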
Authorship is the metric that scientists get paid for, so of course it has been thoroughly corrupted.
Fake papers and plagiarism are the most blatant form of corruption. They tend to come from certain, let's say, large countries with less developed scientific cultures. Those countries need to put an end to it, because the rest of us keep having to work harder to suppress the racist impressions that we're bound to form of colleagues who look and sound like the cheats.
In more traditional scientific countries, the corruption is more subtle. Today, many groups publish every paper with half a dozen authors, and no indication of what each of them contributed. This enables the professors who run those groups to manipulate authorship more or less as they please, and have total control over who gets to have a career in science. It turns out that absolute power corrupts senior scientists as absolutely as it does other people.
No doubt there are more clever ways to game the system, that I haven't noticed. As long as million dollar grants and first-world citizenship keep being doled out for something as contrived as scientific paper authorship, corruption is inevitable.
And the journal involved, Microprocessors and Microsystems, is an Elsevier journal. Huge surprise. I am glad the publisher earns their outrageous fees by careful screening, peer-review, and editing of submitted manuscripts. /s
> [...] the editor of Microprocessors and Microsystems began having concerns about the integrity and rigour of peer review for papers that had been published in some of the journal’s special issues.
> The journal’s publisher, Elsevier, launched an investigation. This is still under way, but in mid-July the publisher added expressions of concern to more than 400 papers that appeared across six special issues of the journal.
I hate to open up this topic, and I hate to pick on people who are trying to fix their mistake even more, but oh boy. Elsevier has been a pain in the butt for universities and researchers alike. They leech money from both sides of the community, they sue people trying to bring science forward, and they gatekeep scientific success. And literally their only reason to keep existing was to prevent exactly this.
I've never been a big fan of the current scientific publishing model. But Elsevier is a top publisher. It's pretty damning that they have one - highly overpaid - job and they don't even do it.
A high profile case (on the internet) similar to the one described in the article is when Siraj Raval plagiarized a paper on quantum ML and made some amusing replacement phrases:
complex Hilbert space -> Complicated Hilbert space
I was reading an article on Nature and noticed their definition of "woman" didn't make sense. Tortured phrases isn't confined to plagiarism avoidance.
> Unfortunately, fibroids are just one of many understudied aspects of health in people assigned female at birth. (This includes cisgender women, transgender men and some non-binary and intersex people; the term ‘women’ in the rest of this editorial refers to cis women.)
The article says "women" refers only to "cis women" (presumably "cisgender women"); however, it goes on to talk about rugby and brains, in which case the word "woman" would apply not only to cisgender women but also to people who identify as non-binary or as transgender men, since surgery and hormone therapy (if undertaken by the individual) won't change brain axons or the person's physical stature.
The article then talks about "male animals", not "animals assigned male at birth". There's no explanation given why animals are not similarly "assigned" a sex.
AFAIK it has now become necessary to "disguise plagiarism" even when you are not plagiarizing anything, because bullshit "anti-plagiarism" software will flag many phrases as similar to what somebody else has already used. I believe the war on plagiarism brings little good in exchange for the hassle.
> In our strong opinion, the root of the problems discussed in this work is the notorious publish or perish atmosphere (Garfield, 1996) affecting both authors and publishers. This leads to blind counting and fuels production of uninteresting (and even nonsensical) publications.
Here's "Microprocessors and Microsystems."[1] This is supposed to be about embedded systems, which is generally a no-bullshit field. I'd never heard of this journal. People read Electronic Design, EE Times, "Embedded.com", maybe Control Systems Journal, etc. Those have either articles about how to do something, or "why what we're selling is great" articles.
Now look at the article titles in Microprocessors and Microsystems.[2] Here are the first three.
- COPS: A complete oblivious processing system
- A perceptron-based replication scheme for managing the shared last level cache
- Efficient underdetermined speech signal separation using encompassed Hammersley-Clifford algorithm and hardware implementation
Now those might be legitimate, although what they're doing in an embedded systems journal isn't clear. They're all behind a paywall, so it's hard to tell if they're any good.
"Oblivious processing" is a security concept. That belongs in a journal on security and encryption, where the crypto people will know what holes to look for. (Microsoft was doing work in this area in 2013, but I don't think a product emerged. If you can make it work, some cloud computing company can use it.)
Cache management belongs in a journal on CPU design, where people who have struggled to make caches work will take a look. There are people using perceptrons for this, which makes sense; a cache has to guess which things will be reused. (If this works well, someone should be trying it in web caches such as NGINX to improve cache hit rates.)
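For reference, the perceptron idea in a cache predictor boils down to a weighted vote over simple feature bits. Here is a toy sketch of that mechanism; the feature encoding and threshold are invented for illustration, not a reconstruction of any specific published scheme:

```python
# Toy perceptron reuse predictor, in the spirit of perceptron-based
# cache work. All specifics (features, threshold) are illustrative.
class ReusePerceptron:
    def __init__(self, n_features, threshold=1):
        self.w = [0] * n_features
        self.threshold = threshold

    def predict(self, features):
        # features: 0/1 indicators derived from things like PC bits
        # or address bits; a high score means "likely to be reused".
        score = sum(w * f for w, f in zip(self.w, features))
        return score >= self.threshold

    def train(self, features, reused):
        # On a mispredict, nudge each active weight toward the
        # observed outcome (reused or not).
        if self.predict(features) != reused:
            delta = 1 if reused else -1
            self.w = [w + delta * f for w, f in zip(self.w, features)]

p = ReusePerceptron(4)
p.train([1, 0, 1, 0], reused=True)
print(p.predict([1, 0, 1, 0]))  # -> True
```

Real hardware versions use saturating integer weight tables indexed by hashed features, but the guess-then-correct loop is the same.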
Signal separation is an active field, but this isn't a journal where you'd expect to find articles on it. Wikipedia has a good article on signal separation. The history of that article indicates attempts to sneak in citations to sketchy articles. No idea if the Hammersley-Clifford algorithm is even relevant. (If it's a significant advance, there's commercial value in this in improving audio quality for conferencing systems.)
So these papers were all sent to a journal where the odds of getting published are good, and the odds that the editors have no idea about the subject matter are high.
As someone who has had to write technically in a second-language (French, funding agencies in Quebec), this rings particularly true.
Luckily, I'm fluent enough to recognise the particularly egregious examples, but finding good translations for technical words is hard!
One example that comes to mind is when trying to translate the phrase "data feed" which came back as "alimentation données" which ostensibly means "animal feed data".
If you're looking for a lot of English-to-French translations of technical terms, check out the theses from any English university in Quebec (McGill, Concordia, etc.). They're made public online [0]. I can't vouch for the quality, as I'm sure plenty just use Google Translate, but everyone I know has their abstract edited by a francophone in their field.
A good way to validate translated technical terms is to just give them a quick internet search on e.g. DuckDuckGo or Semanticscholar.
Maybe a future direction would be to train new models to identify plagiarism on this kind of information: use non-matching backtranslations to train classifiers. It's again the typical cat-and-mouse game, I guess.
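Even before training any model, a plain fingerprint lookup catches the known cases; a classifier would generalize beyond the list. A toy sketch of that lookup baseline, where the phrase list is a small illustrative sample drawn from examples in this thread, not the real curated list:

```python
# Minimal fingerprint-based detector sketch. The phrase table is
# illustrative; a real tool would maintain a long curated list.
TORTURED = {
    "counterfeit consciousness": "artificial intelligence",
    "profound neural organization": "deep neural network",
    "colossal information": "big data",
    "haze figuring": "fog computing",
}

def flag_tortured(text):
    """Return (tortured phrase, likely original) pairs found in text."""
    lower = text.lower()
    return [(t, orig) for t, orig in TORTURED.items() if t in lower]

hits = flag_tortured(
    "We apply counterfeit consciousness to colossal information."
)
print(hits)
```

The obvious next move in the cat-and-mouse game is for spinners to avoid the published fingerprints, which is where a learned classifier on backtranslations could pick up the slack.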
It's the classical problem of people trying to find technological solutions to social problems. If plagiarism and fake research is still a problem after we've applied technology to fight it, clearly we haven't applied enough of it.
Sometimes technological solutions work really well to solve social problems. For example, at one point, one person using the internet would tie up the phone line for everyone else in the house, and vice versa. Negotiating this shared resource could be considered a household social problem. But now there's no such interference, and most people have their own cell phones.
This is a social problem around the shared use of a technological resource. I'm reminded of the old saying, "computers can only solve problems that are created with computers".
But then again you can view _all_ solutions to social problems as inherently technological in the broader sense; I adhere to that paradigm.
The number of papers being published is growing at a staggering rate. This requires proportional growth in the number of people reading these papers, which inevitably means the plagiarists and cheaters themselves are being pulled into the review system as well. They don’t care about letting fraudulent papers slip through because they never really cared about the science in the first place.
They see it as a game that they’re playing and they’re doing their best to put as little effort as possible into the game while extracting as much reputation upside as they can.
We really need to make publishing fraudulent papers a career-ending move across academia and even the industry. The only reason this continues to happen is because it has a lot of upside but very little downside. Caught publishing fraudulent papers? Oh well, just leave them off your resume and apply somewhere else.
Referees have no real incentive to keep quality high. They already don't get anything in return for doing it. (At best they do it for reciprocity/goodwill.) Papers are usually hard to follow, replication rate is abysmal, etc. The incentives are all set for publishing, not for making real progress.
I've been seeing this in news articles as well. Swipe someone else's article, run it through a synonym-replacer algorithm, and have Reddit bots post it on a bunch of news subs. Presumably the thesaurus work fools Google's just-a-copy detector.
It's the next step in clickbait monetization. Why settle for low-effort content when you can have no-effort content?
Don't have any of the actual content handy, but here's an online tool that advertises itself for that specific purpose: https://spinbot.com/ Google "rewriting tool" for more examples. Apparently it's a lot more common than I'd realized.
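The core of such a rewriting tool is trivially small, which is part of why this is so common. A toy sketch, where the synonym table is invented for illustration (real spinners use large thesauri and part-of-speech tagging, but the principle is the same):

```python
import re

# Invented toy thesaurus for illustration only.
SYNONYMS = {
    "big": "colossal",
    "data": "information",
    "artificial": "counterfeit",
    "intelligence": "consciousness",
    "deep": "profound",
    "network": "organization",
}

def spin(text):
    """Replace each known word with a 'synonym', producing exactly
    the kind of tortured phrases the article describes."""
    def sub(match):
        word = match.group(0)
        return SYNONYMS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", sub, text)

print(spin("big data"))                 # -> colossal information
print(spin("artificial intelligence"))  # -> counterfeit consciousness
```

Word-by-word substitution like this is also why the output is detectable: fixed technical collocations get mangled in predictable ways.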
> I feel like only the highest profile journals can be trusted at this point.
The highest profile journals (Nature, Science, The Lancet in medicine, ...) have some tendency to go for sensationalism. They want to publish radical, ground-breaking research more than there is actual new ground-breaking results happening. So they also end up publishing mediocre research presented as ground-breaking, and some less-than-accurate research where results are exaggerated to make them look ground-breaking.
Um, yes. "Nature" used to have a great reputation. Supposedly it still does in bio. But battery articles in Nature are just awful. They keep blowing up "minor advance in surface chemistry" into "10x better battery that costs 10x less Real Soon Now".
(I'd like to see EV World or something else in that space reprint old articles as "1, 5, and 10 years ago in battery hype".)
Yeah in my field the general attitude is that Nature isn't all that great. I have heard the phrase "it was published in Nature but might still be right" more than once.
My favorite counter example is "A Draft Sequence Of A Neandertal Genome". The article was accepted by both Nature and Science before it was written. The authors chose to publish in Science, because Science offered more on the side: the title page and an unlimited(!) number of "contributed" (this means unreviewed) companion papers. The article itself was about 20 pages of drivel; all the substantial content was relegated to the 200(!) pages of "Online Supplemental Material". Nobody ever read, let alone reviewed, all of that.
After that, I can't trust either Science or Nature, which offered pretty much the same crooked deal. If those two aren't "highest profile", who is?
Not even those! The impact factor of a journal is a terrible guide to quality. It is more appropriately thought of as a measure of scientific sex appeal.
You must read each paper to judge its merits. Lots of junk gets published in top ranked journals.
That's a low bar, though. The point is that it's very difficult to judge the scientific merits of a paper without actually reading it. (And even then, it's easy to be fooled.)
Lots of junk gets published in top ranked journals.
A lot more gets published in vanity journals, so I use the impact factor as a first-pass filter: I avoid papers from journals not listed in the JCR¹ or those with a factor below 1.000.
I assume, maybe naively, that if an important finding were to be published in such low quality journal, it would eventually get published in a more legit publication.
I thought top ranked journals have good reviewers, since the editorial board consists of researchers/professors from top notch schools. Can you share your thoughts on why junk gets published in such journals? Does it have to do with collusion, reputation-laundering, or something more?
I've a horrible premonition that the paper describing this problem (and those that cite it) may eventually end up being flagged for containing too many tortured phrases...
A colleague in an unnamed field, attending an unnamed Polish university mentioned that this kind of thing was rife: publishing Polish papers translated from English texts and occasionally vice versa.
Poland is a country with a strong academic tradition and similar enough institutions to others in the EU so I can only imagine this happens in more ‘peripheral’ countries with even less globalization.
In telecoms they call all the backend infrastructure "back haul", and I've never read a satisfactory explanation. I'm convinced that somebody once coined "back hall", intending to invoke the image of service passages like what you see in a mall; it was misheard (as is often the case in telecoms, given its global nature) and the metaphor of the bulldozer tail stuck forever after.
In freight, "backhaul" typically refers to transporting goods during the return journey. During the principal (non-return) journey, the starting location is often more central, like a distribution center, and the destination is a smaller satellite location like a store. So when something is backhauled, that tends to mean it's transported from the smaller satellite location to the central location.
IME I think backhaul is just sending data to the main internet, not all backend. I thought it just meant it hauled the data back into the core network.
Main Internet, across main Internet, between networks, intra-domain. Intra-station. Anything that joins it all together that isn’t “front facing” i.e wireless network towards handsets
I've heard that term used where an ISP is piggybacking on a larger service. Sonic.net offers some of their services over AT&T infrastructure. Data to and from home DSL lines is "backhauled" to Sonic HQ in Santa Rosa, CA and then goes out over the bulk Internet backbone from there. This is a different path than the data would take if handled entirely by AT&T.
If they are not retracted, they might get cited by other works which themselves might get cited. Suddenly this faked, nonexistent research has been "laundered" into mainstream and nobody knows anymore that there was a problem in the first place.
Apparently "haze figuring" is a tortured phrase but "fog computing" is a term of art, despite searches for the former returning many pages containing the latter. So maybe 'fog computing' is a hybrid.
It's long been fairly normal for formal paper writers (of any age, in any discipline) to try to 'jazz up' their lingo to make it sound more erudite/learned. (Apart from specific collegial shorthand like 'lacustrian' or 'normalization'.) Readability suffers, meaning is softened, euphemisms flourish, mistakes are made.
'Colossal information' in place of 'Big data' ... wow - so wrong.
It reminds me of an article about Canadian journal publishers being bought by a shady company (OMICS Group Inc.) so they can seemingly publish whatever they want.
It is the result of a misguided science system that relies mostly on external quality checks (peer-reviewed publication) and floods the world with so much "novelty" that there is no way to digest it. Up to now you could at least use the output to train language models: will machines now have to train themselves...
The people in question are paid for writing papers, not the discovery of true knowledge. The root cause is government funding of research, hence this specific paper talking at the end about the "publish or perish" culture. Governments give researchers money to write papers but then don't read the results, only check that it got published somewhere, meaning anyone who can game the publication system has in effect a license to print (taxpayer) money.
That's why there's no real fix for this problem beyond defunding government science budgets. Any quick hacks you can come up with like running GPT-2 detectors over science papers are just treating the symptoms of the problem, not the cause. The root cause is that when governments eliminate the free market by subsidizing research they lose reliable signals of genuine utility that come from polling the market, so they have to use proxies that boil down to quantity-over-quality. The fix is for them to stop subsidizing research. The people who apply research to create new technologies aren't reading it anyway.
I cannot understand it: those articles should have been carefully examined before publication - I understood them to carry the warranty of the publication's authority. But if anyone had actually read them, the rubbish involved would have emerged.
I get this with students a lot. Papers which have been copied from some website, but then they've gone through and altered a few bits of vocabulary to disguise it.
> Out of 404 papers accepted in less than 30 days after submission, 394 papers (97.5%) have authors with affiliations in (mainland) China. Out of 615 papers of which editorial processing time exceeded 40 days, 58 papers (9.5%) only have authors with affiliations in (mainland) China. This tenfold imbalance suggests a differentiated processing of papers affiliated to China characterised by shorter peer-review duration.
No mentions of tortured phrases in the humanities and softer social sciences? For all their supposed appreciation of les belles-lettres (viz., "fine writing") those researchers sure seem to like their tortured phrasings.
How so? It seems quite related to me. Anecdotally, one would expect a pretty clear negative correlation between torturedness in the sense of this article and indicators of research quality.
I discovered something similar a couple days ago. After googling the title of a paywalled article (that I didn't care enough about to actually pay to read), the closest thing to the unredacted article in the search results was a version that had clearly been automatically rewritten in this exact manner. It was barely decipherable, so I gave up immediately.
Unknown source -> stolen, machine-translated into English, and published -> stolen again, machine-translated into language X -> machine-translated back into English -> republished on another web site
Some people don't have English as their native language. When such people want to write a scientific article in English they will have to use someone who can write English but probably does not know much about the research. So of course there will be articles with "Tortured Phrases".
Did you even read the abstract of TFA? This should not even be a cultural-background issue. As an L2 English speaker myself, I have never once thought of throwing a thesaurus at some established phrase so I could turn “artificial intelligence” into “counterfeit consciousness”, or “deep neural network” into “profound neural organization”. These are deliberate uses of fancy words with no attempt to make sense.
Heck, we got a word for this sort of rampant plagiarism masking on Chinese internet — 洗稿 (manuscript (or blog post)-laundry).
OT: I do appreciate the funny phrase “elite figuring” for HPC. It’s kind of like how they translate things to Anglish.
People who don't speak English natively could use machine translation, and people plagiarizing could use machine translation. How do they distinguish (if you don't mind saving me digging into the research)?
Phrases like AI and big data are already pretty well defined in almost every major machine translation set. You'd have to forcefully try to thesaurus your way through to make it do that 99% of the time.
(We merged this thread and https://news.ycombinator.com/item?id=28108111)