I have encountered something similar to this for a submission that I reviewed for a scientific journal. I will not list any names or give much detail past those generalities, but I pointed out that the authors were misusing a particular technical term. In my review I defined the term and explained it briefly, and I asked the authors to revise their submission accordingly. The paper was not bad, but the authors did not know English very well, so it was quite difficult to read. That was its main problem. However, when I received the revised submission, I noticed that the authors had plagiarized my definition and explanation almost word for word (from my confidential review). I pointed this out to the editors, and they said to just reject the paper with the stated reason being plagiarism, which I did. The journal ended up rejecting the article, but I discovered it a few years later in a different journal. The plagiarized section remained, but the authors had swapped out a lot of my phrases for these kinds of "tortured phrases".
That said, the authors did not fabricate their research (as far as I can tell). They just did not know English well, so it was easier to just copy things that you know are phrased well than to learn to write English well. As the saying goes, do not attribute to malice what can be explained by ignorance or laziness. That does not excuse it but it makes it more understandable.
I agree with the article that this is probably just the tip of the iceberg. There are likely many more lesser evils being committed with similar tools that are just much more difficult to spot. I would not have noticed my particular example if I were not a reviewer for the paper, for example. It makes me wonder how big the problem really is.
As a native English speaker, but not an academic researcher, I would have naively done the same.
To preface: I am speaking in ignorance of the mechanisms of credit and advancement in your field, so I'm undoubtedly overly harsh. I'm not even trying to be fair, because I admittedly don't have the knowledge to do so.
I know academia is a different world from private sector industry, but I would think if the point of a confidential review is for you to do anonymous work to improve someone else's credited work, and their work was improved by incorporating your feedback, you would be happy with the outcome, or else why are you participating in a confidential review process in the first place? There are different mechanisms for publishing words that you want credit for.
When someone incorporates my feedback on code or documentation word-for-word, I might in the worst case be suspicious that they are trying to get my approval without engaging with my criticism, but in most cases I'm flattered that they respect my idea enough to put their name on it. Although, in my world, putting your name on something is more about responsibility than credit. The command is called "git blame" and not "git credit", after all.
Wanting them to incorporate your feedback, but also wanting them to make some change to the wording to avoid plagiarism, smacks of how freshman essays are graded, not how real work gets done.
Like I said, I'm not trying to be fair and don't have the right background to be fair to you. I only speak up because I grew up in an academic family and know that there is a presumption that academic work is more idealistic, more altruistic, and less mercenary than private sector work, and I think it's worth pointing out when the reverse is true.
I almost replied that I had the same view after reading your comment but before opening the article in question. After reading the first bit of the article, I think the GP is possibly saying something else when taken in context: that this is exactly the behavior that would be expected from someone who is plagiarizing something else in general using the technique from the article, and all the criticism did was help them produce a better plagiarized paper.
I can't speak for GP as to whether this was true, or if there are different norms for that sector, or the author was slightly aggressive in asserting their rights, or the assumptions we're making about the content of the criticism are off, but given the article brought a new dimension to it, I thought that worth mentioning.
> I would think if the point of a confidential review is for you to do anonymous work to improve someone else's credited work
This is one outcome of peer review, and is an explicit goal of (good) peer reviewers. But it's definitely not the main goal of peer review. The main goal of peer review is assessment. Improvement is something that you can strive for as a secondary outcome, but it's not the main point.
This is a significant and important contrast with code review. Peer review is NOT analogous to code review!
NB: The code review style of improvement-focused review also happens in academia! But it happens within groups rather than between groups. I.e., advisors or post-docs in single research group or university critiquing one another's work will behave more like a code review. Peer review is different.
The same is true in industry, btw. Think of peer review as the thing that happens when a regulator reviews the code from a medical device. They're not interested in making pull requests and doing your work for you. Although they do in principle want you to succeed in your goals, and might provide some feedback along those lines, they're making a yes/no decision. Different function.
> their work was improved by incorporating your feedback, you would be happy with the outcome
There are really two concerns here:
1. confidentiality, and
2. plagiarism.
The first item is probably more important than the second. The typical expectation is that reviews are anonymous and private unless stated otherwise (e.g. OpenReview). It's incredibly bad form to publish a private correspondence without first asking for permission.
Copying a reviewer verbatim without permission is plagiarism, but it's not really something that anyone actually cares about, per se. I've sometimes asked for permission to incorporate components of reviews verbatim into my papers, and never received anything except enthusiastic "of course". I assume the same would be true in the case above. But even in those cases I say something like: "as helpfully observed by an anonymous reviewer of this paper, [insert quote]".
The key point is that I don't go around publishing explicitly confidential correspondences without permission.
> or else why are you participating in a confidential review process in the first place?
Exactly. Confidential.
Again, surely you can see how publishing someone's confidential words is quite rude, even if the person wouldn't mind those words being published if simply asked.
> Like I said, I'm not trying to be fair and don't have the right background to be fair to you. I only speak up because I grew up in an academic family and know that there is a presumption that academic work is more idealistic, more altruistic, and less mercenary than private sector work, and I think it's worth pointing out when the reverse is true.
Academic work is all of those things in terms of goals, not necessarily in terms of process. I don't know anyone who has a passing familiarity with Academia and doesn't realize that it is incredibly competitive.
I guess I don't understand what kind of "confidentiality" was violated. To me a violation of confidentiality would be revealing your name and your role in the process, or publishing sensitive information they got via the process, maybe publishing some idea or data you shared with them that you intended to publish yourself later. But they didn't use your name, and you didn't share any sensitive information with them, so I don't get it.
Plagiarism I guess I can see, though the cynic in me says if it was so obvious that their native language wasn't English, it was in the best interest of readers for them not to do the dance of paraphrasing away the plagiarism.
> I don't know anyone who has a passing familiarity with Academia and doesn't realize that it is incredibly competitive.
Yeah, my family who are in academia talk all the time about how petty and cutthroat it is, and I'm sure that played a role in scaring me away from trying it myself, but I can tell they think business must somehow be worse. I get the feeling that deep down they believe they’re seeing a version of human behavior somewhat elevated by the ideals of academia, and however bad it is inside academia, outside, in environments ruled by cruder values, it must be worse.
In your review, did you suggest the definition and explanation that they used? In this situation, would an acknowledgment at the end have been enough? In my mind, it seems like you all had a conversation and the authors took up your suggestions as the reviewer.
No, I did not suggest the definition and explanation as content for them to use. I was trying to explain a concept that they discussed incorrectly multiple times in the paper. It is an advanced concept that might not even appear in graduate-level courses on the subject, so I can understand why they did not understand it fully. That said, I did not give them permission to copy my words there. If there are any particular changes I want the authors to make, I put them in quotes. This wasn't in quotes. It was an explanation for their own benefit, so that they could correct the mistakes in the paper (by re-writing it).
Once I re-read the submission I wanted to reject it immediately, but I realized that I should get a second opinion first. So I contacted the editors, who agreed that it was blatant plagiarism. Hence, they rejected the paper once I recommended rejection in my second review. So this wasn't just a conversation where I made some suggestions and the authors used them. Even the editors thought it was plagiarism once they looked at it.
An acknowledgment would be impossible because the review was single-blind. The reviewers knew the identities of the authors but not the other way around. What the authors should have done was just re-phrase where they used the term in the paper. They didn't even need to copy my explanation, to be frank. The paper would have worked fine without the paragraph they copied. If they had just re-phrased the relevant parts, no other changes would have been needed and this whole thing could have been avoided.
In the absence of an explicit directive or request from you, and given that the authors are from a different culture, how do you expect them to have known what was required of them?
I don't mean to be snarky or accusative. Your comment was thoughtful, articulate and detailed, which tells me you are a sophisticated communicator.
It's a fair question. They were foreigners submitting to an American journal, so there is always the possibility for some sort of cultural misunderstanding in addition to any language difficulties. Nonetheless, the journal's submission process provides authors with a page listing ethical standards they have to follow, and it says that plagiarism of any form is not allowed. In fact, this journal's particular set of standards even mentions that authors cannot copy anything obtained during the peer review process without the "explicit permission" of the reviewer. So I just expect them to follow the rules that they were told about when they submitted the paper.
So, I understand how it's plagiarism, but I'm still not following why your suggestion, with the goal of helping them get their paper accepted, wouldn't be acceptable to copy/paste. It was to them and only to them, so it's not like it's a piece of substantial work from another team. It seems to be an extreme form of following the letter of the rule, and not the spirit of the rule. But I'm not an academic, so I don't really understand this sort of lack of discretionary allowance.
I'm fully on board with fighting plagiarism down to that level.
But that said, I've oftentimes wondered if this requirement of having to "rewrite in your own words" may do a lot of harm too. It obfuscates the fact that the things people are talking about are actually exactly the same, or makes it fuzzy what the exact differences are.
In a particular academic CS area, I've witnessed people reproduce the essentially identical description of setting and assumptions again and again, but, being afraid of plagiarism accusations, they re-formulate things over and over, which makes it non-obvious that they are the same as in other authors' work, or even in their own earlier work.
My understanding is that something like the following happened:
1. Authors submit a paper with expository sections about (e.g.) some materials being flammable and others inflammable.
2. Reviewer tries to explain that they have incorrectly understood the meaning of the terms, explains the meaning carefully, and maybe suggests the terms they might mean.
3. Authors copy in the explanation and maybe replace incorrect usages with weird tortured phrases.
4. Rejection.
Obviously this description reads a little bit silly and things were probably more nuanced in practice. I think I’m probably also being uncharitable towards the authors in the example.
Acknowledging anonymous reviewers is common in my (erstwhile) field. “An anonymous reviewer suggests the following definition of…” I have to say that it seems odd to me to regard this as plagiarism.
Not 100% sure but I believe the word confidential implies that the review should only have been read by the editor(s) and not passed on to the authors.
A review is the written feedback authors receive from the journal reviewer. The reviewer can recommend that the authors revise and resubmit, based on the review comments. Usually the review is not published with the final piece, which is what was meant by “confidential review”.
It's a common technique/first layer of plagiarizing a text to translate it from English to e.g. Spanish and then from Spanish back to English, to get rid of the unique words the author used.
The parent post to mine was theorizing that the reason the English was so mangled was because it had been translated from English, then back to English. I was replying that, if the researchers didn't use English as a first language, it's ridiculously more likely that they were translating from their native tongue into English. You're misunderstanding where the notion of translating it twice came from.
And sure, there are reasons to translate a phrase from English, to another language, and back to English. This will be familiar to most people who've studied abroad, or done technical conferences on foreign soil, things of that nature. Let's say you're from Bolivia, attending a lecture at an English university, and are planning on referencing some of the content in a paper you're writing, in English. You speak passable English. The lecturer gets into the meat of the topic, and you realize you don't quite understand the context of what they're saying. Some of the conjugations are unfamiliar, so you just write it down as best you can and move on. Later, when writing the paper, you need a way to untangle the phrasing. A simple way is to put it into a translation application, translate to Bolivian, then try to parse it in native tongue. However, you know you have to explain and discuss this section in English; by translating it back, you'll get the English words, but some of the context and grammar structure will be from familiar Bolivian.
My wife was a professional translator and I did my master's thesis on the topic. With modern translation engines, there is no way you'll end up with "haze figuring" or "arbitrary timberland" translating one way from any source language into English. I also doubt you could get a very specific word like "timberland" from repeated translation of "forest", intentional synonym replacement is much more likely.
> there is no way you'll end up with "haze figuring" or "arbitrary timberland" translating one way from any source language into English.
Good to know; I've always wondered at the peculiarities of different translation engines, but never really dug into them, as most modern ones seem like neural network black boxes to me. I was pointing out that there are some realistic use cases for doing round trip translations. I've used this technique at a few conferences to help straighten out my hazy understanding of a complex idea in a language I spoke quite poorly. And I do agree it is bad form to use this directly in an academic paper.
> Also, they speak Spanish in Bolivia ;)
To be fair they speak Bolivian Spanish, along with many other native languages! I chose Bolivia as a random target without doing any research, so thanks for the pedantic push to go learn something new; things like this are why I do love HN!
I chose Bolivia because I know next to nothing about it and it seemed neutral; I should have used a notional country, like the Republic of United Swiss Emirates.
You've never had to do that before? Maybe I run into it a lot due to the nature of the conferences I attend. I know just enough of the language for functional conversation, but as soon as a complex idea is put forward, I need to be able to contextualize the familiar scientific portions of it quickly, and the round trip translation usually helps enough that I can parse it correctly.
Not really, but I came across some teammates in college who didn't seem to be able to follow a conversation yet they seemed to have quite good writing skills. This might be the reason why ;)
I find it unlikely a machine translator would spit out phrases like "counterfeit conscience" and "haze figuring" for AI and cloud computing in one pass. Plagiarism via multiple pass-throughs seems much more likely.
Ironically this is exactly what I would have expected from machine translation 15 (edit: 20?) years ago. My friends and I used to get a kick out of running phrases through several rounds of machine translation in different languages and finally back into English, and then playing them with the Mac OS 9 text-to-speech system.
Maybe unlikely between two Indo-European languages with closely related sentence structures and vocabularies, but plausibly likely for others. DeepL just gave me "potter's screw" for "pan-head screw" in my language, for example.
What was that phrase again? Oh yeah, innocent until proven guilty. Whew, almost forgot it. Why would you assume bad faith when this could much more easily be explained by non-English speakers just using a machine translator?
There is 0 probability that an academic author who wants to write about artificial intelligence does not know the English term. Just from the fact that a properly written paper requires the author to know/cite the relevant literature, which at least to some non-0 percentage is in English. Same goes for the referees.
So you couldn't ask them to attribute this to you as an anonymous reviewer? And instead wanted them to spend their effort using "their own words" because "inexcusable" and what not? Man, are some people stuck up deep in their own ass. Yeah, a copy-paste isn't nice of them, sure, but you are one fragile snowflake.
But why did you call them out for plagiarizing a private note, a confidential review? I mean... In the end they just used your wording to improve their paper, for a concept they already mastered. And you said the paper was good otherwise. IMHO they acted ethically.
To sum up, the authors did not just use my wording for a short portion. They copied an entire paragraph of my review nearly verbatim. Both the journal and I thought that the authors acted unethically. They violated one of the ethical standards of the journal regarding plagiarism, and these ethical standards were made available to them when they submitted the paper. Those were the rules that I had to evaluate them with for the review, so my hands were tied in some sense. I would also quibble with saying that they mastered the concept, because I really have no way to gauge their understanding if they just copy my own words.
> the authors plagiarized my definition and explanation almost word for word (from my confidential review).
Is there any way the authors could have kept your definition, and somehow credited you, even anonymously? Because rephrasing definitions is the pinnacle of wasted effort, and leads to confusion - you are asking them to say what you said, but without using your words.
That is a good question that I do not have a good answer for, unfortunately. The review process for this journal is supposed to be blind, so crediting me would only reveal me as a reviewer. An anonymous acknowledgment would be better than nothing if the authors had only copied a short definition without my permission, but they copied an entire paragraph from my review without my permission. That's just inexcusable. I can understand to some degree why they did not understand the concept well, since you may not encounter it even in a graduate-level course on the subject, but what they did showed really poor judgment.
I don’t understand what the problem here is actually. If you plagiarize on an assignment for school, that’s bad, because the goal of assignments is to test your knowledge and plagiarism makes it a less effective test. But here the goal (I think) is to produce an informative and accurate paper. You told them the definition - what would have been gained by rephrasing the definition you told them? Especially if they’re nonnative english speakers, it seems likely that any rephrasing would have made the definition less understandable and possibly less accurate.
I guess the fear is that they don’t actually understand the definition, and anyone reading the paper will incorrectly believe that they do, improving their reputation in a way they don’t deserve?
As I have said in another comment, the authors did not just copy my short definition but my entire paragraph-length explanation of the concept. I might have let the short definition slide but an entire paragraph is just inexcusable.
I agree that this is not a school assignment, so the nature of plagiarism is a bit different. But you hit the nail right on the head. We should be worried that they are pretending to know something that they really do not. They copied my explanation nearly word-for-word. Doing that does not prove that they actually understand the concept. It only proves that they can copy and paste. Now maybe they did spend some time learning it and looking into it, but there is no way to know for sure. The only way to prove that they really understand the concept fundamentally is to make them explain it themselves in their own words. And that is precisely what they should have done.
"We thank the anonymous reviewers for constructive feedback that we used to improve the article." Standard phrase, credit where credit is due, no need to make some definition worse just to not trip off a plagiarism filter.
Unless this very phrase does the tripping off. Having to rephrase it would be pretty ridiculous though (and illustrates your point).
No, it's important to gauge how competent the researchers are, i.e. how trustworthy the remainder of the paper is. The problem with review process is that the reviewers typically cannot reproduce the work itself -- take a physics experiment for example. So they can only do a smell test, and getting basic concepts wrong is a rather bad smell.
Nature (and some others) now have started the IMHO awful practice to allow reviewers to be named after the paper is accepted. Beside the obvious question of bias etc., this also blurs the lines between author and reviewer.
> Is there any way the authors could have kept your definition, and somehow credited you, even anonymously?
It's not uncommon for the acknowledgements section of a paper to thank an anonymous reviewer for, e.g., suggesting the authors further investigate a detail that turned out to be important. But in this case, where the authors couldn't write their own explanation of a technical term, maybe it's a bit premature for them to be writing technical papers.
I agree with this. You sound expert and provided a definition. I don't think we should expect serious professionals to mess around altering the words to make it look like it didn't come from the source that it did come from. In fact wouldn't that itself be plagiarism? The usual approach here is to use a phrase like "as suggested by one of our reviewers".
I think this is the cause of some of these weird terms that this HN post is discussing. I have a PhD and found it incredibly frustrating to write research papers because there was an expectation in my field to add a ton of background. That meant I had to spend a lot of time to rephrase bits and pieces of other papers where the authors had worked hard to word something very well. The professors didn’t like me quoting from other papers. I had to come up with my own way to say something very specific.
I write papers too and hate finding a new way to say "my X is a Y that does Z". Especially if it is your tenth paper on the topic and it arguably should sound just like the previous nine times.
But about three sentences into the introduction (where you explain all the background) you start going into "there are also Y's that do Z backwards". Which Y's you compare and connect with is important and says a lot about how you think about your X. It might even be a new way of looking at it. So telling other people how you think of it can be important.
And another five sentences in, you start referencing previous work on the topic. At this point you are crediting others, and you get to choose whom to credit and how much, with the benefit of hindsight. You refer to papers that are useful to people new to the study of capital letters. What you write here helps them much more than a mere list of papers or a Google (well, Google Scholar or ADS or PubMed or whatever) result list, because you can provide a good order to read them in, or point out which aspect of X's is best explained where. You also name papers that might be useful to practitioners in the field because they contain a particular technique or a good explanation of it.
So it is very much worthwhile to provide the background that others expect at the beginning of your paper, even if it requires rewriting that first paragraph several times.
I was pondering the same thing until I realized the person you are replying to is somewhat stretching the meaning of "definition" to mean a multiple-paragraphs-long explanation of a complex concept, which was, as I understand it, lifted verbatim.
Even if you only copied a typical one-sentence dictionary definition verbatim without attribution, that would still be a clear-cut case of plagiarism...
Hardly. Plagiarism is passing off someone else's idea or work as your own. Most terms, especially technical ones, have a precise, agreed-upon definition. Calling quoting them plagiarism is stretching the notion to its limits. Thankfully, there is no philosophical conundrum to be had when people are lifting complex explanations.
The "precise agreed upon definition" is not mapped to a precise agreed upon sequence of words.
I find it hard to think of any technical terms which have a fixed, well-specified phrase for the definition, much less ones which, if re-used, don't require attribution.
I mean, there are definitional terms like "one meter is the length of the path traveled by light in a vacuum in 1/299 792 458 of a second" or "the discriminant of the quadratic equation is b^2-4ac". Re-use those quoted definitions and no one will blink.
But, what's "evolution", or "electron spin", or "aromaticity"?
Even something as well-defined and concrete as "cosine similarity" has many different variations:
Wikipedia: a measure of similarity between two non-zero vectors of an inner product space. It is defined to equal the cosine of the angle between them, which is also the same as the inner product of the same vectors normalized to both have length 1
SciKit-learn: the normalized dot product of X and Y: K(X, Y) = <X, Y> / (||X||*||Y||)
towardsdatascience.com: the cosine of the angle between the two non-zero vectors
statology.org: For two vectors, A and B, the Cosine Similarity is calculated as: Cosine Similarity = Σ(AᵢBᵢ) / (√(ΣAᵢ²) √(ΣBᵢ²))
uchicago.edu: For vectors, it is the cosine of the angle between those vectors.
datadriveninvestor.com: Cosine similarity of two vectors is just the cosine of the angle between two vectors
While certainly equivalent, these definitions show some creative choice in how they are worded, and thus if copied, should be cited.
(I agree that the creativity level is quite low for some of these, and I believe several people might come up with the same description, but that's a different issue than re-using someone else's definition without attribution.)
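As a side note, the equivalence of the "normalized dot product" and "cosine of the angle" phrasings is easy to check numerically. Here is a minimal Python sketch of both (the function names are mine, not taken from any of the sources quoted above):

```python
import math

def cosine_similarity(a, b):
    # "The normalized dot product of X and Y: <X, Y> / (||X|| * ||Y||)"
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def cosine_similarity_unit(a, b):
    # "...the inner product of the same vectors normalized to both have length 1"
    na = math.hypot(*a)  # length of a
    nb = math.hypot(*b)  # length of b
    return sum((x / na) * (y / nb) for x, y in zip(a, b))
```

Both return 1.0 for parallel vectors, 0.0 for orthogonal ones, and agree to floating-point precision on everything in between, which is exactly the point: the underlying quantity is fixed even though the English descriptions of it vary.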
> (I agree that the creativity level is quite low for some of these, and I believe several people might come up with the same description, but that's a different issue than re-using someone else's definition without attribution.)
It's funny, because I actually find your examples to be supporting my point more than yours.
All of these sentences are translations into plain English of the absolutely, perfectly defined and commonly accepted definition of cosine similarity. SciKit-learn is even just writing the formula.
uchicago.edu and datadriveninvestor.com even use exactly the same words. I mean, if you came to see me complaining that someone was plagiarising for writing "For vectors, cosine similarity is the cosine of the angle between those vectors.", I would find that laughable.
Yes, I deliberately chose something that is just above trivially simple, to show that there was diversity of expression even at that level.
That is, even given a technical term with a precise agreed upon definition, the description of that term (eg, in English) does not have a precise agreed-upon form.
Incorrect use of the latter may imply plagiarism, and this thread appears to concern that aspect.
Most definitions are not as simple as "cosine similarity".
Your statement, if true, would mean that most dictionaries would use exactly the same words to describe a given, well-specified scientific concept, yes?
What's "Frame dragging" in general relativity?
Wikipedia: the effect on spacetime caused by a rotating mass ... Frame-dragging is an effect on spacetime, predicted by Albert Einstein's general theory of relativity, that is due to non-static stationary distributions of mass–energy.
doi:10.1126/science.aax7007 : the mass-energy current of a rotating body induces a gravitomagnetic field, so-called because it has formal similarities with the magnetic field generated by an electric current (1). This gravitomagnetic interaction drags inertial frames in the vicinity of a rotating mass. (quoting from the preprint at https://arxiv.org/abs/2001.11405 ).
einstein-online.info: a mass’s rotation influences the motion of objects in its neighbourhood
doi:10.3390/universe7020027 : The term "frame-dragging" usually refers to the influence of a rotating massive body on a gyroscope by producing vorticity in the congruence of world-lines of observers outside the rotating object.
doi:10.1142/9789812564818_0002 : A major consequence of General Relativity and related theories of gravity is that all inertial frames are local. These local frames are accelerated, warped and stretched, and rotated with respect to each other due to the surrounding mass-energy distributions. While only their relative rotations are typically called frame dragging effects, this phrase describes a broader range of gravitational influences on inertia.
Very different definitions, because it's hard to express that concept in English. And I think re-using a few of the more extensive definitions, without attribution, is a minor form of plagiarism. (Reusing any follow-up explanation is, as you've agreed, definitely plagiarism.)
You might recall that atrettel wrote "the authors were misusing a particular technical term".
If you dig into the papers on frame dragging, you'll note similar complaints, like https://arxiv.org/abs/gr-qc/0509025 : "Many accounts of these experiments have been in terms of frame-dragging. We point out that this terminology has given rise to much confusion and that a better description is in terms of spin-orbit and spin-spin effects."
> I would find that laughable
So would I, which is why I commented 'that's a different issue than re-using someone else's definition without attribution'.
> Your statement, if true, would mean that most dictionaries would use exactly the same words to describe a given, well-specified scientific concept, yes?
No, it definitely doesn't, unless you stretch what I said in a very uncharitable way to reach that point.
> Most definitions are not as simple as "cosine similarity".
But plenty are. As I said previously, you are going to be hard pressed to establish plagiarism on pure definitions unless you go towards extensive, paragraph-long ones, which are more akin to explanations than to what I would call a definition, as you did in the comment I am replying to. Actually, if you reread what you just wrote, you are yourself using the word "explanation", and I definitely agree that explanations can be plagiarised, as could very unorthodox and original forms of definition.
But when I read "definition", what comes to mind is something akin to the cosine similarity example, where even publications reuse mostly the same sentences while applying minor modifications to the subjects or adding an adverb. Hence my puzzlement at the close proximity of the words "definition" and "plagiarism" in the original comment, until I realized the whole thing was actually about an explanation, which triggered my reply to someone who shared my initial confusion.
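For what it's worth, the cosine similarity definition really is compact enough that independent write-ups converge on near-identical wording; the whole concept fits in a few lines (a generic sketch, not taken from any particular paper):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|).
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 1]))  # ≈ 0.7071
```

There just aren't many distinct ways to phrase "the dot product divided by the product of the norms", which is the point: overlap on a definition this short proves nothing.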
> Is your view that re-use of those definitions cannot be plagiarism? If so, why not?
Could you stop pretending not to understand my point, considering your first example nicely underlines it and you yourself admitted it would be laughable to call that plagiarism?
I was never arguing that copying anything you might define, even tenuously, as a definition never constitutes plagiarism. That's a complete strawman. I'm going to stop wasting my time here.
I've been trying to argue that definitions can be plagiarized, with examples which are not "tenuous" but ones which are drawn directly from publications.
If you want to quote a dictionary's definition, you should cite the dictionary you're quoting. If you're passing the dictionary's definition off as your own, word for word, that is plagiarism.
I enjoyed finding "counterfeit consciousness" for artificial intelligence. To me it evokes a kind of science fiction that's shown up occasionally on HN[1].
It sounds like something you'd find in 30s, 40s, 50s sci-fi for sure! Like “visiplate” (E.E. “Doc” Smith, Heinlein) for a computer display screen. (Along with ticker tape printouts and tape reels in the far future of course.)
Really highlights that the actual phrases don't make any more sense than the tortured versions, other than the fact that we've been hearing them for years, so they now sound normal.
Reading this list, it almost seems like these were created by looking up each individual word in a thesaurus, which of course destroys much of the meaning.
E.g.,
Signal -> flag
To -> to
Noise -> clamor
…and…
Data -> information
Warehouse -> stockroom
This would be a lot easier than running through multiple translation steps (as proposed elsewhere here).
This exact process has been used by spammers for a long time now. It's called spinning, and it is basically the kind of thesaurus replacement you're describing here. When I read the OP, my impression was that these authors were running plagiarized portions of their articles through a similar kind of spinner.
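A naive spinner of the kind described can be sketched in a few lines; the synonym table below is invented for the example, while real spinners pull from full thesaurus databases (which is exactly what destroys the meaning):

```python
import re

# Toy synonym table for illustration only; real spinners use
# full thesaurus databases.
SYNONYMS = {
    "signal": "flag",
    "noise": "clamor",
    "data": "information",
    "warehouse": "stockroom",
    "random": "arbitrary",
    "forest": "timberland",
}

def spin(text):
    # Replace each word with its listed "synonym", word by word,
    # with no regard for the compound term it belongs to.
    def swap(match):
        word = match.group(0)
        return SYNONYMS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, text)

print(spin("signal to noise ratio"))  # flag to clamor ratio
print(spin("random forest"))          # arbitrary timberland
```

Because the substitution is per word, fixed technical phrases like "random forest" are torn apart exactly as in the article's examples.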
If I were in a situation where I had to write on occupational health and safety in forestry, I would shamelessly appropriate "mean square blunder" and "arbitrary timberland"; those are superbly above the mean square!
These sounded Chinese to me, so I threw this comment into Google Translate to figure out what the typical forward translations are and what the reverse translations of the tortured versions would be. Bingo. These are clearly translated from Chinese. Not sure whether they are some form of plagiarism, as clamored, or just the bad translation software prevalent in Chinese research communities, though.
From top to bottom: 信噪比 (signal-to-noise ratio), 個人數字助理 (personal digital assistant), 雲計算 (cloud computing), 數據倉庫 (data warehouse), 中央處理單元 (central processing unit), 語音識別 (speech recognition), MSE (均方誤差, mean squared error), 隨機訪問 (random access), 隨機森林 (random forest), 隨機值 (random value), 情感分析 (sentiment analysis)
This is not a machine translation issue. The deep-fried versions in Chinese make about as little sense as they do in English. Again, this is most likely caused by word-by-word thesaurus substitution done in English.
Thanks for pulling those out. I enjoyed them. It reminds me that even before AI we had these kinds of tortured phrases in, for example, product descriptions of Chinese products. As a child I was given a clone Rubik's cube labelled the "Turrible Tetrahedron".
I'm very tempted to introduce tortured phrases at work for occasional humor. For example, who needs "continuous integration" when you have "ceaseless incorporation"? Sometimes it's nice to see if anyone reads my notes.
In all seriousness though, I've experienced something similar before, at a Japanese-run American corporation as far back as the 90's. The problem was that Japanese executives and executive assistants who didn't know American tech jargon often accepted mangled suggestions from the spell-checker. A notorious example was the "Data Whorehousing" presentation, which somehow made it through several reviews and rehearsals before being presented to the entire American IT department at an all-hands meeting.
Clearly this made an impact as I remember it 23(ish) years later!
I often wonder while reading an academic paper how the writing could be as hopelessly bad as it is.
This type of manipulation and plagiarism may be partially to blame, but the academic writing style has also gone completely off the rails to the point that half the journal articles being published today read as if written by some kind of paper writing AI robot even when I am quite certain that that isn't the case. And no, I am not talking about cases where the author is writing in a non-native language.
I have a theory that it may have to do with imposter syndrome and a need to sound smart. The author, fearing that they don't really belong and will be found out at any moment, never making tenure, starts jamming academic-sounding words where they don't belong and stretching sentences with commas and semicolons until the whole thing is just as insufferable to read as it was to write.
There is also the possibility that there are just a lot of terrible writers out there.
I am sure this was not your intention or meaning, but please be aware that it is virtually impossible for a non-native speaker to write perfect English. English is a language you have to intuit. In contrast to other languages, it has very few fixed rules. Writing elegantly in English is most certainly an art form.
Of course, writing good science is hard enough for native speakers. It is very difficult for the vast majority of people on the planet - no matter how good their research.
And just so we are clear: Not everyone can afford professional editing services at every point in their career.
We meet in English under the premise that it allows for universal communication.
In this, we accept that English natives are almost infinitely more privileged in writing, speaking, conferencing and networking. We also have to accept that levels of English proficiency vary, and that English, especially, is easy to learn but difficult to master.
> it is virtually impossible for a non-native speaker to write perfect English. English is a language you have to intuit. In contrast to other languages, it has very few fixed rules. Writing elegantly in English is most certainly an art form.
Learning to write well in any language is difficult. English is not exceptional as a language. Its influence in economic activity is what gives it prevalence.
Oh, that is certainly not true. English is a fuzzy merchant pidgin, but that's precisely what makes it almost trivial to learn. Plus, the CIA in its infinite wisdom has seen fit to encourage the production of a deluge of entertainment media, making it easier to immerse oneself in English content than in any other language.
The issue here is not bad English in the sense you'd expect from a learner or someone who just isn't fluent. Nobody minds that, although you say not everyone can afford professional editing services: that's what journals are theoretically for!
The actual problem here is fluent English that is written in a totally bizarre style only found in academic papers. I've found that academic-ese is less of a problem in good computer science papers (like the one this article is about), but it crops up in some fields a lot. A trivial and not very important example is the way minor things are routinely described as "novel", a word you rarely find in everyday English, but in the research literature everything is "novel".
There used to be bad writing contests for academics. One of the famous winners was Judith Butler's timeless[1]:
The move from a structuralist account in which capital is understood to structure social relations in relatively homologous ways to a view of hegemony in which power relations are subject to repetition, convergence, and rearticulation brought the question of temporality into the thinking of structure, and marked a shift from a form of Althusserian theory that takes structural totalities as theoretical objects to one in which the insights into the contingent possibility of structure inaugurate a renewed conception of hegemony as bound up with the contingent sites and strategies of the rearticulation of power.
A friend submitted a paper to a journal in humanities. The reviewer said "his English is informal". In other words, these reviewers are asking for stilted English.
I also get this feedback on my papers. E.g. saying that it's written "more like a blog post".
Of course, they're not wrong. It is written more like a blog post. Because the writing style used in blog posts is hands down better than the writing style used in scientific papers. Blogs talk about the real reasons you worked on something, they go through simple examples, and they mention where you struggled and what you found confusing and what you tried that didn't work. All of these things are very useful for understanding, and in my experience almost entirely lacking from papers. Or at least, in my experience they're lacking from modern papers. I think in papers from 100 years ago the authors tended to talk more about their worries and their excitement e.g. [1].
"There is also the possibility that there are just a lot of terrible writers out there. "
Surely there are, and writing in a way that is easy to read and understand is an art in itself.
But I would agree that the main reason is probably the intention to sound smarter than one is. Whole scientific disciplines seem to live by that standard.
This is not limited to science, though. I recall that a German poet (I think Heinrich Heine) said of his fellow poets:
You only fly so high like the swallow, that no one can actually hear your singing.
Good essay by Orwell that touches on this sort of thing https://www.orwellfoundation.com/the-orwell-foundation/orwel... I used to be guilty of writing this way and one of my high school English teachers recommended I read it. I've tried to take the message to heart ever since.
This is anecdotal evidence at best, but it is worth considering. I know of several individuals who were able to complete their entire Master's thesis using a combination of AI-generated content (GPT-3) and a paraphrasing tool.
The generated text was well over 50 pages, completely bypassed all known content/plagiarism checks, and was even included in the university's "exemplary examples". To this day, it is still there.
This is of significant concern, as some of these GPT-3-based tools are now integrated within MS Word itself. Word 2021 allows for "add-ons", among which I have noticed several third-party content-generation and paraphrasing tools.
I really doubt you can computer-generate a Master's thesis. Completing a Master's thesis at an accredited institution is a heck of a lot of work, and even a cursory reading of a thesis by an examiner, supervisor, opponent, or other interested party would give the generated content away. Maybe if you got your degree from a diploma mill you could get away with it, but then your degree wouldn't be worth toilet paper anyway.
I've heard similar stories about generated PhD theses, and they are even more implausible. The reason is that writing a thesis is much more than just producing a hundred pages or so of prose. Any university student can poop that out in a few weeks. The main job of a thesis is coming up with a research question, conducting an experiment or a study, and describing the results and how they fit into whatever niche of the scientific world you are working in.
I agree that in most cases it would be very difficult to do. But I can imagine some specific circumstances where it could be pulled off, possibly with some manual modifications: soft sciences like sociology (you can't imagine the amount of bs I've read during my college years), the subject matter being very different from the area your supervising prof specializes in, the topic that allows for arbitrary speculation, an underfunded university branch with profs having a more lax attitude.
That is different. People who submit computer-generated papers submit them to hundreds of journals. A few of them are bound to have so lax editorial standards that they are let through. They also risk nothing, while the student who is caught computer-generating their thesis will be thrown out. Furthermore, the amount of peer review-like process a thesis or dissertation goes through is an order of magnitude greater than the amount of peer review an article gets.
How are dissertations getting to you like that? When I did my PhD, no one would have allowed a PhD student to start writing a dissertation without first having sufficient research questions and then completing appropriate statistical analyses.
Please include a link to these theses, because as it stands this anecdote sounds extremely implausible. I don't know what university you were at, but I've been at a few in Europe, and at every one of them Master's theses were evaluated from start to finish by several humans. GPT-3 is unable to produce even two pages of coherent text, let alone 50 pages good enough to be accepted as a Master's thesis in any discipline at any university I can think of (even the worst ones).
I can imagine that plagiarists use paraphrasing software quite extensively, though, and that it is a problem.
It was not all automated; a fair bit of manual intervention was needed. I understand your concerns, and they are valid, which is why I prefaced my statement with "anecdotal evidence". What I wrote is most certainly not the entire story, and a fair bit of detail has been left out.
It should be known that this is widespread across multiple industries and this will only become more of an issue in the future.
I don't doubt this at all, and I have no doubt that GPT-3 with a bit of human editing can spit out something better than the lower third of masters students at corn row colleges.
Masters degrees are cash cows, which is why no one in unregulated industries cares about them. People in regulated/unionized industries also don't actually care; even educators, who at least nominally see intrinsic value in education, go to borderline diploma mills to get that union-mandated raise at minimal effort.
> Masters degrees are cash cows, which is why no one in unregulated industries cares about them. People in regulated/unionized industries also don't actually care; even educators, who at least nominally see intrinsic value in education, go to borderline diploma mills to get that union-mandated raise at minimal effort.
I don't mean this rudely, but it is attitudes like this which cause the CS interviewing process to be 100X more painful than the interviewing process in any other field: "I don't trust your credential so I demand you prove your competence to me on the spot and let's do 5 rounds of interviews just to be sure."
I've found very limited correlation between credentials and relevant skills. In CS, the actually important skills are often self-taught or gained through experience.
But that happens because of the common experience interviewers have of interviewing someone with a PhD or (even worse) various corporate credentials and discovering they can't actually create and compile a program that loops over an array.
The name originated as a pejorative for small, tuition-dependent, non-research teaching colleges. Those colleges mostly catered to pastors, teachers, etc., and were located in small towns. The historical reasons that these institutions are now "in the corn fields" provide an interesting topic for historical inquiry. Perhaps many are in old railroad or factory towns that have since languished, while schools that were similar at the time of founding and didn't die are in industrial and post-industrial hubs where they attracted the attention needed to thrive. Who knows. The point is that they are small, inconsequential institutions that are predominately located in rural and semi-rural towns.
The name now includes small state schools -- usually branch campuses with lower enrollment and no major (R1) research output.
(NB: corn row colleges are also by definition non-elite, so small liberal arts colleges with billion dollar endowments which might otherwise count, don't).
Many such institutions have since started offering graduate (or at least non-bachelors) degrees and certificates that are somehow even more worthless than their undergraduate programs.
Apparently the name has a lot of different meanings these days -- see sibling comments -- but it has DEFINITELY never been meant as a racial pejorative. If anything, exactly the opposite, since most of those "crap-tier midwestern/southern colleges" cater to 99.99% WASP social networks (the P is even explicit).
As I understand it, the Northwest Ordinance provided for the funding of schools, resulting in the "land grant" college system that still exists today. The most well known land grant colleges are of course the "flagship" universities of the Midwest states. But the states also chartered many smaller, regional, and specialized schools.
I live in Wisconsin, and the state university system is chartered to serve the needs of the state. There are too many students to send them all to UW in Madison, so there are a number of smaller regional universities, many of which now offer graduate degrees, plus an even larger number of "commuter" and "satellite" schools, and an elaborate technical college and trade school system. Not everybody can get a degree at a residential college. Life gets in the way.
We can debate the relative prestige of these colleges, but I've worked with people who attended the regional schools, including many engineers and computer programmers. All I can say is, send me more.
The colleges that catered to pastors were largely private, and in my home state, there was one in every town. Some of them emerged as full service 4-year colleges with additional programs. My undergraduate college was nominally "Christian" but I got a secular science education there, and it was well ranked in science. It also adjoined a seminary where I never set foot.
Yes. As far as I understand, "corn row colleges" wasn't originally a reference to branch campuses of state schools. "corn row college" referred exactly to those tiny private places.
I agree with your assessment that most of Wisconsin's land grants are quite good, btw. YMMV in other states, unfortunately.
> The point is that they are small, inconsequential institutions that are predominately located in rural and semi-rural towns.
Wow, coastal elitism much?
There are surely many degree mills and garbage universities, but to conflate their worth with their location is both injurious to discourse, and incorrect.
I (non-American, not a native English speaker) thought it was a pejorative reference to rural universities; ("hick" / "rube") state universities of Midwestern states etc.
The hairstyle is named after its resemblance to the agricultural arrangement, not the other way 'round, and the name of the hairstyle only makes sense if you're aware of how fields are planted. You have to look really hard to make a racial slur out of it.
> Masters degrees are cash cows, which is why no one in unregulated industries cares about them. People in regulated/unionized industries also don't actually care...
People don't care about master's degrees in engineering, law, business, art, etc., etc.? Try applying for many jobs without one, or with one from a lower-ranking college.
The Chronicle of Higher Education article recently on the HN front page said that master's degrees in some fields, they give the example of 'positive psychology', are indeed cash cows. But in that example, the degree was not part of the actual Department of Psychology, which is taken very seriously.
Law degrees are called Juris Doctor but are professional degrees, like MBAs. You aren't required to publish original research (afaik) and in the US they were formerly Bachelor of Laws (LL.B.) and then renamed (as I understand it).
The doctorate is Doctor of Juridical Science (J.S.D.). You can also get a Master of Law (LL.M.).
No one cares about MBAs. The networks can be helpful, but, unlike JDs/PharmDs/etc., an MBA from a no-name college with a weak alumni network isn't worth the paper it's printed on.
Depends. University of Phoenix awards doctorates that take 3-4 years (a HUGE red flag -- the best and brightest PhD students might get out in 4 years if everything goes perfectly; an "expected time to graduation" of anything less than 5 years almost certainly means a worthless degree).
Those doctorates don't require much more than taking some coursework and paying a boatload in tuition. Basically an expensive and lengthy online master's program. Not worth the paper they're printed on, unless you're employed by the government or in a union job that mandates raises for educational attainment.
As a general rule of thumb, PhDs from R1 universities that are paid for by the university through research or teaching assistantships are generally a good signal of at least minimal training in research skills.
Another good rule of thumb is that paying for a PhD -- beyond perhaps some MD/PhDs or maybe nursing PhDs, stuff like that -- is a reliable sign of someone who has both a meaningless degree and poor reasoning/research skills.
But anyways, real doctorates outside of a few fields (e.g., pure math) usually come with a non-trivial publication record that speaks for itself. You don't even need to know that the person has a doctorate; you can just read their papers and a rec letter from an advisor describing the student's role in each paper.
(I'm excluding discussion of professional degrees like JDs, PharmDs, etc. which are technically doctorates but sort of their own class.)
Length of PhD programs is an indicator that should be considered in context. UK Universities, for example, often have research PhD programs that take 3 years to complete and they are legitimate.
NB: I'm only speaking about the math doctorate as it currently stands in the United States.
Due to the current market saturation of math doctorates, any pure mathematics PhD worth the paper it's printed on will also probably come with a non-trivial publication record. The exceptions I can think of are high-risk high-reward areas like cutting-edge number theory (I had a friend go eight years without publishing, which, yikes, but his thesis was semi-revolutionary (or so I'm told)) or, I guess, suitably abstract category theory (though the people I follow in this area seem to publish lots of interesting papers, like the Baez school or the homotopy type theory people; your mileage may vary).
It's really too bad. One wonders why we can't simply ax the entire advisor-candidate system (with all its myriad opportunities for physical, emotional, and even sexual abuse) and certify new candidates by saying: "You're a doctor of mathematics when you get five professors to sign off on 3-5 papers you've had published."
> certify new candidates by saying: "You're a doctor of mathematics when you get five professors to sign off on 3-5 papers you've had published."
Or one big one.
Basically, take the "honorary doctorates" some universities give out retrospectively to people who have made major contributions to their fields; do it more often; and then make it the only path to getting a doctorate, such that they're no longer "honorary" at all.
> I know of several individuals who were able to complete their Master’s thesis utilizing…
Doesn’t it stay published forever? It might be a shame for someone later in their career.
On the other hand, even a chapter of Mein Kampf was accepted in 20 journals after replacing old words with newer versions. Human review is hard. Maybe we should put computers in charge of reviewing papers; they’d recognize the work of an AI quicker?
> I know of several individuals who were able to complete their entire Master's thesis utilizing a combination of AI generated content (GPT-3) and a paraphrasing tool.
I searched the keyword "SEO" and didn't find any match in the comments here, I'm surprised.
Anyone who has been a webmaster will immediately recognize this as an extremely common technique in the blackhat SEO scene, used by content farms everywhere for decades. One just copies articles from somewhere else, replaces all the words with dictionary synonyms to evade the search-engine penalty, and fills the resulting websites with spam.
Perhaps it's not as popular in the English-speaking world, but it's common in China, and it's a standard tool included in all blackhat SEO software. And no, it doesn't work well; the output is gibberish too, in spite of the language differences. Oh, and the article says:
> A high proportion of these papers came from authors in China.
Exactly what I expected. The spammers found a new market, apparently. It's sad to see that some scientific papers and journals are literally becoming blackhat SEO spam and content farms.
I noticed this happening in other areas a few years ago, but with faked blogs. The titles and subjects would sound interesting, but then when you tried to read them, you'd need a specialized decoder to get through the utterly baffling word replacements. But they already got their ad revenue by the time you notice the article is complete gibberish.
The first one I found was about dog illnesses. They kept referring to dogs with phrases like "Your domesticated canine," and it was quite a chore trying to figure out most of the symptoms that they were listing. "Heart worms" was translated to "love snakes," which I thought was delightful.
Yes, this may be a specific example of a more widespread phenomenon. There's certain websites out there that republish articles from well-established publications (e.g., New York Times) almost word for word, except that they are rife with synonym swaps that may or may not make sense in context, presumably to escape some kind of automated copy detection. Results can be amusing. For example, the copied article said "“Drukqs” acquired a blended essential response..." where the original said "“Drukqs” received a mixed critical response...".
I've seen this kind of things in articles posted to social media sites (Quora and Facebook, but it probably exists elsewhere).
I don't have a specific example at hand, but it's typically an article a few paragraphs long with really strange phrasing, so strange that it's not explainable by the author not knowing English well.
In a handful of cases, I've managed to find the original source. Common phrases are systematically replaced by ill-fitting synonyms.
I suspect the motivation is to avoid accusations of plagiarism (though I don't know what benefit the posters derive from doing this).
Nowadays too many real blogs are padded with weird phrasing and sentences which don't really mean anything.
In this case, sometimes you get lucky and can actually find meaningful information between the padding. But sometimes you just read an article that takes 5 paragraphs and 500 words to say "we don't know".
I ran into something like this in an Amazon review once. I was looking for a book of transcriptions for the instrument I play, and two of the handful of reviews used the same awkward phrase: "music goals". I scratched my head and then realized what had probably happened. They weren't native English speakers, they were being paid to write reviews, and they had picked the wrong synonym. "music goals" was supposed to be "music scores".
Back in 2004 or so, I was building a distributed CMS with the goal of creating artificial "link pyramids" with the purpose of SEO, which was a rather new thing at the time.
Content generation was one of our bottlenecks, and as Google was already rather successful at detecting duplicate content, we were looking for a way to "uniqify" posts that would be used to stuff sites intended for googlebot, but not humans.
One of the methods that worked was taking the source English content, running it through Babelfish (the AltaVista translator) into French, Spanish, or German, and then using the same method to translate it back to English.
This resulted in texts that did not make much sense to humans, were full of precisely such "tortured phrases" but which were considered unique by Google.
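The round-trip trick can be sketched like this; `translate` here is a stand-in for whatever machine-translation service is available (the original pipeline used Babelfish), and the toy word-level translator below exists only to demonstrate the mechanics of the information loss:

```python
def round_trip(text, translate, pivot="fr"):
    # English -> pivot language -> English; a lossy translator yields
    # text that looks "unique" to a duplicate-content detector.
    return translate(translate(text, src="en", dst=pivot), src=pivot, dst="en")

# Toy stand-in translator: word-level dictionaries that deliberately
# lose information on the way back, mimicking real MT of the era.
EN_FR = {"data": "données", "warehouse": "entrepôt"}
FR_EN = {"données": "information", "entrepôt": "stockroom"}

def toy_translate(text, src, dst):
    table = EN_FR if (src, dst) == ("en", "fr") else FR_EN
    return " ".join(table.get(word, word) for word in text.split())

print(round_trip("data warehouse", toy_translate))  # information stockroom
```

The output diverges from the input word by word, which is exactly why a shingling-based duplicate detector stops matching it against the source.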
Authorship is the metric that scientists get paid for, so of course it has been thoroughly corrupted.
Fake papers and plagiarism are the most blatant form of corruption. They tend to come from certain, let's say, large countries with less developed scientific cultures. Those countries need to put an end to it, because the rest of us keep having to work harder to suppress the racist impressions that we're bound to form of colleagues who look and sound like the cheats.
In more traditional scientific countries, the corruption is more subtle. Today, many groups publish every paper with half a dozen authors, and no indication of what each of them contributed. This enables the professors who run those groups to manipulate authorship more or less as they please, and have total control over who gets to have a career in science. It turns out that absolute power corrupts senior scientists as absolutely as it does other people.
No doubt there are more clever ways to game the system, that I haven't noticed. As long as million dollar grants and first-world citizenship keep being doled out for something as contrived as scientific paper authorship, corruption is inevitable.
And the journal involved, Microprocessors and Microsystems, is an Elsevier journal. Huge surprise. I am glad the publisher earns their outrageous fees by careful screening, peer-review, and editing of submitted manuscripts. /s
> [...] the editor of Microprocessors and Microsystems began having concerns about the integrity and rigour of peer review for papers that had been published in some of the journal’s special issues.
> The journal’s publisher, Elsevier, launched an investigation. This is still under way, but in mid-July the publisher added expressions of concern to more than 400 papers that appeared across six special issues of the journal.
I hate to open up this topic, and I hate to pick on people who are trying to fix their mistake even more, but oh boy. Elsevier has been a pain in the butt for universities and researchers alike. They leech money from both sides of the community, they sue people trying to bring science forward, and they gatekeep scientific success. And literally their only reason to keep existing was to prevent exactly this.
I've never been a big fan of the current scientific publishing model. But Elsevier is a top publisher. It's pretty damning that they have one - highly overpaid - job and they don't even do it.
A high profile case (on the internet) similar to the one described in the article is when Siraj Raval plagiarized a paper on quantum ML and made some amusing replacement phrases:
complex Hilbert space -> Complicated Hilbert space
I was reading an article on Nature and noticed their definition of "woman" didn't make sense. Tortured phrases isn't confined to plagiarism avoidance.
> Unfortunately, fibroids are just one of many understudied aspects of health in people assigned female at birth. (This includes cisgender women, transgender men and some non-binary and intersex people; the term ‘women’ in the rest of this editorial refers to cis women.)
The article says "women" refers only to "cis women" (presumably "cisgender women"); however, it goes on to talk about rugby and brains, in which case the word "woman" would apply not only to cisgender women but also to people who identify as non-binary or as transgender men, since surgery and hormone therapy (if undertaken by the individual) won't change brain axons or the person's physical stature.
The article then talks about "male animals", not "animals assigned male at birth". There's no explanation given why animals are not similarly "assigned" a sex.
AFAIK it has now become necessary to "disguise plagiarism" even when you are not plagiarizing anything, because bullshit "anti-plagiarism" software will flag many phrases as similar to what somebody else has already used. I believe the war on plagiarism brings little good in exchange for the hassle.
> In our strong opinion, the root of the problems discussed in this work is the notorious publish or perish atmosphere (Garfield, 1996) affecting both authors and publishers. This leads to blind counting and fuels production of uninteresting (and even nonsensical) publications.
Here's "Microprocessors and Microsystems."[1] This is supposed to be about embedded systems, which is generally a no-bullshit field. I'd never heard of this journal. People read Electronic Design, EE Times, "Embedded.com", maybe Control Systems Journal, etc. Those have either articles about how to do something, or "why what we're selling is great" articles.
Now look at the article titles in Microprocessors and Microsystems.[2] Here are the first three.
- COPS: A complete oblivious processing system
- A perceptron-based replication scheme for managing the shared last level cache
- Efficient underdetermined speech signal separation using encompassed Hammersley-Clifford algorithm and hardware implementation
Now those might be legitimate, although what they're doing in an embedded systems journal isn't clear. They're all behind a paywall, so it's hard to tell if they're any good.
"Oblivious processing" is a security concept. That belongs in a journal on security and encryption, where the crypto people will know what holes to look for. (Microsoft was doing work in this area in 2013, but I don't think a product emerged. If you can make it work, some cloud computing company can use it.)
Cache management belongs in a journal on CPU design, where people who have struggled to make caches work will take a look. There are people using perceptrons for this, which makes sense; a cache has to guess which things will be reused. (If this works well, someone should be trying it in web caches such as NGINX to improve cache hit rates.)
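For reference, the perceptron idea in a cache predictor boils down to a weighted vote over simple feature bits. Here is a toy sketch of that mechanism; the feature encoding and threshold are invented for illustration, not a reconstruction of any specific published scheme:

```python
# Toy perceptron reuse predictor, in the spirit of perceptron-based
# cache work. All specifics (features, threshold) are illustrative.
class ReusePerceptron:
    def __init__(self, n_features, threshold=1):
        self.w = [0] * n_features
        self.threshold = threshold

    def predict(self, features):
        # features: 0/1 indicators derived from things like PC bits
        # or address bits; a high score means "likely to be reused".
        score = sum(w * f for w, f in zip(self.w, features))
        return score >= self.threshold

    def train(self, features, reused):
        # On a mispredict, nudge each active weight toward the
        # observed outcome (reused or not).
        if self.predict(features) != reused:
            delta = 1 if reused else -1
            self.w = [w + delta * f for w, f in zip(self.w, features)]

p = ReusePerceptron(4)
p.train([1, 0, 1, 0], reused=True)
print(p.predict([1, 0, 1, 0]))  # -> True
```

Real hardware versions use saturating integer weight tables indexed by hashed features, but the guess-then-correct loop is the same.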
Signal separation is an active field, but this isn't a journal where you'd expect to find articles on it. Wikipedia has a good article on signal separation. The history of that article indicates attempts to sneak in citations to sketchy articles. No idea if the Hammersley-Clifford algorithm is even relevant. (If it's a significant advance, there's commercial value in this in improving audio quality for conferencing systems.)
So these papers were all sent to a journal where the odds of getting published are good, and the odds that the editors have no idea about the subject matter are high.
As someone who has had to write technically in a second-language (French, funding agencies in Quebec), this rings particularly true.
Luckily, I'm fluent enough to recognise the particularly egregious examples, but finding good translations for technical words is hard!
One example that comes to mind is when trying to translate the phrase "data feed" which came back as "alimentation données" which ostensibly means "animal feed data".
If you're looking for a lot of English-to-French translations of technical terms, check out the theses from any English university in Quebec (McGill, Concordia, etc.). They're made public online [0]. I can't vouch for the quality, as I'm sure plenty just use Google Translate, but everyone I know has their abstract edited by a francophone in their field.
A good way to validate translated technical terms is to just give them a quick internet search on e.g. DuckDuckGo or Semanticscholar.
Maybe a future direction would be to train new models to identify plagiarism on this kind of information: use non-matching backtranslations to train classifiers. It's again the typical cat-and-mouse game, I guess.
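Even before training any model, a plain fingerprint lookup catches the known cases; a classifier would generalize beyond the list. A toy sketch of that lookup baseline, where the phrase list is a small illustrative sample drawn from examples in this thread, not the real curated list:

```python
# Minimal fingerprint-based detector sketch. The phrase table is
# illustrative; a real tool would maintain a long curated list.
TORTURED = {
    "counterfeit consciousness": "artificial intelligence",
    "profound neural organization": "deep neural network",
    "colossal information": "big data",
    "haze figuring": "fog computing",
}

def flag_tortured(text):
    """Return (tortured phrase, likely original) pairs found in text."""
    lower = text.lower()
    return [(t, orig) for t, orig in TORTURED.items() if t in lower]

hits = flag_tortured(
    "We apply counterfeit consciousness to colossal information."
)
print(hits)
```

The obvious next move in the cat-and-mouse game is for spinners to avoid the published fingerprints, which is where a learned classifier on backtranslations could pick up the slack.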
It's the classical problem of people trying to find technological solutions to social problems. If plagiarism and fake research is still a problem after we've applied technology to fight it, clearly we haven't applied enough of it.
Sometimes technological solutions work really well to solve social problems. For example, at one point, one person using the internet would tie up the phone line for everyone else in the house, and vice versa. Negotiating this shared resource could be considered a household social problem. But now there's no such interference, and most people have their own cell phones.
This is a social problem around the shared use of a technological resource. I'm reminded of the old saying, "computers can only solve problems that are created with computers".
But then again you can view _all_ solutions to social problems as inherently technological in the broader sense; I adhere to that paradigm.
The number of papers being published is growing at a staggering rate. This requires proportional growth in the number of people reading these papers, which inevitably means the plagiarists and cheaters themselves are being pulled into the review system as well. They don’t care about letting fraudulent papers slip through because they never really cared about the science in the first place.
They see it as a game that they’re playing and they’re doing their best to put as little effort as possible into the game while extracting as much reputation upside as they can.
We really need to make publishing fraudulent papers a career-ending move across academia and even the industry. The only reason this continues to happen is because it has a lot of upside but very little downside. Caught publishing fraudulent papers? Oh well, just leave them off your resume and apply somewhere else.
Referees have no real incentive to keep quality high. They already don't get anything in return for doing it. (At best they do it for reciprocity/goodwill.) Papers are usually hard to follow, replication rate is abysmal, etc. The incentives are all set for publishing, not for making real progress.
I've been seeing this in news articles as well. Swipe someone else's article, run it through a synonym-replacer algorithm, and have Reddit bots post it on a bunch of news subs. Presumably the thesaurus work fools Google's just-a-copy detector.
It's the next step in clickbait monetization. Why settle for low-effort content when you can have no-effort content?
Don't have any of the actual content handy, but here's an online tool that advertises itself for that specific purpose: https://spinbot.com/ Google "rewriting tool" for more examples. Apparently it's a lot more common than I'd realized.
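The core of such a rewriting tool is trivially small, which is part of why this is so common. A toy sketch, where the synonym table is invented for illustration (real spinners use large thesauri and part-of-speech tagging, but the principle is the same):

```python
import re

# Invented toy thesaurus for illustration only.
SYNONYMS = {
    "big": "colossal",
    "data": "information",
    "artificial": "counterfeit",
    "intelligence": "consciousness",
    "deep": "profound",
    "network": "organization",
}

def spin(text):
    """Replace each known word with a 'synonym', producing exactly
    the kind of tortured phrases the article describes."""
    def sub(match):
        word = match.group(0)
        return SYNONYMS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", sub, text)

print(spin("big data"))                 # -> colossal information
print(spin("artificial intelligence"))  # -> counterfeit consciousness
```

Word-by-word substitution like this is also why the output is detectable: fixed technical collocations get mangled in predictable ways.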
> I feel like only the highest profile journals can be trusted at this point.
The highest profile journals (Nature, Science, The Lancet in medicine, ...) have some tendency to go for sensationalism. They want to publish radical, ground-breaking research more than there is actual new ground-breaking results happening. So they also end up publishing mediocre research presented as ground-breaking, and some less-than-accurate research where results are exaggerated to make them look ground-breaking.
Um, yes. "Nature" used to have a great reputation. Supposedly it still does in bio. But battery articles in Nature are just awful. They keep blowing up "minor advance in surface chemistry" into "10x better battery that costs 10x less Real Soon Now".
(I'd like to see EV World or something else in that space reprint old articles as "1, 5, and 10 years ago in battery hype".)
Yeah in my field the general attitude is that Nature isn't all that great. I have heard the phrase "it was published in Nature but might still be right" more than once.
My favorite counter example is "A Draft Sequence Of A Neandertal Genome". The article was accepted by both Nature and Science before it was written. The authors chose to publish in Science, because Science offered more on the side: the title page and an unlimited(!) number of "contributed" (this means unreviewed) companion papers. The article itself was about 20 pages of drivel; all the substantial content was relegated to the 200(!) pages of "Online Supplemental Material". Nobody ever read, let alone reviewed, all of that.
After that, I can't trust either Science or Nature, which offered pretty much the same crooked deal. If those two aren't "highest profile", who is?
Not even those! The impact factor of a journal is a terrible guide to quality. It is more appropriately thought of as a measure of scientific sex appeal.
You must read each paper to judge its merits. Lots of junk gets published in top ranked journals.
That's a low bar, though. The point is that it's very difficult to judge the scientific merits of a paper without actually reading it. (And even then, it's easy to be fooled.)
Lots of junk gets published in top ranked journals.
A lot more gets published in vanity journals, so I use the impact factor as a first-pass filter: I avoid papers from journals not listed in the JCR¹ or those with a factor below 1.000.
I assume, maybe naively, that if an important finding were to be published in such low quality journal, it would eventually get published in a more legit publication.
I thought top ranked journals have good reviewers, since the editorial board consists of researchers/professors from top notch schools. Can you share your thoughts on why junk gets published in such journals? Does it have to do with collusion, reputation-laundering, or something more?
I've a horrible premonition that the paper describing this problem (and those that cite it) may eventually end up being flagged for containing too many tortured phrases...
A colleague in an unnamed field, attending an unnamed Polish university mentioned that this kind of thing was rife: publishing Polish papers translated from English texts and occasionally vice versa.
Poland is a country with a strong academic tradition and similar enough institutions to others in the EU so I can only imagine this happens in more ‘peripheral’ countries with even less globalization.
In telecoms they call all the backend infrastructure "back haul", and I've never read a satisfactory explanation. I'm convinced that somebody once coined "back hall", intending to invoke the image of service passages like what you see in a mall; it was misheard (as is often the case in telecoms, given its global nature) and the metaphor of the bulldozer tail stuck forever after.
In freight, "backhaul" typically refers to transporting goods during the return journey. During the principal (non-return) journey, the starting location is often more central, like a distribution center, and the destination is a smaller satellite location like a store. So when something is backhauled, that tends to mean it's transported from the smaller satellite location to the central location.
IME I think backhaul is just sending data to the main internet, not all backend. I thought it just meant it hauled the data back into the core network.
Main Internet, across main Internet, between networks, intra-domain. Intra-station. Anything that joins it all together that isn’t “front facing” i.e wireless network towards handsets
I've heard that term used where an ISP is piggybacking on a larger service. Sonic.net offers some of their services over AT&T infrastructure. Data to and from home DSL lines is "backhauled" to Sonic HQ in Santa Rosa, CA and then goes out over the bulk Internet backbone from there. This is a different path than the data would take if handled entirely by AT&T.
If they are not retracted, they might get cited by other works which themselves might get cited. Suddenly this faked, nonexistent research has been "laundered" into mainstream and nobody knows anymore that there was a problem in the first place.
Apparently "haze figuring" is a tortured phrase but "fog computing" is a term of art, despite searches for the former returning many pages containing the latter. So maybe 'fog computing' is a hybrid.
It's long been fairly normal for formal paper writers (of any age, in any discipline) to try to 'jazz up' their lingo to make it sound more erudite/learned. (Apart from specific collegial shorthand like 'lacustrian' or 'normalization'.) Readability suffers, meaning is softened, euphemisms flourish, mistakes are made.
'Colossal information' in place of 'Big data' ... wow - so wrong.
It reminds me of an article about Canadian journal publishers being bought by a shady company (OMICS Group Inc.) so they can seemingly publish whatever they want.
It is the result of a misguided science system that relies mostly on external quality checks (peer-reviewed publication) and floods the world with so much "novelty" that there is no way to digest it. Up to now you could at least use the output to train language models: will machines now have to train themselves...
The people in question are paid for writing papers, not the discovery of true knowledge. The root cause is government funding of research, hence this specific paper talking at the end about the "publish or perish" culture. Governments give researchers money to write papers but then don't read the results, only check that it got published somewhere, meaning anyone who can game the publication system has in effect a license to print (taxpayer) money.
That's why there's no real fix for this problem beyond defunding government science budgets. Any quick hacks you can come up with like running GPT-2 detectors over science papers are just treating the symptoms of the problem, not the cause. The root cause is that when governments eliminate the free market by subsidizing research they lose reliable signals of genuine utility that come from polling the market, so they have to use proxies that boil down to quantity-over-quality. The fix is for them to stop subsidizing research. The people who apply research to create new technologies aren't reading it anyway.
I cannot understand it: those articles should have been carefully examined before publication - I understood them to carry the warranty of the publication's authority. But if anyone had actually read them, the rubbish involved would have emerged.
I get this with students a lot. Papers which have been copied from some website, but then they've gone through and altered a few bits of vocabulary to disguise it.
> Out of 404 papers accepted in less than 30 days after submission, 394 papers (97.5%) have authors with affiliations in (mainland) China. Out of 615 papers of which editorial processing time exceeded 40 days, 58 papers (9.5%) only have authors with affiliations in (mainland) China. This tenfold imbalance suggests a differentiated processing of papers affiliated to China characterised by shorter peer-review duration.
No mentions of tortured phrases in the humanities and softer social sciences? For all their supposed appreciation of les belles-lettres (viz., "fine writing") those researchers sure seem to like their tortured phrasings.
How so? It seems quite related to me. Anecdotally, one would expect a pretty clear negative correlation between torturedness in the sense of this article and indicators of research quality.
I discovered something similar a couple days ago. After googling the title of a paywalled article (that I didn't care enough about to actually pay to read), the closest thing to the unredacted article in the search results was a version that had clearly been automatically rewritten in this exact manner. It was barely decipherable, so I gave up immediately.
Unknown source -> stolen, machine-translated into English, and published -> stolen again, machine-translated into language X -> machine-translated back into English -> republished on another web site
Some people don't have English as their native language. When such people want to write a scientific article in English they will have to use someone who can write English but probably does not know much about the research. So of course there will be articles with "Tortured Phrases".
Did you even read the abstract of TFA? This should not even be a cultural-background issue. As an L2 English speaker myself, I have never once thought of throwing a thesaurus at some established phrase so I could turn “artificial intelligence” into “counterfeit consciousness”, or “deep neural network” into “profound neural organization”. These are deliberate uses of fancy words with no attempt to make sense.
Heck, we got a word for this sort of rampant plagiarism masking on Chinese internet — 洗稿 (manuscript (or blog post)-laundry).
OT: I do appreciate the funny phrase “elite figuring” for HPC. It’s kind of like how they translate things to Anglish.
People who don't speak English natively could use machine translation, and people plagiarizing could use machine translation. How do they distinguish (if you don't mind saving me digging into the research)?
Phrases like AI and big data are already pretty well defined in almost every major machine translation set. You'd have to forcefully try to thesaurus your way through to make it do that 99% of the time.
(We merged this thread and https://news.ycombinator.com/item?id=28108111)