
The onslaught of “Tortured phrases”

So I’ve had this funny feeling for quite a while: I learn about something new, like a word or a concept, and then I start seeing it everywhere. When this started to happen, I was shaken. Although I understand the concept of confirmation or cognitive bias, these words or things I had just come across seemed quite uncommon, so how was I suddenly seeing them everywhere? The odds just seemed astronomical, if I may say. For example, I only learned the word “onslaught” a couple of weeks ago, and now I can’t watch a YouTube video, read an article or listen to a podcast without someone using it. So my initial reaction was to ask: where was this word all this time? I mean, I’ve been learning this language since I was 4 years old. Is it a new word, or is it just a cognitive bias again? I used the Google Ngram Viewer to check how often the word has been used from 1995 onwards, and its usage did increase significantly up to 2015. Still, I finally got the courage to put this weird feeling into words and googled whether it’s a known phenomenon, and lo and behold, it is! It is called the frequency illusion (also known as the Baader–Meinhof phenomenon) and is defined exactly as in the first sentence of this post. So from now on, whenever something similar happens, I’ll smile, knowing it’s just my brain making connections it would have missed had I not known the word.

Speaking of new words: recently there has been a daily onslaught of news about retractions of academic papers in different fields, some because of plagiarism and some because of faked data. From my perspective, there are two types of plagiarism: scientific-content and textual. But before I give my (what people might call ‘radical’) opinion on both types, I’d like to mention the new wave of textual-plagiarism detectors. About a year ago, I learned about the concept of “tortured phrases”: established technical terms that only make sense in a scientific context when written as-is, so that running them through a synonym-swapping paraphraser produces nonsense like “hotness move” instead of “heat transfer”, or “flow motion” instead of “wave propagation”. Even before ChatGPT and other AI models became popular, such paraphrasing tools were already used everywhere. People would use them either to write about something they want to explain but linguistically can’t, or to fill space with text that was intentionally paraphrased in bulk to evade plagiarism detectors. So the research-integrity investigators in the article linked below developed an algorithm to find these phrases. While the article discusses results showing that such phrases mostly correlate with bad science and sometimes with paper mills, I will focus on the cases that only involve computer-generated paraphrasing without the intention of scientific misconduct.
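
Just to make the idea concrete, here is my own toy sketch (not the investigators’ actual detector, and the fingerprint list is just the two examples from this post): the simplest version boils down to keeping a curated list of tortured “fingerprints” and flagging any manuscript text that contains them.

```python
import re

# A tiny, hypothetical fingerprint list mapping tortured phrases to the
# established terms they (badly) paraphrase. Real detectors rely on a
# much larger, curated list of such fingerprints.
TORTURED_FINGERPRINTS = {
    "hotness move": "heat transfer",
    "flow motion": "wave propagation",
}


def find_tortured_phrases(text):
    """Return a list of (tortured phrase, established term, offset) hits."""
    hits = []
    lowered = text.lower()
    for phrase, expected in TORTURED_FINGERPRINTS.items():
        # Match whole phrases on word boundaries so substrings of longer
        # words are not flagged by accident.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, lowered):
            hits.append((phrase, expected, match.start()))
    return hits


if __name__ == "__main__":
    abstract = ("We model hotness move in thin films and analyse the "
                "resulting flow motion across the boundary layer.")
    for phrase, expected, offset in find_tortured_phrases(abstract):
        print(f"Suspicious phrase '{phrase}' at offset {offset} "
              f"(established term: '{expected}')")
```

On the example abstract this prints two warnings, one per fingerprint; anything beyond plain phrase matching (inflections, multilingual corpora, ranking papers by how many hits they contain) is where the real research effort goes.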

You see, the issue with academic publishing of scientific articles at the moment is that it happens mostly in English, a language which the majority of scientists around the world can master only as a second or third language. It is no wonder that most renowned scientists come from English-speaking countries; yes, I know these countries mostly have better education and invest a lot in research, but there is still a correlation. Research consistently shows that when students do science in their native tongue, they flourish. When the opposite happens, as with most academics, they struggle substantially more, even if they master the language. This is why, if you ever visit PubPeer, you’ll find that most papers flagged for misconduct similar to the one described above are by authors from Asia, South America and Africa. Thus the urge to write papers just like native-speaking academics pushes these scientists to cut corners with such tools, since they linguistically cannot write what they understand in their own words. Again, I understand that the countries where most offences take place have lower funding and fewer academic opportunities, which makes them vulnerable to more grievous malpractices than just software-based paraphrasing; I am only trying to make an argument for those researchers who have no means other than such tools.

Okay, so what’s my opinion on the two types of plagiarism? Well, I don’t think the two are even remotely equal or comparable. People who steal someone’s results, methodological approach, etc. and claim them as their own are intentionally committing theft, and there is no excuse for it. In my opinion this is not the case for textual plagiarism (e.g. failing to paraphrase properly), where the intention is not to steal but to do what every scientist does: paraphrase. Of course I’m not advocating for such practices, and I think all scientists should strive to master the language as much as they can, but it is also very unfair to expect or require all non-native speakers to learn the language to the same level as native ones. As a possible solution, I had this idea that researchers would never have to paraphrase at all, and would simply quote anything they want to say, with quotation marks and citations of course. A colleague of mine complained that this would break the flow of the text and make such a paper or book frustrating to read. So maybe in the future everyone will write in their own mother tongue, and with the rise of machine-learned translators you’ll be able to read anyone’s publications in your own language, like a Wikipedia page, and science will have a universal language. I really hope so; otherwise it will always be a disadvantage to be a non-native-speaking scientist.
