{"id":451364,"date":"2022-09-19T17:59:02","date_gmt":"2022-09-19T20:59:02","guid":{"rendered":"https:\/\/revistapesquisa.fapesp.br\/?p=451364"},"modified":"2022-09-19T17:59:02","modified_gmt":"2022-09-19T20:59:02","slug":"bad-translations-signal-misconduct-in-scientific-articles","status":"publish","type":"post","link":"https:\/\/revistapesquisa.fapesp.br\/en\/bad-translations-signal-misconduct-in-scientific-articles\/","title":{"rendered":"Bad translations signal misconduct in scientific articles"},"content":{"rendered":"<p>A group of researchers from France and Russia investigated the frequent use of meaningless expressions in articles published in computer science journals. Instead of using the well-known term artificial intelligence, for example, some papers have referred to the field as \u201ccounterfeit consciousness.\u201d The term big data has at times been replaced by something with a similar meaning but never used in that context: \u201ccolossal information.\u201d<\/p>\n<p>In a study published on arXiv last year, computer scientists Guillaume Cabanac of the University of Toulouse, Cyril Labb\u00e9 of Grenoble Alpes University, and Alexander Magazinov from Russian software company Yandex concluded that these terms, which they dubbed \u201ctortured phrases,\u201d can signal various types of misconduct. The most common is plagiarism. These expressions appear strange because they are automatically translated from English into another language and then converted back into English, with the aim of altering the phrases to fool plagiarism detection software.<\/p>\n<p>As well as seeming strange, these tortured phrases make papers difficult to understand, inaccurate, or simply wrong. They could also be an omen of a more serious problem. Tortured phrases have been found in entirely fraudulent articles generated by AI language programs. 
\u201cUnlike papers where authors seem to have used paraphrasing software, which changes existing text, these AI models can produce text out of whole cloth,\u201d explained Cabanac, Labb\u00e9, and Magazinov in an article published on the <em>Bulletin of the Atomic Scientists<\/em> website in January. They refer specifically to a neural network called GPT-2, developed by American private research institution OpenAI, which is capable of generating coherent structures that appear to have been written by a human. Last year, the group screened 140,000 abstracts using a program\u2014created by OpenAI itself\u2014that can detect text generated by GPT-2. \u201cHundreds of suspect papers appeared in dozens of reputable journals,\u201d the trio wrote.<\/p>\n<p>Computer programs that generate fake articles are nothing new, but until recently the results were so bad that they could fool only inattentive or highly negligent readers. In 2005, three students from the Massachusetts Institute of Technology (MIT) created a program called SCIgen that can combine sequences of words extracted from genuine scientific papers to create new texts\u2014although they do not make any sense (<a href=\"https:\/\/revistapesquisa.fapesp.br\/en\/nonsense-papers\/\" target=\"_blank\" rel=\"noopener\"><em>see<\/em> Pesquisa FAPESP <em>issue<\/em> <em>no. 219<\/em><\/a>). That same year, they submitted one of these manuscripts to a world conference on cybernetics and computing taking place in the USA and managed to get it published\u2014the MIT group\u2019s goal was to highlight that the peer-review process for conference proceedings is often performed poorly. The tool that started as a joke, however, was later adopted by fraudsters. In 2012, Labb\u00e9 showed that the MIT software, freely available online, was being used for wrongdoing\u2014he found articles generated by SCIgen in the proceedings of more than 30 conferences. 
Labb\u00e9 subsequently developed a program to identify these texts using keywords, which was adopted by scientific publishers to prevent the problem.<\/p>\n<p>Advances in AI have breathed new life into this type of fraud. Cabanac and his colleagues then created a more powerful system called the Problematic Paper Screener, which identifies articles containing tortured phrases. Volunteers compiled frequently mistranslated expressions in papers from various fields of knowledge to feed the screener\u2019s database. In this database, \u201cirregular esteem\u201d is identified as a supposed equivalent of \u201crandom value,\u201d a term commonly used in statistical analysis. Another bizarre example is the term \u201cbosom peril,\u201d which appeared in place of \u201cbreast cancer.\u201d<\/p>\n<p>Not even COVID-19 studies have escaped the problem: in some papers, Severe Acute Respiratory Syndrome (SARS) was converted to Extreme Intense Respiratory Syndrome. One of them, authored by Egyptian doctor Ahmed Elgazzar of Benha University, was removed from the Research Square preprints platform after evidence emerged of misconduct that went beyond bad translations. The study, which suggested that the dewormer ivermectin was effective against SARS-CoV-2, was considered invalid due to discrepancies between the raw research data and the clinical trial protocols. In addition to the strange translation of SARS, there was evidence that the author had plagiarized from ivermectin press releases, which his paraphrasing trickery could not hide.<\/p>\n<p>Journals that publish manuscripts with poor translations may have other problems with their quality control processes. Cabanac also searched the Dimensions database for scientific documents containing the terms he compiled. He detected 860 articles containing at least one tortured phrase, 31 of which were published in the same journal: Elsevier\u2019s <em>Microprocessors &amp; Microsystems<\/em>. 
The computer scientist decided to download all the papers published by the journal between 2018 and 2021 and analyze them in depth. He found roughly 500 problematic cases\u2014most of which involved irregularities in the peer-review process. Many of the questionable papers had been published in special issues and had identical submission, revision, and acceptance dates\u2014evidence that they were not properly reviewed.<\/p>\n<p>A parallel investigation by Elsevier corroborated the Frenchman&#8217;s findings, and the publisher subsequently retracted or removed 165 articles. \u201cThe integrity and rigor of the peer-review processes were investigated and confirmed to fall beneath the high standards expected by <em>Microprocessors &amp; Microsystems<\/em>,\u201d the editor of the journal said in a statement. There were also indications that many of the special issues contained &#8220;non-original and heavily paraphrased&#8221; content.<\/p>\n<p>Elsevier was not the only publisher grappling with the problem. In March, UK-based IOP Publishing announced the retraction of 350 articles published in two journals\u2014<em>The Journal of Physics: Conference Series<\/em> and <em>IOP Conference Series: Materials Science and Engineering<\/em>\u2014which disseminate physics, materials science, and engineering conference proceedings. Many contained tortured phrases. 
These were discovered by Nick Wise, an engineering student at the University of Cambridge who used the screener created by Cabanac&#8217;s group to analyze the journals.<\/p>\n","protected":false},"excerpt":{"rendered":"Researchers compile \u201ctortured phrases\u201d that seek to hide plagiarism and fraud in <em>papers<\/em>","protected":false},"author":11,"featured_media":451365,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"footnotes":""},"categories":[155],"tags":[219,230,215],"coauthors":[98],"class_list":["post-451364","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-good-practices","tag-computation","tag-ethics","tag-scientometrics"],"acf":[],"_links":{"self":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/posts\/451364","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/comments?post=451364"}],"version-history":[{"count":1,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/posts\/451364\/revisions"}],"predecessor-version":[{"id":451369,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/posts\/451364\/revisions\/451369"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/media\/451365"}],"wp:attachment":[{"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/media?parent=451364"}],"wp:term":[{"taxonomy":"category","embe
ddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/categories?post=451364"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/tags?post=451364"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/revistapesquisa.fapesp.br\/en\/wp-json\/wp\/v2\/coauthors?post=451364"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}