Researchers at Penn State University, USA, investigated the extent to which natural language models like ChatGPT, which use artificial intelligence (AI) to formulate realistic and articulate prose in response to user questions, can generate content without plagiarizing the material they were trained on. This is an important question, given that these systems work by processing, memorizing, and reproducing existing information drawn from huge volumes of data available online, such as books, scientific articles, Wikipedia pages, and news reports.
The group analyzed 210,000 texts generated by GPT-2, developed by OpenAI, the same startup that created ChatGPT, in search of evidence of three different types of plagiarism: verbatim, when excerpts of text are directly copied and pasted; paraphrasing, when text is reworded using synonyms to slightly alter the result; and the use of ideas developed by someone else without crediting them, even if formulated differently.
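The researchers' detection pipeline is considerably more elaborate than this, but the intuition behind flagging the first two types, verbatim and paraphrase overlap, can be sketched in a few lines of Python. The n-gram size, function names, and toy sentences below are illustrative assumptions, not the study's actual method:

```python
# Illustrative sketch only: a crude way to flag possible verbatim and
# paraphrase overlap between a generated passage and a source document.
# The n-gram size and toy sentences are assumptions, not the study's method.

def ngrams(tokens, n):
    """Return the set of word n-grams in a list of tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def verbatim_overlap(generated, source, n=8):
    """Fraction of the generated text's n-grams found verbatim in the source."""
    gen = ngrams(generated.lower().split(), n)
    src = ngrams(source.lower().split(), n)
    return len(gen & src) / len(gen) if gen else 0.0

def lexical_similarity(generated, source):
    """Jaccard similarity of the two vocabularies, a rough paraphrase proxy."""
    gen, src = set(generated.lower().split()), set(source.lower().split())
    return len(gen & src) / len(gen | src) if gen | src else 0.0

source = ("The quick brown fox jumps over the lazy dog "
          "while the farmer watches from the porch.")
generated = ("The quick brown fox jumps over the lazy dog "
             "as the farmer looks on from the porch.")

print(f"verbatim 8-gram overlap: {verbatim_overlap(generated, source):.2f}")
print(f"lexical similarity:      {lexical_similarity(generated, source):.2f}")
```

The third type, the reuse of ideas reworded beyond surface resemblance, is the hardest to catch automatically, because it leaves few traces that simple overlap measures of this kind can detect.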
The study concluded that all three types of plagiarism were present, and that the larger the models, meaning the more parameters they contained, the more frequently it occurred. The researchers used two types of models: pretrained models based on a broad spectrum of data, and fine-tuned models, refined by the Penn State team to focus on a smaller set of scientific and legal documents, academic articles related to COVID-19, and patent applications. The choice of content was no accident: in these types of text, plagiarism is considered especially problematic and intolerable.
In the text generated by pretrained models, verbatim copying was the most prevalent form of plagiarism, while the fine-tuned models were more likely to paraphrase and to appropriate ideas without crediting the source. “Plagiarism comes in different flavors,” said one of the paper’s authors, Dongwon Lee of Penn State’s College of Information Sciences and Technology, according to the news service EurekAlert!. The findings will be presented in more detail at the Web Science Conference, an event organized by the Association for Computing Machinery (ACM) and scheduled for April 30 to May 4 in Austin, Texas, USA.
ChatGPT is one of several AI systems that have gained notoriety since being made available for public use. Since November 2022, it has been tested by more than 100 million people, impressing many with its ability to generate coherent texts that mimic human writing (see Pesquisa FAPESP issue no. 325). One of the controversies it has raised concerns the originality of its responses and the fear that it could become a source of academic misconduct.
“People pursue large language models because the larger the model gets, generation abilities increase,” said the paper’s lead author, Jooyoung Lee, a PhD student at Penn State’s College of Information Sciences and Technology. These AI tools can create unique, personalized answers to questions posed by users, even when drawing the information from a database. That does not mean, however, that the software is incapable of plagiarism, including in forms that are harder to detect. “We taught language models to mimic human writings without teaching them how not to plagiarize,” Lee said.
Several programs are being developed to detect content generated by AI software. OpenAI itself has created a tool capable of identifying AI-generated text. Others of this kind are available online, such as Writer’s AI Content Detector and Content at Scale. As natural language systems continue to advance, the technology used to identify AI-generated content will also need to be continuously updated.
A team from the University of Pennsylvania’s School of Engineering and Applied Science showed that it is possible to train people to identify these texts, reducing the reliance on software. The study, led by computer scientist Chris Callison-Burch and presented at an Association for the Advancement of Artificial Intelligence (AAAI) conference in Washington, DC, in February, showed that AI tools are highly proficient at producing fluent prose and following grammar rules. “But they make other kinds of mistakes that we can learn to spot,” Liam Dugan, a PhD student at the university and one of the authors of the paper, told the blog Penn Engineering Today.
The experiment used an online game called Real or Fake Text. The participants, all undergraduate or graduate students taking an AI course at the University of Pennsylvania, were shown passages that began as human-written text and, at some point, were taken over by a language model. The passages were drawn from news reports, presidential speeches, fictional stories, and cooking recipes. Participants were asked to estimate at what point the AI system took over and to explain the reasons for their decision, earning points for correct guesses. The main cues they cited were irrelevant content, logic errors, contradictory sentences, overly generic phrases, and grammar problems. Participants found it easier to correctly identify AI-generated text in cooking recipes than in the other types of text.
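How points might be assigned in a game of this kind can be sketched as follows; this is a minimal, hypothetical scoring rule written for illustration, not necessarily the one Real or Fake Text actually uses.

```python
# Illustrative only: a hypothetical scoring rule for a Real or Fake Text-style
# game, where a passage starts out human-written and a language model takes
# over at some sentence. The real game's scoring may differ.

def score_guess(guessed_boundary: int, true_boundary: int, max_points: int = 5) -> int:
    """Full credit for the exact boundary, partial credit for guessing a later
    sentence (which is still machine-generated), nothing for guessing too early
    (i.e., accusing a human-written sentence)."""
    if guessed_boundary < true_boundary:
        return 0
    return max(max_points - (guessed_boundary - true_boundary), 0)

# Example: the sixth sentence (index 5) is the first machine-generated one.
true_boundary = 5
for guess in (3, 5, 7, 12):
    print(f"guess at sentence {guess}: {score_guess(guess, true_boundary)} points")
```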
They scored significantly better than chance, showing that AI-generated text is detectable. Although the participants’ abilities varied widely, their performance improved the more they played, showing that they learned over time. “Five years ago, language models couldn’t stay on topic or produce a fluent sentence,” said Dugan. “Now, they rarely make a grammar mistake. Our study identifies the kind of errors that characterize AI chatbots, but it’s important to keep in mind that these errors will continue to evolve. People will need to continue training themselves to recognize the difference and work with detection software as a supplement.”