Why it matters: AI-based plagiarism is becoming an increasingly annoying and dangerous phenomenon, especially for genuine scientific research publications. Many people, researchers included, are trying to develop a practical way to counter it, and a new approach seems to work particularly well for one specific kind of scientific paper.
ChatGPT is extremely good at faking man-made creative content, even though actual professionals are finding the chatbot to be pretty "sh*tty" and redundant as a writer. When it comes to scientific writing, however, chatbots can turn from simple nuisances or school cheating tools to actual threats against science and proper research practices.
Newly published research by scientists from the University of Kansas proposes a potential solution to the AI-based plagiarism problem, boasting a remarkable ability to distinguish actual human-made science writing from ChatGPT output "with over 99% accuracy." Unsurprisingly, the result was achieved through AI algorithms and a specifically trained language model.
Chemistry professor Heather Desaire and colleagues are fighting AI with AI, and they are seemingly getting very good results. The researchers focused their efforts on "perspectives" articles, a particular style of article published in scientific magazines to provide overviews of specific research topics.
The scientists chose 64 perspectives articles, on topics ranging from biology to physics, and then asked ChatGPT to generate new paragraphs on the same research to put together 128 "fake" articles. The AI spat out 1,276 paragraphs, which were then used to train the language model the researchers chose to classify AI-made text.
Two more datasets, one containing 30 real perspectives articles and the other with 60 papers generated by ChatGPT, were compiled to test the newly trained algorithm. And the algorithm seemingly passed the tests prepared by researchers with flying colors: the AI classifier was able to detect ChatGPT articles 100% of the time, while accuracy for detecting individual fake paragraphs dropped to 92%.
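One way an article-level score can end up higher than a paragraph-level score is if each paragraph is judged separately and the article takes the majority verdict. The sketch below illustrates that idea only; the voting rule, the function names, and the sample labels are assumptions made for illustration, not the procedure or data described by the Kansas team.

```python
def classify_article(paragraph_labels: list[str]) -> str:
    """Label an article 'ai' if more than half of its paragraphs are labelled 'ai'."""
    ai_votes = sum(1 for label in paragraph_labels if label == "ai")
    return "ai" if ai_votes > len(paragraph_labels) / 2 else "human"


def accuracy(predicted: list[str], actual: list[str]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical per-paragraph predictions for two articles that were
# both written by ChatGPT (so every true label is 'ai').
articles = [
    ["ai", "ai", "human", "ai"],
    ["ai", "human", "ai", "ai", "ai"],
]

# Paragraph-level accuracy: every paragraph counts on its own.
all_paragraphs = [label for article in articles for label in article]
paragraph_accuracy = accuracy(all_paragraphs, ["ai"] * len(all_paragraphs))

# Article-level accuracy: one majority verdict per article.
article_verdicts = [classify_article(article) for article in articles]
article_accuracy = accuracy(article_verdicts, ["ai"] * len(articles))

print(f"paragraph-level accuracy: {paragraph_accuracy:.2f}")
print(f"article-level accuracy:   {article_accuracy:.2f}")
```

In this toy data, two paragraphs are misjudged, yet both articles are still flagged correctly because the wrong votes are outnumbered.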
The scientists say that chatbots mangle text by using a particular "writing" style, so their "hand" can be identified quite effectively. Human scientists tend to have a richer vocabulary and write longer paragraphs containing more diverse words and punctuation marks. Furthermore, ChatGPT isn't exactly renowned for its precision, and it tends to avoid providing specific figures or quoting other scientists' names.
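The stylistic cues mentioned above (paragraph length, vocabulary diversity, punctuation variety, presence of specific figures) can be turned into simple numbers that a classifier could learn from. The following is a minimal sketch under stated assumptions: the function name `style_features`, the chosen features, and the two sample sentences are all hypothetical and are not the feature set used in the Kansas study.

```python
import re
import string


def style_features(paragraph: str) -> dict:
    """Compute a few illustrative style measurements for one paragraph."""
    # Alphabetic tokens only; numbers are tracked separately below.
    words = re.findall(r"[A-Za-z']+", paragraph.lower())
    # Rough sentence split: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    punctuation = {ch for ch in paragraph if ch in string.punctuation}
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        # Share of distinct words: a crude proxy for vocabulary richness.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        "distinct_punctuation": len(punctuation),
        "contains_digit": any(ch.isdigit() for ch in paragraph),
    }


# Two made-up sample sentences, written for this illustration only.
human_like = "In 2019, Smith measured a 42% gain; the effect (though small) persisted."
bot_like = "The results are good and the results are clear."

human_features = style_features(human_like)
bot_features = style_features(bot_like)

print(human_features)
print(bot_features)
```

A real detector would feed many such measurements, computed over thousands of paragraphs, into a trained classifier rather than comparing two sentences by eye.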
The Kansas researchers described their approach against AI plagiarism as a "proof-of-concept" study, even though it has proven very effective at identifying fake perspectives articles. Further (human-made) research is needed to establish whether the same approach could be applied to other types of scientific papers or to AI-made text in general.