Real science, documented in scientific publications, is one of our most valuable sources of knowledge because of its contribution to future discoveries. It is imperative to safeguard the authenticity of scientific publications from fraud and from any influence that compromises the integrity of this crucial source of knowledge.
Now that ChatGPT and other generative AI tools can produce scientific articles that look real, especially to someone outside that field of research, a group of researchers is trying to find the best way to figure out which ones are fake.
As a first attempt, Ahmed Abdeen Hamed, a visiting research fellow at Binghamton University, State University of New York, has created a machine-learning algorithm named xFakeSci that can detect up to 94% of bogus papers, nearly twice the success rate of more common data-mining techniques.
"My main research is biomedical informatics, but because I work with medical publications, clinical trials, online resources and mining social media, I'm always concerned about the authenticity of the knowledge somebody is propagating," said Hamed, who is part of George J. Klir Professor of Systems Science Luis M. Rocha's Complex Adaptive Systems and Computational Intelligence Lab. "Biomedical articles, in particular, were hit badly during the global pandemic because some people were publicizing false research," Hamed added.
In a new paper published in the journal Scientific Reports, Hamed and collaborator Xindong Wu, a professor at Hefei University of Technology in China, created 50 fake articles on each of three popular medical topics—Alzheimer's, cancer, and depression—and compared them to the same number of real articles on the same topics.
Hamed said when he asked ChatGPT for the AI-generated papers, "I tried to use the same keywords that I used to extract the literature from the [National Institutes of Health's] PubMed database to have a common basis for comparison. My intuition was that there must be a pattern exhibited in the fake world versus the actual world, but I had no idea what this pattern was."
After some experimentation, he programmed xFakeSci to analyze two major features of the papers. One is the number of bigrams, pairs of words that frequently appear together, such as "climate change," "clinical trials," or "biomedical literature." The second is how those bigrams are linked to other words and concepts in the text.
"The first striking thing was that the number of bigrams was very few in the fake world, but in the real world, the bigrams were much richer," Hamed said. "Also, even though there were very few bigrams in the fake world, they were so connected to everything else."
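The two features described above can be illustrated with a short sketch. This is not the authors' actual xFakeSci implementation, which is described in the Scientific Reports paper; it is a minimal Python illustration of what counting bigrams and measuring their "connectivity" to the rest of a text might look like, using simple whitespace tokenization and a crude neighbor-based connectivity proxy chosen for this example.

```python
from collections import Counter, defaultdict

def bigram_features(text):
    """Return (bigram counts, connectivity) for a text.

    Connectivity here is a simple proxy: for each bigram, count
    how many distinct words co-occur next to either of its two
    words anywhere in the text.
    """
    words = text.lower().split()
    # Count every adjacent word pair (bigram) in the text.
    bigrams = Counter(zip(words, words[1:]))
    # Record each word's distinct left/right neighbors.
    neighbors = defaultdict(set)
    for a, b in zip(words, words[1:]):
        neighbors[a].add(b)
        neighbors[b].add(a)
    # A bigram's connectivity: union of its two words' neighbors.
    connectivity = {
        bg: len(neighbors[bg[0]] | neighbors[bg[1]])
        for bg in bigrams
    }
    return bigrams, connectivity

sample = ("clinical trials assess new treatments and clinical trials "
          "report outcomes that guide biomedical literature reviews")
bigrams, conn = bigram_features(sample)
print(len(bigrams))                     # 13 distinct bigrams
print(bigrams[("clinical", "trials")])  # 2
```

Under this proxy, a text with few distinct bigrams whose words nevertheless neighbor many other words would show the pattern Hamed describes: a small bigram vocabulary that is highly connected to everything else.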
Hamed and Wu theorize that the writing styles are different because human researchers don't have the same goals as AIs prompted to produce a piece on a given topic.
"Because ChatGPT is still limited in its knowledge, it tries to convince you using the most significant words," Hamed said. "It is not the job of a scientist to make a convincing argument to you. A real research paper honestly reports what happened during an experiment and the method used. ChatGPT is about depth on a single point, while real science is about breadth," the researcher added.
Source: Scientific Reports