AI-generated nonsense is leaking into scientific journals

Mack DeGeurin

March 19, 2024 at 4:00 PM

In February, an absurd, AI-generated rat penis somehow snuck its way into a since-retracted Frontiers in Cell and Developmental Biology article. Now that odd travesty seems like it may just be a particularly loud example of a more persistent problem brewing in scientific literature. Journals are currently at a crossroads on how best to respond to researchers using popular but factually questionable generative AI tools to help draft manuscripts or produce images. Detecting evidence of AI use isn’t always easy, but a new report from 404 Media this week shows what appear to be dozens of partially AI-generated published articles hiding in plain sight. The dead giveaway? Commonly uttered, computer-generated jargon.

404 Media searched Google Scholar’s public database for the AI-generated phrase “As of my last knowledge update” and reportedly found 115 different articles that appeared to have relied on copied and pasted AI model outputs. That string of words is one of many turns of phrase often churned out by large language models like OpenAI’s ChatGPT. In this case, the “knowledge update” refers to the point at which a model’s reference data was last updated. Other common generative-AI phrases include “As an AI language model” and “regenerate response.” Outside of academic literature, these AI artifacts have appeared scattered in Amazon product reviews and across social media platforms.

Several of the papers cited by 404 Media appeared to include AI text copied directly into peer-reviewed work purporting to explain complex research topics like quantum entanglement and the performance of lithium metal batteries. Other examples of journal articles appearing to include the common generative AI phrase “I don't have access to real-time data” were also shared on X, formerly Twitter, over the weekend. At least some of the examples reviewed by PopSci did appear to relate to research into AI models. The AI utterances, in other words, were part of the subject material in those instances.

Though several of these phrases appeared in reputable, well-known journals, 404 Media claims the majority of the examples it found stemmed from small, so-called “paper mills” that specialize in rapidly publishing papers, often for a fee and without scientific scrutiny or scrupulous peer review. Researchers have claimed the proliferation of these paper mills has contributed to an increase in bogus or plagiarized academic findings in recent years.

Unreliable AI-generated claims could lead to more retractions

The recent examples of apparent AI-generated text appearing in published journal articles come amid an uptick in retractions generally. A recent Nature analysis of research papers published last year found more than 10,000 retractions, more than in any year previously measured. Though the bulk of those cases weren’t tied to AI-generated content, concerned researchers have feared for years that increased use of these tools could lead to more false or misleading content making it past the peer review process. In the embarrassing rat penis case, the bizarre images and nonsensical AI-produced labels like “dissiliced” and “testtomcels” managed to slip by multiple reviewers either unnoticed or unreported.

There’s good reason to believe articles submitted with AI-generated text may become more commonplace. Back in 2014, the publishers IEEE and Springer together removed more than 120 articles found to have included nonsensical AI-generated language. The prevalence of AI-generated text in journals has almost surely increased in the decade since then as more sophisticated and easier-to-use tools like OpenAI’s ChatGPT have gained wider adoption.

A 2023 Nature survey of more than 1,600 scientists found that around 30% of respondents admitted to using AI tools to help them write manuscripts. And while phrases like “As an AI algorithm” are dead giveaways exposing a sentence’s large language model (LLM) origin, many other more subtle uses of the technology are harder to root out. Detection models used to identify AI-generated text have proven frustratingly inadequate.

Those who support permitting AI-generated text in some instances say it can help non-native speakers express themselves more clearly and potentially lower language barriers. Others argue the tools, if used responsibly, could speed up publication times and increase overall efficiency. But publishing inaccurate data or fabricated findings generated by these models risks damaging a journal’s reputation in the long term. A recent paper published in Current Osteoporosis Reports comparing review articles written by humans with those generated by ChatGPT found the AI-generated examples were often easier to read. At the same time, the AI-generated reports were also filled with inaccurate references.

“ChatGPT was pretty convincing with some of the phony statements it made, to be honest,” Indiana University School of Medicine professor and paper author Melissa Kacena said in a recent interview with Time. “It used the proper syntax and integrated them with proper statements in a paragraph, so sometimes there were no warning bells.”

Journals should agree on common standards around generative AI

Major publishers still aren’t aligned on whether or not to allow AI-generated text in the first place. Since 2022, journals published by Science have strictly prohibited the use of AI-generated text or images unless first approved by an editor. Nature, on the other hand, released a statement last year saying it wouldn't allow AI-generated images or videos in its journals, but would permit AI-generated text in certain scenarios. JAMA currently allows AI-generated text but requires researchers to disclose when it appears and what specific models were used.

These policy divergences can create unnecessary confusion both for researchers submitting work and for reviewers tasked with vetting it. Researchers already have an incentive to use the tools at their disposal to help publish articles quickly and boost their overall number of published works. An agreed-upon standard around AI-generated content among large journals would set clear boundaries for researchers to follow. The larger established journals can also further separate themselves from less scrupulous paper mills by drawing firm lines around certain uses of the technology or prohibiting it entirely in cases where it is used to make factual claims.
