OpenAI, the AI research powerhouse behind popular projects such as the GPT series, Codex, DALL-E, and Whisper, might be rushing its AI deployments without adequate protections.
According to a Financial Times report, the ChatGPT maker is now giving staff and third-party groups only a few days to assess the risks and performance of its latest large language models (LLMs), compared with the several months they were given previously.
This may stem from the push for faster model releases and a shift in focus toward inference (generating new data) rather than just training models.
“AI is becoming a very competitive field with all tech companies launching their models at breath-taking speed,” said Pareekh Jain, CEO and lead analyst at Pareekh Consulting. “OpenAI’s edge has been that it was an early player in this race and they must be wanting to maintain that edge and accelerate production by slashing testing time.”
Testers say they had more time before
OpenAI has scaled back its safety testing efforts, dedicating fewer resources and less time to risk assessments, according to eight people familiar with OpenAI’s testing processes who were cited in the FT report.
“We had more thorough safety testing when it was less important,” the FT report quoted one of its sources, a tester of OpenAI’s upcoming o3 model, as saying about LLM technology.
OpenAI’s approach to safety testing for its GPT models has varied over time. For GPT-4, the company dedicated over six months to safety evaluations before its public release. For the GPT-4 Omni model, however, OpenAI condensed the testing phase into just one week to meet a May 2024 launch deadline.
Reduced testing could compromise model integrity
Cutting safety testing time could severely affect the quality of the model at launch, experts say.
“If there are cases of any hallucination or damage due to model outputs, then OpenAI will lose people’s trust and face derailed adoption,” Jain added. “It can be blamed on slashing testing time. Already, OpenAI has an image problem by converting it from a non-profit to a profit enterprise. Any bad incident can further tarnish its image that, for profit, they are sacrificing responsible testing.”
One of the sources called the reduction in testing time “reckless” and a “recipe for disaster.” Another source, who was involved in GPT-4 testing, said some dangerous capabilities were discovered only two months into testing.
While OpenAI did not immediately respond to requests for comment, the LLM giant has had experience dealing with such allegations in the past.
Responding to a similar backlash, in September 2024, OpenAI turned its Safety and Security Committee into an independent “Board oversight committee” with the power to delay model launches over safety concerns.
Improved AI could be pushing faster tests
While the obvious criticism is that accelerated testing endangers model integrity, there is another, less common way of looking at it. Jain hinted that OpenAI may actually be capable of speeding up tests without compromising security.
“OpenAI must be using a lot of AI in their internal processes also,” he said. “They must be drinking their own champagne to convince the world that, with AI, they could do fast testing. We should give them the benefit of the doubt if they are trying to accelerate their model launch with more AI use.” Supporting this view, OpenAI said in December 2024 that, with AI, its model testing was quickly becoming more capable.