
Chatbots can be deceiving.

It happened again this week with the release of the full version of OpenAI’s o1, its most advanced artificial intelligence (AI) model yet, equipped with “complex reasoning” capabilities.

AI safety testers discovered the model’s reasoning abilities can at times mislead, even deceive, humans at a higher rate than OpenAI’s GPT-4o, as well as leading AI models from Alphabet Inc.’s Google, Facebook parent Meta Platforms Inc. and Amazon.com Inc.-backed Anthropic.

“While we find it exciting that reasoning can significantly improve the enforcement of our safety policies, we are mindful that these new capabilities could form the basis for dangerous applications,” OpenAI researchers acknowledged on Thursday.

“One particular safety concern is that AI agents might covertly pursue misaligned goals, hiding their true capabilities and objectives — also known as scheming,” Apollo Research added in a separate paper.

In one example depicted in a video provided by OpenAI, a user uploads a picture of a wooden birdhouse and asks the model for advice on how to build it. The model “thinks” for a short period and then spits out a comprehensive-looking set of instructions that, on closer review, turn out to be useless.

When researchers pressed the o1 model on whether it was trying to mislead them, it denied it had done anything wrong and would “fabricate false explanations” about 99% of the time.

The fear among researchers is that a scheming AI model could circumvent human control and, with access to enough resources and agentic capabilities, create organizational problems as more companies deploy and use AI agents.

“Subjectively, Apollo Research believes that it is unlikely that such instances would lead to catastrophic outcomes as o1 agentic capabilities do not appear sufficient, but their evaluations were not designed to directly assess this risk,” OpenAI said.

Before OpenAI releases its own AI agents, as it will reportedly do in 2025, the company will need to retest its AI models, according to researchers and industry analysts.

Cautionary tales about AI models are nothing new. The deception issues with o1 are just the latest example of an AI product demo doing the opposite of its intended purpose. An advertisement for Google’s new Bard chatbot last year mistakenly credited the James Webb Space Telescope with a discovery. An updated version of a similar Google tool informed some users that it was safe to eat rocks and that they could use glue to stick cheese to their pizza.

Earlier this week, it was widely reported that OpenAI’s ChatGPT appears to be restricted from saying certain names, including David Mayer. Other names verboten to the system include Brian Hood, Jonathan Turley, Jonathan Zittrain, David Faber, and Guido Scorza. Conspiracy theories abound, but OpenAI has stayed mum about the banned names.
