Lyin' and Cheatin', AI Models Playing a Game
OpenAI, Apollo Research Find Models Hide Misalignment; Training Cuts Deception
Frontier artificial intelligence models are learning to hide their true intentions to pursue hidden agendas, said OpenAI and Apollo Research. Researchers say the risk of deception needs to be tackled now, especially as AI systems take on more complex, real-world responsibilities.
Frontier artificial intelligence models are learning to hide their true intentions to pursue hidden agendas, said OpenAI and Apollo Research. Researchers say the risk of deception needs to be tackled now, especially as AI systems take on more complex, real-world responsibilities.