The AI Research Team at Apollo Research works to advance the safety and reliability of AI systems through in-depth investigations of high-risk failure modes, with a particular emphasis on deceptive alignment. The team conducts fundamental research in interpretability and behavioral evaluations and applies its findings to audit real-world AI models. By developing and deploying interpretability tools, it aims to strengthen safety assurances and mitigate the risks associated with advanced AI technologies.