Mikita Balesni

Research Scientist at Apollo Research

Mikita Balesni is a research scientist at Apollo Research, where they study strategic deception in frontier AI models. They have previously done ML safety research with the Stanford Existential Risks Initiative and worked as a visiting ML researcher at the University of Tartu. With a background in computer science and cybersecurity, Mikita has also worked as a machine learning engineer and as a research intern.

Location

London, United Kingdom

Apollo Research

Apollo Research is an AI safety organization. We specialize in auditing high-risk failure modes, particularly deceptive alignment, in large AI models. Our primary objective is to minimize catastrophic risks from advanced AI systems that may behave deceptively, where misaligned models appear aligned in order to pursue their own objectives. Our approach involves conducting fundamental research on interpretability and behavioral model evaluations, which we then use to audit real-world models. Ultimately, our goal is to use interpretability tools in model evaluations, as we believe that examining model internals in combination with behavioral evaluations offers stronger safety assurances than behavioral evaluations alone.

Employees

1-10