Marius Hobbhahn

CEO and Co-founder at Apollo Research, specializing in AI safety

London, England, United Kingdom
Joined March 2026

Network

7.5K connections
AI Safety Research
🏛️ AI Policy Governance
💰 AI Investors Founders
⚙️ AI Safety Operations
🚀 Apollo Research Core
🧠 ...

Summary

Marius Hobbhahn is a leading AI safety researcher and entrepreneur who co-founded Apollo Research to address the critical problem of AI deception. His work focuses on developing tools and methods for AI model evaluations ('evals') to detect and mitigate scenarios in which AI systems covertly pursue misaligned goals, a capability he identified as advancing significantly around 2024. He collaborates with major AI companies such as OpenAI and Anthropic, and with government bodies such as the U.K.'s AI Security Institute, to proactively counter the risks of autonomous AI deception.
With a strong academic background from the University of Tübingen, including Bachelor's degrees in Cognitive Science and Computer Science and Master's and PhD studies in Machine Learning, Hobbhahn brings a rigorous scientific approach to AI safety. His research contributions span Bayesian inference, predictive uncertainty in deep networks, and analyses of compute trends, informing the understanding of AI's trajectory and capabilities. His work is highly cited, reflecting its significant impact on the field.
Beyond his entrepreneurial and academic pursuits, Hobbhahn engages actively with the broader AI ethics and alignment communities through platforms such as LessWrong and the AI Alignment Forum. He contributes to discussions on topics such as AI scheming, the feasibility of automating AI safety work, and the challenge of ensuring faithful chain-of-thought reasoning in LLMs, demonstrating a commitment to open discourse and collaborative problem-solving in AI safety.

Writing

Large Language Models can Strategically Deceive their Users when Put Under Pressure

2024

Research paper investigating the capacity of large language models to strategically mislead their users, particularly in high-pressure scenarios, contributing empirical evidence of AI deception capabilities.

arxiv.org

Black-Box Access is Insufficient for Rigorous AI Audits

2024

Paper highlighting the limitations of black-box access for conducting thorough AI audits, emphasizing the need for more transparent methods to ensure accountability and safety.

dl.acm.org

Will we run out of data? Limits of LLM scaling based on human-generated data

2024

Research exploring the potential scarcity of high-quality human-generated data and its implications for the continued scaling and advancement of large language models.

arxiv.org

Frontier Models are Capable of In-context Scheming

2024

Research demonstrating that frontier AI models are capable of in-context scheming, providing empirical evidence of complex deceptive behaviors.

arxiv.org

Compute Trends Across Three Eras of Machine Learning

2022

A foundational paper analyzing the growth of computational resources used in machine learning across distinct historical eras, contributing to the understanding of AI scaling.

arxiv.org

Fast Predictive Uncertainty for Classification with Bayesian Deep Networks

2022

Paper introducing a method for fast predictive uncertainty estimation in classification with Bayesian deep networks, presented at UAI 2022.

arxiv.org

Laplace Matching for fast Approximate Inference in Generalized Linear Models

2021

Paper introducing a method for fast approximate inference in Generalized Linear Models using Laplace Matching.

arxiv.org