I am a Member of Technical Staff at Anthropic, where I lead a team working on Adversarial Robustness. We develop technical solutions to mitigate catastrophic risks from AI misuse, including adversarial defences, understanding jailbreaks, and automated red-teaming. I also mentor researchers on related topics through MATS. You can find my latest research on my Google Scholar profile, and some highlighted papers below.
…
Before joining Anthropic, I obtained my PhD from the University of Oxford in the Autonomous Intelligent Machines and Systems programme, where I had the good fortune of being supervised by Tom Rainforth, Eric Nalisnick, and Yee Whye Teh. The first portion of my PhD was spent developing Bayesian models to evaluate the effects of non-pharmaceutical interventions on COVID-19 transmission. This work has been cited in federal legislation, presented to the Africa CDC modelling group, and shared with the UK's Scientific Advisory Group for Emergencies. I then worked on accelerating model training using ideas from probabilistic modelling, rethinking Bayesian neural networks, and understanding sycophancy in language models.
Before that, I obtained a Master's in Information and Computer Engineering from the University of Cambridge, where I graduated top of my cohort. My final-year project was on Bayesian inference and differential privacy.