
I am a Ph.D. candidate in Computer Science at University College London, where I work on multivariate information theory and its applications to modern machine learning.
Research
My research is motivated by multivariate information theory, a framework for reasoning about how information is shared, decomposed, and transformed across complex systems. The methods I develop are rooted in theory but apply broadly, including to:
- Feature selection, engineering, and interpretation
- Reinforcement learning
- Neural network pruning and compression
- Variational inference and representation learning
Recent work has been published at SIGKDD, AISTATS, and ICML.
Experience
- Ph.D. Candidate · University College London · 2021 – Present
Applications of partial information decomposition to AI. Supervised by Mirco Musolesi and Stephen Hailes.
- Research Assistant · UCL Computer Science · May – Jun 2026
Applying partial information decomposition to understanding agents. Supervised by Mirco Musolesi and Stephen Hailes.
- MATS Scholar · Machine Learning Alignment & Theory Scholars · Jun 2025 – Mar 2026
Improving steganography and its detection, and understanding its theoretical limits. Supervised by Fernando Rosas and Keivan Navaie.
Publications
Conference & Journal
Hide and Seek in Embedding Space: Geometry-based Steganography and Detection in Large Language Models
ICML 2026
A geometry-based steganography scheme that hides secrets in fine-tuned LLM outputs via embedding-space hyperplanes, together with a linear-probe detector that exposes it more reliably than traditional steganalysis.
Feature Selection for Network Intrusion Detection
SIGKDD 2025
An information-theoretic feature selection method (FSNID) that drops uninformative inputs for network intrusion detection while preserving classifier performance.
Partial Information Decomposition for Data Interpretability and Feature Selection
AISTATS 2025
PIDF replaces a single feature-importance score with three: how much information a feature shares with the target, how much arises only in combination with others (synergy), and how much is redundant with what other features already carry.
Preprints
A Generalized Information Bottleneck Theory of Deep Learning
arXiv preprint, 2025
Recasts the Information Bottleneck through synergy — information that only appears when features are processed jointly — yielding interpretable compression phases in ReLU networks, CNNs, and Transformers where standard IB struggles.
Mutual Information Preserving Neural Network Pruning
arXiv preprint, 2024
A structured pruning method that keeps the nodes carrying mutual information between adjacent layers, with a guarantee that the pruned upstream activations can still be mapped to the downstream layer — so the network remains retrainable.
Information-theoretic State Variable Selection for Reinforcement Learning
arXiv preprint, 2024
The Transfer Entropy Redundancy Criterion (TERC): an information-theoretic test that provably excludes state variables that have no effect on agent performance, improving sample efficiency across Q-learning, Actor-Critic, and PPO.
Talks
- Feature Selection for Network Intrusion Detection: An Information-Theoretic Approach
- Partial Information Decomposition for Data Interpretability and Feature Selection