Charles Westphal

Ph.D. Student in ML · UCL · London

I am a Ph.D. candidate in Computer Science at University College London, where my work is grounded in multivariate information theory and its applications to modern machine learning.

Research

My research is motivated by multivariate information theory, a framework for reasoning about how information is shared, decomposed, and transformed across complex systems. The methods I develop are theoretically rooted but apply broadly, including to:

  • Feature selection, engineering, and interpretation
  • Reinforcement learning
  • Neural network pruning and compression
  • Variational inference and representation learning

Recent work has been published at SIGKDD, AISTATS, and ICML.

Experience

  • Ph.D. Candidate · University College London 2021 — Present

    Applications of partial information decomposition to AI. Supervised by Mirco Musolesi and Stephen Hailes.

  • Research Assistant · UCL Computer Science May — Jun 2026

    Applying partial information decomposition to understanding agents. Supervised by Mirco Musolesi and Stephen Hailes.

  • MATS Scholar · Machine Learning Alignment & Theory Scholars Jun 2025 — Mar 2026

    Improving steganography and its detection, and understanding its theoretical limits. Supervised by Fernando Rosas and Keivan Navaie.

Publications

Conference & Journal

Preprints

A Generalized Information Bottleneck Theory of Deep Learning

Charles Westphal, Stephen Hailes, Mirco Musolesi

arXiv preprint, 2025

Recasts the Information Bottleneck through synergy — information that only appears when features are processed jointly — yielding interpretable compression phases in ReLU networks, CNNs, and Transformers where standard IB struggles.

Mutual Information Preserving Neural Network Pruning

Charles Westphal, Stephen Hailes, Mirco Musolesi

arXiv preprint, 2024

A structured pruning method that keeps the nodes carrying mutual information between adjacent layers, with a guarantee that the pruned upstream activations can still be mapped to the downstream layer — so the network remains retrainable.

\[\mathcal{X}_* \in \Bigl\{ \mathcal{P} \in \mathscr{P}(\mathcal{X}) \;:\; |\mathcal{P}| = \min_{H(A \mid \mathcal{P}) = H(A \mid \mathcal{X})} |\mathcal{P}| \;\;\&\;\; H(A \mid \mathcal{P}) = H(A \mid \mathcal{X}) \Bigr\}\]

Information-theoretic State Variable Selection for Reinforcement Learning

Charles Westphal, Stephen Hailes, Mirco Musolesi

arXiv preprint, 2024

The Transfer Entropy Redundancy Criterion (TERC): an information-theoretic test that provably drops state variables with no effect on agent performance, improving sample efficiency across Q-learning, Actor-Critic, and PPO.

Talks

  • Feature Selection for Network Intrusion Detection: An Information-Theoretic Approach · KDD 2025 · Toronto, Canada · May 2025
  • Partial Information Decomposition for Data Interpretability and Feature Selection · AISTATS 2025 · Splash Beach Resort, Mai Khao, Phuket, Thailand · Apr 2025

CV

Download my CV (PDF)

Contact

charles.westphal.21@ucl.ac.uk