Oliver Sieberling
osieberl@mit.edu
I am a first-year PhD student at MIT, advised by Yoon Kim. My research focuses on the pretraining of large neural networks. I am particularly interested in efficient sequence modeling, scalable architectures, and hardware-algorithm co-design.
I received my BSc in Computer Science from ETH Zurich in 2025 and spent a semester abroad at Princeton University. For my bachelor’s thesis, I worked with Johannes Lengler on discrete stochastic processes. During my undergraduate studies, I also worked with Dan Alistarh on LLM efficiency and with Johannes von Oswald on linear-time sequence modeling.
publications
- MesaNet: Sequence Modeling by Locally Optimal Test-Time Training. In International Conference on Learning Representations (ICLR), 2026
- Quartet: Native FP4 Training Can Be Optimal for Large Language Models. In Advances in Neural Information Processing Systems (NeurIPS), 2025
- EvoPress: Accurate Dynamic Model Compression via Evolutionary Search. In International Conference on Machine Learning (ICML), 2025
- Preprint
- Plus Strategies Are Exponentially Slower for Planted Optima of Random Height. Algorithmica, 2026. Conference version at GECCO 2024
- Hardest Monotone Functions for Evolutionary Algorithms. SN Computer Science, 2025. Conference version at EvoCOP 2024 (Best Paper Award Nomination)
* equal contribution · αβ alphabetical ordering