Publications

(2023). ODEFormer: Symbolic Regression of Dynamical Systems with Transformers.

ArXiv Code Demo Twitter

(2023). Length generalization in arithmetic transformers. arXiv preprint arXiv:2306.15400.

ArXiv

(2023). Boolformer: Symbolic Regression of Logic Functions with Transformers. arXiv preprint arXiv:2309.12207.

ArXiv Code Demo Twitter

(2022). Optimal learning rate schedules in high-dimensional non-convex optimization problems. arXiv preprint arXiv:2202.04509.

ArXiv Code Twitter

(2022). End-to-end symbolic regression with transformers. Advances in Neural Information Processing Systems.

ArXiv Demo Code Talk Twitter

(2022). Deep symbolic regression for recurrence prediction. International Conference on Machine Learning.

ArXiv Yannic Kilcher Demo Code Talk Twitter

(2021). Transformed CNNs: recasting pre-trained convolutional layers with self-attention. arXiv preprint arXiv:2106.05795.

ArXiv

(2021). On the interplay between data structure and loss function in classification problems. Advances in Neural Information Processing Systems.

ArXiv NeurIPS Code Talk

(2021). ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases. International Conference on Machine Learning.

ArXiv ICML Blog post Code Long talk Short talk Twitter

(2021). Align, then memorise: the dynamics of learning with feedback alignment. International Conference on Machine Learning.

ArXiv ICML J. Phys. A Code Talk

(2020). Triple descent and the two kinds of overfitting: where and why do they appear? Advances in Neural Information Processing Systems.

ArXiv NeurIPS J. Stat. Mech Code Talk

(2020). Scaling description of generalization with number of parameters in deep learning. Journal of Statistical Mechanics: Theory and Experiment.

ArXiv J. Stat. Mech

(2020). Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime. International Conference on Machine Learning.

ArXiv ICML Medium Code Talk

(2020). Conditioned Text Generation with Transfer for Closed-Domain Dialogue Systems. International Conference on Statistical Language and Speech Processing.

ArXiv Springer Code

(2019). Jamming transition as a paradigm to understand the loss landscape of deep neural networks. Physical Review E.

ArXiv Phys. Rev. E Code

(2019). Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias. Advances in Neural Information Processing Systems.

ArXiv NeurIPS Slides Code

(2019). A jamming transition from under- to over-parametrization affects generalization in deep learning. Journal of Physics A: Mathematical and Theoretical.

ArXiv J. Phys. A Code

(2018). Electromagnetic Emission from Supermassive Binary Black Holes Approaching Merger. The Astrophysical Journal.

ArXiv Ap. J. NASA press release Video