Publications

(2022). Optimal learning rate schedules in high-dimensional non-convex optimization problems. arXiv preprint arXiv:2202.04509.

ArXiv Code

(2022). End-to-End Symbolic Regression with Transformers. arXiv preprint arXiv:2204.10532.

ArXiv Demo

(2022). Deep Symbolic Regression for Recurrent Sequences. arXiv preprint arXiv:2201.04600.

ArXiv Yannic Kilcher video Demo

(2021). Transformed CNNs: recasting pre-trained convolutional layers with self-attention. arXiv preprint arXiv:2106.05795.

ArXiv

(2021). On the interplay between data structure and loss function in classification problems. Advances in Neural Information Processing Systems.

ArXiv NeurIPS Code

(2021). ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases. International Conference on Machine Learning.

ArXiv ICML Blog post Code W&B reading group ICML presentation

(2021). Align, then memorise: the dynamics of learning with feedback alignment. International Conference on Machine Learning.

ArXiv ICML J. Phys. A Code ICML presentation

(2020). Triple descent and the two kinds of overfitting: where and why do they appear? Advances in Neural Information Processing Systems.

PDF ArXiv Slides NeurIPS J. Stat. Mech. Code DeepMath NeurIPS presentation

(2020). Scaling description of generalization with number of parameters in deep learning. Journal of Statistical Mechanics: Theory and Experiment.

ArXiv J. Stat. Mech.

(2020). Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime. International Conference on Machine Learning.

ArXiv ICML Medium Code DeepMath ICML presentation

(2020). Conditioned Text Generation with Transfer for Closed-Domain Dialogue Systems. International Conference on Statistical Language and Speech Processing.

ArXiv Springer Code

(2019). Jamming transition as a paradigm to understand the loss landscape of deep neural networks. Physical Review E.

ArXiv Phys. Rev. E Code

(2019). Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias. Advances in Neural Information Processing Systems.

ArXiv NeurIPS Slides Code

(2019). A jamming transition from under- to over-parametrization affects generalization in deep learning. Journal of Physics A: Mathematical and Theoretical.

ArXiv J. Phys. A Code

(2018). Electromagnetic Emission from Supermassive Binary Black Holes Approaching Merger. The Astrophysical Journal.

ArXiv Ap. J. NASA press release Video