(2021). Transformed CNNs: recasting pre-trained convolutional layers with self-attention. arXiv preprint arXiv:2106.05795.


(2021). More data or more parameters? Investigating the effect of data structure on generalization. arXiv preprint arXiv:2103.05524.


(2021). ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases. arXiv preprint arXiv:2103.10697.

ArXiv Source code

(2020). Triple descent and the two kinds of overfitting: where & why do they appear? Advances in Neural Information Processing Systems.

PDF ArXiv Slides

(2020). The dynamics of learning with feedback alignment. arXiv preprint arXiv:2011.12428.

ArXiv Slides

(2020). Scaling description of generalization with number of parameters in deep learning. Journal of Statistical Mechanics: Theory and Experiment.

ArXiv J. Stat. Mech

(2020). Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime. International Conference on Machine Learning.

ArXiv ICML ICML presentation Slides Medium

(2019). Jamming transition as a paradigm to understand the loss landscape of deep neural networks. Physical Review E.

ArXiv Phys. Rev. E

(2019). Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias. Advances in Neural Information Processing Systems.

ArXiv NeurIPS Slides

(2019). Conditioned Query Generation for Task-Oriented Dialogue Systems. arXiv preprint arXiv:1911.03698.

ArXiv Source code

(2019). A jamming transition from under- to over-parametrization affects generalization in deep learning. Journal of Physics A: Mathematical and Theoretical.


(2018). Electromagnetic Emission from Supermassive Binary Black Holes Approaching Merger. The Astrophysical Journal.

ArXiv Ap. J.