Triple descent and the two kinds of overfitting: where & why do they appear?
Stéphane d'Ascoli
Advances in Neural Information Processing Systems
Transformed CNNs: recasting pre-trained convolutional layers with self-attention
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases
More data or more parameters? Investigating the effect of data structure on generalization
Finding the Needle in the Haystack with Convolutions: on the benefits of architectural bias
Scaling description of generalization with number of parameters in deep learning