
Statistical Methods for Machine Learning - UniMi 2022


Can new architectures reach public SOTA?


Open In Colab


Since 2017, Transformers have revolutionized nearly every area of Deep Learning. First applied to text (e.g., BERT-like, T5-like, and GPT-like architectures), then successfully to images (e.g., ViT, Swin Transformer, DeiT), and even to graphs (e.g., Graph Transformer), they have left CNNs behind in terms of performance. With ConvNeXt and other recent architectures, the research community has tried to apply to ResNets the "secret sauce" that characterizes Transformers, reaching state-of-the-art results on ImageNet-22k with all-CNN networks. Can these new architectures help us reach SOTA on the Cats vs. Dogs image classification dataset?
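
As a rough illustration of the kind of experiment this question implies, here is a minimal fine-tuning sketch using torchvision's pretrained ConvNeXt-Tiny on a binary cats-vs-dogs task. This is not the repository's actual training code; the `data/train` directory layout, batch size, learning rate, and epoch count are all assumptions for the sake of the example.

```python
# Minimal sketch (assumed, not the repo's training code): fine-tune a
# pretrained ConvNeXt-Tiny from torchvision on cats vs. dogs (2 classes).
import torch
import torch.nn as nn
from torchvision import datasets
from torchvision.models import convnext_tiny, ConvNeXt_Tiny_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load ImageNet-pretrained weights and swap the final Linear head for 2 classes.
weights = ConvNeXt_Tiny_Weights.IMAGENET1K_V1
model = convnext_tiny(weights=weights)
model.classifier[2] = nn.Linear(model.classifier[2].in_features, 2)
model.to(device)

# Assumed layout: data/train/cat/*.jpg and data/train/dog/*.jpg.
train_set = datasets.ImageFolder("data/train", transform=weights.transforms())
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):  # a few epochs usually suffice when fine-tuning
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

Starting from ImageNet-pretrained weights and replacing only the classification head is the standard transfer-learning recipe for a small two-class dataset like this one.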