Learning to mask and permute visual tokens for vision transformer pre-training
Published in Computer Vision and Image Understanding, 2025
A pre-training approach for Vision Transformers that learns to mask and permute visual tokens.
Recommended citation: L. Baraldi, R. Amoroso, M. Cornia, A. Pilzer, R. Cucchiara (2025). "Learning to mask and permute visual tokens for vision transformer pre-training." Computer Vision and Image Understanding, 104294.