Test-Time Training with Masked Autoencoders
 
Yossi Gandelsman*¹, Yu Sun*¹, Xinlei Chen², Alexei A. Efros¹

¹UC Berkeley   ²Meta AI

* Equal contribution


NeurIPS 2022   [paper]   [BibTeX]   [code]

We train an MAE to reconstruct each test image at test time, masking 75% of the input patches. The three reconstructed images on the right visualize the progress of this one-sample learning problem. The loss on the main task of object recognition (green) keeps dropping even after 500 steps of gradient descent, even though the network is optimized only for reconstruction (red).


Abstract

Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks for distribution shifts. Theoretically, we characterize this improvement in terms of the bias-variance trade-off.
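To make the per-sample procedure concrete, below is a minimal PyTorch sketch of one test-time training step sequence. The names `mae`, `mae.encode`, and `head` are hypothetical stand-ins, not the released API: `mae(x, mask_ratio=...)` is assumed to return the masked-reconstruction loss, and `head` is assumed to be a frozen classifier over the MAE encoder's features. Step count, learning rate, and optimizer here are illustrative choices, not the paper's exact settings.

```python
import copy
import torch

def test_time_train(mae, head, image, steps=20, lr=5e-3, mask_ratio=0.75):
    """One-sample test-time training (sketch).

    `mae` and `head` are hypothetical modules: `mae(x, mask_ratio)` is
    assumed to return the masked-reconstruction loss, and `head` maps
    `mae.encode(x)` features to class logits. `image` is a CHW tensor.
    """
    # Adapt a copy, so every test sample starts from the same base weights.
    mae = copy.deepcopy(mae)
    optimizer = torch.optim.SGD(mae.parameters(), lr=lr, momentum=0.9)

    mae.train()
    batch = image.unsqueeze(0)
    for _ in range(steps):
        optimizer.zero_grad()
        # Self-supervised objective only: reconstruct the image with 75%
        # of patches masked. No labels are used at test time.
        loss = mae(batch, mask_ratio=mask_ratio)
        loss.backward()
        optimizer.step()

    # Predict with the adapted encoder; the classifier head stays frozen.
    mae.eval()
    with torch.no_grad():
        logits = head(mae.encode(batch))
    return logits.argmax(dim=-1)
```

The key design choice this sketch reflects is that reconstruction is the only loss optimized at test time; the hope, borne out in the figure above, is that improving the self-supervised objective on a single input also improves the frozen main-task head on that input.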


Paper