* Equal contribution
We train a masked autoencoder (MAE) to reconstruct each test image at test time, masking 75% of the input patches. The three reconstructed images on the right visualize the progress of this one-sample learning problem. The loss of the main task (green), object recognition, keeps dropping even after 500 steps of gradient descent, while the network continues to optimize for reconstruction (red).
Test-time training adapts to a new test distribution on the fly by optimizing a model for each test input using self-supervision. In this paper, we use masked autoencoders for this one-sample learning problem. Empirically, our simple method improves generalization on many visual benchmarks under distribution shifts. Theoretically, we characterize this improvement in terms of the bias-variance trade-off.
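To make the one-sample learning loop concrete, here is a minimal sketch of the test-time training idea using a toy linear model in NumPy instead of the paper's MAE: 75% of the input entries are masked, and the model is optimized by gradient descent on the reconstruction loss for that single test input. All names (`W`, `ttt_step`, the learning rate, the step count) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reconstruction_loss(W, x, mask):
    # Mean squared error of reconstructing the full input x
    # from its visible (unmasked) entries only.
    x_vis = x * mask          # masked entries zeroed out
    x_hat = W @ x_vis         # toy linear "reconstruction"
    return float(np.mean((x_hat - x) ** 2))

def ttt_step(W, x, mask, lr=0.1):
    # One gradient-descent step on the one-sample reconstruction loss.
    x_vis = x * mask
    x_hat = W @ x_vis
    grad = (2.0 / x.size) * np.outer(x_hat - x, x_vis)
    return W - lr * grad

d = 16
x = rng.standard_normal(d)    # one test "image" (toy vector)
mask = np.zeros(d)
mask[: d // 4] = 1.0          # keep 25% visible, mask the other 75%
W = 0.1 * np.eye(d)           # weak initial model

loss_before = reconstruction_loss(W, x, mask)
for _ in range(500):          # 500 steps, as in the figure
    W = ttt_step(W, x, mask)
loss_after = reconstruction_loss(W, x, mask)
```

As in the figure, the reconstruction loss keeps dropping over the 500 steps of gradient descent on this single input; in the paper the same self-supervised signal is what adapts the shared encoder before the main-task prediction is made.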