Abstract: Transformer-based video generation models have demonstrated significant potential in content creation. However, the current state-of-the-art model employing “3D full attention” encounters ...
To further test the model's robustness to background interference, we propose ImageNet-Bg, a background-interference test set based on the ImageNet validation set with 48,285 ...