This module implements a Differential Attention Vision Transformer. The key idea is to replace the standard softmax attention with a differential attention mechanism as described in the paper: ...
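Below is a minimal sketch of a single differential attention head, assuming the common formulation in which queries and keys are split into two groups, two softmax attention maps are computed, and their difference (weighted by a learnable scalar λ reparameterized via λ_q1, λ_k1, λ_q2, λ_k2 and an initial value λ_init) is applied to the values. All names here (`DiffAttention`, `lambda_init`, the single-head layout) are illustrative assumptions, not taken from this module.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiffAttention(nn.Module):
    """Illustrative single-head differential attention (assumed formulation)."""

    def __init__(self, dim: int, lambda_init: float = 0.8):
        super().__init__()
        self.head_dim = dim
        # One projection producing two query groups, two key groups, and values.
        self.qkv = nn.Linear(dim, 5 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        # Learnable reparameterization of the subtraction weight lambda
        # (assumed: lambda = exp(lq1·lk1) - exp(lq2·lk2) + lambda_init).
        self.lambda_q1 = nn.Parameter(torch.randn(dim) * 0.1)
        self.lambda_k1 = nn.Parameter(torch.randn(dim) * 0.1)
        self.lambda_q2 = nn.Parameter(torch.randn(dim) * 0.1)
        self.lambda_k2 = nn.Parameter(torch.randn(dim) * 0.1)
        self.lambda_init = lambda_init

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) — e.g. patch embeddings plus a CLS token.
        q1, q2, k1, k2, v = self.qkv(x).chunk(5, dim=-1)
        scale = 1.0 / math.sqrt(self.head_dim)
        # Two independent softmax attention maps over the same tokens.
        a1 = F.softmax(q1 @ k1.transpose(-2, -1) * scale, dim=-1)
        a2 = F.softmax(q2 @ k2.transpose(-2, -1) * scale, dim=-1)
        lam = (torch.exp((self.lambda_q1 * self.lambda_k1).sum())
               - torch.exp((self.lambda_q2 * self.lambda_k2).sum())
               + self.lambda_init)
        # Subtracting the second map cancels common-mode attention mass,
        # which is the noise-suppression effect differential attention targets.
        return self.out((a1 - lam * a2) @ v)
```

In a vision transformer, a block of this kind would replace the standard multi-head self-attention over patch tokens, with the rest of the block (LayerNorm, MLP, residual connections) unchanged.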