This module implements a Differential Attention Vision Transformer. The key idea is to replace the standard softmax attention in each transformer block with a differential attention mechanism, which computes the difference of two softmax attention maps to cancel common-mode attention noise, as described in the paper: ...
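As a minimal sketch of the mechanism (not this module's actual implementation): differential attention splits the query/key projections into two groups, computes a softmax attention map from each, and subtracts a scaled copy of the second map from the first before applying it to the values. The function and weight names below, and a fixed scalar `lam`, are illustrative assumptions; in the paper the subtraction weight λ is learnable.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def diff_attention(X, Wq1, Wk1, Wq2, Wk2, Wv, lam=0.8):
    """Differential attention sketch (names illustrative).

    X: (n, d) token embeddings; W*: (d, d) projection matrices.
    Returns (A1 - lam * A2) @ V, the difference of two softmax
    attention maps applied to the values.
    """
    d = Wq1.shape[1]
    Q1, K1 = X @ Wq1, X @ Wk1   # first query/key group
    Q2, K2 = X @ Wq2, X @ Wk2   # second query/key group
    V = X @ Wv
    A1 = softmax(Q1 @ K1.T / np.sqrt(d))
    A2 = softmax(Q2 @ K2.T / np.sqrt(d))
    # Subtracting the second map cancels attention mass that both
    # maps assign to the same (often irrelevant) tokens.
    return (A1 - lam * A2) @ V
```

Note that the resulting map `A1 - lam * A2` can contain negative entries and its rows no longer sum to 1; this is intentional and is what lets the mechanism suppress common-mode attention noise.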