Abstract: Due to the high-quality semantic information provided by the text modality, text-driven models have become the dominant approach for Multimodal Sentiment Analysis (MSA) in recent years.