Abstract: Image-text retrieval aims to align image regions with textual words for semantic matching, facilitating bidirectional retrieval between images and texts. While significant progress has been ...