Our core idea is to treat the combination of different retrieval models for multimodal data, such as text and images, as a process analogous to multi-sensor fusion. When independent, heterogeneous modality-specific ...