Multi-modality Alignment and Fusion

PV2TEA: Patching Visual Modality to Textual-Established Information Extraction

Attribute value extraction, a fundamental task in e-commerce services, has been extensively studied and formulated as text-based extraction. However, many attributes, such as product color, shape, and pattern, can benefit from image-based extraction.