Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration

[Figure: The overall framework of our proposed COACHER.]
The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Relation prediction among entities in images is an important step in scene graph generation (SGG), which in turn affects various visual understanding and reasoning tasks. Existing SGG frameworks, however, require heavy training yet are incapable of modeling unseen (i.e., zero-shot) triplets. In this work, we stress that this incapability is due to the lack of commonsense reasoning, i.e., the ability to associate similar entities and infer similar relations based on a general understanding of the world. To fill this gap, we propose CommOnsense-integrAted sCene grapH rElation pRediction (COACHER), a framework that integrates commonsense knowledge into SGG, especially for zero-shot relation prediction. Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph, and integrate them on top of state-of-the-art SGG frameworks. Extensive quantitative evaluations and qualitative case studies on both original and manipulated datasets from Visual Genome demonstrate the effectiveness of our proposed approach.
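To make the idea of neighborhoods and paths around entities concrete, the following is a minimal illustrative sketch, not the COACHER implementation. It assumes a tiny hand-written knowledge graph of (head, relation, tail) triples; the triples, the relation names, and the helper functions `build_adjacency`, `neighborhood`, and `shortest_path` are all hypothetical and chosen only for illustration. The sketch retrieves the k-hop neighborhood of an entity and one shortest relational path between two entities, which are the kinds of graph signals a model could embed and combine with visual features.

```python
from collections import deque

# Hypothetical toy commonsense graph (assumption, not data from the paper).
TRIPLES = [
    ("person", "CapableOf", "ride"),
    ("horse", "UsedFor", "ride"),
    ("horse", "IsA", "animal"),
    ("elephant", "IsA", "animal"),
    ("person", "Desires", "food"),
]


def build_adjacency(triples):
    """Build an undirected adjacency map: entity -> list of (relation, neighbor)."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))
        adj.setdefault(tail, []).append((rel, head))
    return adj


def neighborhood(adj, entity, hops=1):
    """Return the set of entities reachable from `entity` within `hops` edges."""
    seen = {entity}
    frontier = {entity}
    for _ in range(hops):
        # Expand one hop, keeping only entities not visited before.
        frontier = {nbr for e in frontier for _rel, nbr in adj.get(e, [])} - seen
        seen |= frontier
    return seen - {entity}


def shortest_path(adj, source, target):
    """Breadth-first search for one shortest path.

    Returns an alternating list [entity, relation, entity, ...],
    or None if the two entities are not connected.
    """
    if source == target:
        return [source]
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        for rel, nxt in adj.get(path[-1], []):
            if nxt in visited:
                continue
            new_path = path + [rel, nxt]
            if nxt == target:
                return new_path
            visited.add(nxt)
            queue.append(new_path)
    return None


adj = build_adjacency(TRIPLES)
print(neighborhood(adj, "elephant", hops=2))
print(shortest_path(adj, "person", "horse"))
```

In this toy graph, "elephant" reaches "horse" through the shared neighbor "animal", which hints at how an unseen triplet involving an elephant might borrow evidence from seen triplets involving a horse. Treating edges as undirected is a simplifying assumption made here for brevity.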