Explainable local and global models for fine-grained multimodal product recognition
2023 (English) Conference paper, Published paper (Refereed)
Abstract [en]
Grocery product recognition techniques are emerging in the retail sector and are used to provide automatic checkout counters, reduce self-checkout fraud, and support inventory management. However, recognizing grocery products using machine learning models is challenging due to the vast number of products, their similarities, and changes in appearance. To address these challenges, more complex models are created by adding further modalities, such as text from product packages. These complex models, in turn, pose additional challenges in terms of model interpretability. Machine learning experts and system developers need tools and techniques that convey interpretations to enable the evaluation and improvement of multimodal product recognition models.
In this work, we therefore propose an approach to provide local and global explanations that allow us to assess multimodal models for product recognition. We evaluate this approach on a large fine-grained grocery product dataset captured from a real-world environment. To assess the utility of our approach, experiments are conducted for three types of multimodal models.
The results show that our approach provides fine-grained local explanations while being able to aggregate those into global explanations for each type of product. In addition, we observe a disparity between the multimodal models in the types of features they learn and the modality each model focuses on. This provides valuable insight for further improving the accuracy and robustness of multimodal models for grocery product recognition.
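The abstract does not specify how local explanations are aggregated, so the following is only a minimal sketch of one plausible scheme, not the method of the paper. It assumes each local explanation is a list of (modality, feature, weight) triples, such as LIME-style weights for image regions and OCR tokens, and it summarizes each product class by the mean absolute weight per feature and the resulting share of each modality. The function name, the data layout, and the example values are all hypothetical.

```python
from collections import defaultdict


def aggregate_explanations(local_explanations):
    """Aggregate per-sample feature weights into per-class summaries.

    local_explanations: list of (class_label, [(modality, feature, weight), ...]).
    Returns {class_label: {"features": {(modality, feature): mean_abs_weight},
                           "modality_share": {modality: fraction_of_total}}}.
    Assumption: mean absolute weight is used as the global importance score.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for label, weights in local_explanations:
        counts[label] += 1
        for modality, feature, weight in weights:
            sums[label][(modality, feature)] += abs(weight)

    result = {}
    for label, feature_sums in sums.items():
        n = counts[label]
        # Mean absolute weight of each feature over all samples of this class.
        features = {key: total / n for key, total in feature_sums.items()}
        total_mass = sum(features.values())
        # Fraction of the total importance carried by each modality.
        share = defaultdict(float)
        for (modality, _), value in features.items():
            share[modality] += value / total_mass if total_mass else 0.0
        result[label] = {"features": features, "modality_share": dict(share)}
    return result


# Hypothetical local explanations for two samples of one product class.
local = [
    ("oat_milk", [("image", "region_3", 0.4), ("text", "oat", 0.6)]),
    ("oat_milk", [("image", "region_3", 0.2), ("text", "oat", -0.8)]),
]
summary = aggregate_explanations(local)
print(summary["oat_milk"]["modality_share"])
```

Comparing the `modality_share` entries across models would give a rough indication of which modality each model relies on for a given product class, under the assumptions stated above.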
Place, publisher, year, edition, pages
2023.
Keywords [en]
Multimodal classification, Explainable AI, Grocery product recognition, LIME, Fine-grained recognition, Optical character recognition
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:hj:diva-62382
OAI: oai:DiVA.org:hj-62382
DiVA, id: diva2:1794027
Conference
Multimodal KDD 2023, International Workshop on Multimodal Learning, in conjunction with 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023), August 6–10, 2023, Long Beach, CA, USA
2023-09-04 2023-09-04 2024-07-16 Bibliographically approved