Explainable local and global models for fine-grained multimodal product recognition
Jönköping University, School of Engineering; University of Skövde; ITAB Shop Products AB, Sweden.
Jönköping University, School of Engineering, JTH, Department of Computer Science and Informatics. ORCID iD: 0000-0003-2900-9335
Jönköping University, School of Engineering, JTH, Department of Computing, Jönköping AI Lab (JAIL). ORCID iD: 0000-0003-0274-9026
2023 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Grocery product recognition techniques are emerging in the retail sector, where they are used to provide automatic checkout counters, reduce self-checkout fraud, and support inventory management. However, recognizing grocery products with machine learning models is challenging due to the vast number of products, their visual similarity, and changes in appearance. To address these challenges, more complex models are created by adding modalities such as text from product packages. These complex models, however, pose additional challenges in terms of interpretability. Machine learning experts and system developers need tools and techniques that convey interpretations, enabling them to evaluate and improve multimodal product recognition models.

In this work, we thus propose an approach that provides local and global explanations, allowing us to assess multimodal models for product recognition. We evaluate this approach on a large, fine-grained grocery product dataset captured in a real-world environment. To assess the utility of our approach, we conduct experiments with three types of multimodal models.

The results show that our approach provides fine-grained local explanations and can aggregate them into global explanations for each type of product. In addition, we observe a disparity between the multimodal models in the types of features they learn and the modality each model focuses on. These insights are valuable for further improving the accuracy and robustness of multimodal grocery product recognition models.
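
The record gives no implementation details beyond the abstract and the LIME keyword, but the local-to-global idea it describes can be illustrated. The Python sketch below is a hypothetical reading, not the authors' method: it computes a LIME local explanation per image and averages a crude summary of the explanations into one global score per class. predict_fn and dataset are placeholders standing in for the paper's multimodal classifier and grocery dataset.

# Hypothetical sketch only: the paper's actual aggregation scheme is not
# described in this record. Requires: pip install lime scikit-image numpy
from collections import defaultdict

import numpy as np
from lime.lime_image import LimeImageExplainer

def predict_fn(images):
    # Placeholder for the multimodal product classifier: takes a batch of
    # H x W x 3 images and returns class probabilities. Replace with the
    # real model's prediction function.
    rng = np.random.default_rng(0)
    probs = rng.random((len(images), 3))
    return probs / probs.sum(axis=1, keepdims=True)

# Placeholder dataset of (image, class id) pairs; real data would be
# fine-grained grocery product photos.
rng = np.random.default_rng(1)
dataset = [(rng.random((32, 32, 3)), i % 3) for i in range(6)]

explainer = LimeImageExplainer()
per_class_scores = defaultdict(list)

for image, label in dataset:
    # Local explanation: one weight per superpixel for this single image.
    exp = explainer.explain_instance(
        image, predict_fn, labels=(label,), top_labels=None, num_samples=200)
    weights = [w for _, w in exp.local_exp[label]]
    # Summarize the local explanation by its mean absolute segment weight.
    per_class_scores[label].append(np.mean(np.abs(weights)))

# "Global" explanation per class: average the local summaries. The paper
# presumably aggregates richer structure (e.g. per feature or per modality).
for cls, scores in sorted(per_class_scores.items()):
    print(f"class {cls}: mean |segment weight| = {np.mean(scores):.4f}")

An analogous pass with lime's text explainer over OCR tokens from the packaging would be one plausible way to compare what each modality contributes, which the abstract's modality comparison hints at.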

Place, publisher, year, edition, pages
2023.
Keywords [en]
Multimodal classification, Explainable AI, Grocery product recognition, LIME, Fine-grained recognition, Optical character recognition
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:hj:diva-62382
OAI: oai:DiVA.org:hj-62382
DiVA, id: diva2:1794027
Conference
Multimodal KDD 2023, International Workshop on Multimodal Learning, in conjunction with 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023), August 6–10, 2023, Long Beach, CA, USA
Available from: 2023-09-04. Created: 2023-09-04. Last updated: 2024-07-16. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Riveiro, Maria; Löfström, Tuwe
