Multimodal fine-grained grocery product recognition using image and OCR text
Jönköping University, School of Engineering. ITAB Shop Products AB; University of Skövde. ORCID iD: 0000-0001-8880-7965
Jönköping University, School of Engineering, JTH, Department of Computer Science and Informatics. ORCID iD: 0000-0003-2900-9335
Jönköping University, School of Engineering, JTH, Department of Computing, Jönköping AI Lab (JAIL). ORCID iD: 0000-0003-0274-9026
2024 (English). In: Machine Vision and Applications, ISSN 0932-8092, E-ISSN 1432-1769, Vol. 35, no 4, article id 79. Article in journal (Refereed). Published
Abstract [en]

Automatic recognition of grocery products can be used to improve customer flow at checkouts and reduce labor costs and store losses. Product recognition is, however, a challenging task for machine learning-based solutions due to the large number of products and their variations in appearance. In this work, we tackle the challenge of fine-grained product recognition by first extracting a large dataset from a grocery store containing products that can be told apart only by subtle details. Then, we propose a multimodal product recognition approach that combines product images with OCR text extracted from packages to improve fine-grained recognition of grocery products. We evaluate several image and text models separately and then combine them using different multimodal models of varying complexities. The results show that image and textual information complement each other in multimodal models and enable a classifier with greater recognition performance than unimodal models, especially when the number of training samples is limited. Therefore, this approach can further improve recognition performance in many different scenarios in which product recognition is used. The dataset can be found at https://github.com/Tubbias/finegrainocr.
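
As a rough illustration of how image and OCR-text predictions can complement each other, the sketch below fuses per-class scores from two unimodal classifiers by a weighted average of their softmax probabilities (late fusion). This is only one simple way to combine modalities and is not taken from the paper; the function names, the product labels, the scores, and the `image_weight` parameter are all hypothetical.

```python
from math import exp


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(image_scores, text_scores, image_weight=0.5):
    """Weighted average of the per-class probabilities of two unimodal models.

    Assumption: both models score the same classes in the same order.
    """
    if len(image_scores) != len(text_scores):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    p_img = softmax(image_scores)
    p_txt = softmax(text_scores)
    return [image_weight * a + (1.0 - image_weight) * b
            for a, b in zip(p_img, p_txt)]


def predict(labels, image_scores, text_scores, image_weight=0.5):
    """Return the label with the highest fused probability, plus all fused probabilities."""
    fused = late_fusion(image_scores, text_scores, image_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused


# Hypothetical example: two look-alike packages and one unrelated product.
labels = ["oat milk 1.5% fat", "oat milk 3% fat", "orange juice"]
image_scores = [2.0, 2.1, -1.0]  # image model is nearly tied between the look-alikes
text_scores = [3.0, 0.5, -2.0]   # OCR text (e.g. "1.5%") separates them

label, fused = predict(labels, image_scores, text_scores)
print(label)
```

In this made-up case the image scores alone slightly favour the wrong look-alike, while the fused prediction follows the text evidence. Fusing at the feature level (for example, concatenating image and text embeddings before a shared classifier) is another common design; the abstract does not specify which variants the authors used.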

Place, publisher, year, edition, pages
Springer, 2024. Vol. 35, no 4, article id 79
Keywords [en]
Grocery product recognition, Multimodal classification, Fine-grained recognition, Optical character recognition
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:hj:diva-64774
DOI: 10.1007/s00138-024-01549-9
ISI: 001243616100001
Scopus ID: 2-s2.0-85195555790
Local ID: HOA;;955228
OAI: oai:DiVA.org:hj-64774
DiVA, id: diva2:1867071
Funder
Swedish Research Council, 2018-05973
Available from: 2024-06-10. Created: 2024-06-10. Last updated: 2025-02-07. Bibliographically approved

Open Access in DiVA

No full text in DiVA

By author/editor
Pettersson, Tobias; Riveiro, Maria; Löfström, Tuwe