Neuro-symbolic Visual Graph Question Answering with LLMs for language parsing
Vienna University of Technology, Vienna, Austria.
Vienna University of Technology, Vienna, Austria.
Vienna University of Technology, Vienna, Austria.
Vienna University of Technology, Vienna, Austria. ORCID iD: 0000-0002-9902-7662
2023 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Images containing graph-based structures are a ubiquitous and popular form of data representation that, to the best of our knowledge, have not yet been considered in the domain of Visual Question Answering (VQA). We provide a respective novel dataset and present a modular neuro-symbolic approach as a first baseline. Our dataset extends CLEGR, an existing dataset for question answering on graphs inspired by metro networks. Notably, the graphs there are given in symbolic form, while we consider the more challenging problem of taking images of graphs as input. Our solution combines optical graph recognition for graph parsing, a pre-trained optical character recognition neural network for parsing node labels, and answer-set programming for reasoning. The model achieves an overall average accuracy of 73% on the dataset. While regular expressions are sufficient to parse the natural language questions, we also study various large-language models to obtain a more robust solution that also generalises well to variants of questions that are not part of the dataset. Our evaluation provides further evidence of the potential of modular neuro-symbolic systems, in particular with pre-trained models, to solve complex VQA tasks.
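
The abstract notes that regular expressions suffice to parse the dataset's questions before answer-set programming is used for reasoning. The following Python sketch illustrates how that step might look. It is not the authors' implementation: the two question templates, the function names `parse_question` and `to_fact`, and the `query(...)` fact encoding are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical question templates; the actual wording in the dataset may differ.
TEMPLATES = [
    (re.compile(r"^how many stations are between (?P<a>.+?) and (?P<b>.+?)\?$",
                re.IGNORECASE), "count_between"),
    (re.compile(r"^are (?P<a>.+?) and (?P<b>.+?) adjacent\?$",
                re.IGNORECASE), "adjacent"),
]


def parse_question(question):
    """Return (query_name, station_a, station_b), or None if no template fits."""
    text = question.strip()
    for pattern, name in TEMPLATES:
        match = pattern.match(text)
        if match:
            return (name, match.group("a"), match.group("b"))
    return None


def to_fact(parsed):
    """Render a parsed question as a fact in ASP syntax (assumed encoding)."""
    name, a, b = parsed
    return f'query({name},"{a}","{b}").'


parsed = parse_question("How many stations are between Alpha and Beta?")
if parsed is not None:
    print(to_fact(parsed))
```

A question that matches no template yields `None`, which is the brittleness the abstract gives as the motivation for also studying large-language models as parsers.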

Place, publisher, year, edition, pages
2023.
Keywords [en]
neuro-symbolic computation, answer-set programming, visual question answering, large-language models
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hj:diva-63647
OAI: oai:DiVA.org:hj-63647
DiVA, id: diva2:1839555
Conference
TAASP 2023, Workshop on Trends and Applications of Answer Set Programming, November 20-21, 2023, Potsdam, Germany
Available from: 2024-02-21 Created: 2024-02-21 Last updated: 2024-02-21 Bibliographically approved

Authority records

Oetsch, Johannes
