Scaling the Growing Neural Gas for Visual Cluster Analysis
School of Informatics, University of Skövde, Sweden.
Department of Computer Science and Media Technology, Linnaeus University, Sweden.
Faculty of Computer Science, Dalhousie University, Canada.
Jönköping University, School of Engineering, JTH, Department of Computer Science and Informatics. ORCID iD: 0000-0003-2900-9335
2021 (English)
In: Big Data Research, ISSN 2214-5796, E-ISSN 2214-580X, article id 100254
Article in journal (Refereed) Published
Abstract [en]

The growing neural gas (GNG) is an unsupervised topology learning algorithm that models a data space through interconnected units placed in the most populated areas of that space. Its output is a graph that can be visually represented on a two-dimensional plane, disclosing cluster patterns in datasets. It is common, however, for GNG to produce highly connected graphs when trained on high-dimensional data, which in turn leads to highly cluttered 2D representations that may fail to disclose meaningful patterns. Moreover, its sequential learning limits its potential for faster execution on local datasets and, more importantly, for training on distributed datasets while leveraging the computational resources of the infrastructures in which they reside.

This paper presents two methods that improve GNG for the visualization of cluster patterns in large-scale and high-dimensional datasets. The first focuses on providing more accurate and meaningful 2D visual representations of cluster patterns in high-dimensional datasets by avoiding connections that lead to high-dimensional graphs in the modeled topology, which may in turn result in overplotting and clutter. The second enables the use of GNG on big and distributed datasets with faster execution times by modeling and merging separate parts of a dataset using the MapReduce model.

Quantitative and qualitative evaluations show that the first method leads to the creation of lower-dimensional graph structures that provide more meaningful (and sometimes more accurate) cluster representations with less overplotting and clutter, and that the second method preserves the accuracy and meaning of the cluster representations while enabling GNG to run in large-scale and distributed settings.
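
For readers unfamiliar with the baseline algorithm the abstract refers to, the following is a minimal sketch of a standard sequential GNG training loop in pure Python. It is not the paper's improved methods: it includes neither the connection-avoidance step nor the MapReduce-based merging. The function name `train_gng` and all parameter names and default values are assumptions chosen for illustration.

```python
import random


def train_gng(data, max_units=20, steps=2000, eps_winner=0.05, eps_neighbor=0.006,
              max_age=50, insert_every=100, alpha=0.5, decay=0.995, seed=0):
    """Fit a basic growing neural gas to `data` (a list of equal-length tuples).

    Hypothetical helper for illustration; returns (units, edges).
    """
    rng = random.Random(seed)
    units = {0: list(rng.choice(data)), 1: list(rng.choice(data))}  # id -> position
    error = {0: 0.0, 1: 0.0}  # accumulated squared error per unit
    edges = {}                # frozenset({a, b}) -> age
    next_id = 2

    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    for step in range(1, steps + 1):
        x = rng.choice(data)
        # Find the two units closest to the sample.
        s1, s2 = sorted(units, key=lambda u: dist2(units[u], x))[:2]
        error[s1] += dist2(units[s1], x)
        # Move the winner toward x; age its edges and nudge its neighbours.
        units[s1] = [w + eps_winner * (xi - w) for w, xi in zip(units[s1], x)]
        for e in list(edges):
            if s1 in e:
                edges[e] += 1
                (n,) = e - {s1}
                units[n] = [w + eps_neighbor * (xi - w) for w, xi in zip(units[n], x)]
        edges[frozenset((s1, s2))] = 0  # create or refresh the winner/runner-up edge
        # Drop edges that are too old, then units left without any edge.
        edges = {e: age for e, age in edges.items() if age <= max_age}
        connected = set().union(*edges)
        for u in [u for u in units if u not in connected]:
            del units[u], error[u]
        # Periodically insert a unit between the highest-error unit and its
        # highest-error neighbour.
        if step % insert_every == 0 and len(units) < max_units:
            q = max(error, key=error.get)
            nbrs = [next(iter(e - {q})) for e in edges if q in e]
            f = max(nbrs, key=error.get)
            units[next_id] = [(a + b) / 2 for a, b in zip(units[q], units[f])]
            del edges[frozenset((q, f))]
            edges[frozenset((q, next_id))] = 0
            edges[frozenset((f, next_id))] = 0
            error[q] *= alpha
            error[f] *= alpha
            error[next_id] = error[q]
            next_id += 1
        # Decay all accumulated errors.
        for u in error:
            error[u] *= decay
    return units, set(edges)
```

Calling `train_gng(points)` on a list of coordinate tuples returns the unit positions and the set of edges between unit ids. Drawing that graph on a 2D plane, which is the visualization step the abstract is concerned with, would require a separate layout step that is not shown here.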

Place, publisher, year, edition, pages
Elsevier, 2021. Article id 100254
Keywords [en]
Growing neural gas, Big data, Visual analytics, Unsupervised learning, Exploratory data analysis
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hj:diva-54283
DOI: 10.1016/j.bdr.2021.100254
ISI: 000710458600012
Scopus ID: 2-s2.0-85113545584
Local ID: HOA;intsam;758509
OAI: oai:DiVA.org:hj-54283
DiVA, id: diva2:1586175
Available from: 2021-08-19 Created: 2021-08-19 Last updated: 2024-07-16 Bibliographically approved

Authority records

Riveiro, Maria
