Scaling the Growing Neural Gas for Visual Cluster Analysis
2021 (English). In: Big Data Research, ISSN 2214-5796, E-ISSN 2214-580X, article id 100254. Article in journal (Refereed). Published
Abstract [en]
The growing neural gas (GNG) is an unsupervised topology learning algorithm that models a data space through interconnected units that settle in the most densely populated regions of that space. Its output is a graph that can be visually represented on a two-dimensional plane, disclosing cluster patterns in datasets. It is common, however, for GNG to produce highly connected graphs when trained on high-dimensional data, which in turn leads to highly cluttered 2D representations that may fail to disclose meaningful patterns. Moreover, its sequential learning limits its potential for faster execution on local datasets and, more importantly, its potential for training on distributed datasets while leveraging the computational resources of the infrastructures in which they reside.
This paper presents two methods that improve GNG for the visualization of cluster patterns in large-scale and high-dimensional datasets. The first focuses on providing more accurate and meaningful 2D visual representations of cluster patterns in high-dimensional datasets by avoiding connections that lead to high-dimensional graphs in the modeled topology, which may, in turn, result in overplotting and clutter. The second method enables the use of GNG on big and distributed datasets with faster execution times by modeling and merging separate parts of a dataset using the MapReduce model.
Quantitative and qualitative evaluations show that the first method leads to the creation of lower-dimensional graph structures that provide more meaningful (and sometimes more accurate) cluster representations with less overplotting and clutter, and that the second method preserves the accuracy and meaning of the cluster representations while enabling GNG to run in large-scale and distributed settings.
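For readers unfamiliar with GNG, the following is a minimal sketch of the standard, sequential algorithm that the abstract takes as its starting point. It is not the paper's implementation and includes neither of the two proposed methods (the connection-avoiding variant or the MapReduce-based merging). The function name `train_gng`, its parameter names, and all default values are assumptions chosen for illustration.

```python
import math
import random


def train_gng(data, max_units=20, max_age=50, eps_b=0.05, eps_n=0.006,
              insert_every=100, alpha=0.5, decay=0.995, epochs=5, seed=0):
    """Illustrative standard GNG. Returns (units, edges).

    units: dict unit_id -> position (list of floats)
    edges: dict frozenset({a, b}) -> age
    """
    rng = random.Random(seed)
    units = {0: list(rng.choice(data)), 1: list(rng.choice(data))}
    error = {0: 0.0, 1: 0.0}
    edges = {}
    next_id, step = 2, 0

    for _ in range(epochs):
        for x in rng.sample(data, len(data)):
            step += 1
            # Find the nearest (s1) and second-nearest (s2) units to x.
            s1, s2 = sorted(units, key=lambda u: math.dist(units[u], x))[:2]
            error[s1] += math.dist(units[s1], x) ** 2
            # Move the winner toward x.
            units[s1] = [w + eps_b * (xi - w) for w, xi in zip(units[s1], x)]
            # Age the winner's edges and move its neighbours toward x.
            for e in list(edges):
                if s1 in e:
                    edges[e] += 1
                    (n,) = e - {s1}
                    units[n] = [w + eps_n * (xi - w)
                                for w, xi in zip(units[n], x)]
            # Connect the two winners (or reset the age of their edge).
            edges[frozenset((s1, s2))] = 0
            # Drop edges that are too old, then units left without edges.
            edges = {e: a for e, a in edges.items() if a <= max_age}
            connected = set().union(*edges) if edges else set()
            for u in [u for u in units if u not in connected]:
                del units[u], error[u]
            # Periodically insert a unit where accumulated error is highest.
            if step % insert_every == 0 and len(units) < max_units:
                q = max(error, key=error.get)
                nbrs = [next(iter(e - {q})) for e in edges if q in e]
                f = max(nbrs, key=error.get)
                units[next_id] = [(a + b) / 2
                                  for a, b in zip(units[q], units[f])]
                del edges[frozenset((q, f))]
                edges[frozenset((q, next_id))] = 0
                edges[frozenset((f, next_id))] = 0
                error[q] *= alpha
                error[f] *= alpha
                error[next_id] = error[q]
                next_id += 1
            # Decay all accumulated errors.
            for u in error:
                error[u] *= decay
    return units, edges
```

The returned `units` and `edges` form the graph that would then be laid out on a 2D plane for visual inspection. Because each input sample updates the same shared graph, training is inherently sequential, which is the limitation the abstract's second method addresses.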
Place, publisher, year, edition, pages
Elsevier, 2021. Article id 100254
Keywords [en]
Growing neural gas, Big data, Visual analytics, Unsupervised learning, Exploratory data analysis
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hj:diva-54283
DOI: 10.1016/j.bdr.2021.100254
ISI: 000710458600012
Scopus ID: 2-s2.0-85113545584
Local ID: HOA;intsam;758509
OAI: oai:DiVA.org:hj-54283
DiVA, id: diva2:1586175
2021-08-19, 2021-08-19, 2024-07-16. Bibliographically approved