Multi-assignment clustering: Machine learning from a biological perspective
2021 (English) In: Journal of Biotechnology, ISSN 0168-1656, E-ISSN 1873-4863, Vol. 326, p. 1-10
Article in journal (Refereed) Published
Abstract [en]
A common approach to analyzing large-scale molecular data is to cluster objects that share similar characteristics. This assumes that genes with highly similar expression profiles are likely to participate in a common molecular process. Biological systems are extremely complex and challenging to understand, with proteins having multiple functions that sometimes need to be activated or expressed in a time-dependent manner. Thus, the strategies applied for clustering these molecules into groups are of key importance for translating data into biologically interpretable findings. Here we implemented a multi-assignment clustering (MAsC) approach that allows molecules to be assigned to multiple clusters, rather than to a single one as in commonly used clustering techniques. When applied to high-throughput transcriptomics data, MAsC increased the power of the downstream pathway analysis and allowed identification of pathways with high biological relevance to the experimental setting and the biological systems studied. Multi-assignment clustering also reduced noise in the clustering partition by excluding genes with a low correlation to all of the resulting clusters. Together, these findings suggest that our methodology facilitates translation of large-scale molecular data into biological knowledge. The method is available as an R package on GitLab (https://gitlab.com/wolftower/masc).
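The idea described in the abstract, letting a gene belong to several clusters and dropping genes that correlate poorly with all of them, can be illustrated with a minimal Python sketch. This is not the MAsC R package or its API. It assumes a plain k-means step to obtain centroids, Pearson correlation between each gene profile and each centroid, and an arbitrary cutoff of 0.8; the function names (`kmeans`, `pearson_to_centroids`, `assign_by_correlation`, `multi_assign`) and the `min_corr` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on rows of X; returns the k centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows (genes) of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Euclidean distance from every gene to every centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its old centroid
                centroids[j] = members.mean(axis=0)
    return centroids


def pearson_to_centroids(X, centroids):
    """Pearson correlation of each row of X with each centroid."""
    Xc = X - X.mean(axis=1, keepdims=True)
    Cc = centroids - centroids.mean(axis=1, keepdims=True)
    num = Xc @ Cc.T
    denom = np.linalg.norm(Xc, axis=1)[:, None] * np.linalg.norm(Cc, axis=1)[None, :]
    # Constant profiles have zero variance; report correlation 0 for them.
    return num / np.where(denom == 0, 1.0, denom)


def assign_by_correlation(X, centroids, min_corr=0.8):
    """Assumed rule: a gene joins every cluster it correlates with >= min_corr."""
    membership = pearson_to_centroids(X, centroids) >= min_corr
    excluded = ~membership.any(axis=1)  # genes matching no cluster are dropped
    return membership, excluded


def multi_assign(X, k, min_corr=0.8, seed=0):
    """Cluster with k-means, then re-assign genes by correlation."""
    return assign_by_correlation(X, kmeans(X, k, seed=seed), min_corr)


if __name__ == "__main__":
    genes = np.array([[1.0, 2.0, 3.0, 4.0],   # rising profile
                      [3.0, 3.0, 3.0, 3.0]])  # flat profile
    centroids = np.array([[2.0, 4.0, 6.0, 8.0],
                          [4.0, 3.0, 2.0, 1.0],
                          [1.0, 2.0, 3.0, 5.0]])
    membership, excluded = assign_by_correlation(genes, centroids)
    print(membership)
    print(excluded)
```

In the small demo, the rising profile correlates strongly with two of the three hand-written centroids and is therefore placed in both, while the flat profile correlates with none and is excluded. The cutoff controls the trade-off: a higher `min_corr` excludes more genes and yields fewer multiple assignments.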
Place, publisher, year, edition, pages
Elsevier, 2021. Vol. 326, p. 1-10
Keywords [en]
Clustering, K-means, annotation enrichment, multiple cluster assignment, pathways, transcriptomics
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:hj:diva-51257
DOI: 10.1016/j.jbiotec.2020.12.002
ISI: 000616124700001
PubMedID: 33285150
Scopus ID: 2-s2.0-85097644109
Local ID: HOA
OAI: oai:DiVA.org:hj-51257
DiVA, id: diva2:1510961
Funder
Knowledge Foundation, 2014/0301, 2017/0302
2020-12-17, 2020-12-17, 2021-03-15. Bibliographically approved