
Clust

From EverybodyWiki Bios & Wiki


Clust is a clustering algorithm designed to extract biologically meaningful clusters from gene expression data[1]. Its key features follow from an argument its authors make about what cluster analysis of gene expression data should be expected to deliver biologically. Given a gene expression dataset that covers a specific observation window (e.g. a set of time points, biological conditions, or developmental stages), not all genes in the dataset are expected to behave in a coordinated manner over that window, so not every gene should be placed in one of the generated clusters[2][3]. The authors therefore argue that gene expression clustering is a cluster extraction problem rather than a data partitioning problem: the algorithm should extract good clusters from the dataset instead of partitioning the entire dataset into a set of clusters[4]. Clust was designed on this basis and validated against seven mainstream clustering methods on 100 real gene expression datasets[1].
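
The difference between partitioning and extraction can be illustrated with a toy example. This is not clust's own procedure: the two reference profiles (`centroids`) are fixed in advance, and the functions `pearson` and `assign` and the threshold `min_corr` are hypothetical names chosen for the sketch. Setting `min_corr` to -1 forces every gene into some cluster (partitioning), while a high threshold leaves genes that fit no pattern unassigned (extraction).

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(va * vb)

def assign(profiles, centroids, min_corr):
    """Label each gene with its best-matching centroid's index, or None if
    even the best correlation is below min_corr (gene left unclustered)."""
    labels = []
    for p in profiles:
        scores = [pearson(p, c) for c in centroids]
        best = max(range(len(centroids)), key=scores.__getitem__)
        labels.append(best if scores[best] >= min_corr else None)
    return labels

profiles = [
    [1, 2, 3, 4],  # rising
    [2, 4, 6, 8],  # rising, different scale
    [4, 3, 2, 1],  # falling
    [1, 3, 1, 3],  # oscillating: fits neither pattern well
]
centroids = [[1, 2, 3, 4], [4, 3, 2, 1]]

print(assign(profiles, centroids, min_corr=-1.0))  # partitioning → [0, 0, 1, 0]
print(assign(profiles, centroids, min_corr=0.9))   # extraction   → [0, 0, 1, None]
```

In this sketch the oscillating gene correlates only about 0.45 with the rising pattern, so partitioning still places it in cluster 0, whereas extraction leaves it out. A threshold like `min_corr` also behaves loosely like a cluster-tightness knob, though clust's own tightness parameter is not defined this way here.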

Key features

Key features of clust include:

  • Automatic normalization of data: if the user does not specify which normalization techniques to apply to their dataset(s), clust automatically detects the most suitable normalization and applies it.
  • Automatic identification of the number of clusters.
  • Automatic filtering of data.
  • User-tunable cluster tightness, if needed.
  • Ability to analyze multiple datasets simultaneously.
  • Ability to analyze cross-species datasets simultaneously.
  • Ability to analyze datasets from different technologies (e.g. microarrays and RNA-seq) simultaneously.
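
Automatic filtering and normalization can be pictured with a small sketch. The rules below are hypothetical and are not clust's actual selection logic: `looks_like_counts` is an invented heuristic for spotting raw RNA-seq counts, `min_max_expr` is an invented filtering threshold, and z-scoring is used as one common way to put genes of very different magnitudes on a common scale.

```python
from math import log2, sqrt

def looks_like_counts(rows):
    """Hypothetical heuristic: non-negative whole numbers with a large range
    suggest raw RNA-seq counts, which benefit from a log transform."""
    values = [v for row in rows for v in row]
    if not values:
        return False
    return all(v >= 0 and float(v).is_integer() for v in values) and max(values) > 100

def preprocess(data, min_max_expr=10):
    """Filter, optionally log-transform, then z-score each gene's profile."""
    # Filtering: drop genes that never reach min_max_expr in any sample.
    kept = {g: row for g, row in data.items() if max(row) >= min_max_expr}
    # The transform is chosen from the data itself rather than by the user.
    if looks_like_counts(kept.values()):
        kept = {g: [log2(v + 1) for v in row] for g, row in kept.items()}
    out = {}
    for g, row in kept.items():
        mean = sum(row) / len(row)
        sd = sqrt(sum((v - mean) ** 2 for v in row) / len(row))
        # Z-scoring: mean 0 and standard deviation 1 for every kept gene.
        out[g] = [(v - mean) / sd if sd else 0.0 for v in row]
    return out

data = {
    "geneA": [0, 15, 255, 1023],   # hypothetical raw counts
    "geneB": [1, 2, 3, 2],         # never reaches 10, so it is filtered out
    "geneC": [500, 300, 100, 50],
}
result = preprocess(data)
print(sorted(result))  # → ['geneA', 'geneC']
```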

Availability

Clust is available as an open-source command-line package. A beta web interface also lets users upload data, run clust, and download the results without needing any command-line experience.

Simultaneous clustering of multiple datasets

Clust can apply cluster analysis to more than one dataset simultaneously. In this case, clust finds the groups (clusters) of genes that are consistently co-expressed in every one of the given datasets. Both the command-line clust package and the web interface accept multiple datasets.
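
What "consistently co-expressed in every dataset" means can be made concrete with a simple score. The sketch below is a hypothetical consistency check, not the consensus procedure clust uses: the function `consistency` and its min-over-datasets rule are assumptions for illustration. Because correlations are computed within each dataset, the datasets in this sketch need not share the same conditions or number of samples.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(va * vb)

def consistency(cluster, datasets):
    """Minimum, over datasets, of the mean pairwise Pearson correlation among
    the cluster's genes. It is high only if the genes co-vary in every dataset."""
    if len(cluster) < 2:
        raise ValueError("a cluster needs at least two genes")
    pairs = list(combinations(cluster, 2))
    per_dataset = []
    for data in datasets:
        mean_r = sum(pearson(data[a], data[b]) for a, b in pairs) / len(pairs)
        per_dataset.append(mean_r)
    return min(per_dataset)

# Two hypothetical datasets with different numbers of conditions.
ds1 = {"g1": [1, 2, 3], "g2": [2, 3, 4], "g3": [1, 2, 4]}
ds2 = {"g1": [5, 1, 5, 1], "g2": [4, 1, 4, 1], "g3": [1, 5, 1, 5]}

print(consistency(["g1", "g2"], [ds1, ds2]))  # co-expressed in both → 1.0
print(consistency(["g1", "g3"], [ds1, ds2]))  # co-expressed only in ds1 → -1.0
```

Taking the minimum rather than the average means a single dataset in which the genes disagree is enough to reject the group, which matches the "in every dataset" requirement more strictly than averaging would.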

Cross-species clustering

If the datasets to be clustered simultaneously come from different species (e.g. human and mouse), the user must provide clust with a gene-ID map file that defines which genes in one species are orthologous to which genes in the other species. This orthology information can be downloaded from repositories such as the NCBI HomoloGene database and the Phytozome portal, or generated for any set of species whose proteomes are available using tools such as the OrthoFinder algorithm. This capability enables cross-species comparative analysis of gene expression.
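
A gene-ID map can be thought of as a lookup that moves each species' genes into a shared space of orthogroups, so that datasets from different species can be compared row by row. The sketch below assumes a tab-separated layout (a header row naming the species, then one orthogroup per row with comma-separated gene IDs for each species); this layout, the functions `read_map` and `to_orthogroups`, and all gene and orthogroup IDs are hypothetical and may not match the exact file format clust expects.

```python
import csv
import io

def read_map(text):
    """Parse the assumed layout into {species: {gene_id: orthogroup_id}}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    species = next(reader)[1:]  # first header cell labels the orthogroup column
    mapping = {sp: {} for sp in species}
    for row in reader:
        if not row:
            continue  # skip blank lines
        og = row[0]
        for sp, cell in zip(species, row[1:]):
            for gene in (g.strip() for g in cell.split(",")):
                if gene:
                    mapping[sp][gene] = og
    return mapping

def to_orthogroups(dataset, gene_to_og):
    """Re-key a {gene_id: profile} dataset by orthogroup. Genes with no
    orthologue entry are dropped; paralogues sharing an orthogroup are kept
    together as a list of profiles."""
    grouped = {}
    for gene, profile in dataset.items():
        og = gene_to_og.get(gene)
        if og is not None:
            grouped.setdefault(og, []).append(profile)
    return grouped

map_text = (
    "OG\tspeciesA\tspeciesB\n"
    "OG1\ta_gene1\tb_gene1\n"
    "OG2\ta_gene2,a_gene3\tb_gene2\n"
)
mapping = read_map(map_text)
species_a = {"a_gene1": [1, 2, 3], "a_gene2": [3, 2, 1], "a_gene9": [0, 0, 1]}
print(to_orthogroups(species_a, mapping["speciesA"]))
# → {'OG1': [[1, 2, 3]], 'OG2': [[3, 2, 1]]}
```

How clust itself treats orthogroups that contain several genes from one species is not covered here; keeping them as a list of profiles simply defers that decision.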

Workflow


This article "Clust" is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:Clust. Articles copied from the Draft namespace on Wikipedia can be seen in the Draft namespace of Wikipedia, not in the main namespace.

  1. Abu-Jamous, Basel; Kelly, Steven (2018-10-25). "Clust: automatic extraction of optimal co-expressed gene clusters from gene expression data". Genome Biology. 19 (1): 172. doi:10.1186/s13059-018-1536-8. ISSN 1474-760X. PMC 6203272. PMID 30359297.
  2. Nilsson, Roland; Schultz, Iman J.; Pierce, Eric L.; Soltis, Kathleen A.; Naranuntarat, Amornrat; Ward, Diane M.; Baughman, Joshua M.; Paradkar, Prasad N.; Kingsley, Paul D.; Culotta, Valeria C.; Kaplan, Jerry; Palis, James; Paw, Barry H.; Mootha, Vamsi K. (2009-08-06). "Discovery of Genes Essential for Heme Biosynthesis through Large-Scale Gene Expression Analysis". Cell Metabolism. 10 (2): 119–130. doi:10.1016/j.cmet.2009.06.012. ISSN 1550-4131. PMC 2745341. PMID 19656490.
  3. Pierson, Emma; Koller, Daphne; Battle, Alexis; Mostafavi, Sara (2015-05-13). "Sharing and Specificity of Co-expression Networks across 35 Human Tissues". PLOS Computational Biology. 11 (5): e1004220. doi:10.1371/journal.pcbi.1004220. ISSN 1553-7358. PMC 4430528. PMID 25970446.
  4. Kerr, G.; Ruskin, H.J.; Crane, M.; Doolan, P. (2008-03-01). "Techniques for clustering gene expression data". Computers in Biology and Medicine. 38 (3): 283–293. CiteSeerX 10.1.1.152.8499. doi:10.1016/j.compbiomed.2007.11.001. ISSN 0010-4825. PMID 18061589.