Poor distribution of cluster sizes
yashuap opened this issue · 4 comments
We are trying to cluster together genes with similar expression patterns. We tried using the clusterExpressionPatterns function, but it produced 76 clusters of very uneven size: the largest contains 11575 genes, the second largest 1331 genes, and the smallest only 5 genes.
Is there any way to set up the clustering to both merge smaller clusters together and split up the larger cluster?
Hi,
tradeSeq uses RSEC under the hood for clustering. I would advise checking the vignette to see how to modify RSEC's arguments and recover fewer clusters.
Secondly, we usually recommend performing clustering on just the top 500 or 1000 differentially expressed genes, to avoid having too much noise.
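To illustrate the gene-selection step, here is a minimal sketch in Python rather than R. It assumes the per-gene test statistics have already been exported from a differential expression test into a plain dictionary. The gene names, the values, and the `top_genes` helper are all hypothetical and are not part of tradeSeq.

```python
def top_genes(test_stats, n=500):
    """Return the names of the n genes with the largest test statistic.

    test_stats: dict mapping gene name -> statistic (hypothetical export
    from a differential expression test; genes with a missing statistic
    are skipped).
    """
    usable = {g: s for g, s in test_stats.items() if s is not None}
    ranked = sorted(usable.items(), key=lambda kv: kv[1], reverse=True)
    return [gene for gene, _ in ranked[:n]]


# Made-up statistics, for illustration only.
stats = {"geneA": 12.4, "geneB": 310.9, "geneC": 87.2, "geneD": None}
selected = top_genes(stats, n=2)
print(selected)
```

Only the selected genes would then be passed on to the clustering step.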
Let me know if this helps,
Replying to comment above:
We tried modifying the RSEC arguments by setting thresholds (e.g., a minimum of 120 genes per cluster). This reduces the total number of clusters, but many of the resulting clusters still look very similar. Is there a way to avoid generating very similar clusters?
Hi @rayajallad
If you're not satisfied with the RSEC clusters, you can extract the fitted values, run any clustering method of your choice on them, and tune the clusters as needed. Please see the following section in our vignette: https://bioconductor.org/packages/release/bioc/vignettes/tradeSeq/inst/doc/tradeSeq.html#extracting-fitted-values-to-use-with-any-clustering-method
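One possible way to deal with near-duplicate clusters once you control the clustering yourself is to merge clusters whose average profiles are highly correlated. The sketch below is in Python rather than R and does not use tradeSeq or RSEC. It assumes the fitted values have been exported as one list of numbers per gene, and that some clustering method has already assigned an integer label to each gene. The function names, the 0.95 threshold, and the toy data are all assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def mean_profile(rows):
    """Point-wise average of a list of fitted-value profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def merge_similar_clusters(fitted, labels, min_cor=0.95):
    """Repeatedly merge the two clusters with the most correlated mean
    profiles until no pair reaches min_cor.

    fitted: dict gene -> list of fitted values (hypothetical export)
    labels: dict gene -> integer cluster id from any clustering method
    """
    labels = dict(labels)  # work on a copy
    while True:
        clusters = {}
        for gene, lab in labels.items():
            clusters.setdefault(lab, []).append(fitted[gene])
        centroids = {lab: mean_profile(rows) for lab, rows in clusters.items()}
        ids = sorted(centroids)
        best = None
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                r = pearson(centroids[a], centroids[b])
                if r >= min_cor and (best is None or r > best[0]):
                    best = (r, a, b)
        if best is None:
            return labels
        _, keep, drop = best
        labels = {g: (keep if lab == drop else lab) for g, lab in labels.items()}


# Toy data: clusters 1 and 2 both increase, cluster 3 decreases.
fitted = {
    "g1": [0, 1, 2, 3],
    "g2": [0, 2, 4, 6],
    "g3": [1, 2, 3, 4.2],
    "g4": [3, 2, 1, 0],
}
labels = {"g1": 1, "g2": 1, "g3": 2, "g4": 3}
merged = merge_similar_clusters(fitted, labels)
print(merged)
```

Correlation compares the shape of the profiles and ignores their magnitude, so two clusters that differ only in expression level would be merged. If level matters for your analysis, a distance-based criterion may be a better fit.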
Closing due to inactivity, feel free to reopen if needed.