Poor distribution of cluster sizes

Question

yashuap opened this issue a year ago · 4 comments

We are trying to cluster genes with similar expression patterns. We tried the clusterExpressionPatterns function, but it produced 76 clusters: the largest contains 11575 genes, the second largest 1331 genes, and the smallest 5 genes.

Is there any way to set up the clustering to both merge smaller clusters together and split up the larger cluster?

Answer 1 · 2024-02-03T16:34:37.000Z

Hi,
tradeSeq uses RSEC under the hood for clustering. I would advise checking the vignette to see how to modify RSEC's arguments and recover fewer clusters.

Secondly, we usually recommend performing clustering on only the top 500 or 1000 differentially expressed genes, to avoid introducing too much noise.
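
As a language-agnostic illustration of the gene-selection step, here is a minimal Python sketch that keeps the N genes with the largest test statistic. The `top_genes` helper, the `wald` dictionary, and its values are hypothetical; in practice the statistics would come from whichever differential expression test you ran in R, and this is not tradeSeq's own code.

```python
def top_genes(stats, n=500):
    """Return the n gene names with the largest test statistic.

    `stats` maps gene name -> statistic (hypothetical input, e.g. exported
    from a differential expression test run elsewhere).
    """
    ranked = sorted(stats, key=stats.get, reverse=True)
    return ranked[:n]


# Toy values for illustration only; not real results.
wald = {"geneA": 152.3, "geneB": 4.1, "geneC": 87.9, "geneD": 0.6}
selected = top_genes(wald, n=2)
```

Only the `selected` genes would then be passed on to the clustering step.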

Let me know if this helps,

Answer 2 · 2024-02-06T16:00:45.000Z

Replying to comment above:
We tried modifying the RSEC arguments by setting thresholds (e.g. a minimum of 120 genes per cluster). This reduces the total number of clusters, but many of the resulting clusters still look very similar. Is there a way to avoid generating very similar clusters?

Answer 3 · 2024-03-01T11:17:01.000Z

Hi @rayajallad

If you're not satisfied with the RSEC clusters, you can use any clustering method of your choice and tune the clusters as needed. Please see the following section in our vignette: https://bioconductor.org/packages/release/bioc/vignettes/tradeSeq/inst/doc/tradeSeq.html#extracting-fitted-values-to-use-with-any-clustering-method
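
One possible way to tune clusters from any method is to merge those whose average profiles look alike. The sketch below is a plain-Python illustration under stated assumptions, not part of tradeSeq or RSEC: it assumes the fitted values were exported as one curve per gene (for example, the smoothed values described in the vignette section linked above) and that some clustering method has already assigned a label to each gene. The function names, the `min_cor` threshold of 0.9, and the toy data are all hypothetical.

```python
from math import sqrt


def zscore(curve):
    """Scale one gene's fitted curve to mean 0 and unit variance."""
    m = sum(curve) / len(curve)
    sd = sqrt(sum((x - m) ** 2 for x in curve) / len(curve))
    return [(x - m) / sd if sd > 0 else 0.0 for x in curve]


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def centroids(curves, labels):
    """Average the scaled curves of the genes in each cluster."""
    groups = {}
    for gene, lab in labels.items():
        groups.setdefault(lab, []).append(zscore(curves[gene]))
    return {lab: [sum(col) / len(col) for col in zip(*rows)]
            for lab, rows in groups.items()}


def merge_similar(curves, labels, min_cor=0.9):
    """Merge clusters whose centroids correlate at or above min_cor.

    Centroids are computed once from the input labels and are not
    recomputed after each merge, so merges chain single-linkage style.
    """
    cents = centroids(curves, labels)
    parent = {lab: lab for lab in cents}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    labs = sorted(cents)
    for i, a in enumerate(labs):
        for b in labs[i + 1:]:
            if pearson(cents[a], cents[b]) >= min_cor:
                parent[find(b)] = find(a)
    return {gene: find(lab) for gene, lab in labels.items()}


# Toy fitted curves (illustrative values only): three rising, one falling.
curves = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8],
    "g3": [10, 11, 12.5, 13],
    "g4": [4, 3, 2, 1],
}
labels = {"g1": 1, "g2": 1, "g3": 2, "g4": 3}  # from any clustering method
merged = merge_similar(curves, labels, min_cor=0.9)
```

In this toy example, clusters 1 and 2 both rise along pseudotime and are merged, while the falling cluster 3 stays separate. Raising `min_cor` makes merging more conservative; lowering it collapses more clusters.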

Answer 4 · 2024-06-21T13:09:02.000Z

Closing due to inactivity, feel free to reopen if needed.