KwatMDPhD/GSEA.jl

Logic error in selecting which sets to plot

ACastanza opened this issue · 9 comments

It appears that the sets selected for plotting on the positive and negative sides are (independently) off by one: only the top 19 and bottom 19 plots are produced when "number_of_extreme_gene_sets_to_plot": 20 is set in the JSON.

Additionally, if there are fewer than 20 (19) sets on one "side" of the enrichment (e.g. only 10 sets are enriched in the negative phenotype), the remaining plots are drawn from the "bottom" sets of the positive phenotype.

See here where three sets at the very bottom of this list are being plotted:

[Image: Screen Shot 2022-08-11 at 1 30 06 PM]

This is occurring because there are only 16 sets enriched on the other side of the distribution:

[Image: Screen Shot 2022-08-11 at 1 38 43 PM]

(The presence of the "Details" link indicates that there is a plot present for this set)

If you open those Details links, do the plot titles match the gene-set names? If not, then this could be a downstream error. Meanwhile, I'm looking into this upstream.

Yes, the plot titles match the set names in the table

I checked the number in the settings against the number of plots in plot/, and they match in all cases. I don't know what is going on here.

Okay, this is odd: somehow HALLMARK_E2F_TARGETS and HALLMARK_IL6_JAK_STAT3_SIGNALING were removed from my results table. So at least that part of this is on me. The wrapping around to plot the bottom sets of another phenotype is the only issue, then.

Can you delete the output and try again? gsea does not delete existing output.

Huh? These are clean runs in a clean Docker container every time. The correct number of plots is being created, but if there are fewer than N sets to plot on one side of the distribution, it wraps around.

Ah, I see. I don't think I take the "side" into account right now; I just pick the top and bottom. I can fix this; I know what to do. By the way, can you check Discord?
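
For anyone following along, here is a minimal Python sketch of the difference between "just pick the top and bottom" and a side-aware selection. It is not the GSEA.jl code: the function names, the example set names, and the scores are all made up for illustration, and it assumes the "side" is simply the sign of the enrichment score.

```python
def select_naive(scores, n):
    """Pick the n highest and n lowest scores, ignoring the sign of the score."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top = [name for name, _ in ranked[:n]]
    bottom = [name for name, _ in ranked[-n:]]
    return top, bottom


def select_by_side(scores, n):
    """Pick up to n sets per side, where the side is the sign of the score (assumption)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Most positive first, restricted to positive scores.
    positive = [name for name, score in ranked if score > 0][:n]
    # Most negative first, restricted to negative scores.
    negative = [name for name, score in reversed(ranked) if score < 0][:n]
    return positive, negative


# Hypothetical scores: four sets on the positive side, one on the negative side.
scores = {"SET_A": 2.1, "SET_B": 1.4, "SET_C": 0.6, "SET_D": 0.2, "SET_E": -1.3}

naive_top, naive_bottom = select_naive(scores, 3)
side_positive, side_negative = select_by_side(scores, 3)

print(naive_bottom)   # contains SET_C and SET_D, which are on the positive side
print(side_negative)  # contains only SET_E
```

With only one negatively enriched set, the naive version fills the "bottom" list with the weakest positive sets, which is the wrap-around described above. The side-aware version returns fewer than n sets for that side instead.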

The fix for those sets being dropped ended up being simple: I was converting from 0-based to 1-based indexing incorrectly when creating the results HTML table.
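
As a purely illustrative sketch of how this kind of indexing mistake can silently drop a row (this is not the actual table-building code, which isn't shown in this thread; the function names and set names are hypothetical):

```python
rows = ["SET_A", "SET_B", "SET_C"]  # hypothetical result rows, stored 0-based


def number_rows_buggy(rows):
    # Bug: the loop starts at 1 to produce 1-based labels, but the same value
    # is also used to index the 0-based list, so rows[0] is never emitted.
    return [(i, rows[i]) for i in range(1, len(rows))]


def number_rows_fixed(rows):
    # enumerate(start=1) keeps the 0-based lookup and the 1-based label separate.
    return list(enumerate(rows, start=1))


print(number_rows_buggy(rows))  # SET_A is missing
print(number_rows_fixed(rows))  # all three rows, labeled 1 to 3
```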

This should also solve #65, or vice versa.