find_markers error

Question


dengyihan1464 opened this issue 2 years ago · 2 comments

Thank you for the tool!

I ran into a problem when running:

$ wgbstools find_markers -g groups.csv --betas GSE186458_RAW/*.hg38.beta -b blocks..bed.gz --min_cpg 5 --min_bp 10 --max_bp 1500 -c 10

The error is:

py:87: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  blocks_df[b] = dres[b]
Invalid input argument
`popmean.shape[axis]` must equal 1.

I would greatly appreciate it if you could take some time to check this for me!
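The PerformanceWarning in this output is separate from the failure. pandas only warns that assigning many columns one at a time (as in `blocks_df[b] = dres[b]`) is slow, and execution continues. Below is a minimal sketch of the alternative the warning itself suggests. It assumes hypothetical stand-ins for `blocks_df` and `dres` (toy data, not wgbstools' real tables) and is not a patch to the tool.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: a small blocks table and a dict of per-group result columns
blocks_df = pd.DataFrame({"chr": ["chr1", "chr1", "chr2"], "start": [100, 500, 900]})
rng = np.random.default_rng(0)
dres = {f"group{i}": rng.random(len(blocks_df)) for i in range(150)}

# Instead of inserting each column separately (blocks_df[b] = dres[b] in a loop),
# build all new columns as one DataFrame aligned on the same index...
new_cols = pd.DataFrame(dres, index=blocks_df.index)

# ...and join them in a single pd.concat(axis=1) call, as the warning recommends
blocks_df = pd.concat([blocks_df, new_cols], axis=1)
print(blocks_df.shape)
```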

Answer 1 · 2023-03-28T09:09:47.000Z

A possible cause of this issue is the installed numpy version.

Answer 2 · 2023-04-05T14:20:20.000Z

There is a simple code fix for this in find_markers.py, lines 246-249:

            if len(self.tg_names) == 1:
                r = ttest_1samp(tf[self.bg_names], tf[self.tg_names].values.flatten(), axis=1, nan_policy='omit')
            elif len(self.bg_names) == 1:
                r = ttest_1samp(tf[self.tg_names], tf[self.bg_names].values.flatten(), axis=1, nan_policy='omit')

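The flatten() call needs to be removed. In recent SciPy versions, ttest_1samp requires an array popmean to have length 1 along axis. Because tg_names (or bg_names) holds a single sample here, .values already has shape (n_blocks, 1), while flatten() turns it into shape (n_blocks,), which triggers the "`popmean.shape[axis]` must equal 1" error: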

            if len(self.tg_names) == 1:
                r = ttest_1samp(tf[self.bg_names], tf[self.tg_names].values, axis=1, nan_policy='omit')
            elif len(self.bg_names) == 1:
                r = ttest_1samp(tf[self.tg_names], tf[self.bg_names].values, axis=1, nan_policy='omit')
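To see the shape difference behind this fix, here is a small standalone sketch. It assumes a hypothetical toy table `tf` (4 blocks, 3 background samples, 1 target sample) and plain lists in place of `self.bg_names` / `self.tg_names`. The rejection of the flattened input reflects recent SciPy releases; older releases may accept it.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_1samp

# Hypothetical toy data: 4 blocks (rows), 3 background samples and 1 target sample
rng = np.random.default_rng(0)
tf = pd.DataFrame(rng.random((4, 4)), columns=["bg1", "bg2", "bg3", "tg1"])
bg_names, tg_names = ["bg1", "bg2", "bg3"], ["tg1"]

popmean = tf[tg_names].values         # shape (4, 1): length 1 along axis=1, as required
flat = tf[tg_names].values.flatten()  # shape (4,): what the original code passed
print(popmean.shape, flat.shape)

# Fixed call: one t-test per block, background samples vs. that block's target value
r = ttest_1samp(tf[bg_names], popmean, axis=1, nan_policy='omit')
print(np.shape(r.statistic))

# Original call: recent SciPy releases reject the flattened popmean with a ValueError
try:
    ttest_1samp(tf[bg_names], flat, axis=1, nan_policy='omit')
except ValueError as err:
    print("flattened popmean rejected:", err)
```

The fixed call yields one statistic per block (row), matching what the original code intended with flatten().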