nloyfer/wgbs_tools

find_markers error

dengyihan1464 opened this issue · 2 comments

Thank you for the tool!

I ran into a problem when running:

$ wgbstools find_markers -g groups.csv --betas GSE186458_RAW/*.hg38.beta -b blocks..bed.gz --min_cpg 5 --min_bp 10 --max_bp 1500 -c 10

The error is:

py:87: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  blocks_df[b] = dres[b]
Invalid input argument
`popmean.shape[axis]` must equal 1.

I would greatly appreciate it if you could take some time to look into this for me!

A possible cause of the issue is the installed numpy version.

GWW commented

There is a simple code fix for this in find_markers.py, on lines 246-249:

            if len(self.tg_names) == 1:
                r = ttest_1samp(tf[self.bg_names], tf[self.tg_names].values.flatten(), axis=1, nan_policy='omit')
            elif len(self.bg_names) == 1:
                r = ttest_1samp(tf[self.tg_names], tf[self.bg_names].values.flatten(), axis=1, nan_policy='omit')

The flatten() call needs to be removed:

            if len(self.tg_names) == 1:
                r = ttest_1samp(tf[self.bg_names], tf[self.tg_names].values, axis=1, nan_policy='omit')
            elif len(self.bg_names) == 1:
                r = ttest_1samp(tf[self.tg_names], tf[self.bg_names].values, axis=1, nan_policy='omit')
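
To see why dropping flatten() matters, here is a minimal sketch of the shapes involved. It uses only NumPy; the array `single_col` is a made-up stand-in for `tf[self.tg_names].values` when there is a single target column, and `has_len_one_along` is a hypothetical helper that mirrors the condition stated in the error message (`popmean.shape[axis]` must equal 1). It does not call SciPy, and the exact behavior of `ttest_1samp` may differ between SciPy versions.

```python
import numpy as np

# Hypothetical stand-in for tf[self.tg_names].values with one target column:
# 4 blocks (rows) x 1 sample (column).
single_col = np.array([[0.1], [0.5], [0.9], [0.3]])

flat = single_col.flatten()  # shape (4,): the array is 1-D, so axis 1 does not exist
kept = single_col            # shape (4, 1): length 1 along axis 1


def has_len_one_along(arr, axis):
    """Return True if `arr` has the given axis and its length along it is 1."""
    return arr.ndim > axis and arr.shape[axis] == 1


# The flattened array cannot satisfy the condition for axis=1;
# the unflattened (4, 1) array can.
print(flat.shape, has_len_one_along(flat, 1))
print(kept.shape, has_len_one_along(kept, 1))
```

With the (n, 1) array, each row of the other group is compared against that row's single value, which appears to be what the call with axis=1 is meant to do.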