maclandrol/FisherExact

capi_return is NULL/Call-back cb_f2pystop_in_prterr__user__routines failed

Closed this issue · 7 comments

Hello -

Thanks so much for developing this module :)

When I invoke the fisher_exact function (running Python 2.7 on a Mac; FisherExact installed with pip, gfortran installed), I get the following errors, followed by a Python segmentation fault:

...
Call-back cb_f2pystop_in_prterr__user__routines failed.
capi_return is NULL
Call-back cb_f2pystop_in_prterr__user__routines failed.
capi_return is NULL
Call-back cb_f2pystop_in_prterr__user__routines failed.
capi_return is NULL
Call-back cb_f2pystop_in_prterr__user__routines failed.
Segmentation fault: 11

Here is my contingency table (2x113):

p_row = [89, 66, 33, 32, 28, 15, 16, 21, 12, 20, 17, 0, 3, 6, 3, 1, 7, 7, 10, 7, 6, 6, 5, 3, 2, 1, 0, 13, 7, 5, 0, 11, 3, 3, 1, 1, 0, 0, 0, 6, 5, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 4, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
c_row = [90, 58, 44, 22, 18, 17, 15, 14, 13, 12, 12, 10, 9, 8, 6, 6, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

from FisherExact import Fisher
Fisher.fisher_exact([p_row, c_row])

Thanks so much for any advice you might have!

I am using numpy version 1.12.0.

Thank you!

The total sum of your input table is 1024. At some point, the computation needs at least factorial(1024), which is too large. At the moment, there is no way to compute the exact p-value of your table without running out of memory (I also tried the fisher.test function in R).
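To give a sense of the magnitude involved, here is a small standard-library sketch. It only illustrates why a naive double-precision factorial of this size fails; whether the compiled routine hits exactly this limit is not stated in this thread, so treat that link as an assumption.

```python
import math

total = 1024  # table total quoted above

# Python integers have arbitrary precision, so the exact factorial can be built...
exact = math.factorial(total)

# ...but it does not fit in a double-precision float.
try:
    float(exact)
    fits_in_double = True
except OverflowError:
    fits_in_double = False

# Working with log-factorials keeps the numbers in a comfortable range.
log_factorial = math.lgamma(total + 1)  # log(1024!)

print("fits in a double:", fits_in_double)
print("log(1024!):", log_factorial)
```

Keeping values on the log scale avoids the overflow, but it does not by itself reduce the number of tables an exact test has to enumerate.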

I will try to fix the segmentation fault and raise an error instead, so that Python is not killed (which matters in interactive mode).

For the moment, try the Monte Carlo simulation instead, with at least 10000 replicates. You will also have to increase the workspace (10000 should be more than sufficient).
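For anyone curious what a Monte Carlo p-value for an r x c table looks like, here is a minimal pure-Python sketch. It is not the code FisherExact runs; it is an illustrative permutation approach, similar in spirit to R's fisher.test with simulate.p.value=TRUE. The function names `log_table_weight` and `monte_carlo_fisher` are made up for this example.

```python
import math
import random


def log_table_weight(table):
    """Return -sum(log(n_ij!)). With fixed margins this orders tables
    by their hypergeometric probability (larger means more probable)."""
    return -sum(math.lgamma(cell + 1) for row in table for cell in row)


def monte_carlo_fisher(table, replicates=10000, seed=None):
    """Estimate a two-sided p-value by sampling tables with the same
    row and column totals as `table` (hypothetical helper)."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])

    # One (row label, column label) pair per observation, in matching order.
    row_labels = [i for i, row in enumerate(table)
                  for cell in row for _ in range(cell)]
    col_labels = [j for row in table
                  for j, cell in enumerate(row) for _ in range(cell)]

    observed = log_table_weight(table)
    tol = 1e-7  # guard against floating-point ties
    hits = 0
    for _ in range(replicates):
        # Shuffling one set of labels keeps both margins fixed.
        rng.shuffle(col_labels)
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        # Count simulated tables at most as probable as the observed one.
        if log_table_weight(sim) <= observed + tol:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1.0) / (replicates + 1)


if __name__ == "__main__":
    print(monte_carlo_fisher([[3, 0], [0, 3]], replicates=2000, seed=0))
```

The estimate varies from run to run unless a seed is fixed, and more replicates reduce that variation. Memory use grows with the table total rather than with the number of possible tables, which is why simulation stays feasible where exact enumeration does not.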

I thought that the input for the function had to be a file. Will it work with an array? Thanks!

If you use the binary (fexact from a terminal), then the input should be a file. If you are using it inside a Python script, you should instead provide an m x n contingency table (numpy array or list of lists) as input (see https://github.com/maclandrol/FisherExact/blob/master/README.md#use-as-a-module ).

Example:
from FisherExact import Fisher
# table is an m x n contingency table (numpy array or list of lists)
Fisher.fisher_exact(table)
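If you are unsure whether your data is in the expected m x n shape, a quick sanity check before calling the function can help. The helper below is only a sketch; `check_table` is a hypothetical name and not part of FisherExact.

```python
def check_table(table):
    """Validate an m x n contingency table given as a list of lists
    (or anything row-iterable) and return (n_rows, n_cols, total)."""
    rows = [list(row) for row in table]
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    n_cols = len(rows[0])
    if n_cols < 2:
        raise ValueError("need at least two columns")
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same length")
    for row in rows:
        for cell in row:
            # Counts must be non-negative whole numbers.
            if cell < 0 or cell != int(cell):
                raise ValueError("cells must be non-negative integers")
    total = sum(sum(row) for row in rows)
    return len(rows), n_cols, total


if __name__ == "__main__":
    print(check_table([[1, 2], [3, 4]]))
```

The returned total is also a quick way to see how large the table is before attempting an exact computation.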

Right now, your data seems to be in a single column. You should start by reformatting it to separate the different categories (e.g. absence/presence of phenotypes or mutations in each ancestry); see http://udel.edu/~mcdonald/statfishers.html for an overview of the input table format. The default parameters should be sufficient.

In the following, I assume that your data should be reformatted as follows:

     a     b    c     d    e    f
X    31    1    5     0    11   9
Y    9223  927  1065  740  119  2687

In  [2]: import FisherExact
In  [3]: c = [[31,1,5,0,11,9],[9223,927,1065,740,119,2687]]
In  [4]: FisherExact.fisher_exact(c)
Out [4]: 2.731091179948461e-11

The last value is the two-sided p-value.

Thanks so much for your help! Using the Monte Carlo simulation worked for me. I appreciate the advice and the very quick responses.