Oshlack/ALLSorts

ValueError: Length of passed values is 0, index implies 19.


ALLSorts runs successfully on the demo data, but running it on my own data raises an error.

1) Input file (read counts for 20 genes)
,DSC3,IGF2BP1,KCNK3,IGJ,PTPRM,KCNN1,CD99,HAP1,AGAP1,CHST2,SLC2A5,IFITM1,ABHD3,PCLO,TNS1,NCF2,MIB1,TCFL5,ARHGEF4,RGMA
hezhiyan,0,1,0,0,0,0,8,0,0,1,0,0,0,2,0,5,0,6,7,0
LIANGXIAOLAN,0,0,0,9,0,0,0,1,0,6,0,1,2,6,0,0,0,1,10,0
LIAOJIFENG,0,0,9,0,12,0,0,0,13,207,0,2,0,1,0,2,0,11,108,0
lilingyi,0,0,0,0,14,1,0,0,16,85,0,0,1,0,0,4,4,26,2,0
LXINZHE,4,37,0,0,1,23,9,13,1,1,0,0,13,23,0,8,1,6,3,4
NGR200625001,1,0,9,10,1,0,125,5,1,9,35,13,2,1,13,21,12,18,0,0
qianli,1,0,0,0,1,1,0,0,1,10,0,5,0,4,0,4,0,3,2,1
qinpeixiang,0,0,0,88,1,1,109,1,8,177,5,203,7,0,408,83,20,26,8,10
luozhimin,0,0,0,1,0,2,504,1,0,0,48,168,1,8,29,44,20,78,0,0
luoyangying,0,0,0,8,4,23,301,4,11,30,28,310,10,7,55,200,3,27,10,4
huangzhegnhan,7,1,0,7,4,5,979,4,79,9,230,75,22,16,158,89,84,288,1,1
chenziyang,7,4,0,2,8,70,86,1,0,14,22,52,28,70,9,26,26,291,16,3
zhuxiangyan,0,0,0,0,3,0,486,0,0,2,114,97,5,1,10,17,7,94,1,2
zhaojinsheng,0,0,0,4,0,1,88,0,0,47,60,9,2,0,10,21,2,27,1,0
qinzuofa,0,0,0,0,1,0,219,0,0,25,26,145,2,0,3,177,4,50,0,0
huanglianbin,0,0,0,0,0,0,267,0,0,4,6,24,3,0,4,25,5,28,1,0
xuxunjia,0,0,0,0,0,2,5,0,0,0,2,4,2,0,0,9,0,0,0,0
xuyongyi,1,0,0,0,0,5,6,0,1,7,0,12,3,7,1,14,2,0,4,0
ZYZ,0,0,0,0,0,0,2,2,1,0,0,0,5,2,1,4,0,0,6,0

2) Command line
python ALLSorts -samples reverse.gene.forRF.classifier.HTseqCount.xls -destination test/

3) Error message

Prediction Mode

Loading classifier...
Saving predictions...
/home/hanxl/TEST/ExpressionProfile/LIMINGZE/ph-like_method/AllSorts/tools/MoRP/morp/morp.py:86: RuntimeWarning: divide by zero encountered in true_divide
np.divide(counts_normalised,
/home/hanxl/TEST/ExpressionProfile/LIMINGZE/ph-like_method/AllSorts/tools/MoRP/morp/morp.py:86: RuntimeWarning: invalid value encountered in true_divide
np.divide(counts_normalised,
/home/hanxl/TEST/ExpressionProfile/LIMINGZE/ph-like_method/AllSorts/tools/MoRP/morp/morp.py:109: RuntimeWarning: invalid value encountered in true_divide
if (original/scaler) == normalised:
/home/hanxl/Software/anaconda3/envs/allsorts/lib/python3.8/site-packages/numpy/lib/function_base.py:3942: RuntimeWarning: invalid value encountered in multiply
x2 = take(ap, indices_above, axis=axis) * weights_above
/home/hanxl/Software/anaconda3/envs/allsorts/lib/python3.8/site-packages/numpy/lib/nanfunctions.py:1115: RuntimeWarning: All-NaN slice encountered
r, k = function_base._ureduce(a, func=_nanmedian, axis=axis, out=out,
Traceback (most recent call last):
File "/home/hanxl/Software/anaconda3/envs/allsorts/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/hanxl/Software/anaconda3/envs/allsorts/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "ALLSorts/main.py", line 20, in <module>
allsorts.run()
File "ALLSorts/allsorts.py", line 90, in run
probabilities = allsorts_clf.predict_proba(ui.samples, parents=ui.parents)
File "ALLSorts/pipeline.py", line 115, in predict_proba
Xt = transform.transform(Xt)
File "ALLSorts/stages/feature_creation.py", line 310, in transform
self._iamp21Feature(counts),
File "ALLSorts/stages/feature_creation.py", line 200, in _iamp21Feature
temp = bin_counts.apply(median_filter, mode="constant", size=11, axis=1)
File "/home/hanxl/Software/anaconda3/envs/allsorts/lib/python3.8/site-packages/pandas/core/frame.py", line 6878, in apply
return op.get_result()
File "/home/hanxl/Software/anaconda3/envs/allsorts/lib/python3.8/site-packages/pandas/core/apply.py", line 180, in get_result
return self.apply_empty_result()
File "/home/hanxl/Software/anaconda3/envs/allsorts/lib/python3.8/site-packages/pandas/core/apply.py", line 220, in apply_empty_result
return self.obj._constructor_sliced(r, index=self.agg_axis)
File "/home/hanxl/Software/anaconda3/envs/allsorts/lib/python3.8/site-packages/pandas/core/series.py", line 291, in __init__
raise ValueError(
ValueError: Length of passed values is 0, index implies 19.

Thanks for giving ALLSorts a go! Let's see if we can get it working for you.

Please try converting your .xls file to a .csv (exporting it from Excel should be fine) and giving it another go.
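The counts pasted above are comma-separated text even though the file name ends in `.xls`, which suggests it may not be a real Excel workbook. As a minimal sketch, assuming the file is plain comma-separated text (a genuine workbook should still be exported from Excel), the conversion amounts to checking that every row has the same number of fields and re-saving it under a `.csv` name. The helper `xls_text_to_csv` is hypothetical and not part of ALLSorts.

```python
import csv


def xls_text_to_csv(src: str, dst: str) -> int:
    """Re-save comma-separated text (despite an .xls extension) as .csv.

    Hypothetical helper, not part of ALLSorts. Assumes the first row is the
    header (an empty first cell followed by gene names) and every later row
    is a sample ID followed by its read counts. Returns the sample count.
    """
    with open(src, newline="") as fh:
        rows = [row for row in csv.reader(fh) if row]  # skip blank lines
    if not rows:
        raise ValueError(f"{src} is empty")
    width = len(rows[0])
    # A ragged row usually means the file is not simple comma-separated text.
    for lineno, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            raise ValueError(
                f"row {lineno}: expected {width} fields, found {len(row)}"
            )
    with open(dst, "w", newline="") as fh:
        csv.writer(fh).writerows(rows)
    return len(rows) - 1
```

For example, `xls_text_to_csv("reverse.gene.forRF.classifier.HTseqCount.xls", "reverse.gene.forRF.classifier.HTseqCount.csv")`, after which the new `.csv` is passed to `-samples` exactly as in the original command.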

In addition, ALLSorts uses a large set of genes to make predictions. If your example above is what you are inputting (20 genes), the classifier will likely run into an error, as it needs a minimum set of genes to work. Refer to the wiki entry "0. Counts matrix format" for more specific instructions.
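Before running the classifier it can help to confirm the orientation and size of the matrix. Judging by the file in the report, samples are rows and genes are columns, with an empty first header cell. The hypothetical helper below counts the gene columns and rejects a matrix that is too small or malformed. The `min_genes` threshold is a placeholder assumption; the actual requirement is the one described in the wiki entry mentioned above.

```python
import csv
from typing import Tuple


def check_counts_matrix(path: str, min_genes: int) -> Tuple[int, int]:
    """Return (n_samples, n_genes) for a samples-by-genes counts CSV.

    Hypothetical pre-flight check, not part of ALLSorts. `min_genes` is a
    placeholder; take the real requirement from the ALLSorts wiki.
    """
    with open(path, newline="") as fh:
        rows = [row for row in csv.reader(fh) if row]
    if len(rows) < 2:
        raise ValueError("expected a header row and at least one sample")
    header, body = rows[0], rows[1:]
    genes = header[1:]  # the first header cell labels the sample-ID column
    if len(set(genes)) != len(genes):
        raise ValueError("duplicate gene names in the header")
    if len(genes) < min_genes:
        raise ValueError(
            f"only {len(genes)} genes found; expected at least {min_genes}"
        )
    for row in body:
        if len(row) != len(header):
            raise ValueError(
                f"sample {row[0]}: {len(row) - 1} counts for {len(genes)} genes"
            )
        # float() raises ValueError on non-numeric cells.
        if any(float(x) < 0 for x in row[1:]):
            raise ValueError(f"negative count for sample {row[0]}")
    return len(body), len(genes)
```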

If you need a hand, don't hesitate to ask! Otherwise, let me know how you go.

Edit: I have realised that there is a direct link to this repository from a recent paper that used the first version, hence the confusion. My fault, I should have realised. Thank you for highlighting this.

Good to see you got it working, let me address your questions:

> The description in the literature made me think that only 20 genes can be used for classification.

Since your input file is named forRF.classifier, I think you may have mistaken this version of ALLSorts for the previous attempt, which was a Random Forest classifier (found here). This version is distinct from that one and does not use Random Forest; it is a complete reimagining. Apologies if that has been a source of confusion! I will try to highlight this better on the welcome page.

> In fact, there are more than 10,000 genes.

In this version, a large set of genes is required as input because some custom features are derived from many genes. This is very different from how the original method worked, which I believe used only ~20 genes (I did not build that method personally).
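The traceback in the report hints at why coverage matters: `_iamp21Feature` smooths binned counts with `median_filter(..., mode="constant", size=11)`. As a rough illustration only (a toy stand-in with invented numbers, not ALLSorts code, and assuming sparse gene coverage leaves most bins empty), a window-11 median with zero padding returns 0 wherever fewer than 6 of the 11 values are positive. A profile in which only a few bins contain genes therefore collapses to zero, while a densely covered one survives.

```python
from statistics import median
from typing import List


def median_smooth(values: List[float], size: int = 11) -> List[float]:
    """Sliding-window median with zero padding at both ends.

    Toy analogue of scipy.ndimage.median_filter(mode="constant", cval=0)
    applied to one 1-D profile; `size` must be odd so the window is centred.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    padded = [0.0] * half + list(values) + [0.0] * half
    return [median(padded[i:i + size]) for i in range(len(values))]


# Invented per-bin counts along one chromosome region.
sparse = [0] * 30
sparse[5], sparse[17], sparse[24] = 12, 8, 30  # only a few bins have genes
dense = [10 + (i % 3) for i in range(30)]       # every bin is covered

print(any(median_smooth(sparse)))  # False: every smoothed bin is zero
print(all(median_smooth(dense)))   # True: the covered profile survives
```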

> What is the difference between the Ph Group, Ph and Ph-like classifications in the results?

Ph and Ph-like share a similar transcriptional signal, but Ph-like is defined as lacking the BCR:ABL1 fusion gene.
Ph Group is a meta-subtype that encapsulates both Ph and Ph-like.

This version of ALLSorts uses a hierarchical classification approach. For Ph/Ph-like/Ph Group, the classifier first attempts to assign a sample to Ph Group, then attempts to distinguish between Ph and Ph-like.
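The two-level decision can be sketched for a single sample as follows. This is a hypothetical illustration, not ALLSorts' implementation: the probability keys, the 0.5 threshold, and the rule that a sample stays at "Ph Group" when neither child is confident are all assumptions made for the example. Under those assumptions, it also shows how a result can read "Ph Group" rather than Ph or Ph-like.

```python
from typing import Dict


def call_ph_branch(probs: Dict[str, float], threshold: float = 0.5) -> str:
    """Two-level call for one sample: Ph Group first, then Ph vs Ph-like.

    Hypothetical sketch; `probs` maps label -> probability and `threshold`
    is an assumed cut-off, not an ALLSorts parameter.
    """
    # Level 1: does the sample belong to the Ph Group meta-subtype at all?
    if probs.get("Ph Group", 0.0) < threshold:
        return "not Ph Group"
    # Level 2: only within Ph Group, choose between Ph and Ph-like.
    ph, ph_like = probs.get("Ph", 0.0), probs.get("Ph-like", 0.0)
    if max(ph, ph_like) < threshold:
        return "Ph Group"  # confident at the parent level only
    return "Ph" if ph >= ph_like else "Ph-like"


print(call_ph_branch({"Ph Group": 0.9, "Ph": 0.2, "Ph-like": 0.8}))   # Ph-like
print(call_ph_branch({"Ph Group": 0.9, "Ph": 0.4, "Ph-like": 0.45}))  # Ph Group
print(call_ph_branch({"Ph Group": 0.1, "Ph": 0.7, "Ph-like": 0.1}))   # not Ph Group
```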

The subtypes available within this classifier are listed in the reference below, with some differences (e.g. the inclusion of the KMT2A/ZNF384/Ph/High signature groups and the absence of CRLF2(non-Ph-like)). I am still in the process of adding this information to the wiki.

Gu, Z., Churchman, M. L., Roberts, K. G., Moore, I., Zhou, X., Nakitandwe, J., … Mullighan, C. G. (2019). PAX5-driven subtypes of B-progenitor acute lymphoblastic leukemia. Nature Genetics, 51(2), 296–307.

> It seems that the predictions for the KMT2A and TCF3-PBX1 subtypes are relatively accurate, but the accuracy for the other subtypes is not sufficient.

Keen to understand this more. Are the samples you're referring to being classified as "Unclassified" or are they being attributed to a different subtype than you expected?
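One way to answer that question from your own results, assuming you have an expected subtype for each sample, is to tally each expected subtype's predictions and separate "Unclassified" calls from calls to a different subtype. The helper `summarise_calls` and the sample IDs below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict


def summarise_calls(
    expected: Dict[str, str], predicted: Dict[str, str]
) -> Dict[str, Dict[str, int]]:
    """Per expected subtype, count correct, Unclassified and other calls.

    Hypothetical helper; both dicts map sample ID -> subtype label.
    """
    summary: Dict[str, Counter] = defaultdict(Counter)
    for sample, truth in expected.items():
        call = predicted.get(sample)
        if call is None:
            outcome = "no prediction"
        elif call == truth:
            outcome = "correct"
        elif call == "Unclassified":
            outcome = "unclassified"
        else:
            outcome = "other subtype"
        summary[truth][outcome] += 1
    return {subtype: dict(counts) for subtype, counts in summary.items()}


expected = {"s1": "KMT2A", "s2": "KMT2A", "s3": "Ph-like", "s4": "Ph-like"}
predicted = {"s1": "KMT2A", "s2": "KMT2A", "s3": "Unclassified", "s4": "Ph"}
print(summarise_calls(expected, predicted))
# {'KMT2A': {'correct': 2}, 'Ph-like': {'unclassified': 1, 'other subtype': 1}}
```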

If you cannot go into details here but you do want to discuss your results, feel free to e-mail me at breon.schmidt@petermac.org and we can look at the results privately.

No worries, closing the issue.

All the best.