Can't load own data using DTCR_SS.Get_Data
junyho486 opened this issue · 7 comments
TRB.txt
I have TCR-seq data that was annotated with IGB and preprocessed for DeepTCR as described in the tutorial.
I have 9 samples with many TCRs; here is an excerpt of the data for one sample:
cdr3_aa v_call d_call j_call Count
ASSARQDLQQY TRBV2*01 TRBD1*01 TRBJ2-7*01 39890
ASKDRALLRAV TRBV21-1*01 TRBD1*01 TRBJ2-7*01 32323
ASSFSATNTGELF TRBV5-1*01 TRBD2*01 TRBJ2-2*01 26637
ASSPGEQNTGELF TRBV7-8*01 TRBD2*01 TRBJ2-2*01 26258
ASSGAGTGGYNEQF TRBV12-3*01 TRBD1*01 TRBJ2-1*01 16692
ASSFSGHTGELF TRBV7-2*01 TRBD2*01 TRBJ2-2*01 13838
ASSVETGTEKY TRBV7-9*01 TRBD1*01 TRBJ2-3*01 13831
PPVIWTATSST TRBV24-1*01 TRBD1*01 TRBJ2-7*01 13819
ASSSGLAGAYEQY TRBV7-2*02 TRBD2*01 TRBJ2-7*01 13216
ASSFGVSGANVLT TRBV7-9*03 TRBD2*01 TRBJ2-6*01 11449
ASSGLAGGPGTGELF TRBV9*01 TRBD2*02 TRBJ2-2*01 11292
ASSPLAGGVAQF TRBV7-6*01 TRBD2*02 TRBJ2-1*01 11019
ASSSTGQGNSYEQY TRBV28*01 TRBD1*01 TRBJ2-7*01 10466
If I run the tutorial for supervised sequence classification with the example data from the repository, loading data, clustering, etc. work perfectly, except for DTCR_SS.Train(), which throws:
AttributeError: 'DeepTCR_SS' object has no attribute 'test_pred'
DTCR_SS.Monte_Carlo_CrossVal, DTCR_SS.K_Fold_CrossVal, etc. work.
If I then replace the folders in Data/Murine_Antigens with my samples, DTCR_SS.Get_Data(), which usually takes just a moment to load the data, gets stuck (I stopped it after 40 min).
Restricting the input to TCRs with >= 1000 reads, which leaves tables of 50-80 rows, does not resolve the issue.
import sys
sys.path.append('../../')
from DeepTCR.DeepTCR import DeepTCR_SS
# Instantiate training object
DTCR_SS = DeepTCR_SS('Tutorial')
# Load data from directories
DTCR_SS.Get_Data(directory='../../Data/TRB', Load_Prev_Data=False, aggregate_by_aa=True,
                 aa_column_beta=0, count_column=4, v_beta_column=1, j_beta_column=3)
Output:
Loading Data ...
Is there anything that could cause this kind of bug?
Attached is the data for one sample, restricted to TCRs with > 1000 reads (a .tsv file saved as .txt).
Thank you in advance for your help!
That bug should be fixed now in v2.1.17.
As for the Get_Data method, it takes CSV/TSV files. Just change the file extension to .csv or .tsv and it should work.
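For anyone landing here with the same symptom, the extension change suggested above can be done in bulk. The following is only a minimal sketch using the standard library; the helper name `rename_txt_to_tsv` is hypothetical and not part of DeepTCR, and it assumes the `.txt` files really are tab-separated.

```python
from pathlib import Path


def rename_txt_to_tsv(directory):
    """Rename every *.txt file below `directory` to *.tsv.

    Hypothetical helper, not part of DeepTCR. Returns the new paths.
    """
    renamed = []
    for path in sorted(Path(directory).rglob("*.txt")):
        target = path.with_suffix(".tsv")
        if target.exists():
            # Never overwrite an existing .tsv file with the same stem.
            continue
        path.rename(target)
        renamed.append(target)
    return renamed


# Example call (the directory is the one used in this issue):
# rename_txt_to_tsv('../../Data/TRB')
```

The function only changes file names, not contents, so it is safe to run again: a second call finds no `.txt` files left and returns an empty list.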
Well, I did a clean install today, like so:
- conda create -n DEEPTCR python=3.8.0
- conda activate DEEPTCR
- pip3 install DeepTCR (also tried pip3 install git+https://github.com/sidhomj/DeepTCR.git, with the same bug)
- conda install ipykernel
- python -m ipykernel install --user --name DEEPTCR
The files are in .tsv format; I only changed the extension to .txt to upload them to GitHub.
Sorry. Are both bugs still there, or just the latter?
Thank you for the quick response!
I installed DeepTCR several times today trying to find the bug. Currently I am running the stable installation and see both bugs.
Edit: I reinstalled into a new env using pip3 install git+https://github.com/sidhomj/DeepTCR.git and the first bug seems to be resolved, but the second one persists.
Second bug fixed. It was an issue with the expected order of columns in the files. I fixed the loading function so the order does not matter anymore. Let me know if it works now!
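For anyone still on an older version, one way to rule out a column mix-up is to check which header each index passed to Get_Data actually points at. This is a rough sketch with the standard library only; `describe_columns` is a hypothetical helper, not a DeepTCR function, and it assumes the file is tab-separated with a header row, as in the excerpt at the top of this issue.

```python
import csv


def describe_columns(tsv_path, **column_indices):
    """Map each named column index to (header name, first value).

    Hypothetical helper, not part of DeepTCR. Assumes a tab-separated
    file whose first line is a header row.
    """
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        first_row = next(reader, None)

    result = {}
    for name, index in column_indices.items():
        # An out-of-range index is reported as None rather than raising.
        header_name = header[index] if index < len(header) else None
        has_value = first_row is not None and index < len(first_row)
        result[name] = (header_name, first_row[index] if has_value else None)
    return result


# Example call with the indices used in this issue (file path is a placeholder):
# describe_columns('sample.tsv', aa_column_beta=0, count_column=4,
#                  v_beta_column=1, j_beta_column=3)
```

If an index maps to an unexpected header, or to None, the indices given to Get_Data do not match the file layout.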
Thank you so much! I was struggling with this one all day...
Now both issues are resolved for the unsupervised and supervised models!
PS: Also, congrats on creating DeepTCR; it is a very impressive tool!