Problem during conversion from IMPUTE2 to PEDMAP format
Hi, I am trying to convert SHAPEIT phased files to PLINK format. My command lines were:
Step1: completed successfully
python convert_shapeit2_to_impute2.py CHR2.Phased.haps CHR2.Phased.sample impute.chr2.haps impute.chr2.legend impute.chr2.sample
Step2: (Error)
python convert_impute2_to_PEDMAP.py impute.chr2.haps impute.chr2.legend CHR2.Phased.sample chr2 2
It fails with the following error message:
Traceback (most recent call last):
  File "/share_bio/unisvx3/zengchq_group/sohail/softwares/SHAPEIT2PLINK/SHAPEIT_to_PLINK-master/convert_impute2_to_PEDMAP.py", line 251, in <module>
    p_id = [x[3] for x in sample_info]
IndexError: list index out of range
Can anyone please help me to resolve this issue?
Thanks!
-sohail
Hi,
The error hints at some sort of problem/inconsistency in the SHAPEIT2 .sample file (see line 246, quoted below):
sample_info = [x.replace('\n', '').split() for x in open(sys.argv[3], 'r').readlines()[2:]]
because the list sample_info is populated by reading that file and is used later, in line 251, to populate the list p_id, which is where the error is raised.
Are you sure your original CHR2.Phased.sample file adheres to the SHAPEIT2 file format (see here)? It could be that recent versions of SHAPEIT have changed the file format (I am not sure about this though!).
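One quick way to check is to count the columns on every data row of the .sample file. The snippet below is only a rough sketch, not part of the converter: the helper name short_rows is made up for illustration, the example rows are hypothetical, and it assumes the script needs seven whitespace-separated columns (indices 0 to 6) after skipping two header lines.

```python
def short_rows(lines, required=7, header_lines=2):
    """Return (line_number, n_columns) for each data row that is too short.

    `lines` is any iterable of text lines, e.g. an open file handle.
    `required=7` is an assumption: the converter indexes columns 0 to 6.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if line_number <= header_lines:
            continue  # the converter drops the first two lines with [2:]
        n_fields = len(line.split())
        if n_fields < required:
            problems.append((line_number, n_fields))
    return problems


# Hypothetical three-column file content, for illustration only.
example = [
    "ID_1 ID_2 missing\n",
    "0 0 0\n",
    "FAM1 IND1 0\n",
    "FAM2 IND2 0\n",
]
print(short_rows(example))  # [(3, 3), (4, 3)]
```

For a real file you could call it as short_rows(open("CHR2.Phased.sample")); an empty list would mean every data row has at least seven columns.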
Hi,
My sample file format is:
ID_1 ID_2 missing
0 0 0
103-RQ 103-RQ 0
109-EP 109-EP 0
16-EJ 16-EJ 0
31-CE 31-CE 0
40-JI 40-JI 0
43-AM 43-AM 0
50-JB 50-JB 0
I uploaded the file here: https://www.dropbox.com/s/4o4rtsmpu3cm82f/CHR2.Phased.sample?dl=0
It is displayed differently in Windows Notepad than it is here.
I wonder whether three columns in the sample file are enough, since the code seems to expect more, as shown below (line 246 onward):
sample_info = [x.replace('\n', '').split() for x in open(sys.argv[3], 'r').readlines()[2:]]
sample_names = [x[1] for x in sample_info]
family_id = [x[0] for x in sample_info]
p_id = [x[3] for x in sample_info]
m_id = [x[4] for x in sample_info]
gender = [x[5] for x in sample_info]
pheno = [x[6] for x in sample_info]
Please have a look!
Thanks
Thanks for uploading the file. Yes, the problem is that your .sample file has only three columns. I am guessing that my phased data had more columns, corresponding to indices 3 to 6, and that is why you are getting the error message. You can comment out the following lines:
# p_id = [x[3] for x in sample_info]
# m_id = [x[4] for x in sample_info]
# gender = [x[5] for x in sample_info]
# pheno = [x[6] for x in sample_info]
and change lines 256-273 to the following:
returned = Convert_impute2_to_PEDMAP(
    sys.argv[5], # chromosome number
    sys.argv[2], # .legend file
    sys.argv[1], # .haps file
    sample_names,
    None,
    family_id,
    None,
    # p_id,
    # None,
    # m_id,
    # None,
    # gender,
    # None,
    # pheno,
    # None,
    sys.argv[4] # output file name
)
and lines 6-23 to the following
def Convert_impute2_to_PEDMAP(
    chromosome = None,
    legend_file = None,
    haplotypes_file = None,
    sample_names = None,
    sample_names_filename = None,
    family_id = None,
    family_id_filename = None,
    # p_id = None,
    # p_id_filename = None,
    # m_id = None,
    # m_id_filename = None,
    # gender = None,
    # gender_filename = None,
    # pheno = None,
    # pheno_filename = None,
    output = None,
):
and change line 57 to the following
pedInfo.append([family_id[currentIndiv], sample_names[currentIndiv]])#, p_id[currentIndiv], m_id[currentIndiv], gender[currentIndiv], pheno[currentIndiv]])
as a temporary fix.
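If you would rather not comment anything out, another option might be to fall back to a placeholder whenever a row is too short. This is an untested sketch against the real script: the helper column is hypothetical, and using "0" as the placeholder is an assumption (PLINK treats 0 as missing for parental IDs and as unknown for sex, but please check that it suits your phenotype coding).

```python
def column(rows, index, default="0"):
    """Return column `index` of every row, or `default` where the row is too short."""
    return [row[index] if len(row) > index else default for row in rows]


# Two rows as they would be parsed from a three-column .sample file.
sample_info = [["103-RQ", "103-RQ", "0"], ["109-EP", "109-EP", "0"]]

family_id = column(sample_info, 0)
sample_names = column(sample_info, 1)
p_id = column(sample_info, 3)    # missing column, filled with the placeholder
m_id = column(sample_info, 4)
gender = column(sample_info, 5)
pheno = column(sample_info, 6)

print(family_id)  # ['103-RQ', '109-EP']
print(p_id)       # ['0', '0']
```

With lists built this way, the original call and function signature could stay unchanged, because every list has one entry per sample.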
@muhammadsohailraza did my previous reply fix your issue?
Hi @baharian
Actually, I was in a hurry, so I simply added extra columns to the sample file rather than changing the code, and it works perfectly fine (there should be 7 columns starting from line 2).
For instance:
ID_1 ID_2 missing
0 0 0 0 0 0 0
103-RQ 103-RQ 0 0 0 0 0
109-EP 109-EP 0 0 0 0 0
16-EJ 16-EJ 0 0 0 0 0
31-CE 31-CE 0 0 0 0 0
40-JI 40-JI 0 0 0 0 0
43-AM 43-AM 0 0 0 0 0
50-JB 50-JB 0 0 0 0 0
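For anyone who wants to do the same padding without editing the file by hand, here is a small sketch. The function name pad_sample_lines is made up, and it assumes whitespace-separated columns, a first header line that is left as it is, and "0" as the filler value, matching the example above.

```python
def pad_sample_lines(lines, n_columns=7, filler="0", header_lines=1):
    """Pad every non-empty row after the first header line to `n_columns` fields."""
    padded = []
    for i, line in enumerate(lines):
        fields = line.split()
        # Leave the header and blank lines alone; extend short rows with the filler.
        if i >= header_lines and fields and len(fields) < n_columns:
            fields += [filler] * (n_columns - len(fields))
        padded.append(" ".join(fields))
    return padded


original = ["ID_1 ID_2 missing", "0 0 0", "103-RQ 103-RQ 0"]
for row in pad_sample_lines(original):
    print(row)
# ID_1 ID_2 missing
# 0 0 0 0 0 0 0
# 103-RQ 103-RQ 0 0 0 0 0
```

To apply it to a file, read the lines with open(...).read().splitlines(), pass them through the function, and write the result to a new file so the original .sample stays untouched.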