Error on calling TOP_strand genotypes
Hi, thanks for providing this Python library for array genotype calling.
I wanted to include TOP strand genotypes in the final report by adding top_strand_genotypes = gtc.get_base_calls() to gtc_final_report.py, as shown below.
for gtc_file in samples:
    sys.stderr.write("Processing " + gtc_file + "\n")
    gtc_file = os.path.join(args.gtc_directory, gtc_file)
    gtc = GenotypeCalls(gtc_file)
    genotypes = gtc.get_genotypes()
    top_strand_genotypes = gtc.get_base_calls()
    plus_strand_genotypes = gtc.get_base_calls_plus_strand(manifest.snps, manifest.ref_strands)
    forward_strand_genotypes = gtc.get_base_calls_forward_strand(manifest.snps, manifest.source_strands)
    normalized_intensities = gtc.get_normalized_intensities(manifest.normalization_lookups)
    b_allele_freq = gtc.get_ballele_freqs()
    logr_ratio = gtc.get_logr_ratios()
    assert len(genotypes) == len(manifest.names)
    for (name, chrom, map_info, genotype, top_strand_genotype, ref_strand_genotype, source_strand_genotype, (x_norm, y_norm), b_freq, log_r_ratio) in zip(manifest.names, manifest.chroms, manifest.map_infos, genotypes, top_strand_genotypes, plus_strand_genotypes, forward_strand_genotypes, normalized_intensities, b_allele_freq, logr_ratio):
        output_handle.write(delim.join([name, os.path.basename(gtc_file)[:-4], chrom, str(map_info), code2genotype[genotype], top_strand_genotype, ref_strand_genotype, source_strand_genotype, str(x_norm), str(y_norm), str(b_freq), str(log_r_ratio)]) + "\n")
However, I encountered the issue below.
Traceback (most recent call last):
  File "gtc_gp2_final_report.py", line 57, in <module>
    output_handle.write(delim.join([name, os.path.basename(gtc_file)[:-4], chrom, str(map_info), code2genotype[genotype], top_strand_genotype, ref_strand_genotype, source_strand_genotype, str(x_norm), str(y_norm), str(b_freq), str(log_r_ratio)]) + "\n")
TypeError: sequence item 5: expected str instance, bytes found
When I removed the parts related to top_strand_genotype, the script worked.
I am not sure what went wrong or how to fix it.
If anyone has experience calling TOP genotypes, I would appreciate your input on this matter.
Thanks in advance.
Zih-Hua
@zihhuafang I was able to reproduce this. It looks like it can return a list of byte arrays at https://github.com/Illumina/BeadArrayFiles/blob/develop/module/GenotypeCalls.py#L399, versus a list of strings from BeadArrayFiles/module/GenotypeCalls.py (line 307 in dc4eb37).
For a quick fix, I think you can just cast the top genotypes to a string when you write them, i.e. str(top_strand_genotype) in your example.
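The error can be reproduced without a GTC file. The following is a minimal sketch, assuming the TOP strand calls arrive as plain Python 3 bytes values such as b'AA'; the field values and the try_join helper are made up for illustration and are not part of BeadArrayFiles. It shows why the join fails and what a bare str() cast produces.

```python
# Hypothetical row: one str field and one bytes field, mimicking the
# mixed types passed to delim.join() in the report script.
fields = ["rs123", b"AA"]


def try_join(delim, items):
    """Join items, returning the TypeError message instead of raising."""
    try:
        return delim.join(items)
    except TypeError as exc:
        return "TypeError: " + str(exc)


# str.join() refuses bytes items, which is the error in the traceback.
join_error = try_join("\t", fields)

# A bare str() cast avoids the exception but keeps the b'...' wrapper.
cast_result = str(fields[1])

print(join_error)
print(cast_result)
```

Note that the bare cast yields the text b'AA' (including the prefix and quotes), so the exception goes away but the output column is not a clean genotype string.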
@jjzieve Thanks a lot for the quick fix. I ended up using str(top_strand_genotype, 'UTF-8') so that the genotypes are printed without the b'XX' wrapper.
Thanks again!
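For anyone adapting the report script, the decoding step can be wrapped in a small helper so that the same code works whether a method returns bytes or str. This is only a sketch under that assumption: to_text is a hypothetical name, not part of BeadArrayFiles, and the genotype list is made-up sample data.

```python
def to_text(value, encoding="utf-8"):
    """Return value as str, decoding it first if it is bytes-like."""
    if isinstance(value, (bytes, bytearray)):
        return value.decode(encoding)
    return str(value)


# Made-up sample calls mixing bytes and str, as the two code paths might.
top_strand_genotypes = [b"AA", b"AB", "--"]

# Every field is now a str, so str.join() accepts the whole row.
row = "\t".join(to_text(g) for g in top_strand_genotypes)
print(row)
```

In the loop from the original post, this would amount to writing to_text(top_strand_genotype) in place of top_strand_genotype inside the delim.join([...]) call; for bytes input it is equivalent to str(top_strand_genotype, 'UTF-8').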