Illumina/BeadArrayFiles

Error on calling TOP_strand genotypes


Hi, thanks for providing this Python library for array genotype calling.

I wanted to include TOP-strand genotypes in the final report by adding top_strand_genotypes = gtc.get_base_calls() to gtc_final_report.py, as shown below.

for gtc_file in samples:
        sys.stderr.write("Processing " + gtc_file + "\n")
        gtc_file = os.path.join(args.gtc_directory, gtc_file)
        gtc = GenotypeCalls(gtc_file)
        genotypes = gtc.get_genotypes()
        top_strand_genotypes = gtc.get_base_calls()
        plus_strand_genotypes = gtc.get_base_calls_plus_strand(manifest.snps, manifest.ref_strands)
        forward_strand_genotypes = gtc.get_base_calls_forward_strand(manifest.snps, manifest.source_strands)
        normalized_intensities = gtc.get_normalized_intensities(manifest.normalization_lookups)
        b_allele_freq = gtc.get_ballele_freqs()
        logr_ratio = gtc.get_logr_ratios()

        assert len(genotypes) == len(manifest.names)
        for (name, chrom, map_info, genotype, top_strand_genotype, ref_strand_genotype, source_strand_genotype, (x_norm, y_norm), b_freq, log_r_ratio) in zip(manifest.names, manifest.chroms, manifest.map_infos, genotypes, top_strand_genotypes, plus_strand_genotypes, forward_strand_genotypes, normalized_intensities, b_allele_freq, logr_ratio):
            output_handle.write(delim.join([name, os.path.basename(gtc_file)[:-4], chrom, str(map_info), code2genotype[genotype], top_strand_genotype, ref_strand_genotype, source_strand_genotype, str(x_norm), str(y_norm), str(b_freq), str(log_r_ratio)])  + "\n")

However, I ran into the error below.

Traceback (most recent call last):
  File "gtc_gp2_final_report.py", line 57, in <module>
    output_handle.write(delim.join([name, os.path.basename(gtc_file)[:-4], chrom, str(map_info), code2genotype[genotype], top_strand_genotype, ref_strand_genotype, source_strand_genotype, str(x_norm), str(y_norm), str(b_freq), str(log_r_ratio)])  + "\n")
TypeError: sequence item 5: expected str instance, bytes found

When I removed the parts related to top_strand_genotype, the script worked.
I am not sure what went wrong or how to fix it.
If anyone has experience calling TOP genotypes, I would appreciate your input on this matter.

Thanks in advance.
Zih-Hua

@zihhuafang I was able to reproduce this. It looks like get_base_calls() can return a list of byte arrays (see https://github.com/Illumina/BeadArrayFiles/blob/develop/module/GenotypeCalls.py#L399), whereas a list of strings comes back from

def get_base_calls_generic(self, snps, strand_annotations, report_strand, unknown_annotation):

For a quick fix, I think you can just cast the TOP genotypes to a string when you write them, i.e. str(top_strand_genotype) in your example.

@jjzieve Thanks a lot for the quick fix. I ended up using str(top_strand_genotype, 'UTF-8') so that the genotypes print without the b'XX' wrapper.
Thanks again!
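
For anyone landing here later, a minimal sketch of why the two casts behave differently in Python 3. It does not use the library: the list below is a made-up stand-in for what gtc.get_base_calls() appeared to return in this thread, and to_text is a hypothetical helper name, not part of BeadArrayFiles.

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as str, decoding bytes first."""
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return str(value)


# Made-up sample values standing in for the bytes returned by get_base_calls().
top_strand_genotypes = [b"AG", b"CC", b"--"]

# str.join refuses bytes items, which is the TypeError from the traceback.
try:
    "\t".join(["rs0001", top_strand_genotypes[0]])
    join_failed = False
except TypeError:
    join_failed = True

# str() with no encoding gives the repr of the bytes object, b'' wrapper included.
wrapped = str(top_strand_genotypes[0])

# str() with an encoding (or bytes.decode) gives the plain genotype text.
decoded = str(top_strand_genotypes[0], "utf-8")

# Decoding every field before joining produces a clean report row.
row = "\t".join(["rs0001", to_text(top_strand_genotypes[0]), to_text(0.5)])
print(repr(wrapped), repr(decoded), repr(row))
```

Decoding at the point of writing keeps the change local to the report script; whether the library should return str from get_base_calls() itself is a separate question.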