knights-lab/BURST

Strange encoding characters in output file

Closed this issue · 5 comments

Hi,
My BLAST-style output includes some very unusual control characters, e.g.

(ENQ)(ENQ)(ENQ)(LF)
(LF)
(ENQ)(BS)(ENQ)(SOH)

This prevents any downstream processing of the files and they are difficult to remove automatically.
Is this a bug that can be fixed easily?

Thanks,

Theo

I added this to my pipeline for BURST. It removes all non-printable characters, then deletes empty lines and lines starting with a tab.

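# Strip every character that is neither printable nor a tab (GNU sed syntax)
sed -i 's/[^[:print:]\t]//g' out.burst
# Delete lines that are now empty
sed -i '/^$/d' out.burst
# Delete lines that start with a tab
sed -i '/^\t/d' out.burst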
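
For anyone who would rather not edit the file in place, here is a minimal sketch of the same cleanup that writes to standard output instead. The function name `clean_burst` is hypothetical and not part of BURST, and the sketch assumes the alignment output is plain ASCII, so any non-ASCII bytes are dropped as well.

```shell
# Hypothetical helper (not part of BURST): print a cleaned copy of a
# BURST output file on stdout, leaving the original file untouched.
# Assumption: valid output lines contain only tabs and printable ASCII.
clean_burst() {
    # Keep only tab (\11), newline (\12), and printable ASCII (\40-\176),
    # then drop empty lines and lines that begin with a tab.
    LC_ALL=C tr -cd '\11\12\40-\176' < "$1" \
        | grep -v -e '^$' -e "^$(printf '\t')"
}
```

It could be used as `clean_burst out.burst > out.clean.burst`, which keeps the raw file around for debugging.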

Thanks. On further checking, I think this is a problem with how I am saving reads with no alignment. It is more difficult than I had thought.

I've had to reopen this issue. The problem definitely is with the BURST output file but it is very difficult to replicate. Random output files contain the bad text codes. I may have to upload the full db and input files causing the problem. I will update when I can nail it down.
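
Since only some output files are affected, a quick scan can help narrow down which runs to look at. The following is a rough sketch, not a BURST feature: the helper names `has_control_bytes` and `list_affected` are made up for illustration, and it assumes that a healthy output file contains nothing but tabs, newlines, and printable ASCII.

```shell
# Hypothetical helpers (not part of BURST).
# Assumption: a clean file holds only tab, newline, and printable ASCII.

# Succeeds if the file contains any other byte.
has_control_bytes() {
    # Delete all allowed bytes and count what is left over.
    n=$(( $(LC_ALL=C tr -d '\11\12\40-\176' < "$1" | wc -c) ))
    [ "$n" -gt 0 ]
}

# Print the name of every given file that contains unexpected bytes.
list_affected() {
    for f in "$@"; do
        if has_control_bytes "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Running `list_affected *.burst` in the output directory would print only the files that contain the bad bytes, which may make it easier to find a small input that reproduces the problem.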

Thanks,

Theo

This is not resolved, but I have not been able to replicate it.