TimoLassmann/kalign

Cannot pass sequences via standard input to `kalign`.

Closed this issue · 5 comments

Hi,

I recently upgraded from kalign 2.04 to kalign 3.1.1, and now my program that uses kalign no longer works. I am guessing this is because kalign 3.1.1 does not support passing sequences in via stdin.

Here is an example of kalign 2.04 working:

$ echo '>108885075
MKILINKSELNKILKKMNNVIISNNKIKPHHSYFLIEAKEKEINFYANNEYFSVKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEEIITIQEKDQTLLVKTKKTSINLNTINVNEFPRIRFNEKNDLSEFNQFKINYSLLVKGIKKIFHSVSNNREISSKFNGVNFNGSNGKEIFLEASDTYKLSVFEIKQETEPFDFILESNLLSFINSFNPEEDKSIVFYYRKDNKDSFSTEMLISMDNFMISYTSVNEKFPEVNYFFEFEPETKIVVQKNELKDALQRIQTLAQNERTFLCDMQINSSELKIRAIVNNIGNSLEEISCLKFEGYKLNISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVPSR
>15826867
MDLAKTNVGCSDLKFCLARESFASAVSWVAKYLPTRPTVPVLSGVLLTGSDSGLTISGFDYEVSAEVQVAAEIASSGSVLVSGRLLSDITRALPNKPVHFYVDGNRVALTCGSARFSLPTMAVEDYPTLPTLPDETGTLPSDVFAEAIGQVAIAAGRDYTLPMLTGIRIEISGDTVVLAATDRFRLAVRELKWSVLSSDFEASVLVPAKTLVEVAKAGTDGSGVCLSLGAGVGVGKDGLFGISGGGKRSTTRLLDAEFPKFRQLLPAEHTAVATIDVAELTEAIKLVALVADRGAQVRMEFGDGILRLSAGADDVGRAEEDLAVAFTGEPLTIAFNPNYLTDGLASVHSERVSFGFTTPSKPALLRPTSNDDVHPTHDGPFPALPTDYVYLLMPVRLPG
' | ./kalign -f fasta

Kalign version 2.04, Copyright (C) 2004, 2005, 2006 Timo Lassmann

        Kalign is free software. You can redistribute it and/or modify
        it under the terms of the GNU General Public License as
        published by the Free Software Foundation.

reading from STDIN: found 2 sequences
Aligning 2 protein sequences with these parameters:
        54.94940948	gap open penalty
        8.52492046	gap extension
        4.42409992	terminal gap penalty
        0.20000000	bonus
Alignment will be written to stdout.

Distance Calculation:
     100 percent done
Alignment:
     100 percent done
>108885075
------------MKILINKSELNKILKKMNNVIISNNKIKPHHSYFLIEAKEKEINFYAN
NEYFSVKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEEIITIQEKDQTLLVKTKKTSIN
LNTINVNEFPRIRF--NEKNDLSEFNQFKINYSLLVKGIKKIFHSVSNNREIS-SKFNGV
NFNGSNGKEIFLEASDTYKLSVFEIKQETEPFDFILESNLL----SFINSFNPEEDKSIV
FYYRKDNKDSFSTEMLISMDNFM--------ISYTSVNEKFPEVNYFFEFEPETKIVVQK
NELKDALQRIQTLAQNERTFLCDMQINSSELKIRAIVNNIGNSLEEISCLKFEGYKLNIS
FNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQ---------------ILVP
SR---
>15826867
MDLAKTNVGCSDLKFCLARESFASAVSWVAKYLPTRPTV-PVLSGVLLTGSDSGLTISGF
D--YEVSAEVQVAAEIASSGSVLVSGRLLSDITRALPNKPVHFYVDGNRVALTCGSARFS
LPTMAVEDYPTLPTLPDETGTLPS--------DVFAEAIGQV--AIAAGRDYTLPMLTGI
RIE-ISGDTVVLAATDRFRLAVRELKWSVLSSDF--EASVLVPAKTLVEVAKAGTDGSGV
CL-------SLGAGVGVGKDGLFGISGGGKRSTTRLLDAEFPKFRQLLPAEHTAVATIDV
AELTEAIKLVALVA--DRGAQVRMEFGDGILRLSAGADDVGRAEEDLA-VAFTGEPLTIA
FNPNYLTDGLASVHSERVSFGFTTPSKPALLRPTSNDDVHPTHDGPFPALPTDYVYLLMP
VRLPG

Here is an example of 3.1.1 not working:

$ echo '>108885075
MKILINKSELNKILKKMNNVIISNNKIKPHHSYFLIEAKEKEINFYANNEYFSVKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEEIITIQEKDQTLLVKTKKTSINLNTINVNEFPRIRFNEKNDLSEFNQFKINYSLLVKGIKKIFHSVSNNREISSKFNGVNFNGSNGKEIFLEASDTYKLSVFEIKQETEPFDFILESNLLSFINSFNPEEDKSIVFYYRKDNKDSFSTEMLISMDNFMISYTSVNEKFPEVNYFFEFEPETKIVVQKNELKDALQRIQTLAQNERTFLCDMQINSSELKIRAIVNNIGNSLEEISCLKFEGYKLNISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVPSR
>15826867
MDLAKTNVGCSDLKFCLARESFASAVSWVAKYLPTRPTVPVLSGVLLTGSDSGLTISGFDYEVSAEVQVAAEIASSGSVLVSGRLLSDITRALPNKPVHFYVDGNRVALTCGSARFSLPTMAVEDYPTLPTLPDETGTLPSDVFAEAIGQVAIAAGRDYTLPMLTGIRIEISGDTVVLAATDRFRLAVRELKWSVLSSDFEASVLVPAKTLVEVAKAGTDGSGVCLSLGAGVGVGKDGLFGISGGGKRSTTRLLDAEFPKFRQLLPAEHTAVATIDVAELTEAIKLVALVADRGAQVRMEFGDGILRLSAGADDVGRAEEDLAVAFTGEPLTIAFNPNYLTDGLASVHSERVSFGFTTPSKPALLRPTSNDDVHPTHDGPFPALPTDYVYLLMPVRLPG
' | kalign -f fasta

Kalign (3.1.1)

Copyright (C) 2006,2019 Timo Lassmann

This program comes with ABSOLUTELY NO WARRANTY; for details type:
`kalign -showw'.
This is free software, and you are welcome to redistribute it
under certain conditions; consult the COPYING file for details.

Please cite:
  Lassmann, Timo.
  "Kalign 3: multiple sequence alignment of large data sets."
  Bioinformatics (2019) 
  https://doi.org/10.1093/bioinformatics/btz795


WARNING: AVX2 instruction set not found!
         Kalign will not run optimally.

[2020-02-13 10:54:22] :     LOG : No infiles

I am passing sequences via stdin to avoid the filesystem overhead of writing sequences to a file, since the orthology software I maintain invokes kalign many times.

Do you have any suggestions for how I can upgrade to 3.1.1 without writing an input file for every time I want to call kalign? Thanks!

Hi Todd,

I made a new version supporting reading from stdin and combining multiple input files:

Passing sequences via stdin:

cat input.fa | kalign -f fasta > out.afa 
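The original motivation was to call kalign many times from another program without writing an input file each time. Below is a minimal Python sketch of doing that with the stdin support described above. It assumes a kalign build that reads FASTA from stdin (3.1.1 does not, per this thread) and writes the alignment to stdout when no output file is given. The helper names to_fasta and align_via_stdin are hypothetical. The only kalign flag used is -f fasta, taken from the command above.

```python
import subprocess
from typing import Iterable, Tuple


def to_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Serialize (header, sequence) pairs as FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def align_via_stdin(records: Iterable[Tuple[str, str]], kalign: str = "kalign") -> str:
    """Pipe FASTA text to kalign's stdin and return what kalign prints on stdout.

    Assumption: the kalign binary on PATH supports reading stdin, as described
    in this thread for versions newer than 3.1.1.
    """
    result = subprocess.run(
        [kalign, "-f", "fasta"],
        input=to_fasta(records),  # sent to kalign's stdin; no temporary file
        capture_output=True,      # collect stdout and stderr separately
        text=True,                # exchange str rather than bytes
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    # Assumption: the banner and LOG lines go to stderr, so stdout should hold
    # only the alignment (the `> out.afa` redirect above relies on the same).
    return result.stdout
```

For example, align_via_stdin([("seqA", seq_a), ("seqB", seq_b)]) would return the aligned FASTA as a string without touching the filesystem. Passing the text through input= lets subprocess write stdin and read stdout together, which avoids the pipe deadlock that hand-written Popen writes and reads can run into with large inputs.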

Combining multiple input files:

kalign seqsA.fa seqsB.fa seqsC.fa -f fasta > combined.afa 
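When a pipeline builds this multi-file command programmatically, it can help to fail early on a missing path before launching kalign. A hedged Python sketch, assuming the argument order shown above (input files first, then -f fasta) and that the alignment is written to stdout; kalign_multi_file_cmd and align_files are hypothetical names.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def kalign_multi_file_cmd(paths: Sequence[str], kalign: str = "kalign") -> List[str]:
    """Build the argument list for aligning several FASTA files in one run."""
    missing = [p for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"input FASTA not found: {', '.join(missing)}")
    # Same order as the command above: input files first, then the format flag.
    return [kalign, *paths, "-f", "fasta"]


def align_files(paths: Sequence[str], out_path: str, kalign: str = "kalign") -> None:
    """Run kalign on several files, redirecting its stdout to out_path."""
    cmd = kalign_multi_file_cmd(paths, kalign)
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)  # like `> combined.afa`
```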

Let me know if this works for you!

Hi,

I recently installed kalign 3.2.2 with the command conda install -c bioconda kalign3, but it does not seem to work whether I run
cat input.fa | kalign -f fasta > out.afa
or
kalign -i input.fasta -f fasta -o out.fasta

The output is shown below:

Kalign (3.2.2)

Copyright (C) 2006,2019,2020 Timo Lassmann

This program comes with ABSOLUTELY NO WARRANTY; for details type:
`kalign -showw'.
This is free software, and you are welcome to redistribute it
under certain conditions; consult the COPYING file for details.

Please cite:
Lassmann, Timo.
"Kalign 3: multiple sequence alignment of large data sets."
Bioinformatics (2019)
https://doi.org/10.1093/bioinformatics/btz795

WARNING: AVX2 instruction set not found!
Kalign will not run optimally.

[2021-04-10 00:30:22] : LOG : kalign -f fasta
[2021-04-10 00:30:22] : LOG : Detected protein sequences.
[2021-04-10 00:30:22] : LOG : Done reading input sequences in 0.066175 seconds.
[2021-04-10 00:30:22] : LOG : Detected: 210 sequences.
[2021-04-10 00:30:22] : LOG : Calculating pairwise distances

My FASTA sequences are actually DNA, but the log says it detected protein sequences. Could you please tell me what the problem is?

Thanks!

Can you try with the current version on github (v3.3)?

Hi,

Thanks for your reply. I have figured out what my problem was: I got that error because my DNA FASTA file contains other letters such as "N" or "R", and because my Linux version is too old.

Have a nice day!
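The reporter traced the protein misdetection to letters other than A, C, G and T (such as N or R) in the DNA input. This thread does not describe how kalign decides between DNA and protein, so the sketch below does not reproduce that logic. It only lists, per sequence, any letters outside a plain nucleotide alphabet, as a quick check before aligning. The alphabet (A, C, G, T, U, plus "-" for gaps) and the function names read_fasta and non_nucleotide_letters are assumptions chosen for illustration.

```python
from typing import Dict, Iterator, Set, Tuple

# Assumed "plain" nucleotide alphabet for this check (upper case).
NUCLEOTIDES = set("ACGTU")


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining multi-line sequences."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def non_nucleotide_letters(text: str) -> Dict[str, Set[str]]:
    """Map each header to the letters it contains outside NUCLEOTIDES and '-'."""
    report = {}
    for name, seq in read_fasta(text):
        extra = set(seq.upper()) - NUCLEOTIDES - {"-"}
        if extra:
            report[name] = extra
    return report
```

For example, a record whose sequence is ACGNNRT would be reported with {"N", "R"}, while ACGT-only records are left out of the report.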