Cannot pass sequences via standard input to `kalign`.
todddeluca opened this issue · 5 comments
Hi,
I recently upgraded from kalign 2.04 to kalign 3.1.1, and my program that uses kalign no longer works. I am guessing this is because kalign 3.1.1 does not support passing sequences in via stdin.
Here is an example of kalign 2.04 working:
$ echo '>108885075
MKILINKSELNKILKKMNNVIISNNKIKPHHSYFLIEAKEKEINFYANNEYFSVKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEEIITIQEKDQTLLVKTKKTSINLNTINVNEFPRIRFNEKNDLSEFNQFKINYSLLVKGIKKIFHSVSNNREISSKFNGVNFNGSNGKEIFLEASDTYKLSVFEIKQETEPFDFILESNLLSFINSFNPEEDKSIVFYYRKDNKDSFSTEMLISMDNFMISYTSVNEKFPEVNYFFEFEPETKIVVQKNELKDALQRIQTLAQNERTFLCDMQINSSELKIRAIVNNIGNSLEEISCLKFEGYKLNISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVPSR
>15826867
MDLAKTNVGCSDLKFCLARESFASAVSWVAKYLPTRPTVPVLSGVLLTGSDSGLTISGFDYEVSAEVQVAAEIASSGSVLVSGRLLSDITRALPNKPVHFYVDGNRVALTCGSARFSLPTMAVEDYPTLPTLPDETGTLPSDVFAEAIGQVAIAAGRDYTLPMLTGIRIEISGDTVVLAATDRFRLAVRELKWSVLSSDFEASVLVPAKTLVEVAKAGTDGSGVCLSLGAGVGVGKDGLFGISGGGKRSTTRLLDAEFPKFRQLLPAEHTAVATIDVAELTEAIKLVALVADRGAQVRMEFGDGILRLSAGADDVGRAEEDLAVAFTGEPLTIAFNPNYLTDGLASVHSERVSFGFTTPSKPALLRPTSNDDVHPTHDGPFPALPTDYVYLLMPVRLPG
' | ./kalign -f fasta
Kalign version 2.04, Copyright (C) 2004, 2005, 2006 Timo Lassmann
Kalign is free software. You can redistribute it and/or modify
it under the terms of the GNU General Public License as
published by the Free Software Foundation.
reading from STDIN: found 2 sequences
Aligning 2 protein sequences with these parameters:
54.94940948 gap open penalty
8.52492046 gap extension
4.42409992 terminal gap penalty
0.20000000 bonus
Alignment will be written to stdout.
Distance Calculation:
100 percent done
Alignment:
100 percent done
>108885075
------------MKILINKSELNKILKKMNNVIISNNKIKPHHSYFLIEAKEKEINFYAN
NEYFSVKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEEIITIQEKDQTLLVKTKKTSIN
LNTINVNEFPRIRF--NEKNDLSEFNQFKINYSLLVKGIKKIFHSVSNNREIS-SKFNGV
NFNGSNGKEIFLEASDTYKLSVFEIKQETEPFDFILESNLL----SFINSFNPEEDKSIV
FYYRKDNKDSFSTEMLISMDNFM--------ISYTSVNEKFPEVNYFFEFEPETKIVVQK
NELKDALQRIQTLAQNERTFLCDMQINSSELKIRAIVNNIGNSLEEISCLKFEGYKLNIS
FNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQ---------------ILVP
SR---
>15826867
MDLAKTNVGCSDLKFCLARESFASAVSWVAKYLPTRPTV-PVLSGVLLTGSDSGLTISGF
D--YEVSAEVQVAAEIASSGSVLVSGRLLSDITRALPNKPVHFYVDGNRVALTCGSARFS
LPTMAVEDYPTLPTLPDETGTLPS--------DVFAEAIGQV--AIAAGRDYTLPMLTGI
RIE-ISGDTVVLAATDRFRLAVRELKWSVLSSDF--EASVLVPAKTLVEVAKAGTDGSGV
CL-------SLGAGVGVGKDGLFGISGGGKRSTTRLLDAEFPKFRQLLPAEHTAVATIDV
AELTEAIKLVALVA--DRGAQVRMEFGDGILRLSAGADDVGRAEEDLA-VAFTGEPLTIA
FNPNYLTDGLASVHSERVSFGFTTPSKPALLRPTSNDDVHPTHDGPFPALPTDYVYLLMP
VRLPG
Here is an example of 3.1.1 not working:
$ echo '>108885075
MKILINKSELNKILKKMNNVIISNNKIKPHHSYFLIEAKEKEINFYANNEYFSVKCNLNKNIDILEQGSLIVKGKIFNDLINGIKEEIITIQEKDQTLLVKTKKTSINLNTINVNEFPRIRFNEKNDLSEFNQFKINYSLLVKGIKKIFHSVSNNREISSKFNGVNFNGSNGKEIFLEASDTYKLSVFEIKQETEPFDFILESNLLSFINSFNPEEDKSIVFYYRKDNKDSFSTEMLISMDNFMISYTSVNEKFPEVNYFFEFEPETKIVVQKNELKDALQRIQTLAQNERTFLCDMQINSSELKIRAIVNNIGNSLEEISCLKFEGYKLNISFNPSSLLDHIESFESNEINFDFQGNSKYFLITSKSEPELKQILVPSR
>15826867
MDLAKTNVGCSDLKFCLARESFASAVSWVAKYLPTRPTVPVLSGVLLTGSDSGLTISGFDYEVSAEVQVAAEIASSGSVLVSGRLLSDITRALPNKPVHFYVDGNRVALTCGSARFSLPTMAVEDYPTLPTLPDETGTLPSDVFAEAIGQVAIAAGRDYTLPMLTGIRIEISGDTVVLAATDRFRLAVRELKWSVLSSDFEASVLVPAKTLVEVAKAGTDGSGVCLSLGAGVGVGKDGLFGISGGGKRSTTRLLDAEFPKFRQLLPAEHTAVATIDVAELTEAIKLVALVADRGAQVRMEFGDGILRLSAGADDVGRAEEDLAVAFTGEPLTIAFNPNYLTDGLASVHSERVSFGFTTPSKPALLRPTSNDDVHPTHDGPFPALPTDYVYLLMPVRLPG
' | kalign -f fasta
Kalign (3.1.1)
Copyright (C) 2006,2019 Timo Lassmann
This program comes with ABSOLUTELY NO WARRANTY; for details type:
`kalign -showw'.
This is free software, and you are welcome to redistribute it
under certain conditions; consult the COPYING file for details.
Please cite:
Lassmann, Timo.
"Kalign 3: multiple sequence alignment of large data sets."
Bioinformatics (2019)
https://doi.org/10.1093/bioinformatics/btz795
WARNING: AVX2 instruction set not found!
Kalign will not run optimally.
[2020-02-13 10:54:22] : LOG : No infiles
I am passing sequences via stdin to avoid the filesystem overhead of writing sequences to a file, since the orthology software I maintain invokes kalign many times.
Do you have any suggestions for how I can upgrade to 3.1.1 without writing an input file every time I want to call kalign? Thanks!
Hi Todd,
I made a new version supporting reading from stdin and combining multiple input files:
Passing sequences via stdin:
cat input.fa | kalign -f fasta > out.afa
Combining multiple input files:
kalign seqsA.fa seqsB.fa seqsC.fa -f fasta > combined.afa
Let me know if this works for you!
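For a program that invokes kalign many times, the stdin form above can be driven directly from code without any temporary input file. The following is a minimal Python sketch, not part of kalign itself: the helper names `align_via_stdin` and `to_fasta` are hypothetical, the default command is the `kalign -f fasta` invocation shown in this thread, and it assumes a kalign build that includes the stdin support described above and that the alignment arrives on stdout, as the `> out.afa` redirect suggests.

```python
import subprocess


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


def align_via_stdin(fasta_text, command=("kalign", "-f", "fasta")):
    """Pipe FASTA text to an aligner on stdin and return what it writes to stdout.

    Hypothetical helper: `command` defaults to the invocation from this
    thread and can be swapped for any program that reads FASTA on stdin.
    """
    result = subprocess.run(
        list(command),
        input=fasta_text,      # sent to the child process on stdin
        capture_output=True,   # keep stdout and stderr separate
        text=True,
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Example call (requires kalign on PATH; sequence names are placeholders):
# aligned = align_via_stdin(to_fasta([("seqA", "MKILINKSEL"), ("seqB", "MDLAKTNVGC")]))
```

Capturing stderr separately keeps any banner or log lines the tool may print there out of the returned alignment; whether a given kalign version writes them to stderr or stdout is not confirmed in this thread.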
Hi,
I recently installed kalign 3.2.2 with:
conda install -c bioconda kalign3
It does not seem to work, whether I run
cat input.fa | kalign -f fasta > out.afa
or
kalign -i input.fasta -f fasta -o out.fasta
The output is shown below:
Kalign (3.2.2)
Copyright (C) 2006,2019,2020 Timo Lassmann
This program comes with ABSOLUTELY NO WARRANTY; for details type:
`kalign -showw'.
This is free software, and you are welcome to redistribute it
under certain conditions; consult the COPYING file for details.
Please cite:
Lassmann, Timo.
"Kalign 3: multiple sequence alignment of large data sets."
Bioinformatics (2019)
https://doi.org/10.1093/bioinformatics/btz795
WARNING: AVX2 instruction set not found!
Kalign will not run optimally.
[2021-04-10 00:30:22] : LOG : kalign -f fasta
[2021-04-10 00:30:22] : LOG : Detected protein sequences.
[2021-04-10 00:30:22] : LOG : Done reading input sequences in 0.066175 seconds.
[2021-04-10 00:30:22] : LOG : Detected: 210 sequences.
[2021-04-10 00:30:22] : LOG : Calculating pairwise distances
In fact, my FASTA sequences are DNA, but the output shows that kalign detected protein sequences. Could you please tell me what the problem is?
Thanks!
Can you try with the current version on GitHub (v3.3)?
Hi,
Thanks for your reply. I have figured out what my problem was. I got that error because my DNA FASTA file contains other characters such as "N" or "R", and my Linux version is too old.
Have a nice day!
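A quick way to check a DNA file for characters like these before aligning is to count every residue outside A, C, G and T. The sketch below is a hypothetical helper, not something kalign provides, and how kalign itself decides between DNA and protein input is not described in this thread.

```python
from collections import Counter

# The four unambiguous DNA bases; anything else (for example the
# ambiguity codes N or R mentioned above) is reported.
CANONICAL_DNA = set("ACGT")


def non_canonical_residues(fasta_text):
    """Count residue characters outside A, C, G, T, skipping header lines."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # FASTA header line, not sequence data
        counts.update(
            ch for ch in line.strip().upper() if ch not in CANONICAL_DNA
        )
    return dict(counts)
```

An empty result means the file contains only A, C, G and T; any other key shows which extra characters are present and how often.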