An algorithm for computing flowtigs from DNA reads of a metagenome.
The steps to compute flowtigs from reads are the following:
First, install Rust if not yet installed.
Run the following command in your terminal window:
$ curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf | sh
Alternatively, follow the installation instructions at https://rustup.rs to install rustup.
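Before running the installer, it can help to check whether the Rust toolchain is already available. The following is a minimal sketch; the helper name `have_cmd` is hypothetical and not part of rustup or this project.

```shell
# Hypothetical helper: succeeds if the named command is found on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd cargo; then
    echo "cargo found: Rust appears to be installed"
else
    echo "cargo not found: install Rust with the rustup command above"
fi
```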
ggcat takes reads as input and outputs a file that can be used by flowtigs; see the example of the input and output format of ggcat. Download and install ggcat with the following commands:
git clone https://github.com/algbio/ggcat --recursive
cd ggcat/
git checkout a91ecc97f286b737b37195c0a86f0e11ad6bfc3b
cargo update time
cargo install --path crates/cmdline/ --locked --features "kmer-counters"
Then run ggcat with:
ggcat build -k <k> -j <threads> -e -s <minimum multiplicity> '<input file name>' -o '<output file name>'
where
<k> represents the desired k-value, which must be the same as the one used by flowtigs
<threads> represents the number of threads on which ggcat will run
<minimum multiplicity> represents the minimum multiplicity required for a k-mer to be kept
<input file name> represents the path to the input file, which contains the reads
<output file name> represents the path to the desired output file, which will be the input file for flowtigs
If you get the error message "Command 'ggcat' not found", either add $HOME/.cargo/bin to your $PATH variable, or run ggcat with the following command instead:
~/.cargo/bin/ggcat build -k <k> -j <threads> -e -s <minimum multiplicity> '<input file name>' -o '<output file name>'
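To make the first option concrete, the sketch below prepends `$HOME/.cargo/bin` to `PATH` for the current shell session, but only if it is not already present. The function name is hypothetical. To make the change permanent, you would add an equivalent line to your shell's startup file, whose name depends on the shell you use.

```shell
# Hypothetical helper: prepend Cargo's bin directory to PATH at most once.
add_cargo_bin_to_path() {
    cargo_bin="$HOME/.cargo/bin"
    case ":$PATH:" in
        *":$cargo_bin:"*) ;;              # already present, nothing to do
        *) PATH="$cargo_bin:$PATH" ;;     # otherwise put it in front
    esac
    export PATH
}

add_cargo_bin_to_path
```

After this, `ggcat` and `flowtigs` should resolve without the `~/.cargo/bin/` prefix in the same session.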
The input of flowtigs should be a file in the same format as the output of bcalm or ggcat. An example can be seen here.
Clone and install this project with the following commands:
git clone https://github.com/elieling/flowtigs.git
cd flowtigs
cargo install --path . --locked
Then, run flowtigs with the following command in the project directory:
flowtigs --input "<input file>" -k <k> -t <threshold> --output "<output file>"
where
<input file> represents the path to the input file
<k> represents the desired k-value
<threshold> represents the desired threshold for filtering. To run flowtigs without filtering, use threshold 0
<output file> represents the path to the desired output file
If you get the error message "Command 'flowtigs' not found", either add $HOME/.cargo/bin to your $PATH variable, or run flowtigs with the following command instead:
~/.cargo/bin/flowtigs --input "<input file>" -k <k> -t <threshold> --output "<output file>"
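Because flowtigs has to be run with the same k-value that ggcat used, it can be convenient to drive both steps from one small script. The sketch below uses only the flags shown above. The function names, the `DRY_RUN` variable, and the name of the intermediate file are assumptions made for illustration.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run_or_print() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: run ggcat, then flowtigs, with one shared k-value.
# Usage: reads_to_flowtigs <reads> <k> <threads> <min multiplicity> <threshold> <output>
reads_to_flowtigs() {
    reads=$1 k=$2 threads=$3 min_mult=$4 threshold=$5 output=$6
    unitigs="$output.ggcat.fa"   # assumed name for the intermediate ggcat output

    run_or_print ggcat build -k "$k" -j "$threads" -e -s "$min_mult" "$reads" -o "$unitigs" &&
    run_or_print flowtigs --input "$unitigs" -k "$k" -t "$threshold" --output "$output"
}
```

For example, `DRY_RUN=1 reads_to_flowtigs reads.fa 31 4 1 0 flowtigs.fa` prints the two commands without running them. The values 31, 4, and 1 are arbitrary example values, and threshold 0 disables filtering.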
The output of flowtigs is a FASTA file, which contains the safe maximal string sequences named by an index from 0 to <total number of sequences> - 1. See example here.
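As a quick sanity check on the output, the sketch below counts the sequences in a FASTA file and verifies that the headers are numbered consecutively from 0. It assumes that each header line consists of `>` followed directly by the index and nothing else, which is not confirmed here; adjust the comparison if the real headers carry extra fields. The function name is hypothetical.

```shell
# Hypothetical check: print the number of sequences if the headers are
# >0, >1, >2, ... in order; otherwise report the first mismatch and fail.
check_flowtigs_fasta() {
    awk '
        BEGIN { n = 0 }
        /^>/ {
            id = substr($0, 2)            # header text after ">"
            if (id != (n "")) {           # compare as strings, e.g. "2" vs "1"
                printf "unexpected header %s, expected >%d\n", $0, n
                bad = 1
                exit
            }
            n++
        }
        END { if (bad) exit 1; print n }
    ' "$1"
}
```

Calling `check_flowtigs_fasta "<output file>"` prints the total number of sequences when the numbering is as described, and returns a non-zero exit status otherwise.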