INSaFLU (“INSide the FLU”) is a free, web-based bioinformatics suite (https://insaflu.insa.pt/) that takes primary NGS data (reads) and automatically generates the outputs that are the core first-line “genetic requests” for effective and timely influenza and SARS-CoV-2 laboratory surveillance (e.g., type and subtype, gene and whole-genome consensus sequences, variant annotation, alignments and phylogenetic trees).
INSaFLU is available for free at https://insaflu.insa.pt
Documentation (latest) for each INSaFLU module is provided at http://insaflu.readthedocs.io/
For an easy and rapid local installation using docker see here https://github.com/INSaFLU/docker.
Although the platform is influenza-oriented, it can also be applied to amplicon-derived NGS data of other pathogens, such as the novel coronavirus SARS-CoV-2.
Highlights / Main advantages
- open to all, free of charge, user-restricted accounts
- applicable to NGS data collected from any amplicon-based schema
- allows advanced, multi-step software intensive analyses in a user-friendly manner without previous advanced training in bioinformatics
- allows integrating data in a cumulative manner, thus fitting the analytical dynamics underlying the continuous epidemiological surveillance during flu epidemics
- outputs are provided in nomenclature-stable and standardized formats and can be explored in situ or through multiple compatible downstream applications for data analysis and visualization
Main outputs INSaFLU yields:
- influenza type and subtype/lineage
- gene and whole-genome consensus sequences
- annotation of variants and intra-host minor variants
- gene, protein and genome alignments
- gene- and genome-scale phylogenetic trees
Other features: INSaFLU also automatically provides:
- raw NGS data quality analysis and improvement
- a rapid snapshot of the whole-genome backbone of each virus (draft assembled contigs are assigned to each viral segment and to closely related reference influenza viruses)
- coverage statistics
- detection of putative mixed infections
NOTE: As of March 2020, INSaFLU also performs rapid classification and contig assignment for five human betacoronaviruses (BetaCoV), including the novel coronavirus SARS-CoV-2. The publicly available SARS-CoV-2 reference genome sequence (MN908947) is available for mapping in the default INSaFLU reference database.
If you use INSaFLU in your work, please cite Borges V, Pinheiro M et al. Genome Medicine (2018) 10:46, https://doi.org/10.1186/s13073-018-0555-0
Miguel Pinheiro, Vitor Borges
This installation guide targets Ubuntu Server 16.04 and Centos 7.X. There are several steps and packages to install, so please be patient. First, install and configure all the bioinformatics software, then the database, the batch-queuing system and, finally, the web site.
The user "flu_user" is used in all operations and will also be the user that runs the Apache web server.
### Some general packages to install in Ubuntu 18.04:
$ sudo apt install binutils libproj-dev gdal-bin dos2unix parallel
$ sudo apt install postgresql-10
$ sudo apt install postgresql-10-postgis-2.4
$ sudo apt install postgresql-10-postgis-scripts
$ sudo apt install python3
$ sudo apt install libdatetime-perl libxml-simple-perl libdigest-md5-perl git default-jre bioperl
### Some general packages to install in Centos 7.X:
$ sudo yum install gdal gdal-devel dos2unix parallel
$ sudo yum install postgis-10
$ sudo yum install postgresql-devel
$ sudo yum install python3
$ sudo yum install perl-Time-Piece perl-XML-Simple perl-Digest-MD5 git java perl-CPAN perl-Module-Build
$ sudo cpan -i Bio::Perl
# sudo yum install https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.7.1/ncbi-blast-2.7.1+-1.x86_64.rpm
The software can be installed in the directory "/usr/local/software/insaflu". If you choose another directory, you need to edit the file "constants/software_names.py" and set the variable "DIR_SOFTWARE" accordingly.
$ sudo mkdir -p /usr/local/software/insaflu
$ sudo chown flu_user:flu_user /usr/local/software/insaflu
Software to install:
- IGVTools 2.3.98
- SPAdes 3.11.1
- Abricate 0.8-dev
- FastQC 0.11.9
- Trimmomatic 0.27
- Bamtools 2.5
- Prokka 1.12
- Mauve 2.4.0, Feb 13 2015
- Mafft 7.313
- seqret (EMBOSS) 6.6.0.0
- FastTreeDbl 2.1.10 Double precision
- freebayes v1.1.0-54-g49413aa - some of the scripts distributed with freebayes are also needed
- Snippy 3.2-dev
- samtools 1.3
- bgzip 1.3
- tabix 1.3
- snpEff 4.1l - Important, it's necessary to use this version. Recent versions have a problem when variants involve more than one base.
Some scripts to install:
- convertAlignment.pl
- this script needs to be installed in <SoftwareNames.DIR_SOFTWARE>/scripts/convertAlignment.pl
- Fastq-tools 0.8
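Once the software is in place, a quick check that the expected files exist can save time later. The following is a minimal sketch, not part of INSaFLU: it assumes the default software directory and uses only relative paths that are mentioned in this README. The helper name `missing_files` is hypothetical, and the authoritative list of paths is the one in "constants/software_names.py".

```python
from pathlib import Path

# Default software directory used in this README (change it if DIR_SOFTWARE differs).
DIR_SOFTWARE = Path("/usr/local/software/insaflu")

# Relative paths mentioned in this README; this is an illustrative subset,
# not the full list that INSaFLU checks.
EXPECTED_FILES = [
    "scripts/convertAlignment.pl",
    "FastQC/0.11.9/FastQC/fastqc",
    "snippy/bin/snippy-vcf_to_tab_add_freq",
]


def missing_files(root, relative_paths):
    """Return the relative paths that are not regular files under root."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_files(DIR_SOFTWARE, EXPECTED_FILES):
        print(f"missing: {DIR_SOFTWARE / rel}")
```

This only tests for file presence; the test suite described later in this document is the real check that each tool runs.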
$ vi <install software path>/FastQC/0.11.9/FastQC/fastqc
and change the line `my $memory = 250 * $threads;` to `my $memory = 1000 * $threads;`
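If you prefer not to edit the file by hand, the same change can be scripted. This is a small sketch under the assumption that the line appears in the `fastqc` wrapper exactly as quoted above (apart from spacing); the function name `raise_fastqc_memory` is hypothetical.

```python
import re


def raise_fastqc_memory(script_text, old_mb=250, new_mb=1000):
    """Return script_text with 'my $memory = <old_mb> * $threads;' set to new_mb."""
    pattern = re.compile(rf"(my\s+\$memory\s*=\s*){int(old_mb)}(\s*\*\s*\$threads;)")
    # \g<1> and \g<2> keep the original text around the number untouched.
    return pattern.sub(rf"\g<1>{int(new_mb)}\g<2>", script_text)
```

To apply it, read the `fastqc` file, pass its content through the function, and write the result back, keeping a backup copy of the original file first.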
Copy `bin/snippy-vcf_to_tab` to `bin/snippy-vcf_to_tab_add_freq` and make this change:
$ cd /usr/local/software/insaflu/snippy/bin
$ cp snippy-vcf_to_tab snippy-vcf_to_tab_add_freq
$ vi snippy-vcf_to_tab_add_freq
and change line 57 from:
print join("\t", qw(CHROM POS TYPE REF ALT EVIDENCE), @ANNO), "\n";
to
print join("\t", qw(CHROM POS TYPE REF ALT FREQ), @ANNO), "\n";
Change the line `#!/usr/bin/env python` to `#!/usr/bin/env python3`.
#xpto@brazil:/usr/local/software/insaflu/snippy/bin$ diff snippy snippy~
90c90
< parse_version( 'snpEff -version', 4.1, qr/(\d+\.\d+)/ );
---
> parse_version( 'snpEff -version', 4.3, qr/(\d+\.\d+)/ );
* PostgreSQL 10.X
* create a database and a user, then set these names in the ".env" file in the root path of the web site.
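Before starting the web site it can help to confirm that the database settings were actually filled in. The sketch below parses a simple KEY=VALUE file and reports empty or absent keys. The key names (`DB_NAME`, `DB_USER`, `DB_PASSWORD`) are hypothetical placeholders; use the names found in your own ".env" file.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    # Hypothetical content, for illustration only.
    sample = 'DB_NAME=insaflu\nDB_USER="flu_user"\nDB_PASSWORD=\n'
    print(missing_keys(parse_env(sample), ["DB_NAME", "DB_USER", "DB_PASSWORD"]))
```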
Software:
* gzip
* [Sun Grid Engine/Open Grid Engine](https://arc.liv.ac.uk/downloads/SGE/releases)
* [download 8.1.9 version](https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge_8.1.9.tar.xz)
* queues that will be created:
* all.q - generic queue
* fast.q - to run quick processes
* queue_1.q and queue_2.q - to run slow processes
Install SGE/OGE tips
$ sudo groupadd -g 58 gridware
$ sudo useradd -u 63 -g 58 -d /opt/sge sgeadmin
$ cd ~
$ mkdir sge; cd sge
$ wget https://arc.liv.ac.uk/downloads/SGE/releases/8.1.9/sge_8.1.9.tar.xz
$ tar -xJvf sge_8.1.9.tar.xz
$ cd sge-8.1.9/source
$ scripts/bootstrap.sh
### centos version
$ sudo yum install hwloc-devel openssl-devel pam-devel libXt-devel motif motif-devel readline-devel
### ubuntu
$ sudo apt-get install libhwloc-dev libssl-dev
$ ./aimk -no-java -no-jni
### caveat in last command if something like this `ed.screen.c:(.text+0x247c): undefined reference to `tputs'` appears in the screen
$ cd 3rdparty/qtcsh/LINUXAMD64
Add "-lreadline -lncurses" to the end of command that fails
$ cd ../../..
$ ./aimk -no-java -no-jni
### end caveat
$ sudo su
# export SGE_ROOT=/opt/sge
# scripts/distinst -local -allall -noexit
# chown -R sgeadmin:gridware /opt/sge
# cd $SGE_ROOT
# ./install_qmaster
# . /opt/sge/default/common/settings.sh
# ./install_execd
### create a file to set the environment variables to SGE
$ sudo vi /etc/profile.d/sun-grid-engine.sh
## add the following line to the file
. /opt/sge/default/common/settings.sh
Configure queues
Go to the folder `example_script_sge_add_queue`, edit the second line of the file "grid_add_all_hosts.txt" and replace 'brazil' with your computer name ('brazil' is the name of the computer used in this example).
Get your computer name:
$ uname -a
Linux brazil 4.15.0-50-generic #54-Ubuntu SMP Mon May 6 18:46:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
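The computer name is the second field of the `uname -a` output ('brazil' in the example above). As a small illustration, it can also be read from Python; the helper `hostname_from_uname` is a hypothetical name used only for this sketch.

```python
import platform


def hostname_from_uname(uname_line):
    """Return the node (host) name, the second field of a Linux `uname -a` line."""
    fields = uname_line.split()
    return fields[1] if len(fields) > 1 else None


if __name__ == "__main__":
    # platform.node() reports the network name of the current machine.
    print(platform.node())
```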
Add your name to the manager list to obtain permission to change SGE configurations:
$ sudo qconf -am <your name>
If you get `qconf not found` or `SGE_ROOT not set`, do something like this:
### this needs to be improved
$ sudo locate settings.sh
$ sudo chmod a+x /opt/sge/default/common/settings.sh
$ . /opt/sge/default/common/settings.sh
$ env | grep sge
$ sudo su
# PATH= ##### /opt/sge/bin:/opt/sge/bin/lx-amd64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin ### the path from last env
# SGE_ROOT= ### /opt/sge ### the path from last env
# export SGE_ROOT;
# qconf -am <your name>
Then run:
$ cd example_script_sge_add_queue
$ qconf -Ahgrp grid_add_all_hosts.txt
Show all groups:
$ qconf -shgrpl
$ qconf -shgrp_resolved @allhosts
brazil
If you want to delete a group name:
$ qconf -dhgrp <a group name>
To add the queues:
$ qconf -Aq grid_add_queue_all.txt
$ qconf -Aq grid_add_queue_fast.txt
$ qconf -Aq grid_add_queue_queue_1.txt
$ qconf -Aq grid_add_queue_queue_2.txt
If you want to delete a queue:
$ qconf -dq <a queue name>
Show all info
$ qconf -sq <queue name>
Edit queues. If you want to change slots change the number in 'slots'.
$ qconf -mq <queue name>
Change the default `schedule_interval` from `0:0:15` to `0:0:5`. This setting specifies how often the scheduler checks for new jobs.
$ qconf -msconf
After the OGE/SGE configuration you need to have these queue names in your system.
$ qstat -f
queuename qtype resv/used/tot. load_avg arch states
---------------------------------------------------------------------------------
all.q@brazil BIP 0/0/2 1.19 lx26-amd64
---------------------------------------------------------------------------------
fast.q@brazil BIP 0/0/1 1.19 lx26-amd64
---------------------------------------------------------------------------------
queue_1.q@brazil BIP 0/0/1 1.19 lx26-amd64
---------------------------------------------------------------------------------
queue_2.q@brazil BIP 0/0/1 1.19 lx26-amd64
`brazil` is the name of the computer where this installation was done; yours will certainly be different. For SGE to work properly, the computer name needs to be in `/etc/hosts` with its IP address, and not with `localhost`.
Example:
$ cat /etc/hosts
127.0.0.1 localhost
::1 ip6-localhost ip6-loopback
192.168.1.14 brazil
Your IP address will, of course, be different from '192.168.1.14'.
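The requirement above (the computer name mapped to a real IP address rather than to a loopback address) can be checked with a short script. This is only a sketch; the function names are hypothetical and the parsing handles just the plain `address name [alias...]` layout shown in the example.

```python
import ipaddress


def host_addresses(hosts_text, hostname):
    """Return the addresses that /etc/hosts-style text maps to hostname."""
    found = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        address, *names = line.split()
        if hostname in names:
            found.append(address)
    return found


def has_non_loopback_entry(hosts_text, hostname):
    """True if hostname is mapped to at least one non-loopback address."""
    for address in host_addresses(hosts_text, hostname):
        try:
            if not ipaddress.ip_address(address).is_loopback:
                return True
        except ValueError:
            continue  # not a parseable IP address, ignore the entry
    return False


if __name__ == "__main__":
    sample = "127.0.0.1 localhost\n::1 ip6-localhost ip6-loopback\n192.168.1.14 brazil\n"
    print(has_non_loopback_entry(sample, "brazil"))
```

On a real system, pass the content of `/etc/hosts` and your own computer name instead of the sample text.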
More help to configure queues.
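To confirm that the four queues named in this document exist, the `qstat -f` output can be parsed. The sketch below assumes the output layout shown earlier, where each queue line starts with `<queue>@<host>`; the function names are hypothetical.

```python
# Queue names required by this installation guide.
REQUIRED_QUEUES = {"all.q", "fast.q", "queue_1.q", "queue_2.q"}


def queues_in_qstat(qstat_output):
    """Collect queue names from lines whose first field looks like queue@host."""
    queues = set()
    for line in qstat_output.splitlines():
        fields = line.split()
        if fields and "@" in fields[0]:
            queues.add(fields[0].split("@", 1)[0])
    return queues


def missing_queues(qstat_output, required=REQUIRED_QUEUES):
    """Return the required queue names that do not appear in the output."""
    return sorted(set(required) - queues_in_qstat(qstat_output))


# Usage (requires SGE to be installed and configured):
#   import subprocess
#   output = subprocess.run(["qstat", "-f"], capture_output=True, text=True).stdout
#   print(missing_queues(output))
```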
$ sudo mkdir -p /usr/local/web_site
$ sudo mkdir -p /var/log/insaFlu
$ sudo chown flu_user:flu_user /usr/local/web_site
$ sudo chown flu_user:flu_user /var/log/insaFlu
$ cd /usr/local/web_site
$ git clone https://github.com/INSaFLU/INSaFLU.git
$ cd INSaFLU
$ sudo pip3 install -r requirements.txt
$ cp .env_model .env
Edit the file ".env" and configure all variables. Also define an email backend. I have defined a posix server.
To create the database tables:
$ python3 manage.py migrate
To create a superuser, which will be the administrator account:
$ python3 manage.py createsuperuser
To collect all the files needed to run the web site into the "static_all" path, and then load the default databases. All data that belong to the databases are in "/static_all/db/..."
$ python3 manage.py collectstatic
$ python3 manage.py load_default_files
Test if all bioinformatics tools are installed
$ cd /usr/local/web_site/INSaFLU
$ python3 manage.py test constants.tests_software_names
Test everything
$ cd /usr/local/web_site/INSaFLU
$ python3 manage.py test
If all tests pass, you can check immediately that it is working:
$ cd /usr/local/web_site/INSaFLU
$ python3 manage.py runserver
Open your web browser and enter the IP address of the computer where the web site is installed, followed by ":8000". If it is on the same computer, you can use "localhost:8000". If it is working, proceed to install it in an Apache web server. If you prefer, an Nginx web server can be used too.
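On a server without a graphical browser, the same check can be done from a script. This is a minimal sketch using only the Python standard library; the function name `site_responds` is hypothetical and the default URL assumes the development server started with `runserver` on the same machine.

```python
import urllib.error
import urllib.request


def site_responds(url="http://localhost:8000/", timeout=5):
    """Return True if the URL answers with a successful HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False
```

Calling `site_responds()` while `python3 manage.py runserver` is running in another terminal should return `True` if the site is up.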
### Config apache2 in Centos 7.X:
Add `flu_user` to the `apache` group and add `insaflu.conf` to apache2.
$ sudo usermod -a -G flu_user apache
$ sudo vi /etc/httpd/conf.d/insaflu.conf
<VirtualHost *:80>
# General setup for the virtual host, inherited from global configuration
ServerName insaflu.pt
Alias /media /usr/local/web_site/INSaFLU/media
Alias /static /usr/local/web_site/INSaFLU/static_all
<Directory "/usr/local/web_site/INSaFLU/static_all">
Require all granted
</Directory>
<Directory "/usr/local/web_site/media">
Options FollowSymLinks
AllowOverride None
Require all granted
</Directory>
#### for log files
<Directory "/var/log/insaFlu">
Require all granted
</Directory>
<Directory "/usr/local/web_site/INSaFLU/fluwebvirus">
<Files "wsgi.py">
Require all granted
</Files>
</Directory>
WSGIDaemonProcess flu_user.insa.pt user=flu_user group=flu_user python-path=/usr/local/web_site/INSaFLU/fluwebvirus:/usr/lib/python3.<minor version of your python>/site-packages
WSGIProcessGroup flu_user.insa.pt
WSGIScriptAlias / /usr/local/web_site/INSaFLU/fluwebvirus/wsgi.py
# Use separate log files for the SSL virtual host; note that LogLevel
# is not inherited from httpd.conf.
ErrorLog /var/log/httpd/insaflu_error.log
TransferLog /var/log/httpd/insaflu_transfer.log
LogLevel warn
</VirtualHost>
$ sudo yum install httpd-devel mod_wsgi
$ sudo updatedb
## START small caveat...
$ mv /etc/httpd/modules/mod_wsgi.so /etc/httpd/
OR
$ locate mod_wsgi.so
$ sudo ln -s <last hit for the locate> /etc/httpd/modules/mod_wsgi.so
## END small caveat...
$ sudo systemctl restart httpd
$ sudo systemctl status httpd
### Config apache2 in Ubuntu 16.X:
Add `flu_user` to the `apache` group and add `insaflu.conf` to apache2.
$ sudo usermod -a -G flu_user apache
$ sudo apt install libapache2-mod-wsgi-py3
$ sudo vi /etc/apache2/sites-available/insaflu.conf
<VirtualHost *:80>
# General setup for the virtual host, inherited from global configuration
ServerName insaflu.pt
Alias /media /usr/local/web_site/media
Alias /static /usr/local/web_site/static_all
<Directory "/usr/local/web_site/static_all">
Require all granted
</Directory>
<Directory "/usr/local/web_site/media">
Options FollowSymLinks
AllowOverride None
Require all granted
</Directory>
#### for log files
<Directory "/var/log/insaFlu">
Require all granted
</Directory>
<Directory "/usr/local/web_site/insaflu">
<Files "wsgi.py">
Require all granted
</Files>
</Directory>
WSGIDaemonProcess flu_user.insa.pt user=flu_user group=flu_user python-path=/usr/local/web_site/insaflu:/usr/lib/python3.<minor version of your python>/site-packages
WSGIProcessGroup flu_user.insa.pt
WSGIScriptAlias / /usr/local/web_site/insaflu/wsgi.py
# Use separate log files for the SSL virtual host; note that LogLevel
# is not inherited from httpd.conf.
ErrorLog /var/log/apache2/insaflu_error.log
TransferLog /var/log/apache2/insaflu_transfer.log
LogLevel warn
</VirtualHost>
$ sudo a2ensite insaflu.conf
$ sudo systemctl restart apache2
$ sudo systemctl status apache2
Open your web browser and go to http://127.0.0.1:80/admin/
Log in with your superuser credentials; under AUTHENTICATION AND AUTHORIZATION you can create new accounts.
You can remove the original fastq.gz files from the system because they are no longer used; the Trimmomatic output fastq files are the ones used from then on. You can also remove files that belong to samples, references, batch uploads and project samples that users have deleted in the web site. This operation can free several GB on your hard drives.
To identify the files that can be removed:
$ cd <where your INSaFLU is installed>
$ python3 manage.py run_remove_files --only_identify_files true
A log file will be created with this information in /var/log/insaflu/remove_files.log
To remove the files permanently from file system: :warning: The files can't be recovered.
$ cd <where your INSaFLU is installed>
$ python3 manage.py run_remove_files --only_identify_files false
Tip:
You can create a cron job to run this task every week.
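As an illustration, the helper below builds a crontab line for that weekly task. It is a sketch, not an official INSaFLU script: the function name `weekly_cron_line` and the schedule (Sundays at 03:00) are assumptions, and the installation directory must be replaced with your own. Note that the generated command removes the files permanently.

```python
import shlex


def weekly_cron_line(install_dir, minute=0, hour=3, weekday=0, python="python3"):
    """Build a crontab entry that runs run_remove_files once a week.

    Crontab fields are: minute hour day-of-month month day-of-week command.
    weekday=0 means Sunday.
    """
    command = "cd {} && {} manage.py run_remove_files --only_identify_files false".format(
        shlex.quote(install_dir), shlex.quote(python)
    )
    return "{} {} * * {} {}".format(minute, hour, weekday, command)


if __name__ == "__main__":
    # Hypothetical installation directory, for illustration only.
    print(weekly_cron_line("/usr/local/web_site/INSaFLU"))
```

The printed line can be added with `crontab -e` for the user that owns the INSaFLU files. Running the task first with `--only_identify_files true` and reviewing the log is a safer way to start.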