Nanocorr

Error correction for Oxford Nanopore reads

Requires:

    BLAST to be in path
    SGE or similar scheduler

Installation:

Clone the repository to a shared filesystem on a cluster

    >git clone https://github.com/jgurtowski/nanocorr
    >cd nanocorr

Create a virtual environment to install the Python dependencies

    >virtualenv nanocorr_ve
    >source nanocorr_ve/bin/activate

Install the following packages using pip:

    pip install git+https://github.com/cython/cython
    pip install numpy
    pip install h5py
    pip install git+https://github.com/jgurtowski/pbcore_python
    pip install git+https://github.com/jgurtowski/pbdagcon_python
    pip install git+https://github.com/jgurtowski/jbio
    pip install git+https://github.com/jgurtowski/jptools

Finally, install the nanocorr package itself

    >python setup.py install

Running:

Make sure you are in the virtualenv

    >source nanocorr/nanocorr_ve/bin/activate

Partition your reads for distributed processing

    >python partition.py 100 500 nanopore_reads.fa

The partitioning creates a series of directories [0001,0002,...]. In each directory, run the nanocorr.py script on SGE or a similar system that sets the SGE_TASK_ID environment variable. Set the -t parameter to the number of files in the directory.

    >qsub -cwd -v PATH,LD_LIBRARY_PATH -t 1:500 -j y -o nanocorr_out /path/to/nanocorr.py query.fa reference.fa

The query file will be "blasted" against each previously partitioned read. The query file can be anything useful for correction; Illumina data is what is used right now. The corrected reads will be in the resulting "fa" files in the partition directories.

If you supply a reference genome, the corrected reads will be blasted against it and a ".refblast6.q" file will be created for each partition. This file contains the corrected reads aligned to the reference. Just make sure the blast db has been created for the reference.

Non-SGE Environment:

If you don't have SGE installed, you can use GNU parallel to run nanocorr on a single machine.
Although this is not the recommended method, because alignment can be very compute intensive, it can be tractable for small genomes (bacteria). For each of the directories created by the partition script (0001..000N), cd into the directory and run:

    $>for j in {1..500}; do echo "SGE_TASK_ID=$j TMPDIR=/tmp nanocorr.py query.fa reference.fa"; done | parallel -j <# of compute cores>
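
If GNU parallel is not available either, the same loop can be approximated with a short Python script. This is only a minimal sketch and is not part of nanocorr: it assumes nanocorr.py is on your PATH and takes its task number from SGE_TASK_ID, as in the command above. The function names (task_env, run_task, run_partition) and the "program" parameter are hypothetical helpers introduced for illustration.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def task_env(task_id, tmpdir="/tmp"):
    """Copy the current environment and add the variables SGE would normally set."""
    env = dict(os.environ)
    env["SGE_TASK_ID"] = str(task_id)
    env["TMPDIR"] = tmpdir
    return env


def run_task(task_id, query, reference, workdir, program="nanocorr.py"):
    """Run one task inside a partition directory and return its exit code.

    Assumption: `program` is found on PATH, and query/reference paths are
    given relative to `workdir` (or as absolute paths).
    """
    cmd = [program, query, reference]
    result = subprocess.run(cmd, cwd=workdir, env=task_env(task_id))
    return result.returncode


def run_partition(workdir, n_tasks, query, reference, n_workers):
    """Run tasks 1..n_tasks with at most n_workers running at the same time.

    Threads are enough here because each task is a separate child process.
    Returns a dict mapping task id to exit code.
    """
    task_ids = list(range(1, n_tasks + 1))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        codes = list(pool.map(
            lambda i: run_task(i, query, reference, workdir), task_ids))
    return dict(zip(task_ids, codes))


# Example call (not executed here): process directory 0001 with 500 tasks.
# exit_codes = run_partition("0001", 500, "query.fa", "reference.fa",
#                            n_workers=os.cpu_count() or 1)
```

Any task with a non-zero exit code in the returned dictionary can be rerun on its own with run_task. As with the GNU parallel approach, this is only practical for small genomes.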