linux101

#01 Rename

cat ala.cds.fasta | awk '{ gsub(/alaSesamum_al;/, "ala_"); print }' > ala.cds.formatted.fasta
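This replaces every occurrence of the literal string "alaSesamum_al;" with "ala_" in the headers (and anywhere else it occurs). If you prefer sed, an equivalent one-liner, assuming the same prefix, would be:

sed 's/alaSesamum_al;/ala_/g' ala.cds.fasta > ala.cds.formatted.fasta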

#02 split a multifasta file into one file per sequence, named after its header


cat yourfile.fasta | awk '{
if (substr($0, 1, 1)==">") {filename=(substr($0,2) ".fasta")}
print $0 > filename }'
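
Note that awk keeps every output file open, so this version can fail with "too many open files" on large multifasta files, and headers containing spaces or slashes produce awkward filenames. A slightly more defensive sketch (the character whitelist is just an assumption about your headers):

awk '/^>/ { if (filename) close(filename)
            name = substr($0, 2); gsub(/[^A-Za-z0-9._-]/, "_", name)
            filename = name ".fasta" }
          { print > filename }' yourfile.fasta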


#03 make a tab-separated file as input for the PREP suite online tool

cat ind_accD.fasta | awk 'NR==2{print "indicum"}2' | awk 'NR==3{print "accD"}3' | awk 'NR==4{print "1"}4' | awk 'NR==5{print "0.8"}5'| awk '{if (NR!=1) {print}}' | awk -F'\n' '{$1=$1} 1' RS='\n\n' OFS='\t' > accD.rna.ready
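The repeated 'NR==N{print "text"}N' idiom inserts a literal line just before record N while still printing every input line (the trailing number is a nonzero pattern, so awk's default print fires for each record). A quick sanity check on toy input:

printf 'a\nb\n' | awk 'NR==2{print "indicum"}2'
# a
# indicum
# b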

Hello everyone.

Here is a bash script for long-read assembly using CANU.

#!/bin/bash
set -e
export PATH='/home/raymond/devel/canu/canu/Linux-amd64/bin/':$PATH

inputFile=$1
outputDir=$2
name='Epau'
genomeSize=160kb
correctedErrorRate=0.144   # Canu default for raw Nanopore reads; adjust for your data
threads=40
gnuplotPath=/home/raymond/devel/gnuplot/install/bin/gnuplot

echo $inputFile
echo $outputDir

canu -p $name \
     -d $outputDir \
     correctedErrorRate=$correctedErrorRate \
     genomeSize=$genomeSize \
     -nanopore-raw $inputFile \
     gnuplot=$gnuplotPath \
     useGrid=false \
     maxThreads=$threads

I would like to incorporate a command line in order to get CPU and memory usage every 5 minutes from the beginning to the end of the CANU assembly.

How could I get this info into a specific output file, given that a CANU run spawns a bunch of separate processes?

Thanks.

There are lots of different ways to attack this problem.

System stats are all in files in /proc, so to get memory usage you might parse /proc/meminfo. For CPU utilization /proc/loadavg might be what you want. Here is a link to some documentation. You could write a bash script to get these values and write to a file.
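A minimal sketch of that idea, assuming you simply want a timestamped line appended every 5 minutes for as long as the script is left running (the log name usage.log is arbitrary):

#!/bin/bash
# Append load average and available memory every 300 s.
while true; do
    ts=$(date '+%F %T')
    load=$(cut -d' ' -f1-3 /proc/loadavg)
    mem=$(grep MemAvailable /proc/meminfo | awk '{print $2, $3}')
    echo -e "$ts\tload: $load\tMemAvailable: $mem" >> usage.log
    sleep 300
done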

You could also add a middle man and use Python's psutil package for the same information.

There are probably lots of other ways to do this as well.

If it's running as a system process, firefox for example, use

top -b | grep --line-buffered firefox > outfile.txt

I couldn't figure out how to get my cut to work in-line though; it seems the delimiters are scrambled after field 2.
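
cut struggles here because top pads its columns with a variable number of spaces, so the output is not single-character delimited; awk, which splits on any run of whitespace, handles it more gracefully. A sketch (the column numbers assume top's default field layout, so check yours):

top -b -d 300 | grep --line-buffered canu | awk '{print $1, $9, $10; fflush()}' >> canu_top.log
# columns 1, 9 and 10 are PID, %CPU and %MEM in the default layout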

The answer depends on the OS and environment to some extent: is this a machine with few other users or a dedicated VM, or a shared host, and what OS/distribution are you running?

Quick and dirty would be to run top in batch mode if that's available. Reports of the host's memory usage are available via free and vmstat; those are useful to see whether your process is eating all available memory and then failing, which is unlikely with such a small genome.
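
For example, vmstat can do the periodic sampling itself, and free can be looped; both can be appended to a file (the file names and the canu pattern are just placeholders):

vmstat -t 300 >> vmstat.log &            # one timestamped summary line every 5 minutes
while pgrep -f canu > /dev/null; do free -m >> free.log; sleep 300; done &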

Thank you for your reply. I am working on CentOS 7.

Probably easiest is calling top -b -n 99 -d 13 or something similar with parameters appropriate to your work.

If you are hoping for a machine-readable log and a way to summarize a process tree programmatically, then more work will be necessary, but this will give you output good enough for human eyes to understand what's happening.
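
One way to get that machine-readable log, summing CPU and memory over the whole canu process tree every 5 minutes (the pgrep pattern and log name are assumptions to adapt):

#!/bin/bash
# Append timestamp, total %CPU and total RSS (MB) of all canu-related processes.
# Note: ps reports %CPU averaged over each process's lifetime, not instantaneous load.
logfile=canu_resources.tsv
echo -e "time\tcpu_percent\trss_mb" > "$logfile"
while pgrep -f canu > /dev/null; do
    ps -o %cpu=,rss= -p "$(pgrep -d, -f canu)" |
        awk -v ts="$(date '+%F %T')" '{cpu += $1; rss += $2}
             END {printf "%s\t%.1f\t%.1f\n", ts, cpu, rss/1024}' >> "$logfile"
    sleep 300
done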
