Output not clear
carlos-sarabia opened this issue · 2 comments
Hi,
I have cloned the ngsPSMC into my computer and I have ran a test with the chr38 of a dog. As recommended in README.md, I ran:
$ ../../angsd/angsd -i dog.qatar.chr38.bam -dopsmc 1 -out chr38 -gl 1 -minq 20 -minmapq 30
with output filenames chr38.arg; chr38.psmc.gz, chr38.psmc.poz.gz, chr38.psmc.idx
After running ngsPSMC in this file as recommended in README.md:
../ngsPSMC chr38.psmc.idx -p "14+252+14+16"
-dospline 0 -nthreads 8 -nIter 20 -init 1
-theta 0.000233095 -rho 0.005357 > chr38.psmc.ml.txt
The file seems correct:
$ more chr38.psmc.ml.txt MM ../../ngsPSMC/ngsPSMC chr38.psmc.psmc.idx -p 1*4+25*2+1*4+1*6 -dospline 0 -nthreads 8 -nIt er 20 -init 1 -theta 0.000233095 -rho 0.005357 MM -> ngsPSMC version: v0.02-54-gbcda080 (htslib: 1.9) build(Oct 14 2019 15:57:42) MM TR is theta rho, theta in this context is given by the persite theta, and not the per wind ow theta, this is different from the original PSMC MM filereading took: (wall(min),cpu(min)):(0.050000,0.042487) RD 0 LK -30553205.977373 QD 0.000000 -> 0.000000 RI ? TR 0.000233 0.005357 MT 23.861429 MM buildhmm(wall(min),cpu(min)):(3.450000,3.458922) tk_l:64 RS 0 0.000000 1.000000 1000000.0 1000000.0 1000000.0 RS 1 0.009086 1.000000 1000000.0 1000000.0 1000000.0 RS 2 0.018998 1.000000 1000000.0 1000000.0 1000000.0 RS 3 0.029811 1.000000 1000000.0 1000000.0 1000000.0
But there is no specifications as what to do next. I tried to decompress chr38.psmc.gz into chr38.psmc and run psmc_plot.py:
$ python3.7 psmc_plot.py -psmc ../../test_ngspsmc/testchr38/chr38.psmc
Units: mutation rate = 1.25e-08 binsize = 100.0 N0 = 10000.0 generation time = 29.0
Argument should have 4 or 5 parameters.
Am I doing something wrong or is the code still under development?
Maybe I can directly use the psmc_plot.py code of PSMC (/utils/psmc_plot.py) to run in my chr38?
Thanks
Carlos
Hi Carlos,
there is short built-in help for psmc_plot.py. You should run it like
python3.7 psmc_plot.py -psmc chr38 ../../test_ngspsmc/testchr38/chr38.psmc 0 0
Here chr38 will be the name used for the legend on your plot. First 0 means that your sample is modern (e.g. for ancient samples you can provide time to shift your trajectory relatively to modern samples). The second 0 is the loss of heterozygosity (not required for ngsPSMC, but it is needed for low/medium coverage samples with original PSMC).
Dear Vladimir,
Thank you very much for your comment! I just tried it as you recommended me and this is the output:
python3.5 psmc_plot.py -psmc chr38 ../../tests/test_ngspsmc/testchr38/chr38.psmc 0 0
Units: mutation rate = 1.25e-08 binsize = 100.0 N0 = 10000.0 generation time = 29.0
Traceback (most recent call last):
File "psmc_plot.py", line 219, in
ReadPSMC(psmcs)
File "psmc_plot.py", line 137, in ReadPSMC
data = ReadPSMCFile(psmc.file, psmc.rd)
File "psmc_plot.py", line 98, in ReadPSMCFile
for line in f:
File "/usr/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 9: invalid start byte
I wonder if there was some incorrect step when I decompressed my psmc.gz or if I did another mistake.
Thanks and best regards
Carlos