lh3/biofast

Shorter run time with the Crystal 1.2.1 release

orangeSi opened this issue · 0 comments

Crystal 1.2.1 was released recently, so I tested fqcnt_cr1_klib.cr and bedcov_cr1_klib.cr (both files unmodified) on my computer. The results:

   For fqcnt: the run time on the plain-text file drops from 1.5s to 0.9s, but the gzip file still takes 9s.
   For bedcov: g2r takes 6.9s instead of 8.8s, and r2g takes 10.8s instead of 14.8s.

The shorter run times could come from either a hardware difference or the Crystal version difference, so I reran the prebuilt binaries from biofast-bin-20200520-a7af6d8.tar.bz2 on the same computer:

## fqcnt
$ hyperfine --warmup 3 ' ../biofast-bin-20200520-a7af6d8/fqcnt/fqcnt_cr1_klib biofast-data-v1/M_abscessus_HiSeq.fq'
Benchmark 1:  ../biofast-bin-20200520-a7af6d8/fqcnt/fqcnt_cr1_klib biofast-data-v1/M_abscessus_HiSeq.fq
  Time (mean ± σ):     891.0 ms ±  49.2 ms    [User: 649.7 ms, System: 213.5 ms]
  Range (min … max):   864.6 ms … 1028.0 ms    10 runs

$ hyperfine --warmup 3 ' ../biofast-bin-20200520-a7af6d8/fqcnt/fqcnt_cr1_klib biofast-data-v1/M_abscessus_HiSeq.fq.gz'
Benchmark 1:  ../biofast-bin-20200520-a7af6d8/fqcnt/fqcnt_cr1_klib biofast-data-v1/M_abscessus_HiSeq.fq.gz
  Time (mean ± σ):      8.938 s ±  0.042 s    [User: 8.719 s, System: 0.096 s]
  Range (min … max):    8.894 s …  9.041 s    10 runs

## bedcov
$ hyperfine --warmup 3 ' ../biofast-bin-20200520-a7af6d8/bedcov/bedcov_cr1_klib   ex-rna.bed ex-anno.bed  # g2r'
Benchmark 1:  ../biofast-bin-20200520-a7af6d8/bedcov/bedcov_cr1_klib   ex-rna.bed ex-anno.bed  # g2r
  Time (mean ± σ):      7.927 s ±  0.034 s    [User: 7.272 s, System: 0.525 s]
  Range (min … max):    7.878 s …  7.976 s    10 runs

$ hyperfine --warmup 3 ' ../biofast-bin-20200520-a7af6d8/bedcov/bedcov_cr1_klib  ex-anno.bed ex-rna.bed  # r2g'
Benchmark 1:  ../biofast-bin-20200520-a7af6d8/bedcov/bedcov_cr1_klib  ex-anno.bed ex-rna.bed  # r2g
  Time (mean ± σ):     17.731 s ±  0.069 s    [User: 14.906 s, System: 2.579 s]
  Range (min … max):   17.632 s … 17.810 s    10 runs

So with the latest Crystal 1.2.1, after taking the hardware difference into account:

      For fqcnt, the run time is slightly longer.
      For bedcov, the run time is shorter, especially for r2g.
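
To put numbers on this comparison, the relative change in mean run time can be computed directly from the hyperfine means reported in this issue (prebuilt 2020 binaries versus the Crystal 1.2.1 / LLVM 10 builds, both on the same machine). This is only an arithmetic sketch: the names `MEANS`, `percent_change` and `summarize` are hypothetical helpers, not part of biofast or hyperfine.

```python
# Mean run times in seconds, copied from the hyperfine output in this issue.
# Each value is (prebuilt biofast-bin-20200520 binary, rebuilt with Crystal 1.2.1 / LLVM 10).
MEANS = {
    "fqcnt plain": (0.8910, 0.9680),
    "fqcnt gzip": (8.938, 9.100),
    "bedcov g2r": (7.927, 6.921),
    "bedcov r2g": (17.731, 10.846),
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent; negative means faster."""
    return (new - old) / old * 100.0


def summarize(means=MEANS):
    """Return the percent change for every benchmark, rounded to one decimal."""
    return {
        name: round(percent_change(old, new), 1)
        for name, (old, new) in means.items()
    }


if __name__ == "__main__":
    for name, change in summarize().items():
        print(f"{name}: {change:+.1f}%")
```

With these means, fqcnt is about 9% slower on plain text and about 2% slower on gzip, while bedcov is about 13% faster for g2r and about 39% faster for r2g.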


Details below:

System and Crystal version

$ lscpu|grep -E  'Model name|CPU family'
CPU family:          6
Model name:          Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz

$ cat /etc/os-release |grep PRETTY_NAME
PRETTY_NAME="Ubuntu 18.04.6 LTS"

$ crystal -v
Crystal 1.2.1 [4e6c0f26e] (2021-10-21)

LLVM: 10.0.0

$ git clone https://github.com/lh3/biofast.git

fqcnt with Crystal 1.2.1

$ crystal build fqcnt_cr1_klib.cr --release

$ ll biofast-data-v1/*fq
-rw-rw-r-- 1 ubuntu ubuntu 1396487030 Oct 23 10:50 biofast-data-v1/M_abscessus_HiSeq.fq

$ hyperfine --warmup 3 '~/biofast/fqcnt/fqcnt_cr1_klib biofast-data-v1/M_abscessus_HiSeq.fq'
Benchmark 1: ~/biofast/fqcnt/fqcnt_cr1_klib biofast-data-v1/M_abscessus_HiSeq.fq
  Time (mean ± σ):     968.0 ms ±   8.0 ms    [User: 743.2 ms, System: 206.8 ms]
  Range (min … max):   960.7 ms … 981.4 ms    10 runs

# upgrade LLVM from 10 to 12, then recompile fqcnt_cr1_klib.cr
$ crystal_llvm12 build fqcnt_cr1_klib.cr -o fqcnt_cr1_klib_llvm12 --release

$ hyperfine --warmup 3 '~/biofast/fqcnt/fqcnt_cr1_klib_llvm12 biofast-data-v1/M_abscessus_HiSeq.fq'
Benchmark 1: ~/biofast/fqcnt/fqcnt_cr1_klib_llvm12 biofast-data-v1/M_abscessus_HiSeq.fq
  Time (mean ± σ):     931.0 ms ±   6.0 ms    [User: 716.9 ms, System: 197.2 ms]
  Range (min … max):   923.5 ms … 940.3 ms    10 runs


$ gzip biofast-data-v1/M_abscessus_HiSeq.fq
$ ll -sh biofast-data-v1/*gz
465M -rw-r--r-- 1 ubuntu ubuntu 465M May  4  2020 biofast-data-v1/M_abscessus_HiSeq.fq.gz

$ hyperfine --warmup 3 '~/biofast/fqcnt/fqcnt_cr1_klib biofast-data-v1/M_abscessus_HiSeq.fq.gz'
Benchmark 1: ~/biofast/fqcnt/fqcnt_cr1_klib biofast-data-v1/M_abscessus_HiSeq.fq.gz
  Time (mean ± σ):      9.100 s ±  0.068 s    [User: 8.853 s, System: 0.107 s]
  Range (min … max):    9.030 s …  9.259 s    10 runs

$ hyperfine --warmup 3 '~/biofast/fqcnt/fqcnt_cr1_klib_llvm12 biofast-data-v1/M_abscessus_HiSeq.fq.gz'
Benchmark 1: ~/biofast/fqcnt/fqcnt_cr1_klib_llvm12 biofast-data-v1/M_abscessus_HiSeq.fq.gz
  Time (mean ± σ):      9.082 s ±  0.023 s    [User: 8.848 s, System: 0.099 s]
  Range (min … max):    9.046 s …  9.119 s    10 runs

bedcov with Crystal 1.2.1

$ hyperfine --warmup 3 './bedcov_cr1_klib ex-rna.bed ex-anno.bed   # g2r'
Benchmark 1: ./bedcov_cr1_klib ex-rna.bed ex-anno.bed   # g2r
  Time (mean ± σ):      6.921 s ±  0.023 s    [User: 6.587 s, System: 0.222 s]
  Range (min … max):    6.887 s …  6.954 s    10 runs

$ hyperfine --warmup 3 './bedcov_cr1_klib_llvm12 ex-rna.bed ex-anno.bed   # g2r'
Benchmark 1: ./bedcov_cr1_klib_llvm12 ex-rna.bed ex-anno.bed   # g2r
  Time (mean ± σ):      6.827 s ±  0.047 s    [User: 6.501 s, System: 0.216 s]
  Range (min … max):    6.756 s …  6.943 s    10 runs


$ hyperfine --warmup 3 './bedcov_cr1_klib ex-anno.bed ex-rna.bed  # r2g'
Benchmark 1: ./bedcov_cr1_klib ex-anno.bed ex-rna.bed  # r2g
  Time (mean ± σ):     10.846 s ±  0.067 s    [User: 10.524 s, System: 0.139 s]
  Range (min … max):   10.739 s … 10.956 s    10 runs

$ hyperfine --warmup 3 './bedcov_cr1_klib_llvm12 ex-anno.bed ex-rna.bed  # r2g'
Benchmark 1: ./bedcov_cr1_klib_llvm12 ex-anno.bed ex-rna.bed  # r2g
  Time (mean ± σ):     10.637 s ±  0.166 s    [User: 10.339 s, System: 0.138 s]
  Range (min … max):   10.498 s … 11.079 s    10 runs
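
As a rough way to read the LLVM 10 versus LLVM 12 numbers above, the difference between two means can be compared with the run-to-run spread that hyperfine reports as σ. The sketch below divides the difference by the combined standard deviation of the two runs. This is a heuristic for eyeballing noise, not a proper statistical test, and `RUNS` and `diff_in_sigmas` are hypothetical names introduced for illustration.

```python
import math

# (mean, sigma) in seconds from the hyperfine runs in this issue.
# Each value is (LLVM 10 build, LLVM 12 build) of the same Crystal 1.2.1 source.
RUNS = {
    "fqcnt plain": ((0.9680, 0.0080), (0.9310, 0.0060)),
    "fqcnt gzip": ((9.100, 0.068), (9.082, 0.023)),
    "bedcov g2r": ((6.921, 0.023), (6.827, 0.047)),
    "bedcov r2g": ((10.846, 0.067), (10.637, 0.166)),
}


def diff_in_sigmas(a, b):
    """Difference of two means divided by the combined standard deviation.

    Positive values mean the second run (b) was faster.
    """
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    return (mean_a - mean_b) / math.hypot(sd_a, sd_b)


if __name__ == "__main__":
    for name, (llvm10, llvm12) in RUNS.items():
        print(f"{name}: {diff_in_sigmas(llvm10, llvm12):.1f} sigma")
```

By this rough measure only the plain-text fqcnt case separates clearly (about 3.7 combined standard deviations); the gzip case is well within the noise, and the two bedcov cases fall in between.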