desihub/speclite

speclite slow to import

Closed this issue · 13 comments

On both my laptop and at NERSC, speclite takes ~10x longer to import than other packages, even heavyweight ones like numpy. e.g. on my laptop:

[master*] speclite $ time python -c "import numpy"

real    0m0.197s
user    0m0.145s
sys 0m0.046s
[master*] speclite $ time python -c "import speclite"

real    0m1.763s
user    0m1.534s
sys 0m0.220s

At NERSC, importing speclite can take 5-10 seconds for a single process.

Within speclite/__init__.py, the primary culprit is import filters, though from ._astropy_init import * is also fairly slow. Within filters.py, these astropy imports take much of the time:

import astropy.table
import astropy.units
import astropy.utils.data

Can anything be done to make this faster, e.g. lazy loading or using pkg_resources instead of astropy.utils.data routines?
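
For illustration, here is a minimal sketch of the kind of lazy loading I have in mind, assuming the heavy astropy imports in filters.py could be deferred into the functions that actually need them (the helper and function names here are hypothetical, not the real speclite API):

# Hypothetical sketch of lazy loading: defer the heavy astropy imports into a
# helper so that "import speclite.filters" itself stays cheap and the cost is
# only paid the first time a filter is actually loaded.

def _import_astropy():
    """Import and return the astropy submodules on first use."""
    import astropy.table
    import astropy.units
    import astropy.utils.data
    return astropy.table, astropy.units, astropy.utils.data

def load_filter_table(data_name):
    """Hypothetical entry point that pays the astropy import cost lazily."""
    table, units, data = _import_astropy()
    path = data.get_pkg_data_filename(data_name)
    return table.Table.read(path)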

This might be at the core of why our batch pipeline startup times are much slower than NERSC benchmarks indicate they should be, since desispec.io imports speclite.filters, while NERSC benchmarks are focused on importing numpy and scipy.

Thanks for raising this issue, which I was completely unaware of. The master branch no longer does import filters, but that change didn't make it into v0.5. The remaining time is all due to _astropy_init:

In [1]: %time from speclite._astropy_init import *
CPU times: user 376 ms, sys: 72.7 ms, total: 449 ms
Wall time: 525 ms

In [2]: %time import speclite
CPU times: user 6 µs, sys: 0 ns, total: 6 µs
Wall time: 8.11 µs

I inherited the _astropy_init import from the astropy affiliated-package template, but I will investigate speeding it up.

I asked the astropy experts for advice at astropy/package-template#196.

It looks like the irreducible overhead is from the following import chain, which is triggered by the import of any astropy package:

astropy => astropy.logger => astropy.utils.console => astropy.units
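
A quick way to see this (hedged: the exact set of modules depends on the astropy version in use) is to diff sys.modules before and after a bare import astropy:

# List the astropy submodules pulled in by a bare "import astropy"; with the
# astropy version discussed here, astropy.units shows up via the logger chain.
import sys

before = set(sys.modules)
import astropy
pulled_in = sorted(m for m in set(sys.modules) - before if m.startswith('astropy'))
print(pulled_in)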

On the positive side, you only have to pay this price once because of python's internal import logic, so import speclite is much faster once any other astropy package has been imported. For example:

In [1]: %time import astropy.table
CPU times: user 762 ms, sys: 118 ms, total: 880 ms
Wall time: 969 ms

In [2]: %time import speclite
CPU times: user 1.88 ms, sys: 5.88 ms, total: 7.76 ms
Wall time: 118 ms

This means that importing speclite should have negligible time impact in a module that also imports anything from astropy.

The remaining 118ms wall clock time (but only ~8ms CPU!) is spent getting the git hash via a subprocess call in speclite.version, so that might be a clue to the problems at NERSC @tskisner?

Could someone try running this profile on cori and compare with what I get on my laptop, to investigate the subprocess performance:

>>> import cProfile, pstats, speclite.version
>>> cProfile.run('speclite.version.get_git_devstr()', 'subproc_profile')
>>> pstats.Stats('subproc_profile').strip_dirs().sort_stats('cumulative').print_stats(15)
Wed Sep 28 10:39:48 2016    subproc_profile

         177 function calls in 0.020 seconds

   Ordered by: cumulative time
   List reduced from 83 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.020    0.020 <string>:1(<module>)
        1    0.000    0.000    0.020    0.020 version.py:54(get_git_devstr)
        1    0.000    0.000    0.020    0.020 version.py:98(run_git)
        1    0.000    0.000    0.011    0.011 subprocess.py:769(communicate)
        1    0.000    0.000    0.011    0.011 subprocess.py:1408(_communicate)
        1    0.000    0.000    0.011    0.011 subprocess.py:1441(_communicate_with_poll)
        2    0.011    0.006    0.011    0.006 {built-in method poll}
        1    0.000    0.000    0.008    0.008 subprocess.py:651(__init__)
        1    0.000    0.000    0.008    0.008 subprocess.py:1199(_execute_child)
        1    0.005    0.005    0.005    0.005 {posix.fork}
        4    0.002    0.001    0.002    0.001 {posix.read}
        2    0.000    0.000    0.002    0.001 subprocess.py:473(_eintr_retry_call)
        1    0.000    0.000    0.000    0.000 version.py:12(_decode_stdio)
        1    0.000    0.000    0.000    0.000 version.py:151(_get_repo_path)
        4    0.000    0.000    0.000    0.000 subprocess.py:1173(pipe_cloexec)

On edison, I get "all zeros" for those numbers:

[edison ~]$ module load desi-conda

[edison ~]$ python
Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

>>> import cProfile, pstats, speclite.version
>>> cProfile.run('speclite.version.get_git_devstr()', 'subproc_profile')
>>> pstats.Stats('subproc_profile').strip_dirs().sort_stats('cumulative').print_stats(15)
Wed Sep 28 10:58:07 2016    subproc_profile

         39 function calls in 0.000 seconds

   Ordered by: cumulative time
   List reduced from 27 to 15 due to restriction <15>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 version.py:54(get_git_devstr)
        1    0.000    0.000    0.000    0.000 version.py:151(_get_repo_path)
        3    0.000    0.000    0.000    0.000 {posix.stat}
        1    0.000    0.000    0.000    0.000 genericpath.py:23(exists)
        1    0.000    0.000    0.000    0.000 posixpath.py:358(abspath)
        1    0.000    0.000    0.000    0.000 genericpath.py:34(isfile)
        1    0.000    0.000    0.000    0.000 posixpath.py:329(normpath)
        2    0.000    0.000    0.000    0.000 posixpath.py:120(dirname)
        1    0.000    0.000    0.000    0.000 posixpath.py:61(join)
        1    0.000    0.000    0.000    0.000 genericpath.py:46(isdir)
        1    0.000    0.000    0.000    0.000 {posix.getcwd}
        1    0.000    0.000    0.000    0.000 stat.py:49(S_ISREG)
        1    0.000    0.000    0.000    0.000 posixpath.py:52(isabs)
        1    0.000    0.000    0.000    0.000 {isinstance}
<pstats.Stats instance at 0x2aaabe4fea70>

So that is very fast ;-)

-Ted


Ok, so creating a single subprocess from python shouldn't be a particular bottleneck on cori?

cori is offline, but edison and cori should be the same. The anaconda stack is installed to /global/common, which is a fast disk and mounted read-only on the compute nodes. I ran the requested test on an edison login node. Spawning subprocesses from setup.py at install time is fine.

However, if you are spawning subprocesses from inside a package, and that package is being run by one process belonging to a larger mpi4py workflow, then we have seen problems in some cases. We have also written extensive test cases of doing this spawning where everything works fine!

I'm assuming that the git commands to get the version only happen at install time, correct? Installation is done outside the context of MPI, so this should be fine.

-Ted


No, this is at runtime when you first import the package into a python interpreter.

But speclite is being run from an installed location, outside of a git checkout... How can it be using git commands then? Guess I should go look at the source.


Astropy packages enhance the usual __version__ attribute with a __githash__ attribute:

>>> import speclite
>>> speclite.__version__, speclite.__githash__
(u'0.6dev442.dev442', u'cce9b026494702db52792f44ff47df53ae508c9b')

The way this is implemented is that the import runs code that creates a subprocess to call git rev-parse HEAD (or something like that) and save the result. The relevant code is in speclite/version.py, which you won't find on github since it is created at install time.

If I understand that code correctly, it doesn't actually spawn a subprocess command unless it finds a .git directory. So for an installed copy it does spend time walking the directory tree but not spawning commands. This could still be a slowdown factor for N>>1 parallel processes at NERSC hitting the same disk.
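
Roughly, the logic I'm describing looks like this (a simplified illustration, not the actual generated version.py; the function name is made up):

# Simplified illustration: only spawn "git rev-parse HEAD" if a .git directory
# is found while walking up from the package directory; an installed copy
# never reaches the subprocess call.
import os
import subprocess

def _git_devstr(start_dir):
    path = os.path.abspath(start_dir)
    while path != os.path.dirname(path):   # stop at the filesystem root
        if os.path.isdir(os.path.join(path, '.git')):
            out = subprocess.check_output(['git', 'rev-parse', 'HEAD'], cwd=path)
            return out.strip().decode('ascii')
        path = os.path.dirname(path)
    return ''   # no .git found: no subprocess is spawned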

However, I suspect that a "python setup.py develop" install would find a .git directory and spawn the git command, meaning that development installs of speclite could have very different Python startup performance at NERSC for big jobs. speclite changes infrequently enough that we've put it in the stable core of the full installation suite, so that usually shouldn't be an issue.

It looks like importing speclite should have a negligible impact on the desispec startup time, since desispec.io already imports 265 (!) astropy modules:

>>> import sys, desispec.io
>>> sorted(filter(lambda p: p.startswith('astropy'), sys.modules))
['astropy',
 'astropy.__future__',
 'astropy._compiler',
...]

I can confirm this empirically since commenting out the line import speclite.filters in desispec.filters does not change the time (~1.25s) required to import desispec.io on my laptop.
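
For reference, this is the kind of check I mean (a hedged sketch; it assumes desispec is installed, and it times a fresh interpreter so Python's import cache doesn't hide the cost):

# Time "import desispec.io" in a fresh interpreter, with and without the
# "import speclite.filters" line commented out in desispec.filters.
import subprocess
import time

start = time.time()
subprocess.check_call(['python', '-c', 'import desispec.io'])
print('fresh-interpreter import took %.2f s' % (time.time() - start))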

Closing since I think we agreed that speclite has no incremental startup penalty beyond the overhead of importing any astropy package. Please re-open if there is new info.