pex-tool/pex

Why does creating a virtualenv for torch take about 7 seconds?

zmanji opened this issue · 4 comments

This is a question about pex's performance when creating virtualenvs. The example below is extracted from a Dockerfile where I am trying to understand if I can speed up the slowest step.

I created a lockfile for torch by running:

PEX_SCRIPT=pex3 pex lock create -v --resolver-version pip-2020-resolver --pip-version 22.3 --no-build -o torch.lock --platform 'linux_x86_64-cp-311-cp311' --platform 'macosx_11_0_arm64-cp-311-cp311' --platform 'linux_x86_64-cp-310-cp310' 'torch==2.1.2'

Then later I create venv by running:

PEX_SCRIPT='pex3' pex venv create -v --force -d ./tenv  --no-build --lock torch.lock 'torch'

This takes about 7 seconds on my machine even when all of the wheels have been downloaded and extracted.

The output shows:

PEX_SCRIPT='pex3' pex venv create -v --force -d ./tenv  --no-build --lock torch.lock 'torch'
pex: Using cached artifact at /home/zmanji/.pex/downloads/57dbda9b35157b05fb3e58ee91448612eb674172fab98ee235ccb0b5bee19a1c for FileArtifact(url='https://files.pythonhosted.org/packages/81/54/84d42a0bee35edba99dee7b59a8d4970eccdd44b99fe728ed912106fc781/filelock-3.13.1-py3-none-any.whl', fingerprint=Fingerprint(algorithm='sha256', hash='57dbda9b35157b05fb3e58ee91448612eb674172fab98ee235ccb0b5bee19a1c'), verified=False, filename='filelock-3.13.1-py3-none-any.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/3a871edd6c02dae77ad810335c0833391c1a4ce49af21ea8cf0f6a5d2096eea8 for FileArtifact(url='https://files.pythonhosted.org/packages/03/f1/13137340776dd5d5bcfd2574c9c6dfcc7618285035cd77240496e5c1a79b/torch-2.1.2-cp310-cp310-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='3a871edd6c02dae77ad810335c0833391c1a4ce49af21ea8cf0f6a5d2096eea8'), verified=False, filename='torch-2.1.2-cp310-cp310-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/d800d87f72189a745fa3d6b033b9dc4a34ad069f60ca60b943a63599f5501960 for FileArtifact(url='https://files.pythonhosted.org/packages/70/25/fab23259a52ece5670dcb8452e1af34b89e6135ecc17cd4b54b4b479eac6/fsspec-2023.12.2-py3-none-any.whl', fingerprint=Fingerprint(algorithm='sha256', hash='d800d87f72189a745fa3d6b033b9dc4a34ad069f60ca60b943a63599f5501960'), verified=False, filename='fsspec-2023.12.2-py3-none-any.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/6088930bfe239f0e6710546ab9c19c9ef35e29792895fed6e6e31a023a182a61 for FileArtifact(url='https://files.pythonhosted.org/packages/bc/c3/f068337a370801f372f2f8f6bad74a5c140f6fda3d9de154052708dd3c65/Jinja2-3.1.2-py3-none-any.whl', fingerprint=Fingerprint(algorithm='sha256', hash='6088930bfe239f0e6710546ab9c19c9ef35e29792895fed6e6e31a023a182a61'), verified=False, filename='Jinja2-3.1.2-py3-none-any.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/f18c69adc97877c42332c170849c96cefa91881c99a7cb3e95b7c659ebdc1ec2 for FileArtifact(url='https://files.pythonhosted.org/packages/d5/f0/8fbc882ca80cf077f1b246c0e3c3465f7f415439bdea6b899f6b19f61f70/networkx-3.2.1-py3-none-any.whl', fingerprint=Fingerprint(algorithm='sha256', hash='f18c69adc97877c42332c170849c96cefa91881c99a7cb3e95b7c659ebdc1ec2'), verified=False, filename='networkx-3.2.1-py3-none-any.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/ee53ccca76a6fc08fb9701aa95b6ceb242cdaab118c3bb152af4e579af792728 for FileArtifact(url='https://files.pythonhosted.org/packages/37/6d/121efd7382d5b0284239f4ab1fc1590d86d34ed4a4a2fdb13b30ca8e5740/nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='ee53ccca76a6fc08fb9701aa95b6ceb242cdaab118c3bb152af4e579af792728'), verified=False, filename='nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/339b385f50c309763ca65456ec75e17bbefcbbf2893f462cb8b90584cd27a1c2 for FileArtifact(url='https://files.pythonhosted.org/packages/b6/9f/c64c03f49d6fbc56196664d05dba14e3a561038a81a638eeb47f4d4cfd48/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='339b385f50c309763ca65456ec75e17bbefcbbf2893f462cb8b90584cd27a1c2'), verified=False, filename='nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/794e3948a1aa71fd817c3775866943936774d1c14e7628c74f6f7417224cdf56 for FileArtifact(url='https://files.pythonhosted.org/packages/86/94/eb540db023ce1d162e7bea9f8f5aa781d57c65aed513c33ee9a5123ead4d/nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='794e3948a1aa71fd817c3775866943936774d1c14e7628c74f6f7417224cdf56'), verified=False, filename='nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/5ccb288774fdfb07a7e7025ffec286971c06d8d7b4fb162525334616d7629ff9 for FileArtifact(url='https://files.pythonhosted.org/packages/ff/74/a2e2be7fb83aaedec84f391f082cf765dfb635e7caa9b49065f73e4835d8/nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='5ccb288774fdfb07a7e7025ffec286971c06d8d7b4fb162525334616d7629ff9'), verified=False, filename='nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/e54fde3983165c624cb79254ae9818a456eb6e87a7fd4d56a2352c24ee542d7e for FileArtifact(url='https://files.pythonhosted.org/packages/7e/00/6b218edd739ecfc60524e585ba8e6b00554dd908de2c9c66c1af3e44e18d/nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='e54fde3983165c624cb79254ae9818a456eb6e87a7fd4d56a2352c24ee542d7e'), verified=False, filename='nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/6e258468ddf5796e25f1dc591a31029fa317d97a0a94ed93468fc86301d61e40 for FileArtifact(url='https://files.pythonhosted.org/packages/eb/d5/c68b1d2cdfcc59e72e8a5949a37ddb22ae6cade80cd4a57a84d4c8b55472/nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='6e258468ddf5796e25f1dc591a31029fa317d97a0a94ed93468fc86301d61e40'), verified=False, filename='nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/f3b50f42cf363f86ab21f720998517a659a48131e8d538dc02f8768237bd884c for FileArtifact(url='https://files.pythonhosted.org/packages/65/5b/cfaeebf25cd9fdec14338ccb16f6b2c4c7fa9163aefcf057d86b9cc248bb/nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='f3b50f42cf363f86ab21f720998517a659a48131e8d538dc02f8768237bd884c'), verified=False, filename='nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/8a7ec542f0412294b15072fa7dab71d31334014a69f953004ea7a118206fe0dd for FileArtifact(url='https://files.pythonhosted.org/packages/bc/1d/8de1e5c67099015c834315e333911273a8c6aaba78923dd1d1e25fc5f217/nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='8a7ec542f0412294b15072fa7dab71d31334014a69f953004ea7a118206fe0dd'), verified=False, filename='nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/9d264c5036dde4e64f1de8c50ae753237c12e0b1348738169cd0f8a536c0e1e0 for FileArtifact(url='https://files.pythonhosted.org/packages/44/31/4890b1c9abc496303412947fc7dcea3d14861720642b49e8ceed89636705/nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='9d264c5036dde4e64f1de8c50ae753237c12e0b1348738169cd0f8a536c0e1e0'), verified=False, filename='nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/1a6c4acefcbebfa6de320f412bf7866de856e786e0462326ba1bac40de0b5e71 for FileArtifact(url='https://files.pythonhosted.org/packages/a4/05/23f8f38eec3d28e4915725b233c24d8f1a33cb6540a882f7b54be1befa02/nvidia_nccl_cu12-2.18.1-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='1a6c4acefcbebfa6de320f412bf7866de856e786e0462326ba1bac40de0b5e71'), verified=False, filename='nvidia_nccl_cu12-2.18.1-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/66439923a30d5d48399b08a9eae10370f6c261a5ec864a64983bae63152d39d7 for FileArtifact(url='https://files.pythonhosted.org/packages/4d/22/91a8af421c8a8902dde76e6ef3db01b258af16c53d81e8c0d0dc13900a9e/triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='66439923a30d5d48399b08a9eae10370f6c261a5ec864a64983bae63152d39d7'), verified=False, filename='triton-2.1.0-0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/dc21cf308ca5691e7c04d962e213f8a4aa9bbfa23d95412f452254c2caeb09e5 for FileArtifact(url='https://files.pythonhosted.org/packages/da/d3/8057f0587683ed2fcd4dbfbdfdfa807b9160b809976099d36b8f60d08f03/nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='dc21cf308ca5691e7c04d962e213f8a4aa9bbfa23d95412f452254c2caeb09e5'), verified=False, filename='nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/af72aea155e91adfc61c3ae9e0e342dbc0cba726d6cba4b6c72c1f34e47291cd for FileArtifact(url='https://files.pythonhosted.org/packages/b7/f4/6a90020cd2d93349b442bfcb657d0dc91eee65491600b2cb1d388bc98e6b/typing_extensions-4.9.0-py3-none-any.whl', fingerprint=Fingerprint(algorithm='sha256', hash='af72aea155e91adfc61c3ae9e0e342dbc0cba726d6cba4b6c72c1f34e47291cd'), verified=False, filename='typing_extensions-4.9.0-py3-none-any.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/c3588cd4295d0c0f603d0f2ae780587e64e2efeedb3521e46b9bb1d08d184fa5 for FileArtifact(url='https://files.pythonhosted.org/packages/d2/05/e6600db80270777c4a64238a98d442f0fd07cc8915be2a1c16da7f2b9e74/sympy-1.12-py3-none-any.whl', fingerprint=Fingerprint(algorithm='sha256', hash='c3588cd4295d0c0f603d0f2ae780587e64e2efeedb3521e46b9bb1d08d184fa5'), verified=False, filename='sympy-1.12-py3-none-any.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/65c1a9bcdadc6c28eecee2c119465aebff8f7a584dd719facdd9e825ec61ab52 for FileArtifact(url='https://files.pythonhosted.org/packages/12/b3/d9ed2c0971e1435b8a62354b18d3060b66c8cb1d368399ec0b9baa7c0ee5/MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='65c1a9bcdadc6c28eecee2c119465aebff8f7a584dd719facdd9e825ec61ab52'), verified=False, filename='MarkupSafe-2.1.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/64335a8088e2b9d196ae8665430bc6a2b7e6ef2eb877a9c735c804bd4ff6467c for FileArtifact(url='https://files.pythonhosted.org/packages/1e/07/bf730d44c2fe1b676ad9cc2be5f5f861eb5d153fb6951987a2d6a96379a9/nvidia_nvjitlink_cu12-12.3.101-py3-none-manylinux1_x86_64.whl', fingerprint=Fingerprint(algorithm='sha256', hash='64335a8088e2b9d196ae8665430bc6a2b7e6ef2eb877a9c735c804bd4ff6467c'), verified=False, filename='nvidia_nvjitlink_cu12-12.3.101-py3-none-manylinux1_x86_64.whl')
pex: Using cached artifact at /home/zmanji/.pex/downloads/a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c for FileArtifact(url='https://files.pythonhosted.org/packages/43/e3/7d92a15f894aa0c9c4b49b8ee9ac9850d6e63b03c9c32c0367a13ae62209/mpmath-1.3.0-py3-none-any.whl', fingerprint=Fingerprint(algorithm='sha256', hash='a0b2b9fe80bbcd81a6647ff13108738cfb482d481d826cc0e02f5b35e5c88d2c'), verified=False, filename='mpmath-1.3.0-py3-none-any.whl')
pex: Resolving distributions :: Resolving requirements from lock file torch.lock :: Building 0 artifacts and installing 22 :: Calculating project names for direct requirements:
  PyPIRequirement(line=LogicalLine(raw_text='torch', processed_text='torch', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='torch', url=None, extras=frozenset(), specifier=<Spex: Resolving distributions :: Resolving requirements from lock file torch.lock :: Building 0 artifacts and installing 22 :: Installing 22 distributions                                                                                                                                                                                                                                                             pex: Resolving distributions: 3175.2ms                                                                                                                   
pex:   Parsing lock torch.lock: 31.1ms
pex:   Resolving requirements from lock file torch.lock: 3143.6ms
pex:     Parsing requirements: 0.5ms
pex:     Resolving urls to fetch for 1 requirements from lock torch.lock: 10.3ms
pex:     Hashing pex: 28.2ms
pex:     Isolating pex: 0.0ms
pex:     Downloading 22 distributions to satisfy 1 requirements: 8.5ms
pex:     Categorizing 22 downloaded artifacts: 0.2ms
pex:     Building 0 artifacts and installing 22: 3094.9ms
pex:       Calculating project names for direct requirements:
  PyPIRequirement(line=LogicalLine(raw_text='torch', processed_text='torch', source='<string>', start_line=1, end_line=1), requirement=Requirement(name='torch', url=None, extras=frozenset(), specifier=<SpecifierSet('')>, marker=None), editable=False): 0.0ms
pex:       Installing 22 distributions: 3026.8ms
pex: Re-writing /home/zmanji/tmp/tenv/bin/convert-caffe2-to-onnx
pex: Re-writing /home/zmanji/tmp/tenv/bin/convert-onnx-to-caffe2
pex: Re-writing /home/zmanji/tmp/tenv/bin/isympy
pex: Re-writing /home/zmanji/tmp/tenv/bin/torchrun
pex: Installing 22 wheels in venv at ./tenv: 3585.3ms

I see two seconds that appear to take some time:

Building 0 artifacts and installing 22: 3142.4ms

and

Installing 22 wheels in venv at ./tenv: 3650.0ms

Adding more -v flags doesn't shed more light on this. Is there any obvious reason why these steps would take multiple seconds?

For the first line I would expect that since the wheels are already installed to .pex/installed_wheels it should be a no-op.

For the second line since installing wheels into a venv is just creating hardlinks from the .pex/installed_wheels directory and updating the RECORD from the wheel I would expect it to be really fast.

For the second line since installing wheels into a venv is just creating hardlinks ...

Negative. When externalizing PEX never uses symlinks or hard links back into the PEX cache, it always copies. ... or it should. Digging a bit deeper here, but also, you appear to be missing updates from the last few releases that offer --no-pre-install-wheels and --max-install-jobs / PEX_MAX_INSTALL_JOBS to help deal with ridiculous distributions like Torch.

I got pretty much all of that wrong!

Ok, for the last line pex: Installing 22 wheels in venv at ./tenv: 3585.3ms - that is actually the time it takes to hardlink the site-packages tree. If I switch the symlink=False here to True I go from ~3.5s myself to ~30ms:
https://github.com/pantsbuild/pex/blob/a58848cbee4eff915a978e340c608fd9b50bfbb0/pex/cli/commands/venv.py#L311-L317

A quick experiment though shows that the times here are ~10x larger than they should be; so I need to drill into this aspect.

Next though is the pex: Building 0 artifacts and installing 22: 3094.9ms line. That appears to be all due to re-hashing as part of this optimization:
https://github.com/pantsbuild/pex/blob/7a2cee3126d05408252c8d0b103adaf13eff8fe6/pex/resolver.py#L433-L500

So I think both of these can likely be optimized, but I won't be back at a keyboard to dive in until ~January 4th.

Ok, so it turns out both of these cases of ~3s are all consumed in hashing distribution files. So the time taken at least makes sense now. This is hashing ~4GB of files x2. The trick is to see if the hashing can be avoided or amortized or parallelized.

Ok, #2315 addresses "Building 0 artifacts and installing 22: 3142.4ms" and takes it to ~10ms. There is still "Installing 22 wheels in venv at ./tenv: 3585.3ms" to improve. The issue there is the same, re-hashing all files in an installed wheel chroot, but instead of doing that to get a single chroot hash, it's being done to create a compliant RECORD for the venv (for interoperability; e.g. so you can run pip uninstall X in the venv) with a hashed / sized entry for each file in the wheel. I think I can just re-use the hash and size values from the original .whl file RECORD but I need to think through the chain of custody / security implications a bit before charging ahead there. I'll leave this issue open for that work.