Python toolchain does not seem to find libstdc++ when building pip package
fagg opened this issue · 4 comments
Describe the bug
I am trying to build detectron2 from a pip package, and I have configured the following toolchains:
nixpkgs_cc_configure(
name = "nixpkgs_config_cc",
repository = "@nixpkgs",
attribute_path = "gcc10",
)
nixpkgs_python_configure(
python3_attribute_path = "python310",
repository = "@nixpkgs",
)
I am able to successfully build pure Python code, as well as pure C++ code. However, when building detectron2 from pip, I consistently get errors related to libstdc++.
I have configured my pip_parse target as follows:
pip_parse(
name = "pip_deps",
requirements_lock = "//third_party/python:requirements.txt",
extra_pip_args = [
"--extra-index-url", "https://download.pytorch.org/whl/cu118",
],
isolated = False,
python_interpreter_target = "@nixpkgs_python_toolchain_python3//:bin/python",
)
Namely, I am building detectron2 @ git+https://github.com/fagg/detectron2 (this is my own vendored version that does nothing substantive other than add a pyproject.toml file).
Upon building @pip_deps_detectron2//:all, I get this error:
amalthea$ bazel build @pip_deps_detectron2//:all [150/1923]
INFO: Repository pip_deps_detectron2 instantiated at:
/home/ajf/perception/WORKSPACE:380:13: in <toplevel>
/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/pip_deps/requirements.bzl:49:20: in install_deps
Repository rule whl_library defined at:
/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/rules_python/python/pip_install/pip_repository.bzl:744:30: in <toplevel>
ERROR: An error occurred during the fetch of repository 'pip_deps_detectron2':
Traceback (most recent call last):
File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/rules_python/python/pip_install/pip_repository.bzl", line 605, column 13, in _whl_library_impl
fail("whl_library %s failed: %s (%s) error code: '%s'" % (rctx.attr.name, result.stdout, result.stderr, result.return_code))
Error in fail: whl_library pip_deps_detectron2 failed: Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu118
Collecting detectron2@ git+https://github.com/fagg/detectron2 (from -r /tmp/nix-shell.5lOsih/tmpyrg49nh_ (line 1))
Cloning https://github.com/fagg/detectron2 to /tmp/nix-shell.5lOsih/pip-wheel-eqlz_i2k/detectron2_db9b3bf07b754b798468d58b1bd4977a
Resolved https://github.com/fagg/detectron2 to commit dadf02f9b74afcd0a8369a2d416a45babe1703d3
Installing build dependencies: started
Installing build dependencies: still running...
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'error'
( Running command git clone --filter=blob:none --quiet https://github.com/fagg/detectron2 /tmp/nix-shell.5lOsih/pip-wheel-eqlz_i2k/detectron2_db9b3bf07b754b798468d58b1bd4977a
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [25 lines of output]
Traceback (most recent call last):
File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/pypi__pip/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/pypi__pip/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/pypi__pip/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
self.run_setup()
File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 480, in run_setup
super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 311, in run_setup
exec(code, locals())
File "<string>", line 10, in <module>
File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/torch/__init__.py", line 234, in <module>
_load_global_deps()
File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/torch/__init__.py", line 193, in _load_global_deps
raise err
File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/torch/__init__.py", line 174, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/nix/store/rc9cz7z4qlgmsbwvpw2acig5g2rdws46-python3-3.10.5/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libstdc++.so.6: cannot open shared object file: No such file or directory
[end of output]
Similarly, building torch succeeds, but when I try to import it I get this error:
Traceback (most recent call last):
File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/execroot/__main__/bazel-out/k8-fastbuild/bin/ml_stack/test_imports.runfiles/__main__/ml_stack/test_imports.py", line 2, in <module>
import torch
File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/execroot/__main__/bazel-out/k8-fastbuild/bin/ml_stack/test_imports.runfiles/pip_deps_torch/site-packages/torch/__init__.py", line 234, in <module>
_load_global_deps()
File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/execroot/__main__/bazel-out/k8-fastbuild/bin/ml_stack/test_imports.runfiles/pip_deps_torch/site-packages/torch/__init__.py", line 193, in _load_global_deps
raise err
File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/execroot/__main__/bazel-out/k8-fastbuild/bin/ml_stack/test_imports.runfiles/pip_deps_torch/site-packages/torch/__init__.py", line 174, in _load_global_deps
ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
File "/nix/store/rc9cz7z4qlgmsbwvpw2acig5g2rdws46-python3-3.10.5/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libstdc++.so.6: cannot open shared object file: No such file or directory
amalthea$
I suspect something is not configured right, but I just do not know what. Given that the C++ toolchain from nixpkgs seems to work correctly, I had assumed that building the bound code in a wheel would also work properly.
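Both tracebacks end in the same ctypes.CDLL call failing because the dynamic loader cannot resolve libstdc++.so.6. A minimal sketch of a standalone check, run with the same Nix-provided interpreter, might help separate the loader problem from torch itself. The helper name can_load is hypothetical and not part of torch or rules_python:

```python
import ctypes
import os


def can_load(lib_name: str) -> bool:
    """Return True if the dynamic loader can open lib_name in this process."""
    try:
        # Same loading mode that appears in the torch traceback.
        ctypes.CDLL(lib_name, mode=ctypes.RTLD_GLOBAL)
        return True
    except OSError:
        # Raised when the loader cannot find or open the shared object.
        return False


if __name__ == "__main__":
    # Show the search path the loader inherited from the environment.
    print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    print("libstdc++.so.6 loadable:", can_load("libstdc++.so.6"))
```

If this reports that the library is not loadable when run from a py_binary but loadable from the dev shell, that would point at the environment the process runs in rather than at the wheel.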
As some extra troubleshooting steps (after reading this ticket: #177), I added the following to my bazelrc:
build --host_platform=@rules_nixpkgs_core//platforms:host
build --crosstool_top=@nixpkgs_config_cc//:toolchain
build --action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1
I did this because I wondered whether it was somehow picking up a system toolchain (even though one is not even installed).
To Reproduce
Configure as above, and try to build detectron2 or torch.
Expected behavior
I had expected that building the relevant parts of detectron2 via the pip target would've had the environment passed through properly.
Environment
I am running Ubuntu 22.04 on WSL. Specifically, I am using nixpkgs 23.11 with gcc10Stdenv.
http_archive(
name = "io_tweag_rules_nixpkgs",
sha256 = "980edfceef2e59e1122d9be6c52413bc298435f0a3d452532b8a48d7562ffd67",
strip_prefix = "rules_nixpkgs-0.10.0",
urls = ["https://github.com/tweag/rules_nixpkgs/releases/download/v0.10.0/rules_nixpkgs-0.10.0.tar.gz"],
)
amalthea$ nix --version
nix (Nix) 2.18.1
Here is my flake.lock:
{
"nodes": {
"flake-utils": {
"inputs": {
"systems": "systems"
},
"locked": {
"lastModified": 1701680307,
"narHash": "sha256-kAuep2h5ajznlPMD9rnQyffWG8EM/C73lejGofXvdM8=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "4022d587cbbfd70fe950c1e2083a02621806a725",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"flake-utils_2": {
"locked": {
"lastModified": 1659877975,
"narHash": "sha256-zllb8aq3YO3h8B/U0/J1WBgAL8EX5yWf5pMj3G0NAmc=",
"owner": "numtide",
"repo": "flake-utils",
"rev": "c0e246b9b83f637f4681389ecabcb2681b4f3af0",
"type": "github"
},
"original": {
"owner": "numtide",
"repo": "flake-utils",
"type": "github"
}
},
"nixgl": {
"inputs": {
"flake-utils": "flake-utils_2",
"nixpkgs": "nixpkgs"
},
"locked": {
"lastModified": 1685908677,
"narHash": "sha256-E4zUPEUFyVWjVm45zICaHRpfGepfkE9Z2OECV9HXfA4=",
"owner": "guibou",
"repo": "nixGL",
"rev": "489d6b095ab9d289fe11af0219a9ff00fe87c7c5",
"type": "github"
},
"original": {
"owner": "guibou",
"repo": "nixGL",
"type": "github"
}
},
"nixpkgs": {
"locked": {
"lastModified": 1660551188,
"narHash": "sha256-a1LARMMYQ8DPx1BgoI/UN4bXe12hhZkCNqdxNi6uS0g=",
"owner": "nixos",
"repo": "nixpkgs",
"rev": "441dc5d512153039f19ef198e662e4f3dbb9fd65",
"type": "github"
},
"original": {
"owner": "nixos",
"repo": "nixpkgs",
"type": "github"
}
},
"nixpkgs_2": {
"locked": {
"lastModified": 1702233072,
"narHash": "sha256-H5G2wgbim2Ku6G6w+NSaQaauv6B6DlPhY9fMvArKqRo=",
"owner": "NixOS",
"repo": "nixpkgs",
"rev": "781e2a9797ecf0f146e81425c822dca69fe4a348",
"type": "github"
},
"original": {
"owner": "NixOS",
"ref": "nixos-23.11",
"repo": "nixpkgs",
"type": "github"
}
},
"root": {
"inputs": {
"flake-utils": "flake-utils",
"nixgl": "nixgl",
"nixpkgs": "nixpkgs_2"
}
},
"systems": {
"locked": {
"lastModified": 1681028828,
"narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
"owner": "nix-systems",
"repo": "default",
"rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
"type": "github"
},
"original": {
"owner": "nix-systems",
"repo": "default",
"type": "github"
}
}
},
"root": "root",
"version": 7
}
Additionally, here is my devshell derivation:
{ pkgs, myPkgSets }:
let
packageSet = myPkgSets.basePkgs ++ myPkgSets.buildOnlyPkgs ++ myPkgSets.cudaPkgs ++ myPkgSets.miscPkgs;
in
pkgs.mkShell {
name = "myDevShell";
stdenv = pkgs.gcc10Stdenv;
propagatedBuildInputs = packageSet;
shellHook = ''
'';
}
(You will notice I am not setting LD_LIBRARY_PATH. Nothing I have tried there has worked, presumably because Bazel either ignores it or it makes things worse, causing dynamic link failures.)
Where the package sets are:
basePkgs = with pkgs; [
gcc10Stdenv
cacert
coreutils
gdal
glfw
libGL
libGL.dev
libGLU
nix
opencv4WithoutCuda
proj
xz
zlib
];
buildOnlyPkgs = with pkgs; [ bazel_6 gcc10 ];
cudaPkgs = with pkgs; [
cudaPackages.cudatoolkit
cudaPackages.libcusparse
cudaPackages.cudnn
linuxPackages.nvidia_x11
];
miscPkgs = with pkgs; [
bazel-buildtools
clang-tools
gdb
nixfmt
];
Additional context
This worked fine until I started using rules_nixpkgs to bring in the compiler and interpreter directly from nixpkgs. If I just let the flake make them available and let Bazel discover them automatically as though they were a system toolchain (which they practically are, since the system has neither Python nor gcc installed), this works just fine.
Thank you for these rules. I am not new to Bazel, but am new to nix, so please let me know if I've omitted something that might be useful to you or anything that might help figure this out.
Thank you for the detailed report!
pip_parse is a repository rule, and toolchain information is not yet available when repository rules are executed. Instead, they try to discover tools like compilers in the environment. I haven't done a deep dive into this, but it's possible that this is causing the issue.
rules_nixpkgs has a way to import Python packages from Nix directly with nixpkgs_python_repository. Have you tried that approach?
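One rough way to test the hypothesis that the tools only see what is in their inherited environment is to probe from a child process with a controlled LD_LIBRARY_PATH. The sketch below is an illustration only: it does not reproduce what pip_parse or Bazel actually do, and the function name loads_in_child is hypothetical.

```python
import os
import subprocess
import sys

# One-liner run in the child interpreter: try to dlopen the named library.
PROBE = "import ctypes, sys; ctypes.CDLL(sys.argv[1])"


def loads_in_child(lib_name, lib_dirs=()):
    """Return True if a child interpreter can open lib_name.

    lib_dirs, if given, becomes the child's LD_LIBRARY_PATH; otherwise the
    variable is removed so the child sees only the loader's default paths.
    """
    env = dict(os.environ)
    if lib_dirs:
        env["LD_LIBRARY_PATH"] = os.pathsep.join(lib_dirs)
    else:
        env.pop("LD_LIBRARY_PATH", None)
    result = subprocess.run(
        [sys.executable, "-c", PROBE, lib_name],
        env=env,
        capture_output=True,
    )
    # A non-zero exit code means the child raised, typically OSError.
    return result.returncode == 0


if __name__ == "__main__":
    print("without LD_LIBRARY_PATH:", loads_in_child("libstdc++.so.6"))
```

Passing the directory that holds libstdc++.so.6 in the Nix store as lib_dirs (the exact path is installation-specific) and comparing the two results would show whether the missing search path alone explains the failure.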
Thanks for your reply, Andreas.
I guess that would make sense. I was unaware that the pip rules were not toolchain aware (this seems counterintuitive, especially when they have an option that specifically allows you to pass in the Python interpreter). I can confirm that the Python interpreter being used by pip is the one provided by Nix. I wonder if we need to set some environment variables and pass them in via --action_env. I'll give this a try today and see what I come up with.
Your suggestion regarding nixpkgs_python_repository is also a good one, and one I may end up trying. The main problem I foresee is that we might want to run packages that are not ported to Nix, and pip would seem to avoid that. We shall see.
So Nix was indeed the right way forward with this. There were far fewer packages missing than I anticipated, and those that are missing seem trivial to either vendor in-source or just port.
Even the complicated ones like torch-bin seem to work fine, and this avoids my problems with libstdc++ not being found.
Thanks for your help again.