tweag/rules_nixpkgs

Python toolchain does not seem to find libstdc++ when building pip package

fagg opened this issue · 4 comments

fagg commented

Describe the bug
I am trying to build detectron2 from a pip package, and I have configured the following toolchains:

nixpkgs_cc_configure(
    name = "nixpkgs_config_cc",
    repository = "@nixpkgs",
    attribute_path = "gcc10",
)

nixpkgs_python_configure(
    python3_attribute_path = "python310",
    repository = "@nixpkgs",
)

I am able to successfully build pure Python code, as well as pure C++ code. However, when building detectron2 from pip, I consistently get errors related to libstdc++.

I have configured my pip_parse target as follows:

pip_parse(
    name = "pip_deps",
    requirements_lock = "//third_party/python:requirements.txt",
    extra_pip_args = [
        "--extra-index-url",
        "https://download.pytorch.org/whl/cu118",
    ],
    isolated = False,
    python_interpreter_target = "@nixpkgs_python_toolchain_python3//:bin/python",
)

Namely, I am building detectron2 @ git+https://github.com/fagg/detectron2 (this is my own vendored version that does nothing substantive other than add a pyproject.toml file).

Upon building @pip_deps_detectron2//:all, I get this error:

amalthea$ bazel build @pip_deps_detectron2//:all
INFO: Repository pip_deps_detectron2 instantiated at:
  /home/ajf/perception/WORKSPACE:380:13: in <toplevel>
  /home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/pip_deps/requirements.bzl:49:20: in install_deps
Repository rule whl_library defined at:
  /home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/rules_python/python/pip_install/pip_repository.bzl:744:30: in <toplevel>
ERROR: An error occurred during the fetch of repository 'pip_deps_detectron2':
   Traceback (most recent call last):
        File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/rules_python/python/pip_install/pip_repository.bzl", line 605, column 13, in _whl_library_impl
                fail("whl_library %s failed: %s (%s) error code: '%s'" % (rctx.attr.name, result.stdout, result.stderr, result.return_code))
Error in fail: whl_library pip_deps_detectron2 failed: Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu118
Collecting detectron2@ git+https://github.com/fagg/detectron2 (from -r /tmp/nix-shell.5lOsih/tmpyrg49nh_ (line 1))
  Cloning https://github.com/fagg/detectron2 to /tmp/nix-shell.5lOsih/pip-wheel-eqlz_i2k/detectron2_db9b3bf07b754b798468d58b1bd4977a
  Resolved https://github.com/fagg/detectron2 to commit dadf02f9b74afcd0a8369a2d416a45babe1703d3
  Installing build dependencies: started
  Installing build dependencies: still running...
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'error'
 (  Running command git clone --filter=blob:none --quiet https://github.com/fagg/detectron2 /tmp/nix-shell.5lOsih/pip-wheel-eqlz_i2k/detectron2_db9b3bf07b754b798468d58b1bd4977a
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      Traceback (most recent call last):
        File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/pypi__pip/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/pypi__pip/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/external/pypi__pip/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
          self.run_setup()
        File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 480, in run_setup
          super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
        File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 10, in <module>
        File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/torch/__init__.py", line 234, in <module>
          _load_global_deps()
        File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/torch/__init__.py", line 193, in _load_global_deps
          raise err
        File "/tmp/nix-shell.5lOsih/pip-build-env-ppr1abcs/overlay/lib/python3.10/site-packages/torch/__init__.py", line 174, in _load_global_deps
          ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
        File "/nix/store/rc9cz7z4qlgmsbwvpw2acig5g2rdws46-python3-3.10.5/lib/python3.10/ctypes/__init__.py", line 374, in __init__
          self._handle = _dlopen(self._name, mode)
      OSError: libstdc++.so.6: cannot open shared object file: No such file or directory
      [end of output]

Similarly, building torch succeeds, but when I try to import it I get this error:

Traceback (most recent call last):
  File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/execroot/__main__/bazel-out/k8-fastbuild/bin/ml_stack/test_imports.runfiles/__main__/ml_stack/test_imports.py", line 2, in <module>
    import torch
  File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/execroot/__main__/bazel-out/k8-fastbuild/bin/ml_stack/test_imports.runfiles/pip_deps_torch/site-packages/torch/__init__.py", line 234, in <module>
    _load_global_deps()
  File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/execroot/__main__/bazel-out/k8-fastbuild/bin/ml_stack/test_imports.runfiles/pip_deps_torch/site-packages/torch/__init__.py", line 193, in _load_global_deps
    raise err
  File "/home/ajf/.cache/bazel/_bazel_ajf/fedd80db5d16551d9f1974a22f6fdf3c/execroot/__main__/bazel-out/k8-fastbuild/bin/ml_stack/test_imports.runfiles/pip_deps_torch/site-packages/torch/__init__.py", line 174, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/nix/store/rc9cz7z4qlgmsbwvpw2acig5g2rdws46-python3-3.10.5/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libstdc++.so.6: cannot open shared object file: No such file or directory
amalthea$

I suspect something is not configured right, but I just do not know what. Given that the C++ toolchain from nixpkgs seems to work correctly, I had assumed that building the bound code in a wheel would also work properly.
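
For anyone trying to narrow this down, here is a minimal sketch that reproduces the failing step outside of torch. Both tracebacks end in ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL) failing on libstdc++.so.6, so the script simply attempts the same kind of load with whichever interpreter runs it. The helper name try_dlopen is hypothetical, and the script only tells you whether the dynamic loader can resolve the library in that environment, not why.

```python
import ctypes
import os
import sys


def try_dlopen(lib_name):
    """Try to load a shared library by name.

    Returns (True, "") on success, or (False, <loader error message>) on failure.
    """
    try:
        # Same call shape as the one at the bottom of the torch traceback.
        ctypes.CDLL(lib_name, mode=ctypes.RTLD_GLOBAL)
        return True, ""
    except OSError as err:
        return False, str(err)


if __name__ == "__main__":
    # Show which interpreter is running and what the loader environment looks like.
    print("interpreter:", sys.executable)
    print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
    ok, message = try_dlopen("libstdc++.so.6")
    print("libstdc++.so.6:", "loaded" if ok else "FAILED: " + message)
```

Running this once with the Nix-provided interpreter from a plain shell and once as a py_binary under bazel run should show whether the two environments differ.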

As some extra troubleshooting steps (after reading this ticket: #177), I added the following to my bazelrc:

build --host_platform=@rules_nixpkgs_core//platforms:host
build --crosstool_top=@nixpkgs_config_cc//:toolchain
build --action_env=BAZEL_DO_NOT_DETECT_CPP_TOOLCHAIN=1

I added these because I wondered whether Bazel was somehow picking up a system toolchain (even though one is not even installed).

To Reproduce
Configure as above, and try to build detectron2 or torch.

Expected behavior
I had expected that building the relevant parts of detectron2 via the pip target would've had the environment passed through properly.

Environment
I am running Ubuntu 22.04 on WSL. Specifically, I am using nixpkgs 23.11 with gcc10Stdenv.

http_archive(
    name = "io_tweag_rules_nixpkgs",
    sha256 = "980edfceef2e59e1122d9be6c52413bc298435f0a3d452532b8a48d7562ffd67",
    strip_prefix = "rules_nixpkgs-0.10.0",
    urls = ["https://github.com/tweag/rules_nixpkgs/releases/download/v0.10.0/rules_nixpkgs-0.10.0.tar.gz"],
)
amalthea$ nix --version
nix (Nix) 2.18.1

Here is my flake.lock:

{
  "nodes": {
    "flake-utils": {
      "inputs": {
        "systems": "systems"
      },
      "locked": {
        "lastModified": 1701680307,
        "narHash": "sha256-kAuep2h5ajznlPMD9rnQyffWG8EM/C73lejGofXvdM8=",
        "owner": "numtide",
        "repo": "flake-utils",
        "rev": "4022d587cbbfd70fe950c1e2083a02621806a725",
        "type": "github"
      },
      "original": {
        "owner": "numtide",
        "repo": "flake-utils",
        "type": "github"
      }
    },
    "flake-utils_2": {
      "locked": {
        "lastModified": 1659877975,
        "narHash": "sha256-zllb8aq3YO3h8B/U0/J1WBgAL8EX5yWf5pMj3G0NAmc=",
        "owner": "numtide",
        "repo": "flake-utils",
        "rev": "c0e246b9b83f637f4681389ecabcb2681b4f3af0",
        "type": "github"
      },
      "original": {
        "owner": "numtide",
        "repo": "flake-utils",
        "type": "github"
      }
    },
    "nixgl": {
      "inputs": {
        "flake-utils": "flake-utils_2",
        "nixpkgs": "nixpkgs"
      },
      "locked": {
        "lastModified": 1685908677,
        "narHash": "sha256-E4zUPEUFyVWjVm45zICaHRpfGepfkE9Z2OECV9HXfA4=",
        "owner": "guibou",
        "repo": "nixGL",
        "rev": "489d6b095ab9d289fe11af0219a9ff00fe87c7c5",
        "type": "github"
      },
      "original": {
        "owner": "guibou",
        "repo": "nixGL",
        "type": "github"
      }
    },
    "nixpkgs": {
      "locked": {
        "lastModified": 1660551188,
        "narHash": "sha256-a1LARMMYQ8DPx1BgoI/UN4bXe12hhZkCNqdxNi6uS0g=",
        "owner": "nixos",
        "repo": "nixpkgs",
        "rev": "441dc5d512153039f19ef198e662e4f3dbb9fd65",
        "type": "github"
      },
      "original": {
        "owner": "nixos",
        "repo": "nixpkgs",
        "type": "github"
      }
    },
    "nixpkgs_2": {
      "locked": {
        "lastModified": 1702233072,
        "narHash": "sha256-H5G2wgbim2Ku6G6w+NSaQaauv6B6DlPhY9fMvArKqRo=",
        "owner": "NixOS",
        "repo": "nixpkgs",
        "rev": "781e2a9797ecf0f146e81425c822dca69fe4a348",
        "type": "github"
      },
      "original": {
        "owner": "NixOS",
        "ref": "nixos-23.11",
        "repo": "nixpkgs",
        "type": "github"
      }
    },
    "root": {
      "inputs": {
        "flake-utils": "flake-utils",
        "nixgl": "nixgl",
        "nixpkgs": "nixpkgs_2"
      }
    },
    "systems": {
      "locked": {
        "lastModified": 1681028828,
        "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=",
        "owner": "nix-systems",
        "repo": "default",
        "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e",
        "type": "github"
      },
      "original": {
        "owner": "nix-systems",
        "repo": "default",
        "type": "github"
      }
    }
  },
  "root": "root",
  "version": 7
}

Additionally, here is my devshell derivation:


{ pkgs, myPkgSets }:

let
  packageSet = myPkgSets.basePkgs ++ myPkgSets.buildOnlyPkgs ++ myPkgSets.cudaPkgs ++ myPkgSets.miscPkgs;
in
pkgs.mkShell {
  name = "myDevShell";
  stdenv = pkgs.gcc10Stdenv;
  propagatedBuildInputs = packageSet;

  shellHook = ''
  '';
}

(You will notice I am not setting LD_LIBRARY_PATH. Nothing I have tried there has worked, presumably because Bazel either ignores it or it makes things worse by causing dynamic link failures.)

Where the package sets are:


  basePkgs = with pkgs; [
    gcc10Stdenv
    cacert
    coreutils
    gdal
    glfw
    libGL
    libGL.dev
    libGLU
    nix
    opencv4WithoutCuda
    proj
    xz
    zlib
  ];

  buildOnlyPkgs = with pkgs; [ bazel_6 gcc10 ];

  cudaPkgs = with pkgs; [
    cudaPackages.cudatoolkit
    cudaPackages.libcusparse
    cudaPackages.cudnn
    linuxPackages.nvidia_x11
  ];

  miscPkgs = with pkgs; [
    bazel-buildtools
    clang-tools
    gdb
    nixfmt
  ];

Additional context
This worked fine until I started using rules_nixpkgs to bring in the compiler and interpreter directly from nixpkgs. If I just let the flake make them available and let Bazel discover them automatically as though they were a system toolchain (which they practically are, since the system has neither Python nor gcc installed), everything works.

Thank you for these rules. I am not new to Bazel, but am new to nix, so please let me know if I've omitted something that might be useful to you or anything that might help figure this out.

Thank you for the detailed report! pip_parse is a repository rule, and toolchain information is not yet available when repository rules are executed. Instead, they try to discover tools such as compilers in the environment. I haven't done a deep dive into this, but it's possible that this is causing the issue.

rules_nixpkgs has a way to import Python packages from Nix directly with nixpkgs_python_repository. Have you tried that approach?
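
To see what "discovering tools in the environment" might find, a minimal sketch follows. It reports which compiler-like executables resolve on PATH and the values of a few environment variables. The function name discover_tools and the particular lists of tool names and variables are assumptions chosen for illustration; which of them pip or a given build backend actually consults depends on the package being built.

```python
import os
import shutil


def discover_tools(tool_names, env_names, environ=None):
    """Report where each tool resolves on PATH and the value of each variable.

    `environ` defaults to the current process environment; passing a dict
    makes it possible to inspect a hypothetical environment instead.
    """
    environ = os.environ if environ is None else environ
    search_path = environ.get("PATH", "")
    tools = {name: shutil.which(name, path=search_path) for name in tool_names}
    variables = {name: environ.get(name) for name in env_names}
    return tools, variables


if __name__ == "__main__":
    # Illustrative choices only, not an exhaustive list of what a build may use.
    tools, variables = discover_tools(
        ["cc", "c++", "gcc", "g++"],
        ["CC", "CXX", "LDSHARED", "LD_LIBRARY_PATH"],
    )
    for name, location in tools.items():
        print(f"{name}: {location or '<not found>'}")
    for name, value in variables.items():
        print(f"{name}={value if value is not None else '<unset>'}")
```

Running it with the same interpreter that pip_parse is pointed at gives a rough picture of what a wheel build started from a repository rule could pick up.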

fagg commented

Thanks for your reply, Andreas.

I guess that would make sense. I was unaware that the pip rules are not toolchain aware (this seems counterintuitive, especially since they have an option that specifically lets you pass in the Python interpreter). I can confirm that the Python interpreter being used by pip is the one provided by Nix. I wonder if we need to set some environment variables and pass them in via --action_env. I'll give this a try today and see what I come up with.

Your suggestion regarding nixpkgs_python_repository is also a good one, and one I may end up trying. The main problem I foresee is that we might want to run packages that have not been ported to Nix, which pip would seem to avoid. We shall see.

fagg commented

So Nix was indeed the right way forward with this. Far fewer packages were missing than I anticipated, and those that are missing seem trivial to either vendor in-source or port.

Even the complicated ones like torch-bin seem to work fine, and this avoids my problems with libstdc++ not being found.

Thanks for your help again.

@fagg Thanks for the update! I'm glad to hear that you were able to resolve the issue this way.

It sounds to me like this issue can be closed then. Feel free to reopen if I'm overlooking something.