NVIDIA/NeMo-Curator

Unable to install [BUG]

mike2463 opened this issue · 3 comments

Describe the bug

I have followed the install instructions and I keep getting an error.

Steps/Code to reproduce bug

Try to install the library.
After the failure I get this message, and as far as I know I have everything correct for the install:
Here is some debug information about your platform to include in any bug
report:

  Python Version: CPython 3.10.14
  Operating System: Windows 10
  CPU Architecture: AMD64
  Driver Version: 546.59
  CUDA Version: 12.3

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.

Expected behavior

A clear and concise description of what you expected to happen.

Environment overview (please complete the following information)
Installing on a Dell PC with an NVIDIA GPU

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider - AWS, Azure, GCP, Collab)]
  • Method of NeMo-Curator install: [pip install or from source]. Please specify exact commands you used to install.
  • If method of install is [Docker], provide docker pull & docker run commands used

Environment details

If NVIDIA docker image is used you don't need to specify these.
Otherwise, please provide:

  • OS version
  • Dask version
  • Python version

Additional context

Add any other context about the problem here.

Hello!

I have followed the install instructions and I keep getting an error.

Could you please list out the commands that you ran, and the point at which you receive the error? Please paste the full error log here as well.

(nemo) C:\Users\micha\Documents>git clone https://github.com/NVIDIA/NeMo-Curator.git
Cloning into 'NeMo-Curator'...
remote: Enumerating objects: 860, done.
remote: Counting objects: 100% (372/372), done.
remote: Compressing objects: 100% (202/202), done.
remote: Total 860 (delta 268), reused 211 (delta 170), pack-reused 488
Receiving objects: 100% (860/860), 540.29 KiB | 88.00 KiB/s, done.
Resolving deltas: 100% (530/530), done.

(nemo) C:\Users\micha\Documents>cd NeMo-Curator

(nemo) C:\Users\micha\Documents\NeMo-Curator>pip install --extra-index-url https://pypi.nvidia.com ".[cuda12x]"
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Processing c:\users\micha\documents\nemo-curator
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting mwparserfromhell@ git+https://github.com/earwig/mwparserfromhell.git@0f89f44 (from nemo_curator==0.2.0)
Cloning https://github.com/earwig/mwparserfromhell.git (to revision 0f89f44) to c:\users\micha\appdata\local\temp\pip-install-oclk1au1\mwparserfromhell_302bb5bc64944e77bb52a29efef2191d
Running command git clone --filter=blob:none --quiet https://github.com/earwig/mwparserfromhell.git 'C:\Users\micha\AppData\Local\Temp\pip-install-oclk1au1\mwparserfromhell_302bb5bc64944e77bb52a29efef2191d'
WARNING: Did not find branch or tag '0f89f44', assuming revision or ref.
Running command git checkout -q 0f89f44
Resolved https://github.com/earwig/mwparserfromhell.git to commit 0f89f44
Preparing metadata (setup.py) ... done
Collecting crossfit@ git+https://github.com/rapidsai/crossfit.git@1ee3de4 (from nemo_curator==0.2.0)
Cloning https://github.com/rapidsai/crossfit.git (to revision 1ee3de4) to c:\users\micha\appdata\local\temp\pip-install-oclk1au1\crossfit_e74a9ddbd26e4934a17dbc772b179861
Running command git clone --filter=blob:none --quiet https://github.com/rapidsai/crossfit.git 'C:\Users\micha\AppData\Local\Temp\pip-install-oclk1au1\crossfit_e74a9ddbd26e4934a17dbc772b179861'
WARNING: Did not find branch or tag '1ee3de4', assuming revision or ref.
Running command git checkout -q 1ee3de4
Resolved https://github.com/rapidsai/crossfit.git to commit 1ee3de4
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Collecting dask>=2021.7.1 (from dask[complete]>=2021.7.1->nemo_curator==0.2.0)
Downloading dask-2024.5.1-py3-none-any.whl.metadata (3.8 kB)
Collecting distributed>=2021.7.1 (from nemo_curator==0.2.0)
Downloading distributed-2024.5.1-py3-none-any.whl.metadata (3.4 kB)
Collecting dask-mpi>=2021.11.0 (from nemo_curator==0.2.0)
Downloading dask_mpi-2022.4.0-py3-none-any.whl.metadata (2.0 kB)
Collecting charset-normalizer>=3.1.0 (from nemo_curator==0.2.0)
Using cached charset_normalizer-3.3.2-cp310-cp310-win_amd64.whl.metadata (34 kB)
Collecting awscli>=1.22.55 (from nemo_curator==0.2.0)
Downloading awscli-1.32.110-py3-none-any.whl.metadata (11 kB)
Collecting fasttext==0.9.2 (from nemo_curator==0.2.0)
Downloading fasttext-0.9.2.tar.gz (68 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 68.8/68.8 kB 340.8 kB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting pycld2==0.41 (from nemo_curator==0.2.0)
Downloading pycld2-0.41.tar.gz (41.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 41.4/41.4 MB 1.3 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting justext==3.0.0 (from nemo_curator==0.2.0)
Downloading jusText-3.0.0-py2.py3-none-any.whl.metadata (6.8 kB)
Collecting ftfy==6.1.1 (from nemo_curator==0.2.0)
Downloading ftfy-6.1.1-py3-none-any.whl.metadata (6.1 kB)
Collecting warcio==1.7.4 (from nemo_curator==0.2.0)
Downloading warcio-1.7.4-py2.py3-none-any.whl.metadata (15 kB)
Collecting zstandard==0.18.0 (from nemo_curator==0.2.0)
Downloading zstandard-0.18.0-cp310-cp310-win_amd64.whl.metadata (2.6 kB)
Collecting in-place==0.5.0 (from nemo_curator==0.2.0)
Downloading in_place-0.5.0-py3-none-any.whl.metadata (9.2 kB)
Collecting unidic-lite==1.0.8 (from nemo_curator==0.2.0)
Downloading unidic-lite-1.0.8.tar.gz (47.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 47.4/47.4 MB 1.6 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting jieba==0.42.1 (from nemo_curator==0.2.0)
Downloading jieba-0.42.1.tar.gz (19.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19.2/19.2 MB 1.4 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting comment-parser (from nemo_curator==0.2.0)
Downloading comment_parser-1.2.4.tar.gz (8.3 kB)
Preparing metadata (setup.py) ... done
Collecting beautifulsoup4 (from nemo_curator==0.2.0)
Using cached beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting spacy<4.0.0,>=3.6.0 (from nemo_curator==0.2.0)
Using cached spacy-3.7.4-cp310-cp310-win_amd64.whl.metadata (27 kB)
Collecting presidio-analyzer==2.2.351 (from nemo_curator==0.2.0)
Downloading presidio_analyzer-2.2.351-py3-none-any.whl.metadata (2.5 kB)
Collecting presidio-anonymizer==2.2.351 (from nemo_curator==0.2.0)
Downloading presidio_anonymizer-2.2.351-py3-none-any.whl.metadata (8.0 kB)
Collecting usaddress==0.5.10 (from nemo_curator==0.2.0)
Downloading usaddress-0.5.10-py2.py3-none-any.whl.metadata (1.5 kB)
Collecting nemo-toolkit>=1.23.0 (from nemo-toolkit[nlp]>=1.23.0->nemo_curator==0.2.0)
Downloading nemo_toolkit-1.23.0-py3-none-any.whl.metadata (18 kB)
Collecting lxml[html_clean] (from nemo_curator==0.2.0)
Downloading lxml-5.2.2-cp310-cp310-win_amd64.whl.metadata (3.5 kB)
Collecting pybind11>=2.2 (from fasttext==0.9.2->nemo_curator==0.2.0)
Using cached pybind11-2.12.0-py3-none-any.whl.metadata (9.5 kB)
Requirement already satisfied: setuptools>=0.7.0 in c:\users\micha\anaconda3\envs\nemo\lib\site-packages (from fasttext==0.9.2->nemo_curator==0.2.0) (69.5.1)
Collecting numpy (from fasttext==0.9.2->nemo_curator==0.2.0)
Using cached numpy-1.26.4-cp310-cp310-win_amd64.whl.metadata (61 kB)
Collecting wcwidth>=0.2.5 (from ftfy==6.1.1->nemo_curator==0.2.0)
Downloading wcwidth-0.2.13-py2.py3-none-any.whl.metadata (14 kB)
Collecting regex (from presidio-analyzer==2.2.351->nemo_curator==0.2.0)
Downloading regex-2024.5.15-cp310-cp310-win_amd64.whl.metadata (41 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 42.0/42.0 kB 991.7 kB/s eta 0:00:00
Collecting tldextract (from presidio-analyzer==2.2.351->nemo_curator==0.2.0)
Downloading tldextract-5.1.2-py3-none-any.whl.metadata (11 kB)
Collecting pyyaml (from presidio-analyzer==2.2.351->nemo_curator==0.2.0)
Using cached PyYAML-6.0.1-cp310-cp310-win_amd64.whl.metadata (2.1 kB)
Collecting phonenumbers<9.0.0,>=8.12 (from presidio-analyzer==2.2.351->nemo_curator==0.2.0)
Downloading phonenumbers-8.13.37-py2.py3-none-any.whl.metadata (11 kB)
Collecting pycryptodome>=3.10.1 (from presidio-anonymizer==2.2.351->nemo_curator==0.2.0)
Downloading pycryptodome-3.20.0-cp35-abi3-win_amd64.whl.metadata (3.4 kB)
Collecting future>=0.14 (from usaddress==0.5.10->nemo_curator==0.2.0)
Downloading future-1.0.0-py3-none-any.whl.metadata (4.0 kB)
Collecting probableparsing (from usaddress==0.5.10->nemo_curator==0.2.0)
Downloading probableparsing-0.0.1-py2.py3-none-any.whl.metadata (908 bytes)
Collecting python-crfsuite>=0.7 (from usaddress==0.5.10->nemo_curator==0.2.0)
Downloading python_crfsuite-0.9.10-cp310-cp310-win_amd64.whl.metadata (4.3 kB)
Collecting six (from warcio==1.7.4->nemo_curator==0.2.0)
Using cached six-1.16.0-py2.py3-none-any.whl.metadata (1.8 kB)
Collecting cudf-cu12>=24.2 (from nemo_curator==0.2.0)
Downloading cudf_cu12-24.4.1.tar.gz (2.6 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... error
error: subprocess-exited-with-error

× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [51 lines of output]
Traceback (most recent call last):
File "C:\Users\micha\AppData\Local\Temp\pip-build-env-qza6frh0\overlay\Lib\site-packages\nvidia_stub\wheel.py", line 147, in download_wheel
return download_manual(wheel_directory, distribution, version)
File "C:\Users\micha\AppData\Local\Temp\pip-build-env-qza6frh0\overlay\Lib\site-packages\nvidia_stub\wheel.py", line 114, in download_manual
raise RuntimeError(f"Didn't find wheel for {distribution} {version}")
RuntimeError: Didn't find wheel for cudf-cu12 24.4.1

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "C:\Users\micha\anaconda3\envs\nemo\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
      main()
    File "C:\Users\micha\anaconda3\envs\nemo\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "C:\Users\micha\anaconda3\envs\nemo\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 152, in prepare_metadata_for_build_wheel
      whl_basename = backend.build_wheel(metadata_directory, config_settings)
    File "C:\Users\micha\AppData\Local\Temp\pip-build-env-qza6frh0\overlay\Lib\site-packages\nvidia_stub\buildapi.py", line 29, in build_wheel
      return download_wheel(pathlib.Path(wheel_directory), config_settings)
    File "C:\Users\micha\AppData\Local\Temp\pip-build-env-qza6frh0\overlay\Lib\site-packages\nvidia_stub\wheel.py", line 149, in download_wheel
      report_install_failure(distribution, version, exception_context)
    File "C:\Users\micha\AppData\Local\Temp\pip-build-env-qza6frh0\overlay\Lib\site-packages\nvidia_stub\error.py", line 63, in report_install_failure
      raise InstallFailedError(
  nvidia_stub.error.InstallFailedError:
  *******************************************************************************

  The installation of cudf-cu12 for version 24.4.1 failed.

  This is a special placeholder package which downloads a real wheel package
  from https://pypi.nvidia.com. If https://pypi.nvidia.com is not reachable, we
  cannot download the real wheel file to install.

  You might try installing this package via
  ```
  $ pip install --extra-index-url https://pypi.nvidia.com cudf-cu12
  ```

  Here is some debug information about your platform to include in any bug
  report:

  Python Version: CPython 3.10.14
  Operating System: Windows 10
  CPU Architecture: AMD64
  Driver Version: 546.59
  CUDA Version: 12.3

  *******************************************************************************

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

(nemo) C:\Users\micha\Documents\NeMo-Curator>pip install --extra-index-url https://pypi.nvidia.com cudf-cu12
Looking in indexes: https://pypi.org/simple, https://pypi.nvidia.com
Collecting cudf-cu12
Using cached cudf_cu12-24.4.1.tar.gz (2.6 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... error
error: subprocess-exited-with-error

× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [51 lines of output]
Traceback (most recent call last):
File "C:\Users\micha\AppData\Local\Temp\pip-build-env-oescv4hq\overlay\Lib\site-packages\nvidia_stub\wheel.py", line 147, in download_wheel
return download_manual(wheel_directory, distribution, version)
File "C:\Users\micha\AppData\Local\Temp\pip-build-env-oescv4hq\overlay\Lib\site-packages\nvidia_stub\wheel.py", line 114, in download_manual
raise RuntimeError(f"Didn't find wheel for {distribution} {version}")
RuntimeError: Didn't find wheel for cudf-cu12 24.4.1

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "C:\Users\micha\anaconda3\envs\nemo\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
      main()
    File "C:\Users\micha\anaconda3\envs\nemo\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "C:\Users\micha\anaconda3\envs\nemo\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 152, in prepare_metadata_for_build_wheel
      whl_basename = backend.build_wheel(metadata_directory, config_settings)
    File "C:\Users\micha\AppData\Local\Temp\pip-build-env-oescv4hq\overlay\Lib\site-packages\nvidia_stub\buildapi.py", line 29, in build_wheel
      return download_wheel(pathlib.Path(wheel_directory), config_settings)
    File "C:\Users\micha\AppData\Local\Temp\pip-build-env-oescv4hq\overlay\Lib\site-packages\nvidia_stub\wheel.py", line 149, in download_wheel
      report_install_failure(distribution, version, exception_context)
    File "C:\Users\micha\AppData\Local\Temp\pip-build-env-oescv4hq\overlay\Lib\site-packages\nvidia_stub\error.py", line 63, in report_install_failure
      raise InstallFailedError(
  nvidia_stub.error.InstallFailedError:
  *******************************************************************************

  The installation of cudf-cu12 for version 24.4.1 failed.

  This is a special placeholder package which downloads a real wheel package
  from https://pypi.nvidia.com. If https://pypi.nvidia.com is not reachable, we
  cannot download the real wheel file to install.

  You might try installing this package via
  ```
  $ pip install --extra-index-url https://pypi.nvidia.com cudf-cu12
  ```

  Here is some debug information about your platform to include in any bug
  report:

  Python Version: CPython 3.10.14
  Operating System: Windows 10
  CPU Architecture: AMD64
  Driver Version: 546.59
  CUDA Version: 12.3

  *******************************************************************************

  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Thanks for following up. Unfortunately, NeMo-Curator only supports and is tested on Linux (Ubuntu) based distributions. Based on the error, some of the GPU packages (cuDF, etc.) are not supported on Windows 10.

Although untested, I think the CPU-only dependencies might install without issues on Windows 10.
If Linux/Ubuntu is not an option, it might be worth trying WSL2 on Windows with an Ubuntu image for GPU support. While not specific to NeMo-Curator, here are some details on setting up GPU packages in the WSL2 environment.
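
As a rough sketch of the suggestion above, the choice between the GPU and CPU-only install could be made by checking the host OS first. The helper name `pick_install_command` is hypothetical and not part of NeMo-Curator. The GPU command is the one used earlier in this thread; treating a plain `pip install .` as the CPU-only install is an assumption and, as noted, untested on Windows.

```python
import platform


def pick_install_command(system=None):
    """Return a pip command suited to the host OS.

    Illustrative helper only; it is not part of NeMo-Curator.
    """
    # Fall back to the running interpreter's OS name ("Linux", "Windows", "Darwin").
    system = system or platform.system()
    if system == "Linux":
        # GPU extras: the same command that was run earlier in this thread.
        return 'pip install --extra-index-url https://pypi.nvidia.com ".[cuda12x]"'
    # Assumption: a plain install skips the GPU extras (untested on Windows).
    return "pip install ."


if __name__ == "__main__":
    print(pick_install_command())
```

Inside a WSL2 Ubuntu shell, `platform.system()` reports `Linux`, so the sketch would pick the GPU command there, while a native Windows Python reports `Windows` and gets the CPU-only fallback.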

I noticed that the README section doesn't have information about OS support, so we'll work on adding some details there.