About environment!

Question

XLR-man opened this issue 2 years ago · 7 comments

I can't create the conda environment successfully. The pip step fails:

Collecting ipython==8.4.0
  Downloading http://mirrors.aliyun.com/pypi/packages/fe/10/0a5925e6e8e4c948b195b4c776cae0d9d7bc6382008a0f7ed2d293bf1cfb/ipython-8.4.0-py3-none-any.whl (750 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 750.8/750.8 kB 7.7 MB/s eta 0:00:00
Collecting jax==0.3.17
  Downloading http://mirrors.aliyun.com/pypi/packages/87/74/950b7af8176499fdc3afea6352b4734325a1c735c026eeb3918b7e422b9a/jax-0.3.17.tar.gz (1.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 29.6 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'

Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement jaxlib==0.3.15+cuda11.cudnn82 (from versions: 0.1.63, 0.1.74, 0.1.75, 0.1.76, 0.3.0, 0.3.2, 0.3.5, 0.3.7, 0.3.10, 0.3.14, 0.3.15, 0.3.20, 0.3.22, 0.3.24, 0.3.25, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.6, 0.4.7, 0.4.9, 0.4.10, 0.4.11, 0.4.12, 0.4.13, 0.4.14, 0.4.15)
ERROR: No matching distribution found for jaxlib==0.3.15+cuda11.cudnn82

failed

CondaEnvException: Pip failed

By the way, you said the CUDA version is 11.8, but environment.yaml specifies CUDA 11.3. Is that right?

Answer 1 · 2023-09-11T05:01:10.000Z

In your case, you may need to install jaxlib manually with pip. CUDA builds such as jaxlib==0.3.15+cuda11.cudnn82 are not published on PyPI (or its mirrors), so the pip step that conda runs cannot find them. Please refer to the multinerf and jax documentation for more details.
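A minimal sketch of that manual step, assuming the CUDA build is fetched from Google's jax-releases index (https://storage.googleapis.com/jax-releases/jax_cuda_releases.html) through pip's `-f`/`--find-links` option. The helper name `jaxlib_pip_command` is hypothetical; it only assembles the pip arguments for the version pinned in the error above and runs them when you pass `--run`. Run it with the Python of the activated conda environment so the wheel lands in that environment.

```python
import shlex
import subprocess
import sys
from typing import List

# Index page that lists the CUDA-enabled jaxlib wheels (the "+cuda" builds are not on PyPI).
JAX_CUDA_INDEX = "https://storage.googleapis.com/jax-releases/jax_cuda_releases.html"


def jaxlib_pip_command(version: str = "0.3.15+cuda11.cudnn82") -> List[str]:
    """Build a pip command that installs a CUDA jaxlib build via --find-links."""
    return [
        sys.executable, "-m", "pip", "install",  # use the current env's own pip
        f"jaxlib=={version}",
        "-f", JAX_CUDA_INDEX,                    # extra place for pip to look for wheels
    ]


if __name__ == "__main__":
    cmd = jaxlib_pip_command()
    print(shlex.join(cmd))  # show the command first
    if "--run" in sys.argv:
        subprocess.run(cmd, check=True)  # actually install into this environment
```

Printing the command before running it makes it easy to copy the line and adjust the version tag if your CUDA/cuDNN combination needs a different wheel.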

Answer 2 · 2023-09-11T05:09:49.000Z

I use nvcc 11.7 (provided by our admin; I have no permission to modify it) together with CUDA toolkit 11.3, because I found that this combination works for me. You may need to first check the CUDA version installed on your system, and then pick a compatible jaxlib version here: https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
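For the first step (checking the installed CUDA version), here is a small sketch that reads the release reported by `nvcc --version`, assuming `nvcc` is on your PATH. The function names are hypothetical. Note that this only reports the system compiler; the cudatoolkit that conda installs into the environment (11.3 in environment.yaml) can be a different version, as in the setup described above.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` text such as 'release 11.7, V11.7.64'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def local_nvcc_release() -> Optional[Tuple[int, int]]:
    """Run nvcc if it is on PATH and return its CUDA release, or None if unavailable."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=False)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print("nvcc CUDA release:", local_nvcc_release())
```

With the major version in hand (11 here), you can look for wheels tagged `cuda11` on the index page.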

Answer 3 · 2023-09-11T13:07:35.000Z

Hi, there is another issue: the file should be "scripts/train.sh", but in the repository it is named "scripts/trian.sh".

Answer 4 · 2023-09-11T13:21:31.000Z

When I run the code, I get this error:

  File "/root/miniconda3/envs/jax/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/jax/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/autodl-tmp/LLNeRF-main/train.py", line 316, in <module>
    app.run(main)
  File "/root/miniconda3/envs/jax/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/root/miniconda3/envs/jax/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/root/autodl-tmp/LLNeRF-main/train.py", line 58, in main
    print_func(f'{"="*20}\nA new run start at {datetime.datetime.now()}:\n{"="*20}\n\nGo into train.py. Config inited: \n{config}')
  File "/root/autodl-tmp/LLNeRF-main/train.py", line 56, in print_func
    print(*args, **kargs, file=open(config.logfile, 'a'))
FileNotFoundError: [Errno 2] No such file or directory: './logs/llnerf__still2.txt'

The file cannot be found. Does it need to be created first?

Answer 5 · 2023-09-11T14:54:56.000Z

Yes, you need to create the logs/ directory before running. Also, the script filename typo is fixed now.
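If you would rather not create the folder by hand, one option is to let the logging helper create it. A minimal sketch, assuming a helper in the spirit of `print_func` from the traceback above; `print_to_log` is a hypothetical name, not code from the repository. It also closes the log file after each write instead of leaving the handle to be closed by garbage collection.

```python
import os


def print_to_log(logfile: str, *args, **kwargs) -> None:
    """Append print() output to logfile, creating its parent directory (e.g. ./logs/) if needed."""
    log_dir = os.path.dirname(logfile)
    if log_dir:  # a bare filename has no directory component to create
        os.makedirs(log_dir, exist_ok=True)
    # Append mode keeps earlier runs; the with-block closes the file after writing.
    with open(logfile, "a") as f:
        print(*args, **kwargs, file=f)
```

For example, `print_to_log("./logs/llnerf__still2.txt", "A new run starts")` would create `./logs/` on the first call and append to the same file afterwards.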

Answer 6 · 2023-09-14T11:29:23.000Z

There is one more question about the training time. My GPU is a 2080 Ti and your maximum number of steps is 100000. I trained with this setting and found that it took 12 hours to complete 10% of the steps. Does this mean I won't be able to finish training any time soon?

Can you tell me about your experimental setup and the training duration?
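The question above is a linear extrapolation: if 10% of the steps took 12 hours, the full run takes about 12 / 0.1 = 120 hours (5 days). A small sketch of that arithmetic, assuming every step costs roughly the same wall-clock time (evaluation and checkpointing overhead are ignored); the function name is hypothetical.

```python
from typing import Tuple


def extrapolate_training_hours(elapsed_hours: float, steps_done: int, max_steps: int) -> Tuple[float, float]:
    """Linear ETA: returns (total_hours, remaining_hours), assuming a constant time per step."""
    if steps_done <= 0:
        raise ValueError("steps_done must be positive")
    total_hours = elapsed_hours * max_steps / steps_done
    remaining_hours = total_hours - elapsed_hours
    return total_hours, remaining_hours


if __name__ == "__main__":
    total, remaining = extrapolate_training_hours(12, 10_000, 100_000)
    print(f"total ~{total:.0f} h, remaining ~{remaining:.0f} h (~{remaining / 24:.1f} days)")
```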

Answer 7 · 2023-09-15T09:30:39.000Z

It takes about 8 hours to train a model on an A100 GPU. The training time should be similar to training mip-NeRF with the multinerf code.
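To compare the two setups per step, here is a sketch that converts each report into seconds per step. It assumes the 8-hour A100 figure is for the same 100000-step configuration, which the reply does not state explicitly; the function names are hypothetical.

```python
from typing import Tuple


def seconds_per_step(hours: float, steps: int) -> float:
    """Average wall-clock seconds per training step."""
    return hours * 3600 / steps


def slowdown(slow: Tuple[float, int], fast: Tuple[float, int]) -> float:
    """How many times slower per step the (hours, steps) run `slow` is than `fast`."""
    return seconds_per_step(*slow) / seconds_per_step(*fast)


if __name__ == "__main__":
    a100 = (8, 100_000)        # ~8 h for a full run (assumed: 100000 steps)
    rtx_2080ti = (12, 10_000)  # 12 h for the first 10% of 100000 steps
    print(f"A100:    {seconds_per_step(*a100):.3f} s/step")
    print(f"2080 Ti: {seconds_per_step(*rtx_2080ti):.2f} s/step")
    print(f"2080 Ti run is ~{slowdown(rtx_2080ti, a100):.0f}x slower per step")
```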