About environment!
XLR-man opened this issue · 7 comments
I can't create the environment successfully:
Collecting ipython==8.4.0
Downloading http://mirrors.aliyun.com/pypi/packages/fe/10/0a5925e6e8e4c948b195b4c776cae0d9d7bc6382008a0f7ed2d293bf1cfb/ipython-8.4.0-py3-none-any.whl (750 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 750.8/750.8 kB 7.7 MB/s eta 0:00:00
Collecting jax==0.3.17
Downloading http://mirrors.aliyun.com/pypi/packages/87/74/950b7af8176499fdc3afea6352b4734325a1c735c026eeb3918b7e422b9a/jax-0.3.17.tar.gz (1.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 29.6 MB/s eta 0:00:00
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'done'
Pip subprocess error:
ERROR: Could not find a version that satisfies the requirement jaxlib==0.3.15+cuda11.cudnn82 (from versions: 0.1.63, 0.1.74, 0.1.75, 0.1.76, 0.3.0, 0.3.2, 0.3.5, 0.3.7, 0.3.10, 0.3.14, 0.3.15, 0.3.20, 0.3.22, 0.3.24, 0.3.25, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.6, 0.4.7, 0.4.9, 0.4.10, 0.4.11, 0.4.12, 0.4.13, 0.4.14, 0.4.15)
ERROR: No matching distribution found for jaxlib==0.3.15+cuda11.cudnn82
failed
CondaEnvException: Pip failed
By the way, you said the CUDA version is 11.8, but in environment.yaml the CUDA version is 11.3. Is that right?
In your case, you may need to install jaxlib manually with pip. The CUDA builds of jaxlib (such as 0.3.15+cuda11.cudnn82) are not published on PyPI, so the pip step run by conda cannot find them. Please refer to the multinerf and JAX documentation for more details.
I use nvcc version 11.7 (provided by our admin; I have no permission to change it) together with CUDA toolkit 11.3, because I found that this combination works for me. You may need to first check the CUDA version installed on your system, and then pick a compatible jaxlib version here: https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
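As a rough sketch of those two steps (this is not part of the repository), the Python below parses the CUDA release from `nvcc --version` output and assembles a pip command that uses pip's `-f`/`--find-links` option to look for jaxlib wheels on the release page linked above. The helper names `parse_nvcc_release`, `detect_cuda_release` and `jaxlib_install_command` are hypothetical, and the tag `0.3.15+cuda11.cudnn82` is just the one from the error log; choose the tag on that page that actually matches your CUDA and cuDNN.

```python
import re
import shlex
import subprocess

# Release page linked in the reply above (hosts the CUDA builds of jaxlib).
JAX_CUDA_RELEASES = "https://storage.googleapis.com/jax-releases/jax_cuda_releases.html"


def parse_nvcc_release(nvcc_output: str) -> tuple[int, int]:
    """Extract (major, minor) from `nvcc --version` text, e.g. '... release 11.7, V11.7.64'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release number in nvcc output")
    return int(match.group(1)), int(match.group(2))


def detect_cuda_release() -> tuple[int, int]:
    """Run `nvcc --version` (nvcc must be on PATH) and return its CUDA release."""
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=True
    )
    return parse_nvcc_release(result.stdout)


def jaxlib_install_command(jaxlib_spec: str) -> str:
    """Build a shell-quoted pip command that also searches the JAX release page."""
    args = ["pip", "install", f"jaxlib=={jaxlib_spec}", "-f", JAX_CUDA_RELEASES]
    return shlex.join(args)
```

Usage: call `detect_cuda_release()` to see what nvcc reports, then run the string returned by `jaxlib_install_command(...)` inside the activated environment. Keep in mind that nvcc and the CUDA toolkit installed in the conda environment can differ, as in the 11.7/11.3 setup described above.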
Hi, there is another issue: the script should be named "scripts/train.sh", but in your repo it is "scripts/trian.sh".
When I run the code, I get an error:
  File "/root/miniconda3/envs/jax/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/jax/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/autodl-tmp/LLNeRF-main/train.py", line 316, in <module>
    app.run(main)
  File "/root/miniconda3/envs/jax/lib/python3.9/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/root/miniconda3/envs/jax/lib/python3.9/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/root/autodl-tmp/LLNeRF-main/train.py", line 58, in main
    print_func(f'{"="*20}\nA new run start at {datetime.datetime.now()}:\n{"="*20}\n\nGo into train.py. Config inited: \n{config}')
  File "/root/autodl-tmp/LLNeRF-main/train.py", line 56, in print_func
    print(*args, **kargs, file=open(config.logfile, 'a'))
FileNotFoundError: [Errno 2] No such file or directory: './logs/llnerf__still2.txt'
The file cannot be found. Does it need to be created first?
Yeah, you need to create the logs/ directory before running. Also, the filename error is fixed now.
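If you would rather have the script create the directory itself, here is a minimal sketch of a code-side workaround. It is a hypothetical patch, not the repository's code: it mirrors the `print_func` from the traceback (train.py, line 56), with the `logfile` argument standing in for `config.logfile`, but first calls `os.makedirs(..., exist_ok=True)` on the log file's parent directory and closes the file after each write.

```python
import os


def print_func(logfile: str, *args, **kwargs) -> None:
    """Append the printed text to `logfile`, creating its parent directory if needed."""
    log_dir = os.path.dirname(logfile)
    if log_dir:  # dirname() returns '' for a bare filename such as 'run.txt'
        os.makedirs(log_dir, exist_ok=True)  # no error if logs/ already exists
    # The context manager closes the file after every call instead of leaking the handle.
    with open(logfile, "a") as f:
        print(*args, **kwargs, file=f)
```

For example, `print_func("./logs/llnerf__still2.txt", "A new run start")` would create `./logs/` on the first call and append to the same file on later calls.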
I have one more question about training time. My GPU is a 2080Ti and your max steps setting is 100000. I trained with this setting and found that it took 12 hours to complete 10% of the steps. Does this mean I won't be able to finish training the model any time soon?
Could you tell me about your experimental setup and training duration?
It takes about 8 hours to train a model on an A100 GPU. The training duration should be similar to that of training mip-NeRF with the multinerf code.
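To put the numbers from this thread side by side, here is a back-of-the-envelope extrapolation (arithmetic only, not a benchmark): if 10% of the 100000 steps took about 12 hours on the 2080Ti, a full run at the same rate would take roughly 120 hours, versus the roughly 8 hours reported on an A100. The helper `estimate_total_hours` is hypothetical and assumes a constant step rate over the whole run.

```python
def estimate_total_hours(elapsed_hours: float, steps_done: int, max_steps: int) -> float:
    """Linearly extrapolate total training time, assuming a constant step rate."""
    if steps_done <= 0:
        raise ValueError("steps_done must be positive")
    # hours per step * total steps, written to avoid an intermediate rounding step
    return elapsed_hours * max_steps / steps_done


max_steps = 100_000
steps_done = max_steps // 10  # "10%" of the run, as reported on the 2080Ti
total = estimate_total_hours(12.0, steps_done, max_steps)
print(f"2080Ti: ~{total:.0f} h in total, ~{total / 8.0:.0f}x the ~8 h reported on an A100")
```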