fatal: unable to access 'https://github.com/mlcommons/ck/': GnuTLS recv error (-9): Error decoding the received TLS packet.
Opened this issue · 7 comments
I want to reproduce nvidia-bert following https://github.com/mlcommons/ck/blob/master/docs/mlperf/inference/bert/README_nvidia.md#build-nvidia-docker-container-from-31-inference-round
When I run "cm docker script --tags=build,nvidia,inference,server", I encounter the following problem.
=> ERROR [10/12] RUN cm pull repo mlcommons@ck 104.6s
[10/12] RUN cm pull repo mlcommons@ck:
0.255 Cloning into 'mlcommons@ck'...
104.5 error: RPC failed; curl 92 HTTP/2 stream 0 was not closed cleanly: CANCEL (err 8)
104.5 fatal: the remote end hung up unexpectedly
104.5 fatal: early EOF
104.5 fatal: index-pack failed
104.5 Warning: CM index is used for the first time. CM will reindex all artifacts now - it may take some time ...
104.5 =======================================================
104.5 Alias: mlcommons@ck
104.5 URL: https://github.com/mlcommons/ck
104.5
104.5 Local path: /home/cmuser/CM/repos/mlcommons@ck
104.5
104.5 git clone https://github.com/mlcommons/ck mlcommons@ck
104.5
104.5
104.5 CM error: repository was not cloned!
mlperf-inference:mlpinf-v3.1-cuda12.2-cudnn8.9-x86_64-ubuntu20.04-l4-public.Dockerfile:32
30 |
31 | # Download CM repo for scripts
32 | >>> RUN cm pull repo mlcommons@ck
33 |
34 | # Install all system dependencies
ERROR: failed to solve: process "/bin/bash -c cm pull repo mlcommons@ck" did not complete successfully: exit code: 1
CM error: Portable CM script failed (name = build-docker-image, return code = 256)
I think the problem is that GitHub was down or you don't have access to it.
Can you please try git clone https://github.com/mlcommons/ck mlcommons@ck
in some temp directory to check if it works, and then restart the cm command once it's working? Please tell us if it helps! Thanks!
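Before retrying the clone, it can help to confirm that github.com is reachable on port 443 at all, since the later error in this thread is a connection timeout on that port. Below is a minimal hypothetical helper for that check; it is not part of CM, just a sketch using the standard library:

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A quick reachability check (e.g. can_connect("github.com")) before
    re-running the CM/Docker build; it does not validate TLS, only that
    the TCP handshake completes.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, timeouts, and connection refusals.
        return False
```

If this returns False for github.com, the git clone inside the container will fail the same way, so fixing the network (or proxy) first saves a long Docker rebuild.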
@KingICCrab - did you try again to see if it works? I believe it's a network issue - it happens with GitHub from time to time ;) ...
Thank you for your consideration!
I'm sorry, I'm temporarily giving up on reproducing it, because I know little about Docker.
No problem. What I meant is: may I ask you to retry the same CM command and see if it works now?
cm docker script --tags=build,nvidia,inference,server
When there is a network issue, CM should restart building the Docker container from the place where it failed ...
Thanks!
After I ran the command, I got the following error (these lines are printed in red):
Cloning into 'repo'...
error: RPC failed; curl 28 Failed to connect to github.com port 443: Connection timed out
fatal: the remote end hung up unexpectedly
Traceback (most recent call last):
File "/home/cmuser/.local/bin/cm", line 8, in <module>
sys.exit(run())
File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/cli.py", line 35, in run
r = cm.access(argv, out='con')
File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 587, in access
r = action_addr(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 193, in run
r = self._run(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 1281, in _run
r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2699, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2854, in _run_deps
r = self.cmind.access(ii)
File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 587, in access
r = action_addr(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 193, in run
r = self._run(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 1281, in _run
r = self._call_run_deps(deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2699, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2854, in _run_deps
r = self.cmind.access(ii)
File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 587, in access
r = action_addr(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 193, in run
r = self._run(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 1454, in _run
r = self._call_run_deps(prehook_deps, self.local_env_keys, local_env_keys_from_meta, env, state, const, const_state, add_deps_recursive,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2699, in _call_run_deps
r = script._run_deps(deps, local_env_keys, env, state, const, const_state, add_deps_recursive, recursion_spaces,
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 2854, in _run_deps
r = self.cmind.access(ii)
File "/home/cmuser/.local/lib/python3.8/site-packages/cmind/core.py", line 587, in access
r = action_addr(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 193, in run
r = self._run(i)
File "/home/cmuser/CM/repos/mlcommons@ck/cm-mlops/automation/script/module.py", line 1596, in _run
if dependent_cached_path != '' and not os.path.samefile(cached_path, dependent_cached_path):
File "/usr/lib/python3.8/genericpath.py", line 101, in samefile
s2 = os.stat(f2)
FileNotFoundError: [Errno 2] No such file or directory: '/home/cmuser/CM/repos/local/cache/9d809940ee024b38/repo'
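For context, os.path.samefile() calls os.stat() on both of its arguments, so it raises FileNotFoundError whenever either path is missing; here the cached 'repo' directory was never created because the clone timed out. A guarded variant (a hypothetical sketch, not CM's actual fix) would check existence first:

```python
import os


def same_file_if_present(path_a, path_b):
    """Hypothetical guard around os.path.samefile.

    os.path.samefile stats both paths and raises FileNotFoundError if
    either one does not exist (as in the traceback above, where the
    cached 'repo' directory was never created). This wrapper only
    compares paths that actually exist and treats a missing path as
    "not the same file".
    """
    if os.path.exists(path_a) and os.path.exists(path_b):
        return os.path.samefile(path_a, path_b)
    return False
```

With a guard like this, a failed clone would surface as a normal "repository was not cloned" error instead of an unhandled traceback.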
Interesting. Thank you very much again for your feedback @KingICCrab - we didn't encounter such a case before and will need to add support to CM to handle it in a better way! I will keep this ticket open to check it when we have time ... Thanks again!