HFAiLab/hai-platform

Error when using hai-cli

Closed · 1 comment

  • I set up a cluster and deployed hai-platform locally. The deployment succeeded; the status is as follows:
(base) ➜  ~ k get pod -n hai-platform
NAME             READY   STATUS    RESTARTS   AGE
hai-platform-0   1/1     Running   0          20h
(base) ➜  ~ k get svc -n hai-platform 
NAME               TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)                                                     AGE
hai-platform-svc   LoadBalancer   10.68.214.177   192.168.1.201   5432:30461/TCP,6379:30977/TCP,80:30599/TCP,8080:30411/TCP   20h
(base) ➜  ~ k get node               
NAME     STATUS                     ROLES    AGE   VERSION
master   Ready,SchedulingDisabled   master   44h   v1.29.0
worker   Ready                      node     44h   v1.29.0
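  • hai-cli initialization also appears to succeed
    1. After the platform was deployed, the user table contained two records, haiadmin and bff_admin, but the user_access_token table was empty.
    2. Following this issue, I inserted a record for haiadmin into the user_access_token table: #5 (comment)
    3. Result of the initialization: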
 (hai) ark@zero:~/code/hai/hai-platform$ hai-cli init ACCESS-68516961646d696e2368616961646d696e-E0lGXwIswnn0HpbXAW_tVRjga1wRjD0u --url http://192.168.1.201
初始化成功, 目标配置 /home/ark/.hfai/conf.yml, 配置如下: 
token: ACCESS-68516961646d696e2368616961646d696e-E0lGXwIswnn0HpbXAW_tVRjga1wRjD0u
url: http://192.168.1.201
  • Submitting a task fails with an error
(hai) ark@zero:~/code/hai/hai-platform$ hai-cli python /nfsroot/hai-platform/workspace/haiadmin/test.py -- -n 1
 WARNING:  提交的任务将会继承当前环境 ,有可能造成环境不兼容,如不想继承当前环境请添加参数 --no_inherit 
提交任务成功,定义如下
--------------------------------------------------------------------------------
name: test.py
priority: 30
resource:
  group: default
  image: default
  node_count: 1
spec:
  entrypoint: test.py
  parameters: ''
  workspace: /nfsroot/hai-platform/workspace/haiadmin
version: 2

--------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/hfai/client/api/api_utils.py", line 101, in async_requests
    result = json.loads(result)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ark/Program/anaconda3/envs/hai/bin/hai-cli", line 9, in <module>
    sys.exit(cli())
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/asyncclick/core.py", line 1159, in __call__
    return anyio.run(self._main, main, args, kwargs, **({"backend":_anyio_backend} if _anyio_backend is not None else {}))
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/anyio/_core/_eventloop.py", line 68, in run
    return asynclib.run(func, *args, **backend_options)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 204, in run
    return native_run(wrapper(), debug=debug)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/asyncio/base_events.py", line 608, in run_until_complete
    return future.result()
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 199, in wrapper
    return await func(*args)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/asyncclick/core.py", line 1162, in _main
    return await main(*args, **kwargs)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/asyncclick/core.py", line 1083, in main
    rv = await self.invoke(ctx)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/asyncclick/core.py", line 1693, in invoke
    return await _process_result(await sub_ctx.command.invoke(sub_ctx))
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/asyncclick/core.py", line 1429, in invoke
    return await ctx.invoke(self.callback, **ctx.params)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/asyncclick/core.py", line 783, in invoke
    rv = await rv
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/hfai/client/commands/hfai_python.py", line 294, in python
    await func_python_cluster(experiment_py, experiment_args, name, nodes, priority, group, image, environments,
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/hfai/client/commands/hfai_python.py", line 255, in func_python_cluster
    await hfai_experiment.run.callback(config, follow, None, None, None)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/hfai/client/commands/hfai_experiment.py", line 167, in run
    experiment = await create_experiment(experiment_yml)
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/hfai/client/api/experiment_api.py", line 444, in create_experiment
    result = await async_requests(RequestMethod.POST, url=f'{mars_url()}/operating/task/create?token={token}',
  File "/home/ark/Program/anaconda3/envs/hai/lib/python3.8/site-packages/hfai/client/api/api_utils.py", line 116, in async_requests
    raise Exception(f'请求失败: [exception: {str(e)}] [result: {result}]')
Exception: 请求失败: [exception: Expecting value: line 1 column 1 (char 0)] [result: Internal Server Error]
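
Reading the last line of the traceback: the server answered the `/operating/task/create` request with the plain-text body `Internal Server Error`, and the client then failed while parsing that body as JSON. The sketch below reproduces only that parsing step with the standard library. `parse_response` is a hypothetical helper written for illustration; it is not the code in hfai's `api_utils.py`.

```python
import json


def parse_response(body: str):
    """Hypothetical helper: parse a response body as JSON.

    On failure, raise an error that carries both the decoder message and
    the raw body, similar in shape to the message in the traceback above.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError as e:
        raise RuntimeError(
            f"request failed: [exception: {e}] [result: {body}]"
        ) from e


if __name__ == "__main__":
    try:
        # A plain-text error page is not valid JSON, so decoding fails
        # at the very first character.
        parse_response("Internal Server Error")
    except RuntimeError as err:
        print(err)
```

So the JSONDecodeError is only a symptom. The real failure is the HTTP 500 on the server side, which is where to look next (for example, the logs of the hai-platform-0 pod).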

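The problem is solved. In the official image registry.cn-hangzhou.aliyuncs.com/hfai/hai-platform:latest, the ports used by redis and pgsql do not match the ports exposed externally. Changing the ports in one/one_etc/core.toml, then rebuilding the image and redeploying the platform, fixes it.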
(Screenshot 2024-04-11 16-22-38)
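
For anyone hitting the same symptom, a quick first check is whether the service ports listed by `k get svc` accept TCP connections at all. This is a minimal sketch under stated assumptions: the host and port numbers are copied from the output earlier in this issue, the service labels are guesses based on the conventional PostgreSQL (5432) and Redis (6379) ports, and `port_open` is a hypothetical helper. A successful connection only shows that something is listening on that port; it does not prove that the ports configured in one/one_etc/core.toml match what redis and pgsql use inside the container.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    host = "192.168.1.201"  # EXTERNAL-IP of hai-platform-svc in this issue
    # Labels are assumptions based on conventional port numbers.
    services = [("pgsql", 5432), ("redis", 6379), ("http", 80), ("http-alt", 8080)]
    for name, port in services:
        state = "open" if port_open(host, port) else "unreachable"
        print(f"{name:8s} {host}:{port} {state}")
```

If a port reports as open but the platform still returns Internal Server Error, compare the port values in core.toml against the ports the bundled redis and pgsql actually listen on, as described in the comment above.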