cea-hpc/shine

start, mount, and tune fail when tuning_file is not defined in shine.conf

btravouillon opened this issue · 4 comments

With the default shine.conf, the command shine start fails with the following stack trace:

[root@80e259718e60 /]# shine --version
Shine v1.5
[root@80e259718e60 /]# shine start
WARNING: Error connecting to syslog, disabling logging.
Traceback (most recent call last):
  File "/usr/sbin/shine", line 34, in <module>
    sys.exit(Controller().run_command())
  File "/usr/local/lib/python2.7/dist-packages/Shine/Controller.py", line 254, in run_command
    rc = command.filter_rc(command.execute())
  File "/usr/local/lib/python2.7/dist-packages/Shine/Commands/Base/FSLiveCommand.py", line 105, in execute
    result = max(result, self.execute_fs(fs, fs_conf, eh, vlevel))
  File "/usr/local/lib/python2.7/dist-packages/Shine/Commands/Start.py", line 95, in execute_fs
    tunings=Tune.get_tuning(fs_conf, fs.components))
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/FileSystem.py", line 529, in start
    actions.launch()
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Action.py", line 218, in launch
    self._launch()
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Action.py", line 272, in _launch
    if not self._graph_ok(self._members):
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Action.py", line 177, in _graph_ok
    dep.launch()
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Action.py", line 218, in launch
    self._launch()
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Action.py", line 272, in _launch
    if not self._graph_ok(self._members):
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Action.py", line 177, in _graph_ok
    dep.launch()
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Action.py", line 218, in launch
    self._launch()
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Action.py", line 272, in _launch
    if not self._graph_ok(self._members):
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Action.py", line 177, in _graph_ok
    dep.launch()
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Action.py", line 218, in launch
    self._launch()
  File "/usr/local/lib/python2.7/dist-packages/Shine/Lustre/Actions/Install.py", line 50, in _launch
    nodes=self.nodes, handler=self)
  File "/usr/lib/python2.7/site-packages/ClusterShell/Task.py", line 629, in copy
    reverse=reverse)
  File "/usr/lib/python2.7/site-packages/ClusterShell/Worker/Exec.py", line 290, in __init__
    self._create_clients(timeout=timeout, **kwargs)
  File "/usr/lib/python2.7/site-packages/ClusterShell/Worker/Exec.py", line 310, in _create_clients
    self._add_client(node, rank=rank, **kwargs)
  File "/usr/lib/python2.7/site-packages/ClusterShell/Worker/Exec.py", line 330, in _add_client
    raise ValueError("missing command or source parameter in "
ValueError: missing command or source parameter in worker constructor

With the tuning_file parameter set, the start command works as expected.

[root@80e259718e60 /]# echo "tuning_file=/shine/conf/tuning.conf.example" >> /etc/shine/shine.conf 
[root@80e259718e60 /]# shine start
WARNING: Error connecting to syslog, disabling logging.
WARNING: no state report from node nova5 (nova5@tcp0)
WARNING: no state report from node nova5 (nova5@tcp0)
WARNING: no state report from node nova6 (nova6@tcp0)
WARNING: no state report from node nova6 (nova6@tcp0)
= FILESYSTEM STATUS (example) =
TYPE # STATUS        NODES
---- - ------        -----
MGT  1 CHECK FAILURE nova4
MDT  1 CHECK FAILURE nova4
OST  4 CHECK FAILURE nova[5-6]
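
The two runs above differ only in whether tuning_file is set. A minimal, self-contained sketch of how an unset key can reach a copy call as None is shown below. It does not use Shine or ClusterShell: parse_conf, copy_file and install_tuning are hypothetical stand-ins, and only the error message is taken from the traceback.

```python
def parse_conf(text):
    """Parse simple key=value lines, ignoring blank lines and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def copy_file(source, dest, nodes):
    """Stand-in for a remote copy that, like the worker in the traceback,
    rejects a missing source (hypothetical, not the ClusterShell API)."""
    if not source:
        raise ValueError("missing command or source parameter in "
                         "worker constructor")
    return dict((node, dest) for node in nodes)


def install_tuning(conf, nodes):
    """Hypothetical unguarded install step."""
    # conf.get() returns None when tuning_file is absent, and that None
    # is handed straight to the copy.
    path = conf.get("tuning_file")
    return copy_file(path, path, nodes)


default_conf = parse_conf("# default shine.conf without tuning_file\n")
fixed_conf = parse_conf("tuning_file=/shine/conf/tuning.conf.example\n")

try:
    install_tuning(default_conf, ["nova4"])
    error = None
except ValueError as exc:
    error = str(exc)

copied = install_tuning(fixed_conf, ["nova4"])
```

With the default configuration the stand-in raises the same ValueError message as in the report; once tuning_file is defined, the copy has a source and goes through.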

It looks like the regression was introduced by dd1f42f.

For completeness: some tunings are added automatically even when no tuning_file is defined (see d5d29a0). With the default configuration, the tuning list and component list look like this:

Tunings:
Tuning param: /proc/fs/lustre/osc/example-OST0003*/active=1 types=mds,client
Tuning param: /proc/fs/lustre/osc/example-OST0000*/active=1 types=mds,client
Tuning param: /proc/fs/lustre/osc/example-MDT0000*/active=1 types=mds,client
Tuning param: /proc/fs/lustre/osc/MGS*/active=1 types=mds,client
Tuning param: /proc/fs/lustre/osc/example-OST0001*/active=1 types=mds,client
Tuning param: /proc/fs/lustre/osc/example-OST0002*/active=1 types=mds,client

fs.components=MGS,example-MDT0000,example-OST[0000-0003],example-client
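
The listing above can be approximated with a short sketch. The rule used here (one osc ".../active=1" entry per non-client component) is only inferred from that output, and default_tunings is a hypothetical helper, not Shine's implementation. The point is that the tuning list is non-empty even though no tuning file exists, so a check based only on the tunings cannot tell whether there is a file to install.

```python
def default_tunings(labels):
    """Build one osc 'active=1' tuning path per non-client component.

    Assumption: the rule is inferred from the listing in this issue.
    """
    tunings = []
    for label in labels:
        if label.endswith("-client"):
            continue  # the listing shows no entry for the client component
        tunings.append("/proc/fs/lustre/osc/%s*/active=1" % label)
    return tunings


# Component labels expanded by hand from the fs.components line.
labels = (["MGS", "example-MDT0000"]
          + ["example-OST%04d" % i for i in range(4)]
          + ["example-client"])

tuning_file = None                        # default shine.conf: key not defined
tunings = default_tunings(labels)

has_tunings = bool(tunings)               # True: automatic tunings exist
has_tuning_file = tuning_file is not None  # False: nothing to copy
```

Any step that copies the tuning file to the nodes would therefore need to test has_tuning_file rather than has_tunings.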

So far, the test_proxy_tunings test seems incomplete: it does not cover the case where tuning_file is undefined. I will try to provide an additional test.
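
As a rough idea of the shape such a test could take, here is a standalone unittest sketch. It does not touch Shine's test suite: RecordingTask and install_tuning_file are hypothetical names, and the guard shown is an assumption about one possible fix, not the actual patch.

```python
import unittest


class RecordingTask(object):
    """Hypothetical test double: records copy requests instead of running them."""

    def __init__(self):
        self.copies = []

    def copy(self, source, dest, nodes):
        # Mirror the failure from the traceback when no source is given.
        if not source:
            raise ValueError("missing command or source parameter in "
                             "worker constructor")
        self.copies.append((source, dest, tuple(nodes)))


def install_tuning_file(task, tuning_file, nodes):
    """Hypothetical function under test: copy only if a file is configured."""
    if tuning_file:
        task.copy(tuning_file, tuning_file, nodes)


class TuningFileTest(unittest.TestCase):

    def test_no_tuning_file(self):
        """An undefined tuning_file must not trigger any copy."""
        task = RecordingTask()
        install_tuning_file(task, None, ["nova4"])
        self.assertEqual(task.copies, [])

    def test_with_tuning_file(self):
        """A defined tuning_file is copied to the requested nodes."""
        task = RecordingTask()
        path = "/shine/conf/tuning.conf.example"
        install_tuning_file(task, path, ["nova4", "nova5"])
        self.assertEqual(task.copies, [(path, path, ("nova4", "nova5"))])
```

The first case is the one that would have caught this regression: it fails with the ValueError as soon as the guard is removed.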

Thanks for the report and the patch. See the GerritHub patch proposal for more details.

Thanks for this patch!