Bugs while running ~/FedML/python/examples/simulation/mpi_torch_async_fedavg/torch_fedavg_mnist_lr_custum_data_and_model_example.py
yaokunxu opened this issue · 0 comments
When I followed the README to test the program, I ran into the following problems.
First one:
I have no idea why this happens and could not figure out how this part works. The code still runs anyway.
"run_custom_data_and_model_example.sh: 11: -hostfile: not found" and "run.sh: 19: -host: not found"
Second one:
This may indicate a bug in the code.
"[FedML-Server @device-id-0] [Sun, 16 Jul 2023 12:46:33] [ERROR] [mlops_runtime_log.py:36:handle_exception] Uncaught exception
Traceback (most recent call last):
File "/home/xuhd/FedML/python/examples/simulation/mpi_torch_async_fedavg/torch_fedavg_mnist_lr_custum_data_and_model_example.py", line 40, in <module>
simulator = SimulatorMPI(args, device, dataset, model)
File "/home/xuhd/FedML/python/fedml/simulation/simulator.py", line 107, in __init__
FedML_FedAvgSeq_distributed(
File "/home/xuhd/FedML/python/fedml/simulation/mpi/fedavg_seq/FedAvgSeqAPI.py", line 31, in FedML_FedAvgSeq_distributed
init_server(
File "/home/xuhd/FedML/python/fedml/simulation/mpi/fedavg_seq/FedAvgSeqAPI.py", line 99, in init_server
server_manager.send_init_msg()
File "/home/xuhd/FedML/python/fedml/simulation/mpi/fedavg_seq/FedAvgServerManager.py", line 42, in send_init_msg
client_schedule = self.aggregator.generate_client_schedule(self.args.round_idx, client_indexes)
File "/home/xuhd/FedML/python/fedml/simulation/mpi/fedavg_seq/FedAVGAggregator.py", line 183, in generate_client_schedule
client_schedule = np.array_split(client_indexes, self.worker_num)
File "/usr/local/lib/python3.10/dist-packages/numpy/lib/shape_base.py", line 770, in array_split
raise ValueError('number sections must be larger than 0.') from None
ValueError: number sections must be larger than 0."
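The last frame can be reproduced in isolation. The sketch below is a minimal reproduction, assuming `self.worker_num` ends up as 0 at that point (possibly because only one MPI process was actually started, which I have not confirmed). `split_schedule` and the client ids are hypothetical names used only for illustration; the `np.array_split` call is the one from the traceback.

```python
import numpy as np

# Hypothetical client ids, standing in for the sampled client_indexes.
client_indexes = [0, 1, 2, 3]


def split_schedule(client_indexes, worker_num):
    # Same call as FedAVGAggregator.generate_client_schedule in the traceback.
    return np.array_split(client_indexes, worker_num)


# Assumption: worker_num is 0, i.e. there are no worker processes.
try:
    split_schedule(client_indexes, 0)
except ValueError as exc:
    message = str(exc)
    print(message)

# With at least one worker the same call succeeds.
schedule = split_schedule(client_indexes, 2)
print([part.tolist() for part in schedule])
```

If this assumption holds, the error comes from the number of workers seen by the server rather than from the dataset or the model.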
Third one:
The file exists, but the error still occurs.
"Traceback (most recent call last):
File "/home/xuhd/FedML/python/examples/simulation/mpi_torch_async_fedavg/torch_fedavg_mnist_lr_custum_data_and_model_example.py", line 25, in <module>
args = fedml.init()
File "/home/xuhd/FedML/python/fedml/__init__.py", line 33, in init
args = load_arguments(fedml._global_training_type, fedml._global_comm_backend)
File "/home/xuhd/FedML/python/fedml/arguments.py", line 188, in load_arguments
args = Arguments(cmd_args, training_type, comm_backend)
File "/home/xuhd/FedML/python/fedml/arguments.py", line 74, in __init__
self.get_default_yaml_config(cmd_args, training_type, comm_backend)
File "/home/xuhd/FedML/python/fedml/arguments.py", line 129, in get_default_yaml_config
configuration = self.load_yaml_config(cmd_args.yaml_config_file)
File "/home/xuhd/FedML/python/fedml/arguments.py", line 80, in load_yaml_config
with open(yaml_path, "r") as stream:
FileNotFoundError: [Errno 2] No such file or directory: 'config/zht_config.yaml\r'"
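The path in this error ends with `\r`, which suggests that the script passing the config path has Windows-style CRLF line endings. That could also explain the "-hostfile: not found" message, since a backslash followed by `\r` no longer continues the line, but this is a guess. The sketch below is one way to check a file for CRLF and convert it to LF. `has_crlf`, `strip_crlf`, and the demo file contents are hypothetical and not part of FedML.

```python
import tempfile
from pathlib import Path


def has_crlf(path):
    """Return True if the file contains Windows-style CRLF line endings."""
    return b"\r\n" in Path(path).read_bytes()


def strip_crlf(path):
    """Rewrite the file in place with LF-only line endings."""
    p = Path(path)
    p.write_bytes(p.read_bytes().replace(b"\r\n", b"\n"))


# Demo on a throwaway file; the contents are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_run.sh"
    script.write_bytes(b"echo first \\\r\n second\r\necho config/zht_config.yaml\r\n")
    before = has_crlf(script)   # CRLF present in the freshly written file
    strip_crlf(script)
    after = has_crlf(script)    # CRLF gone after conversion
    print(before, after)
```

Running `has_crlf` on run_custom_data_and_model_example.sh and run.sh would show whether this is the cause.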
Looking forward to your help. Respect. OTZ.