a question about fedprox
On this page, https://github.com/MLOPTPSU/FedTorch/blob/main/fedtorch/comms/trainings/federated/main.py, lines 123 to 129 contain the following code:
```python
elif client.args.federated_type == 'fedprox':
    # Adding proximal gradients and loss for fedprox
    for client_param, server_param in zip(client.model.parameters(), client.model_server.parameters()):
        if client.args.graph.rank == 0:
            print("distance norm for prox is:{}".format(torch.norm(client_param.data - server_param.data)))
        loss += client.args.fedprox_mu / 2 * torch.norm(client_param.data - server_param.data)
        client_param.grad.data += client.args.fedprox_mu * (client_param.data - server_param.data)
```
The expression `client.args.fedprox_mu / 2 * torch.norm(client_param.data - server_param.data)` seems to compute $\frac{\mu}{2} \left\|w-w_t\right\|$, but FedProx uses $\frac{\mu}{2} \left\|w-w_t\right\|^2$. I'm not sure whether what I said above is true. Thank you very much for your kind consideration.
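For reference, here is a minimal standalone check (my own sketch, not code from the repo; `mu`, `w`, and `w_t` are placeholder names) confirming that the gradient of the squared term $\frac{\mu}{2} \left\|w-w_t\right\|^2$ is $\mu (w - w_t)$, which matches the quantity the code adds to `client_param.grad.data`:

```python
import torch

mu = 0.1
w = torch.randn(5, requires_grad=True)   # client parameter w
w_t = torch.randn(5)                     # fixed server (global) parameter w_t

# Squared proximal term from the FedProx paper: (mu / 2) * ||w - w_t||^2
prox = mu / 2 * torch.norm(w - w_t) ** 2
prox.backward()

# Its gradient w.r.t. w is mu * (w - w_t), i.e. exactly the quantity the
# FedTorch code adds to client_param.grad.data.
print(torch.allclose(w.grad, mu * (w.detach() - w_t)))  # True
```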
Hi! I think that you are right. In fact, though, the final implementation is still correct: the term `client.args.fedprox_mu / 2 * torch.norm(client_param.data - server_param.data)` does not affect the gradients computed in backpropagation, because `client_param.data` and `server_param.data` both have `requires_grad` set to `False`, so this loss term is detached from the autograd graph. The correct proximal gradient, $\mu (w - w_t)$, is instead added to `client_param.grad.data` manually in the last line.
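As a quick illustration of that point, here is a minimal sketch (plain PyTorch, not from FedTorch) showing that a term built from `.data` tensors has no `grad_fn`, so adding it to the loss changes the reported value but leaves the backpropagated gradients untouched:

```python
import torch

w = torch.randn(3, requires_grad=True)
w_t = torch.randn(3)

loss = (w ** 2).sum()                     # stand-in for the task loss
extra = 0.05 * torch.norm(w.data - w_t)   # built from .data -> detached
print(extra.requires_grad)                # False: outside the autograd graph

(loss + extra).backward()
print(torch.allclose(w.grad, 2 * w.detach()))  # True: gradient unaffected
```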
Thank you for your kind reply; it helps me a lot.