MLOPTPSU/FedTorch

a question about fedprox

Closed this issue · 2 comments

in this page "https://github.com/MLOPTPSU/FedTorch/blob/main/fedtorch/comms/trainings/federated/main.py".
line 123 -> 129, code is below.

elif client.args.federated_type == 'fedprox':
    # Adding proximal gradients and loss for fedprox
    for client_param, server_param in zip(client.model.parameters(), client.model_server.parameters()):
        if client.args.graph.rank == 0:
            print("distance norm for prox is:{}".format(torch.norm(client_param.data - server_param.data )))
        loss += client.args.fedprox_mu /2 * torch.norm(client_param.data - server_param.data)
        client_param.grad.data += client.args.fedprox_mu * (client_param.data - server_param.data)

The expression client.args.fedprox_mu / 2 * torch.norm(client_param.data - server_param.data) appears to compute $\frac{\mu}{2} \left\|w-w_t\right\|$, but FedProx uses the squared term $\frac{\mu}{2} \left\|w-w_t\right\|^2$.
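
For reference, the FedProx local objective adds the squared proximal term to the client loss, i.e. $F_k(w) + \frac{\mu}{2} \left\|w-w_t\right\|^2$. Below is a minimal standalone sketch of the difference between the two expressions (mu, client_param and server_param here are illustrative stand-ins, not the repository's objects):

import torch

mu = 0.01
client_param = torch.randn(10)   # stand-in for a client parameter w
server_param = torch.randn(10)   # stand-in for the server parameter w_t
diff = client_param - server_param

# What the quoted line computes: (mu / 2) * ||w - w_t||
prox_unsquared = mu / 2 * torch.norm(diff)

# What the FedProx objective uses: (mu / 2) * ||w - w_t||^2
prox_squared = mu / 2 * torch.norm(diff) ** 2   # equivalently mu / 2 * (diff ** 2).sum()

print(prox_unsquared.item(), prox_squared.item())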

I'm not sure whether what I said above is correct. Thank you very much for your kind consideration.

in this page "https://github.com/MLOPTPSU/FedTorch/blob/main/fedtorch/comms/trainings/federated/main.py". line 123 -> 129, code is below.

elif client.args.federated_type == 'fedprox':
    # Adding proximal gradients and loss for fedprox
    for client_param, server_param in zip(client.model.parameters(), client.model_server.parameters()):
        if client.args.graph.rank == 0:
            print("distance norm for prox is:{}".format(torch.norm(client_param.data - server_param.data )))
        loss += client.args.fedprox_mu /2 * torch.norm(client_param.data - server_param.data)
        client_param.grad.data += client.args.fedprox_mu * (client_param.data - server_param.data)

client.args.fedprox_mu /2 * torch.norm(client_param.data - server_param.data) may mean $\frac{\mu}{2} \left\|w-w_t\right\|$. But, $\frac{\mu}{2} \left\|w-w_t\right\|^2$ is used in fedprox.

I'm not sure what I said above is true. Thank you very much for your kind consideration.

Hi! I think you are right about the missing square. But in fact, the final implementation is still correct: the term client.args.fedprox_mu / 2 * torch.norm(client_param.data - server_param.data) does not affect the gradients in backpropagation, because the requires_grad attribute of client_param.data and server_param.data is False. The proximal gradient itself is added explicitly in the last line, client_param.grad.data += client.args.fedprox_mu * (client_param.data - server_param.data).
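
To illustrate (a minimal sketch, not the repository's code; mu, w and w_t are stand-ins for client.args.fedprox_mu, a client parameter and the corresponding server parameter): the manually added gradient mu * (w - w_t) is exactly what autograd would produce for the squared term $\frac{\mu}{2} \left\|w-w_t\right\|^2$, so the update matches FedProx even though the added loss term uses the unsquared norm.

import torch

mu = 0.1
w = torch.randn(5, requires_grad=True)   # stand-in for a client parameter
w_t = torch.randn(5)                     # stand-in for the server parameter

# Gradient of the squared proximal term (mu / 2) * ||w - w_t||^2 via autograd ...
prox = mu / 2 * torch.sum((w - w_t) ** 2)
prox.backward()
autograd_grad = w.grad.clone()

# ... matches the gradient the code adds manually: mu * (w - w_t)
manual_grad = mu * (w.data - w_t)

print(torch.allclose(autograd_grad, manual_grad))  # True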

in this page "https://github.com/MLOPTPSU/FedTorch/blob/main/fedtorch/comms/trainings/federated/main.py". line 123 -> 129, code is below.

elif client.args.federated_type == 'fedprox':
    # Adding proximal gradients and loss for fedprox
    for client_param, server_param in zip(client.model.parameters(), client.model_server.parameters()):
        if client.args.graph.rank == 0:
            print("distance norm for prox is:{}".format(torch.norm(client_param.data - server_param.data )))
        loss += client.args.fedprox_mu /2 * torch.norm(client_param.data - server_param.data)
        client_param.grad.data += client.args.fedprox_mu * (client_param.data - server_param.data)

client.args.fedprox_mu /2 * torch.norm(client_param.data - server_param.data) may mean $\frac{\mu}{2} \left\|w-w_t\right\|$. But, $\frac{\mu}{2} \left\|w-w_t\right\|^2$ is used in fedprox.
I'm not sure what I said above is true. Thank you very much for your kind consideration.

Hi! I think that you are right. But in fact, the final implementation of the code is correct because the term "client.args.fedprox_mu /2 * torch.norm(client_param.data - server_param.data)" does not affect the gradient of back propagation since the attribute "requires_grad" of the term "client_param.data" and "server_param.data" is false.

Thank you for your kind reply, it helps me a lot.