OPTML-Group/DeepZero

Requesting demo code for the use of RGE and CGE.

lizhh268 opened this issue · 1 comment

Hello, I'm very interested in your work. However, I currently have some questions:

When reproducing the code, I found that the accuracy was consistently low and didn't match the accuracy reported in the paper. I'm not sure if I made any mistakes in the process.

I wrote some sample code using RGE, but I'm unsure whether this approach is correct, because the results are still poor: each update with the ZO-estimated gradient improves things by only a few tenths, and the cosine similarity between the estimated gradient and the gradient computed by PyTorch autograd is always close to 0.

```python
device = next(model.parameters()).device
x, y = fetch_data(dataloader, class_num, samples_per_class)
x, y = x.to(device), y.to(device)

params = extract_conv2d_and_linear_weights(model)
optimizer.zero_grad()
f_theta = partial(f, network=model, x=x, y=y, loss_func=loss_func)
g0 = rge(f_theta, params, zoo_rs_size, zoo_step_size)
# write the estimate into each parameter's .grad before stepping
# (note: `model.parameters.grad = g0` has no effect, since `parameters` is a
# method; this assumes `rge` returns a dict keyed like `params`)
for name, p in params.items():
    p.grad = g0[name]
optimizer.step()
```
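For reference, a minimal RGE over a single flat parameter tensor can be sketched as follows. This is my own toy version, not the repo's `rge`; the `num_queries` and `mu` arguments play the roles of `zoo_rs_size` and `zoo_step_size` above:

```python
import torch

def rge_sketch(loss_fn, theta, num_queries=20, mu=1e-3):
    """Randomized gradient estimate (RGE): average forward-difference
    directional derivatives along random Gaussian directions."""
    f0 = loss_fn(theta)
    g = torch.zeros_like(theta)
    for _ in range(num_queries):
        u = torch.randn_like(theta)  # random perturbation direction
        g += (loss_fn(theta + mu * u) - f0) / mu * u
    return g / num_queries
```

On a toy quadratic loss f(θ) = ‖θ‖², whose true gradient is 2θ, the estimate points in roughly the right direction, but it is far from exact unless `num_queries` is large relative to the dimension.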

First, we may need additional details about your run, such as the exact commands and hardware configuration, to identify potential issues.

Second, we employ CGE as the default ZO gradient estimator in our paper due to the high variance of RGE, particularly in high-dimensional settings.
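For concreteness, coordinate-wise gradient estimation (CGE) perturbs one coordinate at a time, which removes the random-direction variance of RGE at the cost of one loss query per parameter. A toy sketch (again my own, not the repo's implementation):

```python
import torch

def cge_sketch(loss_fn, theta, mu=1e-3):
    """Coordinate-wise gradient estimate (CGE): one forward difference
    per coordinate, i.e. d + 1 loss queries for d parameters."""
    f0 = loss_fn(theta)
    g = torch.zeros_like(theta)
    flat = g.view(-1)
    for i in range(theta.numel()):
        e = torch.zeros_like(theta)
        e.view(-1)[i] = mu  # perturb coordinate i only
        flat[i] = (loss_fn(theta + e) - f0) / mu
    return g
```

Unlike RGE, this estimator is deterministic and accurate up to an O(mu) finite-difference bias, which is why its iterates behave much more like first-order training.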

Lastly, as illustrated in the figure below, the ZO-estimated gradient is not expected to closely resemble the FO gradient.

[Screenshot 2024-09-30: comparison of ZO-estimated and FO gradients]
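The near-zero cosine similarity reported above is in fact expected: a q-query RGE estimate lies in the span of q random directions, so in d dimensions its alignment with the true gradient scales roughly like sqrt(q/d). A quick sanity check on a toy quadratic (dimension and query budget are illustrative, not DeepZero's settings):

```python
import torch

torch.manual_seed(0)
d, q, mu = 100_000, 10, 1e-3  # dimension, query budget, smoothing radius
theta = torch.randn(d, dtype=torch.float64)
loss = lambda t: (t ** 2).sum()  # toy loss with known gradient 2 * theta
g_true = 2 * theta

# q-query RGE estimate of the gradient
f0 = loss(theta)
g_hat = torch.zeros_like(theta)
for _ in range(q):
    u = torch.randn_like(theta)
    g_hat += (loss(theta + mu * u) - f0) / mu * u
g_hat /= q

cos = torch.nn.functional.cosine_similarity(g_hat, g_true, dim=0)
# near 0 when d >> q, even though the estimator is unbiased in expectation
print(f"cosine similarity: {cos.item():.4f}")
```

So a cosine similarity near 0 does not by itself mean the estimator is broken; the estimate still yields a descent direction on average.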