hanglearning/VBFL

Sorry, I cannot reproduce the results in your plotting_log.zip


I tried the parameters of every group in the "plotting_log.zip" you provided, and all of the results show accuracy values between 0.0833333358168602 and 0.1666666716337204. It's really strange. Also, the time values in my results are all smaller than yours.
[screenshot: reproduced training logs, left part showing ~10% accuracy, right part showing ~77% accuracy]

I don't have 5 GPUs; I trained it on my laptop.

Are you sure you have provided the right hyperparameters?

Hi Jinhua, thank you very much for your interest in this project! The arguments written in the file args_used.txt in each group of plotting_log.zip (with the hyperparameters included) were saved automatically by the code, so they were indeed the command-line arguments I used for those particular experiments.
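(For reference, preserving the parsed arguments to args_used.txt generally looks something like the minimal sketch below. This is an illustration only, not the exact VBFL code; the argument names mirror a few of the flags quoted later in this thread.)

```python
# Minimal sketch of preserving command-line arguments to args_used.txt.
# Illustration only, not the exact VBFL implementation; the argument names
# mirror a few of the flags quoted later in this thread.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('-nd', '--num_devices', type=int, default=20)
parser.add_argument('-max_ncomm', '--max_num_comm', type=int, default=100)
parser.add_argument('-lr', '--learning_rate', type=float, default=0.01)
args = parser.parse_args()

log_dir = 'logs/run1'
os.makedirs(log_dir, exist_ok=True)
with open(os.path.join(log_dir, 'args_used.txt'), 'w') as f:
    for name, value in vars(args).items():
        f.write(f'--{name} {value}\n')
```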

About accuracy -
From your screenshot, I think it actually shows the expected behavior. The right part of your screenshot shows that the accuracy has reached 77%. The left part shows 10%, which should be because the learning is still at an early stage (and the system does not have that many malicious nodes). The right part is either because the learning is at a middle or nearly late stage, or because you had many malicious nodes.

Since the accuracy reached 0.7739, could you elaborate on where the number 0.1666666716337204 comes from?

About time -
It looks to me like you used a GPU to train, right? I also did not use 5 GPUs; rather, I used a single V100 to train on Colab Pro. I think the time values shown in your screenshot are also normal if you used a GPU. I can see you were reproducing a PoS experiment; for example, in my log PoS_3_20_vh=0.08/run1/comm_31/accuracy_comm_31.txt, the time info shown there is

...
device_7 validator B: 0.766799807548523
device_5 validator B: 0.766799807548523
device_17 validator B: 0.766799807548523
device_11 validator B: 0.766799807548523
comm_round_block_gen_time: 8.461659180886404
slowest_device_round_ends_time: 23.351376660592212
mining_consensus: PoS 0
forking_happened: False
comm_round_spent_time_on_this_machine: 126.09296488761902

This is quite similar to your right screenshot.
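(In case it helps with checking the reports, one way to pull those timing entries out of a per-round accuracy file could look like the rough sketch below. It assumes the "key: value" format shown above, and the path is only an example.)

```python
# Rough sketch for extracting timing entries from an accuracy_comm_*.txt log.
# Assumes the "key: value" lines shown above; the path below is only an example.
def read_round_times(log_path):
    wanted = (
        'comm_round_block_gen_time',
        'slowest_device_round_ends_time',
        'comm_round_spent_time_on_this_machine',
    )
    times = {}
    with open(log_path) as f:
        for line in f:
            key, sep, value = line.partition(':')
            if sep and key.strip() in wanted:
                times[key.strip()] = float(value.strip())
    return times

print(read_round_times('PoS_3_20_vh=0.08/run1/comm_31/accuracy_comm_31.txt'))
```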

May I know the arguments you used to generate the reports, and could you send a zip of your logs to the email address on my profile?

Thanks Jinhua!

Oh, I'm so sorry! Your right screenshot is from my log, right? Sorry, I thought that was your reproduced log as well. I just realized that in my early code I actually named what is now the validator an "evaluator", and that log is from one of my early runs.

For your left screenshot, what were the args you used and what communication round was that? Thanks.

Hi, Hang Chen! Yes, the right one is from your results. I will reply to you by Gmail. Thanks for your timely response.

Hello Jinhua! As I stated in our emails, the issue was caused by my carelessly uncommenting lines 36~40 in DatasetLoad.py, which led to only 60 data samples of the MNIST dataset being sharded to the nodes in the network... I have fixed it and verified that the program runs okay after commenting them out again. The change was made in the #101 commit.

Please pull and run the program again when you have time and let me know whether it performs as expected here. Thank you very much for spotting that bug for me!!
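(If it is useful for anyone else verifying the fix, a quick sanity check on the shard sizes could look like the sketch below. This is a generic illustration, not the actual DatasetLoad.py code; the 20-node IID split and the 1000-sample threshold are assumptions.)

```python
# Generic sanity check that an MNIST split gives each node a reasonable shard.
# Not the actual VBFL DatasetLoad.py code; just the kind of check that would
# have caught the 60-sample truncation described above.
import numpy as np
from torchvision import datasets

NUM_NODES = 20  # example value, matching the -nd 20 runs in this thread

train_set = datasets.MNIST(root='./data', train=True, download=True)

# Simple IID split: shuffle all indices and cut them into equal shards.
indices = np.random.permutation(len(train_set))
shards = np.array_split(indices, NUM_NODES)

for node_id, shard in enumerate(shards):
    assert len(shard) >= 1000, f'node {node_id} only got {len(shard)} samples'
    print(f'node {node_id}: {len(shard)} samples')
```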

Hi, Hang! Thanks for your reply and hard work. I have been slow to reply these days because of assessments with some companies. I will verify your revised work in the next few days. Thank you again for your contribution.

[screenshot: verified training run at round 26 showing worker accuracies above 62%]
Hi, Hang! Yesterday I commented out lines 36-40 in DatasetLoad.py, and now every round of VBFL training takes a normal amount of time. Moreover, the accuracy looks normal too. As shown in the figure, when VBFL runs with “-nd 20 -max_ncomm 100 -ha 12,5,3 -aio 1 -pow 0 -ko 6 -nm 3 -vh 0.08 -cs 0 -B 10 -mn mnist_cnn -iid 0 -lr 0.01 -dtx 1”, at round 26 the accuracy of some of the workers reaches 62%+. So far, so good. Thanks for the fix.

Hi Jinhua! Thanks for your verification! I am very glad that it is back to normal! 😄 Again, I am sorry for the time this bug cost you, and thanks again for spotting it for me!

Good luck with your assessment and everything! We shall keep in touch 👍