microsoft/Recursive-Cascaded-Networks

My training results are not as good as the paper's

w19787 opened this issue · 8 comments

Thanks for the excellent paper and work. I tried to retrain the project after migrating it to TensorFlow 2.0. Only the API calls were migrated; the network, loss, structure, data preprocessing, etc. are unchanged.

The following are TensorBoard screenshots from training on the liver dataset:
image
image

And the evaluation result:
image

Any idea what might be going wrong? Thanks.

What setting are you running? I guess you are running the 1-cascade VTN version that should correspond to these numbers. Specify -n to train more cascades.

Thanks for the reply. It was my mistake not to notice that n should be increased for better performance, as described in the paper.

What kind of GPU is required to train the 10-cascade VTN? Training with 5 or 10 cascades runs out of memory (OOM) on a 16 GB V100.

It should be fine with 4 GPUs.
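
For readers wondering why more GPUs help with the OOM, here is a minimal sketch of the general data-parallel idea, assuming the batch is split evenly across devices so each GPU only holds activations for its own shard. It uses a toy one-parameter model in plain Python rather than this repository's code; `grad_mse`, `split_batch` and `data_parallel_grad` are hypothetical names introduced for illustration.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def grad_mse(w: float, batch: Sequence[Sample]) -> float:
    """Gradient of mean((w * x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def split_batch(batch: Sequence[Sample], num_devices: int) -> List[Sequence[Sample]]:
    """Split a batch into equal contiguous shards, one per device."""
    if len(batch) % num_devices != 0:
        raise ValueError("batch size must be divisible by the number of devices")
    shard = len(batch) // num_devices
    return [batch[i * shard:(i + 1) * shard] for i in range(num_devices)]


def data_parallel_grad(w: float, batch: Sequence[Sample], num_devices: int) -> float:
    """Average per-shard gradients, as a tower-style multi-GPU step would."""
    shards = split_batch(batch, num_devices)
    # In a real setup each shard's gradient would be computed on its own GPU.
    per_device = [grad_mse(w, shard) for shard in shards]
    return sum(per_device) / num_devices


batch = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
full = grad_mse(0.5, batch)                               # single-device gradient
parallel = data_parallel_grad(0.5, batch, num_devices=4)  # four shards of one sample
```

With equal-sized shards the averaged gradient matches the single-device gradient up to floating-point rounding, so the update is the same while each device processes only a fraction of the batch.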

After migrating to TF 2.0 (my server has a newer CUDA version installed that cannot run TF 1.4), multi-GPU training does not work properly. compute_gradients seems to run only on GPU 0. Currently I have no idea how to fix it.
image

image

@zsyzzsoft I have uploaded my port to https://github.com/w19787/Recursive-Cascaded-Networks-TF2.0. If you can advise how to fix multi-GPU support in this version, it would be much appreciated!

I haven't met this issue before and I'm not familiar with TF 2.0. Maybe the GPU specification doesn't work properly for TF 2, but I'm not sure.
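
One way to rule out the GPU specification as the cause is to check which devices the process is actually allowed to see. The sketch below is not taken from this repository: `select_gpus` is a hypothetical helper, and it assumes the GPUs are given as a comma-separated string such as "0,1,2,3". It relies only on the standard `CUDA_VISIBLE_DEVICES` environment variable, which has to be set before TensorFlow is imported.

```python
import os
from typing import List


def select_gpus(spec: str) -> List[str]:
    """Expose only the GPUs listed in `spec` (e.g. "0,1,2,3") to this process.

    Call this before importing TensorFlow; CUDA reads the variable when it
    initializes, so changing it afterwards has no effect.
    """
    ids = [part.strip() for part in spec.split(",") if part.strip()]
    if not ids or not all(i.isdigit() for i in ids):
        raise ValueError(f"invalid GPU specification: {spec!r}")
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(ids)
    # Visible devices are renumbered from 0 inside the process,
    # so physical GPUs 2 and 3 appear as /gpu:0 and /gpu:1.
    return [f"/gpu:{k}" for k in range(len(ids))]


devices = select_gpus("2,3")
```

After importing TensorFlow, the length of `devices` can be compared with the number of entries returned by `tf.config.list_physical_devices('GPU')` (TF 2.1 and later). If TensorFlow reports fewer GPUs than expected, the problem is likely in device visibility rather than in the gradient code.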

got it. thanks!