Questions about how to stably reproduce the reported results of CRN(lambda=0) and CRN on the Tumor Cancer Simulation data
mengcz13 opened this issue · 0 comments
I am writing to inquire about your ICLR 2020 paper, "Estimating counterfactual treatment outcomes over time through adversarially balanced representations", which I found very interesting. I have been attempting to reproduce your reported results of CRN(lambda=0) and CRN in Figure 1 and Table 8, but I have encountered some difficulties. I was wondering if you could kindly provide some guidance or clarification on the methodology or data used in the paper.
With gamma=10, the paper reports that "CRN improves by 48.1% on the same model architecture without domain adversarial training CRN (λ = 0)" in one-step counterfactual prediction, i.e., 2.41% vs. 3.57% according to Table 8.
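For reference, my reading is that the 48.1% figure is the difference between the two Table 8 numbers taken relative to CRN's error (this interpretation is my assumption, not stated in the paper):

```python
# Sanity check on the quoted 48.1% improvement, using the Table 8 numbers.
crn = 2.41          # CRN one-step normalized RMSE, in percent (Table 8)
crn_lambda0 = 3.57  # CRN(lambda=0) one-step normalized RMSE, in percent (Table 8)

# Relative improvement, measured against the CRN error.
improvement = (crn_lambda0 - crn) / crn
print(f"{improvement * 100:.1f}%")  # → 48.1%
```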
I tried reproducing these numbers with my own fork here: https://github.com/mengcz13/Counterfactual-Recurrent-Network (the main changes are adding a config option to enable CRN(lambda=0) and using the best hyperparameters reported in Table 6). I used tensorflow-gpu==1.15.0 and python==3.6.15.
For both CRN(lambda=0) and CRN, I repeated the experiment 5 times with different random seeds (see the commands in https://github.com/mengcz13/Counterfactual-Recurrent-Network/blob/master/reproduce_script.sh) and computed the average (stdev) of the normalized RMSE for one-step prediction.
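For concreteness, the aggregation over seeds is simply a mean and sample standard deviation; a minimal sketch (the five per-seed values below are hypothetical placeholders, not my actual results):

```python
from statistics import mean, stdev

# Hypothetical per-seed normalized RMSE values (as fractions), for illustration only.
rmse_per_seed = [0.040, 0.038, 0.045, 0.036, 0.042]

avg = mean(rmse_per_seed)
sd = stdev(rmse_per_seed)  # sample standard deviation over the 5 seeds

# Reported in the "average (stdev)" format used in the table below.
print(f"{avg * 100:.2f}% ({sd * 100:.2f}%)")  # → 4.02% (0.35%)
```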
The differences between the reported results and my reproduced results are listed as follows:
Model | Reported Results in Table 8 | Reproduced Results [average (stdev)]
---|---|---
CRN(lambda=0) | 3.57% | 3.93% (0.34%)
CRN | 2.41% | 4.22% (0.81%)
My reproduced normalized RMSE for one-step counterfactual prediction with CRN is significantly higher than the reported number, and it does not demonstrate the benefit of balanced representations for treatment effect estimation. I would appreciate any hints or insights on this matter. Thank you very much.