paubellot/netbenchmark

Interpreting aupr results

alnf commented

I confirm that commit e07b557 fixes the problem where evaluating the true network against itself produced a wrong AUROC value.

However, there is one more issue, which is open to discussion. If you take the true network and replace all the zeros with the same non-negative value, you still get nearly perfect precision, whereas by my understanding you should get very poor precision:

syntren300.net.nozeros <- syntren300.net
syntren300.net.nozeros[syntren300.net.nozeros==0] <- 0.9

e <- evaluate(syntren300.net.nozeros,syntren300.net,extend=0,sym=FALSE)
aupr(e)
auroc(e)
e <- evaluate(syntren300.net.nozeros,syntren300.net,extend=no.edges,sym=FALSE)
aupr(e)
auroc(e)

Moreover, replacing a randomly chosen row with zeros does not change the aupr value at all:

syntren300.net.nozeros <- syntren300.net
syntren300.net.nozeros[syntren300.net.nozeros==0] <- 0.9
syntren300.net.nozeros[4, ] <- 0

e <- evaluate(syntren300.net.nozeros,syntren300.net,extend=0,sym=FALSE)
aupr(e)
auroc(e)
e <- evaluate(syntren300.net.nozeros,syntren300.net,extend=no.edges,sym=FALSE)
aupr(e)
auroc(e)

I believe this could be an issue with the minet package. Another package, networkBMA, which I tried with the very same inferred and true networks and which does not seem to use minet, outputs a very poor aupr (although I had other problems with that package).

For me this is quite a philosophical question. In the netbenchmark package (and also in minet) we decided to evaluate link by link and compute a contingency table for each of the inferred links.
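As a rough illustration of what link-by-link evaluation means, the base R sketch below ranks every entry of a made-up inferred matrix by weight and builds one contingency table per rank. The matrices and the helper `confusion_by_rank` are hypothetical and are not part of netbenchmark or minet; in particular, the sketch ranks all entries, including zeros and the diagonal, which may differ from what `evaluate` does.

```r
# Toy true network (1 = link) and inferred weights; both are made up.
true_net <- matrix(c(0, 1, 0,
                     0, 0, 1,
                     0, 0, 0), nrow = 3, byrow = TRUE)
inferred <- matrix(c(0.0, 0.9, 0.2,
                     0.1, 0.0, 0.8,
                     0.7, 0.3, 0.0), nrow = 3, byrow = TRUE)

# Hypothetical helper: one contingency table per ranked link.
confusion_by_rank <- function(inferred, true_net) {
  ord    <- order(as.vector(inferred), decreasing = TRUE)  # strongest links first
  labels <- as.vector(true_net)[ord] != 0                  # is the k-th link a true one?
  tp <- cumsum(labels)          # true links among the top k
  fp <- cumsum(!labels)         # false links among the top k
  fn <- sum(labels) - tp        # true links not yet reached
  tn <- sum(!labels) - fp       # false links not yet reached
  data.frame(tp = tp, fp = fp, fn = fn, tn = tn)
}

tab <- confusion_by_rank(inferred, true_net)
precision <- tab$tp / (tab$tp + tab$fp)
recall    <- tab$tp / (tab$tp + tab$fn)
print(cbind(tab, precision, recall))
```

Each row of `tab` corresponds to accepting one more link from the ranked list, so the precision and recall columns trace out a precision-recall curve for this toy case.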

Regarding the first question: if you generate a network by taking the true network and replacing all the zeros with the same small non-negative value, you don't necessarily get a bad result. It depends on how that value compares with the weights of the true links:

syntren300.net.nozeros <- syntren300.net
syntren300.net.nozeros[syntren300.net.nozeros==0] <- 0.9
tb <- evaluate(syntren300.net.nozeros,syntren300.net,extend=0,sym=FALSE)

In this case the false links have a smaller confidence than the true ones. You therefore first find all the true links with perfect precision while the recall increases up to 1. Only then do you start to find falsely inferred links, and your precision starts to decrease from 1 toward 0. You can see this behaviour in the next figure:
[Figure: ex1]

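If instead the true links are scaled down by 0.3, so that they fall below the 0.9 assigned to the false ones:

pred <- syntren300.net*0.3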
pred[pred==0] <- 0.9
tb2 <- evaluate(pred,syntren300.net,extend=0,sym=FALSE)

In this case you get a very bad prediction, as you can see in the next figure:
[Figure: ex2]
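The contrast between the two cases can be reproduced on a small made-up network in base R. This is only a sketch: the 4-node network and the helper `precision_at_k` are hypothetical, and the ranking is done by hand rather than through `evaluate`.

```r
# Made-up 4-node true network with three links.
true_net <- matrix(0, nrow = 4, ncol = 4)
true_net[1, 2] <- 1
true_net[2, 3] <- 1
true_net[3, 4] <- 1

# Hypothetical helper: precision among the k highest-weighted entries.
precision_at_k <- function(pred, true_net, k) {
  top <- order(as.vector(pred), decreasing = TRUE)[seq_len(k)]
  mean(as.vector(true_net)[top] != 0)
}

n_true <- sum(true_net != 0)

# Case 1: true links keep weight 1, every other entry gets 0.9.
pred_high <- true_net
pred_high[pred_high == 0] <- 0.9

# Case 2: true links are scaled down to 0.3, every other entry gets 0.9.
pred_low <- true_net * 0.3
pred_low[pred_low == 0] <- 0.9

p_high <- precision_at_k(pred_high, true_net, n_true)  # true links are ranked first
p_low  <- precision_at_k(pred_low,  true_net, n_true)  # false links are ranked first
print(c(p_high = p_high, p_low = p_low))
```

Only the relative order of the weights matters here: the constant 0.9 is harmless as long as the true links stay above it.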

Regarding the other questions:

  1. In this particular example, the extend parameter makes no difference because the predicted network is complete.
  2. Replacing a row with zeros does change aupr if that row contains a link in the true network:
pred <- syntren300.net
pred[pred==0] <- 0.9
pred[2, ] <- 0
tb3 <- evaluate(pred,syntren300.net,extend=0,sym=FALSE)
aupr(tb3) # returns 0.965812
auroc(tb3) # returns 0.9681114

The change is small because the 2nd row contains only 15 links.
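The same effect can be sketched on a made-up network in base R, assuming a hand-rolled ranking rather than `evaluate`. The network and the helper `precision_at_k` are hypothetical; row 4 is chosen to contain no true link and row 2 to contain one.

```r
# Made-up 4-node true network; row 4 has no true link, row 2 has one.
true_net <- matrix(0, nrow = 4, ncol = 4)
true_net[1, 2] <- 1
true_net[2, 3] <- 1
true_net[3, 4] <- 1

# Hypothetical helper: precision among the k highest-weighted entries.
precision_at_k <- function(pred, true_net, k) {
  top <- order(as.vector(pred), decreasing = TRUE)[seq_len(k)]
  mean(as.vector(true_net)[top] != 0)
}

# Start from the true network with zeros replaced by 0.9.
pred <- true_net
pred[pred == 0] <- 0.9

pred_row4 <- pred
pred_row4[4, ] <- 0   # only false links are removed

pred_row2 <- pred
pred_row2[2, ] <- 0   # the true link 2 -> 3 drops to the bottom of the ranking

k <- sum(true_net != 0)
p_row4 <- precision_at_k(pred_row4, true_net, k)  # unchanged: all true links still on top
p_row2 <- precision_at_k(pred_row2, true_net, k)  # lower: one true link is lost
print(c(p_row4 = p_row4, p_row2 = p_row2))
```

Zeroing a row that holds no true link only reorders false links among themselves, which is consistent with the aupr staying the same in that situation.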