statnet/network

Breaking change: network function errors on edgelists with one edge

ifellows opened this issue · 4 comments

I received a notification that lolog fails CRAN checks. Upon inspection, I believe this is due to a bug in the new version of network.

> # Works fine
> el <- matrix(c(1,2,1,3),ncol=2, byrow=TRUE)
> attr(el, "n") <- 3
> network(el, directed=TRUE, matrix.type="edgelist")
 Network attributes:
  vertices = 3 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 2 
    missing edges= 0 
    non-missing edges= 2 

 Vertex attribute names: 
    vertex.names 

No edge attributes
> network(el, directed=FALSE, matrix.type="edgelist")
 Network attributes:
  vertices = 3 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 2 
    missing edges= 0 
    non-missing edges= 2 

 Vertex attribute names: 
    vertex.names 

No edge attributes
> 
> # Errors
> el <- matrix(c(1,2),ncol=2)
> attr(el, "n") <- 2
> network(el, directed=TRUE, matrix.type="edgelist")
Error in x[!duplicated(x[, 1:2]), , drop = FALSE] : 
  (subscript) logical subscript too long
> network(el, directed=FALSE, matrix.type="edgelist")
Error in apply(x[, 1:2], 1, sort) : dim(X) must have a positive length

This seems like it may be due to some missing drop=FALSE arguments when subsetting the edgelist matrix. For example, in network.edgelist perhaps

x[!duplicated(x[, 1:2]), , drop = FALSE]

should be replaced by

x[!duplicated(x[, 1:2, drop=FALSE]), , drop = FALSE]
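
To illustrate why the missing drop=FALSE would produce both errors above, here is a minimal base R sketch that does not call network at all. The variable names (dropped, kept, dedup, sorted) are made up for illustration, and this is not necessarily the change that was checked in.

```r
# One-row edgelist, as in the failing example above.
el <- matrix(c(1, 2), ncol = 2)

# Without drop = FALSE, selecting columns of a one-row matrix returns a
# plain vector, so duplicated() gives one flag per element, not per row.
dropped <- el[, 1:2]
is.null(dim(dropped))        # TRUE: the matrix structure is gone
length(duplicated(dropped))  # 2, although el has only 1 row

# With drop = FALSE the result stays a 1 x 2 matrix, and duplicated()
# works row-wise, giving one flag per edge.
kept <- el[, 1:2, drop = FALSE]
length(duplicated(kept))     # 1

# The row filter now works for a single edge.
dedup <- el[!duplicated(kept), , drop = FALSE]

# The undirected case fails for the same reason: apply() needs an object
# with dimensions. On the kept matrix it runs fine.
sorted <- t(apply(kept, 1, sort))
```

With the dropped vector, the logical index has length 2 for a matrix with 1 row, which is what triggers "logical subscript too long"; passing the same vector to apply() triggers "dim(X) must have a positive length".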

Good catch. I just checked in a set of changes that should fix the issue. Did you spot any other issues with lolog, other than ones associated with the changed behavior of data.frames? I'll want to push a bug fix update to CRAN, but first want to make sure that nothing else slipped through the cracks....

Thanks!

I don't know if it was like this for the previous version of network, but edge lists with two rows are interpreted as adjacency matrices when matrix.type is not specified. It was entirely my own fault for relying on code to guess my intention. That said, I think the guess could have been better.

> el <- matrix(c(1,2,1,3),ncol=2, byrow=TRUE)
> attr(el, "n") <- 3
> el
     [,1] [,2]
[1,]    1    2
[2,]    1    3
attr(,"n")
[1] 3
> network(el, directed=TRUE)
Error in as.network.matrix(x, directed = directed, hyper = hyper, loops = loops,  : 
  the dimensions of the matrix argument (2 by 2) do not match the network size indicated by the attached n attribute (3), perhaps matrix.type argument is not correct
> network(el, directed=TRUE, matrix.type="edgelist")
 Network attributes:
  vertices = 3 
  directed = TRUE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 2 
    missing edges= 0 
    non-missing edges= 2 

 Vertex attribute names: 
    vertex.names 

No edge attributes

Yeah, the matrix type heuristics have to make some judgment calls, and those are tricky in some cases. Currently, a square matrix is always assumed to be an adjacency matrix if matrix.type is not specified (since it usually is one); if it has an n attribute that doesn't match the dimension, then that is flagged as an error. Regular edgelists with two edges, or sna edgelists with three edges, can be hard to distinguish from valued adjacency matrices. It may be worth revisiting those heuristics (especially since we weren't using extra attributes, IIRC, when they were first created); the help does specify that they are dubious, but one would like them to be as smart as they can reasonably be under the circumstances. One such heuristic might be that if a square matrix has 2 or 3 columns, an n attribute not matching the dimension, and the first two columns contain only values in 1:n, then it's probably an edgelist.
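
For concreteness, the heuristic suggested in the last sentence could be sketched roughly as below. looks_like_edgelist is a hypothetical helper written only for this illustration; it is not a function in the network package, and the real matrix.type detection may be structured quite differently.

```r
# Hypothetical helper (not part of network): guess whether a small square
# matrix with an "n" attribute is more likely an edgelist than an
# adjacency matrix.
looks_like_edgelist <- function(x) {
  n <- attr(x, "n")
  if (!is.matrix(x) || is.null(n)) return(FALSE)

  # Only the ambiguous case: square with 2 columns (plain edgelist)
  # or 3 columns (sna-style edgelist).
  if (nrow(x) != ncol(x) || !(ncol(x) %in% c(2, 3))) return(FALSE)

  # If n matches the dimension, an adjacency matrix remains plausible.
  if (n == nrow(x)) return(FALSE)

  # The first two columns must contain only vertex ids in 1:n.
  ends <- x[, 1:2, drop = FALSE]
  all(ends %in% seq_len(n))
}

# The two-edge edgelist from the example above: n = 3, dimension 2 x 2.
el <- matrix(c(1, 2, 1, 3), ncol = 2, byrow = TRUE)
attr(el, "n") <- 3
looks_like_edgelist(el)   # TRUE

# A 2 x 2 adjacency matrix whose n attribute matches its dimension.
adj <- matrix(c(0, 1, 1, 0), ncol = 2)
attr(adj, "n") <- 2
looks_like_edgelist(adj)  # FALSE
```

The check only fires when the n attribute is present and disagrees with the dimension, which is the case that currently raises an error, so matrices without an n attribute would keep the existing default.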

Am closing, but will open issue to revisit the matrix.type heuristics.