stocnet/goldfish

Preprocessing error when formula changes

Closed this issue · 4 comments

Describe the bug
Quite often, I am getting the following error just after the first iteration of the choice and choice_coordination models:

Error: Error in DyNAM choice_coordination estimation: Error in estimate_int(maxIterations = 20, maxScoreStopCriterion = 0.01, : Matrix cannot be inverted; probably due to collinearity between parameters.

I would appreciate some tips on how to alleviate this. I have observed that reducing the amount of data generally helps, which should not be the case for collinearity problems, so I am a bit at a loss. Should I also tweak estimation parameters such as damping?

To Reproduce
I cannot share the data, but it has around 100,000 events and 60 actors, and the choice of effects appears to be irrelevant: for the same data, I get the same error with different sets of effects. In the case above, it was a combination of tie, inertia, trans and indeg.

Desktop (please complete the following information):

  • OS: Windows 10
  • R Version: 4.0.3
  • Goldfish Version: latest

I guess the problem could be related to the default value of the weighted parameter of the effects. By default, weighted = FALSE, which means the covariates used are ones and zeros. For 60 actors, there are only 3,540 possible directed ties. With 100,000 events, the complete network is reached at some point in the event history, after which all the effects are constant; the correlation between the effects would then be near 1 even though they differed at the beginning.
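To illustrate the saturation argument, here is a minimal base R sketch. It assumes events are drawn uniformly at random among 60 actors, which real contact data will not be, so the numbers are only indicative. All variable names are made up for this illustration and nothing here uses goldfish.

```r
set.seed(1)
nActors <- 60
nEvents <- 100000

# Assumption: senders and receivers are drawn uniformly, without self-loops
sender <- sample.int(nActors, nEvents, replace = TRUE)
receiver <- sample.int(nActors - 1, nEvents, replace = TRUE)
receiver <- receiver + (receiver >= sender) # skip the sender's own index

possibleTies <- nActors * (nActors - 1) # directed dyads, no self-ties

# Share of directed dyads observed at least once, after each event
dyad <- (sender - 1) * nActors + receiver
density <- cumsum(!duplicated(dyad)) / possibleTies

# First event at which 99% of the dyads have already been observed
saturationEvent <- which(density >= 0.99)[1]

# Share of events that happen on an (almost) complete binary network
shareAfterSaturation <- mean(seq_len(nEvents) > saturationEvent)

plot(density, type = "l", xlab = "event index", ylab = "share of dyads observed")
```

Once the curve flattens near 1, a binary tie or inertia covariate takes the same value for almost every alternative, so it carries little information and can become nearly collinear with other unweighted effects.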

To have a better sense of the problem, could you provide the following information:

  • Model formula, e.g., callsDependent ~ inertia + recip + trans.
  • Is it an undirected events network, i.e. callNetwork <- defineNetwork(nodes = xxx, directed = FALSE)? Since you mention the choice_coordination subModel, my guess is that this is the case.
  • A description of the type of events and of the networks used as exogenous information (directed, number of unweighted ties, one-mode, range of values). You don't need to describe the content; I'm rather interested in descriptive statistics.
  • When you try to run the model with all the events: does it complete some iterations? If it can, how many?

I'd be really curious to know how long it takes to estimate a model with all the events. You could explore setting weighted = TRUE. It makes sense for all effects except trans: transitivity only exists in an unweighted version, and it's not possible to change it to a weighted one. I'd also try adding the effects one at a time, in some order, to find out at which point the estimation crashes and which effect is the culprit.
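Adding effects one at a time can be automated. The sketch below is only an illustration: findCulprit and fakeFit are hypothetical names, and fakeFit is a stand-in that fails whenever indeg is in the formula, so the snippet runs without goldfish or any data.

```r
# Build the formula one effect at a time and report the first effect
# whose addition makes the fitting function throw an error.
findCulprit <- function(effects, response, fit) {
  for (k in seq_along(effects)) {
    f <- reformulate(effects[seq_len(k)], response = response)
    ok <- tryCatch({
      fit(f)
      TRUE
    }, error = function(e) FALSE)
    if (!ok) return(effects[k])
  }
  NA_character_ # every nested model could be fitted
}

# Hypothetical stand-in for the real estimation call
fakeFit <- function(f) {
  if ("indeg" %in% all.vars(f)) stop("Matrix cannot be inverted")
  invisible(TRUE)
}

culprit <- findCulprit(c("inertia", "recip", "trans", "indeg"),
                       response = "callsDependent", fit = fakeFit)
culprit
```

With real data, fit would wrap the actual call, for example function(f) estimate(f, model = "DyNAM", subModel = "choice"). Since collinearity involves at least two effects, the effect reported is only the one that completes a problematic combination; trying a different order helps to narrow it down.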

As a final remark, you can compute the final values of the statistics quite easily. I'd plot their histograms to get an idea of how to interpret the coefficients (this is not exactly what the model uses, but it helps to understand the data). I guess the unweighted versions would show constant values and the weighted versions heavy-tailed distributions. The following code gives the state of the network at a given time. The plots that use all the events, max(calls$time), should give an idea of the heavy tails of the distributions.

library(goldfish)
data("Social_Evolution")

callNetwork <- defineNetwork(nodes = actors, directed = TRUE)
callNetwork <- linkEvents(x = callNetwork, changeEvent = calls, nodes = actors)

# change time to see the state of the network at that time
callNetworkEnd <- as.matrix(callNetwork, time = max(calls$time)) # max(time) is the state at the end

transitivity <- (callNetworkEnd > 0) %*% (callNetworkEnd > 0)
hist(transitivity)

diag(callNetworkEnd) <- NA # no self ties or events

hist(callNetworkEnd) # tie, inertia or recip overview. weighted = TRUE
hist((callNetworkEnd > 0) * 1) # tie, inertia or recip overview. weighted = FALSE


hist(colSums(callNetworkEnd, na.rm = TRUE)) # indeg overview. weighted = TRUE
hist(colSums(callNetworkEnd > 0, na.rm = TRUE)) # indeg overview. weighted = FALSE

Hi Alvaro, thanks for the quick and detailed response. I answer your questions below, but I have started to suspect something different, possibly a bug, which I describe at the end.

  • Here is the model formula:

        choice_formula <- contacts_dependent ~
            tie(dist, weighted = TRUE) +
            inertia(network, weighted = FALSE, window = "30 mins") +
            inertia(network, weighted = TRUE) +
            trans(network, window = "1 hour") +
            indeg(network, weighted = FALSE)
  • The dependent network is directed (I did also try the undirected version, but after reading more about coordination and considering the data at hand, I changed it). The network used in the tie effect, though, is undirected (but static and not collinear).

  • The dependent network is a contact network, and events are contacts that are (nearly) always replicated (if there is an i->j, after some time there is generally a j->i). It is one mode.

  • I think only one iteration. I see a number with a (1) to the right of it, and almost immediately after that the error text. It is always exactly one iteration.

That code is most helpful, thank you! It seems safe to say that the issue did not come from the unweighted inertia:

[image attachment]

And something else: the error disappeared out of the blue, as it did once before. This time I was a bit more careful, and I seriously suspect a bug in the usage of the preprocessed effects. I was passing a preprocessed formula as init and then changing one of the effects in the main formula that I feed to the estimate function, in order to compare a weighted tie effect of some exogenous static network (and to have a quicker effect processing phase, as it takes several minutes otherwise). When I remove that preprocessing and run the whole estimation from scratch each time, things seem to work. Furthermore, I have noticed that when I set debug = TRUE and verbose = TRUE and use a preprocessed formula, the debug output contains a list of parameters with the number of effects from the preprocessed formula, not from the formula I give for estimation, even if I change the effects in the latter.

I was not quite sure how to demonstrate this, but I replicated the error just as I was writing this! Here is the code:

data("Social_Evolution")

callNetwork <- defineNetwork(nodes = actors, directed = TRUE)
callNetwork <- linkEvents(x = callNetwork, changeEvent = calls, nodes = actors)
callsDependent <- defineDependentEvents(events = calls, nodes = actors,
                                        defaultNetwork = callNetwork)

choice_formula <- callsDependent ~ inertia + recip + trans
preprocessed <- estimate(choice_formula, preprocessingOnly = TRUE,
                         model = "DyNAM", subModel = "choice")
new_formula <- callsDependent ~ inertia(callNetwork, weighted = TRUE) +
  recip(callNetwork) + trans(callNetwork) + tie(callNetwork)
# this fails
estimate(new_formula, subModel = "choice", preprocessingInit = preprocessed,
         verbose = TRUE, silent = FALSE, debug = TRUE)
# this runs!
estimate(new_formula, subModel = "choice",
         verbose = TRUE, silent = FALSE, debug = TRUE)
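As a side note, the mismatch between the two formulas can be made visible before estimation with base R only. This is a rough sketch: effectNames and sameEffects are hypothetical helpers, not goldfish functions, and they say nothing about how goldfish matches effects internally. The formulas are repeated here under different names so that the snippet runs without goldfish.

```r
# Extract the effect names of a formula, in order, dropping any arguments
effectNames <- function(f) {
  labels <- attr(terms(f), "term.labels")
  sub("\\(.*$", "", labels)
}

# TRUE only if both formulas list the same effects in the same order
sameEffects <- function(preprocFormula, estimFormula) {
  identical(effectNames(preprocFormula), effectNames(estimFormula))
}

oldFormula <- callsDependent ~ inertia + recip + trans
newFormula <- callsDependent ~ inertia(callNetwork, weighted = TRUE) +
  recip(callNetwork) + trans(callNetwork) + tie(callNetwork)

sameEffects(oldFormula, newFormula)
# effects that are new relative to the preprocessed formula
setdiff(effectNames(newFormula), effectNames(oldFormula))
```

The comparison looks at effect names only, so a change in arguments, such as weighted = TRUE on inertia, is not detected by this check.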

I changed the title and labeled it as a bug. Thanks for the update!

Hello!
My bad, I coded this and there was indeed a bug when the order of the effects changed! I have now corrected it in the develop branch.
I hope it works fine now (although this functionality is relatively untested so far...).
Let me know if another error related to this option comes up.
Best,
Marion