fabiogiglietto/CooRnet

get_coord_shares() issue

aqibufu opened this issue · 4 comments

Hi,
I'm using the get_coord_shares() function to detect coordinated sharing behavior on a CrowdTangle shares dataset of about 142k rows. My computer has 16 GB of memory, which is not enough for this data size, so I created a VM on Google Cloud. I first tried a VM with 16 CPUs and 128 GB of memory, but it still could not handle the 142k shares, with or without parallel processing enabled. I then increased the VM to 32 CPUs and 256 GB of memory, and once again parallel processing did not work. Is this kind of issue normal?

Hi :)
I don't have specific experience with gcloud, but on AWS parallel processing tends to crash. My guess is that it may be an issue with virtual machines.

One more thing: we recently added a new parameter to get_ctshares that avoids retrieving the historical post engagement from CrowdTangle. In my experience this drastically reduces the amount of memory required.
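For reference, a rough sketch of what that call might look like. The get_history argument name, the column names, and the other values here are assumptions based on the description above, not confirmed API details; check ?get_ctshares in your installed CooRnet version.

```r
library(CooRnet)

# Seed URLs: a data frame with the URLs to track and their publication dates
urls <- read.csv("seed_urls.csv")

# Sketch only: get_history = FALSE is assumed to be the new parameter that
# skips downloading the historical post engagement from CrowdTangle, which
# should noticeably reduce memory usage on large datasets.
ct_shares.df <- get_ctshares(urls,
                             url_column = "url",
                             date_column = "date",
                             platforms = "facebook,instagram",
                             nmax = 100,
                             get_history = FALSE)  # assumed parameter name
```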

Yes, parallel processing on Google Cloud also tends to crash.

Thanks for letting us know. I guess it's an issue with the underlying packages. Please use parallel=FALSE until this issue is addressed by their maintainers.
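A minimal sketch of the single-threaded workaround, assuming the standard CooRnet workflow and default values for the other arguments:

```r
library(CooRnet)

# Run the coordination detection without the parallel back-end that has been
# crashing on cloud VMs; all other arguments are left at their defaults.
output <- get_coord_shares(ct_shares.df = ct_shares.df,
                           parallel = FALSE)

# Unpack the result list (coordinated shares, highly connected entities/graph).
get_outputs(output)
```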

Thanks. I have another question: under what conditions is the coordination interval from estimate_coord_interval automatically set to 1 secs? The log.txt file reports: "Warning: with the specified parameters p and q the median was 0 secs. The coordination interval has been automatically set to 1 secs."
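For context, a sketch of how estimate_coord_interval and an explicit coordination_interval fit together. The q and p values below are purely illustrative, and the exact form of the returned interval object may vary by CooRnet version:

```r
library(CooRnet)

# Illustrative values only: q and p are the parameters the warning refers to.
# Per the warning text, if the estimated median comes out as 0 secs,
# the coordination interval is automatically set to 1 secs.
coord_interval <- estimate_coord_interval(ct_shares.df,
                                          q = 0.1,
                                          p = 0.5)

# Pass the estimated interval explicitly instead of letting get_coord_shares
# re-estimate it. Depending on the package version, the returned object may
# need to be indexed to extract the interval string.
output <- get_coord_shares(ct_shares.df = ct_shares.df,
                           coordination_interval = coord_interval,
                           parallel = FALSE)
```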