cjbarrie/academictwitteR

Allow `bind_tweets` to work with individual json file, not directory


Is your feature request related to a problem? Please describe.

In the discussion below, the user tries to use `bind_tweets` to convert a >4 GB directory of data into a tidy data frame on a machine with only 8 GB of main memory. This is bound to fail, because `bind_tweets` reads all JSON files into main memory at once, and manipulating the resulting data requires even more memory.

Describe the solution you'd like
In order to facilitate a divide-and-conquer approach, it would be better to allow converting each individual `data_*.json` file of around 500 tweets (together with its matching `users_*.json` file) to a data frame first. Users could then combine those data frames themselves, put them in a database, or do whatever else they need later on. The function could be called `convert_json`. A new `bind_tweets` would then simply be `purrr::map_dfr` of `convert_json` over each individual `data_*.json` file in `data_path`.
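A minimal sketch of the proposed interface, assuming the file-naming convention `data_*.json` / `users_*.json` used by academictwitteR (`convert_json` and its signature are hypothetical, not part of the current package API):

```r
library(purrr)

# Hypothetical: convert ONE data_*.json file (plus its matching
# users_*.json file) into a single data frame of ~500 tweets.
# Peak memory usage is bounded by one file, not the whole directory.
convert_json <- function(data_file, output_format = "tidy") {
  # ... read data_file and its users_*.json counterpart,
  # ... return a data frame in the requested output_format
}

# A new bind_tweets would then just be a map over individual files:
bind_tweets <- function(data_path, output_format = "tidy") {
  data_files <- list.files(data_path, pattern = "^data_.*\\.json$",
                           full.names = TRUE)
  purrr::map_dfr(data_files, convert_json, output_format = output_format)
}
```

Because each `convert_json` call yields a self-contained data frame, a user with limited memory could instead write each result to disk or append it to a database inside the loop, rather than accumulating everything with `map_dfr`.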

Discussed in #197

Originally posted by keoghca July 22, 2021
I have just collected data using academictwitteR and am trying to use the bind_tweets function to convert the JSON files to an R data frame. Here's the code I used:

   tweets <- bind_tweets(data_path = "alltweetdata/", output_format = "tidy")

Here's the error message:

   Error: cannot allocate vector of size 13.1 Mb
   In addition: Warning message:
   closing unused connection 3 (alltweetdata/data_1409120214051139589.json) 

Any suggestions?