Allow `bind_tweets` to work with individual json file, not directory
Is your feature request related to a problem? Please describe.
In the discussion below, the goal is to use `bind_tweets` to convert a >4 GB directory of data into a tidy data frame on a machine with only 8 GB of main memory. This is bound to fail, because `bind_tweets` reads all JSON files into main memory, and manipulating the data requires even more memory.
Describe the solution you'd like
To facilitate a divide-and-conquer approach, it would be better to allow converting each individual `data_*.json` file of around 500 tweets (together with its matching `users_*.json` file) into a data frame first, so that users can then combine the frames, load them into a database, or do whatever else later on. The function could be called `convert_json`. A new `bind_tweets` would then simply be `purrr::map_dfr` of `convert_json` over each individual `data_*.json` file in `data_path`.
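A minimal sketch of what this could look like. Note that `convert_json` does not exist yet, and the join logic inside it is only indicated; the file-naming convention (`data_*.json` with a matching `users_*.json`) is taken from the request above:

```r
library(jsonlite)
library(purrr)

# Hypothetical per-file converter: reads ONE data_*.json file (and its
# matching users_*.json file) and returns a single data frame, keeping
# peak memory use bounded by one chunk of ~500 tweets.
convert_json <- function(data_file) {
  # Derive the matching users_*.json path from the data_*.json path
  users_file <- sub("data_", "users_", data_file, fixed = TRUE)
  tweets <- jsonlite::read_json(data_file, simplifyVector = TRUE)
  users  <- jsonlite::read_json(users_file, simplifyVector = TRUE)
  # ... join tweets to users here and return the resulting data frame
}

# A new bind_tweets() would then just map convert_json over every
# data_*.json file in data_path, row-binding the results:
bind_tweets_by_file <- function(data_path) {
  files <- list.files(data_path, pattern = "^data_.*\\.json$",
                      full.names = TRUE)
  purrr::map_dfr(files, convert_json)
}
```

Because each file is converted independently, a user hitting memory limits could also write each chunk to disk or a database inside the loop instead of accumulating everything with `map_dfr`.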
Discussed in #197
Originally posted by keoghca July 22, 2021
I have just collected data using academictwitteR and am trying to use the bind_tweets function to convert the JSON files to an R data frame. Here's the code I used:
tweets <- bind_tweets(data_path = "alltweetdata/", output_format = "tidy")
Here's the error message:
Error: cannot allocate vector of size 13.1 Mb
In addition: Warning message:
closing unused connection 3 (alltweetdata/data_1409120214051139589.json)
Any suggestions?