ropensci/rtweet

Tweets returned as a list by search_30day() and cannot be converted to a data frame with TwListToDF()

Ariyanjyr opened this issue · 3 comments

Rtweet package

Problem

Since the package needs to be updated as in #739/#738, I turned to the search_30day() function and extracted tweets by date from 15 September 2022 to 12 October 2022. However, the extracted tweets are returned as a list, so I cannot convert them to a data frame by simply using TwListToDF(), as one would if the tweets were extracted using SearchTwitteR(). I have tried to unlist the result, but then all the data ends up in a single column, resulting in 36000+ rows for each day. I just want the Twitter data for each day converted from a list to a data frame while keeping the original data as it is.

Expected behavior

I expect that there should be a package or a way to do this, because going from a list to a data frame is a very common thing to do. However, somehow it does not work for me.

Reproduce the problem

install.packages("rtweet")
remotes::install_github("ropensci/rtweet@devel")
library("remotes")
library("rtweet")

consumer_key <- ".."
consumer_secret <- ".."
access_token <- ".."
access_secret <- ".."
app <- ".."

token = rtweet::create_token(app,consumer_key,consumer_secret,access_token,access_secret)
auth_get()
dataBTC1 <- search_30day("Bitcoin analysis", n = 100, env_name = "Tweets30", fromDate = "202209150000", toDate = "202209152359", parse = FALSE)
data.class(dataBTC1)
## data class is ''list''

##This is one way to make a data frame for 1 column:
df1 <- unlist(dataBTC1)
dataBitcoin1 <- as.data.frame(df1[1:99])

However, doing such manipulations for 36000+ rows, and then for 30 days, is way too much work. Is there perhaps an easier way?

rtweet version

packageVersion("rtweet")
1.0.2

Session info

## copy/paste output
sessionInfo()

R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C                      
[5] LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rtweet_1.0.2  remotes_2.4.2 twitteR_1.1.9

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9           prettyunits_1.1.1    ps_1.7.1             assertthat_0.2.1     digest_0.6.29        utf8_1.2.2           R6_2.5.1            
 [8] stats4_4.1.3         evaluate_0.17        httr_1.4.4           ggplot2_3.3.6        pillar_1.8.1         rlang_1.0.6          progress_1.2.2      
[15] curl_4.3.3           rstudioapi_0.14      data.table_1.14.2    callr_3.7.2          jquerylib_0.1.4      conStruct_1.0.4      rmarkdown_2.17      
[22] stringr_1.4.1        htmlwidgets_1.5.4    loo_2.5.1            bit_4.0.4            munsell_0.5.0        compiler_4.1.3       xfun_0.33           
[29] rstan_2.21.7         pkgconfig_2.0.3      askpass_1.1          pkgbuild_1.3.1       rstantools_2.2.0     htmltools_0.5.3      openssl_2.0.3       
[36] tidyselect_1.2.0     tibble_3.1.8         gridExtra_2.3        codetools_0.2-18     matrixStats_0.62.0   fansi_1.0.3          crayon_1.5.2        
[43] dplyr_1.0.10         withr_2.5.0          grid_4.1.3           jsonlite_1.8.2       gtable_0.3.1         lifecycle_1.0.3      DBI_1.1.3           
[50] magrittr_2.0.3       StanHeaders_2.21.0-7 scales_1.2.1         RcppParallel_5.1.5   cli_3.2.0            stringi_1.7.6        cachem_1.0.6        
[57] bslib_0.4.0          ellipsis_0.3.2       vctrs_0.4.2          generics_0.1.3       rjson_0.2.21         tools_4.1.3          bit64_4.0.5         
[64] glue_1.6.2           hms_1.1.2            parallel_4.1.3       processx_3.7.0       fastmap_1.1.0        yaml_2.3.5           inline_0.3.19       
[71] colorspace_2.0-3     knitr_1.40           sass_0.4.2          
llrs commented

Hi, you can post questions about rtweet in the rOpenSci forum, as other members of the community might also help you solve your problems. This issue tracker is only for bugs (or feature requests).

I don't know where these functions come from: TwListToDF(), SearchTwitteR(). Could you elaborate on where and how these functions are defined? What they are intended to do is already available in rtweet.

As I mentioned in this comment, you can use the devel version, which gives just a warning and has no problems with the edit fields:

dataBTC1 <- search_30day("Bitcoin analysis", n = 100, env_name = "Tweets30", 
                         fromDate = "202209150000", toDate = "202210130000")
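Since the question is about collecting tweets day by day, here is a minimal sketch of how the fromDate and toDate strings (format yyyymmddhhmm) could be generated for each day of the period. The variable names are made up for illustration, and the search_30day() call is left commented out because it needs valid credentials:

```r
# One entry per day, from 15 September 2022 up to and including 12 October 2022.
days <- seq(as.Date("2022-09-15"), as.Date("2022-10-12"), by = "day")

# Each window runs from midnight of one day to midnight of the next (yyyymmddhhmm).
from_dates <- format(days, "%Y%m%d0000")
to_dates   <- format(days + 1, "%Y%m%d0000")

windows <- data.frame(from = from_dates, to = to_dates, stringsAsFactors = FALSE)

# Hypothetical usage, one request per day (not run here):
# per_day <- Map(function(from, to) {
#   search_30day("Bitcoin analysis", n = 100, env_name = "Tweets30",
#                fromDate = from, toDate = to)
# }, windows$from, windows$to)
```

Each element of per_day would then hold the tweets of a single day, which keeps the days separate without any manual slicing.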

rtweet converts the list you get from parse = FALSE to a data.frame internally, so the call above, which leaves parse at its default, already returns a data frame. If you want to parse the data into your own format, you can write your own function/method and apply it to the raw output from Twitter. There is no built-in magic way to convert Twitter data to structured data, and a lot of effort has gone into making it easy for rtweet users to work with.
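As a rough illustration of what writing your own function could look like, here is a minimal base R sketch that turns a list with one record per tweet into a data frame with one row per tweet. The raw_tweets object and its field names are assumptions made up for this example; they are not the actual structure returned by search_30day(parse = FALSE), which you would need to inspect with str() first.

```r
# Hypothetical stand-in for raw output: a list with one element per tweet.
# The field names are assumptions for illustration, not rtweet's real structure.
raw_tweets <- list(
  list(id_str = "1", created_at = "2022-09-15", text = "Bitcoin analysis A",
       user = list(screen_name = "alice")),
  list(id_str = "2", created_at = "2022-09-15", text = "Bitcoin analysis B",
       user = list(screen_name = "bob")),
  list(id_str = "3", created_at = "2022-09-16", text = "Bitcoin analysis C",
       user = NULL)
)

# Return a single character value, or NA when the field is missing.
field_or_na <- function(x) if (is.null(x)) NA_character_ else as.character(x)[1]

# Turn one tweet record into a one-row data frame.
tweet_to_row <- function(tw) {
  data.frame(
    id_str      = field_or_na(tw$id_str),
    created_at  = field_or_na(tw$created_at),
    text        = field_or_na(tw$text),
    screen_name = field_or_na(tw$user$screen_name),
    stringsAsFactors = FALSE
  )
}

# Bind the rows together: one row per tweet, one column per field.
tweets_df <- do.call(rbind, lapply(raw_tweets, tweet_to_row))
```

Unlike unlist(), which flattens everything into one long vector, this keeps each tweet on its own row and fills missing fields with NA.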

llrs commented

Well, I found out that TwListToDF and SearchTwitteR are from twitteR, which is archived on GitHub but not yet on CRAN. So I wouldn't expect them to work with current rtweet versions. Closing this issue now.

Ah, okay thank you!