ropensci-archive/rtweet

`save_as_csv` not working: "data frame still contains recursive columns!"

Closed this issue · 5 comments

Problem

I've retrieved some tweets and want to store them in a csv file. Both save_as_csv() and write_as_csv() fail with the following error:

> save_as_csv(rt, "rt.csv")
Error in `vectbl_as_row_location()`:
! Can't subset rows with `i`.
✖ Logical subscript `i` must be size 1 or 997, not 43.
Run `rlang::last_error()` to see where the error occurred.
Warning message:
In flatten(x) : data frame still contains recursive columns!

Expected behavior

I'd like my dataframe to be flattened and stored in a csv file.

Reproduce the problem

library(rtweet)

# Authenticate
# auth_setup_default() # No longer working.
auth <- rtweet_app()

rt <- search_tweets("#rstats", n = 1000, include_rts = FALSE, token = auth)

write_as_csv(rt, "rt.csv")

The same happens if I try to write only a subset of the data (i.e. save_as_csv(head(rt), "rt.csv")) or if I run another query, so I don't think it is about the specific data in my data frame.

rtweet version

## copy/paste output
packageVersion("rtweet")

‘1.1.0’

Session info

> sessionInfo()
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

Matrix products: default
BLAS/LAPACK: /opt/conda/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] rtweet_1.1.0

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.10       compiler_4.1.3    pillar_1.8.1      later_1.3.0       prettyunits_1.1.1 tools_4.1.3       progress_1.2.2   
 [8] digest_0.6.31     bit_4.0.5         jsonlite_1.8.4    evaluate_0.20     lifecycle_1.0.3   tibble_3.1.8      pkgconfig_2.0.3  
[15] rlang_1.0.6       cli_3.6.0         curl_5.0.0        yaml_2.3.6        xfun_0.36         fastmap_1.1.0     withr_2.5.0      
[22] httr_1.4.4        knitr_1.41        vctrs_0.5.2       askpass_1.1       hms_1.1.2         bit64_4.0.5       glue_1.6.2       
[29] R6_2.5.1          fansi_1.0.4       rmarkdown_2.20    magrittr_2.0.3    promises_1.2.0.1  ellipsis_0.3.2    htmltools_0.5.4  
[36] renv_0.16.0       httpuv_1.6.8      utf8_1.2.2        openssl_2.0.5     crayon_1.5.2    

llrs commented

Hi, I assume you tried that several times because a warning or an error should have explained that this function no longer works with the latest output of the rtweet functions.

The reason this fails, and the reason I decided not to flatten the information, is explained in this post (basically, I found no good way to flatten the nested data in a tweet). There is a specific section explaining how to save the data. You should decide how to flatten your dataset, but I recommend saving it via saveRDS(rt, "rt.RDS").
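As a minimal sketch of the saveRDS() suggestion, the round trip below uses a small toy data frame with one list column as a stand-in for a real search_tweets() result. The name rt_toy and its contents are made up for illustration; only saveRDS() and readRDS() from base R are relied on.

```r
# Toy stand-in for an rtweet result (hypothetical data): one list column.
rt_toy <- data.frame(id_str = c("1", "2"), stringsAsFactors = FALSE)
rt_toy$hashtags <- list(c("rstats", "dataviz"), character(0))

# Save to a temporary file and read it back.
path <- tempfile(fileext = ".RDS")
saveRDS(rt_toy, path)
restored <- readRDS(path)

# The nested (list) column should come back unchanged.
identical(rt_toy, restored)
```

Because RDS is R's own serialization format, the nested columns need no flattening, but the file is only convenient to read from R.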

Thanks for your reply, @llrs

Yes, you're right: I tried several times before and got that warning. I was confused by the warning message, though, and thought there was a typo or some error in the wording. After all, if the functions no longer work in 1.x, why is this function included in that version?

I then looked for information about saving data and didn't find anything in the docs, so thanks for pointing me to this blog post.

I did save the data frame as an RDS object, but this only partially solves my problem, as I wanted to store it in a CSV so I could read it from different software later.

Because of this and the error in the authentication (#756), I may revert to 0.7.0.

llrs commented

write_as_csv is still there because rtweet has a function, read_twitter_csv, that reads a csv and reverts it to the old structure. You might edit or filter the data and save it back to csv. I will remove it in the next release. Thanks for the hint!

To save it as csv you'll need to decide what to do with the user data and with all the entity data that each tweet provides. I spent six months looking for such a way the last time I updated rtweet, and I didn't arrive at any good solution or receive any proposal. If you find a way to save the output of the function as a flat csv that makes sense, I will gladly incorporate it into rtweet and restore the functionality of save_as_csv and read_twitter_csv for the current output.

The authentication error, as far as I know, is a problem on the Twitter side: some users report that it works while for others it doesn't. But we can discuss this in the other issue if you have more information. I hope you have a great analysis (by the way, there is a newer version of rtweet with some bugs fixed; you might want to update if you want to keep using rtweet > 0.7 with fewer bugs).

Thank you so much, Lluís! I have little time for this project, hence my intention to revert to the version that worked for me in previous code. But I'll have to reconsider the strategy and maybe decide how to flatten the data frame myself. In my case, I would probably store multiple values separated by some delimiter, or store values as a vector that I can then expand if needed. But probably neither of those approaches makes sense, as I do not expect to solve in a minute what you couldn't in six months.
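The delimiter idea could be sketched roughly as follows, in base R only. The helper collapse_list_columns() and the toy data are hypothetical and not part of rtweet. The sketch assumes each list column holds plain vectors (or simple nested lists) per row; nested data-frame columns, which the real rtweet 1.x output also contains, are left untouched and would still need their own handling.

```r
# Hypothetical helper: collapse each list column into one delimited string per row.
# Assumes the separator never occurs inside the values themselves.
collapse_list_columns <- function(df, sep = ";") {
  is_list_col <- vapply(
    df,
    function(col) is.list(col) && !is.data.frame(col),
    logical(1)
  )
  for (name in names(df)[is_list_col]) {
    df[[name]] <- vapply(
      df[[name]],
      function(x) paste(unlist(x), collapse = sep),
      character(1),
      USE.NAMES = FALSE
    )
  }
  df
}

# Toy stand-in for tweet data (made-up values).
rt_toy <- data.frame(id_str = c("1", "2", "3"), stringsAsFactors = FALSE)
rt_toy$hashtags <- list(c("rstats", "dataviz"), character(0), "tidyverse")
rt_toy$urls <- list(
  "https://example.org/a",
  c("https://example.org/b", "https://example.org/c"),
  character(0)
)

flat <- collapse_list_columns(rt_toy)

# Write and read back; IDs are kept as character to avoid precision loss.
path <- tempfile(fileext = ".csv")
write.csv(flat, path, row.names = FALSE)
back <- read.csv(path, colClasses = "character")

# Expand a delimited column again when needed.
expanded_urls <- strsplit(back$urls, ";", fixed = TRUE)
```

This loses the structure inside each entity (for example, which expanded URL belongs to which display URL), which is the trade-off discussed above.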

llrs commented

Oh, understood, familiarity counts.
You might solve the problem for yourself, in that you will keep only the fields you need or want (probably text, id_str of the tweet, id_str of the user, and username), but as a package maintainer I cannot guess how best to organize 2 urls and 3 media in a tweet while keeping all the information.
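A rough sketch of that "keep only what you need" approach, using toy data frames in place of real rtweet output. The column names full_text and screen_name, and the assumption that the user table has one row per tweet in the same order, are illustrative guesses and should be checked against the actual objects.

```r
# Toy tweet table with a nested column that will simply be dropped (made-up data).
tweets <- data.frame(
  id_str = c("101", "102"),
  full_text = c("Hello #rstats", "Second tweet"),
  stringsAsFactors = FALSE
)
tweets$entities <- list(list(urls = c("a", "b")), list(urls = character(0)))

# Toy user table, assumed to be row-aligned with the tweets.
users <- data.frame(
  id_str = c("u1", "u2"),
  screen_name = c("alice", "bob"),
  stringsAsFactors = FALSE
)

# Keep only flat, atomic fields; IDs stay as character strings.
flat <- data.frame(
  tweet_id = tweets$id_str,
  text = tweets$full_text,
  user_id = users$id_str,
  username = users$screen_name,
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")
write.csv(flat, path, row.names = FALSE)

# Read IDs back as character so long numeric IDs are not rounded.
back <- read.csv(path, colClasses = "character")
```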
Best of luck with your project! You might want to submit it to rOpenSci use cases