cjbarrie/academictwitteR

[BUG] Defensive programming: build_query

chainsawriot opened this issue · 2 comments

Please confirm the following

  • I have searched the existing issues
  • The behaviour of the program deviates from what is described in the documentation.
  • I can reproduce this problem more than once.
  • This is NOT a 3-digit error -- it does not display an error message like something went wrong. Status code: 400.
  • This is a 3-digit error and I have consulted the Understanding API errors vignette and the suggestions do not help.

Describe the bug

Although the documentation makes it quite clear that build_query (and functions that depend on it, e.g. get_all_tweets) accepts a character string or a character vector, the program does not enforce this. One symptom is that build_query accepts a one-column data frame, which leads to odd behaviour such as #230. It also generates queries that are certain to produce a 400 error.

Expected Behavior

Two possible solutions: coerce any input to a vector, or simply raise an error when the input to build_query is not a character vector. I prefer the latter.
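
For illustration, the error-raising option could look roughly like the sketch below. It is not the package's actual code: `check_query_input()` is a hypothetical helper name, and treating `NULL` as "argument not supplied" is an assumption about how build_query handles its optional arguments.

```r
# Hypothetical helper (not part of academictwitteR): stop early unless the
# argument is NULL or a plain, dimensionless character vector.
check_query_input <- function(x, arg_name = "query") {
  if (is.null(x)) {
    return(invisible(NULL))  # assumption: NULL means "argument not supplied"
  }
  if (!is.character(x) || !is.null(dim(x))) {
    stop("`", arg_name, "` must be a character vector, not an object of class ",
         class(x)[1], ".", call. = FALSE)
  }
  invisible(x)
}

x <- data.frame(username = c("a", "b", "c"))

check_query_input(c("a", "b", "c"), "users")  # passes silently

# A data frame is rejected before any query string is built.
res <- tryCatch(check_query_input(x, "users"),
                error = function(e) conditionMessage(e))
print(res)
```

Failing early like this would surface the problem at the call site instead of as a 400 response from the API.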

Steps To Reproduce

require(academictwitteR)
#> Loading required package: academictwitteR
x <- data.frame(username = c("a", "b", "c"))
build_query(query = x)
#>   username
#> 1        a
#> 2        b
#> 3        c
build_query(users = x)
#> [1] " (from:c(\"a\", \"b\", \"c\"))"
build_query(reply_to = x)
#> [1] " (to:c(\"a\", \"b\", \"c\"))"
build_query(retweets_of = x)
#> [1] " (retweets_of:c(\"a\", \"b\", \"c\"))"

get_all_tweets(users = x, start_tweets = "2021-01-01Z00:00:00", end_tweets = "2021-02-01Z00:00:00")
#> Warning: Recommended to specify a data path in order to mitigate data loss when
#> ingesting large amounts of data.
#> Warning: Tweets will not be stored as JSONs or as a .rds file and will only be
#> available in local memory if assigned to an object.
#> query:   (from:c("a", "b", "c"))
#> Error in make_query(url = endpoint_url, params = params, bearer_token = bearer_token, : something went wrong. Status code: 400

Created on 2021-08-24 by the reprex package (v2.0.0)
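
The `c("a", "b", "c")` fragment in the queries above is what base R produces when a data frame is pasted into a string. The sketch below reproduces that with `paste0()` alone; that build_query assembles its operators with `paste()`/`paste0()` internally is an assumption, although the output above is consistent with it.

```r
x <- data.frame(username = c("a", "b", "c"))

# A data frame is a list of columns, so paste0() turns the whole column into
# one deparsed string rather than three separate values.
from_df <- paste0("from:", x)

# Passing the column itself gives one element per user.
from_vec <- paste0("from:", x$username)

print(from_df)
print(length(from_df))
print(from_vec)
```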

Environment

#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.33        magrittr_2.0.1    rlang_0.4.11      fansi_0.5.0      
#>  [5] stringr_1.4.0     styler_1.4.1      highr_0.9         tools_4.1.1      
#>  [9] xfun_0.24         utf8_1.2.2        withr_2.4.2       htmltools_0.5.1.1
#> [13] ellipsis_0.3.2    yaml_2.2.1        digest_0.6.27     tibble_3.1.3     
#> [17] lifecycle_1.0.0   crayon_1.4.1      purrr_0.3.4       vctrs_0.3.8      
#> [21] fs_1.5.0          glue_1.4.2        evaluate_0.14     rmarkdown_2.9    
#> [25] reprex_2.0.0      stringi_1.7.3     compiler_4.1.1    pillar_1.6.2     
#> [29] backports_1.2.1   pkgconfig_2.0.3

Anything else?

No response

Thanks for raising this. I think the best option here is to raise an error when the input is not a character vector, as you say. This has the added benefit of enforcing proper usage (e.g., not passing a data.frame object to build_query()).

#349 is also a manifestation of this, because a "column" selected from a tibble with single brackets is still a tibble, not a vector.
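
Until such a check exists, callers can extract the column as a plain vector before passing it on. A minimal sketch using a base data frame only (no tibble is run here):

```r
x <- data.frame(username = c("a", "b", "c"))

still_df <- x["username"]    # single brackets keep the data frame wrapper
as_vec <- x[["username"]]    # double brackets (or x$username) return the column

print(is.data.frame(still_df))
print(is.character(as_vec))
```

`as_vec` is the form that build_query's `users`, `reply_to` and `retweets_of` arguments are documented to accept.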