tidytags: Simple Collection and Powerful Analysis of Twitter Data
bretsw opened this issue · 143 comments
Date accepted: 2022-01-31
Submitting Author Name: Bret Staudt Willet
Submitting Author Github Handle: @bretsw
Other Package Authors Github handles: @jrosen48
Repository: https://github.com/bretsw/tidytags
Version submitted: 0.1.0
Submission type: Standard
Editor: @maelle
Reviewers: @llrs, @marionlouveaux
Archive: TBD
Version accepted: TBD
- Paste the full DESCRIPTION file inside a code block below:
Package: tidytags
Version: 0.1.0
Title: Simple Collection and Powerful Analysis of Twitter Data
Authors@R: c(
person("K. Bret", "Staudt Willet", ,
email = "bret@bretsw.com", role = c("aut", "cre"),
comment = c(ORCID = "0000-0002-6984-416X")
),
person("Joshua M.", "Rosenberg", ,
role = c("aut"),
comment = c(ORCID = "0000-0003-2170-0447")
)
)
Description: {tidytags} coordinates the simplicity of collecting tweets over time
with a [Twitter Archiving Google Sheet](https://tags.hawksey.info/) (TAGS) and the utility of the
[{rtweet} package](https://rtweet.info/) for processing and preparing additional Twitter metadata.
{tidytags} also introduces functions developed to facilitate systematic yet
flexible analyses of data from Twitter.
License: GPL-3
URL: https://bretsw.github.io/tidytags/, https://github.com/bretsw/tidytags
Depends:
R (>= 4.0)
Imports:
dplyr (>= 0.8),
googlesheets4 (>= 0.2),
purrr (>= 0.3),
readr (>= 1.3),
rlang (>= 0.4),
rtweet (>= 0.7),
stringr (>= 1.4),
tibble (>= 3.0),
tidyr (>= 1.0),
tidyselect (>= 1.0)
Suggests:
beepr,
covr,
ggplot2,
knitr,
longurl,
mapsapi,
mapview,
rmarkdown,
testthat,
tidyverse,
urltools,
usethis
Encoding: UTF-8
VignetteBuilder: knitr
LazyData: TRUE
RoxygenNote: 7.1.0
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
- data retrieval
- data extraction
- data munging
- data deposition
- workflow automation
- version control
- citation management and bibliometrics
- scientific software wrappers
- field and lab reproducibility tools
- database software bindings
- geospatial data
- text analysis
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
{tidytags} allows for both simple data collection and thorough data analysis. In short, {tidytags} first uses a Twitter Archiving Google Sheet (TAGS) to easily collect tweet ID numbers and then uses the R package {rtweet} to re-query the Twitter API to collect additional metadata. {tidytags} also introduces new functions developed to facilitate systematic yet flexible analyses of data from Twitter.
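A sketch of the core workflow (the sheet URL is a placeholder, and read_tags() is assumed as the name of the TAGS-reading function; the other functions appear later in this thread):
library(tidytags)
tags_data <- read_tags("https://docs.google.com/spreadsheets/d/<TAGS-sheet-id>")
tweets <- pull_tweet_data(tags_data)  # re-query the Twitter API via {rtweet}
processed <- process_tweets(tweets)   # add derived variables for analysis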
- Who is the target audience and what are scientific applications of this package?
The target users for {tidytags} are social scientists (e.g., educational researchers) who have an interest in studying Twitter data but are relatively new to R, data science, or social network analysis. {tidytags} scaffolds tweet collection and analysis through a simple workflow that still allows for robust analyses.
- Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
{tidytags} wraps together functionality from several useful R packages, including {googlesheets4} to bring data from the TAGS tracker into R and {rtweet} for retrieving additional tweet metadata. The contribution of {tidytags} is to bring together the affordance of TAGS to easily collect tweets over time (which is not straightforward with {rtweet}) and the utility of {rtweet} for collecting additional data (which are missing from TAGS). Finally, {tidytags} reshapes data in preparation for geolocation and social network analyses that should be accessible to relatively new R users.
- If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Technical checks
Confirm each of the following by checking the box.
- I have read the guide for authors and rOpenSci packaging guide.
This package:
- does not violate the Terms of Service of any service it interacts with.
- has a CRAN and OSI accepted license.
- contains a README with instructions for installing the development version.
- includes documentation with examples for all functions, created with roxygen2.
- contains a vignette with examples of its essential functions and uses.
- has a test suite.
- has continuous integration, including reporting of test coverage using services such as Travis CI, Coveralls and/or CodeCov.
Publication options
- Do you intend for this package to go on CRAN?
- Do you intend for this package to go on Bioconductor?
- Do you wish to automatically submit to the Journal of Open Source Software? If so:
JOSS Options
- The package has an obvious research application according to JOSS's definition.
- The package contains a paper.md matching JOSS's requirements with a high-level description in the package root or in inst/.
- The package is deposited in a long-term repository with the DOI:
- (Do not submit your package separately to JOSS)
- Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
- The package is novel and will be of interest to the broad readership of the journal.
- The manuscript describing the package is no longer than 3000 words.
- You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
- (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
- (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
- (Please do not submit your package separately to Methods in Ecology and Evolution)
Code of conduct
- I agree to abide by rOpenSci's Code of Conduct during the review process and in maintaining my package should it be accepted.
Thank you, @annakrystalli! And thank you in advance, @geanders, for working with us. We are looking forward to your feedback.
Hi @geanders, thank you so much for considering {tidytags} for review. @jrosen48 and I were just thinking about this and thought to reach out to inquire about the status of the review. We recognize that this is likely to be a very busy and hectic time and so understand if the answer is simply that it's 'in-process'. But would you be able to let us know? Thanks again for considering this. //Bret
👋 @bretsw, some first remarks from me (another editor).
- I am generally wondering how the reviewers can test the package. For instance, is the TAGS used in the vignette usable by anyone? I see the vignette chunks aren't evaluated, why is it so? Not that I have anything against pre-compiling vignettes in such a case!
- Could you add documentation on data protection? See https://devguide.ropensci.org/policies.html#ethics-data-privacy-and-human-subjects-research I saw a related open issue in your repo.
- In the pkgdown reference page, it would make sense to group functions.
- The README content could be partly re-used as a Get Started vignette (just name a vignette tidytags.Rmd). How to re-use chunks.
- Relatedly, although the README structure is clear, I'd appreciate a setup checklist to provide an overview. Or maybe a setup vignette like the rtweet vignette about secrets.
  * Have a TAGS with blablabla ready (blablabla being the public URL)
  * Setup rtweet credentials if you want to use blablabla
  * Setup Google geocoding credentials if you want to use blablabla
- In the TAGS setup explanation maybe some screenshots would make sense. I say maybe as a) they easily go stale b) your text seems quite clear (not tested yet) and a screenshot wouldn't replace instructions.
- Regarding tests, am I correct that they're skipped everywhere but locally? Is it both because you are using authentication & Twitter data you don't own/can't share? I am actually working on the "HTTP testing in R" book so I'd recommend choosing an approach with some sort of fake data. I can help more once I know more about your constraints.
Thanks for the update! 👍 Don't hesitate to ask any question here.
@maelle, thank you again for your feedback! Below are responses from @jrosen48 and me, and we have pushed all changes to the repo: https://github.com/bretsw/tidytags
We're looking forward to more dialogue on this!
//Bret
- I am generally wondering how the reviewers can test the package. For instance is the TAGS used in the vignette usable by anyone? I see the vignette chunks aren't evaluated, why is it so? Not that I have anything against pre-compiling vignettes in such a case!
Our response: The example vignette is usable by anyone. We've just set the chunks to eval=FALSE because many of the processes are slow, even when limiting the examples to small data. But all the code should work, and anyone can view/pull data from the TAGS tracker linked to in the vignette.
- Could you add documentation on data protection? See https://devguide.ropensci.org/policies.html#ethics-data-privacy-and-human-subjects-research I saw a related open issue in your repo.
Our response: We added several paragraphs to a new “Considerations Related to Ethics, Data Privacy, and Human Subjects Research” in the README.
Regarding the open issue in the repo (Issue #2), we are considering this enhancement for a future version of {tidytags}.
- In the pkgdown reference page, it would make sense to group functions.
Our response: We have grouped the functions. See https://bretsw.github.io/tidytags/reference/index.html
- The README content could be partly re-used as a Get Started vignette (just name a vignette tidytags.Rmd). How to re-use chunks.
Our response: We have created a “Getting started with tidytags” vignette (https://bretsw.github.io/tidytags/articles/setup.html).
- Relatedly, although the README structure is clear, I'd appreciate a setup checklist to provide an overview. Or maybe a setup vignette like the rtweet vignette about secrets.
Our response: We have added duplicate checklists in both the README and in the new “Getting started with tidytags” vignette.
- In the TAGS setup explanation maybe some screenshots would make sense. I say maybe as a) they easily go stale b) your text seems quite clear (not tested yet) and a screenshot wouldn't replace instructions.
Our response: We have added screenshots to the “Getting started with tidytags” vignette.
- Regarding tests am I correct that they're skipped everywhere but locally? Is it both because you are using authentication & Twitter data you don't own/can't share? I am actually working on the "HTTP testing in R" book so I'd recommend choosing an approach with some sort of fake data. I can help more once I know more about your constraints.
Our response: We have set up tests for many of the functions, often using fake data. The functions that we cannot test for CRAN/Travis (i.e., we can only test locally) are those that query either the Twitter or Google Maps API. For example, add_users_data() uses a Twitter API key, so we cannot test all aspects of it for CRAN or Travis.
Thanks a lot to both of you, awesome work! I have a few comments already
- I think it'd be good to have a process in place for re-computing the vignette. See for instance https://ropensci.org/technotes/2019/12/08/precompute-vignettes/
- Regarding the ethics section of the README, it's very thoughtful! I'd prefer it to be in one of the vignettes too, because users don't usually access the README locally. We can hope they'll read the pkgdown website instead of the local docs, but who knows. Therefore, if it were me, I'd make the ethics guidance a re-usable chunk (sketched below). https://www.garrickadenbuie.com/blog/dry-vignette-and-readme/
- To make the editor's checks I'd need a contributing guide, which reviewers will need too. E.g. I have no TAGS, so do I create one (if so, how), or is there one that developers (and editors/reviewers) of the package can access as a sort of sandbox? (Much easier if it's technically feasible.) Examples:
  - https://docs.ropensci.org/osfr/CONTRIBUTING.html
  - https://docs.ropensci.org/ruODK/CONTRIBUTING.html

Once I know more, I'll give more precise suggestions of how to add tests for the API stuff. We require a test coverage of 75% before review and I'll help you get there as smoothly as possible!
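For reference, the re-usable chunk approach in that post boils down to keeping the shared text in a fragment file and including it as a child chunk in both README.Rmd and the vignette (the fragment path here is just an assumption):
```{r, child = "man/fragments/ethics.Rmd"}
```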
Now I'll go create my credentials, that part I can already follow.
PS: I like the phrasing "Pain Point" 😂
Is there no free tier for Google geocoding? If so then my opencage suggestion is becoming more important. 🤔
Well, I guess the free credits one gets when creating the billing account count as a free tier, maybe 🤔
A small note, in the rtweet docs for creating a token at the very end there is a step allowing you to check your token is available in a new session.
What could a similar check be for the Google Geocoding API?
And even more ambitious, and not necessary, would be a sitrep function for tidytags that you'd run to check that Twitter and Google geocoding are set up, and if not, get pointers to relevant docs (à la devtools::dev_sitrep()).
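A hypothetical sketch of such a function (tidytags has no sitrep; the name and checks below are illustrative only):
tidytags_sitrep <- function() {
  # geocoding: the env var name here is a placeholder
  if (nzchar(Sys.getenv("GEOCODING_KEY"))) {
    message("Geocoding key found.")
  } else {
    message("No geocoding key set; see the setup docs.")
  }
  # Twitter: rtweet 0.7 stores the token path in TWITTER_PAT
  if (file.exists(Sys.getenv("TWITTER_PAT", unset = ""))) {
    message("Twitter token file found.")
  } else {
    message("No Twitter token found; see the rtweet token vignette.")
  }
  invisible(NULL)
}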
Last question before I log off, could the docs state what kind of restrictions can be applied to the Google geocoding API key for tidytags stuff to work?
Hi @maelle! Thank you so much for checking in. I apologize for our delay. This week has been my big annual conference (AECT), and last week I was getting things ready, so I've been sidetracked. I give my last presentation tomorrow, so I'll respond to your comments and suggestions here next week. Thanks for bearing with us!
No problem, thanks for the update and good luck with your presentation 🎤 🙂
Hi @maelle, thank you again for your patience, and for your helpful comments and suggestions. We have quite a few updates for your review. We list our responses to your comments here:
- I think it'd be good to have a process in place for pre-computing the vignette. See for instance https://ropensci.org/technotes/2019/12/08/precompute-vignettes/
- Our response: I have added the file tidytags-with-conf-hashtags.Rmd.orig for precomputing this vignette (which requires both Twitter API and OpenCage API keys, and takes a long time to compute). I also added the precompile.R file as a quick script for refreshing this vignette in the future (sketched after this list).
- Regarding the ethics section of the README, it's very thoughtful! I'd prefer it to be in one of the vignettes too, because users don't usually access the README locally. We can hope they'll read the pkgdown website instead of the local docs, but who knows. Therefore if it were me I'd make the ethics guidance a re-usable chunk. https://www.garrickadenbuie.com/blog/dry-vignette-and-readme/
- Our response: We have added the reusable fragment “ethics.Rmd” to both the README (where it was already) and the “tidytags-with-conf-hashtags.Rmd” vignette. We also created the reusable fragment “getting-help.Rmd” for both the README and the “setup.Rmd” vignette.
- To make the editors checks I'd need a contributing guide, which reviewers will need too. E.g. I have no TAGS so do I create one (if so how) or is there one developers (and editors/reviewers) of the package can access as a sort of sandbox? (much easier if it's technically feasible). Examples: https://docs.ropensci.org/osfr/CONTRIBUTING.html, https://docs.ropensci.org/ruODK/CONTRIBUTING.html. Once I know more, I'll give more precise suggestions of how to add tests for the API stuff. We require a test coverage of 75% before review and I'll help you get there as smoothly as possible!
- Our response: We have added CODE_OF_CONDUCT.md and CONTRIBUTING.md.
- Outside of my role as editor, let me introduce you to opencage, where a lot of work by @dpprdan is happening in the dev branch these days. Not sure you even need an alternative to Google maps. 😉 Two advantages: 1. no "billing account" for free accounts. 2. use of open data so you're allowed to keep it. Is there no free tier for Google geocoding? If so then my opencage suggestion is becoming more important. 🤔 Well i guess the free credits one gets when creating the billing account count as free tier maybe 🤔 Sign up: https://opencagedata.com/pricing
- Our response: I have switched geocode_tags() to draw from OpenCage API instead of Google Maps API. The geocode_tags() documentation has been updated, and the setup.Rmd and tidytags-with-conf-hashtags.Rmd vignettes have been adjusted as well.
- A small note, in the rtweet docs for creating a token at the very end there is a step allowing you to check your token is available in a new session. What could a similar check be for the Google Geocoding API? And even more ambitious and not necessary would be a sitrep function for tidytags, that you'd run to check Twitter and Google geocoding are now set up, and if not pointers to relevant docs. (à la devtools::dev_sitrep())
- Our response: Following the {rtweet} model, I have added a “Authorization in future R sessions” section to the setup.Rmd vignette for both the Twitter API key and the OpenCage API key.
- Last question before I log off, could the docs state what kind of restrictions can be applied to the Google geocoding API key for tidytags stuff to work?
- Our response: I have added a paragraph in the setup.Rmd vignette that compares the price points between Google Maps API and OpenCage API. I conclude with “OpenCage does seem to be a bit flexible if the 2,500 queries per day are exceeded. However, if you greatly exceed this limit, they send a warning and ask you to upgrade to a paid plan.”
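For reference, precompile.R is essentially the two-line pattern from the rOpenSci technote (a sketch; the actual script may differ slightly):
# precompile.R (sketch): knit the .orig file locally, where API credentials
# are available, and commit the resulting .Rmd
knitr::knit(
  "vignettes/tidytags-with-conf-hashtags.Rmd.orig",
  output = "vignettes/tidytags-with-conf-hashtags.Rmd"
)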
Outstanding issues:
- I followed some examples for our Contributing Guide, but I don’t know if what I’ve written is sufficient.
- I would love further guidance on increasing our test coverage, which is way below 75%.
Thank you again for working with us! @jrosen48 and I are looking forward to continuing to develop {tidytags}.
//Bret
Thanks a ton for all your work!
- Regarding the code of conduct, awesome! Note that after acceptance the rOpenSci COC will apply, cf https://devguide.ropensci.org/collaboration.html#coc-file
- I am glad about OpenCage, not only because it's a nice package 😁 but also because setup is so much easier.
- Speaking of setup docs, they are really good!
- Now regarding the contributing guide, what's missing is a sandbox TAGS, like OSF's development environment or the ODK sandbox. Could reviewers (and the editor?) get access to a TAGS used for testing the package? (Even if, ideally, once I start looking for reviewers I hope to find TAGS users who'd use the package on their own data.)
- Speaking of testing: you will want both tests that use cached responses of some sort, and tests making real requests at least once in a while.
  - The rOpenSci "HTTP testing in R" book is undergoing a major update (by me 🙂 ); in particular I've worked on demos of the three tools for HTTP testing there are in R. See ropensci-books/http-testing#47 and the Netlify preview linked from the PR checks: https://5fbb93389e79d3867339005d--http-testing-book.netlify.app/ (the "Whole games" section is the one with the demos).
  - Regarding tests with real requests, especially on CI, the best guidance out there is https://gargle.r-lib.org/articles/articles/managing-tokens-securely.html
  - If I get access to a sandbox TAGS, I can help with setting up tests, e.g. making a PR adding a first vcr test if that helps.
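To make that concrete, a first vcr test could look roughly like this (a sketch: the cassette name and the example_ids fixture are assumptions, not existing test code):
# tests/testthat/test-pull_tweet_data.R (sketch)
# a small, fixed vector of tweet status IDs (placeholders here)
example_ids <- c("123456789012345678", "234567890123456789")
test_that("pull_tweet_data() retrieves tweet metadata", {
  vcr::use_cassette("pull_tweet_data", {
    metadata <- pull_tweet_data(id_vector = example_ids)
  })
  expect_true(nrow(metadata) > 0)
})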
Hi @maelle! I've added an openly shared TAGS tracker to the CONTRIBUTING guide. See https://github.com/bretsw/tidytags/blob/master/CONTRIBUTING.md#prerequisites
I mention this in the guide, but the TAGS itself is read-only on the web, because the purpose of {tidytags} is to read the tracker archive into R and let you do all your analyses there.
I'll look more at the testing examples you've listed (THANK YOU!), but I wanted to quickly get you the TAGS sandbox first.
👋 @bretsw! Sorry for the delay, thanks, having the TAGS tracker is awesome.
I've merged my PR to the HTTP testing in R book and added some advanced chapters. https://books.ropensci.org/http-testing/index.html
Do you have any specific questions regarding test setup, that I could look into?
Speaking of testthat: with testthat's newest version, context() is deprecated; what's used as context instead is the name of the test file (which hopefully reflects the name of the R file).
Also note that you don't need to load tidytags manually (library(tidytags)) in tests, as it's loaded by testthat.
Regarding testthat's new version, see https://testthat.r-lib.org/articles/third-edition.html
Hi @maelle! Thanks again for pointing me in all the right directions. I think I've gotten vcr tests working with the Twitter and OpenCage APIs. At least it looks good on my local machine. I just pushed a big update, so we'll see. When I ran covr::package_coverage() locally, it looked like the tests are covering around 85% now. I've gotten tests for all functions, in any case.
It does look like R-CMD-check is failing on GitHub, though.
One place where I got stuck and couldn't find my way through was on a vcr test for get_url_domain(). The vcr test worked with long URLs but not shortened ones (e.g., bit.ly). Seems like there's something going on in a dependency package that I'm not catching yet. Anyway, I've set those tests to skip() for now.
Let me know what's next! THANK YOU, as always, for all your help.
Awesome, I'm going to have a look!
I see that the README links to an older version of the R Packages book regarding licensing. Here's the updated chapter: https://r-pkgs.org/license.html
Looking at the dependencies of the package, why are there dependencies on covr, devtools, goodpractice, rcmdcheck, spelling, styler?
I see a security problem, I've opened an issue, please look at this urgently ropensci-archive/tidytags#30 @bretsw (at least inactivate the token that was leaked).
Feel free to tell me where in the docs we could have warned you more (this coming from someone who leaked their own GitHub PAT a few weeks ago 😬 )
(the link https://books.ropensci.org/http-testing/vcr-security.html#if-the-secret-is-in-a-request-header was updated two days ago, and the features described are also very new in vcr cc @sckott)
There's no setup for also running real requests once in a while, correct? We don't have requirements around HTTP testing yet (just the test coverage one) but real requests can be useful. For them you'd need an encrypted Twitter token for instance, which means more security-related work. https://books.ropensci.org/http-testing/real-requests-chapter.html & https://books.ropensci.org/http-testing/security-chapter.html
A few last comments for today
- The current vignette building approach is fine, but for info, with vcr::inject_cassette()/vcr::eject_cassette() (and the proper vcr security configuration) you could instead use vcr cassettes in the vignettes.
- You might want to take some notes around your test setup in CONTRIBUTING.md, see https://books.ropensci.org/http-testing/contributor-friendliness.html
- In the contributing guide, what does "common sense" mean?
- The checks in my typo fix PR failed, could you ensure tests pass for forks? I suspect it's because there's no fake Twitter httr token file around. Maybe something like below can help:
# create a fake httr token so tests on forks (which have no real
# credentials) can still find a token
httr::oauth2.0_token(
  endpoint = httr::oauth_endpoints("twitter"),
  app = httr::oauth_app("foobar", "foobar", "foobar"),
  credentials = list(access_token = "foobar")
)
I might hold off looking for reviewers until January (I'll be on vacation for two weeks, and I suspect it'll be hard to get responses). In any case you've done awesome work on the package, thank you.
(I see the checks don't pass for the repo either, hence the coverage badge still showing low coverage?)
One place where I got stuck and couldn't find my way through was on a vcr test for get_url_domain(). The vcr test worked with long URLS but not shortened ones (e.g., bit.ly). Seems like there's something going on in a dependency package that I'm not catching yet. Anyway, set those test to skip() for now.
This might be worth a question in vcr repo / in rOpenSci forum.
Opened an issue in the vcr repo: ropensci/vcr#220
@ropensci-review-bot help
Hello @maelle, here are the things you can ask me to do:
# List all available commands
@ropensci-review-bot help
# Show our Code of Conduct
@ropensci-review-bot code of conduct
# Switch to "seeking reviewers"
@ropensci-review-bot seeking reviewers
# Approve the package
@ropensci-review-bot approve
# Add a user to this issue's reviewers list
@ropensci-review-bot add xxxxx to reviewers
# Remove a user from the reviewers list
@ropensci-review-bot remove xxxxx from reviewers
# Assign a user as the editor of this submission
@ropensci-review-bot assign @username as editor
# Remove the editor assigned to this submission
@ropensci-review-bot remove editor
# Close the issue
@ropensci-review-bot approve
# Close the issue
@ropensci-review-bot out of scope
I've updated our codecov too and tidytags is at 88% coverage.
Also, I've made a number of updates trying to get CI tests to work. I updated geocode_tags() to reflect the new function from OpenCage, oc_forward_df(), which does exactly what I had baked into my older code.
So, CI tests are currently passing on Mac and Ubuntu but the Windows test is returning a strange error:
Error: Package suggested but not available: 'ggraph'
See https://github.com/bretsw/tidytags/runs/2013153747
Have you seen something like this before? The ggraph package definitely still exists and is usable on my local machine (and apparently on Mac and Ubuntu generally). Any ideas there?
Thanks for the update, this is awesome! (and I'd say a tribute to @dpprdan's new opencage functions)
Usually when I see such errors it happens right after a package update on CRAN, and so the binary is not available. It doesn't seem to be the case here.
I was thinking about suggesting you tweak the workflow so that remotes would error at the dependency installation step, not later (no need to do the check if not all dependencies are present) but then I remembered the tidyverse team seems to be switching their workflows to using pak instead: https://github.com/r-lib/pkgdown/blob/9c95f00d2505ddd83c4722e019202370715d3a3d/.github/workflows/R-CMD-check.yaml#L54
I think using pak instead of remotes might be a good idea and with a bit of luck you'd get no error / a more informative failure on Windows?
@maelle - Success! pak seemed to do the trick. tidytags is now passing CI tests for all OS platforms and showing 88% test coverage. I'll tackle real tests now, but this is a breakthrough moment!
Yay 🎉
Hi @maelle, I'm working on setting up real tests for CI, and I wanted to run my plan by you to see what you think.
I've created a new file, "weekly-check.yaml", for the real tests. I've scheduled the tests to occur once every week, at midnight on Sundays, with - cron: '0 0 * * Sun'.
tidytags functions want to find API keys in environment variables, so I've stored the opencage and rtweet keys in GitHub Secrets. I'll pass them in with things like OPENCAGE_KEY: ${{ secrets.OPENCAGE_KEY }} and TWITTER_API_KEY: ${{ secrets.TWITTER_API_KEY }} (there are five secrets for Twitter, as listed at https://docs.ropensci.org/rtweet/articles/auth.html). rtweet will create a Twitter token with rtweet::create_token(), so I'm adding that to the steps in the .yaml file (sketched below). The token gets stored in (and then automatically accessed through) an .rds file. I've added "*.rds" to .gitignore so the token in the .rds file shouldn't be exposed at all.
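A sketch of that token step (the app name is a placeholder and the exact secret names are assumptions; this is the rtweet 0.7 signature):
# build the rtweet token from environment variables injected from GitHub Secrets
token <- rtweet::create_token(
  app = "tidytags-tester",
  consumer_key = Sys.getenv("TWITTER_API_KEY"),
  consumer_secret = Sys.getenv("TWITTER_API_SECRET"),
  access_token = Sys.getenv("TWITTER_ACCESS_TOKEN"),
  access_secret = Sys.getenv("TWITTER_ACCESS_TOKEN_SECRET")
)
# create_token() caches the token to an .rds file, which is why "*.rds"
# goes in .gitignore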
I think that's it. What do you think? Is this a good plan? Am I missing something obvious that will unwittingly expose a secret?
You mean https://docs.ropensci.org/rtweet/articles/auth.html#2-access-token-secret-method? That sounds great and I should document this in the book as it sounds easier than encrypting.
As this thread taught us both how many ways there are to leak secrets 😅 (not your fault!): the token will be in an app dir or so, not in the current directory, correct? Just to make sure it won't get leaked in a check .tar.gz in case of failure.
Also, can you confirm you use a special Twitter account for the app and token, not your account?
Hi, @maelle. I set up a special Twitter account for the app and token, and the real CI tests are queued to run in about 5 minutes. So, we'll see.
In the meantime, now the regular build has failed all CI tests: https://github.com/bretsw/tidytags/runs/2056401038
Is this because R updated to v4.1 on Friday 3/5? Any advice on how I can help tidytags catch up?
In some of the runs it seems the error comes from vcr. Could your tidytags example sheet have changed somehow?
Btw the link to that sheet contains a key https://github.com/bretsw/tidytags/runs/2056400999#step:10:142 Is this expected?
(the real tests might be different tests, or at least some of the tests might be skipped when vcr is off, depending on what the testthat expectations are)
In some of the runs it seems the error comes from vcr. Could your tidytags example sheet have changed somehow?
Btw the link to that sheet contains a key https://github.com/bretsw/tidytags/runs/2056400999#step:10:142 Is this expected?
Nope, this was not expected. That was my Google API key, which is now deleted. I need to dig into this, because the Google API key was only used back when tidytags used Google Maps for geocoding. I would not have expected that key to be called at all since we switched to OpenCage.
Maybe it's needed for the raw URL to the TAGS?
I just re-recorded all the vcr cassettes after deleting that Google API key. All tests passed locally. CI tests just failed again with the same error as before, with the same old key in the raw URL. I'll keep digging, but I'm confused so far.
It seems a key is needed to read the sheets that don't require a token. https://github.com/tidyverse/googlesheets4/blob/8cd9a75ba17d064a76b5a8bf0b8c9dbfa91f2907/R/request_generate.R#L23
So you need to add this key to the vcr secret filtering (and to provide it for real requests). I do not know where the key in the error log comes from.
Now as to why the request is failing: the request does not look different to me in your cassette (https://sheets.googleapis.com/v4/spreadsheets/18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8?fields=spreadsheetId%2Cproperties%2CspreadsheetUrl%2Csheets.properties%2CnamedRanges&key=AIzaSyDKRsnYs5G4c8y4BMlXLKTKMTheNXrsNEM) vs what's done on GHA (https://sheets.googleapis.com/v4/spreadsheets/18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8?fields=spreadsheetId%2Cproperties%2CspreadsheetUrl%2Csheets.properties%2CnamedRanges&key=AIzaSyDKRsnYs5G4c8y4BMlXLKTKMTheNXrsNEM), so that's worth opening a vcr issue.
I hope this key has been invalidated btw.
Also note, regarding API keys that are used in the query parts of URLs, that vcr now lets you more specifically filter them cf https://docs.ropensci.org/vcr/articles/configuration.html#filter-query-parameters
so in your package (and that'll be precious information for users / contributors) the secrets are the Twitter token, the OpenCage API key but also some sort of googlesheets4 authentication, be it a token or a key.
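Concretely, the vcr setup helper would need something along these lines (a sketch; the placeholder values are arbitrary):
# tests/testthat/setup-tidytags.R (sketch)
vcr::vcr_configure(
  dir = "../fixtures",
  filter_request_headers = list(Authorization = "<<twitter_token>>"),
  filter_query_parameters = list(key = "<<google_api_key>>")
)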
Thank you, @maelle, as ever for your insight and guidance here. I wrestled with this through the rest of the day yesterday and ended up in this same spot: there's a key needed for Google Sheets, which I missed because this gets saved to the local environment in a way that seems persistent. I've started the process of figuring out how to best save and call the Google API key. I'll need to rewrite a tidytags function or two, document the process of getting and using the key in the setup vignette, re-record vcr cassettes, etc. I have a plan at least. I'll keep you posted!
That's pretty awful that Google makes you put an API key in a query param. I'd expect better from them. Anyway, hopefully the new filter query params option will work.
Thanks Scott! Certainly caught me off guard yesterday. I am excited to implement the new vcr feature at least. I'll keep you posted with how it's going!
I thought I had this Google API key issue figured out, but no such luck. @maelle, how familiar are you with googlesheets4?
Something is happening with the stored Google API key that I'm not understanding. Somehow my old Google API key is stored somewhere in a way that I can't seem to change or access, until I record vcr cassettes and see that it is exposed in the request URL. Which should be fine, because I've revoked that token and now have a new one. However, my new API key won't work when I run googlesheets4::gs4_auth_configure(api_key = Sys.getenv("GOOGLE_API_KEY")), but running googlesheets4::gs4_deauth() (which sets the API token to NULL) and googlesheets4::gs4_auth_configure(api_key = NULL) (which sets the API key to NULL) somehow lets me query the Sheets API. That is, with both a NULL key and a NULL token, I can successfully run googlesheets4::range_read(googlesheets4::gs4_examples("deaths")) or perform my tidytags package tests (locally).
In sum, there's an old, deactivated API key stored somewhere I can't locate and being accessed in a way I can't decipher. The old API key is still currently exposed in several vcr cassettes (in the "fixtures" directory), but I'm ok with this for now because the key is actually decommissioned.
Any ideas?
Not familiar at all!
If I follow correctly there are two problems
- Regarding the API key in vcr cassettes, you need to tweak the vcr configuration: https://docs.ropensci.org/vcr/articles/configuration.html#filter_query_parameters Like for request headers, with vcr's latest version (not on CRAN yet though) you can filter query parameters.
- Now as to where that key is saved: could you try searching for the key string on your computer? Could https://googlesheets4.tidyverse.org/reference/gs4_auth_configure.html help?
The next step would be to ask for help on RStudio community forum (since googlesheets4 is an RStudio package, I'd expect more users there than on rOpenSci forum).
@maelle, I figured it out! I was getting ready to post to the RStudio community forum, and first I looked everything over one more time. I altered the restrictions to the Google API key in the Cloud Console setup, and this did the trick! The issue wasn't with my code but the restrictions. I've updated the tidytags setup vignette to make this clearer.
🎉 👏 so only tests with real requests are "needed" before we proceed IIRC.
Yes! I'm on it today or early next week. So close.
Hi @maelle, I've set up tests with real requests (https://github.com/bretsw/tidytags/blob/master/.github/workflows/weekly-check.yaml), but they don't seem to run at the scheduled time:
on:
schedule:
- cron: '0 6 * * MON,WED,FRI'
Do you see anything obviously wrong? I've tried to reference the rladies example (https://github.com/rladies/meetupr/blob/master/.github/workflows/with-auth.yaml) for inspiration and searched elsewhere, but it's not clear to me why the scheduler isn't doing anything. I previously scheduled it for Sunday midnight, but nothing happened over the weekend either.
I'll ask in the RStudio Community forum (https://community.rstudio.com/t/testthat-motivation/27251/4) if there's nothing readily apparent to you.
Hello! It seems the problem is not the cron syntax but your referring to matrix.config.os without defining it (it can't be shared between workflow files).
Isn't the matrix defined on lines 28-30? This reflects lines 26-28 in the rladies' meetupr yaml.
Right but it seems it isn't parsed? The meetupr YAML doesn't work either 😅 https://github.com/rladies/meetupr/actions/runs/673097825
Maybe I'm missing something, but it seems to me that it did run: https://github.com/bretsw/tidytags/actions/runs/675062243
I am still not sure I understand my own meetupr YAML file so I updated it. Therefore only @dpprdan deserves thanks. 😂
Hi @maelle, looks like everything is working with tidytags:
- CI tests work with vcr when I push changes.
- CI tests work with real requests, now scheduled for every Monday, Wednesday, and Friday
I think(?) I've checked everything off the list!
@ropensci-review-bot seeking reviewers
Please add this badge to the README of your package repository:
[![Status at rOpenSci Software Peer Review](https://badges.ropensci.org/382_status.svg)](https://github.com/ropensci/software-review/issues/382)
Furthermore, if your package does not have a NEWS.md file yet, please create one to capture the changes made during the review process. See https://devguide.ropensci.org/releasing.html#news
For info I made a call on Twitter https://twitter.com/ma_salmon/status/1374349523900899328 (if Twitter isn't appropriate for this, then for what is it useful 😁 ) hoping to find someone using TAGS in particular. I'll contact potential reviewers (using TAGS or not) within the next few days.
Sounds good! I've retweeted your call. Thank you!
@ropensci-review-bot add @llrs to reviewers
That can't be done if there is no editor assigned
@ropensci-review-bot assign @maelle as editor
Assigned! @maelle is now the editor
@ropensci-review-bot add @llrs to reviewers
@llrs added to the reviewers list. Review due date is 2021-04-19. Thanks @llrs for accepting to review! Please refer to our reviewer guide.
@bretsw please don't forget to add the badge mentioned in #382 (comment) 🙂
Thanks for the reminder, I totally missed that prompt, probably from skimming past messages from the review bot. Sorry bot! I'll add NEWS.md next.
@ropensci-review-bot add @marionlouveaux to reviewers
@marionlouveaux added to the reviewers list. Review due date is 2021-04-27. Thanks @marionlouveaux for accepting to review! Please refer to our reviewer guide.
As discussed with @marionlouveaux, amending the due date for review to 2021-04-27 to accommodate @marionlouveaux's schedule.
Thanks @llrs and @marionlouveaux for accepting to review! 🙏
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- Briefly describe any working relationship you have (had) with the package authors.
- As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- A statement of need clearly stating problems the software is designed to solve and its target audience in README
- Installation instructions: for the development version of package and any non-standard dependencies in README
- Vignette(s) demonstrating major functionality that runs successfully locally
The setup vignette does not make clear which steps are necessary and which are not; I would suggest adding titles and an index.
The vignettes have a "Pain point #4" which I couldn't find referenced anywhere. Also, perhaps using a more descriptive title would make it easier for users to know what it is about (removing the Pain point reference entirely from the title of the section). Not that they aren't pain points, but just redirect users to the solutions/documentation as they go.
However, most of the code chunks of the vignette are not run (as reported by BiocCheck):
* WARNING: Evaluate more vignette chunks.
# of code chunks: 8
# of eval=FALSE: 5
# of nonexecutable code chunks by syntax: 0
# total unevaluated 5 (62%)
And those that do run just add documentation or set up the vignette. Perhaps some kind of setup specific to the vignettes could be used; otherwise they defeat their purpose and turn into plain READMEs. (I know it is not easy for CRAN, so maybe set them up as articles just on the website, outside CRAN?)
The step to create the Google API key is not clear enough (perhaps due to a redesign of the API configuration interface?). An indication to use the Google Sheets API in answer to the question "Find out what kind of credentials you need?" would be helpful.
It should be pointed out that the OpenCage Geocoding API key is not needed to use the package. Also, the discussion about the price and API limits might be good for an issue but doesn't fit well in the vignette (I've seen that @maelle asked for this, but now that the package is settled on OpenCage maybe it is no longer needed, or it can be reduced).
On the chunk with "dplyr::glimpse(example_after_rtweet)" I get a different result: 2204 rows compared to the 2,215 reported in the vignette.
When I run the following code chunk I get an error (as I don't have the longurl package yet):
example_domains <- get_url_domain(example_urls)
Before using a package listed in Suggests, it should be tested whether it can be loaded (you can use rlang::is_installed("longurl") or requireNamespace("longurl", quietly = TRUE)).
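A minimal sketch of such a guard (the abbreviated function body here is an assumption, not the package's actual code):
get_url_domain <- function(x) {
  if (!requireNamespace("longurl", quietly = TRUE)) {
    stop("Please install the {longurl} package to use get_url_domain().",
      call. = FALSE
    )
  }
  # expand shortened URLs, then extract the domain
  expanded <- longurl::expand_urls(x)
  urltools::domain(expanded$expanded_url)
}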
Last, I don't know how to push data to the TAGS Google Sheet created in the first vignette.
- Function Documentation: for all exported functions
The name of the function is repeated in the description of add_users_data; I think it is not needed.
- Examples (that run successfully locally) for all exported functions
Examples cannot be run without the authentication setup, and there is no mention of this in the help pages. Perhaps a minor comment would remind users.
- Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
There isn't any BugReports field, but in the vignettes there is info about how to get help.
I would suggest adding the issues link to the DESCRIPTION too. The contributing file is extensive and well organized.
For packages co-submitting to JOSS
- The package has an obvious research application according to JOSS's definition
The package contains a paper.md matching JOSS's requirements with:
- A short summary describing the high-level functionality of the software
- Authors: A list of authors with their affiliations
- A statement of need clearly stating problems the software is designed to solve and its target audience.
The authors say: "Yet, many approaches to collecting social media data in the moment require important technical skill that may dissuade social scientists from getting started". However, this package requires authentication with 2 or 3 APIs and a special Google Sheet. I have some doubts tidytags will attract the target audience.
- References: with DOIs for all those that have one (e.g. papers, datasets, software).
None of the references have DOIs or URLs to online resources.
There's an additional " in the YAML heading of paper.md that prevented viewing the paper.
Functionality
- Installation: Installation succeeds as documented.
- Functionality: Any functional claims of the software been confirmed.
- Performance: Any performance claims of the software been confirmed.
- Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
I got 2 tests that failed and 3 warnings (besides 2 that were skipped). The test at test-get_url_domain.R:14:5 reported domain4 not equal to "npr.org"; in the browser I get asked for cookie consent, and when run locally, outside testthat or vcr, I get the URL choice.npr.org.
The other failing tests are weird (as I don't get them when I run them in the R console, only in the build/check RStudio panel).
I have a development version of vcr installed, and one of the warnings is related to it. The new version warns when the cassettes are empty, which in my experience means that the test is not conclusive, but this could also be related to not having the geocoding API enabled.
The other warnings are in test-get_url_domain.R, lines 3 and 32: Invalid URL. I'm not sure why, because when I paste the URL into my browser I get redirected to https://www.aect.org/about_us.php. (BTW, perhaps the link can be changed to https instead of http.)
- Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Estimated hours spent reviewing: 4
- Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
The rtweet package is undergoing drastic changes (I'm involved in rtweet maintenance) and there will be a major release with breaking changes. It will probably break this package (the recommendation about the token will change, for instance), so be ready to update it accordingly.
The package contains relatively few, simple functions that provide more data from Twitter or make it easier to analyze. I have not analyzed Twitter data (except for an overview of a user account), so I don't know how useful the data is. I am not a user of TAGS, but I'm a bit puzzled about how to add information to the Google Sheet: if I'm a new user, how should I do it? I mean, I can get the template, but how do I fill it? I think this package would be easier for non-technical people if it included a function to add the information gathered via rtweet, or processed with the package, back to the original Google Sheet.
I haven't fully read the paper.md for JOSS, but I think it is short enough and comprehensive of the package.
From a more technical point of view, I have some comments about the code and the package:
There are 75 lines longer than 80 characters; try to reduce them. Probably it is just a matter of style, and perhaps creating new, shorter variables would help.
Also, namespaces in the Imports field are not imported from: 'gargle', 'readr'. All declared Imports should be used.
The get_char_tweet_ids function could be improved to have only one argument: if it is a data.frame, then extract the status_id and get the ID via id_str; if it is a URL, you can just extract the last numbers with gsub("https?\\://twitter.com\\/.+/statuses/", "", df$status_url). There is no need to modify the data.frame and then extract the vector again.
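A sketch of that simplification (not the package's actual code):
get_char_tweet_ids <- function(x) {
  # accept either a data.frame (with a status_url column) or a URL vector
  if (is.data.frame(x)) {
    x <- x$status_url
  }
  # the tweet ID is the trailing run of digits in the status URL
  gsub("https?\\://twitter.com\\/.+/statuses/", "", x)
}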
In process_tweets you can simplify the is_self_reply to ifelse(.data$is_reply & .data$user_id == .data$reply_to_user_id, TRUE, FALSE).
In get_upstream_replies the examples are not informative, as there are no replies to get data from in the example dataset. You make multiple calls to pull_tweet_data, some of which might be unnecessary. process_tweets can be called just once at the end, instead of multiple times and on each loop run; this should speed up the process. Also, if at most 90000 tweets are taken on each run, then you can estimate the number of iterations needed and inform the user; this might make the wait easier. Perhaps it would be better to use lookup_many_tweets, as it does a similar process. However, users might hit the rate limit, and I don't see any information being passed to the user regarding this.
Looking at create_edgelist, it calls process_tweets and also get_replies, get_retweets, get_quotes, and get_mentions, which call process_tweets too. Perhaps some internal functions could be created to avoid calling process_tweets multiple times on the same data.
Thanks a lot for your review @llrs! 🚀
Note that regarding JOSS, we've just changed the process as JOSS will be the ones determining whether the software fits in their scope.
Package Review
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
- Briefly describe any working relationship you have (had) with the package authors.
- As the reviewer I confirm that there are no conflicts of interest for me to review this work (If you are unsure whether you are in conflict, please speak to your editor before starting your review).
Documentation
The package includes all the following forms of documentation:
- A statement of need clearly stating problems the software is designed to solve and its target audience in README
In the Overview of the README, I would add a sentence to explain what TAGS is, and explain in a bit more detail how rtweet and opencage are used in tidytags and what else is provided by tidytags.
- Installation instructions: for the development version of the package and any non-standard dependencies in README
- Vignette(s) demonstrating major functionality that runs successfully locally
- Function Documentation: for all exported functions
- Examples (that run successfully locally) for all exported functions
Except for lookup_many_tweets.
- Community guidelines including contribution guidelines in the README or CONTRIBUTING, and DESCRIPTION with URL, BugReports and Maintainer (which may be autogenerated via Authors@R).
Missing BugReports.
Functionality
- Installation: Installation succeeds as documented.
- Functionality: Any functional claims of the software been confirmed.
- Performance: Any performance claims of the software been confirmed.
- Automated tests: Unit tests cover essential functions of the package
and a reasonable range of inputs and conditions. All tests pass on the local machine.
7 tests out of 69 failed.
- Packaging guidelines: The package conforms to the rOpenSci packaging guidelines
Estimated hours spent reviewing: 10h
- Should the author(s) deem it appropriate, I agree to be acknowledged as a package reviewer ("rev" role) in the package DESCRIPTION file.
Review Comments
The {tidytags} package makes it possible to read a TAGS tracker, which is a Google app that continuously collects tweets from Twitter based on predefined search criteria and collection frequency. It provides wrappers to {rtweet} and {opencage} functions to simplify the retrieval of metadata either not fetched by TAGS or not existing in Twitter (in the case of geocoding). In addition, it provides functionalities to compute additional descriptive variables about the collected tweets and to visualise relationships between tweets. The {tidytags} package interacts with 3 APIs (Google Spreadsheets, Twitter, and OpenCage) and one Google app (TAGS). For this reason, the setup is a bit long and tedious when done from scratch. The package itself contains a small number of functions that are well documented.
I used the {pkgreviewr} package from rOpenSci to conduct my review (a big thanks to the authors of this package). I configured TAGS and created a Google API key. I already had the configuration for {rtweet} and {opencage}. I could run {tidytags} on my own TAGS tracker (and it worked!).
My main comments concern:
- the description of the package goals. I didn't get a clear idea of what {tidytags} is doing at first glimpse of the README. Some more sentences in the Overview paragraph would help. In addition, I would recommend adding a sketch with the main functions and the links to the different APIs. I would also add a checklist of all the things you need to have in order to be set up (more precise than the 4 pain points).
- simplification of the code for the functions get_replies, get_quotes, get_retweets and get_mentions. I would create an internal get_content function that takes as input parameters df and type, type being "reply", "quote", "retweet" or "mentions" (see the sketch after this list).
- failing tests: 7 tests out of 69 failed. They are all related to vcr cassettes. These tests pass if I delete the fixtures folder prior to running tests.
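A sketch of the suggested helper (the filtering columns are assumptions based on rtweet 0.7 output, not tidytags' actual internals):
get_content <- function(df, type = c("reply", "retweet", "quote", "mentions")) {
  type <- match.arg(type)
  processed <- process_tweets(df)  # called once, up front
  switch(type,
    reply = dplyr::filter(processed, !is.na(.data$reply_to_status_id)),
    retweet = dplyr::filter(processed, .data$is_retweet),
    quote = dplyr::filter(processed, .data$is_quote),
    mentions = dplyr::filter(processed, !is.na(.data$mentions_screen_name))
  )
}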
I didn't read the paper submitted to JOSS. I am pasting the details of my review in a second comment.
Session Info
> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8 LC_MONETARY=fr_FR.UTF-8
[6] LC_MESSAGES=fr_FR.UTF-8 LC_PAPER=fr_FR.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] tidytags_0.1.2 devtools_2.3.2 usethis_2.0.1 magrittr_2.0.1
loaded via a namespace (and not attached):
[1] uuid_0.1-4 systemfonts_1.0.1 igraph_1.2.6 lazyeval_0.2.2 sp_1.4-5 crosstalk_1.1.1
[7] leaflet_2.0.4.1 ggplot2_3.3.3 urltools_1.7.3 digest_0.6.27 leafpop_0.0.6 htmltools_0.5.1.1
[13] viridis_0.5.1 leaflet.providers_1.9.0 fansi_0.4.2 memoise_2.0.0 covr_3.5.1 googlesheets4_0.3.0
[19] remotes_2.2.0 readr_1.4.0 graphlayouts_0.7.1 svglite_2.0.0 askpass_1.1 prettyunits_1.1.1
[25] colorspace_2.0-0 ggrepel_0.9.1 rtweet_0.7.0 xfun_0.22 dplyr_1.0.5 leafem_0.1.3
[31] callr_3.6.0 crayon_1.4.1 jsonlite_1.7.2 roxygen2_7.1.1 attachment_0.2.1 brew_1.0-6
[37] glue_1.4.2 xmlparsedata_1.0.5 polyclip_1.10-0 gtable_0.3.0 gargle_1.1.0 webshot_0.5.2
[43] pkgbuild_1.2.0 scales_1.1.1 DBI_1.1.1 opencage_0.2.2 Rcpp_1.0.6 viridisLite_0.3.0
[49] units_0.7-1 proxy_0.4-25 praise_1.0.0 clisymbols_1.2.0 stats4_4.0.4 xopen_1.0.0
[55] htmlwidgets_1.5.3 rex_1.2.0 httr_1.4.2 RColorBrewer_1.1-2 ellipsis_0.3.1 pkgconfig_2.0.3
[61] farver_2.1.0 sass_0.3.1 utf8_1.2.1 crul_1.1.0 labeling_0.4.2 tidyselect_1.1.0
[67] rlang_0.4.10 munsell_0.5.0 cellranger_1.1.0 tools_4.0.4 cachem_1.0.4 cli_2.4.0
[73] generics_0.1.0 evaluate_0.14 stringr_1.4.0 fastmap_1.1.0 yaml_2.2.1 processx_3.5.0
[79] knitr_1.31 fs_1.5.0 tidygraph_1.2.0 purrr_0.3.4 satellite_1.0.2 ggraph_2.0.5
[85] xml2_1.3.2 compiler_4.0.4 rstudioapi_0.13 curl_4.3 png_0.1-7 e1071_1.7-6
[91] testthat_3.0.2 tibble_3.1.0 tweenr_1.0.2 bslib_0.2.4 stringi_1.5.3 pkgreviewr_0.2.0
[97] cyclocomp_1.1.0 ps_1.6.0 desc_1.3.0 lattice_0.20-41 whoami_1.3.0 classInt_0.4-3
[103] goodpractice_1.0.2 vctrs_0.3.7 pillar_1.6.0 lifecycle_1.0.0 triebeard_0.3.0 jquerylib_0.1.3
[109] rcmdcheck_1.3.3 raster_3.4-5 mapview_2.9.0 R6_2.5.0 KernSmooth_2.23-18 gridExtra_2.3
[115] sessioninfo_1.1.1 codetools_0.2-18 MASS_7.3-53.1 assertthat_0.2.1 pkgload_1.2.0 openssl_1.4.3
[121] rprojroot_2.0.2 withr_2.4.1 httpcode_0.3.0 hms_1.0.0 lintr_2.0.1 grid_4.0.4
[127] tidyr_1.1.3 class_7.3-18 rmarkdown_2.7 googledrive_1.0.1 sf_0.9-8 ggforce_0.3.3
[133] base64enc_0.1-3 ratelimitr_0.4.1
Test installation
Local installation took several minutes (approx. 3 to 5 minutes) because there are many dependencies. On my machine, it had to install 35 packages plus 3 dependencies.
Installation details
Installing 35 packages: fauxpas, selectr, broom, forcats, uuid, ids, googledrive, gargle, data.table, blob, polyclip, tweenr, webmockr, rvest, modelr, haven, googlesheets4, dtplyr, dbplyr, ratelimitr, satellite, leafpop, graphlayouts, tidygraph, ggrepel, ggforce, audio, vcr, tidyverse, opencage, mapview, longurl, ggraph, beepr, rtweet
Installing packages into ‘/home/marion/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
also installing the dependencies ‘cli’, ‘lubridate’, ‘pillar’
Check package integrity
run checks on tidytags source
Recommendation: In CONTRIBUTING.md, remind potential contributors to follow the Getting started with tidytags guide before proceeding to a check on the package. Without the API keys, it doesn't work.
7 failed tests, all related to vcr. These tests pass if I delete the fixtures folder. NB: the error messages contain my secret tokens for Twitter, so I removed most of the URLs and replaced them with "………..".
Error message for failed tests
══ Failed tests ════════════════════════════════════════════════════════════════
── Error (test-add_users_data.R:14:3): user data is added properly ─────────────
Error:
================================================================================
An HTTP request has been made that vcr does not know how to handle:
GET https://api.twitter.com/1.1/users/lookup.json?screen_name=gsa_aect%2CAECT...........
vcr is currently using the following cassette:
- ../fixtures/users_info.yml
- record_mode: once
- match_requests_on: method, uri
Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors
If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues
& see https://books.ropensci.org/http-testing
================================================================================
Backtrace:
█
1. ├─vcr::use_cassette(...) test-add_users_data.R:14:2
2. │ └─cassette$call_block(...)
3. └─tidytags::add_users_data(el) test-add_users_data.R:16:4
4. └─rtweet::lookup_users(all_users)
5. ├─base::do.call("lookup_users_", args)
6. └─rtweet:::lookup_users_(...)
7. └─rtweet:::.user_lookup(users, token)
8. └─rtweet:::TWIT(get = get, url, token)
9. └─httr::GET(url, ...)
10. └─httr:::request_perform(req, hu$handle$handle)
11. └─httr:::perform_callback("request", req = req)
12. └─webmockr:::callback(...)
13. └─webmockr::HttrAdapter$new()$handle_request(req)
14. └─private$request_handler(req)$handle()
15. └─eval(parse(text = req_type_fun))(self$request)
16. └─err$run()
17. └─self$construct_message()
── Error (test-get_upstream_replies.R:2:3): get_upstream_replies() finds additional replies ──
Error:
================================================================================
An HTTP request has been made that vcr does not know how to handle:
POST https://api.twitter.com/1.1/statuses/lookup.json?id=...............
vcr is currently using the following cassette:
- ../fixtures/upstream_replies.yml
- record_mode: once
- match_requests_on: method, uri
Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors
If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues
& see https://books.ropensci.org/http-testing
================================================================================
Backtrace:
█
1. ├─vcr::use_cassette(...) test-get_upstream_replies.R:2:2
2. │ └─cassette$call_block(...)
3. └─tidytags::pull_tweet_data(tags_data) test-get_upstream_replies.R:4:4
4. ├─base::ifelse(...)
5. ├─base::ifelse(...)
6. └─rtweet::lookup_statuses(get_char_tweet_ids(df[1:n, ]))
7. ├─base::do.call("lookup_statuses_", args)
8. └─rtweet:::lookup_statuses_(...)
9. └─rtweet:::.status_lookup(statuses[from:to], token = token)
10. └─rtweet:::TWIT(get = get, url, token)
11. └─httr::POST(url, ...)
12. └─httr:::request_perform(req, hu$handle$handle)
13. └─httr:::perform_callback("request", req = req)
14. └─webmockr:::callback(...)
15. └─webmockr::HttrAdapter$new()$handle_request(req)
16. └─private$request_handler(req)$handle()
17. └─eval(parse(text = req_type_fun))(self$request)
18. └─err$run()
19. └─self$construct_message()
── Error (test-get_upstream_replies.R:34:3): get_upstream_replies() works with no new replies found ──
Error:
================================================================================
An HTTP request has been made that vcr does not know how to handle:
GET https://api.twitter.com/1.1/statuses/lookup.json?id=NA&tweet_mode=extended..............
vcr is currently using the following cassette:
- ../fixtures/upstream_replies_empty.yml
- record_mode: once
- match_requests_on: method, uri
Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors
If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues
& see https://books.ropensci.org/http-testing
================================================================================
Backtrace:
█
1. ├─vcr::use_cassette(...) test-get_upstream_replies.R:34:2
2. │ └─cassette$call_block(...)
3. └─tidytags::get_upstream_replies(sample_data) test-get_upstream_replies.R:35:4
4. ├─base::nrow(pull_tweet_data(id_vector = unknown_replies$reply_to_status_id))
5. └─tidytags::pull_tweet_data(id_vector = unknown_replies$reply_to_status_id)
6. ├─base::ifelse(...)
7. ├─base::ifelse(...)
8. └─rtweet::lookup_statuses(id_vector[1:n])
9. ├─base::do.call("lookup_statuses_", args)
10. └─rtweet:::lookup_statuses_(...)
11. └─rtweet:::.status_lookup(statuses[from:to], token = token)
12. └─rtweet:::TWIT(get = get, url, token)
13. └─httr::GET(url, ...)
14. └─httr:::request_perform(req, hu$handle$handle)
15. └─httr:::perform_callback("request", req = req)
16. └─webmockr:::callback(...)
17. └─webmockr::HttrAdapter$new()$handle_request(req)
18. └─private$request_handler(req)$handle()
19. └─eval(parse(text = req_type_fun))(self$request)
20. └─err$run()
21. └─self$construct_message()
── Error (test-lookup_many_tweets.R:3:3): lookup_many_tweets() retrieves additional metadata like pull_tweet_data() ──
Error:
================================================================================
An HTTP request has been made that vcr does not know how to handle:
GET https://api.twitter.com/1.1/statuses/lookup.json?id=................
vcr is currently using the following cassette:
- ../fixtures/lookup_many.yml
- record_mode: once
- match_requests_on: method, uri
Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors
If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues
& see https://books.ropensci.org/http-testing
================================================================================
Backtrace:
█
1. ├─vcr::use_cassette(...) test-lookup_many_tweets.R:3:2
2. │ └─cassette$call_block(...)
3. └─tidytags::pull_tweet_data(sample_tags, n = 10) test-lookup_many_tweets.R:5:4
4. ├─base::ifelse(...)
5. ├─base::ifelse(...)
6. └─rtweet::lookup_statuses(get_char_tweet_ids(df[1:n, ]))
7. ├─base::do.call("lookup_statuses_", args)
8. └─rtweet:::lookup_statuses_(...)
9. └─rtweet:::.status_lookup(statuses[from:to], token = token)
10. └─rtweet:::TWIT(get = get, url, token)
11. └─httr::GET(url, ...)
12. └─httr:::request_perform(req, hu$handle$handle)
13. └─httr:::perform_callback("request", req = req)
14. └─webmockr:::callback(...)
15. └─webmockr::HttrAdapter$new()$handle_request(req)
16. └─private$request_handler(req)$handle()
17. └─eval(parse(text = req_type_fun))(self$request)
18. └─err$run()
19. └─self$construct_message()
── Error (test-pull_tweet_data.R:7:3): pull_tweet_data() is able to retrieve additional metadata starting with dataframe ──
Error:
================================================================================
An HTTP request has been made that vcr does not know how to handle:
GET https://api.twitter.com/1.1/statuses/lookup.json?id=.................
vcr is currently using the following cassette:
- ../fixtures/metadata_from_df.yml
- record_mode: once
- match_requests_on: method, uri
Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors
If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues
& see https://books.ropensci.org/http-testing
================================================================================
Backtrace:
█
1. ├─vcr::use_cassette(...) test-pull_tweet_data.R:7:2
2. │ └─cassette$call_block(...)
3. └─tidytags::pull_tweet_data(sample_tags, n = 10) test-pull_tweet_data.R:8:4
4. ├─base::ifelse(...)
5. ├─base::ifelse(...)
6. └─rtweet::lookup_statuses(get_char_tweet_ids(df[1:n, ]))
7. ├─base::do.call("lookup_statuses_", args)
8. └─rtweet:::lookup_statuses_(...)
9. └─rtweet:::.status_lookup(statuses[from:to], token = token)
10. └─rtweet:::TWIT(get = get, url, token)
11. └─httr::GET(url, ...)
12. └─httr:::request_perform(req, hu$handle$handle)
13. └─httr:::perform_callback("request", req = req)
14. └─webmockr:::callback(...)
15. └─webmockr::HttrAdapter$new()$handle_request(req)
16. └─private$request_handler(req)$handle()
17. └─eval(parse(text = req_type_fun))(self$request)
18. └─err$run()
19. └─self$construct_message()
── Error (test-pull_tweet_data.R:28:3): pull_tweet_data() is able to retrieve additional metadata starting with tweet IDs ──
Error:
================================================================================
An HTTP request has been made that vcr does not know how to handle:
GET https://api.twitter.com/1.1/statuses/lookup.json?id=...................
vcr is currently using the following cassette:
- ../fixtures/metadata_from_ids.yml
- record_mode: once
- match_requests_on: method, uri
Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors
If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues
& see https://books.ropensci.org/http-testing
================================================================================
Backtrace:
█
1. ├─vcr::use_cassette(...) test-pull_tweet_data.R:28:2
2. │ └─cassette$call_block(...)
3. └─tidytags::pull_tweet_data(id_vector = sample_tags$id_str, n = 10) test-pull_tweet_data.R:29:4
4. ├─base::ifelse(...)
5. ├─base::ifelse(...)
6. └─rtweet::lookup_statuses(id_vector[1:n])
7. ├─base::do.call("lookup_statuses_", args)
8. └─rtweet:::lookup_statuses_(...)
9. └─rtweet:::.status_lookup(statuses[from:to], token = token)
10. └─rtweet:::TWIT(get = get, url, token)
11. └─httr::GET(url, ...)
12. └─httr:::request_perform(req, hu$handle$handle)
13. └─httr:::perform_callback("request", req = req)
14. └─webmockr:::callback(...)
15. └─webmockr::HttrAdapter$new()$handle_request(req)
16. └─private$request_handler(req)$handle()
17. └─eval(parse(text = req_type_fun))(self$request)
18. └─err$run()
19. └─self$construct_message()
── Error (test-pull_tweet_data.R:49:3): pull_tweet_data() is able to retrieve additional metadata starting with tweet URLs ──
Error:
================================================================================
An HTTP request has been made that vcr does not know how to handle:
GET https://api.twitter.com/1.1/statuses/lookup.json...............
vcr is currently using the following cassette:
- ../fixtures/metadata_from_urls.yml
- record_mode: once
- match_requests_on: method, uri
Set `VCR_VERBOSE_ERRORS=TRUE` for more verbose errors
If you're not sure what to do, open an issue https://github.com/ropensci/vcr/issues
& see https://books.ropensci.org/http-testing
================================================================================
Backtrace:
█
1. ├─vcr::use_cassette(...) test-pull_tweet_data.R:49:2
2. │ └─cassette$call_block(...)
3. └─tidytags::pull_tweet_data(...) test-pull_tweet_data.R:50:4
4. ├─base::ifelse(...)
5. └─rtweet::lookup_statuses(get_char_tweet_ids(url_vector[1:n], url_vector = url_vector[1:n]))
6. ├─base::do.call("lookup_statuses_", args)
7. └─rtweet:::lookup_statuses_(...)
8. └─rtweet:::.status_lookup(statuses[from:to], token = token)
9. └─rtweet:::TWIT(get = get, url, token)
10. └─httr::GET(url, ...)
11. └─httr:::request_perform(req, hu$handle$handle)
12. └─httr:::perform_callback("request", req = req)
13. └─webmockr:::callback(...)
14. └─webmockr::HttrAdapter$new()$handle_request(req)
15. └─private$request_handler(req)$handle()
16. └─eval(parse(text = req_type_fun))(self$request)
17. └─err$run()
18. └─self$construct_message()
[ FAIL 7 | WARN 0 | SKIP 2 | PASS 62 ]
Error: Test failures
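For anyone reproducing these failures, the error messages’ own suggestion can be applied before re-running the tests; a minimal sketch, run from the package root:
# Ask vcr for more verbose errors, then re-run the test suite
Sys.setenv(VCR_VERBOSE_ERRORS = "TRUE")
devtools::test()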
check tidytags
for goodpractice:
{goodpractice} detected long code lines (above 80 characters) in your functions (.R files) and in the test files listed below.
Test files with long code lines
tidytags/tests/testthat/test-geocode_tags.R
tidytags/tests/testthat/test-get_char_tweet_ids.R
tidytags/tests/testthat/test-get_url_domain.R
tidytags/tests/testthat/test-lookup_many_tweets.R
tidytags/tests/testthat/test-pull_tweet_data.R
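For reference, these checks can be reproduced locally with {goodpractice} itself; a minimal sketch, run from the package root:
# Run the full set of goodpractice checks on the package source
library(goodpractice)
gp(".")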
Check package metadata files
README
-
Instead of “Simple Collection and Powerful Analysis of Twitter Data”, I would write “Simple Collection and Powerful Analysis of Twitter Data collected with TAGS”.
-
In Overview, I would add a sentence to explain what TAGS is. For instance, I would write: “{tidytags} retrieves tweet data collected by a Twitter Archiving Google Sheet (TAGS), gets additional metadata from Twitter via the {rtweet} package and from OpenCage using the {opencage} package, and provides additional functions to facilitate systematic yet flexible analyses of data from Twitter. TAGS is based on Google spreadsheets. A TAGS tracker continuously collects tweets from Twitter, based on predefined search criteria and collection frequency.” And I would add a link to the vignettes directly there.
-
In the Setup section, in addition to linking to the Getting started vignette, I would add a checklist of what should be set up in the end, like this (see also the verification sketch at the end of these README comments):
“To use tidytags at its full capacity, you should have the following things set up:
- Google account
- Twitter account
- TAGS configuration (copy TAGS to your Google account, set access to Twitter, publish the TAGS tracker to the web, share the TAGS spreadsheet with anyone with the link), or use the TAGS tracker already in place
- Google API key: set the environment variable GOOGLE_API_KEY in .Renviron
- (optional) rtweet API key: point TWITTER_PAT in .Renviron to a .rtweet_token.rds file
- (optional) OpenCage account, with OPENCAGE_KEY set in .Renviron”
-
In Getting help, there is a typo: “You may also wish too try some general troubleshooting strategies:”
-
In Considerations related to ethics…, one word is missing: “In short, please remember that most (if not all) of the you collect may be about people”. There is also a duplicated sentence: “{tidytags} should be used in strict accordance with Twitter’s developer terms.”
-
In addition to that, the rOpenSci development guide’s section about the README suggests adding a brief demonstration of usage directly in the README, which is missing here. It also encourages adding a paragraph about how to cite the package, which is also missing. I tried citation(package = "tidytags"); it gives a warning because there is no Date field in DESCRIPTION. Finally, the rOpenSci development guide suggests organizing the badges on the README in a table when you have many badges, which, in my opinion, is the case here.
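To make the setup checklist above actionable, the README could pair it with a small verification snippet along these lines (a sketch; the variable names are the ones from the checklist):
# Verify that the keys from the setup checklist are available in .Renviron
Sys.getenv("GOOGLE_API_KEY") != ""   # required; should be TRUE
Sys.getenv("TWITTER_PAT")            # optional; path to a .rtweet_token.rds file
Sys.getenv("OPENCAGE_KEY") != ""     # optional; needed for geocoding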
Contributing
- Typo in Non-technical contributions to {tidytags}: “Both question askers and question answerers are welcome contibrutors!”
- This is a great sentence that I would copy to the README (and maybe the Getting started vignette): “To test the {tidytags} package, you can use an openly shared TAGS tracker that has been collecting tweets associated with the AECT 2019 since September 30, 2019. This is the same TAGS tracker used in the Using tidytags with a conference hashtag vignette.”
- Rather than “We don’t want you to spend a bunch of time on something that we don’t think is a good idea.”, I would write “We don’t want you to spend a bunch of time on something that we don’t think is a real problem or an appropriate solution.”
DESCRIPTION
I think that you could remove: gargle, covr, roxygen2, tidyverse, usethis, webmockr.
Check documentation
test tidytags
function help files:
- Documentation of create_edgelist, get_quotes, and get_replies: Typo in “See Also Compare to other tidtags functions such as get_replies(), get_retweets(), get_quotes(), and get_mentions().”
- Documentation of get_mentions: same name as a function from {rtweet}. The rOpenSci documentation guide says: “If there is potential overlap or confusion with other packages providing similar functionality or having a similar name, add a note in the README, main vignette and potentially the Description field of DESCRIPTION. Example in rtweet README, rebird README.” If possible, I would even change the name of these functions (for instance, tt_get_mentions). Typo in “See Also Compare to other tidtags functions such as get_replies(), get_retweets(), get_quotes(), and create_edgelist().”
- Documentation of get_quotes and get_replies: the example returns an empty tibble (0 rows).
- Documentation of get_retweets: as with get_mentions, the function has the same name as a function from {rtweet}.
- Documentation of get_upstream_replies: this function does more than just add replies; it also computes new variables (“word_count”, “character_count”...), because get_upstream_replies() calls process_tweets(). This is not clearly stated in the documentation.
- Documentation of lookup_many_tweets: Missing example.
- Documentation of pull_tweet_data: I would add an intermediate assignment to avoid repeating code across the examples, like this:
example_url <- "18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8"  # openly shared example TAGS tracker
tags_content <- read_tags(example_url)  # fetch the TAGS sheet once
pull_tweet_data(tags_content[1:10, ])   # then reuse it in each example
And I would add some comments to explain the different examples. I don’t understand the definition of id_vector and n, nor why pull_tweet_data(tags_content[1:10, ]) returns only 7 rows, although there are 10 different tweet IDs in id_str according to unique(tags_content[1:10, ]$id_str).
As id_vector is passed to the statuses parameter of rtweet::lookup_statuses(), it would maybe be better to inherit the parameter. At the very least, I would use the same vocabulary, and notably talk about “statuses” (a Twitter status is a tweet, a retweet, a quote, or a reply).
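To illustrate the inheritance idea, a minimal roxygen sketch (the signature of pull_tweet_data() is inferred from the examples above; renaming id_vector to statuses is needed because @inheritParams only copies documentation for identically named parameters):
# In R/pull_tweet_data.R:
#' @inheritParams rtweet::lookup_statuses
pull_tweet_data <- function(df = NULL, url_vector = NULL, statuses = NULL, n = NULL) {
  # body as before, with id_vector renamed to statuses throughout
}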
test tidytags
vignettes:
For both vignettes, I would put more of the information in bold, because there is quite a lot of text.
Comments on the Vignette Getting started with tidytags
- Pain Point 2 (“Getting and storing a Google API key”) is missing from the list in the intro paragraph. I suggest adding the same checklist as in the README.
- In Pain Point 1, rather than “A core functionality of {tidytags} is collecting tweets continuously with a Twitter Archiving Google Sheet (TAGS).”, I would write “A core functionality of {tidytags} is to retrieve tweet data from a Twitter Archiving Google Sheet (TAGS). A TAGS tracker continuously collects tweets from Twitter, based on predefined search criteria and collection frequency.”
- In Pain Point 1, rather than “Here we offer a brief overview, but be sure to read through the information on…”, I would write “Here we offer a brief overview on how to set up TAGS, but be sure to read through the information on…”.
- Missing info in Pain Point 2: I lost some time trying to do the following steps:
1. Enable the Google Sheets API in the Google Developers Console.
2. Create an API key by clicking the CREATE CREDENTIALS button on the API credentials page. Name this key with a clearly identifiable title, such as “API key for tidytags.”
The link to the Google Developers Console redirected me to a page where I could not find how to enable the Google Sheets API. I ended up searching for the Google Spreadsheet API in the Library and enabling it from there. Then I lost some additional time finding out how to create credentials.
For each step, I would add an example with tidytags functions to test that the setup is correct (test API keys and test access to TAGS).
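Such a check might look like the following sketch, which reuses the openly shared AECT 2019 TAGS tracker mentioned in Contributing.md:
# If the Google API key is set up correctly, read_tags() returns a tibble
library(tidytags)
example_url <- "18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8"
tags_content <- read_tags(example_url)
head(tags_content)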
Comments on the vignette Using tidytags with a conference hashtag
- Please add the link to the online version of Fiesler & Proferes, 2018, i.e., https://journals.sagepub.com/doi/full/10.1177/2056305118763366
- In “In sum, although a TAGS tracker is great for easily collecting tweets over time (breadth), it lacks depth in terms of metadata is returned related to the gathered tweets”, I would remove “is returned” and write “In sum, although a TAGS tracker is great for easily collecting tweets over time (breadth), it lacks depth in terms of metadata related to the gathered tweets”.
- Maybe, since the vignette does not seem to be compiled on real data, say that the analysis was run on a certain date. Since you add a snapshot of the network of users, you could also add a snapshot of a map.
- I am missing a sketch of the analysis workflow, and an explanation of the function categories (read_tags(), pull_tweet_data(), and process_tweets() gather data and create new variables); see the sketch below.
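The workflow sketch I have in mind could be as simple as this (the function names are the package’s own; how the outputs chain together is my assumption):
library(tidytags)
example_url <- "18clYlQeJOc6W5QRuSlJ6_v3snqKJImFhU42bRkM_OX8"

tags_content <- read_tags(example_url)   # gather data from the TAGS tracker
tweets <- pull_tweet_data(tags_content)  # re-query Twitter for fuller metadata
tweets <- process_tweets(tweets)         # compute new variables (word_count, ...)

edgelist <- create_edgelist(tweets)      # reshape for social network analysis
edgelist <- add_users_data(edgelist)     # enrich senders/receivers with user data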
Inspect code:
- I would remove all .DS_Store files and apply usethis::git_vaccinate().
- There is code duplication in get_replies(), get_quotes(), get_retweets(), and get_mentions(). I would create an internal get_content() function that takes as input the parameters df and type, type being “reply”, “quote”, “retweet”, or “mention” (see the first sketch after this list).
- For the add_users_data() function, I would add a lookup_many_users() function, similar to lookup_many_tweets(), and add some warnings in add_users_data() about the limit of 90,000 users, similar to what is in pull_tweet_data() (see the second sketch after this list).
- In my opinion, functions are missing some checks on input data types. For instance: does df contain at least one row, and does it contain the column names that are used later in the function? Does GOOGLE_API_KEY or OPENCAGE_KEY exist? In get_url_domain(), is x a character string? Is the edgelist really a dataframe with two columns named receiver and sender? (See the third sketch after this list.)
- The rOpenSci development guide says to “Add #' @noRd to internal functions”. I would add it to the only internal function I found: tidytags:::length_with_na().
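A sketch of the internal get_content() helper I have in mind (the column names follow {rtweet}’s output; the exact filters and the .data pronoun import are assumptions):
# Internal helper shared by the four exported functions
# (assumes the rlang .data pronoun is imported in the package)
get_content <- function(df, type = c("reply", "quote", "retweet", "mention")) {
  type <- match.arg(type)
  switch(type,
    reply   = dplyr::filter(df, !is.na(.data$reply_to_status_id)),
    quote   = dplyr::filter(df, .data$is_quote),
    retweet = dplyr::filter(df, .data$is_retweet),
    mention = dplyr::filter(df, !is.na(.data$mentions_screen_name))
  )
}

# The exported functions then become thin wrappers, for example:
get_replies <- function(df) get_content(df, type = "reply")
get_quotes  <- function(df) get_content(df, type = "quote")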
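And a sketch of lookup_many_users(), batching around the 90,000-user limit the same way lookup_many_tweets() batches tweets (the pause between batches is my assumption):
lookup_many_users <- function(users) {
  # Split the user vector into batches below the API limit
  batches <- split(users, ceiling(seq_along(users) / 90000))
  purrr::map_dfr(batches, function(batch) {
    result <- rtweet::lookup_users(batch)
    if (length(batches) > 1) Sys.sleep(15 * 60)  # wait out the rate-limit window
    result
  })
}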
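Finally, the kind of input checks I mean, sketched for the edgelist case (the error messages are mine):
check_edgelist <- function(edgelist) {
  if (!is.data.frame(edgelist) || nrow(edgelist) == 0) {
    stop("`edgelist` must be a data frame with at least one row.", call. = FALSE)
  }
  if (!all(c("sender", "receiver") %in% names(edgelist))) {
    stop("`edgelist` must contain `sender` and `receiver` columns.", call. = FALSE)
  }
  invisible(edgelist)
}

# Similarly, fail early and informatively when a required key is missing:
if (Sys.getenv("GOOGLE_API_KEY") == "") {
  stop("Please set GOOGLE_API_KEY in your .Renviron.", call. = FALSE)
}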
Thank you, @llrs and @marionlouveaux, for your careful and thorough reviews of tidytags. @jrosen48 and I will start working through your comments and suggestions. Bear with us, it seems like there's a good bit to tackle. Thank you though—we know this is going to make the package better.
Thank you @marionlouveaux for your in-depth review!
Great, thanks for the update!
Hi @maelle! We talked through all the comments, and we're aiming to have our revisions done by the end of next week (June 11).
For info I've applied a holding label at the authors' request. 🙂
Great to read, thank you for the update!