Error in `check_token_v2()`: ! A bearer `token` is needed for this endpoint.
After loading the latest devel version (1.1.0.9001), I'm wondering if 'painless' streaming without providing a bearer token is still possible with this package.
One of the greatest features of this package was that, just using a regular account, you could authenticate and stream tweets.
Since you deprecated `stream_tweets()` in favor of `filtered_stream()`, I'm wondering if this is even still possible.
library(rtweet)
auth_setup_default()
#> Using default authentication available.
#> Reading auth from '/Users/abuchmueller/Library/Preferences/org.R-project.R/R/rtweet/default.rds'
sample_stream(parse = F)
#> Error in `check_token_v2()`:
#> ! A bearer `token` is needed for this endpoint.
#> Backtrace:
#> ▆
#> 1. └─rtweet::sample_stream(parse = F)
#> 2. └─rtweet:::endpoint_v2(...)
#> 3. ├─httr2::req_url_path_append(req_v2(token), path)
#> 4. │ └─httr2:::check_request(req)
#> 5. │ └─httr2:::is_request(req)
#> 6. └─rtweet:::req_v2(token)
#> 7. └─rtweet:::check_token_v2(token)
#> 8. └─rlang::abort("A bearer `token` is needed for this endpoint.")
I'm confused. The documentation says: `token` | Expert use only. Use this to override authentication for a single API call. In most cases you are better off changing the default for all calls. See `auth_as()` for details.
This suggests that bearer tokens are optional. Am I missing something here, or are bearer tokens now always required?
Also, a minor nitpick: for as long as parsing is not supported, I would not set the default of the `parse` parameter in `filtered_stream()` to `TRUE`.
Hi Andreas, there are several points I think are worth addressing in your post.
The latest development version of rtweet is in the devel branch, currently at version 1.1.0.9003. This is to make it easier for people to read documentation close to what is on CRAN, and to keep the devel branch "hidden" (it is documented in the contributing file). I mention this in case you want the latest features too, and not only a working `auth_setup_default()` function. Sorry about that!
Yes, it was cool as long as it worked, until last November, when the old API stopped working. I don't have the energy and time to support old APIs when it is clear they won't be supported in the future, so rtweet had to switch to the new API v2, which has different requirements, works differently, and returns different output. See this post where I explain what I am working on and some decisions I'm facing.
The bearer token was the easiest and fastest way to implement support for the stream endpoints of API v2 with existing code. I am still struggling to authenticate a user via an app with OAuth 2.0 as implemented by Twitter, so that users can use their own account with the API. That feature will come, but according to this table, Twitter only allows the bearer token for this endpoint. It might change in the future or it might not, but you will need to use the bearer token for now.
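For reference, a minimal sketch of setting up bearer-token authentication with rtweet's `rtweet_app()` and `auth_as()`; the token string is a placeholder for the bearer token of your own Twitter developer app:

```r
# Sketch: authenticating the v2 streaming endpoints with a bearer token.
# "YOUR-BEARER-TOKEN" is a placeholder, not a real credential.
library(rtweet)

app_auth <- rtweet_app(bearer_token = "YOUR-BEARER-TOKEN")
auth_as(app_auth)                 # use this auth for subsequent calls
# auth_save(app_auth, "my_app")   # optionally persist it for later sessions

# The v2 streaming endpoints should now find a bearer token:
# sample_stream(timeout = 30, parse = FALSE)
```

The streaming calls themselves are commented out since they open a live connection to the API.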
The documentation you quote is from an internal function; I don't see any mention of a distinction between authentication mechanisms there. In rtweet, all arguments for authorization credentials are named `token`, and are used for both bearer tokens and OAuth 1.0a tokens (the kind provided by `auth_setup_default()`). The expert-usage warning is there because overriding it could lead to surprising interactions, where a user sets up one token but a different one is sent to Twitter for pagination.
The default `parse = TRUE` is to keep consistency with the other endpoints. At the time of release I hadn't figured out how to support parsing the output, which depends on the user's arguments. rtweet should parse and transform the data into a nice data.frame in the future, so it should remain the default.
Keeping `TRUE` as the default will make it easier for me, developers, and users to handle that transition: once I implement it, existing code with `parse = FALSE` will still work, and new code won't need to handle the raw data once `parse = TRUE` works without an error.
Doing it the other way around would go against the consistent behavior of rtweet and make it harder for me to later upgrade the package, as I would need to notify developers and maintainers in advance that I was about to break (again) their code. But if you have arguments against this approach, please let me know.
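In the meantime, a minimal sketch of how a user could turn the unparsed output into usable records themselves, assuming the stream was written as newline-delimited JSON; the file name `stream.json` is hypothetical:

```r
# Sketch: reading a streamed NDJSON file while parse = TRUE is not yet
# supported. "stream.json" is a hypothetical file written by a streaming
# call with parse = FALSE.
library(jsonlite)

lines   <- readLines("stream.json", warn = FALSE)
lines   <- lines[nzchar(lines)]        # drop empty lines
records <- lapply(lines, fromJSON)     # one list per streamed payload
```

Each element of `records` is then a nested list mirroring the v2 JSON payload, which the user can flatten as needed.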
I hope this helps clarify the problems around the token and the streaming endpoints.
I went to see the issues at Twitmo:
- In the devel branch there is a function to make requests to Twitter's archive using the API v2. It might be ready/released next month (if I'm lucky).
- The issue with the open connections in the streaming function (abuchmueller/Twitmo#13) is something I couldn't reproduce. Sometimes the streaming functions take too much time on my computer and I am not sure why. But if you have some feedback, I'll try to fix it in rtweet.
- The problem with the incorrect EOF could be rtweet's fault; I think I had fixed it here. But it might help if you pass `pagesize = 1`, as is done in `parse_stream` (which doesn't work for the new streaming files...).
Hi Lluis,
thanks for taking the time to write such a thorough response and also going through Twitmo's issues.
Since Twitmo heavily depends on rtweet, and I would like to bring it back into a working state, I'm also facing some decisions. So I am very interested in which direction the development of rtweet is taking (and also want to avoid working on things that you might have already sorted out, or will in the near future, like parsing).
That feature will come, but according to this table, Twitter only allows using the bearer token for this endpoint. It might change in the future or it might not. But you will need to use the bearer token for now.
I'm still not sure I get this right. There is currently no way to give users API access on behalf of a Twitter account (like in rtweet v0.7) because Twitter's v2 API doesn't allow this? I guess OAuth 1.0a User Context is what I am looking for? This would be a bummer for me, because it means that every user will definitely need a bearer token from now on. If this is true, I also suspect, given the current state of Twitter with its shifting priorities, that not much will happen in that regard soon, meaning I have little hope that OAuth 1.0a User Context for v2 streaming will come.
The documentation you quote is from an internal function
The documentation I quoted is from `filtered_stream`, which is not internal, is it?
In the devel branch there is a function to make requests to Twitter's archive using the API v2. It might be ready/released next month (If I'm lucky).
This is great news for those with academic research access. Twitmo, however, was conceptualised as a package aimed at less advanced users who don't (yet) know their way around APIs, one that abstracts all of this away. Getting academic access is not trivial. If that cannot be the case in the future, I'll have to reconsider whether it makes sense to continue/pick up development again.
The issue with the open connections in the streaming function abuchmueller/Twitmo#13 is something I couldn't reproduce. Sometimes the streaming functions take too much time in my computer and I am not sure why. But if you have some feedback I'll try to fix it in rtweet.
Don't worry about that; this is more likely related to the `jsonlite` package or my implementation. It's ugly, and I should've just hidden the warning, but it's harmless, I guess.
The problem with the incorrect EOF could be rtweet's fault; I think I had fixed it here. But it might help if you pass `pagesize = 1`, as is done in `parse_stream` (which doesn't work for the new streaming files...).
This was never a big problem, since I've used the regex you had in earlier rtweet versions to throw out bad lines. If I remember correctly, later on bad lines weren't even written to the JSON file and were thrown out during streaming.
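That kind of clean-up can also be done without a regex, by validating each line with jsonlite before keeping it; a sketch, where the file name `stream.json` is hypothetical:

```r
# Sketch: dropping malformed lines (e.g. truncated records left by an
# incorrect EOF) before parsing. "stream.json" is a hypothetical file.
library(jsonlite)

lines  <- readLines("stream.json", warn = FALSE)
ok     <- vapply(lines, function(l) isTRUE(validate(l)), logical(1))
good   <- lines[ok]                # keep only syntactically valid JSON lines
parsed <- lapply(good, fromJSON)
```

`jsonlite::validate()` returns `FALSE` (with an error attribute) for truncated or malformed lines, so they are filtered out before `fromJSON` ever sees them.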
What is currently an issue is that parsing doesn't work, because there seem to be changes in rtweet's `tweets_with_users` function. Did you change this function to be compatible with the v2 format? That would explain the out-of-bounds error I get, because the little example JSON I ship with the package for demo purposes was streamed using v1.1.
library(Twitmo)
raw_path <- system.file("extdata", "tweets_20191027-141233.json", package = "Twitmo")
mytweets <- load_tweets(raw_path)
#> opening file input connection.
#> Found 167 records... Found 193 records... Imported 193 records. Simplifying...
#> closing file input connection.
#> Warning in tb$possibly_sensitive <- list(NA): Coercing LHS to a list
#> Error in x[["user"]]: subscript out of bounds
The default `parse = TRUE` is to keep consistency with the other endpoints. At the time of release I hadn't figured out how to support parsing the output, which depends on the user's arguments. rtweet should parse and transform the data into a nice data.frame in the future, so it should remain the default.
Keeping `TRUE` as the default will make it easier for me, developers, and users to handle that transition: once I implement it, existing code with `parse = FALSE` will still work, and new code won't need to handle the raw data once `parse = TRUE` works without an error.
Doing it the other way around would go against the consistent behavior of rtweet and make it harder for me to later upgrade the package, as I would need to notify developers and maintainers in advance that I was about to break (again) their code.
Given the way CRAN handles downstream dependency management (forcing downstream dependencies to work with the latest upstream package version, with no way to pin to lower versions), code breakage of downstream dependencies is bound to happen all the time with R packages anyway. You can dance around it, but ultimately it's not your fault.
However, getting greeted with an error when trying out a new package/function with only the default arguments throws new users off, and makes them not want to use the package, or forget about it again. Ultimately, it's your decision.
I am very happy to talk with other maintainers depending on rtweet! I hoped that the update to v1.0.0 would spark some conversations: I want to improve how rtweet handles the API, but at the same time not include so much that it makes other packages redundant (I must confess I have spent almost no time learning what they do).
- About the authentication: there are several ways to authenticate, but for the streaming endpoints of API v2 there is no way to give users API access on behalf of a Twitter account. Some endpoints of v2 still allow OAuth 1.0a User Context, but I don't expect it to stick around much longer (in the API v2 documentation it appears in smaller letters, and they added OAuth 2.0 support).
- About the documentation: my bad, I only found it on that internal page and didn't realize it was reused for the new stream endpoints... which it shouldn't have been! I'll change that. I get too used to the internals and forget how it is for normal users.
- About the warning when closing the connection: it might be a problem with `jsonlite`, but as strange things happen with it, I am not even 50% sure.
- About the EOF: I got the same error a couple of times because somehow `httr2::req_stream` appends TRUE at the end of the file, but I thought I had handled this.
- I didn't make big changes in `tweets_with_users`. I added a class and a few cosmetic changes to make it slightly faster (see the diff between versions); neither of these changes triggered any problem in rtweet's tests (or its dependencies), but I might have missed something. This function is not compatible with the new output of API v2, so it can't be used for it.
- About the examples and default code: I'll make the examples friendlier for the moment (adding `parse = FALSE`). I need to give enough notice to downstream package developers when I submit to CRAN, and I don't want to break user code more than needed. But yes, I need to work more on how I communicate with users and keep their experience in mind.
Let me know if you have more feedback, I'm very happy to hear it.
I also try to update the default branch one week before submission to CRAN and announce the imminent release on Twitter, hoping to avoid this kind of problem. Next time I might write a blog post and wait longer.
My general idea of rtweet is to have a lean, consistent interface to the Twitter API with sensible, rectangular data, handling authentication as painlessly as I can. In the future I will probably drop functions that require dependencies not core to the package (ggplot2, igraph, magick, webshot, maps?) or that are more about processing the data, but I might incorporate some functions to create/explore network relationships through the API (i.e., combining different rtweet functions, vectorizing some others, ...).
Well, all of this is moot, see the new announcement: To use the API all users should pay.
Well, all of this is moot, see the new announcement: To use the API all users should pay.
Well damn, that was what I meant with "shifting priorities" at Twitter... EM bought a cash-flow-negative company at an inflated price, so the free-lunch policy was bound to end.
Big blow to the research community, Twitter was a great place for researchers to sample text.
Maybe it's time to look into other social networks for sampling text ...
I'm about to release a version (check the master/NEWS file). I'm closing this question as I've changed the wording of the new endpoints.
Let me know if there is any trouble.