Support for full-archive / academic research track endpoints
Hello! This library has been great to use; it's been incredibly useful for the research work I've been doing.
Currently, if my understanding is correct, this library only supports the "standard" search endpoint, search/tweets.json.
After some digging, I found that this endpoint won't provide the amount of data my research needs, so I would need the full-archive search endpoint, /2/tweets/search/all, which seems to be offered only in v2 of the Twitter API (and, interestingly, only for the Academic Research product track).
I'm not sure whether this project supports it. If it doesn't yet, I assume a fair amount of work would be needed to make the library compatible?
Since most of my research work depends on this Scala library, I'd be happy to contribute if need be :)
Hi @TheConner,
I am glad this library has been useful!
Unfortunately, we do not support any v2 endpoints yet (simply due to lack of time), but this shouldn't be too hard to do if you want to give it a try! PRs are always welcome (and I'm happy to advise and support you in the process).
The authentication seems to be the same, although you apparently need a token approved for Academic Research (which I do not have, but I assume you do), so that is something you shouldn't need to worry about.
My suggestion is to create a new trait for the Rest client (similar to any of the ones listed here) in which you define the shape of your endpoint and the params to pass to it.
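As a rough illustration of that suggestion, here is a self-contained sketch of a trait that describes the shape of the full-archive endpoint and the parameters passed to it. It deliberately avoids twitter4s types: FullArchiveSearchParams, TwitterSearchAllEndpoint, and searchAllUrl are hypothetical names, and the query parameter names (query, start_time, end_time, max_results, next_token) and the 10 to 500 range for max_results reflect my understanding of the v2 /2/tweets/search/all documentation, so they should be checked against it.

```scala
import java.net.URLEncoder

// Hypothetical parameter model for GET /2/tweets/search/all (not the twitter4s API).
// Parameter names follow the Twitter API v2 docs as I understand them; verify before use.
final case class FullArchiveSearchParams(
    query: String,
    startTime: Option[String] = None, // ISO 8601, e.g. "2021-01-01T00:00:00Z"
    endTime: Option[String] = None,
    maxResults: Int = 10,
    nextToken: Option[String] = None // pagination cursor taken from a previous response
) {
  require(query.nonEmpty, "query must not be empty")
  require(maxResults >= 10 && maxResults <= 500, "max_results must be between 10 and 500")

  // Only parameters that are actually set end up in the query string, in a fixed order.
  def toQueryParams: Seq[(String, String)] =
    Seq(
      Some("query" -> query),
      startTime.map("start_time" -> _),
      endTime.map("end_time" -> _),
      Some("max_results" -> maxResults.toString),
      nextToken.map("next_token" -> _)
    ).flatten
}

// Hypothetical trait describing the endpoint's shape, in the spirit of the existing REST traits.
trait TwitterSearchAllEndpoint {
  protected val apiV2BaseUrl: String = "https://api.twitter.com/2"

  // Builds the request URL, URL-encoding each parameter value.
  def searchAllUrl(params: FullArchiveSearchParams): String = {
    val encoded = params.toQueryParams.map { case (k, v) =>
      s"$k=${URLEncoder.encode(v, "UTF-8")}"
    }
    s"$apiV2BaseUrl/tweets/search/all?${encoded.mkString("&")}"
  }
}
```

For example, an object extending TwitterSearchAllEndpoint called with FullArchiveSearchParams("from:TwitterDev", maxResults = 100) yields https://api.twitter.com/2/tweets/search/all?query=from%3ATwitterDev&max_results=100. Validating max_results in the case class means a bad value fails before any request is sent.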
Cheers,
Daniela
Hi @DanielaSfregola
Thanks for the helpful response! I've found some time to get started on this; luckily, I only need to implement a few endpoints. I've made some progress, but I have two questions:
- It looks like, to make this work, I need to add bearer token support. From what I read in http.clients.authentication, there does not seem to be any, even though bearer auth only requires a single header (for curl users: -H "Authorization: Bearer $BEARER_TOKEN"). I can add this and do the plumbing needed to make it work; however, everything here seems to be built around OAuth, so I'm unsure how to integrate it with my changes. Any advice would be appreciated!
- Once I've finished the plumbing work for the first question, I can test the endpoint I've implemented. Should I just add new tests for all of my changes? Any tests I write would depend on a bearer token that my university gave me, and I don't think they would be too happy if I shared it (I notice some hard-coded keys in the unit tests). With this in mind, how should I test this and integrate those tests with the rest of your codebase?
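For reference, bearer-token auth really is just that one header. A minimal sketch, assuming the token is supplied through an environment variable (TWITTER_BEARER_TOKEN is a hypothetical name) so it never has to be hard-coded in source or tests; BearerToken and its methods are illustrative, not existing twitter4s code.

```scala
// Hypothetical wrapper around an app-only bearer token (not part of twitter4s).
final case class BearerToken(value: String) {
  require(value.trim.nonEmpty, "bearer token must not be blank")

  // Equivalent to curl's: -H "Authorization: Bearer $BEARER_TOKEN"
  def authorizationHeader: (String, String) = "Authorization" -> s"Bearer $value"

  // Keep the secret out of logs and exception messages.
  override def toString: String = "BearerToken(****)"
}

object BearerToken {
  // Reads the token from the environment; None if the variable is unset or blank.
  def fromEnv(
      variable: String = "TWITTER_BEARER_TOKEN",
      env: Map[String, String] = sys.env
  ): Option[BearerToken] =
    env.get(variable).map(_.trim).filter(_.nonEmpty).map(BearerToken(_))
}
```

Passing env as a parameter (defaulting to sys.env) keeps the lookup testable with a plain Map instead of a real secret.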
Thank you for the guidance! I'm new to the Scala way of doing things, so I may be missing out on some things that are just common sense to more seasoned Scala developers.
Regards,
Conner
Edit: I think I found some answers to my bearer authentication issues in issue #237 - going to investigate the changes there :)
- Issue #237 is definitely a good starting point. You could even create a new client that expects the user to provide the bearer token at initialization.
- Tests are required, but as you said, we cannot put real tokens in them. In the current library, we completely stub (simulate the behaviour of) the Twitter API by parsing JSON files; there are plenty of examples of how to do that. We can discuss this once you have a working-ish implementation.
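The stubbing idea can be sketched independently of the library's actual test helpers: put the HTTP call behind a small transport abstraction, and in tests swap in a stub that records each request and returns a canned JSON fixture, so no real token or network access is needed. HttpTransport, StubTransport, and this BearerTokenClient are hypothetical names, and the fixture is made up; the real test suite loads JSON files from test resources rather than inlining strings.

```scala
// Hypothetical abstraction over "send a GET with these headers, return the body".
trait HttpTransport {
  def get(url: String, headers: Map[String, String]): String
}

// Hypothetical client that receives its bearer token at initialization.
final class BearerTokenClient(token: String, transport: HttpTransport) {
  require(token.nonEmpty, "bearer token must not be empty")

  def get(url: String): String =
    transport.get(url, Map("Authorization" -> s"Bearer $token"))
}

// Test double: records every request and replies with a canned fixture.
final class StubTransport(fixture: String) extends HttpTransport {
  var requests: List[(String, Map[String, String])] = Nil

  def get(url: String, headers: Map[String, String]): String = {
    requests = requests :+ (url -> headers)
    fixture
  }
}
```

In a test, build new BearerTokenClient("fake-token", new StubTransport(fixtureJson)), then assert on both the returned body and the recorded Authorization header.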
Thanks for helping with the library! Do not worry if you are new to Scala, I will help you with that ;)
Also, you do not need to have a PR ready to ask for help - if you get stuck just ping me a branch and I can advise/help :)
I've added a bearer token client in my fork and am moving on to integrating it with full-archive search. I looked through the tests and couldn't find any for http.clients.OAuthClient or http.clients.Client, so I'm unsure how to test this new BearerTokenClient. Let me know if there are any changes I should make so it's up to snuff before I open a PR.
A question: I think this new BearerTokenClient could be used in RestClient; however, the main REST client seems to be based on Client, which is an OAuthClient. So I'm thinking I could decouple Client from OAuthClient, so that we can pass in a generic auth provider (e.g., OAuth, bearer token, etc.). That would allow RestClient to pick a provider dynamically, possibly derived from configuration. This will require a bit of plumbing, though, and there's some magic going on in there that I'm not familiar with, so if you have any ideas on how to do this, I'm all ears!
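One way to picture that decoupling, as a sketch only: the client depends on a generic AuthProvider that decorates each outgoing request, and OAuth 1.0a or bearer-token behaviour is chosen by passing a different provider. None of these types exist in twitter4s; Request, AuthProvider, BearerTokenProvider, OAuth1AuthProvider, and GenericClient are hypothetical, and the OAuth 1.0a provider only delegates to a supplied signing function because the real signing logic (what OAuth1Provider does today) is out of scope here.

```scala
// Hypothetical minimal request model, standing in for an akka-http HttpRequest.
final case class Request(method: String, url: String, headers: Map[String, String] = Map.empty) {
  def withHeader(name: String, value: String): Request = copy(headers = headers + (name -> value))
}

// Hypothetical auth abstraction: each strategy knows how to authorize a request.
trait AuthProvider {
  def authorize(request: Request): Request
}

final class BearerTokenProvider(token: String) extends AuthProvider {
  def authorize(request: Request): Request =
    request.withHeader("Authorization", s"Bearer $token")
}

// Placeholder for OAuth 1.0a: delegates to an externally supplied function that
// produces the Authorization header value (the existing OAuth1Provider's job).
final class OAuth1AuthProvider(sign: Request => String) extends AuthProvider {
  def authorize(request: Request): Request =
    request.withHeader("Authorization", sign(request))
}

// The client no longer *is* an OAuthClient; it *has* an AuthProvider.
final class GenericClient(auth: AuthProvider) {
  def prepare(request: Request): Request = auth.authorize(request)
}
```

With composition instead of inheritance, RestClient would only need to know about AuthProvider, and which provider to use becomes a construction-time (or configuration-time) decision.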
As far as I can see, we do not have tests specifically for OAuthClient (but we do have tests for OAuth1Provider!), but I think we could add them.
I think decoupling Client and OAuthClient is the right thing to do. Ideally, people could use:
TwitterRestClient(consumerToken, accessToken) // using current OAuth
TwitterRestClient(bearerToken) // using OAuth for bearer token
TwitterRestClient() // check the env variables and pick an OAuth strategy accordingly -- with a preference for the current OAuth? Or maybe we fail if we have env variables for both? Not sure yet, we can figure it out
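The no-argument option could be backed by a small, pure selection function, which also makes the open question easy to test. This sketch implements one of the two policies mentioned above (failing if both credential sets are present); the environment variable names and the AuthConfig types are assumptions for illustration, not the library's confirmed configuration keys.

```scala
// Hypothetical credential configurations.
sealed trait AuthConfig
final case class OAuth1Config(
    consumerKey: String,
    consumerSecret: String,
    accessKey: String,
    accessSecret: String
) extends AuthConfig
final case class BearerConfig(token: String) extends AuthConfig

object AuthConfig {
  // Assumed variable names; adjust to whatever the library actually reads.
  private val OAuthVars = Seq(
    "TWITTER_CONSUMER_TOKEN_KEY", "TWITTER_CONSUMER_TOKEN_SECRET",
    "TWITTER_ACCESS_TOKEN_KEY", "TWITTER_ACCESS_TOKEN_SECRET")
  private val BearerVar = "TWITTER_BEARER_TOKEN"

  def fromEnv(env: Map[String, String] = sys.env): Either[String, AuthConfig] = {
    // OAuth 1.0a counts as configured only if all four variables are set and non-empty.
    val oauth = OAuthVars.flatMap(k => env.get(k).filter(_.nonEmpty)) match {
      case Seq(consumerKey, consumerSecret, accessKey, accessSecret) =>
        Some(OAuth1Config(consumerKey, consumerSecret, accessKey, accessSecret))
      case _ => None
    }
    val bearer = env.get(BearerVar).filter(_.nonEmpty).map(BearerConfig(_))

    // Policy chosen here: ambiguity is an error rather than silently preferring OAuth.
    (oauth, bearer) match {
      case (Some(_), Some(_)) => Left("Both OAuth 1.0a and bearer-token credentials are set; configure only one")
      case (Some(o), None)    => Right(o)
      case (None, Some(b))    => Right(b)
      case (None, None)       => Left("No Twitter credentials found in the environment")
    }
  }
}
```

Switching to the "prefer the current OAuth" policy would only mean changing the (Some(_), Some(_)) case to return the OAuth config.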
Update: there is a bug in my initial implementation, and figuring out why it happens is a tad cryptic. Oddly enough, all the tests pass, but when I use twitter4s within my application I get:
Exception in thread "main" java.lang.AbstractMethodError: Receiver class com.danielasfregola.twitter4s.http.clients.rest.RestClient does not define or inherit an implementation of the resolved method 'abstract void de$heikoseeberger$akkahttpjson4s$Json4sSupport$_setter_$de$heikoseeberger$akkahttpjson4s$Json4sSupport$$jsonSourceStringMarshaller_$eq(akka.http.scaladsl.marshalling.Marshaller)' of interface de.heikoseeberger.akkahttpjson4s.Json4sSupport.
at de.heikoseeberger.akkahttpjson4s.Json4sSupport.$init$(Json4sSupport.scala:96)
at com.danielasfregola.twitter4s.http.clients.rest.RestClient.<init>(RestClient.scala:17)
at com.danielasfregola.twitter4s.TwitterRestClient.<init>(TwitterRestClient.scala:41)
at com.danielasfregola.twitter4s.TwitterRestClient$.apply(TwitterRestClient.scala:91)
at com.danielasfregola.twitter4s.TwitterRestClient$.apply(TwitterRestClient.scala:75)
at ca.advtech.ar2t.data.TweetIngest.<init>(TweetIngest.scala:28)
at ca.advtech.ar2t.main$.main(main.scala:69)
at ca.advtech.ar2t.TestRunMain$.main(TestRunMain.scala:5)
at ca.advtech.ar2t.TestRunMain.main(TestRunMain.scala)
Continuing to investigate...
Something to do with JSON support when initializing the RestClient... (I took a quick look at the code, just reading it without checking it out, and didn't see anything obvious.)
It was due to a dependency issue: my application uses a different Scala version (Apache Spark is a few versions behind), so I had to build twitter4s for that Scala version, and some of the dependency changes I made on my end didn't work. Also, building twitter4s as a jar and importing that jar meant I had to include its dependencies manually.
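An AbstractMethodError like the one above is typically a symptom of binary incompatibility, i.e. classes compiled against one version of a library (here akka-http-json4s) running against another. A common way to avoid both that and the manual-dependency problem, sketched as an assumption about a typical sbt setup rather than what was actually done in this thread: cross-build twitter4s for the Scala version your Spark build uses, publish it to the local repository with sbt +publishLocal, and depend on it by coordinates so sbt resolves its transitive dependencies at consistent versions. The Scala patch versions and the version placeholder below are illustrative.

```scala
// --- In the twitter4s fork's build.sbt: cross-build for both Scala versions,
// --- then run `sbt +publishLocal` to publish every cross-built artifact locally.
crossScalaVersions := Seq("2.12.15", "2.13.8")

// --- In the Spark application's build.sbt: match the Scala version Spark uses and
// --- depend on the locally published artifact instead of an unmanaged jar, so its
// --- transitive dependencies are resolved automatically.
scalaVersion := "2.12.15"
libraryDependencies += "com.danielasfregola" %% "twitter4s" % "<locally-published-version>"
```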
Anywho, after that and some other dependency pains, I managed to fix the issues in my implementation, and I'm now successfully using twitter4s with bearer-token auth against the full-archive search endpoint!
PR is made, let me know if there's any changes you would like me to make to it :)