kili-technology/kili-blogger-program

[Article Proposal] Creating a dataset of tweets

fernandojvdasilva opened this issue · 42 comments

My resource

Twitter is a great resource for retrieving large amounts of text to build datasets for many applications, such as sentiment analysis. In this article, we present the main APIs with Python code examples, as well as their restrictions and access policies.

Topic:

  • Search API (and example)
  • Streaming API (and example)
  • Account Activity API
  • Standard API restrictions

Outline:

My content is

  • A Kili Tutorial / Guide / How to article
  • [ x ] An Article

Hi @fernandojvdasilva
I think that this is a great topic and that it could help several people!
Can you provide us with a more detailed outline please ?

Hi @theodu!

Sure! Here are more details. Actually, I divided the previous "Standard API restrictions" section into the other sections and I added a section to talk about implementation issues related to the dataset collection.

  • Search API (and an example): If we want to create a dataset of tweets related to certain terms, we can use this API to search for them. In this section, we will describe how the Search API can be used to retrieve tweets that mention certain terms, along with the API's limitations and some tips to overcome them when collecting a dataset. We will also discuss other data that can be retrieved with this API, such as users, friends, followers, and trending topics.

  • Streaming API (and an example): If we only want to collect tweets posted from now on, we can use the streaming API to capture them. In this section, we will discuss how to monitor these tweets and capture them.

  • Account Activity API: We will briefly present this API and the possibilities it offers to retrieve information such as follows, unfollows, deleted tweets, etc.

  • Collecting Dataset: In this section, we will discuss implementation issues when creating a dataset collection script. We will talk about creating a cron job for scheduling a script using the Search API and a background process to run the dataset collection script using the Streaming API. In the end, we will compare the computational resources needed in each case.
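
As a rough illustration of the "Collecting Dataset" point, here is a minimal sketch of the storage side of a collection script that a cron job could run repeatedly. It assumes each tweet is a dict with an "id" key and that the dataset is kept as a JSON Lines file; the function names and the file layout are hypothetical choices made for this example, not taken from the article. Fetching the tweets from the Search API is deliberately left out.

```python
import json
from pathlib import Path


def load_seen_ids(path):
    """Return the set of tweet ids already stored in the JSON Lines file."""
    if not path.exists():
        return set()
    with path.open(encoding="utf-8") as fh:
        return {json.loads(line)["id"] for line in fh if line.strip()}


def append_new_tweets(path, tweets):
    """Append only unseen tweets to the file and return how many were added.

    Repeated runs (for example from cron) may return overlapping search
    results, so tweets whose id is already on disk are skipped.
    """
    seen = load_seen_ids(path)
    added = 0
    with path.open("a", encoding="utf-8") as fh:
        for tweet in tweets:
            if tweet["id"] in seen:
                continue
            fh.write(json.dumps(tweet, ensure_ascii=False) + "\n")
            seen.add(tweet["id"])
            added += 1
    return added
```

Each scheduled run would pass its latest search results to append_new_tweets, so overlapping results between runs do not create duplicate rows in the dataset.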

Hi @fernandojvdasilva
I really like this idea. It could be of great help for people doing NLP.
We currently have a lot of articles in the pipeline. I'll put this one on hold and come back to you when we are ready to launch the writing of this article!

Hi @fernandojvdasilva!
I'm coming back to you: the outline that you wrote is very interesting, and it could help a lot of people create a Twitter dataset.
Are you still up to writing such content?

Hi @theodu !

Sure, I can still write this article! Should I proceed to write the article straight away, or is there any other previous step?

Very nice !
You can begin to write it straight away!
Just make sure to read the writing guidelines in the FAQ before and follow them. You will find a template for your article in there too.

That's great @theodu! I'll take a look at the guidelines for sure and I'll start writing!

Hi @fernandojvdasilva, checking in! How is the writing going?

Hi @fernandojvdasilva no worries! Mid April is fine. Thanks :)

Hi @fernandojvdasilva ! Hope you're doing well. Have you been able to work on this article so far?

Hi @fernandojvdasilva, no worries! May 31st works :)

Hi @fernandojvdasilva! How is the writing going?
Happy to get some news

Hi @JustBrn! Yes, I've been working on the article and it's on track now!

Hi @fernandojvdasilva ! Hope you're doing good. How is the writing going? Let me know if you need anything.

Hi @JustBrn ! I'm so sorry for missing the deadline again, but I'm almost finished with the article; I just need to write the last section and review it. May we reschedule it for June 12, please?

Hi @fernandojvdasilva of course! Let us know when you are done and we can review it.

Hi @JustBrn ! Just finished the article for review and shared the Google Drive's link above.

Hi @fernandojvdasilva
I just reviewed your article. I found it very cool. Concise and easy to reproduce.
I validate it! @JustBrn
Our team will take care of putting the article on the website, and we will let you know as soon as it is published.
You will then be able to send the invoice once you have the published article's URL.
Thanks for your work!

Best wishes
Théo

Hi @fernandojvdasilva

Thanks for this cool article!

Can you please send us some examples of tweets you've collected with your code?
We'd need it for illustration so that we can move on to the publication.

Best,
Marianne

Hi @mtakili ! Here are some sample tweets:

Tweets collected with the Stream API:

"Covid-19: NZ to stay in orange traffic light setting https://t.co/HTmYxmRxu9"
"RT @MissMia1988: Big thanks to all the people who took the Covid vaccine without a second thought. Now the FDA is deciding we no longer nee…"
"Our new data shows that 70% of 10-year-olds in developing countries cannot read a basic text, worsened due to COVID-re…"

Tweets collected with Search API:
"Anthony Fauci says that he's experienced a rebound in Covid symptoms after taking a course of Pfizer Inc.'s antiviral…"
"COVID isn’t over and no amount of mass gaslighting will change that"
"Even those who think they’ve experienced ‘Covid lite’ are in denial. So many I talk to say ‘It was…"

Thank you @fernandojvdasilva :) I'll come back to you once the article is published.

Have a nice one!

@theodu, here are the tweets. :)

Hey, hello @fernandojvdasilva. :)

Hope you've been doing good.

We're moving on to the publication process but got stuck for the visuals because we don't have enough tweets. Could you collect more of them with the Stream API and send them to us, please?

Best,
Marianne

Hello @fernandojvdasilva,

Thanks for these new tweets! Unfortunately, we need them to be complete. Can you send them without the ellipsis, please?

Best,
Marianne
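
For readers wondering where the ellipsis comes from: in Twitter API v1.1 payloads, long tweets can arrive with a shortened "text" field, with the complete text stored elsewhere in the same object. The sketch below is one possible way to recover it, assuming v1.1-style tweet dicts ("extended_tweet" on streamed tweets, "full_text" when the REST API is queried in extended mode, "retweeted_status" on retweets). The helper name full_text is a hypothetical choice for this example and is not necessarily how the article's scraper does it.

```python
def full_text(tweet):
    """Return the untruncated text of a v1.1-style tweet payload (a dict)."""
    # A retweet's own text is "RT @user: ..." and is often cut short; the
    # complete text lives in the nested original tweet. Note that returning
    # it drops the "RT @user:" prefix.
    original = tweet.get("retweeted_status")
    if original is not None:
        return full_text(original)

    # Streamed tweets that were truncated carry the whole text here.
    extended = tweet.get("extended_tweet")
    if extended and "full_text" in extended:
        return extended["full_text"]

    # REST results requested in extended mode use "full_text";
    # otherwise fall back to the plain "text" field.
    return tweet.get("full_text") or tweet.get("text", "")
```

Applied to every collected tweet before saving, a helper like this would yield complete sentences rather than text ending in "…".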

Hey @fernandojvdasilva!

Status update: we'll publish your article Friday, next week. :)

I'll send you the link once it's live.

Thanks for your flexibility, cooperation, and work!

Cheers
Marianne

Hi @fernandojvdasilva!

Can you fill in the thumbnail section of your article? I need at least a picture of you that you like so I can use it for this article's promotion.

Talk soon,
Marianne

Hi @fernandojvdasilva,

Thanks for these updates.

We'll have to review the additional content you've created before posting it. As many of our teammates are off, it's likely to take a week or more. Sorry for this!

I'll let you know when we've determined a publishing date.

Best,
Marianne

Hi @fernandojvdasilva
Sorry for the delay in answering and reviewing; part of the team is currently on holiday.
Thank you for taking the initiative to add this feature to show the whole tweet with your scraper! It will indeed help people use the code more easily!

Also, in the V2 version of the doc here: https://docs.google.com/document/d/1nL3zfPAvv1lgq1ZYfGUdDPlkcgGssRur/edit, we took the liberty of adding a new part, "Importing your dataset in Kili", to show how to import a tweet into our platform with a nice format. We have a feature to customize the format in which text assets are imported into the platform, and I tried to reproduce the look of a tweet. As this feature can be quite advanced to get to grips with, I spared you the effort and wrote the code myself. Could you have a look at it and tell us whether you agree to add it at the end of your article?

Best
Théo

Hello @fernandojvdasilva!

Thanks again for this nice addition and your flexibility! We're very happy to announce to you it has been published!

Here's the link: https://kili-technology.com/blog/creating-a-dataset-of-tweets. You can now send your invoice as mentioned in the guidelines.

Also, can you send me your @ on Twitter and LinkedIn, so we're sure to mention you correctly when sharing this on social? Additionally, according to the guidelines as well, please make sure to provide us with a photograph of you that you like, so we can include it in the visuals to be used.

Thanks for your help and congrats again. ;)

Hello @fernandojvdasilva!

Can you also email me (marianne.tavel@kili-technology.com) your bill? We've been experiencing some bugs and I'd like to make sure you get paid asap.

Best,
Marianne