BirdWatch

Crowdsourced Fact-Checking at Twitter: How Does the Crowd Compare With Experts?

(Paper) (Demo) (Slides)

Comparing BirdWatch against expert fact-checkers and computational methods.

BirdWatch is a crowdsourcing-based approach for verifying misleading claims on Twitter. We analyze the initiative through the lens of the three main components of a fact-checking pipeline: claim detection based on check-worthiness, evidence retrieval, and claim verification. For each component, we compare BirdWatch with expert fact-checkers and computational methods. This repo contains the code required to run the experiments of the associated paper.

Installation

0. Clone repo

git clone https://github.com/MhmdSaiid/BirdWatch
cd BirdWatch

1. Create and activate a virtual environment

virtualenv birdwatch -p $(which python3)
source birdwatch/bin/activate

2. Install required packages

pip install -r requirements.txt

3. Add the virtualenv as a Jupyter kernel

pip install --user ipykernel
python -m ipykernel install --user --name=birdwatch

4. Download ClaimReview Data

wget -O data/UpdatedClaimReview.zip https://nextcloud.eurecom.fr/s/8fJkTEQH9QeaaxQ/download/UpdatedClaimReview.zip
unzip  data/UpdatedClaimReview.zip -d data/
rm -rf data/UpdatedClaimReview.zip

5. Download BERTopic Model (Optional)

wget -O BW_BertTopic.zip https://nextcloud.eurecom.fr/s/kqXfGg49iTEnJKR/download/BW_BertTopic.zip
unzip  BW_BertTopic.zip -d model/
rm -rf BW_BertTopic.zip

Dataset

The dataset BW_CR.csv contains the BirdWatch tweets matched with ClaimReview fact-checks. The BirdWatch data was obtained on 18 September 2021 at 05:02 PM (GMT+2), and the tweets were hydrated on 25 September 2021 at 09:26 AM (GMT+2). The columns of the dataset are:

  • tweetId: ID of the Tweet
  • Tweet: Tweet text
  • noteId: ID of BirdWatch note
  • summary: Written note by the BirdWatch user
  • classification: Label given by the BirdWatch user
  • CR Fact: Matched ClaimReview fact-check
  • credibility: Label given by the expert
  • full_text: Full Tweet text, including characters (such as non-Unicode ones) that were removed from the Tweet column
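
As a rough illustration of how these columns fit together, the sketch below reads rows in this layout with Python's standard csv module and counts how often each crowd label (classification) co-occurs with each expert label (credibility). It is only a sketch under stated assumptions: the helper names read_bw_cr and label_pairs are hypothetical and not part of this repo, and the sample rows and label values are made up, not taken from BW_CR.csv.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description above.
EXPECTED_COLUMNS = [
    "tweetId", "Tweet", "noteId", "summary",
    "classification", "CR Fact", "credibility", "full_text",
]


def read_bw_cr(handle):
    """Read BW_CR-style rows from an open text handle and check the header.

    Hypothetical helper: not part of the repository.
    """
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # csv keeps every field as a string, so long tweet/note IDs are not rounded.
    return list(reader)


def label_pairs(rows):
    """Count (crowd label, expert label) pairs across the rows."""
    return Counter((r["classification"], r["credibility"]) for r in rows)


# Tiny in-memory sample with made-up values, standing in for the real file.
sample = io.StringIO(
    "tweetId,Tweet,noteId,summary,classification,CR Fact,credibility,full_text\n"
    "1,t1,10,n1,crowd_label_a,f1,expert_label_x,t1\n"
    "2,t2,20,n2,crowd_label_a,f2,expert_label_x,t2\n"
    "3,t3,30,n3,crowd_label_b,f3,expert_label_x,t3\n"
)
rows = read_bw_cr(sample)
pairs = label_pairs(rows)
print(pairs)
```

To try this on the real data, pass a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever BW_CR.csv sits in your checkout; the location and encoding are assumptions here.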

Notebooks

Analysis

Matching

Claim Selection

Evidence Retrieval

Claim Verification

Contact Us

For any inquiries, feel free to contact us or raise an issue on GitHub.

Reference

You can cite our work:

@inproceedings{saeed-etal-2022-birdwatch,
    title = "Crowdsourced Fact-Checking at Twitter: How Does the Crowd Compare With Experts?",
    author = "Saeed, Mohammed  and
      Traub, Nicolas  and
      Nicola, Maelle  and
      Demartini, Gianluca  and
      Papotti, Paolo",
    booktitle = "31st ACM International Conference on Information and Knowledge Management",
    month = oct,
    year = "2022",
    address = "Online and Atlanta, Georgia, USA",
    url = "https://arxiv.org/abs/2208.09214",
}

License

MIT