/ChatGPT_domain_rating

Code and data for paper "Large language models can rate news outlet credibility"

Primary LanguagePythonMIT LicenseMIT

Introduction

This repo contains the code and data for the paper Large language models can rate news outlet credibility.

We use the following data in our study:

Data Location Note
Aggregate domain rating list from Lin et al. https://github.com/hauselin/domain-quality-ratings Please download the data in their repo.
MBFC ratings /data/mbfc_ratings.csv We collected the data and share it here.
NewsGuard ratings N/A The data is proprietary, please contact newsguardtech.com to license the data.
ChatGPT ratings /data/chatgpt_ratings.csv.gz We share the responses from ChatGPT here.
Tranco list https://tranco-list.eu Please download the data from their website.

We also share the script we used to query the ChatGPT API at /scripts/query_domain_credibility.py. You will need an OpenAI API key, which can be applied at https://platform.openai.com .

Citation

If you use our data or code in your research, please cite our work as follows:

@article{yang2023large,
  title={Large language models can rate news outlet credibility},
  author={Yang, Kai-Cheng and Menczer, Filippo},
  journal={Preprint arXiv:2304.00228},
  year={2023}
}