
Headline Popularity

Headline Popularity Dataset

The dataset was created as part of our work in "Learning to Generate Popular Headlines" (article's link). Because of Twitter's policy, we are not allowed to share the data, but it can be rebuilt easily through the Twitter API: we used the tweet IDs from the Clickbait Challenge 2017 dataset (link) to crawl information about each tweet. The post_text and target_paragraphs fields come from the Clickbait Challenge dataset, while the remaining fields were obtained from Twitter. The fields are described in the following table:

Field | Description
ID | Unique tweet identification number
post_text | Headline posted on Twitter (i.e., the title)
target_paragraphs | News article text (i.e., the body)
favorite_count | Number of likes
retweet_count | Number of retweets
created_at | Timestamp of the tweet
users_followers_count | Number of followers of the news outlet's Twitter account
user_name | User name of the news outlet's account (i.e., the news outlet)
user_description | Further information about the news outlet
user_url | URL of the news outlet

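A minimal sketch of how one row of the table above could be assembled. It assumes the tweets are fetched with the Twitter API v1.1 `statuses/lookup` endpoint (which accepts up to 100 tweet IDs per request) and that each Clickbait Challenge 2017 instance is a JSON object with `id`, `postText`, and `targetParagraphs` list fields. The helper names `batch_ids` and `build_record`, the choice of the account's `name` (rather than `screen_name`) for user_name, and the way list fields are joined are all hypothetical; this is not the authors' crawling code.

```python
from typing import Dict, Iterator, List


def batch_ids(tweet_ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Split tweet IDs into chunks; v1.1 statuses/lookup takes at most 100 IDs per call."""
    for start in range(0, len(tweet_ids), size):
        yield tweet_ids[start:start + size]


def build_record(instance: Dict, tweet: Dict) -> Dict:
    """Merge one Clickbait Challenge 2017 instance with its v1.1 Tweet object.

    Assumption: `instance` holds `postText` and `targetParagraphs` as lists of
    strings, and `tweet` follows the v1.1 layout (counts at the top level,
    account details nested under `user`). Missing Twitter fields become None.
    """
    user = tweet.get("user", {})
    return {
        "ID": instance["id"],
        # Text fields come from the Clickbait Challenge data.
        "post_text": " ".join(instance["postText"]),
        "target_paragraphs": "\n".join(instance["targetParagraphs"]),
        # Popularity and account fields come from the Twitter API response.
        "favorite_count": tweet.get("favorite_count"),
        "retweet_count": tweet.get("retweet_count"),
        "created_at": tweet.get("created_at"),
        "users_followers_count": user.get("followers_count"),
        "user_name": user.get("name"),
        "user_description": user.get("description"),
        "user_url": user.get("url"),
    }
```

Tweets that were deleted since 2017 are simply absent from the lookup response, so in practice the record would only be built for IDs that come back.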

The models trained for headline generation on the Newsroom dataset are listed below:

Model | Link
Fine-tuned ProphetNet | Hprophetnet-large
Fine-tuned BART | HBART
Fine-tuned T5 | Ht5-small
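If these checkpoints are loaded as Hugging Face `transformers` sequence-to-sequence models (the link texts look like Hub model names, but their exact repository IDs are not given here), several candidate headlines for one article could be produced with beam search through `model.generate`. This is a sketch under that assumption: the function name `generate_candidates`, the default of 10 candidates (mirroring the 10 variations mentioned for Evaluator.ipynb), the length limits, and the optional `prefix` are illustrative choices, not the repository's settings.

```python
from typing import List


def generate_candidates(model, tokenizer, article: str, num_candidates: int = 10,
                        max_input_len: int = 512, max_headline_len: int = 32,
                        prefix: str = "") -> List[str]:
    """Return the top `num_candidates` beam-search headlines for one article.

    `model` and `tokenizer` are assumed to be a Hugging Face seq2seq model
    (e.g. loaded with AutoModelForSeq2SeqLM) and its matching tokenizer.
    """
    # T5-style checkpoints often expect a task prefix; pass it via `prefix` if needed.
    inputs = tokenizer(prefix + article, truncation=True,
                       max_length=max_input_len, return_tensors="pt")
    # num_return_sequences must not exceed num_beams, so both use num_candidates.
    outputs = model.generate(**inputs, num_beams=num_candidates,
                             num_return_sequences=num_candidates,
                             max_length=max_headline_len, early_stopping=True)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Usage would pair it with `AutoTokenizer.from_pretrained(name)` and `AutoModelForSeq2SeqLM.from_pretrained(name)`, where `name` is whatever Hub ID the corresponding link points to.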


Code | Description
Tuning_headline_Popularity_Model.ipynb | Trains a transformer encoder model for the headline popularity prediction task.
Evaluator.ipynb | Generates 10 headline variations for each news article, selects the most popular ones, and then computes the evaluation metrics (i.e., ROUGE, BLEU, and METEOR).
Training_generator_models.ipynb | Trains a transformer model for the headline generation task.
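The selection step described for Evaluator.ipynb can be sketched as an argmax over a popularity score, followed by scoring the chosen headline against the reference headline. Here `popularity_score` is a hypothetical callable standing in for the trained popularity model, and `rouge1_f1` is a simplified ROUGE-1 F1 (lowercased whitespace tokens, clipped unigram overlap, no stemming). It is not the official ROUGE implementation or necessarily the package the notebook uses, so its scores would not exactly match reported results.

```python
from collections import Counter
from typing import Callable, List, Tuple


def select_most_popular(candidates: List[str],
                        popularity_score: Callable[[str], float]) -> Tuple[str, float]:
    """Return (headline, score) for the candidate with the highest predicted popularity.

    Ties resolve to the earliest candidate, since max() keeps the first maximum.
    """
    scored = [(c, popularity_score(c)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # each unigram counted at most min(count_cand, count_ref)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, "the cat sat" against the reference "the cat is here" shares 2 unigrams, giving precision 2/3, recall 1/2, and F1 = 4/7.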

For more information, please read the following article:

author={Omidvar, Amin and An, Aijun},
journal={IEEE Access},
title={Learning to Generate Popular Headlines},