This repository contains the postprint, the dataset, the code and interactive news story summaries used in the "SocialTree: Socially Augmented Structured Summaries of News Stories" paper by Gevorg Poghosyan and Georgiana Ifrim presented at the 30th ACM Conference on Hypertext & Social Media (HT '19).

The authors' version of the work is available locally in this repository. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 30th ACM Conference on Hypertext & Social Media (HT '19), https://doi.org/10.1145/3342220.3343668.

Interactive examples

The rendered examples are also available online at https://gevra.github.io/socialtree/.

The interactive_summaries directory contains 6 example story summaries extracted from a multi-source dataset of news. These examples were used for the user study described in the paper.
The interactive_summaries_WaPo directory contains 6 example story summaries from The Washington Post news article collection.
Each .html file in the interactive_summaries directory (along with the identically named directory holding its supporting files) presents summaries produced for the same story using different methods.

Please make sure your browser does not block JavaScript on these pages. Some ad blockers may block it.

Dataset(s)

The data folder contains the dataset used in the paper. The comma-separated file contains 290657 articles published between 15.07.2015 and 24.05.2017.
In addition to the full dataset, the data directory contains the retrieved articles for each of the queries used in the user study.
Similarly, the data_WaPo folder contains 184759 tagged articles from The Washington Post dataset.

These datasets were produced with the Hashtagger+ tool described in this paper and available at https://github.com/gevra/hashtagger_plus_offline.
A collection of 198 million high-quality, news-related hashtagged tweets covering the period 15.07.2015-24.05.2017, which was used to create the tagged article datasets, is available at https://doi.org/10.6084/m9.figshare.7932422.

To avoid copyright infringement, we share only the article URL, the tag profile and the query relevance score.
The code for article crawling and processing is included in the main package.
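
Before working with the shared files, it can help to check which columns a file actually contains and how many rows it holds. The following is a minimal sketch using only the Python standard library. It assumes the file has a header row, and the function name summarize_csv is hypothetical (it is not part of this repository). No column names are assumed.

```python
import csv


def summarize_csv(path, encoding="utf-8"):
    """Return (column_names, row_count) for a comma-separated file.

    Assumption: the first line of the file is a header row.
    """
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        # Iterating the reader consumes the header first, then counts data rows.
        row_count = sum(1 for _ in reader)
        return reader.fieldnames, row_count
```

Calling summarize_csv on one of the .csv files in the data directory should return the header names together with the number of articles, which can be compared against the counts stated above.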

Code

The current implementation of SocialTree extraction requires Python 3.6 or later. You can install the required packages (we recommend using a virtual environment) by running pip install -r requirements.txt.
Then open the socialtree.ipynb Jupyter notebook to generate summaries from scratch.

This code uses an implementation of the Eclat algorithm by Christian Borgelt.
Please download it here and save it in the code directory, where socialtree.ipynb is located.
Make sure to change the permissions to make eclat executable!
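
On Linux or macOS the permission change is usually done with chmod +x. For those who prefer to do it from Python (for example, from a notebook cell), the sketch below does the same thing with the standard library. The helper name make_executable is hypothetical, and the file name "eclat" in the usage note is an assumption based on the instruction above.

```python
import os
import stat
from pathlib import Path


def make_executable(path):
    """Add execute permission for user, group and others (like `chmod +x`).

    Returns True if the current user can execute the file afterwards.
    """
    path = Path(path)
    current_mode = path.stat().st_mode
    # Keep the existing permission bits and add the three execute bits.
    path.chmod(current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return os.access(path, os.X_OK)
```

For example, make_executable("eclat") run from the directory containing socialtree.ipynb should return True once the binary has been saved there.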

The code for the comparison with state-of-the-art methods requires the full article text.
In the next release we will provide the code for this comparison, as well as code for crawling articles using the provided URLs.