urlExpander
urlExpander is a Python package for quickly and thoroughly expanding shortened URLs.
About
urlExpander is intended for social media researchers who want to analyze links.
Analytics and ad-based link-shortening services make such analysis difficult. Aside from collecting in-depth user engagement data, these services obscure the destination of shortened URLs.
urlExpander was created to address this challenge in a scalable and robust manner. It provides utility functions to convert Tweets into link datasets, filter for known link-shortening services (like bit.ly), resolve shortened links, and parse the title and meta description from webpages.
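Two of the utilities named above, filtering for link shorteners and parsing a page's title and meta description, can be approximated with the standard library. The sketch below is illustrative only and is not urlExpander's API: `SHORT_DOMAINS`, `looks_short`, and `parse_title_and_description` are hypothetical names, and the domain set is a tiny stand-in for a real list of shorteners.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical, deliberately tiny set of shortener domains, for illustration only.
SHORT_DOMAINS = {"bit.ly", "trib.al", "ow.ly", "goo.gl", "tinyurl.com"}


def looks_short(url: str) -> bool:
    """True if the URL's host is in the known-shortener set."""
    if "://" not in url:
        url = "http://" + url  # urlparse only fills in the host when a scheme is present
    host = urlparse(url).hostname or ""  # hostname is lowercased, port stripped
    if host.startswith("www."):
        host = host[len("www."):]
    return host in SHORT_DOMAINS


class TitleMetaParser(HTMLParser):
    """Collects the <title> text and the <meta name="description"> content."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.description: Optional[str] = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and self.description is None:
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "description":
                self.description = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_title_and_description(html: str) -> Tuple[str, Optional[str]]:
    """Return (title, meta description) for an HTML document."""
    parser = TitleMetaParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.description
```

A domain-set lookup keeps the shortener check cheap, so it can run over millions of links before any network request is made.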
This package differs from other approaches because it handles ad-based URLs (like adf.ly, lnx.lu, linkbucks.com, and adfoc.us) thanks to the Unshortenit library, and it resolves redirects to defunct websites (like blacktolive.com). Most importantly, urlExpander offers multithreaded URL expansion.
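Resolving a redirect to a defunct site comes down to following the chain one hop at a time and keeping the last URL seen, even when the final host no longer answers. The sketch below assumes plain HTTP 3xx redirects and HEAD requests; it is not how urlExpander or Unshortenit work internally (ad-based shorteners such as adf.ly typically serve interstitial pages rather than plain redirects, which this does not attempt to handle), and `resolve` and `head_fetch` are hypothetical names.

```python
import http.client
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher returns (status code, Location header or None) for one URL.
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head_fetch(url: str, timeout: float = 5.0) -> Tuple[int, Optional[str]]:
    """Send a single HEAD request without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def resolve(url: str, fetch: Fetcher = head_fetch, max_hops: int = 10) -> str:
    """Follow redirects one hop at a time; return the last URL reached."""
    current = url
    for _ in range(max_hops):
        try:
            status, location = fetch(current)
        except (OSError, http.client.HTTPException):
            # DNS failure, refused connection, timeout, etc.: the site is
            # defunct, but the redirect chain still told us where it points.
            return current
        if status in REDIRECT_CODES and location:
            # Location headers may be relative, so resolve against current.
            current = urljoin(current, location)
        else:
            return current
    return current
```

Passing the fetcher in as an argument keeps the redirect logic testable offline. Some servers reject HEAD requests, so a production resolver might fall back to GET.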
Multithreaded URL expansion was built to overcome the bottleneck of mass link expansion. It does so by parallelizing requests, minimizing HTTP requests, caching results, and chunking the input into smaller pieces.
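The four ideas above (parallelization, fewer HTTP requests, caching, and chunking) can be sketched with the standard library. This is a minimal illustration, not urlExpander's implementation: `expand_many`, `chunks`, and the injected `resolve_one` function are hypothetical, and the `chunksize`/`n_workers` names merely mirror the quickstart call.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Optional


def chunks(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield successive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def expand_many(
    urls: List[str],
    resolve_one: Callable[[str], str],
    chunksize: int = 1280,
    n_workers: int = 64,
    cache: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Resolve many URLs with a thread pool, chunked and cached (hypothetical helper)."""
    if cache is None:
        cache = {}
    # Deduplicate while preserving order, and skip anything already cached,
    # so each distinct short link is resolved at most once.
    todo = [u for u in dict.fromkeys(urls) if u not in cache]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for chunk in chunks(todo, chunksize):
            # pool.map yields results in input order, so zip lines them up.
            # resolve_one should catch its own network errors; an uncaught
            # exception here would abort the whole run.
            for short_url, long_url in zip(chunk, pool.map(resolve_one, chunk)):
                cache[short_url] = long_url
            # A real tool could persist `cache` here, so an interrupted run
            # only loses the current chunk.
    # Map every input, duplicates included, back through the cache.
    return [cache[u] for u in urls]
```

Threads suit this workload because each task spends most of its time waiting on the network. Chunking bounds how many tasks are queued at once and gives natural checkpoints for saving progress.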
Installation
pip install urlexpander
Quickstart
import urlexpander
urlexpander.expand('https://trib.al/xXI5ruM')
returns
'https://www.breitbart.com/video/2017/12/31/lindsey-graham-trump-just-cant-tweet-iran/'
The function shines when given a massive list of URLs to unshorten:
resolved_links = urlexpander.expand(list_of_short_urls,
                                    chunksize=1280,
                                    n_workers=64,
                                    cache_file='tmp.json')
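The `cache_file` argument suggests that results are kept on disk between runs. The sketch below shows one way a JSON cache of that kind could work; the file layout (a flat JSON object mapping short URLs to expanded URLs) and the helper names `load_cache` and `save_cache` are assumptions, not urlExpander's documented format.

```python
import json
import os
from typing import Dict


def load_cache(path: str) -> Dict[str, str]:
    """Return the saved {short_url: expanded_url} map, or {} on the first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_cache(cache: Dict[str, str], path: str) -> None:
    """Write to a temporary file, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    # os.replace overwrites the target in one step, so an interrupted
    # write leaves the previous cache file untouched.
    os.replace(tmp_path, path)
```

A run would call `load_cache('tmp.json')`, skip every URL already present, resolve the rest, and call `save_cache` after each chunk so a crash costs at most one chunk of work.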
Check out this Jupyter Notebook for a more in-depth quickstart!
More Examples
Links as Data
How to extract links from congressional Tweets, preprocess them, and use them as features to predict political affiliation. View the Notebook on GitHub | NbViewer | Slides | Binder
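The Links as Data workflow can be sketched roughly as follows: pull the URLs out of each Tweet, reduce them to domains, and count domains per account. This assumes Twitter API v1.1-style tweet objects (links under `entities.urls`, author under `user.screen_name`) and reads only the top-level `entities`; `tweet_links` and `domain_features` are hypothetical helpers, not functions from the notebook. In practice, shortened links would be expanded first so that bit.ly and similar domains do not dominate the counts.

```python
from collections import Counter
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def tweet_links(tweet: dict) -> List[str]:
    """URLs from a v1.1-style tweet's entities, preferring the expanded form."""
    links = []
    for u in tweet.get("entities", {}).get("urls", []):
        link = u.get("expanded_url") or u.get("url")
        if link:
            links.append(link)
    return links


def domain_features(tweets: Iterable[dict]) -> Dict[str, Counter]:
    """Count link domains per account: one bag-of-domains per user."""
    features: Dict[str, Counter] = {}
    for tweet in tweets:
        user = tweet["user"]["screen_name"]
        counts = features.setdefault(user, Counter())
        for link in tweet_links(tweet):
            host = urlparse(link).hostname or ""
            if host.startswith("www."):
                host = host[len("www."):]
            if host:
                counts[host] += 1
    return features
```

Each `Counter` is a plain mapping, so a vectorizer that accepts dicts (for example, scikit-learn's `DictVectorizer`) could turn the per-user counts into a feature matrix for a classifier.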
Documentation
We'll generate a readthedocs shortly!
Acknowledgements
urlExpander was written by Leon Yin with contributions by Megan Brown, Nicole Baram and Gregory Eady for the Social Media and Political Participation Lab at NYU.
Please cite urlExpander in your publications if it helps your research. Here is an example BibTeX entry:
@misc{leon_yin_2018_1345144,
author = {Leon Yin},
title = {SMAPPNYU/urlExpander: Initial release},
month = aug,
year = 2018,
doi = {10.5281/zenodo.1345144},
url = {https://doi.org/10.5281/zenodo.1345144}
}
Please also send us your work :)
Research Output
urlExpander is being used in several forthcoming publications from the SMaPP Lab (and perhaps from you?). We'll keep a running tally here.
Yin, Leon, Franziska Roscher, Richard Bonneau, Jonathan Nagler, and Joshua A. Tucker. 2018. “Your Friendly Neighborhood Troll: The Internet Research Agency’s Use of Local and Fake News in the 2016 US Presidential Campaign.” SMaPP Data Report. 2018:01