
🌬️urlExpander is a Python package for expanding shortened links (urls).

Primary language: Python. License: MIT.

urlExpander


urlExpander is a Python package for quickly and thoroughly expanding shortened URLs.

About

urlExpander is intended for social media researchers who want to analyze links.

Analytics and ad-based services make such analysis difficult. Aside from collecting in-depth user engagement data, these services obfuscate the destination of the shortened URLs.

urlExpander was created to address this challenge in a scalable and robust manner. It does so by providing utility functions to convert Tweets into link datasets, filter for known link-shortening services (like bit.ly), resolve shortened links, and parse the title and meta description from webpages.
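To make the filtering step concrete, here is a minimal sketch of how one might flag links from known shortening services using only the standard library. It is not urlExpander's own code: the names `KNOWN_SHORTENERS`, `host_of`, and `looks_shortened` are hypothetical, and the domain list is a small illustrative sample rather than the list the package ships with.

```python
from urllib.parse import urlparse

# Hypothetical, deliberately incomplete sample of link-shortening domains.
KNOWN_SHORTENERS = {"bit.ly", "trib.al", "adf.ly", "t.co"}


def host_of(url):
    """Return the lowercased host of a URL, without a leading 'www.'.

    URLs without a scheme (e.g. 'bit.ly/abc') have no parsed host and
    yield an empty string in this sketch.
    """
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def looks_shortened(url, shorteners=KNOWN_SHORTENERS):
    """True if the URL's host is one of the listed shortening services."""
    return host_of(url) in shorteners


def filter_shortened(urls):
    """Keep only the URLs that point at a listed shortening service."""
    return [u for u in urls if looks_shortened(u)]
```

Filtering first means that only links which actually need resolving are sent over the network; everything else can be used as-is.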

This package differs from other approaches because it handles ad-based URLs (like adf.ly, lnx.lu, linkbucks.com, and adfoc.us) thanks to the Unshortenit library, and it resolves redirects to defunct websites (like blacktolive.com). Most importantly, urlExpander offers multithreaded URL expansion.

The multithreaded URL expansion was created to overcome the bottleneck of mass link expansion through parallelization, minimizing HTTP requests, caching results, and chunking the input into smaller pieces.
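The four ideas above (parallelization, fewer requests, caching, chunking) can be sketched with the standard library alone. This is an illustration of the general technique, not urlExpander's implementation: `expand_many`, `chunks`, and the `resolve` callable are hypothetical names, and `resolve` stands in for whatever function follows a single short link to its destination.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def expand_many(urls, resolve, chunksize=4, n_workers=2, cache_file=None):
    """Resolve many URLs in parallel, one chunk at a time.

    `resolve` is any callable mapping one URL to its expanded form
    (an assumption of this sketch; it is supplied by the caller).
    """
    # Load previously resolved links so they are never requested again.
    cache = {}
    if cache_file and os.path.exists(cache_file):
        with open(cache_file) as f:
            cache = json.load(f)

    # Minimize requests: drop duplicates and anything already cached.
    todo = [u for u in dict.fromkeys(urls) if u not in cache]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for chunk in chunks(todo, chunksize):
            # pool.map preserves input order, so zip pairs each URL
            # with its own result.
            for url, resolved in zip(chunk, pool.map(resolve, chunk)):
                cache[url] = resolved
            # Persist after every chunk so an interrupted run can resume.
            if cache_file:
                with open(cache_file, "w") as f:
                    json.dump(cache, f)

    return [cache[u] for u in urls]
```

Writing the cache after each chunk, rather than once at the end, is the design choice that makes chunking useful: a crash partway through a large job loses at most one chunk of work.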

Installation

pip install urlexpander

Quickstart

import urlexpander
urlexpander.expand('https://trib.al/xXI5ruM')

returns

'https://www.breitbart.com/video/2017/12/31/lindsey-graham-trump-just-cant-tweet-iran/'

The function shines given a massive list of urls to unshorten:

resolved_links = urlexpander.expand(list_of_short_urls, 
                                    chunksize=1280, 
                                    n_workers=64,
                                    cache_file='tmp.json')

Check out this Jupyter Notebook for a more in-depth quickstart!

More Examples

Links as Data
How to extract links from congressional Tweets, preprocess them, and use them as features to predict political affiliation. View the Notebook on GitHub | NbViewer | Slides | Binder

Documentation

We'll generate a readthedocs shortly!

Acknowledgements

urlExpander was written by Leon Yin with contributions by Megan Brown, Nicole Baram and Gregory Eady for the Social Media and Political Participation Lab at NYU.

Please cite urlExpander in your publications if it helps your research. Here is an example BibTeX entry:

@misc{leon_yin_2018_1345144,
  author       = {Leon Yin},
  title        = {SMAPPNYU/urlExpander: Initial release},
  month        = aug,
  year         = 2018,
  doi          = {10.5281/zenodo.1345144},
  url          = {https://doi.org/10.5281/zenodo.1345144}
}

Please also send us your work :)

Research Output

urlExpander is being used in several forthcoming publications from the SMaPP Lab (and perhaps from you?). We'll keep a running tally here.

Yin, Leon, Franziska Roscher, Richard Bonneau, Jonathan Nagler, and Joshua A. Tucker. 2018. “Your Friendly Neighborhood Troll: The Internet Research Agency’s Use of Local and Fake News in the 2016 US Presidential Campaign.” SMaPP Data Report. 2018:01