Extract social media links from websites.
Many websites link to their Facebook, Twitter, LinkedIn, and YouTube accounts, and these links can be invaluable for gathering 360-degree information about a company.
This library extracts links or handles for the most commonly used international social media networks.
- Free software: MIT license
- Python versions: 2.7, 3.4+
- Extracts social media links/handles from HTML content
- Also attempts to extract links/handles from widgets, scripts, etc.
- Supports the most widely used social networks, including:
  - YouTube
  - GitHub
  - Google Plus
  - Snapchat
  - Flickr
  - Periscope
  - Telegram
  - SoundCloud
  - FeedBurner
  - Vimeo
  - SlideShare
  - VKontakte
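The library's own matching rules are not described here. As a rough illustration of the general idea only, the stdlib-only Python 3 sketch below checks `<a href>` values against a table of known social domains and also scans inline `<script>` text for URLs, which is how many widgets embed profile links. `SOCIAL_DOMAINS`, `classify`, `SocialLinkParser`, and `find_social_links` are hypothetical names, not part of this library's API, and the domain table is a deliberately tiny subset.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical, deliberately tiny domain table; not the library's real list.
SOCIAL_DOMAINS = {
    "facebook.com": "facebook",
    "twitter.com": "twitter",
    "linkedin.com": "linkedin",
    "youtube.com": "youtube",
    "github.com": "github",
    "vimeo.com": "vimeo",
}

# Loose URL pattern for scanning inline <script> text (e.g. embedded widgets).
URL_RE = re.compile(r"https?://[^\s'\"<>]+")


def classify(url):
    """Return the network name for a URL, or None if its host is not in the table."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return SOCIAL_DOMAINS.get(host)


class SocialLinkParser(HTMLParser):
    """Collect social links from <a href> attributes and inline <script> bodies."""

    def __init__(self):
        super().__init__()
        self.links = set()
        self._script_chunks = None  # a list while inside <script>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._script_chunks = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href and classify(href):
                self.links.add(href)

    def handle_data(self, data):
        # Buffer script text, since the parser may deliver it in several chunks
        if self._script_chunks is not None:
            self._script_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._script_chunks is not None:
            text = "".join(self._script_chunks)
            self._script_chunks = None
            for url in URL_RE.findall(text):
                if classify(url):
                    self.links.add(url)


def find_social_links(html):
    """Return the set of social media URLs found in an HTML string."""
    parser = SocialLinkParser()
    parser.feed(html)
    parser.close()
    return parser.links
```

Relative links such as `/about` have an empty host and are ignored; subdomains like `m.facebook.com` are not matched by this simplified table.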
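Example usage, extracting the social media links from the TechCrunch contact page:

```python
import requests
from html_to_etree import parse_html_bytes
from extract_social_media import find_links_tree

# Download the page and parse the raw bytes into an element tree,
# passing the Content-Type header so the parser can detect the encoding
res = requests.get('https://techcrunch.com/contact/')
tree = parse_html_bytes(res.content, res.headers.get('content-type'))

# Collect the social media links found in the tree into a set
print(set(find_links_tree(tree)))
```

Result (wrapped for readability; set order may vary):

```
{'http://pinterest.com/techcrunch/',
 'http://www.youtube.com/user/techcrunch',
 'http://www.linkedin.com/company/techcrunch',
 'https://www.facebook.com/techcrunch',
 'https://flipboard.com/@techcrunch',
 'http://instagram.com/techcrunch',
 'https://plus.google.com/+TechCrunch',
 'https://instagram.com/techcrunch',
 'https://twitter.com/techcrunch'}
```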
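The result above contains the same Instagram profile twice, once over `http` and once over `https`. If you need one entry per profile, a post-processing step can normalize URLs before deduplicating. This is an assumption about how you might clean the output, not something the library does: `normalize` and `dedupe` are hypothetical helpers that force `https`, drop a leading `www.` and any trailing slash, and discard query strings and fragments. Path case is left untouched, because handles such as `+TechCrunch` may be case-sensitive.

```python
from urllib.parse import urlparse


def normalize(url):
    """Reduce a profile URL to a comparison key: https, no 'www.', no trailing slash."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/")
    # Query string and fragment are intentionally dropped
    return "https://" + host + path


def dedupe(urls):
    """Return the sorted, distinct normalized keys for a collection of URLs."""
    return sorted({normalize(url) for url in urls})
```

Applied to the nine links above, this would return eight, since both Instagram entries collapse into `https://instagram.com/techcrunch`.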
- Currently finds all social media links on a page
- TODO: identify the most relevant links based on link location, link context, company name, etc.
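The relevance ranking mentioned above is not implemented yet. As one possible starting point that covers only the company-name signal (link location and link context are left out), a hypothetical heuristic could score each link by whether the company name appears as a path segment (score 2) or anywhere in the path (score 1). `relevance` and `rank_links` are illustrative names, not library functions.

```python
from urllib.parse import urlparse


def relevance(url, company):
    """Hypothetical score: higher when the company name appears in the URL path."""
    path = urlparse(url).path.lower()
    name = company.lower().replace(" ", "")
    # Strip handle prefixes such as '@' (Flipboard) or '+' (Google Plus)
    segments = [s.lstrip("@+") for s in path.split("/") if s]
    if name in segments:
        return 2  # exact handle match, e.g. /techcrunch
    if name in path:
        return 1  # partial match, e.g. /user/techcrunchtv
    return 0


def rank_links(urls, company):
    """Sort links by descending relevance, then alphabetically for stable output."""
    return sorted(urls, key=lambda u: (-relevance(u, company), u))
```

A real ranking would likely combine this with the other signals, for example preferring links found in a page footer or header.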
This package was created with Cookiecutter and the fluquid/cookiecutter-pypackage project template.