SocialMediaScraper

License: Apache-2.0

social_media_scraper

Scraper for Twitter, LinkedIn, and Xing accounts.

An example of the input data is in the example.csv file. All requirements are listed in requirements.txt; in addition, geckodriver must be on your PATH.
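A quick way to check that geckodriver is actually discoverable before launching the scraper (a minimal sketch; the helper name is hypothetical, and the -g flag below is the alternative when the driver is not on PATH):

```python
import shutil

def geckodriver_available(executable="geckodriver"):
    """Return the full path to the executable if it is on PATH, else None."""
    return shutil.which(executable)

path = geckodriver_available()
if path is None:
    print("geckodriver not found on PATH; pass its location via -g/--geckodriver")
else:
    print(f"geckodriver found at {path}")
```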

Running application

python -m social_media_scraper [-h] -m MODE [-i INPUT] [-o OUTPUT] [-lb LOWER_BOUND] [-ub UPPER_BOUND] [-g GECKODRIVER] [-d] [-s] [-int] [-tp TWITTER_PROFILE]

Arguments

-h, --help show this help message and exit

-m MODE, --mode MODE Scrape user accounts, match identities by data, or both (pass acc, id, or full respectively)

-i INPUT, --input INPUT Input file location

-o OUTPUT, --output OUTPUT Output file location

-lb LOWER_BOUND, --lower_bound LOWER_BOUND Request frequency lower bound

-ub UPPER_BOUND, --upper_bound UPPER_BOUND Request frequency upper bound

-g GECKODRIVER, --geckodriver GECKODRIVER Set path for geckodriver

-d, --debugging Run the application in debug mode (logs debug output to the console)

-s, --sql Log SQL statements to the console

-int, --interface Run the app in account scraping mode with a user interface

-tp TWITTER_PROFILE, --twitter_profile TWITTER_PROFILE Firefox profile path for Twitter (the profile should have the GoodTwitter add-on installed to work around the Twitter redesign)

Example command

python -m social_media_scraper -m full -i "./example-identification.csv" -o "./output.db" -lb 1 -ub 3 -d -s -tp "/path/to/firefox/profile/folder/" -g "/path/to/driver/geckodriver.exe"
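The -lb/-ub flags bound the random delay between consecutive requests. A minimal sketch of how such randomized throttling works (pick_delay and throttled are illustrative names, not functions from the scraper's code):

```python
import random
import time

def pick_delay(lower_bound, upper_bound):
    """Choose a random wait (in seconds) between the two bounds."""
    return random.uniform(lower_bound, upper_bound)

def throttled(targets, lower_bound=1, upper_bound=3):
    """Yield each request target, sleeping a random interval between them."""
    for i, target in enumerate(targets):
        if i > 0:
            time.sleep(pick_delay(lower_bound, upper_bound))
        yield target

# With -lb 1 -ub 3, each wait falls somewhere between 1 and 3 seconds.
```

Randomizing the gap (rather than using a fixed interval) makes the request pattern look less mechanical to the scraped sites.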

company_scraper

Scraper for the Google News and kununu pages of companies.

An example of the input data is in the example-company.csv file. Before starting the application, make sure all required packages are installed with the command:

pip install aiohttp lxml newspaper3k yarl
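The scraper queries Google News per company. One way such a query URL can be built with the standard library (the RSS endpoint and parameters here are an illustration of the idea, not necessarily what company_scraper uses internally):

```python
from urllib.parse import urlencode

def google_news_rss_url(company, language="de"):
    """Build a Google News RSS search URL for a company name."""
    params = {"q": company, "hl": language}
    return "https://news.google.com/rss/search?" + urlencode(params)

print(google_news_rss_url("Example GmbH", "de"))
```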

Running application

python -m company_scraper [-h] -l LANGUAGE -i INPUT -o OUTPUT

Arguments

-h, --help Show help message and exit

-l LANGUAGE, --language LANGUAGE Set language for news articles ('en' or 'de')

-i INPUT, --input INPUT Set input file for processing

-o OUTPUT, --output OUTPUT Output file for scraping results

Example command

python -m company_scraper -l "de" -i "example-company.csv" -o "data.db"
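Both scrapers write to a .db file, and the -s flag logs SQL, which suggests an SQLite database. A quick way to inspect the tables in the resulting file (a sketch, assuming SQLite output; the actual table names depend on the scraper's schema):

```python
import sqlite3

def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]

# Example: list_tables("data.db")
```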