Scraper for Twitter, LinkedIn, and Xing accounts.
An example of the input data is in the example.csv file. All requirements are listed in requirements.txt; additionally, geckodriver must be on the PATH.
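Before launching the scraper, you can check whether geckodriver is actually discoverable on the PATH. The helper below is a minimal sketch (it is not part of the scraper itself) using the standard-library `shutil.which`:

```python
import shutil


def on_path(executable: str) -> bool:
    """Return True if `executable` is discoverable on the current PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if not on_path("geckodriver"):
        print("geckodriver not found on PATH -- download it and "
              "add its folder to PATH, or pass it via -g/--geckodriver.")
```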
python -m social_media_scraper [-h] -m MODE [-i INPUT] [-o OUTPUT] [-lb LOWER_BOUND] [-ub UPPER_BOUND] [-g GECKODRIVER] [-d] [-s] [-int] [-tp TWITTER_PROFILE]
-h, --help show this help message and exit
-m MODE, --mode MODE Scrape user accounts, match identity by data, or both (pass acc, id, or full respectively)
-i INPUT, --input INPUT Input file location
-o OUTPUT, --output OUTPUT Output file location
-lb LOWER_BOUND, --lower_bound LOWER_BOUND Request frequency lower bound
-ub UPPER_BOUND, --upper_bound UPPER_BOUND Request frequency upper bound
-g GECKODRIVER, --geckodriver GECKODRIVER Set path for geckodriver
-d, --debugging Run the application in debug mode (logs debug output to the console)
-s, --sql Log SQL statements to the console
-int, --interface Run the app in account scraping mode with an interface
-tp TWITTER_PROFILE, --twitter_profile TWITTER_PROFILE Firefox profile path for Twitter (the profile should have the GoodTwitter add-on installed to work around Twitter's redesign)
python -m social_media_scraper -m full -i "./example-identification.csv" -o "./output.db" -lb 1 -ub 3 -d -s -tp "/path/to/firefox/profile/folder/" -g "/path/to/driver/geckodriver.exe"
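The `-lb`/`-ub` flags bound the request frequency, presumably by sleeping a random interval between requests so traffic does not look mechanical. A hedged sketch of that idea (the function name `polite_sleep` and the assumption that the bounds are in seconds are mine, not from the project):

```python
import random
import time


def polite_sleep(lower: float, upper: float) -> float:
    """Sleep a random duration between `lower` and `upper` (assumed seconds)
    and return the chosen delay, e.g. to log it."""
    delay = random.uniform(lower, upper)
    time.sleep(delay)
    return delay
```

With `-lb 1 -ub 3` as in the example above, each request would be separated by a random 1-3 second pause.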
Scraper for Google News and Kununu pages of companies.
An example of the input data is in the example-company.csv file. Before starting the application, make sure all required packages are installed:
pip install aiohttp lxml newspaper3k yarl
python -m company_scraper [-h] -l LANGUAGE -i INPUT -o OUTPUT
-h, --help Show help message and exit
-l LANGUAGE, --language LANGUAGE Set language for news articles ('en' or 'de')
-i INPUT, --input INPUT Set input file for processing
-o OUTPUT, --output OUTPUT Output file for scraping results
python -m company_scraper -l "de" -i "example-company.csv" -o "data.db"