Webscraping and beyond
Besides web scraping with the Beautiful Soup package, Python supports the analysis steps that follow scraping, such as removing hidden characters, merging data from different sources, computing summary statistics, and creating visualizations.
The Jupyter notebook Webscraping-script.ipynb can be found in this GitHub repository.
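The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the HTML snippet, the table id `visits`, the `sections` lookup table, and the helper names `clean_text` and `table_to_frame` are all made up for this example, and the page is parsed from an inline string rather than downloaded.

```python
import re

import pandas as pd
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page (not real data).
# It contains a non-breaking space, a zero-width space, and a stray newline.
HTML = u"""
<table id="visits">
  <tr><th>Page</th><th>Visitors</th></tr>
  <tr><td>Home\u00a0</td><td>1,200</td></tr>
  <tr><td>\u200bBlog</td><td>3,450</td></tr>
  <tr><td>Contact\n</td><td>150</td></tr>
</table>
"""


def clean_text(text):
    """Normalize non-breaking spaces, drop zero-width characters, trim."""
    text = text.replace(u"\u00a0", u" ")
    text = re.sub(u"[\u200b\ufeff]", u"", text)
    return re.sub(r"\s+", u" ", text).strip()


def table_to_frame(html, table_id):
    """Parse one HTML table into a DataFrame, using the first row as header."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    rows = [
        [clean_text(cell.get_text()) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    return pd.DataFrame(rows[1:], columns=rows[0])


# Scrape and preprocess: strip thousands separators and convert to integers.
visits = table_to_frame(HTML, "visits")
visits["Visitors"] = visits["Visitors"].map(lambda s: int(s.replace(",", "")))

# Merge with a second (hypothetical) data source on the cleaned key column.
sections = pd.DataFrame(
    {"Page": ["Home", "Blog", "Contact"], "Section": ["main", "content", "main"]}
)
merged = visits.merge(sections, on="Page", how="left")

# Summary statistics for the numeric column.
print(merged)
print(merged["Visitors"].describe())
```

Cleaning the key column before merging matters here: without `clean_text`, the invisible characters in "Home" and "Blog" would prevent those rows from matching the second table. A plot could be added with `merged.plot(x="Page", y="Visitors", kind="bar")` once matplotlib is installed.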
Requirements
- Python 2.7 (Python 3.5 may produce some errors)
- pandas
- BeautifulSoup
- requests
- csv
- re
- urllib2
- datetime
- os
- sys
- matplotlib
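One probable source of errors under Python 3 is `urllib2`, which exists only in Python 2; in Python 3 its functionality lives in `urllib.request` and `urllib.error`. Whether this is the cause of the errors mentioned above is an assumption, since it depends on how the notebook uses the module. A common workaround is a fallback import like the sketch below.

```python
# Import urlopen in a way that works on both Python 2 and Python 3.
try:
    from urllib2 import urlopen  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3

# urlopen can then be called the same way on either version, for example:
# html = urlopen("https://rrighart.github.io/Webscraping/").read()
```

The modules csv, re, datetime, os, and sys ship with Python itself, so only pandas, BeautifulSoup, requests, and matplotlib need to be installed separately.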
Blog
A link to the original blog: https://rrighart.github.io/Webscraping/
Website
Remote data science service for small and larger projects: https://www.rrighart.com
For questions or remarks, reach out to me: rrighart@googlemail.com