MiloshBo/webscraping_modules

webscraping_modules

This repository contains examples of what can be done with the various Python modules used for web scraping.

webbrowser: https://docs.python.org/3.7/library/webbrowser.html#webbrowser.open

requests: http://docs.python-requests.org/en/master/ http://dev.mobify.com/blog/http-requests-are-hard/

BeautifulSoup: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#

CSS selectors: https://www.w3schools.com/cssref/css_selectors.asp

Selenium with XPath: https://selenium-python.readthedocs.io/index.html https://www.red-gate.com/simple-talk/dotnet/.net-framework/xpath,-css,-dom-and-selenium-the-rosetta-stone/ https://www.w3schools.com/xml/xpath_intro.asp https://www.w3.org/TR/xpath/all/ http://www.zvon.org/comp/r/tut-XPath_1.html https://msdn.microsoft.com/en-us/enus/library/ms256471

lxml: https://lxml.de/tutorial.html http://stanford.edu/~mgorkove/cgi-bin/rpython_tutorials/webscraping_with_lxml.php

csv: https://docs.python.org/3/library/csv.html https://automatetheboringstuff.com/chapter14/ https://docs.python.org/3/library/io.html

pdf: https://automatetheboringstuff.com/chapter13/ https://pythonhosted.org/PyPDF2/index.html https://media.readthedocs.org/pdf/pdfminer-docs/latest/pdfminer-docs.pdf https://textract.readthedocs.io/en/latest/ https://stackoverflow.com/questions/34837707/how-to-extract-text-from-a-pdf-file

MS Word: https://automatetheboringstuff.com/chapter13/ https://python-docx.readthedocs.io/en/latest/

time: https://automatetheboringstuff.com/chapter15/ https://docs.python.org/3/library/profile.html
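As a small illustration of the csv and io entries above, the sketch below writes scraped rows to CSV text in memory and reads them back. It is not taken from this repository: the helper names `rows_to_csv_text` and `csv_text_to_rows` and the sample `title`/`url` fields are hypothetical, chosen only for the example.

```python
import csv
import io


def rows_to_csv_text(rows, header):
    """Serialize a list of dicts (e.g. scraped records) to CSV text in memory."""
    # newline="" leaves line endings to the csv module, as its docs recommend.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=header)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_text_to_rows(text):
    """Parse CSV text back into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    # Hypothetical sample record; the comma in the title forces quoting.
    scraped = [{"title": "Example, Inc.", "url": "https://example.com"}]
    text = rows_to_csv_text(scraped, ["title", "url"])
    print(text)
    print(csv_text_to_rows(text))
```

Swapping `io.StringIO` for a file opened with `open(path, "w", newline="")` would write the same output to disk.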

Various sources

https://python-forum.io/Thread-Web-Scraping-part-1

https://python-forum.io/Thread-Web-scraping-part-2

books: Web Scraping with Python by Ryan Mitchell