Bachelor of Business Information Technology
Undergraduate-Project-2020/
├── backend
│   ├── analysis.py
│   ├── api
│   │   └── api.py
│   ├── crawler.py
│   ├── database.py
│   ├── db_check.py
│   ├── db_data
│   │   ├── analyzed_data.json
│   │   ├── products.db
│   │   └── sample_data.json
│   ├── driver
│   │   └── chromedriver
│   └── __pycache__
│       └── database.cpython-37.pyc
├── frontend
│   └── FrontEnd here
├── LICENSE
├── README.md
└── try-out
    ├── ...
id | product_name | demand |
---|---|---|
0 | product name | 5 |
1 | product name | 8 |
- id: PRIMARY KEY
- product_name: TEXT
- demand: INTEGER
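
The column list above could map onto a SQLite table roughly as follows. This is a minimal sketch, not the project's actual `database.py`: the table name `products`, the `INTEGER` type for `id`, the `NOT NULL` constraints, and the helper names `create_table`, `insert_product`, and `all_products` are all assumptions made for illustration. An in-memory database is used so the example leaves `products.db` untouched.

```python
import sqlite3

# Hypothetical table name; the column list above does not name the table.
TABLE = "products"


def create_table(conn: sqlite3.Connection) -> None:
    """Create the table with the three documented columns."""
    # INTEGER for id and the NOT NULL constraints are assumptions.
    conn.execute(
        f"""
        CREATE TABLE IF NOT EXISTS {TABLE} (
            id INTEGER PRIMARY KEY,
            product_name TEXT NOT NULL,
            demand INTEGER NOT NULL
        )
        """
    )


def insert_product(conn: sqlite3.Connection, product_id: int,
                   product_name: str, demand: int) -> None:
    """Insert one row, using placeholders instead of string formatting."""
    conn.execute(
        f"INSERT INTO {TABLE} (id, product_name, demand) VALUES (?, ?, ?)",
        (product_id, product_name, demand),
    )


def all_products(conn: sqlite3.Connection) -> list:
    """Return every row as (id, product_name, demand), ordered by id."""
    query = f"SELECT id, product_name, demand FROM {TABLE} ORDER BY id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_table(connection)
    insert_product(connection, 0, "product name", 5)
    insert_product(connection, 1, "product name", 8)
    for row in all_products(connection):
        print(row)
    connection.close()
```

Passing values as `?` placeholders keeps scraped product names from being interpreted as SQL.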
As websites become more interactive and user-friendly, they become harder for web crawlers to process.
- Modern websites use many dynamic techniques that are not crawler-friendly. Examples include lazy image loading, infinite scrolling, and elements loaded via AJAX calls, all of which make pages difficult to crawl, even for Googlebot.
- Modern websites heavily rely on JavaScript to load dynamic elements.
You can detect whether a web page uses asynchronous loading, or is otherwise dynamic, by viewing the page source (right-click the page and choose View Page Source). If you search the source for the content you want and cannot find it, JavaScript is probably rendering that content.
- Many modern websites are JavaScript-rendered pages, which makes them difficult for web scrapers.
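
The view-source check described above can also be done programmatically: download the raw HTML without running any JavaScript and search it for text you expect to see on the page. The sketch below is only a heuristic, and the function names `fetch_static_html` and `is_probably_js_rendered` are hypothetical helpers, not part of this project.

```python
from urllib.request import Request, urlopen


def fetch_static_html(url: str, timeout: float = 10.0) -> str:
    """Download the page source as served, without executing JavaScript."""
    # Some sites reject requests that lack a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def is_probably_js_rendered(static_html: str, expected_text: str) -> bool:
    """Return True when the expected text is absent from the static source.

    Absence suggests (but does not prove) that the text is injected later
    by JavaScript, for example through an AJAX call.
    """
    return expected_text.lower() not in static_html.lower()
```

For example, `is_probably_js_rendered(fetch_static_html(url), "Add to cart")` returning `True` suggests that a plain HTTP fetch is not enough and that a browser-driven approach may be needed.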
Selenium WebDriver is one of the most popular tools for web UI automation. It automates the actions a user performs in a browser window, such as navigating to a website, filling out forms (including text boxes, radio buttons, and drop-downs), submitting forms, browsing through pages, and handling pop-ups.
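
As a rough illustration of the automation described above, the sketch below opens a page in headless Chrome, waits for an element to appear, and returns the rendered HTML. It assumes the Selenium 4 Python bindings (older releases construct the driver differently) and a `chromedriver` binary at `backend/driver/chromedriver`, as in the project tree. The helper names `chrome_arguments` and `fetch_rendered_html` and the CSS selector argument are hypothetical and not taken from `crawler.py`.

```python
def chrome_arguments(headless: bool = True) -> list:
    """Build the command-line flags passed to Chrome."""
    arguments = ["--window-size=1280,1024"]
    if headless:
        arguments.append("--headless")  # run without a visible window
    return arguments


def fetch_rendered_html(url: str, css_selector: str,
                        driver_path: str = "backend/driver/chromedriver",
                        timeout: int = 10) -> str:
    """Load a page in Chrome and return its HTML after JavaScript has run."""
    # Imported here so this module still loads when Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    for argument in chrome_arguments():
        options.add_argument(argument)

    driver = webdriver.Chrome(service=Service(driver_path), options=options)
    try:
        driver.get(url)
        # Block until the dynamically loaded element exists, or time out.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

Waiting explicitly for a known element is generally more reliable than a fixed sleep, because AJAX-loaded content arrives at unpredictable times.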
Sometimes fetching content from dynamic sites is straightforward, because they depend heavily on API calls. With asynchronous loading, data is usually loaded through GET and POST requests, which you can watch in the Network tab of the browser's Developer Tools.
https://rocqjones.pythonanywhere.com/api/products/all
https://rocqjones.pythonanywhere.com/api/products=fruits
https://rocqjones.pythonanywhere.com/api/products=cerials
https://rocqjones.pythonanywhere.com/api/products=vegetables
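
The endpoints above could be queried with the standard library along these lines. This is a sketch under assumptions: the response format is not documented here, so the code assumes the API returns JSON, and `product_url` and `fetch_products` are hypothetical helper names. Category values are passed through exactly as they appear in the URLs above.

```python
import json
from urllib.request import urlopen

BASE_URL = "https://rocqjones.pythonanywhere.com/api"


def product_url(category=None) -> str:
    """Build an endpoint URL; no category means the 'all products' endpoint."""
    if category is None:
        return f"{BASE_URL}/products/all"
    return f"{BASE_URL}/products={category}"


def fetch_products(category=None, timeout: float = 10.0):
    """GET the endpoint and decode the body (assumed to be JSON)."""
    with urlopen(product_url(category), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_products("fruits")` would request the fruits endpoint, while `fetch_products()` would request every product.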