/MSBD5014A

The solution to the MSBD5014A Independent Project.

This repository contains the solution to the MSBD5014 Independent Project.

In this project, I implemented two web crawlers: one obtains the full hotel list, and the other obtains the corresponding customer comments.

The first crawler is named "Hotel". You can run it by executing the following command in a terminal from the HotelCrawler directory:

/(YourPath)/HotelCrawler: scrapy crawl Hotel -o hotel_list.json   

The second crawler is named "hotel_comments". You can run it by executing the following command in a terminal from the SingleHotel directory:

/(YourPath)/SingleHotel: scrapy crawl hotel_comments -o comments_result.json
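
Once a crawl has finished, the output file can be checked quickly. The snippet below is a minimal sketch, assuming the `.json` feed written by `-o` is a single JSON array of item dictionaries (Scrapy's usual JSON feed layout). The helper name `load_items` is hypothetical and not part of this repository. Note that in Scrapy 1.7 `-o` appends to an existing file, so a second run into the same `.json` file may leave it unparseable; deleting the old file first avoids this.

```python
import json
from pathlib import Path


def load_items(path):
    """Load a Scrapy JSON feed, assumed to be one JSON array of item dicts.

    `load_items` is a hypothetical helper, not part of this repository.
    """
    with Path(path).open(encoding="utf-8") as f:
        items = json.load(f)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items in %s" % path)
    return items


if __name__ == "__main__":
    # File names are taken from the commands above; files that do not
    # exist yet are skipped.
    for name in ("hotel_list.json", "comments_result.json"):
        if Path(name).exists():
            print(name, len(load_items(name)), "items")
```

If `json.load` raises a `JSONDecodeError`, the file was most likely written by more than one run.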

Environment:

  • Python 3.7.4
  • Scrapy 1.7.3
  • Pandas 0.25.2

Repository Structure:

  • HotelCrawler: Get the list of the hotels;
    • HotelCrawler/HotelCrawler/spiders/hotel.py: Code of Hotel crawler;
    • HotelCrawler/HotelCrawler/hotel_list.json: The result of the crawler;
  • SingleHotel: Get the reviews of each hotel;
    • SingleHotel/SingleHotel/spiders/hotel_spider.py: Code of hotel_comments crawler;
    • SingleHotel/comments_result.json: The result of the crawler;
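
Since the comments crawler collects reviews for each hotel, a common next step is to group the scraped comments by hotel. The following is only a sketch under stated assumptions: the field names `hotel_name` and `comment` and the sample records are made up for illustration, and the real keys are whatever the items in `hotel_spider.py` define.

```python
from collections import defaultdict


def group_by_key(items, key):
    """Group a list of item dicts by the value stored under `key`.

    Items that lack `key` are collected under None.
    """
    groups = defaultdict(list)
    for item in items:
        groups[item.get(key)].append(item)
    return dict(groups)


# Made-up records; "hotel_name" and "comment" are hypothetical field names.
sample = [
    {"hotel_name": "Hotel A", "comment": "Clean rooms."},
    {"hotel_name": "Hotel B", "comment": "Friendly staff."},
    {"hotel_name": "Hotel A", "comment": "Close to the station."},
]

grouped = group_by_key(sample, "hotel_name")
for hotel, comments in grouped.items():
    print(hotel, len(comments))
```

With real data, `sample` would be replaced by the list loaded from `comments_result.json`, and the key by the actual hotel identifier field.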