This is a Python-based web scraper that extracts information about Marriott Hotels. The scraper is built using Scrapy, an open-source and collaborative web crawling framework written in Python.
The following software is required to run the scraper:
- Python 3.10
- Scrapy
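Before running anything, it can help to confirm that the environment matches the requirements above. The following is a minimal sketch using only the standard library; the helper name `check_requirements` is hypothetical and not part of this repository, and treating Python 3.10 as a minimum version (rather than an exact one) is an assumption.

```python
import importlib.util
import sys

# Interpreter version listed in the requirements above.
# Assumption: 3.10 is treated as a minimum, not an exact match.
REQUIRED_PYTHON = (3, 10)


def check_requirements(required_python=REQUIRED_PYTHON, packages=("scrapy",)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info[:2] < required_python:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, required_python))
        problems.append(f"Python {wanted}+ expected, found {found}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append(f"package '{name}' is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("Missing requirement:", problem)
```

If the script prints nothing, both requirements appear to be satisfied in the current environment.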
- Clone the repository:
  git clone https://github.com/rsumit123/mariott-scraping-assignment
- Go inside the repo:
  cd mariott-scraping-assignment
- Open a pipenv shell:
  pipenv shell
- Install the required packages:
  pipenv install
- Navigate to the project directory:
  cd scraping-assignment
- Run the scraper:
  python hotel_scraper.py
The scraper extracts information for the hotel URLs listed in the code. The CSV output for all of those URLs is written to the output folder.
The extracted data is saved in a CSV file named output/hotel_code_data.csv. The CSV file contains the following columns:
- checkin: Check-in time
- PerNight: Per-night price without taxes
- checkout: Checkout time
- roomname: Room name
- ratename: Name of the rate, e.g. Flexible
- StayTotalwTaxes: Price including taxes
- currency: Currency
- availability: Availability of the room (NA if not present)
- cancelpolicy: Cancellation policy if mentioned, else NA
- paymentpolicy: Payment policy if mentioned, else NA
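To illustrate the column layout and the NA fallback described above, here is a minimal sketch using Python's standard csv module. It is not the scraper's actual code: the helper names (`normalize_row`, `write_rows`, `read_rows`) and the sample values are made up for illustration, and the column order in the real file is an assumption. An in-memory buffer stands in for the output file.

```python
import csv
import io

# Column names as listed in this README.
# Assumption: the real CSV may order them differently.
COLUMNS = [
    "checkin", "PerNight", "checkout", "roomname", "ratename",
    "StayTotalwTaxes", "currency", "availability",
    "cancelpolicy", "paymentpolicy",
]


def normalize_row(raw):
    """Return a dict with every expected column; missing or empty values become "NA"."""
    return {col: (raw.get(col) or "NA") for col in COLUMNS}


def write_rows(rows, handle):
    """Write a header line followed by one normalized CSV row per input dict."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for raw in rows:
        writer.writerow(normalize_row(raw))


def read_rows(handle):
    """Read the CSV back into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


# Made-up sample record; availability and both policies are deliberately absent.
sample = [{
    "checkin": "3:00 PM",
    "checkout": "12:00 PM",
    "roomname": "Guest room, 1 King",
    "ratename": "Flexible",
    "PerNight": "199.00",
    "StayTotalwTaxes": "230.45",
    "currency": "USD",
}]

buffer = io.StringIO()  # stands in for the file in the output folder
write_rows(sample, buffer)
buffer.seek(0)
rows = read_rows(buffer)
print(rows[0]["ratename"], rows[0]["cancelpolicy"])
```

To inspect a real run, the same `read_rows` helper could be pointed at the generated file, opened with `newline=""` as the csv module documentation recommends.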
This project is licensed under the MIT License. See the LICENSE file for more information.