The airline reviews dataset was collected from https://www.airlinequality.com/review-pages/a-z-airline-reviews/
The dataset is created using the following steps:
- Airline Name Scraping: The names of all airlines are scraped from the A-Z index page linked above.
- URL Formation: The URLs for each airline's review page are constructed based on the website's structure.
- Review Data Scraping: Each airline's review page is scraped to collect information regarding customer reviews.
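The URL formation step can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the base path `https://www.airlinequality.com/airline-reviews` and the helper names `airline_slug` and `review_url` are assumptions introduced here, and the real site may use slugs that differ from the ones this produces.

```python
import re
import unicodedata

# Assumed base path for per-airline review pages; not stated in this README.
BASE_URL = "https://www.airlinequality.com/airline-reviews"


def airline_slug(name: str) -> str:
    """Turn a display name into a lowercase, hyphen-separated ASCII slug."""
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def review_url(name: str) -> str:
    """Build the (assumed) review-page URL for one airline."""
    return f"{BASE_URL}/{airline_slug(name)}"


print(review_url("British Airways"))
```

Keeping slug generation in its own function makes it easy to adjust if some airline names turn out not to follow this pattern.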
The data collection process involves the following technologies:
- Beautiful Soup: Used for parsing the website and extracting relevant information.
- Pandas: Used for storing the scraped data and converting it to CSV.
- Requests: Employed for making HTTP requests to fetch web pages.
- Unicodedata: Used for handling and processing Unicode characters.
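As a rough sketch of how these libraries might fit together, the example below parses a small inline HTML snippet, normalizes the text, and serializes the result as CSV. The markup in `SAMPLE_HTML` (the `article.review`, `h2`, and `p.body` elements) is hypothetical and does not reflect the real page structure, the helpers `clean` and `parse_reviews` are illustrative names, and NFKC normalization is an assumed choice.

```python
import unicodedata

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: the tag and class names are placeholders,
# not the real structure of the review pages.
SAMPLE_HTML = """
<article class="review"><h2>Great\u00a0crew</h2>
<p class="body">Caf\u00e9 service was \ufb01ne.</p></article>
<article class="review"><h2>Late departure</h2>
<p class="body">Two hours behind schedule.</p></article>
"""


def clean(text: str) -> str:
    """Apply NFKC normalization and trim surrounding whitespace."""
    return unicodedata.normalize("NFKC", text).strip()


def parse_reviews(html: str) -> pd.DataFrame:
    """Extract one row per review from the (hypothetical) markup."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for article in soup.find_all("article", class_="review"):
        rows.append({
            "title": clean(article.h2.get_text()),
            "body": clean(article.find("p", class_="body").get_text()),
        })
    return pd.DataFrame(rows, columns=["title", "body"])


df = parse_reviews(SAMPLE_HTML)
# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

In a live run, the HTML would come from an HTTP request such as `requests.get(url, timeout=30).text` rather than from an inline string, and `df.to_csv("reviews.csv", index=False)` would write the file to disk.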
The refined dataset is available for analysis and exploration at: https://www.kaggle.com/datasets/juhibhojani/airline-reviews