Hepsiburada Review Scraper

Hepsiburada review/comment and rating scraper. Turkish text dataset creator for data science and NLP projects. Nearly 30M reviews with category and product links can be crawled and used for text classification, sentiment analysis, text mining, NLP models etc. Supported by multithreading, written in Python.

Prerequisites

$ pip3 install -r requirements.txt

Installation

$ git clone https://github.com/0x01h/hepsiburada-review-scraper.git
$ cd hepsiburada-review-scraper
$ python3 hepsiburada.py

Usage

Program provides an human-friendly interactive shell for users.

Features

Shutdown computer after finishing: Optional choice for deep and long scrapings.
Threads: Try to give a proper number. (Recommended value is 64.)
Timeout: Giving a large number could result in long waiting times, small numbers could lead connection failures. (Recommended time range is 15-30 seconds.)
Pagination Depth: Maximum number of paginated review pages for each product.

You can track your progress via progress bars. categories.txt, products.txt, hepsiburada.txt will be saving to your current directory.

Do not scrape aggressively! Otherwise, you will be caught by captcha challenge!

For educational purposes only.

orhanabar/hepsiburada-review-scraper

Hepsiburada Review Scraper

Prerequisites

Installation

Usage

Features