Web Scraping Project with Node.js and Playwright

Table of Contents

  1. Introduction
  2. Features
  3. Requirements
  4. Installation
  5. Usage
  6. Contributing
  7. License

Introduction

This project scrapes content from a Shopee merchant page and individual product pages. The scraped data is then posted to our server via an API.

Features

  • Automates login to Shopee
  • Handles captcha via a Python script
  • Scrapes data from a Shopee merchant page
  • Scrapes details from individual product pages
  • Uploads the scraped data to the server through an API
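
The last feature, sending scraped data to the server, could look roughly like the sketch below. This is an illustration rather than the project's actual code: the function names, the endpoint URL, and the `{ products: [...] }` payload shape are all assumptions, and it relies on the global `fetch` available in Node.js 18 or later.

```javascript
// Hypothetical sketch: the payload shape and function names are assumptions,
// not taken from this repository.

// Build the HTTP request for uploading a batch of scraped products.
function buildUploadRequest(endpoint, products) {
  return {
    url: endpoint,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ products }),
    },
  };
}

// Send the request; `fetchImpl` can be swapped out, for example in tests.
async function uploadProducts(endpoint, products, fetchImpl = globalThis.fetch) {
  const { url, options } = buildUploadRequest(endpoint, products);
  const response = await fetchImpl(url, options);
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`);
  }
  return response.status;
}
```

Keeping the request construction separate from the network call makes the payload easy to inspect without contacting the server.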

Requirements

  • Node.js
  • Playwright
  • Python (for captcha handling)
  • A Shopee account to scrape with

Installation

  1. Clone this repository:

    git clone https://github.com/nsanzimfura-eric/web-scraping.git
  2. Navigate into the project directory:

    cd web-scraping
  3. Install dependencies:

    npm install
  4. Create your .env file from the sample:

    cp .env.sample .env
  5. Update .env with your API endpoints and Shopee account details.
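
A missing value in .env usually surfaces later as a confusing login or upload failure, so it can help to check the configuration at startup. The sketch below is one way to do that. The variable names `API_URL`, `SHOPEE_USERNAME`, and `SHOPEE_PASSWORD` are hypothetical (the real names are the ones listed in .env.sample), and it assumes the values have already been loaded into `process.env`, for example by a loader such as dotenv.

```javascript
// Hypothetical variable names; the real ones are listed in .env.sample.
const REQUIRED_VARS = ["API_URL", "SHOPEE_USERNAME", "SHOPEE_PASSWORD"];

// Return the names of required variables that are unset or blank.
function findMissingVars(env, required = REQUIRED_VARS) {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || String(value).trim() === "";
  });
}

// Throw one readable error that lists every missing variable.
function assertEnv(env = process.env) {
  const missing = findMissingVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing .env values: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnv()` once before the scraper starts would stop the run early with a message naming the values that still need to be filled in.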

Usage

  1. Start the scraper:

    npm start

Contributing

Nsanzimfura Eric is the author of, and a contributor to, this web-scraping app.

License

This project is licensed under the MIT License - see the LICENSE.md file for details.