Park4Night Scraper

Note: This repository is intended only as a study project.

Experimental scraper that retrieves place detail page data from the new Park4Night website. It is built with Node.js and the Playwright API, and stores the results in Supabase.

How

The enqueuePlaceList function calls the getPlaceIdList function, which reads the range.json file and downloads the requested range of IDs from a Supabase table called places. This table contains all the available place IDs from Park4Night. They were retrieved from an old, now-removed public endpoint and converted from JSON to SQL rows.

A file named queueList.json will be created, containing a list of IDs to be scanned. The extractData function will then be enqueued to process each ID. Once the dequeue process is completed, the program will execute the updateRange function to download the next set of IDs to be scanned.
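
The cycle described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the Range shape, drainQueue, nextRange, and runCycle are hypothetical names, fetchIds stands in for getPlaceIdList, and extract stands in for extractData. The real format of range.json is not shown here, so a simple from/to window is assumed.

```typescript
// Hypothetical window of place IDs; the real range.json layout may differ.
interface Range {
  from: number;
  to: number;
}

type FetchIds = (range: Range) => Promise<number[]>; // stand-in for getPlaceIdList
type Extract = (id: number) => Promise<void>; // stand-in for extractData

// Process every ID with at most `concurrent` workers running at once.
async function drainQueue(
  ids: number[],
  concurrent: number,
  worker: Extract
): Promise<void> {
  const queue = [...ids];
  const runners = Array.from({ length: Math.max(1, concurrent) }, async () => {
    while (true) {
      const id = queue.shift();
      if (id === undefined) break; // queue is empty, this runner is done
      await worker(id);
    }
  });
  await Promise.all(runners);
}

// Move the window forward by `step` IDs (stand-in for updateRange).
function nextRange(range: Range, step: number): Range {
  return { from: range.to + 1, to: range.to + step };
}

// One full cycle: download IDs, scan them, then compute the next range.
async function runCycle(
  range: Range,
  step: number,
  concurrent: number,
  fetchIds: FetchIds,
  extract: Extract
): Promise<Range> {
  const ids = await fetchIds(range);
  await drainQueue(ids, concurrent, extract);
  return nextRange(range, step);
}
```

In this sketch, step and concurrent would plausibly come from UPDATE_RANGE and CONCURRENT. The workers share one queue, so a slow page does not block the other runners.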

To retrieve data such as contact information, you need to be logged in. You can set your PHPSESSID in the storageState file or use the login file to dynamically set the session. (Please note that the provided file is currently an example.)
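
As a rough illustration of the first option, the sketch below builds an object in the shape Playwright uses for storage state, holding a single PHPSESSID cookie. The function name buildStorageState is hypothetical, and the cookie domain and the httpOnly, secure, and sameSite values are assumptions rather than values taken from the site.

```typescript
// One cookie entry in Playwright's storage-state format.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number;
  httpOnly: boolean;
  secure: boolean;
  sameSite: "Strict" | "Lax" | "None";
}

interface StorageState {
  cookies: StoredCookie[];
  origins: { origin: string; localStorage: { name: string; value: string }[] }[];
}

// Build a storage state that carries only the session cookie.
// The default domain is an assumption derived from BASE_URL.
function buildStorageState(
  sessionId: string,
  domain: string = "www.park4night.com"
): StorageState {
  return {
    cookies: [
      {
        name: "PHPSESSID",
        value: sessionId,
        domain,
        path: "/",
        expires: -1, // -1 marks a session cookie
        httpOnly: true, // assumed
        secure: true, // assumed
        sameSite: "Lax", // assumed
      },
    ],
    origins: [],
  };
}
```

The result of JSON.stringify(buildStorageState("your-session-id"), null, 2) could be saved as the storageState file and passed to Playwright through the storageState option of browser.newContext.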

Installation

Download the p4n-scraper repo and run:

  npm install

Environment Variables

To run this project, add the following environment variables to your .env file (see the env.example file):

BASE_URL = https://www.park4night.com
BASE_PLACE_PAGE_URL = place
BASE_LOGIN_URL = auth/login
P4N_USERNAME
P4N_PASSWORD
SUPABASE_KEY
SUPABASE_URL
UPDATE_RANGE = 5000
CONCURRENT = 5

Set JAVASCRIPT to false to disable JavaScript and scrape much faster
JAVASCRIPT = true

Enable only the scrape modules you need, or add your own.
No image-retrieval module is currently provided.

GET_TITLE = true
GET_CONTACTS = true
GET_ADDRESS = true
GET_USEFUL_INFORMATION = true
GET_SERVICES = true
GET_ACTIVITIES = true
GET_LOWER_RATING_IDS = false
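
To show how these variables might be consumed, here is a minimal sketch that turns an environment object into typed settings. The helpers flag, int, and loadConfig are hypothetical and not part of the repository. The place URL pattern (BASE_URL/BASE_PLACE_PAGE_URL/id) is an assumption. The javaScriptEnabled key matches the Playwright browser context option of that name.

```typescript
type Env = Record<string, string | undefined>;

// Read a "true"/"false" variable, using the fallback when it is unset.
function flag(env: Env, key: string, fallback: boolean): boolean {
  const raw = (env[key] ?? "").trim().toLowerCase();
  return raw === "" ? fallback : raw === "true";
}

// Read an integer variable, using the fallback when it is unset or invalid.
function int(env: Env, key: string, fallback: number): number {
  const n = Number.parseInt(env[key] ?? "", 10);
  return Number.isNaN(n) ? fallback : n;
}

const MODULE_KEYS = [
  "GET_TITLE",
  "GET_CONTACTS",
  "GET_ADDRESS",
  "GET_USEFUL_INFORMATION",
  "GET_SERVICES",
  "GET_ACTIVITIES",
  "GET_LOWER_RATING_IDS",
];

function loadConfig(env: Env) {
  // Strip trailing slashes so the joined URL has exactly one separator.
  const base = (env["BASE_URL"] ?? "https://www.park4night.com").replace(/\/+$/, "");
  const placePath = env["BASE_PLACE_PAGE_URL"] ?? "place";
  return {
    // Assumed URL pattern for a place detail page.
    placeUrl: (id: number): string => `${base}/${placePath}/${id}`,
    updateRange: int(env, "UPDATE_RANGE", 5000),
    concurrent: int(env, "CONCURRENT", 5),
    // Could be passed to Playwright's browser.newContext().
    contextOptions: { javaScriptEnabled: flag(env, "JAVASCRIPT", true) },
    // Only the modules explicitly set to true are enabled.
    modules: MODULE_KEYS.filter((key) => flag(env, key, false)),
  };
}
```

In a Node.js process, loadConfig(process.env) would be the typical call once the .env file has been loaded.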

Usage

npm run start

Run Playwright tests

npm run test

Optional

To convert GeoJSON data to vector tiles, use Tippecanoe from Mapbox.
In the json_to_geojson folder you will find an index.ts with a launchConversion function that takes a JSON file of places and transforms it into .geojson spatial data with the following properties:

title,
place_id,
code,