This scraper extracts job listings from the Rozee.pk website. Built in Node.js with the Puppeteer and Cheerio libraries, it collects data across multiple pages and saves it to a structured CSV file. To scrape a particular city, apply the filters on Rozee.pk, replace the job link in the script with the resulting city-specific URL, and change the output file name to suit your needs.

Sample dataset files for several cities are included in the repository:
- Bahawalpur (Bahawalpur_jobs.csv)
- Faisalabad (Faisalabad_jobs.csv)
- Gujranwala (Gujranwala_jobs.csv)
- Islamabad (Islamabad_jobs.csv)
- Karachi (Karachi_jobs.csv)
- Lahore (Lahore_jobs.csv)
- Multan (Multan_jobs.csv)
- Rawalpindi (Rawalpindi_jobs.csv)
- Scrapes job data across multiple pages with city-specific filters.
- Collects detailed information such as job titles, company names, skills, and more.
- Supports concurrent tab handling for faster performance.
- Saves output to a CSV file with customizable file names.
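
The concurrent tab handling mentioned above can be pictured as a small worker pool that caps how many pages are processed at once. The following is a minimal sketch under that assumption, not the script's actual code: `mapWithConcurrency` is a hypothetical helper, and in a Puppeteer scraper the `worker` callback would typically open a tab with `browser.newPage()`, load the URL with `page.goto()`, and close the tab when done.

```javascript
// Hypothetical helper (not part of the repository): run `worker` over `items`
// with at most `limit` tasks in flight. `limit` is assumed to be a positive integer.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner keeps claiming items until none are left.
  async function runner() {
    while (next < items.length) {
      const i = next++; // claimed synchronously, so no two runners share an index
      results[i] = await worker(items[i], i);
    }
  }

  // Start up to `limit` runners and wait for all of them to drain the queue.
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results; // results keep the order of `items`
}
```

A small limit keeps memory use and load on the site modest; raising it trades that for speed.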
- Node.js (>=14.0.0)
- npm (Node Package Manager)
- Clone this repository:
  git clone https://github.com/NomanSiddiqui0000/Rozee.pk-jobs-Scrapper.git
- Navigate to the project directory:
  cd Rozee.pk-jobs-Scrapper
- Install the required npm packages:
  npm install
- Open the rozee.pk_Jobs_Scrapper.js file and replace the URL in the mainPage.goto() function with the filtered job link for your city.
- Change the output file name in the csvWriter path if needed.
- Run the script:
  node rozee.pk_Jobs_Scrapper.js
- The data will be saved in the specified CSV file.
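
To illustrate what ends up in the output file, here is a dependency-free sketch of turning scraped records into CSV text. It is a stand-in for illustration only: the script itself writes through its csvWriter, and the `toCsv` and `escapeCsv` helpers and the column names in the usage note are hypothetical.

```javascript
// Quote a value when it contains a comma, double quote, or line break,
// doubling any embedded quotes (the usual CSV convention).
function escapeCsv(value) {
  const text = value == null ? '' : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Build CSV text: one header line, then one line per record.
// `headers` lists the object keys to emit, in column order.
function toCsv(headers, rows) {
  const lines = [headers.map(escapeCsv).join(',')];
  for (const row of rows) {
    lines.push(headers.map((key) => escapeCsv(row[key])).join(','));
  }
  return lines.join('\n') + '\n';
}
```

For example, `toCsv(['title', 'company'], [{ title: 'Developer, Senior', company: 'Acme' }])` quotes the title because it contains a comma, so the row still parses as two columns.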
- City-Specific Filtering: Replace the default job link with a city-specific filtered link from Rozee.pk.
- CSV File Path: Update the file name in the csvWriter configuration to save the output as you prefer.
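
Both settings above are edited directly in the script. If you would rather not touch the source for each city, one option is to read them from the command line. This is a sketch of that idea, not an existing feature of the script: `parseConfig` and the default file name `jobs.csv` are assumptions made for illustration.

```javascript
// Hypothetical helper (not part of the repository): read the filtered job link
// and the output file name from the command line, for example:
//   node rozee.pk_Jobs_Scrapper.js <filtered-url> [output.csv]
function parseConfig(argv) {
  // argv[0] is the node binary and argv[1] the script path, so skip both.
  const [url, outputPath = 'jobs.csv'] = argv.slice(2);
  if (!url) {
    throw new Error('Usage: node rozee.pk_Jobs_Scrapper.js <filtered-url> [output.csv]');
  }
  new URL(url); // throws a TypeError if the link is not a valid absolute URL
  return { url, outputPath };
}
```

The returned `url` would then be passed to mainPage.goto() and `outputPath` used as the csvWriter path, in place of the hard-coded values.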
- Ensure your internet connection is stable for smooth scraping.
- This script is tailored for Rozee.pk's current structure. Any structural changes on the site may require updating the script.
This project is licensed under the MIT License.
Muhammad Noman GitHub Profile