This project is a simple data extraction and automation tool using JavaScript, Puppeteer, and Cheerio. The goal of the tool is to scrape product information from an ecommerce website, including the product title, price, and image URL.
- Scrape product information (title, price, and image URL) from an ecommerce website
- Easy customization to adapt to different websites
- Async/await syntax for better readability and error handling
- Proxy support to bypass IP-based blocking and improve anonymity
- Parallel scraping for improved performance and speed
- Node.js 12.x or higher
- Puppeteer
- Cheerio
- Clone the repository or download the `data_extraction_tool.js` file.
- Install the required dependencies using npm: `npm install puppeteer cheerio`
- Update the `urls` array in the `data_extraction_tool.js` file with the desired website URLs.
- Modify the CSS selectors within the `scrapeProductData` function to match the structure of the target website's product elements.
- If you want to use a proxy, update the `proxy` variable with your proxy server's address and port. If you don't want to use a proxy, set the `proxy` variable to `null`.
- Run the script using the following command: `node data_extraction_tool.js`
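As a rough illustration of the proxy step above, the sketch below shows one way a `proxy` value could be turned into Puppeteer launch options. The helper name `buildLaunchOptions` and the `host:port` string format are assumptions made for this example and may not match the actual script; `--proxy-server` is a Chromium command-line switch that Puppeteer passes through via `args`.

```javascript
// Hypothetical helper: converts the `proxy` setting into launch options.
// `proxy` is assumed to be either null or a "host:port" string.
function buildLaunchOptions(proxy) {
  const options = { headless: true, args: [] };
  if (proxy) {
    // Route all browser traffic through the given proxy server.
    options.args.push(`--proxy-server=${proxy}`);
  }
  return options;
}

// No proxy: the args list stays empty.
console.log(buildLaunchOptions(null));
// With a proxy: the Chromium switch is added to args.
console.log(buildLaunchOptions('127.0.0.1:8080'));

// With Puppeteer installed, the result would be passed to launch, e.g.:
//   const browser = await puppeteer.launch(buildLaunchOptions(proxy));
```

Keeping this logic in a small function means the `null` case needs no special handling at the call site.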
The script will output the scraped product data to the console. You can modify the script to save the data to a file or perform other actions as needed.
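For instance, saving the results to a file could look roughly like the sketch below. The `saveProducts` helper, the `products.json` file name, and the sample record are all hypothetical; the object shape (`title`, `price`, `imageUrl`) is assumed from the fields listed above.

```javascript
const fs = require('fs');

// Hypothetical helper: writes an array of scraped products to a JSON file.
function saveProducts(products, filePath) {
  // Pretty-print with 2-space indentation so the file is easy to inspect.
  fs.writeFileSync(filePath, JSON.stringify(products, null, 2), 'utf8');
  return filePath;
}

// Sample data standing in for real scraper output.
const sampleProducts = [
  { title: 'Example product', price: '$19.99', imageUrl: 'https://example.com/image.jpg' },
];

// Writes products.json into the current working directory.
saveProducts(sampleProducts, 'products.json');
```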
To adapt the script to other websites, you may need to modify the following parts:
- Update the `urls` array with the desired website URLs.
- Update the CSS selectors within the `scrapeProductData` function to match the structure of the target website's product elements.
- Add or remove data fields as needed, adjusting both the scraping logic and the output data structure.
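One possible way to keep these adjustments in a single place is a field map that pairs each output field with a selector, as sketched below. This is not necessarily how `scrapeProductData` is written: the `fields` object, the `extractProduct` helper, and every selector shown are placeholders chosen for illustration.

```javascript
// Hypothetical field map: each entry names a CSS selector and, optionally,
// an attribute to read instead of the element's text content.
const fields = {
  title: { selector: '.product-title' },
  price: { selector: '.product-price' },
  imageUrl: { selector: 'img', attr: 'src' },
};

// `$` is a loaded Cheerio instance; `element` is one product container.
function extractProduct($, element, fieldMap) {
  const product = {};
  for (const [name, { selector, attr }] of Object.entries(fieldMap)) {
    // Take the first match inside this product container only.
    const node = $(element).find(selector).first();
    product[name] = attr ? node.attr(attr) : node.text().trim();
  }
  return product;
}

// With Cheerio installed, usage might look like this
// ('.product' is a placeholder container selector):
//   const $ = cheerio.load(html);
//   const results = [];
//   $('.product').each((i, el) => results.push(extractProduct($, el, fields)));
```

With this layout, adding or removing a data field is a one-line change to `fields`, and the output objects follow automatically.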
This project is licensed under the MIT License. See the LICENSE file for details.