## Difference between Web Crawlers and Web Scrapers

| Feature | Web Scraper | Web Crawler |
| --- | --- | --- |
| Purpose | Extracts specific data from websites | Navigates and indexes web content |
| Functionality | Focuses on a specific set of data | Explores the web broadly |
| Use Case | Data harvesting, analysis | Search engine indexing, SEO analysis |
| Data Handling | Extracts and processes targeted data | Collects data from many sources |
| Complexity | Can be complex depending on the data | Generally simpler in design |
| Speed | Varies based on the data complexity | Usually faster at covering more ground |
| Customization | Highly customizable for data needs | Less need for customization |
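
To make the contrast in the table concrete, here is a minimal sketch that runs both approaches on a tiny in-memory "site". The `pages` map, its URLs, and the `price:` text format are made up for illustration; a real crawler or scraper would fetch pages over HTTP.

```typescript
// Hypothetical in-memory site: URL -> page text and outgoing links.
type Page = { text: string; links: string[] };

const pages: Record<string, Page> = {
  "/": { text: "Home", links: ["/a", "/b"] },
  "/a": { text: "Product A price: 19.99", links: ["/b"] },
  "/b": { text: "Product B price: 5.00", links: ["/", "/a"] },
};

// Crawler: follows links breadth-first and records every URL it discovers.
function crawl(start: string): string[] {
  const seen = new Set<string>([start]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const url = queue.shift()!;
    for (const link of pages[url]?.links ?? []) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return Array.from(seen);
}

// Scraper: visits one known page and extracts one specific field.
function scrapePrice(url: string): number | null {
  const match = pages[url]?.text.match(/price:\s*([\d.]+)/);
  return match ? Number(match[1]) : null;
}
```

The crawler cares about coverage (which URLs exist), while the scraper cares about one value on one page.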

## Open Source Web Scrapers

- Puppeteer
- Cheerio
- Bright Data (a commercial proxy and scraping service rather than an open-source tool)

## Problems

- Websites can block you through IP blocking and rate limiting if you send too many requests
- Dynamic content: traditional web scrapers cannot handle dynamically rendered content
- Complex navigation is not always possible
- IP rotation is not possible without a proxy service
- CAPTCHAs
- Some content needs human interpretation while scraping
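
Rate limiting is usually the first of these problems to show up. One common mitigation is to retry failed requests with exponential backoff. The sketch below is only an illustration and not part of this project: `backoffDelay` and `withRetry` are hypothetical helper names, and the delays and retry count are arbitrary.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(baseMs * Math.pow(2, attempt), maxMs);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `task`, retrying up to `retries` more times when it throws.
async function withRetry<T>(
  task: () => Promise<T>,
  retries = 3,
  baseMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= retries) throw error; // out of retries: surface the error
      await sleep(backoffDelay(attempt, baseMs));
    }
  }
}
```

A scraper could wrap its request in it, for example `withRetry(() => axios.get(url))`. Backoff only helps with rate limits; it does nothing against IP bans or CAPTCHAs.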

## Steps to develop this web scraper

- Develop the UI
- Create actions in /lib/actions

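```typescript
"use server";

// Assumed relative path to the scraper module in /lib/scraper.
import { scrapeAmazonProduct } from "../scraper";

// Server action: scrapes the product page at `productUrl`.
export async function scrapeAndStoreProduct(productUrl: string) {
  if (!productUrl) return;

  try {
    const scrapedProduct = await scrapeAmazonProduct(productUrl);
    // Storing `scrapedProduct` is not implemented yet.
  } catch (error: any) {
    console.log(error);
  }
}
```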

- Install the axios and cheerio packages: `npm i axios cheerio`
- Create the scraper function in /lib/scraper

```typescript
"use server";

import axios from "axios";
import * as cheerio from "cheerio";

export async function scrapeAmazonProduct(url: string) {
  if (!url) return;

  // Bright Data proxy configuration
  const username = String(process.env.BRIGHT_DATA_USERNAME);
  const password = String(process.env.BRIGHT_DATA_PASSWORD);
  const port = 22225;
  // Random integer in [0, 100000) used as the proxy session id
  const session_id = (100000 * Math.random()) | 0;

  // NOTE: axios documents proxy settings under a `proxy` key; confirm this
  // layout against the axios and Bright Data docs before relying on it.
  const options = {
    auth: {
      username: `${username}-session-${session_id}`,
      password,
    },
    host: "brd.superproxy.io",
    port,
    rejectUnauthorized: false,
  };

  try {
    // Fetch the product page and log the raw HTML
    const response = await axios.get(url, options);
    console.log(response.data);
  } catch (error: any) {
    console.log(error);
  }
}
```

Now, after pasting an Amazon product link into the search bar, the scraped HTML should appear in the console.
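
One thing to watch in the scraper function: `String(process.env.X)` turns a missing variable into the literal string `"undefined"`, so a misconfigured environment only fails later, at the proxy. Here is a small sketch of failing fast instead. `requireEnv` and `buildProxyUsername` are hypothetical helpers, not part of the project.

```typescript
// Returns the variable's value, or throws if it is missing or empty.
// In the scraper this would be called as requireEnv("BRIGHT_DATA_USERNAME", process.env).
function requireEnv(
  name: string,
  env: Record<string, string | undefined>,
): string {
  const value = env[name];
  if (!value) throw new Error(`Missing environment variable: ${name}`);
  return value;
}

// Builds the per-session proxy username in the `<user>-session-<id>` shape
// used by the scraper function.
function buildProxyUsername(username: string, sessionId: number): string {
  return `${username}-session-${sessionId}`;
}
```

With this in place, a missing credential produces a clear error message at startup rather than an authentication failure from the proxy.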

- Set up cheerio to parse the scraped HTML content
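
As a rough sketch of that step, the function below loads HTML with `cheerio.load` and reads a couple of fields. The `#productTitle` and `.a-price-whole` selectors and the `extractProduct` name are assumptions for illustration; Amazon's markup changes often, so check the selectors against the live page.

```typescript
import * as cheerio from "cheerio";

// Parses product HTML and pulls out a title and a numeric price.
export function extractProduct(html: string) {
  const $ = cheerio.load(html);

  // Assumed selectors; verify them against the real page markup.
  const title = $("#productTitle").text().trim();
  const priceText = $(".a-price-whole").first().text();

  // Keep only digits and the decimal point, e.g. "1,299." becomes "1299."
  // An empty or unparsable price becomes null.
  const price = Number(priceText.replace(/[^\d.]/g, "")) || null;

  return { title, price };
}
```

Inside `scrapeAmazonProduct`, this could be called as `extractProduct(response.data)` in place of the `console.log`.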

