This project is an Express.js application bundled with Puppeteer for web automation. It runs inside a Docker container and exposes an API with two endpoints for web scraping tasks: taking screenshots and extracting webpage titles.
- Screenshot Endpoint: Takes a screenshot of a given webpage URL and returns it as a PNG.
- Title Scraper Endpoint: Scrapes and returns the title of a given webpage.
- Built using Puppeteer, which automates Chromium for headless browser tasks.
- Packaged in a Docker container for easy deployment.
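The repository's actual `app.js` is not reproduced in this README, but a minimal sketch of how such a server is typically wired up could look like the following (the file name and port come from the run instructions below; everything else is illustrative):

```js
// app.js — minimal bootstrap sketch, not the project's actual source.
const express = require('express');

const app = express();
app.use(express.json()); // parse JSON request bodies such as { "url": "..." }

// Route handlers for /screenshot and /scrapeTitle are sketched in the
// endpoint sections below.

app.listen(3000, () => {
  console.log('Puppeteer Express API listening on port 3000');
});
```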
- Docker: Ensure Docker is installed on your system.
- Node.js: Required only if you run the application outside Docker.
Endpoint: POST /screenshot

Description: Takes a screenshot of a webpage.

Request Body:

```json
{
  "url": "https://example.com"
}
```

Response: PNG image of the webpage.
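Continuing the `app.js` sketch above, a handler for this endpoint could look roughly like this (option names such as `networkidle2` are illustrative choices, not taken from this repository):

```js
const puppeteer = require('puppeteer');

// Hypothetical handler for POST /screenshot.
app.post('/screenshot', async (req, res) => {
  const { url } = req.body;
  if (!url) {
    return res.status(400).send('Missing "url" in request body');
  }

  let browser;
  try {
    // Depending on the container setup, launch flags such as --no-sandbox may be needed.
    browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    const png = await page.screenshot({ type: 'png' });

    res.set('Content-Type', 'image/png');
    res.send(Buffer.from(png));
  } catch (err) {
    res.status(500).send(`Failed to capture screenshot: ${err.message}`);
  } finally {
    if (browser) await browser.close();
  }
});
```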
Endpoint: POST /scrapeTitle

Description: Extracts the title of a webpage.

Request Body:

```json
{
  "url": "https://example.com"
}
```

Response: The title of the webpage as plain text.
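A corresponding handler, again as a hedged sketch building on the same `app` and `puppeteer` objects as the snippets above:

```js
// Hypothetical handler for POST /scrapeTitle.
app.post('/scrapeTitle', async (req, res) => {
  const { url } = req.body;
  if (!url) {
    return res.status(400).send('Missing "url" in request body');
  }

  let browser;
  try {
    browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    const title = await page.title();

    res.type('text/plain').send(title);
  } catch (err) {
    res.status(500).send(`Failed to scrape title: ${err.message}`);
  } finally {
    if (browser) await browser.close();
  }
});
```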
To run with Docker:

- Clone the Repository:

  ```bash
  git clone <repository-url>
  cd puppeteer-express-api
  ```

- Build the Docker Image:

  ```bash
  docker build -t puppeteer-express-api .
  ```

- Run the Container:

  ```bash
  docker run -p 3000:3000 puppeteer-express-api
  ```

- Access the API at http://localhost:3000.
To run locally (without Docker):

- Install Dependencies:

  ```bash
  npm install
  ```

- Start the Application:

  ```bash
  node app.js
  ```

- Access the API at http://localhost:3000.
Screenshot: use curl to send a POST request and save the PNG response:

```bash
curl -X POST -H "Content-Type: application/json" -d '{"url": "https://example.com"}' http://localhost:3000/screenshot > screenshot.png
```
Title scraping: use curl to send a POST request; the title is returned as plain text:

```bash
curl -X POST -H "Content-Type: application/json" -d '{"url": "https://example.com"}' http://localhost:3000/scrapeTitle
```
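If you prefer calling the API from Node.js instead of curl, a sketch using the global `fetch` available in Node 18+ could look like this (the script is illustrative and not part of the project):

```js
const fs = require('fs/promises');

async function main() {
  // Screenshot: save the returned PNG to disk.
  const shot = await fetch('http://localhost:3000/screenshot', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: 'https://example.com' }),
  });
  await fs.writeFile('screenshot.png', Buffer.from(await shot.arrayBuffer()));

  // Title: the response body is plain text.
  const titleRes = await fetch('http://localhost:3000/scrapeTitle', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: 'https://example.com' }),
  });
  console.log(await titleRes.text());
}

main().catch(console.error);
```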
- Security: Ensure the API is secured before exposing it publicly to prevent misuse; one possible approach is sketched below.
- Resource Limits: Puppeteer can be resource-intensive. Allocate sufficient memory when deploying.
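As one illustration of the security note, a simple API-key check could be placed in front of the routes. The header name `x-api-key` and the `API_KEY` environment variable are assumptions for this sketch, not part of the project:

```js
// Hypothetical API-key middleware — register before the route handlers.
app.use((req, res, next) => {
  if (req.get('x-api-key') !== process.env.API_KEY) {
    return res.status(401).send('Unauthorized');
  }
  next();
});
```

For the resource note, Docker's `--memory` flag (for example, `docker run --memory=1g -p 3000:3000 puppeteer-express-api`) is one way to cap the container's memory.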
This project is licensed under the MIT License.