A microservice to fetch URLs and render them as:
- HTML: GET /html?url=
- Markdown: GET /markdown?url=
- PNG screenshot: GET /image?url=
npm install
# or
yarn install

- yarn dev - Start the development server with hot reloading using nodemon
- yarn start - Start the service using Docker Compose
- yarn test - Run all unit tests
- yarn test:watch - Run tests in watch mode
- yarn test:e2e - Run end-to-end tests
- yarn test:e2e:docker - Run end-to-end tests against the Docker container
- yarn test:all - Run all tests, including build and e2e tests
- yarn build - Build and start the Docker container
- yarn examples:python - Run Python example scripts
- yarn examples:javascript - Run JavaScript example scripts
- yarn examples - Run all examples (requires build)
yarn dev
curl "http://localhost:3000/html?url=https://example.com"

# Build and run using Docker Compose
yarn start
# Or manually
docker build -t web-capture .
docker run -p 3000:3000 web-capture

GET /html?url=<URL>&engine=<ENGINE>

Returns the raw HTML content of the specified URL.
Parameters:
- url (required): The URL to fetch
- engine (optional): Browser engine to use (puppeteer or playwright). Default: puppeteer
Examples:
# Using default Puppeteer engine
curl "http://localhost:3000/html?url=https://example.com"
# Using Playwright engine (the URL is quoted so the shell does not treat & as a background operator)
curl "http://localhost:3000/html?url=https://example.com&engine=playwright"

GET /markdown?url=<URL>

Converts the HTML content of the specified URL to Markdown format.
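When these endpoints are called from code rather than curl, it helps to percent-encode the target URL so that its own query string is not mistaken for the service's parameters. The following is a minimal sketch, assuming Node 18+ (for the global fetch) and a service listening on http://localhost:3000. The helper names buildRequestUrl and fetchMarkdown are hypothetical and not part of the service.

```javascript
// Hypothetical client helpers; the names and the default base URL are assumptions.
const DEFAULT_BASE = 'http://localhost:3000';

// Build a request URL for one of the documented endpoints (html, markdown, image).
function buildRequestUrl(base, endpoint, targetUrl, engine) {
  const requestUrl = new URL(`/${endpoint}`, base);
  // URLSearchParams percent-encodes the target URL, including any & or ? inside it.
  requestUrl.searchParams.set('url', targetUrl);
  if (engine) {
    requestUrl.searchParams.set('engine', engine);
  }
  return requestUrl.toString();
}

// Fetch a page as Markdown; rejects when the service answers with a non-2xx status.
async function fetchMarkdown(targetUrl, base = DEFAULT_BASE) {
  const response = await fetch(buildRequestUrl(base, 'markdown', targetUrl));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Usage (requires the service to be running):
// fetchMarkdown('https://example.com').then(console.log);
```

Encoding matters most when the target URL has its own query string, for example https://example.com/?a=1&b=2, where an unencoded & would otherwise end the url parameter early.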
GET /image?url=<URL>&engine=<ENGINE>

Returns a PNG screenshot of the specified URL.
Parameters:
- url (required): The URL to capture
- engine (optional): Browser engine to use (puppeteer or playwright). Default: puppeteer
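The same two parameters can be passed from a Node script that saves the screenshot to disk. This is only a sketch, assuming Node 18+ and a service on http://localhost:3000. saveScreenshot and isPng are hypothetical helpers, and the PNG signature check is an extra safeguard added here, not something the service requires.

```javascript
const { writeFile } = require('node:fs/promises');

// The eight-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Return true when the buffer begins with the PNG signature.
function isPng(buffer) {
  return (
    buffer.length >= PNG_SIGNATURE.length &&
    buffer.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)
  );
}

// Hypothetical helper: request a screenshot and write it to outputPath.
// The default base URL is an assumption about where the service is running.
async function saveScreenshot(targetUrl, outputPath, engine, base = 'http://localhost:3000') {
  const requestUrl = new URL('/image', base);
  requestUrl.searchParams.set('url', targetUrl);
  if (engine) {
    requestUrl.searchParams.set('engine', engine);
  }

  const response = await fetch(requestUrl);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const png = Buffer.from(await response.arrayBuffer());
  if (!isPng(png)) {
    throw new Error('Response body is not a PNG image');
  }
  await writeFile(outputPath, png);
  return png.length; // number of bytes written
}

// Usage (requires the service to be running):
// saveScreenshot('https://example.com', 'screenshot.png', 'playwright');
```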
Examples:
# Using default Puppeteer engine
curl "http://localhost:3000/image?url=https://example.com" > screenshot.png
# Using Playwright engine (the URL is quoted so the shell does not treat & as a background operator)
curl "http://localhost:3000/image?url=https://example.com&engine=playwright" > screenshot.png

The service supports both Puppeteer and Playwright browser engines:
- Puppeteer: Default engine, mature and well-tested
- Playwright: Alternative engine with similar capabilities
You can choose the engine using the engine query parameter or by setting the BROWSER_ENGINE environment variable.
Supported engine values:
- puppeteer or pptr - Use Puppeteer
- playwright or pw - Use Playwright
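One way to picture how these values and the two configuration sources could be resolved is the sketch below. It is illustrative only and is not taken from the service's source. The precedence (query parameter first, then BROWSER_ENGINE, then the puppeteer default), the case-insensitive matching, and the error for unknown values are all assumptions, and normalizeEngine and resolveEngine are hypothetical names.

```javascript
// Aliases as listed in this README; the lookup logic itself is an assumption.
const ENGINE_ALIASES = {
  puppeteer: 'puppeteer',
  pptr: 'puppeteer',
  playwright: 'playwright',
  pw: 'playwright',
};
const DEFAULT_ENGINE = 'puppeteer';

// Map a user-supplied value to a canonical engine name.
function normalizeEngine(value) {
  if (value === undefined || value === null || value === '') {
    return DEFAULT_ENGINE;
  }
  const key = String(value).trim().toLowerCase();
  if (!Object.prototype.hasOwnProperty.call(ENGINE_ALIASES, key)) {
    // Assumption: unknown values are rejected rather than silently defaulted.
    throw new Error(`Unsupported engine: ${value}`);
  }
  return ENGINE_ALIASES[key];
}

// Assumed precedence: engine query parameter, then BROWSER_ENGINE, then the default.
function resolveEngine(queryValue, env = process.env) {
  return normalizeEngine(queryValue || env.BROWSER_ENGINE);
}
```

Under these assumptions, resolveEngine('pptr', { BROWSER_ENGINE: 'pw' }) picks Puppeteer because the per-request value wins over the environment.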
Environment Variable:
export BROWSER_ENGINE=playwright

The service is built with:
- Express.js for the web server
- Puppeteer and Playwright for headless browser automation and screenshots
- Turndown for HTML to Markdown conversion
- Jest for testing
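As a rough picture of how these pieces could fit together for the Markdown endpoint, the sketch below builds an Express-style route handler from two injected functions. It is a hypothetical illustration, not the service's actual code. createMarkdownHandler, fetchHtml, and toMarkdown are assumed names, and the 400 and 500 status codes are assumptions as well.

```javascript
// Build an Express-style handler for GET /markdown from two injected functions:
//   fetchHtml(url)   -> Promise<string>, for example backed by Puppeteer or Playwright
//   toMarkdown(html) -> string, for example (html) => new TurndownService().turndown(html)
function createMarkdownHandler({ fetchHtml, toMarkdown }) {
  return async function markdownHandler(req, res) {
    const targetUrl = req.query.url;
    if (!targetUrl) {
      // Assumption: a missing url parameter is reported as a client error.
      res.status(400).send('Missing required "url" query parameter');
      return;
    }
    try {
      const html = await fetchHtml(targetUrl);
      res.type('text/markdown').send(toMarkdown(html));
    } catch (error) {
      // Assumption: fetch or conversion failures are reported as a server error.
      res.status(500).send(`Failed to convert ${targetUrl}: ${error.message}`);
    }
  };
}

// Wiring with Express would look roughly like:
// app.get('/markdown', createMarkdownHandler({ fetchHtml, toMarkdown }));
```

Injecting the browser and the converter keeps the handler easy to exercise in Jest with simple fakes, without launching a headless browser.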
UNLICENSED