playwright-go-server

Background

Search engines usually return only a webpage's URL and a short snippet, but some tasks need the complete page content. The playwright-go-server project addresses this: it uses browser automation to fetch the full HTML of a webpage and can convert it to Markdown, which is more convenient for subsequent processing by large language models.

Features

Webpage Content Fetching: Uses a browser pool (based on Playwright) to fetch the full HTML content of a given URL.
Markdown Conversion: Converts the fetched HTML content into Markdown format for easier text processing and inference by large models.
Efficient and Stable: Lazily initializes a global session pool on first use and reuses browser instances across requests, so a new browser does not have to be launched for every fetch.
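
The lazy, global pool described above can be illustrated with a small sketch. It is not the project's actual code: `session`, `sessionPool`, `getPool`, and `initRuns` are hypothetical names, and a plain struct stands in for a real Playwright browser session so that the example runs with the standard library only.

```go
package main

import (
	"fmt"
	"sync"
)

// session is a hypothetical stand-in for a reusable browser session.
type session struct{ id int }

// sessionPool hands out a fixed set of sessions through a buffered channel.
type sessionPool struct{ idle chan *session }

var (
	pool     *sessionPool
	poolOnce sync.Once
	initRuns int // counts how often the initializer ran (illustration only)
)

// getPool builds the global pool on first use; later calls return the
// same pool and ignore the size argument.
func getPool(size int) *sessionPool {
	poolOnce.Do(func() {
		initRuns++
		p := &sessionPool{idle: make(chan *session, size)}
		for i := 0; i < size; i++ {
			// A real server would launch or attach to a browser here.
			p.idle <- &session{id: i}
		}
		pool = p
	})
	return pool
}

// acquire blocks until a session is free.
func (p *sessionPool) acquire() *session { return <-p.idle }

// release returns a session to the pool for reuse.
func (p *sessionPool) release(s *session) { p.idle <- s }

func main() {
	p := getPool(2)
	s := p.acquire()
	fmt.Println("using session", s.id)
	p.release(s)
	fmt.Println("initializer runs:", initRuns)
}
```

Because `sync.Once` guards the initializer, concurrent first requests still create the pool exactly once, and the buffered channel makes callers wait when every session is in use.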

Installation & Dependencies

Clone the repository:

git clone https://github.com/litongjava/playwright-go-server.git
cd playwright-go-server

Install Go dependencies:
```
go mod tidy
```
Build the server binary:
```
go build
```

Docker

docker build -t litongjava/playwright-go-server:1.0.0 .
docker run -dit --name playwright-go-server --net=host litongjava/playwright-go-server:1.0.0

Usage

The project provides an HTTP service with an endpoint to fetch webpage content and convert it based on the provided format.

Endpoint: /fetch
Query Parameters:
- url: The URL of the webpage to fetch (required)
- format: The format of the returned content (optional; when set to markdown, returns content in Markdown format; otherwise returns the raw HTML)
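
The parameter rules above could be interpreted on the server side roughly as follows. This is a minimal sketch, not the project's actual handler: `fetchRequest` and `parseFetchQuery` are hypothetical names, and treating only the exact value `markdown` as a request for conversion is an assumption based on the description above.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// fetchRequest is a hypothetical struct holding the parsed query parameters.
type fetchRequest struct {
	Target   string // value of the required "url" parameter
	Markdown bool   // true only when format=markdown
}

// parseFetchQuery parses a raw query string such as
// "url=https://example.com&format=markdown".
func parseFetchQuery(rawQuery string) (fetchRequest, error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return fetchRequest{}, err
	}
	target := q.Get("url")
	if target == "" {
		return fetchRequest{}, errors.New("missing required parameter: url")
	}
	// Any other format value (or none) falls back to raw HTML.
	return fetchRequest{Target: target, Markdown: q.Get("format") == "markdown"}, nil
}

func main() {
	req, err := parseFetchQuery("url=https%3A%2F%2Fexample.com&format=markdown")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", req)
}
```

If the target URL carries its own query string, it should be percent-encoded by the caller; otherwise its `&` characters are read as separators between the parameters of `/fetch`.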

Example

Fetching Markdown-formatted content:

GET /fetch?url=https://example.com&format=markdown

curl "http://localhost/fetch?url=https://www.kapiolani.hawaii.edu/&format=markdown"
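
The same request can be made from Go. The sketch below uses only the standard library; `buildFetchURL` and `fetchPage` are hypothetical helper names, the base address `http://localhost` is copied from the curl example above, and the 60-second timeout is an arbitrary assumption, so adjust both to match your deployment.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// buildFetchURL assembles the /fetch request URL and percent-encodes the
// target so that its own "?" and "&" characters survive intact.
func buildFetchURL(base, target, format string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/fetch"
	q := url.Values{}
	q.Set("url", target)
	if format != "" {
		q.Set("format", format)
	}
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// fetchPage calls the server and returns the response body as a string.
func fetchPage(base, target, format string) (string, error) {
	endpoint, err := buildFetchURL(base, target, format)
	if err != nil {
		return "", err
	}
	// Rendering a page in a browser can be slow, so allow a generous timeout.
	client := &http.Client{Timeout: 60 * time.Second}
	resp, err := client.Get(endpoint)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	return string(body), nil
}

func main() {
	md, err := fetchPage("http://localhost", "https://www.kapiolani.hawaii.edu/", "markdown")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetch failed:", err)
		return
	}
	fmt.Println(md)
}
```

Passing an empty `format` omits the parameter, in which case the server is expected to return the raw HTML.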

Running the Server

Start the service using the following command:

go run main.go

Once the server is running, you can make HTTP requests to the endpoint.

Contributing

Contributions are welcome! Please feel free to open issues or submit pull requests to improve the project.

License

This project is licensed under the MIT License.