I am surprised that data collection for web agents is still hard. This is a hack done in one day to demonstrate a simple approach to data collection for web agents.
You can use the hosted version at bib.zhuhao.me or you can run your own frontend through
cd frontend
bun run dev
Run the backend server.
cd backend
# pip install uv # if you don't have uv installed.
uv run playwright install
uv run python main.py
Caution
Use http
only for local development purpose. Please use https://bib.zhuhao.me
or your self-hosted version together
with a securly hosted backend (using https
and wss
endpoints).
Navigate to http://bib.zhuhao.me and enter the URL you hosted the frontend on.
Put in the backend url (default is http://localhost:8000
).
When you input a URL, the backend will open the URL in a browser. The screenshot is updated at 33Hz, which can be tuned in the backend. Now it supports recording the following actions:
- Single click
- Hover
- Scroll
- Type
MIT License @ 2024 Hao Zhu
Cite as
@misc{bib,
author = {Hao Zhu},
title = {Browser in Browser data collection toolkit},
year = {2024},
howpublished = {\url{https://bib.zhuhao.me}}
}