GPT Crawler
Crawl a site to generate knowledge files for creating your own custom GPT from one or more URLs.
Example
Here is a custom GPT that I quickly made to help answer questions about how to use and integrate Builder.io by simply providing the URL to the Builder docs.
This project crawled the docs and generated the file that I uploaded as the basis for the custom GPT.
Try it out yourself by asking questions about how to integrate Builder.io into a site.
Note that you may need a paid ChatGPT plan to access this feature.
Get started
Prerequisites
Be sure you have Node.js >= 16 installed
Clone the repo
git clone https://github.com/builderio/gpt-crawler
Install Dependencies
npm i
If you do not have Playwright installed:
npx playwright install
Configure the crawler
Open config.ts and edit the url and selector properties to match your needs.
E.g. to crawl the Builder.io docs to make our custom GPT you can use:
export const config: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
};
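To get a feel for what a match pattern such as "https://www.builder.io/c/docs/**" lets through, here is a minimal sketch of a glob matcher. It assumes that `**` matches any run of characters (including `/`) and that `*` matches within a single path segment. The helper name globToRegExp is hypothetical, and the crawler itself may rely on a glob library with richer semantics.

```typescript
// Hypothetical helper: converts a simple glob into a RegExp.
// Assumption: "**" matches anything, "*" matches anything except "/".
function globToRegExp(glob: string): RegExp {
  // Escape regex metacharacters, leaving "*" alone for the next step.
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const pattern = escaped
    .replace(/\*\*/g, "\u0000") // temporarily mark "**"
    .replace(/\*/g, "[^/]*") // single star stays within one path segment
    .replace(/\u0000/g, ".*"); // double star crosses segments
  return new RegExp(`^${pattern}$`);
}

const matcher = globToRegExp("https://www.builder.io/c/docs/**");

// A page under /c/docs/ is accepted, a page elsewhere on the site is not.
console.log(matcher.test("https://www.builder.io/c/docs/developers"));
console.log(matcher.test("https://www.builder.io/blog"));
```

If the crawl picks up too many or too few pages, checking a handful of URLs against the pattern like this is a quick way to see whether the pattern is the cause.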
See the top of the file for the type definition for what you can configure:
type Config = {
  /** URL to start the crawl */
  url: string;
  /** Pattern to match against for links on a page to subsequently crawl */
  match: string;
  /** Selector to grab the inner text from */
  selector: string;
  /** Don't crawl more than this many pages */
  maxPagesToCrawl: number;
  /** File name for the finished data */
  outputFileName: string;
  /** Optional cookie to be set. E.g. for Cookie Consent */
  cookie?: { name: string; value: string };
  /** Optional function to run for each page found */
  onVisitPage?: (options: {
    page: Page;
    pushData: (data: any) => Promise<void>;
  }) => Promise<void>;
  /** Optional timeout for waiting for a selector to appear */
  waitForSelectorTimeout?: number;
};
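As an illustration of the optional fields, the sketch below sets a consent cookie, records extra data on each visited page, and shortens the selector wait. The cookie name and value, the selector, the example.com URLs, and the timeout are made-up placeholders. PageLike is a stand-in for Playwright's Page so the snippet type-checks on its own; in the project you would use the real Page type. The timeout is presumably in milliseconds, which the type definition does not state.

```typescript
// Stand-in for the two Playwright Page methods used below (an assumption
// made so this sketch compiles without Playwright installed).
interface PageLike {
  title(): Promise<string>;
  url(): string;
}

type Config = {
  url: string;
  match: string;
  selector: string;
  maxPagesToCrawl: number;
  outputFileName: string;
  cookie?: { name: string; value: string };
  onVisitPage?: (options: {
    page: PageLike;
    pushData: (data: any) => Promise<void>;
  }) => Promise<void>;
  waitForSelectorTimeout?: number;
};

const exampleConfig: Config = {
  url: "https://example.com/docs",
  match: "https://example.com/docs/**",
  selector: "main",
  maxPagesToCrawl: 20,
  outputFileName: "output.json",
  // Hypothetical cookie that dismisses a consent banner.
  cookie: { name: "cookieConsent", value: "accepted" },
  // Record the title and URL of every page the crawler visits.
  onVisitPage: async ({ page, pushData }) => {
    await pushData({ title: await page.title(), url: page.url() });
  },
  // Presumably milliseconds; the unit is an assumption.
  waitForSelectorTimeout: 3000,
};
```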
Run your crawler
npm start
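Conceptually, a run starts at url, follows links that satisfy match, and stops once maxPagesToCrawl pages have been visited. The sketch below models that on a hypothetical in-memory link graph. It is a simplified illustration, not the project's actual implementation (which drives a real browser through Playwright); planCrawl, LinkGraph, and the example.com URLs are invented for the example.

```typescript
// Hypothetical in-memory "site": each URL maps to the links found on that page.
type LinkGraph = Record<string, string[]>;

function planCrawl(
  startUrl: string,
  matches: (url: string) => boolean,
  maxPagesToCrawl: number,
  site: LinkGraph,
): string[] {
  const visited: string[] = [];
  const seen = new Set<string>([startUrl]);
  const queue: string[] = [startUrl];

  // Breadth-first: visit pages in discovery order until the page limit is hit.
  while (queue.length > 0 && visited.length < maxPagesToCrawl) {
    const url = queue.shift()!;
    visited.push(url);
    for (const link of site[url] ?? []) {
      // Only enqueue links that match the pattern and were not seen before.
      if (!seen.has(link) && matches(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return visited;
}

const site: LinkGraph = {
  "https://example.com/docs": [
    "https://example.com/docs/a",
    "https://example.com/blog",
    "https://example.com/docs/b",
  ],
  "https://example.com/docs/a": [
    "https://example.com/docs/b",
    "https://example.com/docs/c",
  ],
  "https://example.com/docs/b": [],
  "https://example.com/docs/c": [],
};

const crawled = planCrawl(
  "https://example.com/docs",
  (url) => url.startsWith("https://example.com/docs/"),
  3,
  site,
);
// The blog link is skipped and the limit of 3 stops the crawl before docs/c.
console.log(crawled);
```

In this model the start URL is always visited, even if it does not itself satisfy the pattern. Whether the real crawler behaves the same way is an assumption.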
Upload your data to OpenAI
The crawl will generate a file called output.json at the root of this project. Upload that to OpenAI to create your custom assistant or custom GPT.
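Before uploading, it can be worth a quick sanity check that the crawl actually captured something. The sketch below assumes the output file holds a JSON array with one entry per crawled page. That shape, the summarizeOutput helper, and the sample record's fields are assumptions, so adjust them to whatever your output.json really contains.

```typescript
// Hypothetical helper: reports how many entries the crawl produced.
// Assumption: the output file is a JSON array with one entry per page.
function summarizeOutput(jsonText: string): { pages: number; characters: number } {
  const parsed: unknown = JSON.parse(jsonText);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected the output file to contain a JSON array");
  }
  return { pages: parsed.length, characters: jsonText.length };
}

// Made-up sample standing in for the contents of output.json.
const sample = '[{"url":"https://example.com/docs","text":"Hello"}]';
const summary = summarizeOutput(sample);
console.log(`${summary.pages} page(s), ${summary.characters} characters`);
```

In practice you would read the file's text (for example with Node's fs module) and pass it to the helper instead of the inline sample.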
Create a custom GPT
Use this option for UI access to your generated knowledge that you can easily share with others
Note: you may need a paid ChatGPT plan to create and use custom GPTs right now
- Go to https://chat.openai.com/
- Click your name in the bottom left corner
- Choose "My GPTs" in the menu
- Choose "Create a GPT"
- Choose "Configure"
- Under "Knowledge" choose "Upload a file" and upload the file you generated
Create a custom assistant
Use this option for API access to your generated knowledge that you can integrate into your product.
- Go to https://platform.openai.com/assistants
- Click "+ Create"
- Choose "upload" and upload the file you generated
(Alternate method) Running in a container with Docker
To generate output.json with a containerized run, go into the containerapp directory and modify config.ts as described above. The output.json file should be generated in the data folder. Note: the outputFileName property in the config.ts file in the containerapp folder is already configured to work with the container.
Contributing
Know how to make this project better? Send a PR!