Turn any developer documentation into a specialized GPT.
DevDocs to LLM crawls developer documentation, extracts the content, and converts it into a format suitable for large language models (LLMs) such as ChatGPT. This lets you build specialized assistants tailored to a specific documentation set.
- Web crawling with customizable options
- Content extraction in Markdown format
- Rate limiting to respect server constraints
- Retry mechanism for failed scrapes
- Export options:
  - Rentry.co for quick sharing
  - Google Docs for larger documents
- Set up the Firecrawl environment
- Crawl a website and generate a sitemap
- Extract content from crawled pages
- Export the processed content
- Firecrawl API key
- Google Docs API credentials (optional, for Google Docs export)
This project is designed to run in a Jupyter notebook environment, particularly Google Colab. No local installation is required.
Before running the notebook, you'll need to set a few parameters:
- sub_url: The URL of the documentation you want to crawl
- limit: Maximum number of pages to crawl
- scrape_option: Choose to scrape all pages or a specific number
- num_pages: Number of pages to scrape if not scraping all
- pages_per_minute: Rate limiting parameter
- wait_time_between_chunks: Delay between scraping chunks
- retry_attempts: Number of retries for failed scrapes
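To make the interaction between these parameters concrete, here is a minimal Python sketch of how they might drive chunked, rate-limited scraping with retries. It is not the notebook's actual code. The `CrawlSettings` class, the helper functions, the default values, and the `"all"` / `"specific"` values for `scrape_option` are all hypothetical. The sketch also assumes that `pages_per_minute` sets the chunk size and that `retry_attempts` counts retries after the first try. The `scrape_page` callable stands in for whatever function actually fetches a page (for example, a wrapper around the Firecrawl client).

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class CrawlSettings:
    """Hypothetical container mirroring the notebook parameters."""
    sub_url: str
    limit: int = 100                       # maximum number of pages to crawl
    scrape_option: str = "all"             # assumed values: "all" or "specific"
    num_pages: int = 10                    # used only when scrape_option != "all"
    pages_per_minute: int = 10             # assumed to be the chunk size
    wait_time_between_chunks: float = 10   # seconds to pause between chunks
    retry_attempts: int = 3                # retries after the first failed try


def select_urls(urls: List[str], settings: CrawlSettings) -> List[str]:
    """Return every URL, or only the first num_pages, depending on scrape_option."""
    if settings.scrape_option == "all":
        return list(urls)
    return list(urls[: settings.num_pages])


def scrape_with_retry(
    url: str,
    scrape_page: Callable[[str], str],
    retry_attempts: int,
) -> Optional[str]:
    """Try once, then retry up to retry_attempts times; return None if all fail."""
    for _ in range(retry_attempts + 1):
        try:
            return scrape_page(url)
        except Exception:
            continue  # swallow the error and move on to the next attempt
    return None


def scrape_in_chunks(
    urls: List[str],
    scrape_page: Callable[[str], str],
    settings: CrawlSettings,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Optional[str]]:
    """Scrape URLs chunk by chunk, pausing between chunks to limit the request rate."""
    results: Dict[str, Optional[str]] = {}
    chunk_size = max(1, settings.pages_per_minute)
    for start in range(0, len(urls), chunk_size):
        for url in urls[start : start + chunk_size]:
            results[url] = scrape_with_retry(url, scrape_page, settings.retry_attempts)
        # Pause only when another chunk is still to come.
        if start + chunk_size < len(urls):
            sleep(settings.wait_time_between_chunks)
    return results
```

Passing `sleep` as an argument keeps the pause injectable, so the logic can be exercised without waiting. Pages that still fail after all retries are recorded as `None` rather than aborting the run, which is one possible design choice among several.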
Contributions are welcome! Please feel free to submit a Pull Request.
Copyright (c) 2024-present, Alex Fazio