Pinned Repositories
java-logonce
A thread-safe SLF4J logger that logs a message only the first time it sees it. Backed by a Bloom filter or a Set.
pagescrape
JavaScript Node.js module to aid web page scraping
image-search
Automated-AI-Web-Researcher-Ollama
A Python program that turns an LLM running on Ollama into an automated researcher. From a single query it determines focus areas to investigate, runs web searches, scrapes content from relevant websites, and carries out the research on its own. Among other things, it also saves the findings for you.
MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
crawlers
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or a filesystem and sending it to various data repositories, such as search engines.
committer-solr
Solr implementation of Norconex Committer. Should also work with any Solr-based product, such as LucidWorks.
colly
Elegant Scraper and Crawler Framework for Golang
rod
A Chrome DevTools Protocol driver for web automation and scraping.
spider
A web crawler and scraper for Rust
smr-co-uk's Repositories
smr-co-uk/crawlers
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or a filesystem and sending it to various data repositories, such as search engines.
smr-co-uk/image-search
smr-co-uk/spider
A web crawler and scraper for Rust
smr-co-uk/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
smr-co-uk/rod
A Chrome DevTools Protocol driver for web automation and scraping.
smr-co-uk/Automated-AI-Web-Researcher-Ollama
A Python program that turns an LLM running on Ollama into an automated researcher. From a single query it determines focus areas to investigate, runs web searches, scrapes content from relevant websites, and carries out the research on its own. Among other things, it also saves the findings for you.
smr-co-uk/colly
Elegant Scraper and Crawler Framework for Golang
smr-co-uk/java-logonce
A thread-safe SLF4J logger that logs a message only the first time it sees it. Backed by a Bloom filter or a Set.
smr-co-uk/committer-solr
Solr implementation of Norconex Committer. Should also work with any Solr-based product, such as LucidWorks.
smr-co-uk/pagescrape
JavaScript Node.js module to aid web page scraping