spider-rs/spider

A web crawler and scraper for Rust

Rust · MIT

Issues

Publish spider CLI binaries
#217 opened a month ago by alexkreidler
3
Panic with non ASCII string
#216 opened 3 months ago by ronanM
1
Help wanted: Reduce memory footprint
#204 opened 4 months ago by Falumpaset
5
It's colly not crolly
#214 opened 4 months ago by melroy89
2
Retrieve crawled markdown via API
#211 opened 4 months ago by culda
0
Broadcast never ends when scraping with limit
#210 opened 4 months ago by DimitriTimoz
2
Memory leak caused by hashbrown
#207 opened 4 months ago by DimitriTimoz
8
support file:// urls
#197 opened 4 months ago by jmikedupont2
4
Memory leak
#208 opened 4 months ago by DimitriTimoz
0
Scrape with smart mode
#206 opened 4 months ago by DimitriTimoz
1
Retrieve response cookies
#202 opened 4 months ago by viktorholk
1
with_limit(1) does not work when "chrome" feature is enabled
#201 opened 4 months ago by viktorholk
2
Store referring links
#199 opened 5 months ago by LeoDog896
1
Running the example code results in an error
#198 opened 5 months ago by haijd
1
Command spider_cli: Short option names must be unique for each argument, but '-u' is in use by both 'url' and 'user_agent'
#195 opened 5 months ago by jmikedupont2
0
CLI: download files as they arrive?
#192 opened 5 months ago by gjtorikian
4
build.rs "wget" install in benches doesn't work on non-debian distros
#191 opened 6 months ago by soulwa
1
robots.txt files are not being respected correctly
#184 opened 7 months ago by div72
6
Can transform work properly?
#190 opened 6 months ago by ybsun0215
1
Budget not respected
#187 opened 7 months ago by CrazyDubya
1
Add DEPTH level next to each debug line [ENHANCEMENT]
#185 opened 7 months ago by Zabrane
3
Support COOKIE during the crawl [ENHANCEMENT]
#186 opened 7 months ago by Zabrane
4
Prebuilt binaries for Linux, macOS
#183 opened 7 months ago by Zabrane
11
Also extract urls that are pointing to other domains? [CLI]
#135 opened a year ago by sebs
20
Is it possible to extract broken links from the crawl?
#175 opened 9 months ago by metsis
6
Already crawled URL attempted as % encoded
#172 opened 9 months ago by apsaltis
3
Running with decentralized feature
#171 opened 9 months ago by zmedelis
1
Is it possible to dynamically add links to crawl?
#170 opened 9 months ago by oiwn
7
Chrome flag chrome_intercept page hang.
#168 opened 10 months ago by j-mendez
1
Scraped html does not match the url - chrome [with_wait_for_idle_network]
#166 opened 10 months ago by esemeniuc
17
Some pages have 0 bytes from scraped page. After rerunning, different pages have 0 bytes
#165 opened 10 months ago by esemeniuc
11
Support ignoring SSL errors
#162 opened a year ago by superkelvint
4
Extracting all urls on a page
#160 opened a year ago by apsaltis
8
Scraping timeout Issue
#158 opened a year ago by virajk31
2
The result of get_html is garbled in case of Shift_JIS html
#144 opened a year ago by saito-kosuke
4
CLI - Not including the schema in -d parameter results in critical error
#150 opened a year ago by CirKu17
3
`with_on_link_find_callback` doesn't exist
#145 opened a year ago by SamuelMarks
2
Extract text from Html
#141 opened a year ago by MihirModi1421
1
only let me spider one url
#138 opened a year ago by sebs
4
cli parameters
#139 opened a year ago by sebs
1
cli tutorial store crawls result as json
#134 opened a year ago by sebs
2
Getting URL after redirect
#127 opened a year ago by joksas
4
error[E0061]: this function takes 2 arguments but 1 argument was supplied
#136 opened a year ago by roniemartinez
1
Add the ability to download not only html, but also all site assets: css, js, imgs, etc
#132 opened a year ago by namen3645
1
full-resource feature seems to be missing Javascript
#130 opened a year ago by Byter09
6
Blacklist regex for CLI does not seem to work
#129 opened a year ago by Byter09
2
Change API to builder pattern
#115 opened a year ago by roniemartinez
6
CPU usage is exceptionally high, reaching up to 1450%
#122 opened 2 years ago by mhmtbsbyndr
5
[Bug] Follows external website on redirect (302, 301, 3XX)
#119 opened 2 years ago by roniemartinez
3
[Bug] Trailing slash breaking with websites that don't allow it
#118 opened 2 years ago by roniemartinez
11