Issues
- 1
[BUG] download malfunctioning
#132 opened by kargaranamir - 0
[BUG] corrupt deflate stream
#131 opened by kargaranamir - 1
Option to keep documents that can't be identified
#88 opened by Uinelj - 2
[Feature request] Document how to set fasttext model
#106 opened by chris-ha458 - 4
[BUG] Error when downloading full CC snapshot
#94 opened by ngan-nt - 1
[BUG] Deduplication with Ungoliant
#93 opened by Hammamwa47 - 0
Automatically add binaries on releases
#92 opened by Uinelj - 1
Automate release and deployment to crates.io
#60 opened by Uinelj - 0
Blocklists checklist
#79 opened by Uinelj - 1
Avoid creating Blocklist for each shard
#74 opened by Uinelj - 3
[BUG] Cannot install via cargo
#77 opened by new5558 - 0
Configuration file for `ungoliant pipeline`
#75 opened by Uinelj - 1
Bug in `MeanLength` filter
#70 opened by sadra-barikbin - 0
[BUG] No hard fail when blocklist path is invalid
#64 opened by Uinelj - 0
Handle dependabot vulnerabilities
#58 opened by Uinelj - 1
Cache dependencies
#66 opened by Uinelj - 0
Rename `master` branch to `main` and protect it
#59 opened by Uinelj - 0
[BUG] Chavacano marked as "cbr" rather than cbk
#53 opened by Uinelj - 0
Feature: Add retry option on downloader
#33 opened by Uinelj - 1
Revamp the error reporting
#43 opened by Uinelj - 3
[Feature request] Pipeline remove download file after process and extract single language
#56 opened by acul3 - 2
Feature `std_rng` depends on `rand_hc` which is not an optional dependency
#55 opened by DavidNemeskey - 5
[BUG] Pipeline command not working
#41 opened by kirianguiller - 1
Feature: Failures handling
#4 opened by Uinelj - 1
Publish on crates.io
#17 opened by Uinelj - 1
Fix Cargo.toml errors for crates.io publishing
#44 opened by Uinelj - 0
Feature: Header/Footer annotation
#25 opened by Uinelj - 0
Feature: Multilingual documents
#27 opened by Uinelj - 2
Improve operation order in pipeline
#34 opened by Uinelj - 0
Feature: Pipeline and Benchmarking
#3 opened by Uinelj