Issues
- 8
ValueError: cannot find context for 'fork' & cannot pickle '_io.TextIOWrapper' object
#329 opened by Harry1035 - 7
Error with "global flags not at the start of the expression at position 4" help~~~
#336 opened by JoeyHuhuu - 1
Is this project abandoned?
#335 opened by johann-petrak - 0
ValueError: ('Tokenizer not found in the following libraries: transformers, tokenizers, autotiktokenizer, tiktoken', 'Please install one of these libraries to use the chunker.')
#339 opened by qq452655434 - 1
How to store a document in a separate txt file instead of a single txt file containing multiple documents
#328 opened by hxy-62 - 0
OSS-Fuzz Integration
#334 opened by ennamarie19 - 0
Get all revisions content
#332 opened by abrahami - 0
pypi not updated with latest version (3.0.7)
#330 opened by JordanHanley - 3
--json flag is unrecognised
#274 opened by odebroqueville - 1
Option to remove blank pages?
#303 opened by AngledLuffa - 0
Wikidata Extraction
#325 opened by vishwa27yvs - 3
wikiextractor 3.0.6 not extracting
#306 opened by wayneworkman - 0
Parsing seems to exclude some part of the page
#324 opened by franluca - 4
Article abstract extraction returns no data
#277 opened by chen-better-and-better - 0
does not extract all wiki
#323 opened by Aeon-Transformer - 0
[Request for Help] Should I support a template file like `templates.txt` followed the arg `--templates`?
#320 opened by jacklanda - 2
Template errors in article
#314 opened by etoilestar - 0
Add feature to extractPage to also dump the extracted page to json/csv/txt
#317 opened by BwandoWando - 2
ptwiki-latest error
#305 opened by iwmo - 4
Question ValueError: cannot find context for 'fork'
#287 opened by yaoysyao - 28
Is Windows 10 supported?
#312 opened by nissansz - 0
Is Windows supported
#311 opened by nissansz - 1
Warning: Template Errors
#310 opened by fzweclipse - 0
Why was --keep_tables removed?
#308 opened by micimize - 5
Warning: Template Errors
#288 opened by maulidaannisa - 1
Option to drop section titles/headers
#293 opened by Matthieu-Tinycoaching - 0
Issues on newer (2023) and older (2019) dumps
#304 opened by JohnTailor - 1
Question: Cirrus Extractor vs. "normal" Extractor - who creates cleaner texts?
#282 opened by PhilipMay - 0
How to extract lists pages?
#302 opened by katzurik - 1
Various tags such as q, br, ins, del are not filtered out
#300 opened by adno - 0
Tables are not entirely filtered out
#298 opened by adno - 0
KeyError in 'page.append(listItem[n] % line)'
#295 opened by audreycs - 2
fails on the first file
#292 opened by vsraptor - 10
about "raise BdbQuit" problem
#290 opened by zhenjia2017 - 1
KeyError for producing HTML output with `--html`
#280 opened by cyk1337 - 0
ModuleNotFoundError: No module named '__main__.extract'; '__main__' is not a package
#291 opened by KangChou - 0
how to get mention/anchor by wikiextractor.
#286 opened by lshowway - 0
TagRE Causes Loss of Large Portion of Page Text
#285 opened by lorr1 - 0
- 3
Unwanted pdb tracing
#283 opened by shangw-nvidia - 0
Codec encoding errors in OutputSplitter
#281 opened by cBog - 1
- 0
Found a possible security concern
#278 opened by zidingz - 0
templates are not extracted correctly
#275 opened by vrnmthr - 0
Status of release 3.0.4 / 3.0.5?
#273 opened by PA1212 - 5
cannot serialize/pickle '_io.TextIOWrapper' object
#271 opened by kwon0408