bwbaugh/wikipedia-extractor
This is a mirror of the script by Giuseppe Attardi and contains history from before the official repo started: https://github.com/attardi/wikiextractor
Extracts and cleans text from a Wikipedia database dump and stores the output in a number of files of similar size in a given directory.
Python
Issues
- 1
Running on Portuguese Wiki in Windows.
#26 opened by vabatista - 0
NOTE: This is a mirror repo.
#28 opened by bwbaugh - 1
Version 2.52 (March 6, 2016) Seems not working
#25 opened by wenxiao - 1
OSError 12: cannot allocate memory
#24 opened by selaphy - 1
- 1
Readme not works!
#21 opened by ChameleonRed - 1
Error on Farsi wikipedia dump: "NameError: global name 'templatePrefix' is not defined"
#20 opened by ehsanasgarian - 4
Error on English wikipedia dump
#18 opened by avostryakov - 2
Is it possible to keep the redirects?
#17 opened by hanito - 1
Anchor is undefined
#16 opened by jmhessel - 1
Invalid syntax error when running
#15 opened by Lolologist - 1
wiki extractor results directories end up in QN
#11 opened by sylvia1 - 5
run error
#8 opened by yinhangbupt - 4
- 1
WikiExtractor crash consistently during extraction in folder JK with no error on output or log
#5 opened by linearregression - 2
title with colon won't be extracted
#4 opened by benck - 2
Extraction bug
#1 opened by DSblizzard - 2
link to the developers github repo
#14 opened by Munzey - 2
keeping links and sections
#9 opened by andresmit - 1
Max Template Extraction exceeded!
#12 opened by HughP - 1
Syntax Error
#10 opened by mykinator - 1