bwbaugh/wikipedia-extractor

This is a mirror of the script by Giuseppe Attardi and contains history from before the official repo started: https://github.com/attardi/wikiextractor

Extracts and cleans text from a Wikipedia database dump and stores the output in a number of files of similar size in a given directory.
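The idea of storing output in "a number of files of similar size" can be illustrated with a minimal sketch. This is not the script's actual code: the function name `write_in_chunks`, the `part_NN.txt` file naming, and the `max_bytes` parameter are all hypothetical, chosen only to show one way of rolling over to a new file once a size limit would be exceeded.

```python
import os
import tempfile


def write_in_chunks(documents, out_dir, max_bytes):
    """Write each document to numbered files in out_dir (hypothetical helper).

    A new file is started whenever adding the next document would push the
    current file past max_bytes. A single document larger than max_bytes
    still gets written, alone, into its own file.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    current = None
    written = 0
    for doc in documents:
        data = (doc + "\n").encode("utf-8")
        # Roll over when the current file is non-empty and would overflow.
        if current is None or (written > 0 and written + len(data) > max_bytes):
            if current is not None:
                current.close()
            # Hypothetical naming scheme, not the one the real script uses.
            path = os.path.join(out_dir, "part_%02d.txt" % len(paths))
            current = open(path, "wb")
            paths.append(path)
            written = 0
        current.write(data)
        written += len(data)
    if current is not None:
        current.close()
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = write_in_chunks(["first article", "second article"], tmp, 1024)
        print(len(files), "file(s) written")
```

Counting bytes rather than characters keeps file sizes comparable for non-ASCII text, which matters for dumps in languages such as Farsi or Portuguese.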

Python

Issues

Running on Portuguese Wiki in Windows.
#26 opened 8 years ago by vabatista
1
NOTE: This is a mirror repo.
#28 opened 8 years ago by bwbaugh
0
Version 2.52 (March 6, 2016) Seems not working
#25 opened 8 years ago by wenxiao
1
OSError 12: cannot allocate memory
#24 opened 9 years ago by selaphy
1
WikiExtractor.py - --no-templates > data.txt lead to errors.
#22 opened 9 years ago by ChameleonRed
1
Readme not works!
#21 opened 9 years ago by ChameleonRed
1
Error on Farsi wikipedia dump: "NameError: global name 'templatePrefix' is not defined"
#20 opened 9 years ago by ehsanasgarian
1
Error on English wikipedia dump
#18 opened 9 years ago by avostryakov
4
Is it possible to keep the redirects?
#17 opened 9 years ago by hanito
2
Anchor is undefined
#16 opened 9 years ago by jmhessel
1
Invalid syntax error when running
#15 opened 9 years ago by Lolologist
1
wiki extractor results directories end up in QN
#11 opened 9 years ago by sylvia1
1
run error
#8 opened 9 years ago by yinhangbupt
5
Version 2.8 dead loops on "Reached max template recursion"
#6 opened 9 years ago
4
WikiExtractor crash consistently during extraction in folder JK with no error on output or log
#5 opened 9 years ago by linearregression
1
title with colon won't be extracted
#4 opened 9 years ago by benck
2
Extraction bug
#1 opened 9 years ago by DSblizzard
2
link to the developers github repo
#14 opened 9 years ago by Munzey
2
keeping links and sections
#9 opened 9 years ago by andresmit
2
Max Template Extraction exceeded!
#12 opened 9 years ago by HughP
1
Syntax Error
#10 opened 9 years ago by mykinator
1
'maximum template recursion' error after a few hours
#7 opened 10 years ago by agoyaliitk
1