bwbaugh/wikipedia-extractor
This is a mirror of the script by Giuseppe Attardi (official repo: https://github.com/attardi/wikiextractor) and contains history from before the official repo started. Extracts and cleans text from a Wikipedia database dump and stores the output in a number of files of similar size in a given directory.
Python
Stargazers
- 0x333333TikTok
- Akibalogh@DLC-link
- alijmlzdVVVERB
- andbberger
- andrecunhaCampinas, SP, Brazil
- antranappWorld
- attardiUniversità di Pisa
- baiskBeijing, China
- bookcold
- corcra@microsoft
- derek-schultzBraze
- duanguoxue
- faradayAnkara, Turkey
- hohyon-ryu@oxopolitics
- inactivistCalifornia
- jcla1
- jpn-kjhong
- JT5DThe IMC Lab + Gallery
- kbaikovCheckmk GmbH
- kimduhoLos Angeles, California, US
- Marvin182
- NullExceptiontest_11team
- owoNew York University Abu Dhabi
- psmitInscripta.io
- rabintangTencent, Shenzhen
- remusaoMunich
- sanchaCA
- SeanTaterSpace and Time (@spaceandtimelabs)
- simongogeBay Inc
- sweinberg
- treperShanghai
- underspecifiedHonda Research Institute Japan
- whuwy
- willwest@Instagram
- yuryshulaev
- zachguoUnified Patents