wikipedia-corpus
There are 30 repositories under the wikipedia-corpus topic.
howl-anderson/chinese-wikipedia-corpus-creator
Corpus creator for Chinese Wikipedia
GermanT5/wikipedia2corpus
Wikipedia text corpus for self-supervised NLP model training
uma-pi1/OPIEC
Reading the data from OPIEC - an Open Information Extraction corpus
todd-cook/ML-You-Can-Use
Practical ML and NLP with examples.
ayushidalmia/Wikipedia-Search-Engine
Builds a search engine over the 43 GB Wikipedia data dump of 2013. Search results are returned in real time.
macbre/mediawiki-dump
Python package for working with MediaWiki XML content dumps
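Several repos in this list, including mediawiki-dump, work with the MediaWiki export XML that Wikipedia dumps use, where each article sits in a `<page>` element. A minimal sketch using only the Python standard library (not the mediawiki-dump package's own API, and with the real dump's XML namespace omitted for brevity):

```python
import xml.etree.ElementTree as ET

# Tiny sample in the MediaWiki export shape; real dumps are multi-gigabyte
# and carry an XML namespace, which this sketch leaves out.
SAMPLE = """<mediawiki>
  <page>
    <title>Corpus linguistics</title>
    <revision><text>A corpus is a collection of texts.</text></revision>
  </page>
</mediawiki>"""

def iter_pages(xml_text):
    """Yield (title, wikitext) pairs from a MediaWiki export document."""
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        title = page.findtext("title")
        text = page.findtext("revision/text") or ""
        yield title, text

pages = list(iter_pages(SAMPLE))
```

For real dumps, an incremental parser (`ET.iterparse`) avoids loading the whole file into memory, which is what dedicated packages like mediawiki-dump handle for you.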
kohjiaxuan/Wikipedia-Article-Scraper
A complete Python text analytics package that allows users to search for a Wikipedia article, scrape it, conduct basic text analytics, and integrate it into a data pipeline without writing excessive code.
OlehOnyshchak/pyWikiMM
Collects a multimodal dataset of Wikipedia articles and their images
wolfgarbe/WikipediaExport
Convert Wikipedia XML dump files to JSON or Text files
kylemin/DeViSE
Implementation of DeViSE (NIPS 2013), including WordNet word2vec using the gensim library
ksipos/polysemy-assessment
Code and data for the paper 'Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings'
LeviMatheus/tcc-readability-score-level
Repository providing preprocessed Wikipedia and Simple Wikipedia datasets, along with Python scripts for preprocessing and dataset generation.
quqixun/ReadWiki-ZH
Converts Chinese Wikipedia XML dumps to human-readable documents in Markdown and plain text.
TomerAberbach/wikipedia-ngrams
📚 A Kotlin project which extracts ngram counts from Wikipedia data dumps.
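Extracting ngram counts, as this Kotlin project does, amounts to sliding a window of length n over a token stream and tallying each window. A minimal Python sketch of the idea (not taken from the project itself):

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) occurring in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = "the free encyclopedia that anyone can edit the free encyclopedia".split()
bigrams = ngram_counts(tokens, 2)  # e.g. ("the", "free") appears twice
```

At Wikipedia scale the same tallying is typically done in a streaming pass over the dump rather than on an in-memory token list.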
ArisPan/wiki-query
A desktop application that searches through a set of Wikipedia articles using Apache Lucene.
bashkirtsevich-llc/wiki-dump-parser
Wiki dump parser (jupyter)
OmerCohen71/IR-Wikipedia-Search-Engine
Information-retrieval search engine app for Wikipedia
vikash212000yadav/Basic-Chatbot
Interactive chatbot using Python :)
Affenmilchmann/lingwiki
(Module in ongoing development) Retrieves the parsed content of Wikipedia articles. Created for building text corpus data quickly and easily, but can be freely used for other purposes too
afuschetto/wiki-extractor
Command line tool to extract plain text from Wikipedia database dumps
etcetra7n/wikibot
RNN model trained on a Wikipedia corpus
IDS-Mannheim/Wikipedia-Corpus-Builder
Builds Wikipedia corpora in I5 (a TEI-based format)
jksware/ai-spanish-wikipedia-clustering
Clustering of Spanish Wikipedia articles.
moodser/splitter-transliteration
Python script to split the text generated by 'wikipedia parallel title extractor' into separate text files (one file per language)
PJ-Duo/wiki-corpus
Create a wiki corpus using a wiki dump file for Natural Language Processing
rajatyadav1994/Wise--WikiPedia-Search-Engine
A search engine built on a 75 GB Wikipedia dump. Creates an index file and returns search results in real time
Triansh/Wiki-Searcher
A search engine built over a corpus of Wikipedia articles to provide efficient query results.
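The search-engine repos above (Wise, Wiki-Searcher, and the other index-based entries) all rest on the same core structure: an inverted index mapping each term to the documents containing it. A minimal sketch of that structure, independent of any of the listed projects:

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return doc ids containing every query term (boolean AND retrieval)."""
    term_sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()

docs = {
    1: "Wikipedia is a free encyclopedia",
    2: "A corpus of Wikipedia articles",
    3: "Search engines build inverted indexes",
}
index = build_index(docs)
```

Real engines layer tokenization, stemming, and relevance ranking (e.g. TF-IDF or BM25, as in Apache Lucene) on top of this basic lookup.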
macbre/faroese-corpus
Some Faroese language statistics taken from fo.wikipedia.org content dump