wikipedia-crawler
There are 11 repositories under the wikipedia-crawler topic.
Sarthakjain1206/Intelligent_Document_Finder
Document Search Engine Tool
lehinevych/MediaWikiAPI
Python wrapper for the MediaWiki API to access and parse data from Wikipedia
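Wrappers like this ultimately build requests against the MediaWiki Action API. As a rough sketch of what such a query looks like (parameter names follow the public MediaWiki API; the helper function name is hypothetical), assembling an extract request is just URL construction:

```python
from urllib.parse import urlencode

# Base endpoint of the English Wikipedia MediaWiki Action API.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_extract_url(title: str) -> str:
    """Build a MediaWiki API URL requesting a plain-text intro extract.

    Hypothetical helper for illustration; parameter names are the real
    MediaWiki Action API ones.
    """
    params = {
        "action": "query",    # standard MediaWiki query module
        "format": "json",
        "prop": "extracts",   # TextExtracts extension: page text
        "exintro": 1,         # only the section before the first heading
        "explaintext": 1,     # plain text instead of HTML
        "titles": title,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"

url = build_extract_url("Python (programming language)")
print(url)
```

A wrapper library hides this plumbing and parses the JSON response for you.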
nazaninsbr/Wikipedia-Crawler
A crawler for Wikipedia (currently English pages only)
Smile040501/Search-Engine
A search engine that takes keyword queries as input and returns a ranked list of relevant results. It scrapes a few thousand pages starting from one of the seed Wiki pages and uses Elasticsearch for full-text search.
TimurKasatkin/IR_system
IR system component of the Innopolis IR 2016 course semester project
adidottxt/wikipedia-crawler
A Python web crawler that tests the theory that repeatedly clicking the first link on ~97% of Wikipedia pages eventually leads to the page for knowledge 📡
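The core of such an experiment is a loop that follows first links until it reaches the target, hits a dead end, or revisits a page. A minimal sketch, using a made-up in-memory map from page title to its first in-article link instead of live HTTP requests:

```python
# Hypothetical first-link data; a real crawler would fetch and parse each page.
FIRST_LINK = {
    "Banana": "Fruit",
    "Fruit": "Botany",
    "Botany": "Science",
    "Science": "Knowledge",
    "Knowledge": "Fact",
    "Fact": "Knowledge",  # cycle: Fact and Knowledge link to each other
}

def follow_first_links(start: str, target: str, max_hops: int = 100):
    """Follow first links from `start`; return the path if `target`
    is reached, or None on a cycle, dead end, or hop limit."""
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        if current == target:
            return path
        current = FIRST_LINK.get(current)
        if current is None or current in seen:
            return None  # dead end or loop before reaching the target
        seen.add(current)
        path.append(current)
    return None

print(follow_first_links("Banana", "Knowledge"))
# → ['Banana', 'Fruit', 'Botany', 'Science', 'Knowledge']
```

Tracking visited pages matters: without the `seen` set, a two-page cycle like Fact ⇄ Knowledge would loop until the hop limit.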
ambirpatel/Wikipedia-crawler
Web scraping is a data scraping technique used to extract data from websites.
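For a Wikipedia crawler, the basic scraping step is pulling internal article links out of a page's HTML. A minimal sketch using only the standard library's `html.parser` (the class name and the namespace-filtering heuristic are illustrative assumptions, not this repo's code):

```python
from html.parser import HTMLParser

class WikiLinkParser(HTMLParser):
    """Collect internal Wikipedia article links (href="/wiki/...") from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        # Heuristic: article links start with /wiki/ and contain no
        # namespace colon (skips File:, Category:, Help:, etc.).
        if href.startswith("/wiki/") and ":" not in href:
            self.links.append(href)

html = ('<p>See <a href="/wiki/Web_scraping">scraping</a> and '
        '<a href="/wiki/File:X.png">a file</a>.</p>')
parser = WikiLinkParser()
parser.feed(html)
print(parser.links)
# → ['/wiki/Web_scraping']
```

Real crawlers typically use libraries such as BeautifulSoup or Scrapy selectors for this, but the extraction logic is the same idea.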
jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
mayankkumar2/wikipedia-index-scraper
The program can map out the shortest path between two Wikipedia pages.
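Finding the shortest click path between two pages is a breadth-first search over the link graph. A self-contained sketch over a tiny made-up graph (a real tool would fetch each page's outgoing links on demand):

```python
from collections import deque

# Hypothetical link graph: page title -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["E"],
    "E": [],
}

def shortest_path(start: str, goal: str):
    """BFS over the link graph; returns the shortest click path from
    start to goal, or None if goal is unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        page = path[-1]
        if page == goal:
            return path
        for nxt in LINKS.get(page, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path("A", "E"))
# → ['A', 'C', 'E']
```

BFS guarantees the first path that reaches the goal is a shortest one, since pages are expanded in order of increasing click distance.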
Relex12/Wikipedia-Translate-Crawler
A Wikipedia crawler that, starting from an English page, follows hypertext links to find the worst-translated page nearby
WillCaton2350/Wikipedia-WebCrawler
A Wikipedia web crawler written in Python and Scrapy. The ETL process extracts specific data from multiple Wikipedia pages and links with Scrapy, organizes it into a structured format using Scrapy items, and saves the result as JSON for further analysis and integration into MySQL Workbench.