wikipedia-crawler
There are 11 repositories under the wikipedia-crawler topic.
Sarthakjain1206/Intelligent_Document_Finder
Document Search Engine Tool
lehinevych/MediaWikiAPI
Python wrapper for the MediaWiki API to access and parse data from Wikipedia
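Wrappers like this ultimately build requests against the MediaWiki Action API. As a rough sketch of what such a query looks like (parameter names follow the public MediaWiki API; the helper function name is hypothetical), assembling an extract request is just URL construction:

```python
from urllib.parse import urlencode

# Base endpoint of the English Wikipedia MediaWiki Action API.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"

def build_extract_url(title: str) -> str:
    """Build a MediaWiki API URL requesting a plain-text intro extract.

    Hypothetical helper for illustration; parameter names are the real
    MediaWiki Action API ones.
    """
    params = {
        "action": "query",    # standard MediaWiki query module
        "format": "json",
        "prop": "extracts",   # TextExtracts extension: page text
        "exintro": 1,         # only the section before the first heading
        "explaintext": 1,     # plain text instead of HTML
        "titles": title,
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"

url = build_extract_url("Python (programming language)")
print(url)
```

A wrapper library hides this plumbing and parses the JSON response for you.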
nazaninsbr/Wikipedia-Crawler
A crawler for Wikipedia (currently English pages only)
Smile040501/Search-Engine
A search engine that takes keyword queries as input and returns a ranked list of relevant results. It scrapes a few thousand pages starting from one of the seed Wiki pages and uses Elasticsearch for full-text search.
TimurKasatkin/IR_system
IR system component of the Innopolis IR 2016 course semester project
adidottxt/wikipedia-crawler
A Python web crawler that tests the theory that repeatedly clicking the first link on ~97% of Wikipedia pages eventually leads to the page for knowledge 📡
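The core of such an experiment is a loop that follows first links until it reaches the target, hits a dead end, or revisits a page. A minimal sketch, using a made-up in-memory map from page title to its first in-article link instead of live HTTP requests:

```python
# Hypothetical first-link data; a real crawler would fetch and parse each page.
FIRST_LINK = {
    "Banana": "Fruit",
    "Fruit": "Botany",
    "Botany": "Science",
    "Science": "Knowledge",
    "Knowledge": "Fact",
    "Fact": "Knowledge",  # cycle: Fact and Knowledge link to each other
}

def follow_first_links(start: str, target: str, max_hops: int = 100):
    """Follow first links from `start`; return the path if `target`
    is reached, or None on a cycle, dead end, or hop limit."""
    path = [start]
    seen = {start}
    current = start
    for _ in range(max_hops):
        if current == target:
            return path
        current = FIRST_LINK.get(current)
        if current is None or current in seen:
            return None  # dead end or loop before reaching the target
        seen.add(current)
        path.append(current)
    return None

print(follow_first_links("Banana", "Knowledge"))
# → ['Banana', 'Fruit', 'Botany', 'Science', 'Knowledge']
```

Tracking visited pages matters: without the `seen` set, a two-page cycle like Fact ⇄ Knowledge would loop until the hop limit.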
ambirpatel/Wikipedia-crawler
Web scraping is a data scraping technique used to extract data from websites.
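For a Wikipedia crawler, the basic scraping step is pulling internal article links out of a page's HTML. A minimal sketch using only the standard library's `html.parser` (the class name and the namespace-filtering heuristic are illustrative assumptions, not this repo's code):

```python
from html.parser import HTMLParser

class WikiLinkParser(HTMLParser):
    """Collect internal Wikipedia article links (href="/wiki/...") from HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href", "")
        # Heuristic: article links start with /wiki/ and contain no
        # namespace colon (skips File:, Category:, Help:, etc.).
        if href.startswith("/wiki/") and ":" not in href:
            self.links.append(href)

html = ('<p>See <a href="/wiki/Web_scraping">scraping</a> and '
        '<a href="/wiki/File:X.png">a file</a>.</p>')
parser = WikiLinkParser()
parser.feed(html)
print(parser.links)
# → ['/wiki/Web_scraping']
```

Real crawlers typically use libraries such as BeautifulSoup or Scrapy selectors for this, but the extraction logic is the same idea.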
jamesponddotco/wikiextract
[READ-ONLY] A word extractor for Wikipedia articles.
mayankkumar2/wikipedia-index-scraper
The program can map out the shortest path between two Wikipedia pages.
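Finding the shortest click path between two pages is a breadth-first search over the link graph. A self-contained sketch over a tiny made-up graph (a real tool would fetch each page's outgoing links on demand):

```python
from collections import deque

# Hypothetical link graph: page title -> pages it links to.
LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["E"],
    "E": [],
}

def shortest_path(start: str, goal: str):
    """BFS over the link graph; returns the shortest click path from
    start to goal, or None if goal is unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        page = path[-1]
        if page == goal:
            return path
        for nxt in LINKS.get(page, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None

print(shortest_path("A", "E"))
# → ['A', 'C', 'E']
```

BFS guarantees the first path that reaches the goal is a shortest one, since pages are expanded in order of increasing click distance.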
Relex12/Wikipedia-Translate-Crawler
A Wikipedia crawler that, starting from an English page, follows hypertext links to find the worst-translated page nearby
WillCaton2350/Wikipedia-WebCrawler
A Wikipedia web crawler written in Python and Scrapy. The ETL process extracts specific data from multiple Wikipedia pages and links with Scrapy, organizes it into a structured format using Scrapy items, and saves the result as JSON for further analysis and integration into MySQL Workbench.