apache-nutch

There are 7 repositories under the apache-nutch topic.

nasa-jpl-memex/nutch-python
Python port of Nutch that allows controlling Apache Nutch via its REST API.
Language: Python
RonnyFalconeri/CrawlingSpider
A simple web crawler inside a Docker container using Apache Nutch 1 and Solr.
Language: Dockerfile
anshul1004/CountriesSearchEngine
A search engine built to retrieve geographical information of any country.
Language: Python
asioso/elastic-6-nutch
Nutch 1.x indexer plugin that runs against Elasticsearch 6.7
Language: Java
sbatururimi/nutch-test
Different examples of using Nutch: with Solr, Selenium Hub, and standalone web drivers
Language: Dockerfile
AzeemQidwai/nutch-solr-mongodb
DataHarvest: Dockerized Web Crawling, Indexing, and Storage Solution
Language: Python
SrideviAkondi/webcrawlers_java
The proposed system uses a crawler to gather information from every document on the website and store it in an index. The index is a structured store for the unstructured data returned by the crawler. In this project, Nutch's main component, the 'crawler', is used for indexing, and Solr is used for searching. The crawler fetches the pages and turns them into an inverted index. This inverted index (also called a 'Lucene index') is used by the searcher to resolve user queries. The crawler and searcher components can be scaled independently of each other.
Language: JavaScript
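
The inverted index mentioned in the last description can be illustrated with a minimal Python sketch. This is not Nutch, Lucene, or Solr code: the function names (`tokenize`, `build_inverted_index`, `search`), the sample URLs, and the page texts are all hypothetical, and the sketch assumes a simple lowercase word tokenizer with AND semantics for multi-term queries.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each term to the set of URLs whose text contains it.

    `pages` is a dict of {url: page_text}, standing in for fetched documents.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every term of the query (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data standing in for crawled pages.
pages = {
    "http://example.com/a": "Nutch crawls web pages",
    "http://example.com/b": "Solr searches indexed pages",
}
index = build_inverted_index(pages)
print(sorted(search(index, "pages")))
print(sorted(search(index, "nutch pages")))
```

Because the index maps terms to documents rather than documents to terms, a query only touches the postings of its own terms, which is why the searcher can work from the index without re-reading the crawled pages.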