apache-nutch

There are 7 repositories under the apache-nutch topic.

  • nasa-jpl-memex/nutch-python

    Python port of Nutch that allows controlling Apache Nutch via its REST API.

    Language: Python
  • RonnyFalconeri/CrawlingSpider

    A simple web crawler inside a Docker container using Apache Nutch 1 and Solr.

    Language: Dockerfile
  • anshul1004/CountriesSearchEngine

    A search engine built to retrieve geographical information about any country.

    Language: Python
  • asioso/elastic-6-nutch

    Nutch 1.x indexer plugin that runs against Elasticsearch 6.7.

    Language: Java
  • sbatururimi/nutch-test

    Different examples of using Nutch: with Solr, Selenium Hub, and standalone web drivers.

    Language: Dockerfile
  • AzeemQidwai/nutch-solr-mongodb

    DataHarvest: Dockerized Web Crawling, Indexing, and Storage Solution

    Language: Python
  • SrideviAkondi/webcrawlers_java

    The proposed system uses a crawler to gather information from every document on a website and store it in an index. The index is a structured store for the unstructured data returned by the crawler. In this project, Nutch's main component, the crawler, is used for indexing, and Solr is used for searching. The crawler fetches the pages and turns them into an inverted index. This inverted index (also called a Lucene index) is used by the searcher to resolve users' queries. The crawler and searcher components can be scaled independently of each other.

    Language: JavaScript
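
The last description above mentions a crawler turning fetched pages into an inverted index that a searcher then queries. The following is a minimal sketch of that idea in plain Python, not code from any of the listed repositories and not how Nutch, Lucene, or Solr implement it. The function names `build_inverted_index` and `search`, the tokenization rule, and the example URLs are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(pages):
    """Map each token to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every token of the query (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical "crawled" pages standing in for fetched documents.
pages = {
    "https://example.org/a": "Nutch is a web crawler",
    "https://example.org/b": "Solr serves search queries",
    "https://example.org/c": "The crawler feeds Solr",
}
index = build_inverted_index(pages)
```

Here `search(index, "crawler solr")` looks up each query token in the index and intersects the URL sets, so the query never has to scan the page text itself. Because the index is built once and only read at query time, the two steps can run and scale separately, which is the property the description attributes to the crawler and searcher components.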