
Kennedy: Crawler and search engine for Gemini space. Leverages techniques and architecture from early WWW crawlers like Mercator, Archive.org, and GoogleBot.

Primary language: C#

Kennedy

Kennedy is a search engine for Gemini space. It consists of a crawler, backend, and Gemini app server. Kennedy leverages many of the techniques and architecture from early WWW crawlers and search engines like Mercator, Archive.org, and GoogleBot.

[Screenshot: Kennedy running in the Lagrange client]

Demo

Visit gemini://kennedy.gemi.dev with a Gemini client or through an HTTP-to-Gemini proxy.

Features

  • Full-text search with Porter stemming
  • Suggested queries: Kennedy recommends other queries if you don't get many results.
  • Complex search queries. For example: "cats AND dogs", "(cats OR dogs) NOT birds".
  • Image Search! Kennedy indexes link text and path info to enable searching for images.
  • Content language classification using n-grams, rather than relying on the lang= parameter of the MIME type
  • PageRank-derived algorithm to better determine result relevance
  • Clean Snippets: Search results include a snippet of content which matches your query. Your keywords are [surrounded] with brackets, and gemtext formatting is removed to make it cleaner to read.
  • Line counts: To help you sort quick articles from longer-form content, search results tell you how many lines are in the content. If you want to be nerdy, I also include byte sizes.
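To make the link-based ranking idea concrete, here is a minimal sketch of textbook PageRank computed by power iteration. Kennedy's own variant is only described as "PageRank-derived", so this is the classic algorithm and not Kennedy's actual code. It is written in Python for brevity (Kennedy itself is C#), and the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are assumptions made for this example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank; `links` maps each page to the pages it links to."""
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    if n == 0:
        return {}

    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        # Every page passes an equal share of its rank to each link target.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical four-page capsule graph.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

Pages that many others link to (here "c") end up with the highest score, which is the signal a search engine can blend with text relevance.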
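As a rough illustration of the snippet and size features, the sketch below strips gemtext line markers, surrounds matched keywords with brackets, and reports line and byte counts. It is written in Python for brevity, not Kennedy's C#. The function names (`strip_gemtext`, `make_snippet`, `content_stats`) and the 80-character window are assumptions made for this example, and it matches whole words only, whereas a stemming search engine would match on word stems.

```python
import re

# Gemtext line markers: headings (#, ##, ###), list items (* ), quotes (>).
_PREFIX = re.compile(r"^(?:#{1,3}\s*|\*\s+|>\s*)")
# Link lines look like "=> URL optional label"; only the label is kept.
_LINK = re.compile(r"^=>\s*\S+\s*")


def strip_gemtext(text):
    """Remove gemtext line markers and join the remaining text into one line."""
    cleaned = []
    for line in text.splitlines():
        if line.startswith("```"):
            continue  # drop preformatted toggle lines, keep their content
        if line.startswith("=>"):
            line = _LINK.sub("", line)
        else:
            line = _PREFIX.sub("", line)
        if line.strip():
            cleaned.append(line.strip())
    return " ".join(cleaned)


def make_snippet(text, keywords, width=80):
    """Return a window of plain text with each keyword [bracketed]."""
    plain = strip_gemtext(text)
    if not keywords:
        return plain[:width]
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    match = pattern.search(plain)
    # Center the window on the first hit, or start at the top if none.
    start = max(0, match.start() - width // 2) if match else 0
    window = plain[start:start + width]
    return pattern.sub(lambda m: "[" + m.group(0) + "]", window)


def content_stats(text):
    """Return (line count, size in bytes when encoded as UTF-8)."""
    return len(text.splitlines()), len(text.encode("utf-8"))


if __name__ == "__main__":
    page = "# My Cats\n=> gemini://example.org/cats.gmi Photos of cats\n* Cats sleep a lot\n"
    print(make_snippet(page, ["cats"]))
    print(content_stats(page))
```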

Why?

Many years ago the great British explorer George Mallory, who was to die on Mount Everest, was asked why did he want to climb it. He said, "Because it is there."

John F. Kennedy, Address at Rice University, Sept. 12, 1962

Projects

  • Kennedy.Crawler - Crawler logic (URL frontiers, queues, etc.)
  • Kennedy.CrawlData - Models and storage systems for documents, metadata, and full-text search
  • Kennedy.Server - Gemini Server to handle queries and search results. Built on top of RocketForce, a .NET Gemini server and application framework
  • Kennedy.SearchConsole - Console app for running full-text search (FTS) queries. Used for testing.
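For readers unfamiliar with the term, a URL frontier is the crawler's to-do list: it remembers which URLs have been seen and decides which one to fetch next without hammering any single host. The sketch below is loosely in the spirit of Mercator's per-host queues and is not taken from Kennedy.Crawler. It is written in Python for brevity (Kennedy itself is C#), and the class name `UrlFrontier`, the `delay_seconds` parameter, and the explicit `now` argument are all assumptions made for this example.

```python
import heapq
from collections import deque
from urllib.parse import urlsplit


class UrlFrontier:
    """Per-host FIFO queues plus a heap that enforces a per-host delay."""

    def __init__(self, delay_seconds=1.0):
        self.delay = delay_seconds
        self.seen = set()        # every URL ever added, for de-duplication
        self.queues = {}         # host -> deque of pending URLs
        self.ready = []          # heap of (earliest fetch time, host)
        self.next_allowed = {}   # host -> earliest time of its next fetch

    def add(self, url, now=0.0):
        """Queue a URL; return False if it was already seen."""
        if url in self.seen:
            return False
        self.seen.add(url)
        host = urlsplit(url).hostname or ""
        if host not in self.queues:
            self.queues[host] = deque()
            # A host that was fetched recently still has to wait its turn.
            due = max(now, self.next_allowed.get(host, now))
            heapq.heappush(self.ready, (due, host))
        self.queues[host].append(url)
        return True

    def next_url(self, now):
        """Return a URL whose host is due at time `now`, or None."""
        if not self.ready or self.ready[0][0] > now:
            return None
        _, host = heapq.heappop(self.ready)
        queue = self.queues[host]
        url = queue.popleft()
        self.next_allowed[host] = now + self.delay
        if queue:
            heapq.heappush(self.ready, (now + self.delay, host))
        else:
            del self.queues[host]
        return url
```

Passing the clock in as `now` keeps the sketch deterministic; a real crawler would more likely read a monotonic clock and sleep until the next host becomes due.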