A full-text search engine written in pure Elixir.
Three goals: 1) a simple CLI, 2) a scalable API, and 3) shareable document repos.
BEAM, OTP, and GenStage give us a solid foundation on which to build.
Searchex provides a search capability for full-text documents. Example document types include:
- text, markdown, XML and JSON files
- source code files
- product descriptions
- blog and forum posts
- chat rooms and twitter feeds
- web pages
Searchex allows you to create searchable Repos that can be shared over the Internet. See a sample repo on GitHub.
Searchex is a new project, usable for testing but not for production. For testing, we're using collections of up to 2,000 documents with 1MB of raw text. See the Roadmap for development plans.
A Searchex DOCUMENT has two key elements:
- document META-DATA, like title, author_name, publication_date
- the FULL-TEXT of the document
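As a rough illustration of the two elements above, a document could be modeled as a struct holding a meta-data map and the full text. This is only a sketch: the module name `DocSketch`, its field names, and the sample values are assumptions made for illustration, not Searchex's actual internal representation.

```elixir
defmodule DocSketch do
  # Hypothetical struct, NOT Searchex's real data structure.
  # meta holds fields like title, author_name, publication_date.
  defstruct id: nil, meta: %{}, fulltext: ""

  # Build a document from an ID, a meta-data map, and the raw text.
  def new(id, meta, fulltext) when is_map(meta) and is_binary(fulltext) do
    %__MODULE__{id: id, meta: meta, fulltext: fulltext}
  end
end

# Illustrative values only.
doc = DocSketch.new(1, %{title: "Sample post", author_name: "Ann"}, "Hello world.")
IO.inspect(doc.meta.title)
IO.inspect(doc.fulltext)
```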
Searchex organizes documents into separate COLLECTIONS. Each collection has two main elements:
- the CATALOG, a table-like structure that contains the document ID, meta-data, and document location
- the INDEX, an inverted index built for fast search and retrieval
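To make the idea of an inverted index concrete, here is a minimal sketch that maps each term to the set of document IDs containing it, then answers a query by intersecting those sets. The module `IndexSketch` and its functions are hypothetical and written for this example. The sketch does no stemming or ranking, so it does not reflect how Searchex builds or scores its own index.

```elixir
defmodule IndexSketch do
  # Split text into lowercase word tokens (no stemming in this sketch).
  def tokenize(text) do
    text
    |> String.downcase()
    |> String.split(~r/[^a-z0-9]+/, trim: true)
  end

  # Build a map of term => MapSet of document IDs from {id, text} pairs.
  def build(docs) do
    Enum.reduce(docs, %{}, fn {id, text}, index ->
      text
      |> tokenize()
      |> Enum.reduce(index, fn term, acc ->
        Map.update(acc, term, MapSet.new([id]), &MapSet.put(&1, id))
      end)
    end)
  end

  # Return the sorted IDs of documents that contain every query term.
  def lookup(index, query) do
    sets = query |> tokenize() |> Enum.map(&Map.get(index, &1, MapSet.new()))

    case sets do
      [] -> []
      [first | rest] -> rest |> Enum.reduce(first, &MapSet.intersection/2) |> Enum.sort()
    end
  end
end

# Illustrative documents only.
index =
  IndexSketch.build([
    {1, "Cain and Abel"},
    {2, "Abel kept flocks"},
    {3, "Cain worked the soil"}
  ])

IO.inspect(IndexSketch.lookup(index, "cain abel"))
# prints [1]
```

Because lookups only touch the entries for the query terms, the cost of a query depends on how many documents contain those terms rather than on the size of the whole collection.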
Each collection is defined by a CONFIG file, a YAML file that specifies things like:
- document directories
- file types
- meta-data field definitions and extraction regexes
- document separator regex (for multi-doc files)
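The last two settings can be pictured with a short sketch: one regex splits a multi-doc file into documents, and one regex per field pulls meta-data out of each document. Everything here is assumed for illustration, including the module `ConfigSketch`, the `---` separator, and the `title:` and `author:` line formats. The real config keys and regexes are defined by Searchex and are not shown in this document.

```elixir
defmodule ConfigSketch do
  # Split a multi-doc file into documents using a separator regex.
  def split_docs(raw, separator) do
    raw
    |> String.split(separator)
    |> Enum.map(&String.trim/1)
    |> Enum.reject(&(&1 == ""))
  end

  # Apply one extraction regex per field; each regex has one capture group.
  # Fields whose regex does not match are set to nil.
  def extract_meta(doc, field_regexes) do
    Map.new(field_regexes, fn {name, regex} ->
      case Regex.run(regex, doc, capture: :all_but_first) do
        [value] -> {name, String.trim(value)}
        _ -> {name, nil}
      end
    end)
  end
end

# Hypothetical separator and field regexes, chosen for this example.
separator = ~r/^---$/m
fields = %{title: ~r/^title:\s*(.+)$/m, author_name: ~r/^author:\s*(.+)$/m}

raw = """
title: First post
author: Ann
Hello world.
---
title: Second post
Body without an author line.
"""

docs = ConfigSketch.split_docs(raw, separator)
Enum.each(docs, fn doc -> IO.inspect(ConfigSketch.extract_meta(doc, fields)) end)
```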
The searchex command-line program can manage config files, build catalogs and indexes, and perform searches.
If you have Elixir 1.4+, enter this at the console:
    mix escript.install hex searchex
Make sure ~/.mix/escripts is on your path!
Elixir developers can embed Searchex into their applications.
Add searchex to your list of dependencies in mix.exs:
    def deps do
      [{:searchex, "~> 0.0.4"}]
    end
Then run mix deps.get.
View API documentation at https://hexdocs.pm/searchex
After the searchex escript is installed...
- Fetch a Searchex repository
    searchex fetch elixir-search/sample
- Run
    searchex help                       # show help page
    searchex ls                         # list collections
    searchex info                       # show collection stats
    searchex query tiny .               # list docs from collection: tiny
    searchex show tiny 1                # show the doc 1 from 'tiny'
    searchex query genesis 'cain abel'  # query docs from collection: genesis
Note: the first time you run a query, Searchex will build a catalog and index. This can take a minute or two. After that, queries run in under a second.
- Config management
- Porter stemming algorithm
- BM25 query algorithm
- Indexing Middleware
- LRU Cache
- Basic CLI
- Fetchable document repos
- Adapter Middleware (filesys, ecto)
- Incremental add/remove/update
- Server mode
- Phoenix/Firestorm integration
- Streaming document ingestion (GenStage/Flow)
- Git-based file-change detection
- Faceted Search
- LRU Registry
- Typeahead support
- Alerting
- Multi-collection search
- Configuration GUI
- Toolchain Integration (ExDoc, Hex, GitHub Issues)
- Searchable Tutorials (Elixir Blogs, Slide Decks, Videos)
- Output formatting plugins (Vim, Emacs, etc.)
- P2P Streaming
- Internationalization
- Dockerization
- Bayesian/ML Classifiers
- Searchex Website: http://searchex.org
- GitHub Source: https://github.com/elixir-search/searchex
- Sample Repository: https://github.com/elixir-search/sample
- StemEx: https://github.com/elixir-search/stem_ex
- Hex Package: https://hex.pm/packages/searchex
- API Documentation: https://hexdocs.pm/searchex