adept-dm/adept

Add search capability to Adept

Closed this issue · 5 comments

Currently it is not possible to search adept for coordinates.

A regex-based search would be great, so that you could run something like `adept search akka` and get all coordinates for akka.
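A minimal sketch of what `adept search akka` could do. It assumes, hypothetically, that coordinates are Maven-style organization/name/version triples rendered as `org:name:version`. `Coordinates` and `search` are illustrative names, not Adept's API.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Coordinates:
    # Hypothetical coordinate shape (Maven-style), not Adept's actual model.
    org: str
    name: str
    version: str

    def __str__(self) -> str:
        return f"{self.org}:{self.name}:{self.version}"


def search(pattern: str, coords: Iterable[Coordinates]) -> List[Coordinates]:
    """Return every coordinate whose 'org:name:version' string contains a regex match."""
    regex = re.compile(pattern)
    return [c for c in coords if regex.search(str(c))]
```

Using `re.search` rather than `re.match` means `akka` matches anywhere in the coordinate string, which is what `adept search akka` suggests.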

Yeah, I think Lucene could do the job. I am not sure whether there are any alternatives that make sense. In any case, it would be cool to try Lucene out and see what the code would look like and how well it would perform.

I was thinking it would be cool if the first search either searched and indexed at the same time, or indexed first and then searched.

After that, searches would be fast, but you wouldn't waste time on indexing if there is no need for searches.
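A sketch of the "index first, then search" variant: the possibly slow index build is deferred until the first search and reused afterwards. `LazyIndex` and its `load_coords` callback are hypothetical; the callback stands in for whatever would read the module metadata.

```python
import re
from typing import Callable, Iterable, List, Optional


class LazyIndex:
    """Hypothetical helper: builds its index only when the first search is issued."""

    def __init__(self, load_coords: Callable[[], Iterable[str]]):
        self._load = load_coords
        self._entries: Optional[List[str]] = None  # None until the first search
        self.builds = 0  # how many times the expensive load actually ran

    def _ensure_built(self) -> List[str]:
        if self._entries is None:
            self._entries = sorted(self._load())
            self.builds += 1
        return self._entries

    def search(self, pattern: str) -> List[str]:
        regex = re.compile(pattern)
        return [e for e in self._ensure_built() if regex.search(e)]
```

Users who never search never pay for the build; every search after the first reuses the cached entries.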

Yeah, makes sense. Though I did a bit of searching around and didn't find how one would index JSON files with Lucene (AFAIK it treats everything as a text file), so maybe it would be easier to make our own index consisting simply of a list of existing coords.

(In reply to Fredrik Ekholdt, Mon, Aug 19, 2013 at 3:27 PM.)
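A sketch of the simpler alternative: our own index as a plain list of coordinate strings, one per line, in a file. The file name and the `org:name:version` string format are assumptions for illustration only.

```python
import re
from pathlib import Path
from typing import Iterable, List


def write_index(index_file: Path, coords: Iterable[str]) -> None:
    """Persist the index as a sorted, de-duplicated list, one coordinate per line."""
    index_file.write_text("\n".join(sorted(set(coords))) + "\n", encoding="utf-8")


def read_index(index_file: Path) -> List[str]:
    """Load the coordinate list back, ignoring blank lines."""
    return [line for line in index_file.read_text(encoding="utf-8").splitlines() if line]


def search_index(index_file: Path, pattern: str) -> List[str]:
    """Regex-match against the stored list instead of the module files themselves."""
    regex = re.compile(pattern)
    return [c for c in read_index(index_file) if regex.search(c)]
```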

It has been ages since I looked at Lucene, but I would assume there must be some way of indexing any structured document. This page seems a bit complicated: http://lucene.apache.org/core/3_0_3/fileformats.html#Fields but the way I read it, you can define a document structure (a module in our case) and add searchable fields to it. I am not sure, though; it would be nice to look at a tutorial.

As you say, our own index might be just as good, and maybe Lucene is overkill. It would be nice, though, to have a library with a B+ tree implementation that persists to disk (or maybe to use Lucene's implementation). With such an index we would be able to scale better, and we could index other things as well (descriptions, universes, ...).
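To illustrate why an ordered index such as a B+ tree helps, here is an in-memory stand-in. It is explicitly not a B+ tree and persists nothing; it only keeps keys sorted so that all keys sharing a prefix form one contiguous range found by binary search. `SortedPrefixIndex` is a hypothetical name.

```python
import bisect
from typing import Iterable, List


class SortedPrefixIndex:
    """In-memory stand-in for an ordered index such as a B+ tree (NOT a B+ tree).

    Sorted keys make every prefix one contiguous range, which is the property
    an on-disk ordered index would provide at larger scale.
    """

    def __init__(self, keys: Iterable[str] = ()):
        self._keys: List[str] = sorted(set(keys))

    def add(self, key: str) -> None:
        # Binary-search the insertion point and skip duplicates.
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def with_prefix(self, prefix: str) -> List[str]:
        # The first candidate is found in O(log n); matches are then contiguous.
        start = bisect.bisect_left(self._keys, prefix)
        end = start
        while end < len(self._keys) and self._keys[end].startswith(prefix):
            end += 1
        return self._keys[start:end]
```

An ordered index speeds up prefix queries such as `com.typesafe.akka:`; a regex that may match anywhere in the string, like `akka`, would still need a scan.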

Or perhaps I have been overthinking it. Perhaps what we need now is to iterate through all the module files and match them against a regexp. Most people have SSDs, which should make this pretty quick, and currently there are not that many modules. It would be fast to implement and easy to replace with something more efficient once the time is right. WDYT?
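A sketch of the brute-force approach: walk every module file and regex-match its coordinates, with no index at all. It assumes, hypothetically, that module files are JSON files with top-level `organization`, `name` and `version` fields; Adept's real metadata layout may differ.

```python
import json
import re
from pathlib import Path
from typing import List


def scan_modules(root: Path, pattern: str) -> List[str]:
    """Parse every *.json under root and return the coordinates matching pattern.

    Assumes (hypothetically) top-level "organization", "name" and "version" fields.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(root.rglob("*.json")):
        module = json.loads(path.read_text(encoding="utf-8"))
        coord = f'{module["organization"]}:{module["name"]}:{module["version"]}'
        if regex.search(coord):
            hits.append(coord)
    return hits
```

Because the signature takes only a root and a pattern, it could later be swapped for an indexed implementation without changing callers.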

I would use Lucene. It is a very clean library, with tons of traction, and
heavily optimized. You can define whatever searchable fields you want. You
may have to write explicit code to go from JSON to a Lucene document, or you
may find a library that does it, but either way it is simple.

Dean

(In reply to Fredrik Ekholdt, Monday, August 19, 2013 1:24 PM.)
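Going from JSON to a search document can be as small as flattening the parsed JSON into field-name/value pairs. With Lucene, each pair would then become a field added to a `Document`; the sketch below stops before that step so it needs no Lucene dependency. The dotted naming scheme (`attributes.tags`) is an assumption, not something Lucene prescribes.

```python
from typing import Any, Dict, List


def to_fields(obj: Any, prefix: str = "") -> Dict[str, List[str]]:
    """Flatten parsed JSON into field name -> list of string values.

    Nested keys become dotted names, list items repeat the same field name,
    and nulls are skipped.
    """
    fields: Dict[str, List[str]] = {}

    def walk(value: Any, name: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{name}.{key}" if name else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, name)
        elif value is not None:
            fields.setdefault(name, []).append(str(value))

    walk(obj, prefix)
    return fields
```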

Yeah, I have no experience with Lucene, so I don't know. We can probably try both if we want to. Using Lucene looks simple enough, though (http://www.lucenetutorial.com/lucene-in-5-minutes.html). Also, thanks to storing all the metadata in git, reindexing should be really simple and quite fast (we get the information about which files changed for free).

(In reply to Dean Thompson, Mon, Aug 19, 2013 at 10:31 PM.)
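A sketch of the incremental reindexing idea: ask git which files changed between two revisions (`git diff --name-only`) and re-read only those, dropping entries for files that were deleted. The index shape (relative path to extracted text) and the `extract` callback are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def changed_files(repo: Path, old_rev: str, new_rev: str) -> List[str]:
    """Paths that differ between two commits, as reported by `git diff --name-only`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", old_rev, new_rev],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def reindex(index: Dict[str, str], repo: Path, changed: Iterable[str],
            extract: Callable[[Path], str]) -> None:
    """Re-read only the changed files; remove entries whose file no longer exists."""
    for rel in changed:
        path = repo / rel
        if path.exists():
            index[rel] = extract(path)
        else:
            index.pop(rel, None)
```

In practice `changed` would probably be filtered down to module files (for example by suffix) before reindexing.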