ryanb/xapit

Auto-reload Database

ryanb opened this issue · 15 comments

Upon fetching the Xapit database, it should check whether the database's modification date has changed since the last fetch. If it has, it should reload the database.

This has two key advantages. One is that it is easier to sync the database with a cron task of some kind (without using xapit-sync). Another is that this is a nice reload alternative for those using xapit-sync with Passenger.
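A minimal sketch of that check, assuming a plain file path and a caller-supplied loader block. `MtimeReloader` is a hypothetical name and not part of Xapit; the block stands in for whatever actually opens the Xapian database.

```ruby
# Hypothetical helper, not part of Xapit: caches an object built from a file
# and rebuilds it whenever the file's modification time changes.
class MtimeReloader
  def initialize(path, &loader)
    @path = path
    @loader = loader        # block that opens/loads the database
    @loaded_mtime = nil     # mtime seen at the last load
    @value = nil
  end

  # Returns the cached value, reloading first if the file changed on disk.
  def fetch
    current = File.mtime(@path)
    if current != @loaded_mtime
      @value = @loader.call(@path)
      @loaded_mtime = current
    end
    @value
  end
end
```

Every fetch costs one `stat` call, which is the price of each instance noticing changes on its own without being told to reload.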

Hello Ryan,

So far, is there no way to make Xapit work with Passenger? Thanks.

It will work with Passenger, however there's no easy way for the database to be reloaded. Currently, if you add a new record or change an existing one, you'll have to re-index the database and then restart the web server (touch tmp/restart.txt) in order for the index changes to take effect.

Doesn't xapit-sync do that, or at least part of the process (re-indexing the database)?

Nope, xapit-sync works with the idea of triggering a URL to reload the database when the syncing is done. This only works if you have a way to access each Rails instance (like if each one is on a separate port in mongrel).

With Passenger there's no way to send one request to every Rails instance so you can't ensure it is completely reloaded through triggering a URL. There is a ticket open on Passenger's bug tracker about this but I haven't seen much activity on it so I doubt it will be a feature anytime soon.

Hi Ryan,

this is just a brain-dump, but maybe it would work:
Could it be possible to have a "pool" of Xapian databases? So when we re-index, it's written to a different Xapian DB than the one that's (possibly) still in use by various Rails instances. But on the next request that involves the Xapian index, the newly generated DB would be used so that eventually all Rails instances would use the new one. Am I completely missing something here or could this work?

It would work very similarly to that. The Xapian database is stored in memory. Once it detects that it has changed, it will automatically reload it from disk. I don't think there's any need for a pool of databases on disk since it's stored in memory.

Yes, I understand that. It was just a shot at trying to find a way to reload the database in Passenger without having to restart the whole app. So my idea was this: if it's impossible to reload the current Xapian DB in Passenger without restarting, would it, as an alternative, be possible to simply create a new (second) Xapian database and then tell Rails/Xapian to use that database on each new request, effectively bypassing the need to restart the whole app?

Telling it to reload the current database is as easy as telling it to use a different database. The problem is sending that message to each Rails instance in Passenger which (AFAIK) isn't possible.

So the Rails instance will need to monitor the database to see if it has been updated and auto-reload it when it has. This is the only way I know how to avoid a restart with passenger.

With Passenger, or in any other setup with multiple Rails instances (threads or processes), each instance reads the search index and holds it in memory.

xapit-sync is a clever start, but in the long term each Rails instance should be able to reopen the database on its own.

I am working on some rough strategy on my branch and here is how it works.

  1. Xapit::Config will maintain a SHA1 hash of (filesize + filemtime) of the main Xapian index file (record.DB).
  2. All Xapit database access goes through the Xapit::Config.database call, so it's a good place to put the monitor and reopen the database when it has expired.
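
The fingerprint in step 1 could look roughly like the sketch below. This is not the code from the commit. The method names `index_fingerprint` and `index_expired?` are made up for illustration, and the exact string that gets hashed is an assumption.

```ruby
require "digest/sha1"

# Hypothetical helper (illustrative name): fingerprints a file by hashing
# its size together with its modification time.
def index_fingerprint(path)
  stat = File.stat(path)
  Digest::SHA1.hexdigest("#{stat.size}-#{stat.mtime.to_f}")
end

# The database counts as expired when the fingerprint stored at open time
# no longer matches the file on disk.
def index_expired?(path, stored_fingerprint)
  index_fingerprint(path) != stored_fingerprint
end
```

The idea is to store the fingerprint when the database is opened and compare it on each access through the single accessor from step 2. Including the file size catches rewrites that land within the same mtime tick.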

The commit: http://github.com/speedmax/xapit/commit/fe9c9cfbf83191a9ab4dbe4aaedb19060838ea42

Of course, this works in my irb session, and it's probably not thread-safe.

I see. I'm sure you've thought of this already, but I'm curious about your opinion: Could a Passenger-like "touch tmp/restart.txt" approach possibly work?
When the re-indexing is done, a file like tmp/reload_xapian_db.txt (the location could be set in xapit.yml) could be touched and each Rails instance checks if that file has been touched before it uses the Xapian DB. If it has been touched, it'll reload the xapian DB before it performs the search. Could that work?
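
A rough sketch of that marker-file idea, covering both the indexer side and the Rails-instance side. `signal_reload` and `ReloadMarkerWatcher` are hypothetical names, and the default path is only the example path mentioned above, not an existing Xapit setting.

```ruby
require "fileutils"

# Hypothetical default; the idea above is to make this configurable.
RELOAD_MARKER = "tmp/reload_xapian_db.txt"

# Indexer side: call this once re-indexing has finished.
def signal_reload(marker = RELOAD_MARKER)
  FileUtils.mkdir_p(File.dirname(marker))
  FileUtils.touch(marker)
end

# Rails-instance side: remembers the marker's mtime and reports when it moves.
class ReloadMarkerWatcher
  def initialize(marker = RELOAD_MARKER)
    @marker = marker
    @seen = current_mtime
  end

  # True once per touch; false if the marker is missing or unchanged.
  def reload_needed?
    latest = current_mtime
    return false if latest.nil? || latest == @seen
    @seen = latest
    true
  end

  private

  def current_mtime
    File.exist?(@marker) ? File.mtime(@marker) : nil
  end
end
```

Each instance would call `reload_needed?` before a search and reopen the database when it returns true. Two touches within the filesystem's timestamp resolution would look like a single touch to the watcher.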

Actually, I just update the index using a rake task (Xapit.remove_database && Xapit.index_all) and no Passenger restart is required.

The commit I made basically does the same thing as checking a file like "tmp/reload_xapian_db.txt".

The difference is that it monitors changes to the Xapian index files themselves, so no manual touching of a file is required. :)

Thanks for your work on this, guys. Once I find the time I hope to merge this into the main branch.

Ryanb,

Any chance of having speedmax's changes on the main branch?

I am closing all issues because I have rewritten Xapit and want to focus on immediate bugs with this new version. If you find this issue still relevant, please comment here or open another issue and I'll take a look at it.