#README
Setup should be a three-step dance:
- Look at config/database.yml and create the configured database and user. This is an exercise left to the reader, but in a nutshell: install postgres, create the db, create the user with sufficient privileges so that it can drop/create the database.
- rake db:reset
- rake db:migrate
If you're comfortable enough with postgres and intend to poke in the database at a lower level, then also set your schema search path:
alter database quran_dev set search_path = "$user", quran, content, audio, i18n, public;
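For the first setup step above, the following is a minimal sketch of how the role and database statements could be derived from config/database.yml. The YAML excerpt, the `quran_dev` username, the password, and the helper names `quote_ident`, `quote_literal` and `setup_sql` are all assumptions made for illustration; check them against your real config. `CREATEDB` is what lets the user drop and recreate the database during `rake db:reset`.

```ruby
require 'yaml'

# Hypothetical excerpt of config/database.yml; the real file may differ.
SAMPLE_DATABASE_YML = <<~YML
  development:
    adapter: postgresql
    database: quran_dev
    username: quran_dev
    password: secret
YML

# Quote an SQL identifier, doubling any embedded double quotes.
def quote_ident(name)
  %("#{name.to_s.gsub('"', '""')}")
end

# Quote an SQL string literal, doubling any embedded single quotes.
def quote_literal(value)
  "'#{value.to_s.gsub("'", "''")}'"
end

# Build the statements to run as a PostgreSQL superuser (e.g. via psql).
def setup_sql(yaml_text, env = 'development')
  config = YAML.safe_load(yaml_text).fetch(env)
  user = quote_ident(config.fetch('username'))
  db = quote_ident(config.fetch('database'))
  [
    "CREATE ROLE #{user} WITH LOGIN CREATEDB PASSWORD #{quote_literal(config.fetch('password'))};",
    "CREATE DATABASE #{db} OWNER #{user};"
  ]
end

puts setup_sql(SAMPLE_DATABASE_YML)
```

Paste the printed statements into psql as a superuser, then continue with the rake steps.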
Elasticsearch is the search engine used to query the Quran.
To run elasticsearch, in bash paste:
elasticsearch --config=/usr/local/opt/elasticsearch/config/elasticsearch.yml
Web portal (elasticsearch-head): to install, run sudo elasticsearch/bin/plugin -install mobz/elasticsearch-head
Github: https://github.com/mobz/elasticsearch-head
To run: Open in browser http://localhost:9200/_plugin/head/
If you’ve installed the .deb package, then the plugin executable will be available at /usr/share/elasticsearch/bin/plugin.
List indices: in browser - http://localhost:9200/_cat/indices?v
* Delete the index
* http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-delete-index.html
View mappings: in browser - http://localhost:9200/quran/_mapping
- The low-level Elasticsearch Ruby client should come in handy if the Rails extension just isn't cutting it. You can get to it via the model class,
e.g. Quran::Text.__elasticsearch__.client
- http://www.rubydoc.info/gems/elasticsearch-api/Elasticsearch/API
- https://github.com/elasticsearch/elasticsearch-ruby/tree/master/elasticsearch-api
- To create the Elasticsearch index:
rake es_tasks:setup_index
- To delete it, run
rake es_tasks:delete_index
- To delete and recreate only a single mapping from the rails console:
client = Quran::Ayah.__elasticsearch__.client
client.indices.delete_mapping index: 'quran', type: 'translation'
client.indices.put_mapping index: 'quran', type: 'translation', body: { translation: { _parent: { type: 'ayah' }, _routing: { required: true, path: 'ayah_key' }, properties: { text: { type: 'string', term_vector: 'with_positions_offsets_payloads' } } } }
Note: you will run into the problem of not having the arabic_synonyms.txt file in the proper location for Elasticsearch. That's fine. The file is located in the public directory and should be placed in /etc/elasticsearch/analysis on your server.
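The single-mapping recreation shown above is easy to mistype in a console, so here is a small sketch that keeps the mapping body in one place. `child_mapping_body` and `recreate_mapping` are hypothetical helper names, not part of the project; the body is copied from the `translation` example, and reusing it for any other child type is an assumption you should verify first.

```ruby
# Hypothetical helper: mapping body for one child type of 'ayah',
# copied from the 'translation' example in this README.
def child_mapping_body(type)
  {
    type.to_sym => {
      _parent: { type: 'ayah' },
      _routing: { required: true, path: 'ayah_key' },
      properties: {
        text: { type: 'string', term_vector: 'with_positions_offsets_payloads' }
      }
    }
  }
end

# Hypothetical helper: delete and recreate one mapping using the same
# client calls as the console snippet above.
def recreate_mapping(client, type, index: 'quran')
  client.indices.delete_mapping index: index, type: type
  client.indices.put_mapping index: index, type: type, body: child_mapping_body(type)
end

# From a rails console:
#   recreate_mapping(Quran::Ayah.__elasticsearch__.client, 'translation')
```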
- Figuring out what's wrong with a query
- Fire up a rails console:
r = Quran::Ayah.search( "allah light", 1, 20, [ 'content.transliteration', 'content.translation' ] )
debugme = r.instance_values['search'].instance_values['definition'][:body]
print debugme.to_json, "\n"
{"query":{"bool":{"should":[{"has_child":{"type":"transliteration","query":{"match":{"text":{"query":"allah light","operator":"or","minimum_should_match":"3\u003c62%"}}}}},{"has_child":{"type":"translation","query":{"match":{"text":{"query":"allah light","operator":"or","minimum_should_match":"3\u003c62%"}}}}}],"minimum_number_should_match":1}}}
- Copy and paste that output into the 'Any Request' tab of http://127.0.0.1:9200/_plugin/head/
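When experimenting, it can help to rebuild the query body above as a plain Ruby hash so that a single clause can be changed before pasting the JSON into the head plugin. This is only a sketch: `has_child_clause` and `debug_query_body` are hypothetical names, not functions in this project, and the structure is copied from the printed output above. Note that `\u003c` in that output is just the JSON escape for `<`.

```ruby
require 'json'

# Hypothetical helper: one has_child clause matching `query` against
# child documents of the given type.
def has_child_clause(type, query)
  {
    has_child: {
      type: type,
      query: {
        match: {
          text: { query: query, operator: 'or', minimum_should_match: '3<62%' }
        }
      }
    }
  }
end

# Hypothetical helper: the bool/should body, one clause per child type.
def debug_query_body(query, types)
  {
    query: {
      bool: {
        should: types.map { |type| has_child_clause(type, query) },
        minimum_number_should_match: 1
      }
    }
  }
end

body = debug_query_body('allah light', %w[transliteration translation])
puts JSON.pretty_generate(body)
```

Dropping a type from the list, or changing `minimum_should_match`, and comparing the hits in the head plugin is a quick way to see which clause is responsible for an unexpected result.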
- normalize western languages (stemming, etc.)
- factor in frequency, density, proximity to each other, and proximity to the beginning of the ayah (seems like it's not factored in)
  - frequency, i.e. if 'allah light' matches 'allah' once, and 'light' twice in the same result, then that result needs a higher score than matching only 'allah' once and 'light' once
  - density, i.e. if 'allah light' matches an ayah which is only 5 tokens long, e.g. 'allah word_a light word_b word_c', then this has a higher density than a match against a result which is 300 words long, and should accordingly have a higher score
  - proximity to each other, i.e. 'allah light' matching 'allah word light word word word' gets a better score than a match against 'allah word word word word word word light'
  - proximity to the beginning of the ayah, i.e. if 'allah light' matches a translation which is 'allah is the light of word word word word word word' then this should have a higher score than 'word word word word word word word allah word word word word light'
- normalize Arabic using techniques to-be-determined involving root, stem, lemma
- improving relevance:
- this document: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/relevance-intro.html
- in combination with a rails console inspection of:
matched_children = ( OpenStruct.new Quran::Ayah.matched_children( query, config[:types], array_of_ayah_keys ) ).responses
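The frequency, density and proximity criteria in the wish list above can be made concrete with a toy scorer. This is an illustration only: it is not how Elasticsearch computes relevance, and `tokenize`, `smallest_span` and `relevance_signals` are hypothetical names. It simply computes the four signals for a query against a single text so the examples from the list can be compared side by side.

```ruby
# Toy illustration only: this is NOT how Elasticsearch scores documents.

# Lowercase the text and split it into word tokens ('word_a' stays one token).
def tokenize(text)
  text.downcase.scan(/[[:alnum:]_]+/)
end

# Length in tokens of the shortest window containing every query term,
# or nil when at least one term never occurs.
def smallest_span(terms, tokens)
  best = nil
  last_seen = {}
  tokens.each_with_index do |token, i|
    next unless terms.include?(token)
    last_seen[token] = i
    next unless last_seen.size == terms.size
    width = i - last_seen.values.min + 1
    best = width if best.nil? || width < best
  end
  best
end

# The four signals from the list: frequency, density, proximity of the
# terms to each other (span), and proximity to the start (first_position).
def relevance_signals(query, text)
  terms = tokenize(query).uniq
  tokens = tokenize(text)
  positions = tokens.each_index.select { |i| terms.include?(tokens[i]) }
  {
    frequency: positions.size,
    density: tokens.empty? ? 0.0 : positions.size.to_f / tokens.size,
    span: smallest_span(terms, tokens),
    first_position: positions.first
  }
end

p relevance_signals('allah light', 'allah word light word word word')
p relevance_signals('allah light', 'allah word word word word word word light')
```

A smaller `span` and `first_position` and a larger `frequency` and `density` correspond to the results the list says should rank higher. How to weight the signals against each other is left open here, as it is in the list.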
##Usage
http://localhost:3000/surahs/1/ayat?audio=1&content=21&from=1&quran=1&to=10
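A small sketch of building the same request URL from Ruby, for example in a script or a test. `ayat_url` is a hypothetical helper, not part of the app, and the parameter names are taken verbatim from the URL above. Reading `quran`, `content` and `audio` as resource ids is an assumption.

```ruby
require 'uri'

# Hypothetical helper: builds the ayat URL for one surah and a range of ayat.
# quran, content and audio are passed through unchanged; their meaning
# (presumably resource ids) is an assumption.
def ayat_url(surah:, from:, to:, quran:, content:, audio:, host: 'localhost', port: 3000)
  query = URI.encode_www_form(audio: audio, content: content, from: from, quran: quran, to: to)
  URI::HTTP.build(host: host, port: port, path: "/surahs/#{surah}/ayat", query: query).to_s
end

puts ayat_url(surah: 1, from: 1, to: 10, quran: 1, content: 21, audio: 1)
```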