klokantech/osmnames-sphinxsearch

Index v2.0 data


Indexing v2.0 on our OVH server with SSD:

time bash sphinx-reindex.sh force
using config file '/etc/sphinxsearch/sphinx.conf'...
WARNING: key 'charset_type' was permanently removed from Sphinx configuration. Refer to documentation for details.
WARNING: key 'charset_type' was permanently removed from Sphinx configuration. Refer to documentation for details.
indexing index 'ind_charset'...
ERROR: index 'ind_charset': key 'path' not found.
indexing index 'ind_main_charset'...
ERROR: index 'ind_main_charset': key 'path' not found.
indexing index 'ind_name_exact_0'...
WARNING: index 'ind_name_exact_0': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913694 docs, 121.4 MB
sorted 19.9 Mhits, 100.0% done
total 5913694 docs, 121432184 bytes
total 66.615 sec, 1822885 bytes/sec, 88773.73 docs/sec
indexing index 'ind_name_prefix_0'...
WARNING: index 'ind_name_prefix_0': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913694 docs, 121.4 MB
sorted 19.9 Mhits, 100.0% done
total 5913694 docs, 121432184 bytes
total 69.825 sec, 1739083 bytes/sec, 84692.61 docs/sec
indexing index 'ind_names_prefix_0'...
WARNING: index 'ind_names_prefix_0': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913694 docs, 646.5 MB
sorted 80.8 Mhits, 100.0% done
total 5913694 docs, 646525853 bytes
total 86.124 sec, 7506888 bytes/sec, 68664.60 docs/sec
indexing index 'ind_names_infix_soundex_0'...
collected 5913694 docs, 646.5 MB
sorted 161.7 Mhits, 100.0% done
total 5913694 docs, 646525853 bytes
total 118.024 sec, 5477905 bytes/sec, 50105.74 docs/sec
indexing index 'ind_name_exact_1'...
WARNING: index 'ind_name_exact_1': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913692 docs, 121.5 MB
sorted 19.9 Mhits, 100.0% done
total 5913692 docs, 121477354 bytes
total 61.757 sec, 1967000 bytes/sec, 95756.43 docs/sec
indexing index 'ind_name_prefix_1'...
WARNING: index 'ind_name_prefix_1': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913692 docs, 121.5 MB
sorted 19.9 Mhits, 100.0% done
total 5913692 docs, 121477354 bytes
total 64.465 sec, 1884371 bytes/sec, 91733.90 docs/sec
indexing index 'ind_names_prefix_1'...
WARNING: index 'ind_names_prefix_1': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913692 docs, 646.4 MB
sorted 80.8 Mhits, 100.0% done
total 5913692 docs, 646402447 bytes
total 89.410 sec, 7229563 bytes/sec, 66140.54 docs/sec
indexing index 'ind_names_infix_soundex_1'...
collected 5913692 docs, 646.4 MB
sorted 161.6 Mhits, 100.0% done
total 5913692 docs, 646402447 bytes
total 122.165 sec, 5291202 bytes/sec, 48407.21 docs/sec
indexing index 'ind_name_exact_2'...
WARNING: index 'ind_name_exact_2': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913694 docs, 121.5 MB
sorted 19.9 Mhits, 100.0% done
total 5913694 docs, 121459597 bytes
total 67.477 sec, 1799993 bytes/sec, 87639.12 docs/sec
indexing index 'ind_name_prefix_2'...
WARNING: index 'ind_name_prefix_2': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913694 docs, 121.5 MB
sorted 19.9 Mhits, 100.0% done
total 5913694 docs, 121459597 bytes
total 62.580 sec, 1940855 bytes/sec, 94497.46 docs/sec
indexing index 'ind_names_prefix_2'...
WARNING: index 'ind_names_prefix_2': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913694 docs, 646.4 MB
sorted 80.8 Mhits, 100.0% done
total 5913694 docs, 646413395 bytes
total 87.360 sec, 7399346 bytes/sec, 67692.70 docs/sec
indexing index 'ind_names_infix_soundex_2'...
collected 5913694 docs, 646.4 MB
sorted 161.6 Mhits, 100.0% done
total 5913694 docs, 646413395 bytes
total 120.639 sec, 5358207 bytes/sec, 49019.40 docs/sec
indexing index 'ind_name_exact_3'...
WARNING: index 'ind_name_exact_3': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913696 docs, 121.5 MB
sorted 19.9 Mhits, 100.0% done
total 5913696 docs, 121466616 bytes
total 62.329 sec, 1948772 bytes/sec, 94877.48 docs/sec
indexing index 'ind_name_prefix_3'...
WARNING: index 'ind_name_prefix_3': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913696 docs, 121.5 MB
sorted 19.9 Mhits, 100.0% done
total 5913696 docs, 121466616 bytes
total 63.470 sec, 1913757 bytes/sec, 93172.75 docs/sec
indexing index 'ind_names_prefix_3'...
WARNING: index 'ind_names_prefix_3': no morphology or wordforms, index_exact_words=1 has no effect, ignoring
collected 5913696 docs, 646.6 MB
sorted 80.8 Mhits, 100.0% done
total 5913696 docs, 646562409 bytes
total 87.515 sec, 7387942 bytes/sec, 67572.82 docs/sec
indexing index 'ind_names_infix_soundex_3'...
collected 5913696 docs, 646.6 MB
sorted 161.7 Mhits, 100.0% done
total 5913696 docs, 646562409 bytes
total 119.132 sec, 5427235 bytes/sec, 49639.47 docs/sec
skipping non-plain index 'ind_name_exact'...
skipping non-plain index 'ind_names_infix_soundex'...
skipping non-plain index 'ind_name_prefix'...
skipping non-plain index 'ind_names_prefix'...
total 94703600 reads, 68.395 sec, 0.6 kb/call avg, 0.0 msec/call avg
total 183294 writes, 35.450 sec, 407.7 kb/call avg, 0.1 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=20).


real	22m28.967s
user	15m6.852s
sys	1m45.588s
$ time du -hd1 index/
36G	index/

Done in 22.5 minutes
Total index size is 36 GB across 4 indexes (name_exact, name_prefix, names_prefix, names_infix_soundex), each split into 4 parts (_0 to _3) for 4 threads.

There are only 14 invalid lines, reported in the OSMNames repository.
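
To cross-check the 22.5-minute figure against the per-index timings, the `total ... sec` lines in the log can be summed. The following is a minimal sketch, not part of the repository: the function name `per_index_seconds` and the embedded `SAMPLE_LOG` excerpt (two indexes only) are illustrative, and the regular expressions assume the indexer output format shown in the log above.

```python
import re

# Excerpt of the indexer output above: one "indexing index" header and one
# "total ... sec" summary per plain index (only two indexes shown here).
SAMPLE_LOG = """\
indexing index 'ind_name_exact_0'...
collected 5913694 docs, 121.4 MB
total 5913694 docs, 121432184 bytes
total 66.615 sec, 1822885 bytes/sec, 88773.73 docs/sec
indexing index 'ind_name_prefix_0'...
collected 5913694 docs, 121.4 MB
total 5913694 docs, 121432184 bytes
total 69.825 sec, 1739083 bytes/sec, 84692.61 docs/sec
"""

# Header line that names the index being built.
HEADER = re.compile(r"^indexing index '([^']+)'")
# Per-index timing line; the trailing "bytes/sec" keeps the final
# "total ... reads" and "total ... writes" I/O lines from matching.
TIMING = re.compile(r"^total ([\d.]+) sec, \d+ bytes/sec")


def per_index_seconds(log_text):
    """Map each index name to the seconds reported in its 'total ... sec' line."""
    seconds = {}
    current = None
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        timing = TIMING.match(line)
        if timing and current is not None:
            seconds[current] = float(timing.group(1))
    return seconds


times = per_index_seconds(SAMPLE_LOG)
total = sum(times.values())
print(f"{len(times)} indexes, {total:.3f} s ({total / 60:.2f} min)")
```

Fed the full log instead of the excerpt, the 16 per-index times add up to roughly 1348.9 s, which is almost exactly the 22m28.967s wall-clock time. That suggests the indexer built the parts one after another rather than in parallel.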