Inist-CNRS/node-sphinxapi

BUG: can't search UTF-8 strings (and UTF-8 results are not displayed correctly)

olddog opened this issue · 9 comments

When I query '测试' or another string such as 'ふじこ', node-sphinxapi reports an error. Does it not support searching for UTF-8 encoded strings?
By the way, it also does not display UTF-8 encoded results correctly.

Code 1:
var SphinxClient = require("./node-sphinxapi/lib/sphinxapi.js"),
    assert = require('assert');

var cl = new SphinxClient();
cl.SetServer('127.0.0.1', 9312);
// Query the 'xml' index for a UTF-8 string
cl.Query('忍者', 'xml', function(err, res) {
    console.log(err, res);
});

I added console.log calls in sphinxapi.js (line 456):

    console.log(err);
    console.log(response);
    console.log(response.toString('utf8'));

The result is:
[Error: searchd error: invalid weight count -1914142587 (should be in 0..256 range)]
null

e:\node-sphinxapi\lib\sphinxapi.js:459
                var max_ = response.length
                 ^
TypeError: Cannot read property 'length' of null
    at E:\project\root\apps\app\minefield\gl\node-sphinxapi\lib\sphinxapi.js:459:22
    at Socket.<anonymous> (E:\project\root\apps\app\minefield\gl\node-sphinxapi\lib\sphinxapi.js:183:7)
    at Socket.emit (events.js:67:17)
    at TCP.onread (net.js:377:14)
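
For reference, the TypeError follows from the first error: the response printed above is null when searchd reports an error, yet its length is still read. A minimal sketch of guarding against that is shown here. The name handleResponse is hypothetical and this is not the function in sphinxapi.js; it only illustrates checking the error before touching the response.

```javascript
// Hypothetical wrapper (not from sphinxapi.js): check the error first,
// and only read the response once it is known to be a Buffer.
function handleResponse(err, response, callback) {
  if (err) {
    return callback(err, null);
  }
  if (!Buffer.isBuffer(response)) {
    return callback(new Error('empty response from searchd'), null);
  }
  return callback(null, response.length);
}

// With an error and a null response, the callback receives the error
// instead of the code throwing on response.length.
handleResponse(new Error('searchd error'), null, function (err, len) {
  console.log(err.message, len);
});
```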

Code 2:
var SphinxClient = require("./node-sphinxapi/lib/sphinxapi.js"),
    assert = require('assert');

var cl = new SphinxClient();
cl.SetServer('127.0.0.1', 9312);
// Same query as Code 1, but with an ASCII-only string
cl.Query('loli', 'xml', function(err, res) {
    console.log(err, res);
});

The result is:
null [ { error: '',
warning: '',
status: [ 0 ],
fields: [ 'name' ],
attrs: [ [Object], [Object] ],
matches: [ [Object] ],
total: 1,
total_found: 1,
time: 0.001,
words: [ [Object] ] } ]

I added the same console.log calls in sphinxapi.js (line 456):

    console.log(err);
    console.log(response);
    console.log(response.toString('utf8'));

It displays data like:
null
<Buffer 00 00 00 00 00 00 00 01 00 00 00 04 6e 61 6d 65 00 00 00 02 00 00 ...>
loli忍者 .......
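
For reference, the first bytes of the dump above can be decoded by hand. The sketch below assumes, based only on the bytes shown and on the parsed result (status 0, one field called 'name', two attributes), that the response starts with a 32-bit big-endian status, a field count, and length-prefixed field names. The function name parseHeader is hypothetical, and the byte array stops where the interpretation of the dump is still clear.

```javascript
// Hypothetical decoder for the start of the response shown in the dump.
function parseHeader(response) {
  let offset = 0;
  function readUInt32() {
    const value = response.readUInt32BE(offset);
    offset += 4;
    return value;
  }

  const status = readUInt32();
  const fieldCount = readUInt32();
  const fields = [];
  for (let i = 0; i < fieldCount; i++) {
    const len = readUInt32(); // length in bytes, not characters
    fields.push(response.toString('utf8', offset, offset + len));
    offset += len;
  }
  const attrCount = readUInt32();
  return { status: status, fields: fields, attrCount: attrCount };
}

// Bytes copied from the <Buffer ...> dump.
const dump = Buffer.from([
  0x00, 0x00, 0x00, 0x00, // status
  0x00, 0x00, 0x00, 0x01, // number of fields
  0x00, 0x00, 0x00, 0x04, // byte length of the field name
  0x6e, 0x61, 0x6d, 0x65, // the field name itself
  0x00, 0x00, 0x00, 0x02  // number of attributes
]);

console.log(parseHeader(dump));
```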

touv commented

Thanks for the issue. I just changed how the size of strings is calculated.
I think the problem was there.
Can you try it and tell me whether the problem is solved for you?
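
For context, an error such as "invalid weight count" is consistent with a string length prefix computed in characters rather than bytes: strings on the wire are prefixed with their length as a 32-bit big-endian integer (as the 00 00 00 04 before "name" in the dump above suggests), and for UTF-8 text that length has to be the byte count. The sketch below is illustrative only. packString is a hypothetical helper, not the function in sphinxapi.js, and it uses the current Node.js Buffer API.

```javascript
// Hypothetical helper: <uint32 big-endian byte length><UTF-8 bytes>.
function packString(str) {
  const byteLen = Buffer.byteLength(str, 'utf8'); // bytes, not UTF-16 code units
  const out = Buffer.alloc(4 + byteLen);
  out.writeUInt32BE(byteLen, 0);
  out.write(str, 4, 'utf8');
  return out;
}

const query = '忍者';
console.log(query.length);                     // 2 (UTF-16 code units)
console.log(Buffer.byteLength(query, 'utf8')); // 6 (bytes on the wire)
console.log(packString(query));
```

If the prefix were written as 2 instead of 6, the server would treat the remaining four bytes of the query as the next fields of the request, which would explain a nonsensical value further along.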

I have tested it. It should return data, but nothing was returned.
Thank you.

touv commented

Is this behavior expected? Can I close the issue?

I mean that although no error is displayed, the returned data is empty. It stands to reason that some content should be returned.

touv commented

I'm not used to working with UTF-8 strings and Sphinx,
and I don't really know how Sphinx is supposed to react.

So I built my own example.

I inserted some miscellaneous UTF-8 content into the test table (cf. /usr/share/doc/sphinxsearch/example-conf/example.sql):

REPLACE INTO test.documents ( id, group_id, group_id2, date_added, title, content ) VALUES
    ( 2, 1, 6, NOW(), 'utf8 test', 'xxx 测试测试 yyyy' ),
    ( 3, 2, 7, NOW(), 'utf8 sample 1', 'yyy ふじこ yyy ' ),
    ( 5, 2, 8, NOW(), 'utf8 sample 2', 'zzz 测试 zzzz' );

After indexing, I get:

$ usr/bin/search utf8
Sphinx 2.0.4-id64-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinx/sphinx.conf'...
index 'main': query 'utf8 ': returned 3 matches of 3 total in 0.000 sec

displaying matches:
1. document=2, weight=1500, group_id=1, date_added=Fri May 18 22:15:41 2012
    id=2
    group_id=1
    group_id2=6
    date_added=2012-05-18 22:15:41
    title=utf8 test
    content=xxx 测试测试 yyyy
2. document=3, weight=1500, group_id=2, date_added=Fri May 18 22:15:41 2012
    id=3
    group_id=2
    group_id2=7
    date_added=2012-05-18 22:15:41
    title=utf8 sample 1
    content=yyy ふじこ yyy 
3. document=5, weight=1500, group_id=2, date_added=Fri May 18 22:15:41 2012
    id=5
    group_id=2
    group_id2=8
    date_added=2012-05-18 22:15:41
    title=utf8 sample 2
    content=zzz 测试 zzzz

words:
1. 'utf8': 3 documents, 3 hits

But when I try to search for UTF-8 strings, I get no results:

$ usr/bin/search 试测测试
Sphinx 2.0.4-id64-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/etc/sphinx/sphinx.conf'...
index 'main': query '试测测试 ': returned 0 matches of 0 total in 0.000 sec

words:

I tried the same query with the PHP and Node APIs,
and I got the same behavior.

<?php

require_once './others/sphinxapi.php';

$cl = new SphinxClient();
$cl->SetServer('localhost', 19312);
$r = $cl->Query('测试测试');
var_dump($r);

and

var SphinxClient = require ("../lib/sphinxapi.js"),
    util = require('util'),
    assert = require('assert');

var cl = new SphinxClient();
cl.SetServer('localhost', 19312);
cl.Query('测试测试', function(err, result) { 
        assert.ifError(err);
        console.log(util.inspect(result, false, null, true));
});

Are you seeing the same thing?

I do not know why. I also tested with limestone (https://github.com/kurokikaze/limestone), and that module seems to work well. Perhaps looking at it could help you solve the problem?

touv commented

I know Limestone. It uses buffer_extras.js by Tim Caswell to build the query. Here, I use an array and a reduce function, so it's a bit different. To find the bug I need to reproduce it. Can you send me your Sphinx configuration?
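
To illustrate the "array and reduce" approach in general terms, here is a rough sketch of assembling a request from an array of parts. It is not the code from sphinxapi.js: packUInt32, lengthPrefixed, and buildRequest are hypothetical names, and the parts passed in are placeholders rather than the real searchd request layout.

```javascript
// Hypothetical helpers, for illustration only.
function packUInt32(n) {
  const b = Buffer.alloc(4);
  b.writeUInt32BE(n, 0);
  return b;
}

function lengthPrefixed(s) {
  const bytes = Buffer.from(s, 'utf8');
  // The prefix is the byte length of the encoded string.
  return Buffer.concat([packUInt32(bytes.length), bytes]);
}

function buildRequest(parts) {
  // First reduce: total size in bytes of all parts.
  const total = parts.reduce(function (sum, part) {
    return sum + part.length;
  }, 0);
  const out = Buffer.alloc(total);
  // Second reduce: copy each part in, carrying the write position along.
  parts.reduce(function (pos, part) {
    part.copy(out, pos);
    return pos + part.length;
  }, 0);
  return out;
}

// Placeholder parts: a number, a UTF-8 query, and an index name.
const request = buildRequest([
  packUInt32(1),
  lengthPrefixed('测试'),
  lengthPrefixed('xml')
]);
console.log(request.length); // 21
```

With this style, every part must already be a Buffer (or its size must be measured in bytes) before the sizes are summed; summing string lengths instead would under-count multi-byte text.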

OK, this is my Sphinx configuration:

source xml
{
    type                    = xmlpipe2
    xmlpipe_command         = bin\cat e:/1.xml
}


index xml
{
    source            = xml
    path            = e:/data/xml
    docinfo            = extern
    mlock            = 0
    morphology        = none
    min_word_len        = 1
    html_strip                = 0

    charset_dictpath = C:/usr/local/etc/
    charset_type        = zh_cn.utf-8
}


indexer
{
    mem_limit            = 128M
}


searchd
{
    listen                  =   9312:mysql41
    read_timeout        = 5
    max_children        = 30
    max_matches            = 1000
    seamless_rotate        = 0
    preopen_indexes        = 0
    unlink_old            = 1
    pid_file = C:/usr/local/var/log/searchd_xml.pid
    log = C:/usr/local/var/log/searchd_xml.log
    query_log = C:/usr/local/var/log/query_xml.log 
    binlog_path =                                
}

touv commented

I have not been able to reproduce the bug. I read the source code several times and compared it with the other implementations, and I don't know why there is a problem in your case.

If you are interested, you can try to compare the request generated by Limestone with that produced by Sphinxapi...

See here:
https://github.com/kurokikaze/limestone/blob/master/limestone.js#L511
and here:
https://github.com/lindory-project/node-sphinxapi/blob/master/lib/sphinxapi.js#L532
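
One possible way to do that comparison is to capture the bytes each client sends (for example, by logging the buffer just before it is written to the socket) and look for the first offset where they differ. The following is a minimal sketch: firstDifference is a hypothetical helper, and the two sample buffers are made up to show a length prefix in bytes versus one in characters.

```javascript
// Hypothetical helper: index of the first differing byte, or -1 if identical.
function firstDifference(a, b) {
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) {
      return i;
    }
  }
  // Same content up to the shorter length: differ only if lengths differ.
  return a.length === b.length ? -1 : shared;
}

// Made-up samples: the same query with a byte-length prefix (6)
// and with a character-length prefix (2).
const text = Buffer.from('测试', 'utf8');
const withByteLength = Buffer.concat([Buffer.from([0, 0, 0, 6]), text]);
const withCharLength = Buffer.concat([Buffer.from([0, 0, 0, 2]), text]);

console.log(firstDifference(withByteLength, withCharLength)); // 3
```

The offset of the first mismatch points at the field that the two implementations encode differently.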