BUG: can't search UTF-8 strings (and UTF-8 encoded results are not displayed correctly)
olddog opened this issue · 9 comments
When I query '测试' or other strings (for example 'ふじこ'), node-sphinxapi displays an error. Does it not support searching for UTF-8 encoded strings?
Also, it does not display the results correctly (UTF-8 encoding).
Code 1:
var SphinxClient = require("./node-sphinxapi/lib/sphinxapi.js"),
    assert = require('assert');
var cl = new SphinxClient();
cl.SetServer('127.0.0.1', 9312);
cl.Query('忍者', 'xml', function(err, res) {
  console.log(err, res);
});
I added console.log calls in sphinxapi.js (around line 456):
console.log(err);
console.log(response);
console.log(response.toString('utf8'));
The result is:
[Error: searchd error: invalid weight count -1914142587 (should be in 0..256 range)]
null
e:\node-sphinxapi\lib\sphinxapi.js:459
var max_ = response.length
^
TypeError: Cannot read property 'length' of null
at E:\project\root\apps\app\minefield\gl\node-sphinxapi\lib\sphinxapi.js:459:22
at Socket.<anonymous> (E:\project\root\apps\app\minefield\gl\node-sphinxapi\lib\sphinxapi.js:183:7)
at Socket.emit (events.js:67:17)
at TCP.onread (net.js:377:14)
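The `TypeError` is a secondary failure: searchd had already reported an error, so `response` was `null` when the code at sphinxapi.js line 459 read `response.length`. A minimal sketch of the usual Node error-first guard; `handleResponse` is a hypothetical wrapper, not a function from node-sphinxapi:

```javascript
// Hypothetical wrapper (not part of node-sphinxapi): stop at the first
// error instead of reading properties of a null response.
function handleResponse(err, response, callback) {
  if (err) {
    // searchd already reported a problem; response may be null here.
    return callback(err, null);
  }
  if (!Buffer.isBuffer(response)) {
    return callback(new Error('empty response from searchd'), null);
  }
  // Only now is it safe to inspect the buffer.
  const max = response.length;
  return callback(null, { bytes: max });
}

handleResponse(new Error('searchd error: invalid weight count'), null,
  function (err, res) {
    console.log(err.message, res); // searchd error: invalid weight count null
  });
```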
Code 2:
var SphinxClient = require("./node-sphinxapi/lib/sphinxapi.js"),
    assert = require('assert');
var cl = new SphinxClient();
cl.SetServer('127.0.0.1', 9312);
cl.Query('loli', 'xml', function(err, res) {
  console.log(err, res);
});
The result is:
null [ { error: '',
warning: '',
status: [ 0 ],
fields: [ 'name' ],
attrs: [ [Object], [Object] ],
matches: [ [Object] ],
total: 1,
total_found: 1,
time: 0.001,
words: [ [Object] ] } ]
I added the same console.log calls in sphinxapi.js (around line 456):
console.log(err);
console.log(response);
console.log(response.toString('utf8'));
It displays data like:
null
<Buffer 00 00 00 00 00 00 00 01 00 00 00 04 6e 61 6d 65 00 00 00 02 00 00 ...>
loli忍者 .......
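Calling `toString('utf8')` on the whole response decodes binary integers as if they were text, which is why readable words such as `loli忍者` appear among unprintable bytes. Judging from the dumped bytes and the parsed result above (`00 00 00 00` for status 0, `00 00 00 01` for one field, `00 00 00 04` followed by `6e 61 6d 65` = `name`, then `00 00 00 02` for the two attrs), the response is made of 32-bit big-endian integers and length-prefixed strings. A minimal sketch of reading it by byte offset, assuming that layout; `readUInt32` and `readString` are hypothetical helpers:

```javascript
// Hypothetical reader for the layout seen in the dump: 32-bit big-endian
// integers, and strings stored as <uint32 byte length><UTF-8 bytes>.
function readUInt32(buf, state) {
  const value = buf.readUInt32BE(state.offset);
  state.offset += 4;
  return value;
}

function readString(buf, state) {
  const len = readUInt32(buf, state); // length in BYTES, not characters
  const str = buf.toString('utf8', state.offset, state.offset + len);
  state.offset += len;
  return str;
}

// The first 20 bytes of the dumped response.
const dump = Buffer.from(
  '00000000' + '00000001' + '00000004' + '6e616d65' + '00000002', 'hex');

const state = { offset: 0 };
const queryStatus = readUInt32(dump, state); // 0
const nfields = readUInt32(dump, state);     // 1
const fields = [];
for (let i = 0; i < nfields; i++) fields.push(readString(dump, state));
const nattrs = readUInt32(dump, state);      // 2
console.log(queryStatus, fields, nattrs);    // 0 [ 'name' ] 2
```

Decoding each string with explicit start and end offsets keeps multi-byte UTF-8 text intact and never feeds the integer fields to the decoder.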
Thanks for the issue. I just changed how the size of strings is calculated.
I think the problem was there.
Can you try it and tell me whether the problem is solved for you?
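Why string size matters here: in JavaScript, `String.prototype.length` counts UTF-16 code units, while a length-prefixed protocol needs the number of bytes actually sent (PHP's `strlen()` already counts bytes). For '忍者' that is 2 versus 6, so a header computed from `.length` would make searchd read the leftover bytes as the following fields, which would plausibly produce errors like the `invalid weight count` above. A minimal sketch, assuming strings are framed as a 32-bit big-endian byte length followed by the UTF-8 bytes; `packString` is hypothetical and not node-sphinxapi's actual code:

```javascript
// Hypothetical helper: encode a string as
// <uint32 big-endian byte length><UTF-8 bytes>.
function packString(str) {
  const body = Buffer.from(str, 'utf8');
  const header = Buffer.alloc(4);
  header.writeUInt32BE(body.length, 0); // byte count, NOT str.length
  return Buffer.concat([header, body]);
}

const query = '忍者';
console.log(query.length);                      // 2 (UTF-16 code units)
console.log(Buffer.byteLength(query, 'utf8'));  // 6 (bytes actually sent)
console.log(packString(query).readUInt32BE(0)); // 6
```

For ASCII queries such as 'loli' both counts are equal, which fits the observation that Code 2 worked while Code 1 failed.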
I have tested it. It should return data, but it did not return any.
Thank you.
Is this behavior expected? Can I close the issue?
I mean: although no error is displayed, the data returned is empty. It stands to reason that some content should be returned.
I'm not used to working with UTF-8 strings and Sphinx,
and I don't really know how Sphinx should react.
So I built my own example.
I inserted some miscellaneous UTF-8 content into the test table (cf. /usr/share/doc/sphinxsearch/example-conf/example.sql):
REPLACE INTO test.documents ( id, group_id, group_id2, date_added, title, content ) VALUES
( 2, 1, 6, NOW(), 'utf8 test', 'xxx 测试测试 yyyy' ),
( 3, 2, 7, NOW(), 'utf8 sample 1', 'yyy ふじこ yyy ' ),
( 5, 2, 8, NOW(), 'utf8 sample 2', 'zzz 测试 zzzz' );
After indexing, I get:
$ usr/bin/search utf8
Sphinx 2.0.4-id64-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinx/sphinx.conf'...
index 'main': query 'utf8 ': returned 3 matches of 3 total in 0.000 sec
displaying matches:
1. document=2, weight=1500, group_id=1, date_added=Fri May 18 22:15:41 2012
id=2
group_id=1
group_id2=6
date_added=2012-05-18 22:15:41
title=utf8 test
content=xxx 测试测试 yyyy
2. document=3, weight=1500, group_id=2, date_added=Fri May 18 22:15:41 2012
id=3
group_id=2
group_id2=7
date_added=2012-05-18 22:15:41
title=utf8 sample 1
content=yyy ふじこ yyy
3. document=5, weight=1500, group_id=2, date_added=Fri May 18 22:15:41 2012
id=5
group_id=2
group_id2=8
date_added=2012-05-18 22:15:41
title=utf8 sample 2
content=zzz 测试 zzzz
words:
1. 'utf8': 3 documents, 3 hits
But when I search for UTF-8 strings, I get no results:
$ usr/bin/search 试测测试
Sphinx 2.0.4-id64-release (r3135)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/etc/sphinx/sphinx.conf'...
index 'main': query '试测测试 ': returned 0 matches of 0 total in 0.000 sec
words:
I tried the same query with the PHP and Node APIs
and got the same behavior:
<?php
require_once './others/sphinxapi.php';
$cl = new SphinxClient();
$cl->SetServer('localhost', 19312);
$r = $cl->Query('测试测试');
var_dump($r);
and
var SphinxClient = require("../lib/sphinxapi.js"),
    util = require('util'),
    assert = require('assert');
var cl = new SphinxClient();
cl.SetServer('localhost', 19312);
cl.Query('测试测试', function(err, result) {
  assert.ifError(err);
  console.log(util.inspect(result, false, null, true));
});
Are you seeing the same thing?
I don't know why. I also tested with limestone (https://github.com/kurokikaze/limestone), and that module seems to work well. Maybe looking at it could help you solve the problem?
I know Limestone. It uses buffer_extras.js by Tim Caswell to build the query. Here, I use an array and a reduce function, which is a bit different. To find the bug I need to reproduce it. Can you send me your Sphinx configuration?
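When a request is assembled from an array of parts, the same byte-versus-character rule applies to the total: the length written in the header must be a sum of byte lengths. A minimal sketch of the array-plus-reduce style, assuming each part is already a Buffer; `uint32`, `lengthPrefixed`, and `buildRequest` are hypothetical names, not the actual code of node-sphinxapi or Limestone:

```javascript
// Hypothetical sketch: build a request body from an array of Buffers and
// prefix it with the total length computed by reduce().
function uint32(n) {
  const b = Buffer.alloc(4);
  b.writeUInt32BE(n, 0);
  return b;
}

function lengthPrefixed(s) {
  const bytes = Buffer.from(s, 'utf8');
  return Buffer.concat([uint32(bytes.length), bytes]);
}

function buildRequest(parts) {
  // Sum BYTE lengths of the Buffers.
  const total = parts.reduce((sum, part) => sum + part.length, 0);
  return Buffer.concat([uint32(total)].concat(parts), 4 + total);
}

const req = buildRequest([lengthPrefixed('忍者'), lengthPrefixed('xml')]);
console.log(req.readUInt32BE(0)); // 17 = (4 + 6) + (4 + 3)

// Summing JavaScript string lengths instead undercounts non-ASCII text.
const wrong = ['忍者', 'xml'].reduce((sum, s) => sum + 4 + s.length, 0);
console.log(wrong); // 13, i.e. 4 bytes short
```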
OK, this is my Sphinx configuration:
source xml
{
type = xmlpipe2
xmlpipe_command = bin\cat e:/1.xml
}
index xml
{
source = xml
path = e:/data/xml
docinfo = extern
mlock = 0
morphology = none
min_word_len = 1
html_strip = 0
charset_dictpath = C:/usr/local/etc/
charset_type = zh_cn.utf-8
}
indexer
{
mem_limit = 128M
}
searchd
{
listen = 9312:mysql41
read_timeout = 5
max_children = 30
max_matches = 1000
seamless_rotate = 0
preopen_indexes = 0
unlink_old = 1
pid_file = C:/usr/local/var/log/searchd_xml.pid
log = C:/usr/local/var/log/searchd_xml.log
query_log = C:/usr/local/var/log/query_xml.log
binlog_path =
}
I have not been able to reproduce the bug. I have read the source code several times and compared it with the other implementations, and I don't know why there is a problem in your case.
If you are interested, you can try to compare the request generated by Limestone with the one produced by sphinxapi.
See here:
https://github.com/kurokikaze/limestone/blob/master/limestone.js#L511
and here:
https://github.com/lindory-project/node-sphinxapi/blob/master/lib/sphinxapi.js#L532
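A minimal sketch of that comparison, assuming you have captured the bytes each library sends as Buffers (for example by logging what is passed to the socket's `write()` call). `firstDifference` and `hexAround` are hypothetical helpers:

```javascript
// Hypothetical debugging helper: return the first byte offset at which two
// request buffers differ, or -1 if they are identical.
function firstDifference(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  // Same common prefix: they differ only if one buffer is longer.
  return a.length === b.length ? -1 : n;
}

// Hex dump of the bytes around an offset, to see which field is affected.
function hexAround(buf, offset, radius = 8) {
  const start = Math.max(0, offset - radius);
  return buf.subarray(start, offset + radius).toString('hex');
}

// Example pair: byte-length header vs. character-length header for '忍者'.
const good = Buffer.concat([Buffer.from('00000006', 'hex'), Buffer.from('忍者', 'utf8')]);
const bad = Buffer.concat([Buffer.from('00000002', 'hex'), Buffer.from('忍者', 'utf8')]);

const at = firstDifference(good, bad);
console.log(at); // 3
console.log(hexAround(good, at), hexAround(bad, at));
```

In this example pair the only mismatch is the length header, so the first differing offset lands inside it.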