get unicode strings
markschultz opened this issue · 6 comments
When I do a get on a specific ID, some fields come back as unprintable yet valid binary strings. When I do a search for the same object, the same fields are printable. I've narrowed it down to fields that contain "’" (0x2019) or "—" (0x2014).
Could you provide a bit more detail? It would be nice to have a way to reproduce it (a few iex> lines would be helpful; take a look at issue #224 for an example).
Thanks.
the elasticsearch object:
{
  "MeetingTitle": "Planner— A",
  "Id": 1
}
the method i'm using to search:
elasticquery = search [index: "v1", size: 1] do
  query do
    bool do
      filter do
        term "Id", "1" # in this case the object id is the same as the elasticsearch _id.
      end
    end
  end
end
results = Tirexs.Query.create_resource(elasticquery)
# inspect [:_source][:MeetingTitle], see printable, valid, string
method for get:
results2 = get("v1/meetings/1")
# inspect [:_source][:MeetingTitle], see unprintable, valid, string
I can do the same thing with curl, and both strings come back identical, which leads me to believe the issue is not with Elasticsearch or my data.
Please let me know if you need more details.
here is what I have:
➜ tirexs git:(master) ✗ iex -S mix
Erlang/OTP 18 [erts-7.3] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]
Interactive Elixir (1.2.3) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> Tirexs.HTTP.put("/index/type/1", [MeetingTitle: "Planner— A", Id: 1])
{:ok, 201,
 %{_id: "1", _index: "index", _shards: %{failed: 0, successful: 1, total: 2},
   _type: "type", _version: 1, created: true}}
iex(2)> {:ok, 200, %{_source: %{MeetingTitle: meeting_title}}} = Tirexs.HTTP.get("/index/type/1")
{:ok, 200,
 %{_id: "1", _index: "index",
   _source: %{Id: 1,
     MeetingTitle: <<80, 108, 97, 110, 110, 101, 114, 195, 162, 194, 128, 194, 148, 32, 65>>},
   _type: "type", _version: 1, found: true}}
iex(3)> String.valid?(meeting_title)
true
iex(36)> Kernel.is_bitstring(meeting_title)
true
Hope this is helpful for you :)
Yes, this is exactly what I'm seeing. If you try String.printable?(meeting_title) I think you'll get false. When I try to print that byte string I get Planner� A
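For what it's worth, the bytes returned by get match what you would get if the UTF-8 bytes of "—" (226, 128, 148) were each read as a Latin-1 character and then encoded to UTF-8 a second time (195 162, 194 128, 194 148). That would explain why String.valid? is true while String.printable? is false: U+0080 and U+0094 are control characters. The sketch below reproduces the symptom and reverses one round of it using Erlang's :unicode module. MojibakeSketch, double_encode/1 and repair/1 are hypothetical names and not part of Tirexs, and this does not show where in the stack the extra encoding step happens.

```elixir
defmodule MojibakeSketch do
  # Hypothetical helper module for illustration; not part of Tirexs.

  # Reproduce the symptom: read the UTF-8 bytes of `text` as Latin-1
  # characters and encode those characters as UTF-8 again.
  def double_encode(text) when is_binary(text) do
    :unicode.characters_to_binary(text, :latin1, :utf8)
  end

  # Undo one round of double encoding. If the input cannot be mapped
  # back to Latin-1 bytes, or the result is not valid UTF-8, the input
  # is returned unchanged.
  def repair(bin) when is_binary(bin) do
    case :unicode.characters_to_binary(bin, :utf8, :latin1) do
      fixed when is_binary(fixed) ->
        if String.valid?(fixed), do: fixed, else: bin

      _error ->
        bin
    end
  end
end

# Bytes copied from the get response shown in this thread.
from_get = <<80, 108, 97, 110, 110, 101, 114, 195, 162, 194, 128, 194, 148, 32, 65>>

# "\u2014" is the em dash from the original MeetingTitle.
IO.inspect(MojibakeSketch.double_encode("Planner\u2014 A") == from_get)
IO.inspect(String.printable?(from_get))
IO.inspect(MojibakeSketch.repair(from_get) == "Planner\u2014 A")
```

repair/1 is only a workaround on the caller's side; a string that was encoded correctly in the first place is passed through untouched because the em dash has no Latin-1 representation.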
When I get that meeting via the search above and inspect the MeetingTitle in iex, I get:
_source: %{Id: 1,
  MeetingTitle: "Planner— A"}
I see. I'll try to play with it. It looks like an Elasticsearch issue. I'll ping you back.