Wubi: ordering of suggestions wrong (especially with wildcards)
GoogleCodeExporter commented
With Wubi (structure-based Chinese table input), the use of the “z” key
practically converts the character input into a search-and-input.
The search is only useful when the results are properly presented. fcitx is
lacking here:
fcitx orders the results alphabetically by input code.
They should be ordered by _number of resulting characters_ first, and then by
other characteristics of the resulting characters[1]. Ordering alphabetically by
input code makes no sense at all for structure-based input.
Most if not all of the time the search is for the code of a single
character, which is why single characters should come first. Mixing those
results with phrases and spreading them over countless pages is close to useless.
If the user is actually searching for phrases, they can page forward in the result
list to find them there, all together.[2]
[1] Ordering by code point would come close to a useful “order by
structure”. Ideally, a more sophisticated ordering would be nice to have. Fake
example for “za” (where the wildcard comes first):
我工 → ga
主工 → xa
经工 → aa
我式 → ea
主式 → ya
经式 → za
(as you can see here, it is not ordered by code, but by representation of the
fixed code part (“a” being 工 and 式))
[2] depending on [1] it might make sense to put “2 or more” resulting
characters together after having single character results listed.
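A minimal sketch of the ordering proposed above, assuming the candidates are available as plain strings. The function name `order_candidates` and the sample list are made up for illustration and are not taken from fcitx. It only covers "number of characters first, then code point", not the more sophisticated ordering by structure from [1].

```python
def order_candidates(texts):
    """Order candidates: fewer characters first, then by code point.

    Python compares strings code point by code point, so the text itself
    serves as the secondary key.
    """
    return sorted(texts, key=lambda text: (len(text), text))


# Hypothetical candidate list in the order an input method might return it.
unordered = ["藏青", "清", "菜青", "晴", "石青", "情"]
ordered = order_candidates(unordered)
print(ordered)  # single characters first, then the two-character phrases
```

Phrases are not dropped; they simply form the tail of the list, which matches [2].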
Original issue reported on code.google.com by Darten.b...@gmail.com
on 4 Jun 2014 at 3:53
GoogleCodeExporter commented
Is anyone reading this? I’ve another bug in the queue (early vs. late
auto-commit). But I feel reluctant to file it if nobody cares...
Original comment by Darten.b...@gmail.com
on 8 Jun 2014 at 11:57
GoogleCodeExporter commented
I don't actually use Wubi, so I cannot confirm your request. Can you show
some examples of the Wubi wildcard feature on other platforms?
Original comment by wen...@gmail.com
on 23 Jun 2014 at 3:16
GoogleCodeExporter commented
SCIM on Linux has a configuration switch which lets Wubi list shorter entries
first in the result list (i.e. single characters come first). Furthermore the
(only) Wubi wildcard key (“z”) is correctly interpreted as “?” in SCIM.
(I’m using ? and * here with their well-known meaning from shell file name
matching.) Entering “zn” lists only possibilities matching
“?n”. Fcitx interprets it as “?n*”.
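The difference between the two readings can be sketched as follows. This is only an illustration: `matches` is a hypothetical helper, the codes in `codes` are invented, and neither SCIM nor fcitx is claimed to be implemented this way.

```python
def matches(pattern, code, prefix=False):
    """Match a code against a pattern in which 'z' stands for exactly one letter.

    prefix=False models the "?n" reading (the code must have the same length).
    prefix=True models the "?n*" reading (the code may continue after the pattern).
    """
    if len(code) < len(pattern):
        return False
    if not prefix and len(code) != len(pattern):
        return False
    # zip stops at the end of the pattern, so trailing letters are ignored here.
    return all(p == "z" or p == c for p, c in zip(pattern, code))


# Invented codes, for illustration only.
codes = ["an", "bn", "bnx", "n", "bnxy", "ab"]
exact = [c for c in codes if matches("zn", c)]
loose = [c for c in codes if matches("zn", c, prefix=True)]
print(exact)  # → ['an', 'bn']
print(loose)  # → ['an', 'bn', 'bnx', 'bnxy']
```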
Another important issue: my intention with this bug report is not to make fcitx
act like any specific other implementation. So in reality it should not be
relevant what other platforms do (as they might be crappy as well). Important
is: what makes the most sense for Wubi-input and is the most helpful for users.
Don’t you agree? :-)
Original comment by Darten.b...@gmail.com
on 24 Jun 2014 at 8:22
GoogleCodeExporter commented
Would it be useful if I wrote a little pseudocode, which would represent one
way to get the desired result list in the correct order?
Original comment by Darten.b...@gmail.com
on 24 Jun 2014 at 8:30
GoogleCodeExporter commented
Yep, but first, you need to define "what makes the most sense". It's too easy
to propose an idea without providing proof of it, even if I think your idea
makes sense.
So, showing what others are doing is a good source of evidence.
Original comment by wen...@gmail.com
on 26 Jun 2014 at 2:02
GoogleCodeExporter commented
If even you think it makes sense, who else needs to be convinced? – You
prefer to implement something which does not make sense, just because other
IMEs do it the same way?
The IME situation for Linux is really sad. I tried a bunch of them. I don’t
need another IME which copies the bad parts of the others, I need one which
improves. Fcitx looks the most promising. – That’s the reason for my bug
report.
Original comment by Darten.b...@gmail.com
on 27 Jun 2014 at 7:04
GoogleCodeExporter commented
No, you did not get the point. Thinking about the reasons behind each design, and
comparing those reasons, is more important than the design itself.
This comic illustrates the issue well: http://xkcd.com/1172/ .
Putting short results first kind of rests on the idea that people who use this
feature are learning this specific table. So shall we go even further? For
example, by removing all multi-character results from wildcard searches?
Original comment by wen...@gmail.com
on 28 Jun 2014 at 5:41
GoogleCodeExporter commented
No, I didn’t get your point. Before you wanted some kind of proof of “what
makes sense”. Then I understood that what the others do is probably “what
makes sense”?
Now I read you want to know the reason why the others do what they do. – I
want to know that, too. From what I can see, all too often the reason is “it
works”. No real reason apart from “we did it this way”.
The comic illustrates not the same issue, but another: “every change breaks
someone’s workflow”. Related, but not the same.
Back to the Wubi issue itself:
Is someone who does a search with wildcards a newcomer to Wubi?
No, I don’t think so. Neither am I. The newcomers I know don’t use the
wildcard key at all. They usually search within the proposed results. Later,
when they have some knowledge, some start doing refined searches with the
wildcard key.
Should less relevant search results be removed instead of forming the tail of
the search results?
No. Should Google only show 5 pages of search results and snip off the rest?
The only “advantage” that I can see is to cycle over to page 1 after page
5. A horrible idea (I can go into details).
Original comment by Darten.b...@gmail.com
on 28 Jun 2014 at 12:54
GoogleCodeExporter commented
So can you at least describe why you want to use the wildcard? The wildcard
feature has been there since the first day I joined fcitx development, so I
don't know the real purpose behind it. It seems you're using it, so here's a
real user we can ask.
My random guesses are:
1. Someone doesn't know how to type some character, or doesn't remember it. (So
why not use pinyin to search? z + pinyin, or pinyin directly in wbpy.)
2. Learning some new code.
BTW, though it's not that relevant, Google has a result limit; AFAIK it's 1000.
And "newcomer" can have two different meanings:
1. new to Wubi
2. knows some Wubi but is new to fcitx.
I suppose we are using the first meaning.
Original comment by wen...@gmail.com
on 1 Jul 2014 at 7:07
GoogleCodeExporter commented
The Wubi-wildcard is a search-tool. It can perform some specific kinds of
searches. Some of these are for example:
Which characters have 青 on the right side? The two Wubi components of 青
make two Wubi letters, which are:
ge
For characters having one Wubi structure on the left I type and (should) find:
zge → 睛 清 晴 情 精 ... (and others)
If two Wubi structures come before 青 I type:
zzge → 靕 猜 蔳 ...
Three or more Wubi structures before “ge” are harder to search for, as the code
would be contracted to “zzze” according to Wubi. Many wildcards and only one
restriction, so I’ll get a lot of results which don’t fit what I actually
want to search. That is unfortunate, of course.
There are several reasons why I do this kind of search: Sometimes I’m curious
which characters with certain components actually exist. Sometimes I know how
to write half the character but the other half does not come to my mind by
itself, so I’m searching for it.
A different kind of search is this:
Are there codes which result in two characters where the second one is 青?
zzge → 藏青 菜青 石青 雪青 ...
My reasons for this kind of search are: Curiosity about which expressions with
青 in second position are covered by Wubi. I have a two-character word
in mind but forgot how to write the first character, and I am sure it’s in Wubi
because it’s frequent. I somehow can’t figure out the code of 青, but I
don’t want to spoil my eagerness to find it out by myself: I just want to
confirm that it starts with “ge”, which I’m sure of. The existence of
two-character entries like these confirms this.
Note: two different searches used “zzge”. But the results of the first
question searched with “zzge” would always be one character, while the
results for the second kind of search would have two. – This clear separation
has actually nothing to do with searches. It’s a Wubi overlap of the rules of
how to make up codes of one or two characters: the codes can clash.
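To illustrate that separation, here is a small sketch which splits the “zzge” results quoted above by their number of characters. The helper `group_by_length` and the interleaved order of the input list are assumptions for the example; only the result strings themselves come from the searches above.

```python
def group_by_length(results):
    """Group result strings by their number of characters, keeping input order."""
    groups = {}
    for text in results:
        groups.setdefault(len(text), []).append(text)
    return groups


# Results of both "zzge" searches, mixed together (the mixing is assumed).
mixed = ["靕", "藏青", "猜", "菜青", "蔳", "石青", "雪青"]
groups = group_by_length(mixed)
print(groups[1])  # → ['靕', '猜', '蔳']
print(groups[2])  # → ['藏青', '菜青', '石青', '雪青']
```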
To directly address your guesses for wildcard’s purpose:
1. someone doesn’t know/remember how to type a character.
Well, it strongly depends on what he still knows about that character. Is it
its meaning? Then a dictionary is _one_ way to get to know more about the
character, including pronunciation and its structure. – If he knows the
pronunciation _and_ pinyin _and_ would recognize it once presented in context
(is able to read it), pinyin is one way to search and type it. – If he knows
the structure _and_ Wubi, he could type it directly in Wubi (no wildcards). If
he knows parts of the structure _and_ Wubi _and_ would recognize it (reading),
Wubi wildcards are one way to find it.
Wubi wildcard implementation should, of course, support some of the use cases
described: namely input with partial knowledge of character structure.
To include more use cases and other searches is nice. I’m full of ideas here.
wbpy touches one of them. ...
2. learn new Wubi codes for a known structure/character
A good Wubi user knows how to get from the character to the code. That is
actually the ability/knowledge. No one actually learns full codes for full
characters. For learning or understanding the structures and their Wubi letters
the wildcards are of limited use (playing around with them might give some
insights).
Original comment by Darten.b...@gmail.com
on 5 Jul 2014 at 4:41