nooop3/fcitx

Wubi: ordering of suggestions wrong (especially with wildcards)


With Wubi (structure-based Chinese table input), the use of the “z” key 
practically converts the character input into a search-and-input.

The search is only useful when the results are properly presented. fcitx is 
lacking here:

fcitx orders the results alphabetically by input code.

They should be ordered by _number of resulting characters_ first, and then by
other characteristics of the resulting characters[1]. Ordering alphabetically
by input code makes no sense at all for structure-based inputs.

Most if not all of the time, the search will be for the code of a single
character; this is why single characters should come first. Mixing those
results with phrases and spreading them over countless pages is close to
useless.

If the user is actually searching for phrases, he can go forward in the result 
list to find them there – all together.[2]

[1] Ordering by code point would come close to a useful “order by
structure”. Ideally, a more sophisticated ordering would be nice to have. Fake
example for “za” (where the wildcard comes first):
我工 → ga
主工 → xa
经工 → aa
我式 → ea
主式 → ya
经式 → za
(as you can see here, it is not ordered by code, but by representation of the 
fixed code part (“a” being 工 and 式))

[2] depending on [1] it might make sense to put “2 or more” resulting 
characters together after having single character results listed.
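
The ordering requested above (single characters first, then longer results, with code point as the tie-breaker from [1]) could be sketched roughly as below. This is only an illustration, not fcitx code: the function name `order_candidates` is hypothetical, and the input codes in the sample list are made-up placeholders, not real Wubi codes.

```python
def order_candidates(candidates):
    """Order (code, text) pairs by number of characters, then by code point.

    Python compares strings code point by code point, so using the text
    itself as the second sort key gives the code-point ordering from [1].
    """
    return sorted(candidates, key=lambda pair: (len(pair[1]), pair[1]))


# Placeholder codes (NOT real Wubi codes), listed alphabetically by code,
# which is the ordering this report complains about.
candidates = [
    ("abge", "藏青"),
    ("acge", "猜"),
    ("adge", "菜青"),
    ("aege", "靕"),
]

for code, text in order_candidates(candidates):
    print(text, code)
```

With this key, the two single characters are listed before the two-character phrases, regardless of where their codes fall alphabetically.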

Original issue reported on code.google.com by Darten.b...@gmail.com on 4 Jun 2014 at 3:53

Is anyone reading this? I have another bug in the queue (early vs. late
auto-commit). But I’m reluctant to file it if nobody cares...

Original comment by Darten.b...@gmail.com on 8 Jun 2014 at 11:57

I don't actually use Wubi, so I can't confirm your request. Can you show
some examples of the Wubi wildcard feature on other platforms?

Original comment by wen...@gmail.com on 23 Jun 2014 at 3:16

SCIM on Linux has a configuration switch which lets Wubi list shorter entries 
first in the result list (i.e. single characters come first). Furthermore the 
(only) Wubi wildcard key (“z”) is correctly interpreted as “?” in SCIM. 
(I’m using ? and * here with their well-known meaning from shell file name
matching.) Entering “zn” lists only candidates matching “?n”. Fcitx
interprets it as “?n*”.
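
The difference between the two interpretations can be shown with Python's shell-style matcher `fnmatch.fnmatchcase`. This is a sketch under assumptions: the helper names `matches_exact` and `matches_prefix` are hypothetical, the sample codes are arbitrary strings rather than real table entries, and neither SCIM nor fcitx is claimed to be implemented this way.

```python
from fnmatch import fnmatchcase


def matches_exact(code, typed, wildcard="z"):
    """Treat each wildcard key as '?': exactly one arbitrary letter."""
    return fnmatchcase(code, typed.replace(wildcard, "?"))


def matches_prefix(code, typed, wildcard="z"):
    """Same as above, but with an implicit trailing '*' (prefix match)."""
    return fnmatchcase(code, typed.replace(wildcard, "?") + "*")


# Arbitrary sample codes, not taken from a real Wubi table.
codes = ["bn", "bnh", "nn", "bx"]

print([c for c in codes if matches_exact(c, "zn")])   # ['bn', 'nn']
print([c for c in codes if matches_prefix(c, "zn")])  # ['bn', 'bnh', 'nn']
```

With the “?n” reading, only two-letter codes ending in “n” are listed. The “?n*” reading additionally pulls in every longer code that merely starts that way.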

Another important issue: my intention with this bug report is not to make fcitx 
act like any specific other implementation. So in reality it should not be 
relevant what other platforms do (as they might be crappy as well). Important 
is: what makes the most sense for Wubi-input and is the most helpful for users. 
Don’t you agree? :-)

Original comment by Darten.b...@gmail.com on 24 Jun 2014 at 8:22

Would it be useful if I wrote a little pseudocode showing one way to get the
desired result list in the correct order?

Original comment by Darten.b...@gmail.com on 24 Jun 2014 at 8:30

Yep, but first, you need to define "what makes the most sense". It's too easy 
to propose an idea without providing proof of it, even if I think your idea 
makes sense.

So, showing what others are doing is a good source of evidence.

Original comment by wen...@gmail.com on 26 Jun 2014 at 2:02

If even you think it makes sense, who else needs to be convinced? – You 
prefer to implement something which does not make sense, just because other 
IMEs do it the same way?

The IME situation for Linux is really sad. I tried a bunch of them. I don’t 
need another IME which copies the bad parts of the others, I need one which 
improves. Fcitx looks the most promising. – That’s the reason for my bug 
report.

Original comment by Darten.b...@gmail.com on 27 Jun 2014 at 7:04

No, you did not get the point. Thinking about the reasons behind every design,
and comparing those reasons, is more important than the design itself.

This comic illustrates this issue well: http://xkcd.com/1172/ .

Putting short results first is kind of based on the idea that people who use
this feature are learning this specific table. So shall we go even further?
For example, by removing all multi-character results from wildcard searches?

Original comment by wen...@gmail.com on 28 Jun 2014 at 5:41

No, I didn’t get your point. Before you wanted some kind of proof of “what 
makes sense”. Then I understood that what the others do is probably “what 
makes sense”?

Now I read you want to know the reason why the others do what they do. – I 
want to know that, too. From what I can see, all too often the reason is “it 
works”. No real reason apart from “we did it this way”.

The comic illustrates not the same issue, but another: “every change breaks
someone’s workflow”. Related, but not the same.

Back to the Wubi issue itself:

Is someone who does a search with wildcards a newcomer to Wubi?

No, I don’t think so. Neither am I. The newcomers I know don’t use the 
wildcard key at all. They usually search within the proposed results. Later, 
when they have some knowledge, some start to do refined searches with the
wildcard key.

Should less relevant search results be removed instead of forming the tail of 
the search results?

No. Should Google only show 5 pages of search results and snip off the rest? 
The only “advantage” that I can see is to cycle over to page 1 after page 
5. A horrible idea (I can go into details).

Original comment by Darten.b...@gmail.com on 28 Jun 2014 at 12:54

So can you at least describe why you want to use wildcards? The wildcard
feature has been there since the first day I joined fcitx development, so I
don't know the real purpose behind it. It seems you're using it, so here's a
real user we can ask.

My random guesses are:
1. Someone doesn't know how to type some character, or doesn't remember it.
(So why not use pinyin to search? z + pinyin, or pinyin directly in wbpy.)
2. To learn some new code.

BTW, though it's not that relevant: Google has a result limit; AFAIK it's 1000.

And "newcomer" can have two different meanings:
1. new to Wubi
2. knows some Wubi but is new to fcitx.
I suppose we are using the first meaning.

Original comment by wen...@gmail.com on 1 Jul 2014 at 7:07

The Wubi-wildcard is a search-tool. It can perform some specific kinds of 
searches. Some of these are for example:

Which characters have 青 on the right side? The two Wubi components of 青 
make two Wubi letters, which are:

ge 

For characters having one Wubi structure on the left I type and (should) find:

zge → 睛 清 晴 情 精 ... (and others)

If two Wubi structures come before 青 I type:

zzge → 靕 猜 蔳 ...

Three or more Wubi structures before “ge” are harder to search for, as the
code would be contracted to “zzze” according to Wubi. Many wildcards and only
one restriction, so I’ll get a lot of results which don’t fit what I actually
want to search for. That is unfortunate, of course.

There are several reasons why I do this kind of search: Sometimes I’m curious
which characters with certain components actually exist. Sometimes I know how
to write half the character but the other half does not come to mind by
itself, so I’m searching for it.

A different kind of search is this:

Are there codes which result in two characters where the second one is 青?

zzge → 藏青 菜青 石青 雪青 ...

My reasons for this kind of search are: Curiosity about which expressions with
青 in second position are covered by Wubi. Or I have a two-character word in
mind but forgot how to write the first character, and I am sure it’s in Wubi
because it’s frequent. Or I somehow can’t figure out the code of 青, but I
don’t want to spoil my eagerness to find it out by myself: I just want to
confirm that it starts with “ge”, which I’m sure of. The existence of
two-character entries like these confirms this.


Note: two different searches used “zzge”. But the results of the first kind of
search with “zzge” would always be one character, while the results of the
second kind would have two. – This clear separation actually has nothing to do
with searches. It’s an overlap in the Wubi rules for building the codes of one
or two characters: the codes can clash.
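
Because both kinds of search share one pattern, a result list could be split by number of characters so that each question gets its own contiguous block of answers. A minimal sketch, assuming the results arrive as plain strings; the function name `split_by_length` is hypothetical, and the sample list simply mixes the example results quoted above.

```python
from collections import defaultdict


def split_by_length(results):
    """Group result strings by their number of characters (1, 2, ...)."""
    groups = defaultdict(list)
    for text in results:
        groups[len(text)].append(text)
    # Return the groups in ascending length; order inside a group is kept.
    return dict(sorted(groups.items()))


# The “zzge” examples from this comment, deliberately interleaved.
results = ["藏青", "靕", "菜青", "猜", "石青", "蔳", "雪青"]

for length, texts in split_by_length(results).items():
    print(length, " ".join(texts))
```

The first group answers the single-character question, the second the question about two-character entries ending in 青.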

To directly address your guesses for wildcard’s purpose:

1. someone doesn’t know/remember how to type a character.

Well, it strongly depends on what he still knows about that character. Is it
its meaning? Then a dictionary is _one_ way to get to know more about the
character, including pronunciation and its structure. – If he knows the
pronunciation _and_ pinyin _and_ would recognize it once presented in context 
(is able to read it), pinyin is one way to search and type it. – If he knows 
the structure _and_ Wubi, he could type it directly in Wubi (no wildcards). If 
he knows parts of the structure _and_ Wubi _and_ would recognize it (reading), 
Wubi wildcards are one way to find it.

Wubi wildcard implementation should, of course, support some of the use cases 
described: namely input with partial knowledge of character structure.

To include more use cases and other searches is nice. I’m full of ideas here. 
wbpy touches one of them. ...

2. learn new Wubi codes for a known structure/character

A good Wubi user knows how to get from the character to the code. That is
the actual ability/knowledge. No one actually learns full codes for full
characters. For learning or understanding the structures and their Wubi letters
the wildcards are of limited use (playing around with them might give some 
insights).

Original comment by Darten.b...@gmail.com on 5 Jul 2014 at 4:41