kkos/oniguruma

The results of a Google search for "oniguruma" are crazy! (in Japan)

kkos opened this issue · 40 comments

kkos commented

If you do a Google search for the keyword "oniguruma" you'll see some very strange results. The first few links that appear on the first page are related to the keyword oniguruma, but the rest of the pages are mostly made up of completely unrelated links. I noticed this last August. However, this may be the case only in Japan. I don't know what is going on in other parts of the world.

The rest of this article is written below.
https://kkos.fc2.net/blog-entry-1.html

kkos commented

The attack on Google search is still going on.
I still don't know what it looks like outside of Japan.
When I opened this issue, the behavior changed a little, so I think the culprit is watching this page.

From San Francisco:
Screen Shot 2021-06-03 at 19 23 22

kkos commented

Thank you.
For the first time, I was able to learn how things look outside of Japan.
At least there doesn't seem to be anything weird on the first page.
In my environment, a dozen or so pages of mostly irrelevant stuff are displayed.

Portugal. Page count is under 100k instead of 1.14M
oniguruma

kkos commented

The number of searches in Japan is close to that.
It seems that there is little impact outside of Japan.

This is how it looks today in Colombia.

oniguruma - Google Search_Página_1

kkos commented

Most of the unrelated pages I see here are in Japanese, so it seems to be fine for non-Japanese areas.
If you don't see any unfamiliar or unusual characters (Japanese characters: kanji, hiragana, etc.) within the first few pages, you should be fine.
This may be due to the fact that the culprits are in Japan, where they are mechanically manipulating clicks to increase their rankings.

In Canada the results on Google and DuckDuckGo are similar to those posted above. Looks fine to me.
image

The search results also look fine for me in Los Angeles, CA 👍

Works fine in Sydney, Australia

From Bulgaria:
image

Works fine for me in Tokyo, Japan.
Screen Shot 2021-09-12 at 21 58 41

kkos commented

I don't think so.
Even here, the first six or seven results on the first page are relevant. But after that, the results are mostly irrelevant pages for more than ten pages.
In other words, most of what comes up in a search is irrelevant links.
In your image, "すること。7 記載容量6は、営業外収益の「その他」" and "70 花粉発生源対策推進事業" probably have nothing to do with Oniguruma.

image

Hi there~ Here's the result for me, seems fine? I'm from China and use a global network _(:3

I'm not sure if you've tried changing Google's search settings? There are some settings for the region and languages of the search results...

By the way, I used Singapore as the region setting (for an unfortunate reason that I won't get into), and I set the languages for the search results to 简体中文, 繁體中文, English, and 日本語.

kkos commented

Thank you.
I am convinced that the Japanese search results are abnormal and that the non-Japanese search results are normal.
I just checked the contents of the two links in the previous example by @mmizutani.

Neither of them contains the strings "Oniguruma" or "鬼車", and neither of them has anything to do with Oniguruma.
Moreover, this is the result of the first page, and the next pages are full of irrelevant links.
Although @mmizutani hasn't produced a second page, I'm convinced of that from my own results.
I have no idea about the impact of where you search.

Confirmed. After I changed the region to Japan, the search results showed these completely unrelated items... Trying to find the reason.

image

It seems that changing the search option to 完全一致 (exact match) under Tools helps. Also note that most of those results are PDF documents.

I have a guess about it... Is it possible that Google converted all that content into romanized text and then split it up for matching, which led to this?

kkos commented

You're right, most of the irrelevant links are PDFs.
But not all of them, maybe 60%.
When I set it to exact match, the irrelevant links disappeared.
That doesn't mean that the cause isn't an attack.

It's fine from Vietnam.

image

Zip file of screenshots (canada, france, indonesia, Taiwan), 9.47MB
https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/234ver3.zip

kkos commented

I looked at your search results.
I used to think that the results only depended on the language, but now I know that it depends on the language and the location.
In other words, the results are terrible when you search in Japan specifying Japanese, and not so terrible otherwise.
However, your search also showed that the effects of this attack are not entirely absent outside of Japan.
Some examples are shown below.
These have nothing to do with Oniguruma.
And these are links that I have seen many times.

france_ja_p6, indonesia_ja_p7, Taiwan_ja_p6
円行東自治会 - FC2
https://engyouhigashi.web.fc2.com/inout-hiritu.html

canada_ja_p6, france_ja_p4, indonesia_ja_p4, Taiwan_ja_p3
持 続 可 能 な 医 療 保 険 制 度 を 構 築 す る た め の 国 民 健
https://www.sangiin.go.jp/japanese/gianjoho/ketsugi/189/f069_052601.pdf

france_ja_p6, indonesia_ja_p5, Taiwan_ja_p4
食品流通合理化促進事業
https://www.maff.go.jp/j/shokusan/sijyo/info/attach/attach/pdf/sijyou_yosan2-9.pdf

canada_ja_p7, france_ja_p7, indonesia_ja_p6
お 困 り の 方 へ 騒 音 や 悪 臭 な ど で
https://www.city.tochigi-sakura.lg.jp/manage/contents/upload/61bb4ab4dbec7.pdf

In canada_ja, p7 is mostly irrelevant links.
In indonesia_ja, the results are mostly irrelevant links from p6 onward.
In Taiwan_ja, the results are mostly irrelevant links from p4 onward.

@tonco-miyazawa,
I would like to know what happens to the "other keywords" that appear below the results when I search for oniguruma and specify a time period of 24 hours or less.
Here are the results I just ran (in Japan, in Japanese)
Screen shot 2022-04-09 22 43 13
These bullshit words have been showing up at a high rate for nearly two years.

kkos commented

@tonco-miyazawa
I did not notice the April 15 addendum until today.

I wrote a rebuttal on my blog (in both English and Japanese).
https://kkos.fc2.net/blog-entry-2.html

After a re-investigation, I found that my idea was wrong.
再調査をしたところ、私の考えが間違っていたことが分かりました

I deleted the previous remarks.
私は以前の発言を削除しました

I'm sorry about that remark.
ご迷惑をおかけしてすみませんでした

Befzz commented

Google is trying to show you the most relevant information in your language, based on your IP address / location or your preferences (if you are signed in).

You can ask google to show results found in other language or multiple languages:

https://www.google.com/search?q=Oniguruma&lr=lang_ja|lang_en

Search pages in English and Japanese (private browsing mode):

image

japanese + english:
lr=lang_ja|lang_en 

english:
lr=lang_en

lang_XX: language collection values for lr
https://developers.google.com/custom-search/docs/xml_results_appendices#languageCollections

lr: language restriction
https://developers.google.com/custom-search/docs/xml_results#lrsp

hl: interface language
https://developers.google.com/custom-search/docs/xml_results#hlsp
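
As a minimal sketch of the URL above, the query string can be assembled with Python's standard library. The helper name `google_search_url` is hypothetical; the `q`, `lr`, and `hl` parameter names come from the links above, and whether the regular google.com search page honors them exactly as the Custom Search documentation describes is an assumption.

```python
from urllib.parse import urlencode


def google_search_url(query, languages, interface_lang=None):
    """Build a Google search URL restricted to the given result languages.

    `languages` is a list of codes such as ["ja", "en"]; each becomes a
    `lang_XX` value, and the values are joined with "|" (OR) in `lr`.
    This helper is hypothetical and only assembles the URL string.
    """
    params = {
        "q": query,
        "lr": "|".join(f"lang_{code}" for code in languages),
    }
    if interface_lang is not None:
        params["hl"] = interface_lang  # interface language, e.g. "en"
    # urlencode percent-encodes "|" as %7C, which decodes back to "|".
    return "https://www.google.com/search?" + urlencode(params)


url = google_search_url("Oniguruma", ["ja", "en"])
print(url)
```

Passing `["en"]` alone gives the English-only form, `lr=lang_en`.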

IMHO it is not that you are being "attacked"; it is simply that the keyword is less popular/cited than its Japanese counterpart.

It is often desirable to search for English results only, especially in programming...

You can add a new search engine and make it the default:

image

kkos commented

@Befzz
If you read and understand the following two, I don't think you would make such a claim.
https://kkos.fc2.net/blog-entry-1.html
https://kkos.fc2.net/blog-entry-2.html

I didn't want to write the same thing twice, so I wrote another article.
https://kkos.fc2.net/blog-entry-3.html

Have you taken into account the effect of the personalized search algorithms used by Google?

kkos commented

Did you not read my first entry?
https://kkos.fc2.net/blog-entry-1.html

I don't think this is because Google is displaying customized results for users. The reason is that searching in Chrome's incognito mode did not make any difference.

I've heard that in incognito mode (or secret mode?), the search results will not be personalized.

And by @tonco-miyazawa
https://github.com/tonco-miyazawa/regex_etc/blob/master/MEMO_onig/Issues/onig_SS2.zip
The following two files in this archive show the same strange related keywords I saw.

Japan_aichi_JP_24h.png
Japan_hokkaido_JP_24h.png

(* But since it is in Japanese, I don't think you would know what it is when you see it.)

I randomly found this issue when looking up regex engines from Wikipedia. Guess advertising on front page works :)

I was able to reproduce in Japan. More importantly my friend at Google could too. It looks like a search bug so hope it gets fixed.

My hypothesis is the romaji gets converted to kanji 鬼車, but maybe split into two tokens 鬼 and 車. Especially the latter will retrieve a lot of unrelated pages, just need something like 車でお越しの方 somewhere.

The issue doesn't reproduce outside Japan because the conversion from romaji to kanji is probably disabled elsewhere.

I suspect the attacker is a software bug and hope it gets squashed! I think we all know how hard CJK can be to get right ;)
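
To illustrate the hypothesis above, here is a toy sketch. It is not how Google's search works; the function names and the sample pages are made up for the example (only 鬼車 and 車でお越しの方 come from this comment). It shows that if the query 鬼車 were split into single-character tokens, a page containing only 車 would match, whereas a whole-phrase match would reject it.

```python
def matches_any_token(query, text):
    """Return True if any single character of `query` appears in `text`.

    Models the hypothetical "split into one token per character" behavior.
    """
    return any(ch in text for ch in query)


def matches_phrase(query, text):
    """Return True only if `query` appears in `text` as a whole."""
    return query in text


query = "鬼車"
# Hypothetical page contents, for illustration only.
pages = {
    "access": "車でお越しの方",
    "regex": "鬼車は正規表現ライブラリです",
    "other": "本日は晴天なり",
}

for name, text in pages.items():
    print(name, matches_any_token(query, text), matches_phrase(query, text))
```

Under this toy model, the "access" page matches on the split tokens but not on the whole phrase, which is the kind of false hit the hypothesis predicts.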

kkos commented

I don't think so.
I just searched for "Oniguruma" and downloaded the first unrelated link (11th of all) and looked at the contents, but neither "鬼" nor "車" existed.
http://kitakyuminibas.g2.xrea.com/yamagata2018-1.pdf
(Acrobat Reader has a search function.)

Earlier, in my response to @mmizutani's comment, I looked at the contents of some of the links, and those characters were not there either.
#234 (comment)

Besides, "鬼車" is not the only word for which locating word boundaries in Japanese is troublesome; I believe this is true of all words composed of multiple characters.
I don't know if Google's search engine uses a morphological analyzer for Japanese, but this is irrelevant since the problem I am having is specific to the keyword "Oniguruma".

Besides the how or the what, who would have an interest in such an attack, and to achieve what?

kkos commented

I guess the goal would be to harass me.
I have no control over this, so Google should identify and prosecute this culprit.

zzak commented

Screenshot 2023-01-30 at 8 17 12

Hello Kosako-san, the issue seems fixed for me (from Akita). Sorry if you were impacted by this!

It's not fixed yet. Please see the second and subsequent pages of the search results.
The first page often looks normal.
I have asked Google several times to fix this issue, but I have received no response.

Nepal seems fine too! I assumed it was targeted at Asian countries only.

screenshot-2023-02-21_18:28:48

Hi, I just happened to see this, and I know a little Japanese. In Japanese,
鬼 == oni == ghost (in English)
車 == guruma == vehicle
I would say this name is very "Japanese", and I actually assumed you were Japanese in the first place just from the plugin name. I think that's why Google shows different results for different countries.

Searching Google here in Japan, search results seem fine.
Screenshot from 2024-01-03 17-46-29

kkos commented

I am aware that I have not seen any attacks since last June.
However, I am not inclined to close this issue now, as I have been under attack for over three years and attacks can resume at any time.

Although this issue is about an attack on/at/via Google search, I would like to add that Google search, for
various reasons (including Google-internal ones), has become significantly worse in recent years.

We may have to go back to the days before Google Search with regard to adding links to, e.g., oniguruma
and other sites that should realistically be the first Google search result.

On a tiny side note, I always found "onig" versus "oniguruma" a slight annoyance.

E.g.:

https://github.com/kkos/oniguruma/releases/download/v6.9.9/onig-6.9.9.tar.gz

IMO it is better to use one consistent name. Be it onig or oniguruma, I have no preference,
but it is weird that the project is called oniguruma while the download tells you
that the name is onig.