Soft hyphens in text break grammar API
snomos opened this issue · 11 comments
I am not able to get GramDivvun to work in MS Word (the local app) when using the UiT account. It can be installed, it loads, and looks the way it should in the initial screen. But when clicking "Check", it almost immediately dies with the following errors in the console:
[Error] Failed to load resource: the server responded with a status of 500 () (se, line 0)
[Error] Failed to get grammar check API response
TypeError: undefined is not an object (evaluating 'e.filter')
p — index.ts:165
(anonym funksjon) — api.ts:58
(anonym funksjon) — app.0990553d7b12bc912a5b.js:21:37456
s — app.0990553d7b12bc912a5b.js:21:36328
u — bluebird.js:5370
(anonym funksjon) — bluebird.js:3366
(anonym funksjon) — bluebird.js:3423
(anonym funksjon) — bluebird.js:3468
(anonym funksjon) — bluebird.js:3548
l — bluebird.js:145
c — bluebird.js:138
(anonym funksjon) — bluebird.js:154
(anonym funksjon) — bluebird.js:67
(anonym funksjon) — bluebird.js:4620
(anonym funksjon) (app.0990553d7b12bc912a5b.js:21:38425)
(anonym funksjon) (app.0990553d7b12bc912a5b.js:21:37456)
s (app.0990553d7b12bc912a5b.js:21:36328)
u (app.0990553d7b12bc912a5b.js:1:200125)
(anonym funksjon) (app.0990553d7b12bc912a5b.js:1:173146)
(anonym funksjon) (app.0990553d7b12bc912a5b.js:1:173967)
(anonym funksjon) (app.0990553d7b12bc912a5b.js:1:174655)
(anonym funksjon) (app.0990553d7b12bc912a5b.js:1:176008)
l (app.0990553d7b12bc912a5b.js:1:127337)
c (app.0990553d7b12bc912a5b.js:1:127262)
(anonym funksjon) (app.0990553d7b12bc912a5b.js:1:128376)
(anonym funksjon) (app.0990553d7b12bc912a5b.js:1:127207)
(anonym funksjon) (app.0990553d7b12bc912a5b.js:1:189711)
[Error] Unhandled Promise Rejection: TypeError: undefined is not an object (evaluating 'm.message')
(anonym funksjon) (word-mac-16.00.js:26:310515)
promiseReactionJob
I have tested various setups, and most work, but not this one. The ones I have tested are:
- Windows, app (365), UiT account - works
- Windows, browser (Edge), UiT account - works
- Windows, browser (Edge), private account - works
- Mac, browser (Chrome), private account - works
- Mac, browser (Chrome), UiT account - works
- Mac, browser (Safari), private account - DOES work
- Mac, browser (Safari), UiT account - does NOT work
- Mac, app (2016), private account - DOES work
- Mac, app (2016), UiT account - does NOT work
The Safari+UiT problem manifests differently, and thus seems to be a different issue, and can be easily worked around by using another browser. So this bug report targets the Office 2016 locally installed app issue only.
Turns out the problem is soft hyphens in the text sent from Word. So the categorisation above is probably invalid, and just a happy coincidence of the test data used.
To trigger the error, use the following text - it should contain two soft hyphens:
Áigot nannet sámiid konsultašuvdnarievtti
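Since soft hyphens are invisible in most editors, a quick way to confirm they are really in a test string is to list every format or control character with its position. This is only a minimal sketch: `find_invisible` is a hypothetical helper name, not part of GramDivvun or libdivvun, and the sample spells out the soft hyphens as `\u00ad` escapes so they survive copy and paste.

```python
import unicodedata


def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for every format (Cf) or control (Cc) character.

    Soft hyphen U+00AD is category Cf; C0 controls such as U+001F are Cc.
    Note that newlines and tabs are also Cc and will be reported.
    """
    hits = []
    for index, char in enumerate(text):
        if unicodedata.category(char) in ("Cf", "Cc"):
            hits.append((index, f"U+{ord(char):04X}"))
    return hits


# The word from the report, with its two soft hyphens written as escapes.
sample = "konsulta\u00adšuvdna\u00adrievtti"
print(find_invisible(sample))  # [(8, 'U+00AD'), (15, 'U+00AD')]
```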
The issue this time was that the character \x1f was found.
INFORMATION SEPARATOR ONE?
Everyone's favourite codepoint! The input was dutkama ja luonddu\x1fdiehtaga,
What is libdivvun doing wrong? I get
$ echo ' Áigot nannet sámiid konsultašuvdnarievtti' | src/divvun-checker -l se
{"errs":[["konsulta",21,29,"typo","Ii leat sátnelisttus",["konsula"],"Čállinmeattáhus"],["šuvdna",30,36,"typo","Ii leat sátnelisttus",["šuvona","govdna"],"Čállinmeattáhus"]],"text":" Áigot nannet sámiid konsultašuvdnarievtti"}
$ echo ' Áigot nannet sámiid konsultašuvdnarievtti' | src/divvun-checker -l se |hl-nonprinting
{"errs":[["konsulta",21,29,"typo","Ii leat sátnelisttus",["konsula"],"Čállinmeattáhus"],["šuvdna",30,36,"typo","Ii leat sátnelisttus",["šuvona","govdna"],"Čállinmeattáhus"]],"text":" Áigot nannet sámiid konsulta-šuvdna-rievtti"}⁋
from the command line with newest giella-sme-speller (that hl-nonprinting is just a script to sed \xad into a dash and EOL into ⁋).
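The actual hl-nonprinting script is not shown here, so the following is only a guess at equivalent behaviour, based on the description above: soft hyphens become a visible dash and each line end gets a ⁋ marker. The function name `highlight_nonprinting` is made up for illustration, and keeping the newline after the marker is an assumption.

```python
def highlight_nonprinting(text: str) -> str:
    """Make soft hyphens and line ends visible.

    Assumed behaviour (the real script is not shown in this thread):
    U+00AD -> "-", and every newline gets a "⁋" marker in front of it.
    """
    return text.replace("\u00ad", "-").replace("\n", "⁋\n")


# Soft hyphens are written as \u00ad escapes so they are visible in the source.
print(highlight_nonprinting("konsulta\u00adšuvdna\u00adrievtti\n"), end="")
# konsulta-šuvdna-rievtti⁋
```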
@snomos has conflated two issues. The \x1f issue seems to be coming from libdivvun, whereas the soft hyphen issue is our problem.
$ printf ' dutkama ja luonddu\x1fdiehtaga' | src/divvun-checker -l se
{"errs":[],"text":" dutkama ja luonddudiehtaga"}
$ printf ' dutkama ja luonddu\x1fdiehtaga' | src/divvun-checker -l se |hl-nonprinting
{"errs":[],"text":" dutkama ja luonddu^_diehtaga"}⁋
– should we be removing it from input, or are we somehow introducing \x1f's, or am I not reproducing the issue correctly here? (Some "expected this, but got that" examples would be nice ;))
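If the answer turns out to be "remove it from input", the reported error positions still have to point into the text the user sees. One possible approach, sketched below, is to strip the unwanted characters while recording where each kept character came from, then map spans back. Nothing here reflects how GramDivvun or libdivvun actually handles this: `strip_with_offsets` and `map_span` are hypothetical names, and the offsets are Python code point indexes, which may differ from the units the API uses.

```python
def strip_with_offsets(text: str, drop: tuple[str, ...] = ("\u00ad",)) -> tuple[str, list[int]]:
    """Remove `drop` characters; offsets[i] is the index in `text` of cleaned[i]."""
    cleaned = []
    offsets = []
    for index, char in enumerate(text):
        if char in drop:
            continue
        cleaned.append(char)
        offsets.append(index)
    offsets.append(len(text))  # sentinel so a span starting at the very end maps too
    return "".join(cleaned), offsets


def map_span(offsets: list[int], start: int, end: int) -> tuple[int, int]:
    """Map a [start, end) span in the cleaned text back to the original text."""
    if end <= start:
        return offsets[start], offsets[start]
    # End is exclusive: take the last covered character and step one past it.
    return offsets[start], offsets[end - 1] + 1


original = "konsulta\u00adšuvdna\u00adrievtti"
cleaned, offsets = strip_with_offsets(original)
print(cleaned)                       # konsultašuvdnarievtti
print(map_span(offsets, 0, len(cleaned)))  # (0, 23): the whole word, soft hyphens included
```

With this mapping, an error reported on the whole cleaned word covers the original word including its soft hyphens, so a replacement applied in the document would overwrite them as well.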
Sorry I am trying to go on vacation, haha.
The error was "control character (\\u0000-\\u001F) found while parsing a string", which looked like it was coming from libdivvun, but perhaps it isn't. January's problem now.
There was a bug in libdivvun: that character should have been escaped according to the JSON spec. Fixed now; hopefully that helps with this issue.
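To illustrate the JSON rule involved: control characters U+0000 through U+001F may not appear raw inside a JSON string and must be written as escapes such as `\u001f`. The sketch below uses Python's standard `json` module purely as a stand-in for a strict parser; it is not libdivvun's code, and the `"text"` key just mirrors the output shown earlier in this thread.

```python
import json

# A raw U+001F inside the string value: not valid JSON, so a strict parser rejects it.
raw = '{"text": "luonddu\x1fdiehtaga"}'
try:
    json.loads(raw)
    raw_ok = True
except json.JSONDecodeError:
    raw_ok = False

# A conforming serializer escapes the control character instead.
encoded = json.dumps({"text": "luonddu\x1fdiehtaga"})
roundtrip = json.loads(encoded)["text"]

print(raw_ok)     # False
print(encoded)    # {"text": "luonddu\u001fdiehtaga"}
print(roundtrip == "luonddu\x1fdiehtaga")  # True
```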
Everyone's favourite codepoint! The input was dutkama ja luonddu\x1fdiehtaga,
This seems to be fixed; at least it no longer causes any trouble in either Word or GDocs. Even
konsultašuvdnarievtti
(containing two soft hyphens) seems to be fixed. Closing.