Problem with letters ü, ö and ä in lesson generator/analysis tab
GoogleCodeExporter opened this issue · 3 comments
GoogleCodeExporter commented
What steps will reproduce the problem?
1. Use sources containing letters like ü, ö and ä (e.g. German texts)
2. Analyse the session / generate a lesson
What is the expected output? What do you see instead?
Words like "Zweckmäßigkeit" look like "Zweckm"
What version of the product are you using? On what operating system?
0.16
Please provide any additional information below.
If you let the program show only single keys, ü, ö and ä do show up.
Original issue reported on code.google.com by quietdeath@gmail.com
on 6 Jan 2009 at 10:39
GoogleCodeExporter commented
Will try to fix this w/ the next release. :) Funny, since my alphabet also has
non-ASCII letters (æøå), but I never tried the analysis/generator with them.
Original comment by tristesse
on 10 Jan 2009 at 3:27
- Changed state: Started
GoogleCodeExporter commented
I have the same problem.
Actually, in this case the program generates two statistics entries. One entry
containing the first half of the word before the umlaut and/or the sharp s
ligature (eszett) “ß” or “ẞ” and another entry containing the second
half of the word. So, for Zweckmäßigkeit you get “Zweckm” and
“igkeit”. This is really annoying in autogenerated lessons, as neither entry
is an actual German word.
I’ll try words like “Überfälle”, “tränenüberströmt”,
“Gehäusegröße” and “Ölüberschussländer” to check how many parts
are produced.
The resulting fragments should be:
berf, lle
tr, nen, berstr, mt
Geh, usegr, ße
and
l, berschussl, nder
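A rough way to reproduce this kind of splitting outside the program is the Python sketch below. It assumes the word analysis only keeps runs of ASCII letters; the pattern and the helper name `ascii_fragments` are hypothetical and not taken from the program's source.

```python
import re

# Hypothetical ASCII-only word pattern. This is an assumption made to
# illustrate the symptom, not the program's actual tokenizer.
ASCII_WORD = re.compile(r"[A-Za-z]+")


def ascii_fragments(text):
    """Return every run of ASCII letters found in text."""
    return ASCII_WORD.findall(text)


# Each umlaut or eszett ends the current run, so one word becomes several.
for word in ["Zweckmäßigkeit", "Überfälle", "tränenüberströmt"]:
    print(word, "->", ascii_fragments(word))
```

If the real tokenizer behaves like this pattern, every non-ASCII letter acts as a word separator, which would explain the two statistics entries per word.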
Original comment by albedosh...@gmail.com
on 12 Aug 2011 at 9:24
GoogleCodeExporter commented
Now, this is odd.
The program doesn’t have any problem recognizing the special characters in
the trigram analysis.
My test file contains these words, including some non-German words, because I
had the suspicion that the lesson generator doesn’t see non-ASCII characters
at all:
Überfälle tränenüberströmt Gehäusegröße Ölüberschussländer
Ölüberschußländer Geräteüberhöhung Gefäßüberdehnung Löß süß
FAẞBIER Øresund Ælfwine Cœr C&A
So, this is what it looks like in the typer (see image “01-Typer.png”).
I mistyped every word in the lesson, to be sure all words are used in the
lesson generator. But the lesson generator produces this lesson (see image
“02-Generated lesson.png”).
Now check the word analysis (see image “03-Analysis.png”).
The funny part is that the trigram analysis works perfectly fine (see image
“04-Trigrams.png”).
Suspicion confirmed ;)
So the word analysis seems to ignore any non-ASCII character in the text, which
obviously leads to erroneous auto-generated lessons and faulty word analyses,
whereas the letter and trigram analyses work as intended.
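For comparison, here is a minimal sketch of a word pattern that keeps non-ASCII letters together. It is only an illustration of one possible approach in Python 3, not the program's code; the name `unicode_words` and the shortened sample line are assumptions.

```python
import re

# Runs of Unicode letters: everything \w matches, minus digits and underscore.
# In Python 3, str patterns are Unicode-aware by default.
UNICODE_WORD = re.compile(r"[^\W\d_]+")


def unicode_words(text):
    """Return every run of Unicode letters found in text."""
    return UNICODE_WORD.findall(text)


# Part of the test file from this comment (without "C&A", which contains
# a non-letter and would still be split at the ampersand).
sample = "Überfälle tränenüberströmt Gehäusegröße Löß süß FAẞBIER Øresund Ælfwine Cœr"
print(unicode_words(sample))
```

With this pattern each word in the sample comes back in one piece. In Python 2 the same pattern would additionally need the `re.UNICODE` flag and `unicode` strings, since `\w` is ASCII-only there by default.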
Original comment by albedosh...@gmail.com
on 12 Aug 2011 at 11:05
Attachments:
- 01-Typer.png
- [02-Generated lesson.png](https://storage.googleapis.com/google-code-attachments/amphetype/issue-11/comment-3/02-Generated%20lesson.png)
- 03-Analysis.png
- 04-Trigrams.png