Erikvl87/docker-languagetool

Crash using the ARM64 build with ngram configuration

Erikvl87 opened this issue · 5 comments

Originally posted by @anetschka in #15 (comment)

Hi there, I have tried building the image with the supplied Dockerfile and the arm64 workaround, using docker-compose. The only difference is that I added ngram data to a folder within my image. The image builds and the container starts up normally, but when I actually post (German) text to LanguageTool, it seems that Hunspell is still not properly initialised:

java.io.IOException: Read-only file system
  at java.base/java.io.UnixFileSystem.createFileExclusively(Native Method)
  at java.base/java.io.File.createTempFile(File.java:2129)
  at java.base/java.io.File.createTempFile(File.java:2175)
  at org.bridj.Platform.createTempDir(Platform.java:710)
  at org.bridj.Platform.<clinit>(Platform.java:227)
  at org.bridj.Pointer.<clinit>(Pointer.java:208)
  at org.languagetool.rules.spelling.hunspell.DumontsHunspellDictionary.<init>(DumontsHunspellDictionary.java:37)
  at org.languagetool.rules.spelling.hunspell.Hunspell.getDictionary(Hunspell.java:50)
  at org.languagetool.rules.spelling.hunspell.HunspellRule.init(HunspellRule.java:488)
  at org.languagetool.rules.de.GermanSpellerRule.init(GermanSpellerRule.java:1244)
  at org.languagetool.rules.spelling.hunspell.HunspellRule.ensureInitialized(HunspellRule.java:462)
  at org.languagetool.rules.spelling.hunspell.HunspellRule.match(HunspellRule.java:155)
  at org.languagetool.JLanguageTool.checkAnalyzedSentence(JLanguageTool.java:1295)
  at org.languagetool.JLanguageTool$TextCheckCallable.getOtherRuleMatches(JLanguageTool.java:1846)
  at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1765)
  at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1736)
  at org.languagetool.JLanguageTool.performCheck(JLanguageTool.java:1226)
  at org.languagetool.JLanguageTool.checkInternal(JLanguageTool.java:970)
  at org.languagetool.JLanguageTool.check2(JLanguageTool.java:908)
  at org.languagetool.server.TextChecker.getPipelineResults(TextChecker.java:762)
  at org.languagetool.server.TextChecker.getRuleMatches(TextChecker.java:711)
  at org.languagetool.server.TextChecker.access$000(TextChecker.java:56)
  at org.languagetool.server.TextChecker$1.call(TextChecker.java:427)
  at org.languagetool.server.TextChecker$1.call(TextChecker.java:420)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  at java.base/java.lang.Thread.run(Thread.java:829)
2021-11-19 17:11:26.422 +0000 ERROR org.languagetool.server.LanguageToolHttpHandler An error has occurred: 'java.lang.RuntimeException: java.lang.RuntimeException: Could not check sentence (language: German (Germany)): <sentcontent>Die Deutsche Bank kündigte den Abbau von 18.000 Stellen an.</sentcontent>, detected: de-DE', sending HTTP code 500.
Access from 192.168.208.3, HTTP user agent: Python-urllib/3.8, User agent param: null, Referrer: null, language: de-DE, h: 1, r: 1, time: 5494, text length: 59, m: ALL, l: DEFAULT, Stacktrace follows:
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Could not check sentence (language: German (Germany)): <sentcontent>Die Deutsche Bank kündigte den Abbau von 18.000 Stellen an.</sentcontent>, detected: de-DE
  at org.languagetool.server.TextChecker.checkText(TextChecker.java:457)
  at org.languagetool.server.ApiV2.handleCheckRequest(ApiV2.java:162)
  at org.languagetool.server.ApiV2.handleRequest(ApiV2.java:76)
  at org.languagetool.server.LanguageToolHttpHandler.handle(LanguageToolHttpHandler.java:182)
  at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
  at jdk.httpserver/sun.net.httpserver.AuthFilter.doFilter(AuthFilter.java:82)
  at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:80)
  at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange$LinkHandler.handle(ServerImpl.java:692)
  at jdk.httpserver/com.sun.net.httpserver.Filter$Chain.doFilter(Filter.java:77)
  at jdk.httpserver/sun.net.httpserver.ServerImpl$Exchange.run(ServerImpl.java:664)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
  at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: Could not check sentence (language: German (Germany)): <sentcontent>Die Deutsche Bank kündigte den Abbau von 18.000 Stellen an.</sentcontent>
  at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
  at org.languagetool.server.TextChecker.checkText(TextChecker.java:438)
  ... 12 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Could not check sentence (language: German (Germany)): <sentcontent>Die Deutsche Bank kündigte den Abbau von 18.000 Stellen an.</sentcontent>
  at org.languagetool.JLanguageTool.performCheck(JLanguageTool.java:1230)
  at org.languagetool.JLanguageTool.checkInternal(JLanguageTool.java:970)
  at org.languagetool.JLanguageTool.check2(JLanguageTool.java:908)
  at org.languagetool.server.TextChecker.getPipelineResults(TextChecker.java:762)
  at org.languagetool.server.TextChecker.getRuleMatches(TextChecker.java:711)
  at org.languagetool.server.TextChecker.access$000(TextChecker.java:56)
  at org.languagetool.server.TextChecker$1.call(TextChecker.java:427)
  at org.languagetool.server.TextChecker$1.call(TextChecker.java:420)
  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
  ... 3 more
Caused by: java.lang.RuntimeException: Could not check sentence (language: German (Germany)): <sentcontent>Die Deutsche Bank kündigte den Abbau von 18.000 Stellen an.</sentcontent>
  at org.languagetool.JLanguageTool$TextCheckCallable.getOtherRuleMatches(JLanguageTool.java:1883)
  at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1765)
  at org.languagetool.JLanguageTool$TextCheckCallable.call(JLanguageTool.java:1736)
  at org.languagetool.JLanguageTool.performCheck(JLanguageTool.java:1226)
  ... 11 more
Caused by: java.lang.RuntimeException: Could not create hunspell instance. Please note that LanguageTool supports only 64-bit platforms (Linux, Windows, Mac) and that it requires a 64-bit JVM (Java).
  at org.languagetool.rules.spelling.hunspell.DumontsHunspellDictionary.<init>(DumontsHunspellDictionary.java:45)
  at org.languagetool.rules.spelling.hunspell.Hunspell.getDictionary(Hunspell.java:50)
  at org.languagetool.rules.spelling.hunspell.HunspellRule.init(HunspellRule.java:488)
  at org.languagetool.rules.de.GermanSpellerRule.init(GermanSpellerRule.java:1244)
  at org.languagetool.rules.spelling.hunspell.HunspellRule.ensureInitialized(HunspellRule.java:462)
  at org.languagetool.rules.spelling.hunspell.HunspellRule.match(HunspellRule.java:155)
  at org.languagetool.JLanguageTool.checkAnalyzedSentence(JLanguageTool.java:1295)
  at org.languagetool.JLanguageTool$TextCheckCallable.getOtherRuleMatches(JLanguageTool.java:1846)
  ... 14 more
Caused by: java.lang.UnsatisfiedLinkError: 'int org.bridj.Platform.sizeOf_ptrdiff_t()'
  at org.bridj.Platform.sizeOf_ptrdiff_t(Native Method)
  at org.bridj.Platform.<clinit>(Platform.java:232)
  at org.bridj.Pointer.<clinit>(Pointer.java:208)
  at org.languagetool.rules.spelling.hunspell.DumontsHunspellDictionary.<init>(DumontsHunspellDictionary.java:37)
  ... 21 more

Hi @Erikvl87, I don't think that adding the ngrams or changing the output port (which I also did) causes this error. I actually tested it by removing the --languagemodel option from LanguageTool's startup command, and the error is still the same. I also noticed that the current setup does not include Python, which means I cannot, for instance, run unit tests inside the container. I tried adding it, but to no avail.

Hi @anetschka, since I couldn't reproduce your issue, could you let me know on what type of device you are running the image? What OS is installed? What do your Dockerfile / docker-compose.yml files look like, and with what arguments do you start them?

I've just configured my Raspberry Pi 4 (8 GB), running the 64-bit version of Ubuntu 21.04, with a docker-compose.yml file with the following contents:

version: "3"

services:
  languagetool:
    image: erikvl87/languagetool
    container_name: languagetool
    ports:
        - 8010:8010  # Using default port from the image
    environment:
        - langtool_languageModel=/ngrams  # OPTIONAL: Using ngrams data
        - Java_Xms=512m  # OPTIONAL: Setting a minimum Java heap size of 512 MiB
        - Java_Xmx=1g  # OPTIONAL: Setting a maximum Java heap size of 1 GiB
    volumes:
        - ./ngrams:/ngrams

My local ngrams folder contains the contents of the German ngrams found here: https://languagetool.org/download/ngram-data/ngrams-de-20150819.zip
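In case it helps to reproduce the setup, downloading and unpacking the German ngram data into that folder could look roughly like this (a sketch; the archive is several GB and is assumed to unpack into a de/ subfolder):

mkdir -p ngrams
wget https://languagetool.org/download/ngram-data/ngrams-de-20150819.zip
unzip ngrams-de-20150819.zip -d ngrams   # should result in ./ngrams/de/...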

I ran it with the docker-compose up command and executed the following request:

curl --location --request GET 'http://192.168.1.186:8010/v2/check?language=de-DE&text=In den christlichen Traditionen gibt es unterschiedliche Anleitungen zur Mediation und Kontemplation.'

Note: The text is taken from step 5 at https://dev.languagetool.org/finding-errors-using-n-gram-data.html
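If the literal spaces in the query string cause problems with your curl or shell, an equivalent request with explicit URL encoding could look like this (same endpoint, using standard curl options):

curl --location -G 'http://192.168.1.186:8010/v2/check' \
    --data-urlencode 'language=de-DE' \
    --data-urlencode 'text=In den christlichen Traditionen gibt es unterschiedliche Anleitungen zur Mediation und Kontemplation.'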

The response is:

{
    "software": {
        "name": "LanguageTool",
        "version": "5.5",
        "buildDate": "2021-10-16 14:46:22 +0000",
        "apiVersion": 1,
        "premium": false,
        "premiumHint": "You might be missing errors only the Premium version can find. Contact us at support<at>languagetoolplus.com.",
        "status": ""
    },
    "warnings": {
        "incompleteResults": false
    },
    "language": {
        "name": "German (Germany)",
        "code": "de-DE",
        "detectedLanguage": {
            "name": "German (Germany)",
            "code": "de-DE",
            "confidence": 0.9999957
        }
    },
    "matches": [
        {
            "message": "‚Mediation‘ (Verfahren zur Konfliktlösung) erscheint hier weniger wahrscheinlich als ‚Meditation‘ (spirituelle Übung).",
            "shortMessage": "Mögliche Wortverwechselung",
            "replacements": [
                {
                    "value": "Meditation",
                    "shortDescription": "spirituelle Übung"
                }
            ],
            "offset": 73,
            "length": 9,
            "context": {
                "text": "...ibt es unterschiedliche Anleitungen zur Mediation und Kontemplation.",
                "offset": 43,
                "length": 9
            },
            "sentence": "In den christlichen Traditionen gibt es unterschiedliche Anleitungen zur Mediation und Kontemplation.",
            "type": {
                "typeName": "Other"
            },
            "rule": {
                "id": "CONFUSION_RULE_MEDIATION_MEDITATION",
                "description": "Mögliche Verwechselungen zwischen 'Mediation' und 'Meditation' erkennen",
                "issueType": "non-conformance",
                "category": {
                    "id": "TYPOS",
                    "name": "Mögliche Tippfehler"
                }
            },
            "ignoreForIncompleteSentence": false,
            "contextForSureMatch": 3
        }
    ]
}

I also noticed that the current setup does not include Python, which means I cannot, for instance, run unit tests inside the container. I tried adding it, but to no avail.

You can use this image as a base image in a new Dockerfile and then install Python:

FROM erikvl87/languagetool

USER root
RUN apk update && apk add python3 py3-pip
USER languagetool

You can build and run the new Dockerfile by executing the following two commands in the directory containing the Dockerfile:

docker build -t languagetool-custom .
docker run --rm -it -p 8010:8010 languagetool-custom
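
Once the container is running, you could verify the Python installation or run your tests from inside it; for example (the container name and test path below are placeholders):

docker exec -it <container-name> python3 --version
docker exec -it <container-name> python3 -m unittest discover <path-to-tests>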

Hi @Erikvl87, sorry for my late reply. I have tried using your Dockerfile with and without the "workaround". I am running Linux containers, and locally I use Docker on a Windows desktop computer, so the "workaround" might actually not be the right option for me. I have now returned to my old LT + Docker setup, which works fine; however, I am happy to reproduce the steps I took to help you debug. It's possible that the mistake is on my side. The relevant part of my docker-compose file looks like this:

languagetoolservice:
    image: languagetoolservice
    read_only: true
    build:
      context: languagetool
    restart: unless-stopped
    init: true
    tmpfs:
      - /var/nobody_tmp:mode=770,size=10M,uid=65534,gid=65534,exec
    ports:
      - 127.0.0.1:8010:8000
    cap_drop:
      - all
    networks:
      - languagetool-net
    depends_on:
      - _ngrams

_ngrams is a volume from which the ngrams are copied into the container at build time. As mentioned above, the container builds and starts up normally; the error is encountered only at runtime.
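For illustration, one way such a build-time copy could look in a Dockerfile is sketched below (this uses the build context rather than the volume-based approach described above; the paths are assumptions, not taken from the actual setup):

FROM erikvl87/languagetool
# Copy the ngram data from the build context into the image (hypothetical path)
COPY ngrams /ngrams
ENV langtool_languageModel=/ngrams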

I am trying to run the LanguageTool server on my Raspberry Pi 3B+ with the following docker-compose.yaml:

version: "3"

services:
  languagetool:
    image: erikvl87/languagetool
    container_name: LanguageTool
    restart: always
    ports:
      - 8010:8010  # Using default port from the image
    environment:
      - langtool_languageModel=/ngrams  # OPTIONAL: Using ngrams data
      - Java_Xms=512m  # OPTIONAL: Setting a minimum Java heap size of 512 MiB
      - Java_Xmx=1g    # OPTIONAL: Setting a maximum Java heap size of 1 GiB
    volumes:
      - ./ngrams:/ngrams

I get the following error; I am not sure whether it is related to the original ARM issue:

Pulling languagetool (erikvl87/languagetool:)...
latest: Pulling from erikvl87/languagetool
ERROR: no matching manifest for linux/arm/v7 in the manifest list entries

I also tried building the image from source, but it failed with the following error:

[ERROR] Failed to execute goal org.xolstice.maven.plugins:protobuf-maven-plugin:0.6.1:compile (default) on project languagetool-core: Unable to resolve artifact: Missing:
[ERROR] ----------
[ERROR] 1) com.google.protobuf:protoc:exe:linux-arm_32:3.17.3
[ERROR] 
[ERROR]   Try downloading the file manually from the project website.
[ERROR] 
[ERROR]   Then, install it using the command: 
[ERROR]       mvn install:install-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.17.3 -Dclassifier=linux-arm_32 -Dpackaging=exe -Dfile=/path/to/file
[ERROR] 
[ERROR]   Alternatively, if you host your own repository you can deploy the file there: 
[ERROR]       mvn deploy:deploy-file -DgroupId=com.google.protobuf -DartifactId=protoc -Dversion=3.17.3 -Dclassifier=linux-arm_32 -Dpackaging=exe -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]
[ERROR] 
[ERROR]   Path to dependency: 
[ERROR]   	1) org.languagetool:languagetool-core:jar:5.7
[ERROR]   	2) com.google.protobuf:protoc:exe:linux-arm_32:3.17.3
[ERROR] 
[ERROR] ----------
[ERROR] 1 required artifact is missing.
[ERROR] 
[ERROR] for artifact: 
[ERROR]   org.languagetool:languagetool-core:jar:5.7
[ERROR] 
[ERROR] from the specified remote repositories:
[ERROR]   central (https://repo.maven.apache.org/maven2, releases=true, snapshots=false)
[ERROR] 
[ERROR] -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :languagetool-core
The command 'mvn --projects languagetool-standalone --also-make package -DskipTests --quiet' returned a non-zero code: 1
ERROR: Service 'languagetool' failed to build : Build failed

Any ideas how to get it running? Thanks in advance!

@Maxl94 I think your issue is unrelated to this ticket. If I'm not mistaken, linux/arm/v7 is a 32-bit architecture. Currently, I've only released Docker images for linux/amd64 and linux/arm64 (see the Docker Hub tags).
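To see which platforms a published image actually provides, you can inspect its manifest list; for example (docker manifest may need to be enabled depending on your Docker version):

docker manifest inspect erikvl87/languagetool | grep architecture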

I am not sure, but the Raspberry Pi 3B+ should have a 64-bit SoC. Have you tried running a 64-bit OS? See Raspberry Pi OS (64-bit).
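A quick way to check whether the installed OS is 32-bit or 64-bit is to look at the reported machine architecture:

uname -m   # prints aarch64 on a 64-bit OS, armv7l on a 32-bit OS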

If you need this to work on a different (32-bit) architecture, please open a new ticket so I can try to look into that.