pemistahl/lingua

Language detection does not work when a SecurityManager is enabled

ctalau opened this issue · 4 comments

If a SecurityManager is enabled in the JVM, the language detection does not work.

Sample code to reproduce the issue:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import com.github.pemistahl.lingua.api.Language;
import com.github.pemistahl.lingua.api.LanguageDetector;
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder;

public class SecurityManagerApp {
  public static Language[] languages = new Language[] { Language.ENGLISH, Language.FRENCH};
  
  public static void main(String[] args) throws IOException {
    File policyFile = File.createTempFile("security", ".policy", null);
    Files.write(policyFile.toPath(), 
        "grant { permission java.security.AllPermission; }; ".getBytes());
    System.setProperty("java.security.policy", policyFile.getAbsolutePath());
    System.setSecurityManager(new SecurityManager());
    
    LanguageDetector detector = LanguageDetectorBuilder.fromLanguages(languages).build();
    System.out.println(detector.detectLanguageOf("Comment ca va\r\n"
        + "Comme ci\r\n"
        + "Comme ci\r\n"
        + "Comme ci\r\n"
        + "Comme ca").name());
  }
}

Note that in my use case I am writing a plugin for a host application, so I cannot disable the JVM's SecurityManager.

Running the sample code with -Djava.security.debug=access,failure produces the following stack trace:

access: access denied ("java.io.FilePermission" "D:\opt\.m2\com\github\pemistahl\lingua\1.2.1\lingua-1.2.1.jar" "read")
java.lang.Exception: Stack trace
	at java.base/java.lang.Thread.dumpStack(Thread.java:1380)
	at java.base/java.security.AccessControlContext.checkPermission(AccessControlContext.java:475)
	at java.base/java.security.AccessController.checkPermission(AccessController.java:1068)
	at java.base/java.lang.SecurityManager.checkPermission(SecurityManager.java:416)
	at java.base/jdk.internal.loader.URLClassPath.check(URLClassPath.java:556)
	at java.base/jdk.internal.loader.URLClassPath.checkURL(URLClassPath.java:535)
	at java.base/jdk.internal.loader.BuiltinClassLoader.checkURL(BuiltinClassLoader.java:1080)
	at java.base/jdk.internal.loader.BuiltinClassLoader.findResource(BuiltinClassLoader.java:356)
	at java.base/java.lang.ClassLoader.getResource(ClassLoader.java:1403)
	at java.base/java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:1733)
	at java.base/java.lang.Class.getResourceAsStream(Class.java:2850)
	at com.github.pemistahl.lingua.api.LanguageDetector$Companion.loadLanguageModel(LanguageDetector.kt:530)
	at com.github.pemistahl.lingua.api.LanguageDetector$Companion.loadLanguageModels(LanguageDetector.kt:520)
	at com.github.pemistahl.lingua.api.LanguageDetector$Companion.access$loadLanguageModels(LanguageDetector.kt:501)
	at com.github.pemistahl.lingua.api.LanguageDetector.lookUpNgramProbability-as5wtIs$lingua(LanguageDetector.kt:467)
	at com.github.pemistahl.lingua.api.LanguageDetector.computeSumOfNgramProbabilities$lingua(LanguageDetector.kt:442)
	at com.github.pemistahl.lingua.api.LanguageDetector.computeLanguageProbabilities$lingua(LanguageDetector.kt:429)
	at com.github.pemistahl.lingua.api.LanguageDetector.computeLanguageConfidenceValues$lambda-4$lambda-3(LanguageDetector.kt:142)
	at java.base/java.util.concurrent.ForkJoinTask$AdaptedInterruptibleCallable.exec(ForkJoinTask.java:1461)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
access: domain that failed ProtectionDomain  null
 null
 <no principals>
 null

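The domain that lacks the required permission belongs to a thread created by the ForkJoinPool. When a SecurityManager is installed, the common pool uses a thread factory that creates InnocuousForkJoinWorkerThread instances, and their AccessControlContext contains that permissionless ProtectionDomain. Because the language models are loaded lazily from inside these worker threads (the loadLanguageModel frames in the trace are called from computeLanguageConfidenceValues), reading the Lingua jar is denied even though the policy file grants AllPermission.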

I can see two solutions:

  1. Use a dedicated pool instead of the common pool. This also makes the library a better citizen, since it no longer ties up the common pool that other libraries share.
  2. Use AccessController.doPrivileged(...) around the call that loads the language models.
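Option 1 could look roughly like the following sketch. It is not Lingua's code (Lingua is written in Kotlin); the class name, the string language keys and the marker-character scoring are hypothetical stand-ins for the real n-gram lookup. The idea is that the library owns its executor, so its worker threads are created on demand by the thread that submits work and inherit that caller's access control context, instead of running on the common pool's innocuous workers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class DedicatedPoolSketch {
    // Library-owned pool used instead of ForkJoinPool.commonPool().
    // ThreadPoolExecutor creates worker threads lazily on the thread that submits tasks,
    // so a new Thread captures that caller's AccessControlContext rather than the
    // permission-less context of the common pool's innocuous workers.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors(),
            runnable -> {
                Thread thread = new Thread(runnable, "language-detector-worker");
                thread.setDaemon(true); // an idle pool must not keep the JVM alive
                return thread;
            });

    // Hypothetical per-language marker characters standing in for real n-gram models.
    // FRENCH markers are c-cedilla, e-acute, e-grave, a-grave, u-grave (escaped for portability).
    private static final Map<String, String> MARKERS = Map.of(
            "ENGLISH", "wk",
            "FRENCH", "\u00e7\u00e9\u00e8\u00e0\u00f9");

    private DedicatedPoolSketch() {}

    // Scores every language in parallel on the dedicated pool, preserving input order.
    static Map<String, Double> computeScores(List<String> languages, String text) {
        List<Callable<Double>> tasks = new ArrayList<>();
        for (String language : languages) {
            tasks.add(() -> score(language, text));
        }
        try {
            List<Future<Double>> futures = POOL.invokeAll(tasks);
            Map<String, Double> scores = new LinkedHashMap<>();
            for (int i = 0; i < languages.size(); i++) {
                scores.put(languages.get(i), futures.get(i).get());
            }
            return scores;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scoring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scoring task failed", e.getCause());
        }
    }

    // Placeholder scoring: counts the characters of the text that are markers for the language.
    private static double score(String language, String text) {
        String markers = MARKERS.getOrDefault(language, "");
        return text.chars().filter(c -> markers.indexOf(c) >= 0).count();
    }
}
```

The pool threads are daemon threads so that an idle pool does not prevent the JVM from exiting; a real library might instead expose a way to shut the pool down explicitly.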

Hi Cristian @ctalau, thank you for this bug report. I will try to fix this bug soon.

Hi @ctalau, I was able to reproduce your problem and wrapped the relevant code in AccessController.doPrivileged(), which resolves it. I'm going to release Lingua 1.2.2 soon, which includes this fix.
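A minimal Java sketch of the doPrivileged approach, assuming the models are bundled as classpath resources. The class and method names are hypothetical, and this is not the code that went into Lingua 1.2.2. Only the resource lookup is wrapped, because that is the call that fails in the trace (Class.getResourceAsStream leading to URLClassPath.checkURL).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.AccessController;
import java.security.PrivilegedAction;

final class PrivilegedModelLoaderSketch {
    private PrivilegedModelLoaderSketch() {}

    // Runs the action with this class's own permissions. Permission checks then only consider
    // frames up to the caller of doPrivileged, so the permission-less inherited context of a
    // common-pool worker thread is not consulted. Kept private so callers cannot borrow it.
    @SuppressWarnings("removal") // AccessController is deprecated for removal since Java 17
    private static <T> T runPrivileged(PrivilegedAction<T> action) {
        return AccessController.doPrivileged(action);
    }

    // Hypothetical helper: loads a bundled model file as text, or returns null if it is missing.
    static String loadModelText(String resourcePath) {
        InputStream in = runPrivileged(
                () -> PrivilegedModelLoaderSketch.class.getResourceAsStream(resourcePath));
        if (in == null) {
            return null;
        }
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping runPrivileged private matters: a public helper that runs arbitrary actions with the library's permissions would let any caller sidestep the SecurityManager. AccessController itself was deprecated for removal in Java 17 (JEP 411) along with the SecurityManager.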

Out of curiosity, if the issue occurred while loading the language models, does the fix then also help when using LanguageDetectorBuilder.withPreloadedLanguageModels()?