duydo/elasticsearch-analysis-vietnamese

Error cannot initialize Tokenizer: /usr/local/share/tokenizer/dicts

nntruong02069999 opened this issue · 2 comments

Hi,
I am running Elasticsearch with Docker. I have installed coccoc-tokenizer and tested its command successfully.
This is the error I get when I start the image:

Cannot open file for reading /usr/local/share/tokenizer/dicts/multiterm_trie.dump
es01     | {"type": "server", "timestamp": "2021-06-30T03:08:14,237Z", "level": "WARN", "component": "r.suppressed", "cluster.name": "docker-cluster", "node.name": "7bd45b102d38", "message": "path: /_template/.management-beats, params: {include_type_name=true, name=.management-beats}", "cluster.uuid": "3KJAmJtuSSmhztSpxzZpJA", "node.id": "l8P14CGCQmmwoSCvFqyD7A" , 
es01     | "stacktrace": ["java.lang.RuntimeException: Cannot initialize Tokenizer: /usr/local/share/tokenizer/dicts",
es01     | "at com.coccoc.Tokenizer.<init>(Tokenizer.java:44) ~[?:?]",
es01     | "at org.apache.lucene.analysis.vi.VietnameseTokenizerImpl.lambda$new$0(VietnameseTokenizerImpl.java:54) ~[?:?]",
es01     | "at java.security.AccessController.doPrivileged(AccessController.java:312) ~[?:?]",
es01     | "at org.apache.lucene.analysis.vi.VietnameseTokenizerImpl.<init>(VietnameseTokenizerImpl.java:53) ~[?:?]",
es01     | "at org.apache.lucene.analysis.vi.VietnameseTokenizer.<init>(VietnameseTokenizer.java:45) ~[?:?]",
es01     | "at org.apache.lucene.analysis.vi.VietnameseAnalyzer.createComponents(VietnameseAnalyzer.java:88) ~[?:?]",
es01     | "at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:136) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]",
es01     | "at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:199) ~[lucene-core-8.8.0.jar:8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:07:45]",
es01     | "at org.elasticsearch.index.analysis.AnalysisRegistry.checkVersions(AnalysisRegistry.java:637) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.index.analysis.AnalysisRegistry.produceAnalyzer(AnalysisRegistry.java:601) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:520) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.index.analysis.AnalysisRegistry.build(AnalysisRegistry.java:207) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.index.IndexModule.newIndexService(IndexModule.java:431) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:663) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:566) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateTemplate(MetadataIndexTemplateService.java:1199) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.access$300(MetadataIndexTemplateService.java:80) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$6.execute(MetadataIndexTemplateService.java:714) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:48) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:691) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:313) ~[elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:208) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:62) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:140) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:139) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:177) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:673) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:241) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:204) [elasticsearch-7.12.1.jar:7.12.1]",
es01     | "at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]",
es01     | "at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]",
es01     | "at java.lang.Thread.run(Thread.java:831) [?:?]"] }

This is the Dockerfile I use to build the image:

#Dockerfile
FROM elasticsearch:7.12.1

COPY elasticsearch-analysis-vietnamese-7.12.1.zip /usr/share/elasticsearch/

COPY libcoccoc_tokenizer_jni.so /usr/lib64

RUN cd /usr/share/elasticsearch && \
    bin/elasticsearch-plugin install file:///usr/share/elasticsearch/elasticsearch-analysis-vietnamese-7.12.1.zip && \
    bin/elasticsearch-plugin install analysis-icu
    

Copy the following files into the directory /usr/local/share/tokenizer/dicts inside the container:

acronyms alphabetic chemical_comp d_and_gi.txt Freq2NontoneUniFile i_and_y.txt keyword.freq multiterm_trie.dump nontone_pair_freq nontone_pair_freq_map.dump numeric special_token.strong special_token.weak syllable_trie.dump vndic_multiterm

There are some redundant files but I am not sure which ones.
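
As a minimal sketch, the Dockerfile from the issue could be extended along these lines. It assumes the dictionary files listed above have been collected into a local directory named `dicts/` next to the Dockerfile; that directory name is hypothetical, and the files themselves would need to come from wherever coccoc-tokenizer was built or installed.

```dockerfile
FROM elasticsearch:7.12.1

# Plugin archive and JNI library, as in the original Dockerfile.
COPY elasticsearch-analysis-vietnamese-7.12.1.zip /usr/share/elasticsearch/
COPY libcoccoc_tokenizer_jni.so /usr/lib64/

# Assumption: ./dicts (hypothetical name) holds the dictionary files listed
# above, e.g. multiterm_trie.dump and syllable_trie.dump. They are copied to
# the path that appears in the error message.
COPY dicts/ /usr/local/share/tokenizer/dicts/

RUN cd /usr/share/elasticsearch && \
    bin/elasticsearch-plugin install file:///usr/share/elasticsearch/elasticsearch-analysis-vietnamese-7.12.1.zip && \
    bin/elasticsearch-plugin install analysis-icu
```

After rebuilding, listing /usr/local/share/tokenizer/dicts inside the running container should show multiterm_trie.dump, the file named in the "Cannot open file for reading" line of the log.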