dbmdz/solr-ocrhighlighting

Error while trying to run the sample

javiermanzano opened this issue · 8 comments

Hi,

We've been trying to follow the instructions to run the sample locally and we've had this error:

example git:(main) ✗ ./ingest.py 
Indexing BNL/L'Union articles
Downloading missing BNL/L'Union issues to data/bnl_lunion
42000/41446concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1346, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1253, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1299, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1248, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1008, in _send_output
    self.send(msg)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 948, in send
    self.connect()
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 919, in connect
    self.sock = self._create_connection(
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/socket.py", line 843, in create_connection
    raise err
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/socket.py", line 831, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/paroar/solr-ocrhighlighting/example/ingest.py", line 219, in index_documents
    resp = request.urlopen(req)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 517, in open
    response = self._open(req, data)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 534, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
    result = func(*args)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1375, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1349, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 61] Connection refused>
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/paroar/solr-ocrhighlighting/example/./ingest.py", line 244, in <module>
    fut.result()
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/usr/local/Cellar/python@3.9/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
urllib.error.URLError: <urlopen error [Errno 61] Connection refused>
➜  example git:(main) ✗ 

This causes the docker container to crash:

solr_1        | 2021-05-13 11:24:42.877 INFO  (qtp322561962-51) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/system params={wt=json&_=1620905074800} status=0 QTime=15
solr_1        | 2021-05-13 11:28:32.426 INFO  (searcherExecutor-15-thread-1-processing-x:ocr) [   x:ocr] o.a.s.c.SolrCore [ocr]  Registered new searcher autowarm time: 17 ms
solr_1        | 2021-05-13 11:28:32.451 INFO  (qtp322561962-25) [   x:ocr] o.a.s.u.p.LogUpdateProcessorFactory [ocr]  webapp=/solr path=/update params={softCommit=true}{add=[1533660_1860-11-14-1, 1533660_1860-11-14-2, 1533660_1860-11-14-3, 1533660_1860-11-14-4, 1533660_1860-11-14-5, 1533660_1860-11-14-6, 1533660_1860-11-14-7, 1533660_1860-11-14-8, 1533660_1860-11-14-9, 1533660_1860-11-14-10, ... (1000 adds)],commit=} 0 22908
example_solr_1 exited with code 137

We are running Python 3.9 on macOS, although that shouldn't matter since Solr runs in a Docker container.
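A note on the `exited with code 137` line in the log above: by shell and Docker convention, an exit status above 128 usually means the process was terminated by a signal whose number is the status minus 128. The sketch below decodes such a status; the helper name `describe_exit_code` is made up for illustration. For 137 this gives signal 9 (SIGKILL), and a common cause, though not the only one, is the out-of-memory killer, so the memory available to Docker may be worth checking.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a container/process exit status into a readable hint."""
    if code > 128:
        # Convention: 128 + N means "terminated by signal N".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    if code == 0:
        return "exited normally"
    return f"exited with error status {code}"


if __name__ == "__main__":
    print(describe_exit_code(137))
```

This only interprets the number; it cannot tell who sent the signal or why.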

I appreciate your help :)

I'm also having an issue running the example on Ubuntu 18.04 with Python 3.6:

./ingest.py 
Indexing BNL/L'Union articles
Downloading missing BNL/L'Union issues to data/bnl_lunion
01000/41446Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 175, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "./ingest.py", line 219, in index_documents
    resp = request.urlopen(req)
  File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.6/urllib/request.py", line 532, in open
    response = meth(req, response)
  File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.6/urllib/request.py", line 570, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Server Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 178, in _process_worker
    result_queue.put(_ResultItem(call_item.work_id, exception=exc))
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 341, in put
    obj = _ForkingPickler.dumps(obj)
  File "/usr/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: cannot serialize '_io.BufferedReader' object
Traceback (most recent call last):
  File "./ingest.py", line 241, in <module>
    futs.append(pool.submit(index_documents, batch))
  File "/usr/lib/python3.6/concurrent/futures/process.py", line 452, in submit
    raise BrokenProcessPool('A child process terminated '
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
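One reading of the traceback above: the actual problem is the `HTTP Error 500` from Solr, but the `HTTPError` object carries the open response stream, which cannot be pickled, so the worker process dies while sending the exception back and the pool reports `BrokenProcessPool` instead. A minimal sketch of a workaround, assuming the worker catches the error itself; `to_picklable` and `post_batch` are hypothetical names and not part of `ingest.py`:

```python
from urllib import request
from urllib.error import HTTPError


def to_picklable(err: HTTPError) -> RuntimeError:
    """Convert an HTTPError into an exception that survives pickling."""
    # Read the response body now; the stream itself cannot cross processes.
    body = err.read().decode("utf-8", errors="replace")
    return RuntimeError(
        f"Solr returned HTTP {err.code} {err.reason}: {body[:500]}"
    )


def post_batch(req: request.Request) -> bytes:
    """Hypothetical worker function: send one request, re-raise safely."""
    try:
        with request.urlopen(req) as resp:
            return resp.read()
    except HTTPError as err:
        # 'from None' drops the unpicklable original from the chain.
        raise to_picklable(err) from None
```

With something like this in the worker, `fut.result()` in the parent process would raise a `RuntimeError` containing Solr's own error message rather than a broken pool.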

Thank you for reporting and sorry about that! Will be fixed ASAP.

So I just tried to reproduce both of these issues locally but had no luck (or misfortune, if you will): the ingest works without problems and I can query the index with the included web interface.

@javiermanzano @mustard123

  • Are you both following the instructions in the example/README.md or do you have customizations?
  • Can you check (e.g. with curl and/or the browser) if you can access Solr from the environment you're running the ingest.py script in?
  • Is there anything suspicious in the Solr log (docker-compose logs)? Especially @mustard123, it'd be great to know what causes the 500 on the Solr side.
  • Do you have sufficient disk space available? The example needs at least 16GiB to store the documents and the index.
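The reachability check from the list above can also be scripted. The sketch below queries Solr's CoreAdmin `STATUS` endpoint and reports whether a core failed to initialize. The base URL `http://localhost:8983/solr` and the core name `ocr` are assumptions based on the logs in this thread, and the function names are made up for illustration.

```python
import json
from typing import Optional
from urllib import request


def fetch_core_status(base_url: str = "http://localhost:8983/solr",
                      timeout: float = 10.0) -> dict:
    """Fetch the CoreAdmin STATUS response as a dict (needs a running Solr)."""
    url = f"{base_url}/admin/cores?action=STATUS&wt=json"
    with request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def core_problem(payload: dict, core: str) -> Optional[str]:
    """Return a description of what is wrong with `core`, or None if it loaded."""
    failures = payload.get("initFailures", {})
    if core in failures:
        return f"core '{core}' failed to initialize: {failures[core]}"
    if not payload.get("status", {}).get(core):
        return f"core '{core}' is not known to Solr"
    return None
```

Calling `core_problem(fetch_core_status(), "ocr")` before running `./ingest.py` would surface an init failure directly; a `URLError` from `fetch_core_status` would instead mean Solr is not reachable at all.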

@jbaiter sorry for the late response and thanks for your follow-up. I have enough disk space (more than 80GB). The docker-compose logs show the following.
Startup seems OK, I guess:

Starting example_solr_1 ... done
Starting example_iiif-prezi_1 ... done
Starting example_frontend_1 ... done
Attaching to example_iiif-prezi_1, example_frontend_1, example_solr_1
iiif-prezi_1 | [2021-06-05 16:12:45 +0000] [1] [INFO] Goin' Fast @ http://0.0.0.0:8008
frontend_1 | /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration
frontend_1 | /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/
iiif-prezi_1 | [2021-06-05 16:12:45 +0000] [1] [INFO] Starting worker [1]
solr_1 | Executing /opt/docker-solr/scripts/solr-precreate ocr /opt/core-config
frontend_1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh
solr_1 | Executing /opt/docker-solr/scripts/precreate-core ocr /opt/core-config
solr_1 | Core ocr already exists
frontend_1 | 10-listen-on-ipv6-by-default.sh: info: Getting the checksum of /etc/nginx/conf.d/default.conf
solr_1 | Starting Solr
solr_1 | The currently defined JAVA_HOME (/usr/local/openjdk-11) refers to a location
solr_1 | where java was found but jstack was not found. Continuing.
frontend_1 | 10-listen-on-ipv6-by-default.sh: info: /etc/nginx/conf.d/default.conf differs from the packaged version
frontend_1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh
frontend_1 | /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh
frontend_1 | /docker-entrypoint.sh: Configuration complete; ready for start up
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: using the "epoll" event method
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: nginx/1.21.0
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: built by gcc 10.2.1 20201203 (Alpine 10.2.1_pre1)
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: OS: Linux 5.4.0-73-generic
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1024:4096
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker processes
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 31
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 32
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 33
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 34
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 35
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 36
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 37
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 38
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 39
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 40
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 41
frontend_1 | 2021/06/05 16:12:44 [notice] 1#1: start worker process 42
solr_1 | *** [WARN] *** Your open file limit is currently 1024.
solr_1 | It should be set to 65000 to avoid operational disruption.
solr_1 | If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
solr_1 | *** [WARN] *** Your Max Processes Limit is currently 62247.
solr_1 | It should be set to 65000 to avoid operational disruption.
solr_1 | If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
solr_1 | OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 1)
solr_1 | OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)
solr_1 | OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)
solr_1 | OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)
solr_1 | OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)
solr_1 | OpenJDK 64-Bit Server VM warning: Failed to reserve shared memory. (error = 12)
solr_1 | Listening for transport dt_socket at address: 1044
solr_1 | 2021-06-05 16:12:46.369 INFO (main) [ ] o.e.j.u.log Logging initialized @776ms to org.eclipse.jetty.util.log.Slf4jLog
solr_1 | 2021-06-05 16:12:46.423 WARN (main) [ ] o.e.j.x.XmlConfiguration Ignored arg:
solr_1 |
solr_1 | solr.jetty
solr_1 |
solr_1 |
solr_1 | 2021-06-05 16:12:46.503 INFO (main) [ ] o.e.j.s.Server jetty-9.4.27.v20200227; built: 2020-02-27T18:37:21.340Z; git: a304fd9f351f337e7c0e2a7c28878dd536149c6c; jvm 11.0.11+9
solr_1 | 2021-06-05 16:12:46.519 INFO (main) [ ] o.e.j.d.p.ScanningAppProvider Deployment monitor [file:///opt/solr-8.7.0/server/contexts/] at interval 0
solr_1 | 2021-06-05 16:12:46.743 INFO (main) [ ] o.e.j.w.StandardDescriptorProcessor NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet
solr_1 | 2021-06-05 16:12:46.751 INFO (main) [ ] o.e.j.s.session DefaultSessionIdManager workerName=node0
solr_1 | 2021-06-05 16:12:46.751 INFO (main) [ ] o.e.j.s.session No SessionScavenger set, using defaults
solr_1 | 2021-06-05 16:12:46.752 INFO (main) [ ] o.e.j.s.session node0 Scavenging every 660000ms
solr_1 | 2021-06-05 16:12:46.790 INFO (main) [ ] o.a.s.s.SolrDispatchFilter Using logger factory org.apache.logging.slf4j.Log4jLoggerFactory
solr_1 | 2021-06-05 16:12:46.794 INFO (main) [ ] o.a.s.s.SolrDispatchFilter  ___      _       Welcome to Apache Solr™ version 8.7.0
solr_1 | 2021-06-05 16:12:46.794 INFO (main) [ ] o.a.s.s.SolrDispatchFilter / __| ___| |_ _   Starting in standalone mode on port 8983
solr_1 | 2021-06-05 16:12:46.794 INFO (main) [ ] o.a.s.s.SolrDispatchFilter \__ \/ _ \ | '_|  Install dir: /opt/solr
solr_1 | 2021-06-05 16:12:46.794 INFO (main) [ ] o.a.s.s.SolrDispatchFilter |___/\___/_|_|    Start time: 2021-06-05T16:12:46.794677Z
solr_1 | 2021-06-05 16:12:46.813 INFO (main) [ ] o.a.s.c.SolrPaths Using system property solr.solr.home: /var/solr/data
solr_1 | 2021-06-05 16:12:46.817 INFO (main) [ ] o.a.s.c.SolrXmlConfig Loading container configuration from /var/solr/data/solr.xml
solr_1 | 2021-06-05 16:12:46.877 INFO (main) [ ] o.a.s.c.SolrXmlConfig MBean server found: com.sun.jmx.mbeanserver.JmxMBeanServer@4f2410ac, but no JMX reporters were configured - adding default JMX reporter.
solr_1 | 2021-06-05 16:12:47.446 INFO (main) [ ] o.a.s.h.c.HttpShardHandlerFactory Host whitelist initialized: WhitelistHostChecker [whitelistHosts=null, whitelistHostCheckingEnabled=true]
solr_1 | 2021-06-05 16:12:47.548 WARN (main) [ ] o.e.j.u.s.S.config Trusting all certificates configured for Client@732c9b5c[provider=null,keyStore=null,trustStore=null]
solr_1 | 2021-06-05 16:12:47.549 WARN (main) [ ] o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for Client@732c9b5c[provider=null,keyStore=null,trustStore=null]
solr_1 | 2021-06-05 16:12:47.650 WARN (main) [ ] o.e.j.u.s.S.config Trusting all certificates configured for Client@2e51d054[provider=null,keyStore=null,trustStore=null]
solr_1 | 2021-06-05 16:12:47.650 WARN (main) [ ] o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for Client@2e51d054[provider=null,keyStore=null,trustStore=null]
solr_1 | 2021-06-05 16:12:47.683 WARN (main) [ ] o.a.s.c.CoreContainer Not all security plugins configured! authentication=disabled authorization=disabled. Solr is only as secure as you make it. Consider configuring authentication/authorization before exposing Solr to users internal or external. See https://s.apache.org/solrsecurity for more info
solr_1 | 2021-06-05 16:12:47.783 INFO (main) [ ] o.a.s.c.TransientSolrCoreCacheDefault Allocating transient cache for 2147483647 transient cores
solr_1 | 2021-06-05 16:12:47.785 INFO (main) [ ] o.a.s.h.a.MetricsHistoryHandler No .system collection, keeping metrics history in memory.
solr_1 | 2021-06-05 16:12:47.838 INFO (main) [ ] o.a.s.m.r.SolrJmxReporter JMX monitoring for 'solr.node' (registry 'solr.node') enabled at server: com.sun.jmx.mbeanserver.JmxMBeanServer@4f2410ac
solr_1 | 2021-06-05 16:12:47.838 INFO (main) [ ] o.a.s.m.r.SolrJmxReporter JMX monitoring for 'solr.jvm' (registry 'solr.jvm') enabled at server: com.sun.jmx.mbeanserver.JmxMBeanServer@4f2410ac
solr_1 | 2021-06-05 16:12:47.843 INFO (main) [ ] o.a.s.m.r.SolrJmxReporter JMX monitoring for 'solr.jetty' (registry 'solr.jetty') enabled at server: com.sun.jmx.mbeanserver.JmxMBeanServer@4f2410ac
solr_1 | 2021-06-05 16:12:47.863 INFO (main) [ ] o.a.s.c.CorePropertiesLocator Found 1 core definitions underneath /var/solr/data
solr_1 | 2021-06-05 16:12:47.863 INFO (main) [ ] o.a.s.c.CorePropertiesLocator Cores are: [ocr]
solr_1 | 2021-06-05 16:12:47.870 ERROR (coreContainerWorkExecutor-2-thread-1) [ ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup => java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [ocr]
solr_1 | at java.base/java.util.concurrent.FutureTask.report(Unknown Source)
solr_1 | java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [ocr]
solr_1 | at java.util.concurrent.FutureTask.report(Unknown Source) ~[?:?]
solr_1 | at java.util.concurrent.FutureTask.get(Unknown Source) ~[?:?]
solr_1 | at org.apache.solr.core.CoreContainer.lambda$load$15(CoreContainer.java:881) ~[?:?]
solr_1 | at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180) ~[metrics-core-4.1.5.jar:4.1.5]
solr_1 | at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:?]
solr_1 | at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
solr_1 | at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218) ~[?:?]
solr_1 | at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
solr_1 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
solr_1 | at java.lang.Thread.run(Unknown Source) [?:?]
solr_1 | Caused by: org.apache.solr.common.SolrException: Unable to create core [ocr]
solr_1 | at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1420) ~[?:?]
solr_1 | at org.apache.solr.core.CoreContainer.lambda$load$14(CoreContainer.java:852) ~[?:?]
solr_1 | at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) ~[metrics-core-4.1.5.jar:4.1.5]
solr_1 | ... 5 more
solr_1 | Caused by: org.apache.solr.common.SolrException: Could not load conf for core ocr: Error loading solr config from /var/solr/data/ocr/conf/solrconfig.xml
solr_1 | at org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:88) ~[?:?]
solr_1 | at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1393) ~[?:?]
solr_1 | at org.apache.solr.core.CoreContainer.lambda$load$14(CoreContainer.java:852) ~[?:?]
solr_1 | at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) ~[metrics-core-4.1.5.jar:4.1.5]
solr_1 | ... 5 more
solr_1 | Caused by: org.apache.solr.common.SolrException: Error loading solr config from /var/solr/data/ocr/conf/solrconfig.xml
solr_1 | at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:159) ~[?:?]
solr_1 | at org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:111) ~[?:?]
solr_1 | at org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:83) ~[?:?]
solr_1 | at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1393) ~[?:?]
solr_1 | at org.apache.solr.core.CoreContainer.lambda$load$14(CoreContainer.java:852) ~[?:?]
solr_1 | at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) ~[metrics-core-4.1.5.jar:4.1.5]
solr_1 | ... 5 more
solr_1 | Caused by: org.apache.solr.core.SolrResourceNotFoundException: Can't find resource 'solrconfig.xml' in classpath or '/var/solr/data/ocr'
solr_1 | at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:388) ~[?:?]
solr_1 | at org.apache.solr.core.XmlConfigFile.<init>(XmlConfigFile.java:124) ~[?:?]
solr_1 | at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:175) ~[?:?]
solr_1 | at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:151) ~[?:?]
solr_1 | at org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:111) ~[?:?]
solr_1 | at org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:83) ~[?:?]
solr_1 | at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1393) ~[?:?]
solr_1 | at org.apache.solr.core.CoreContainer.lambda$load$14(CoreContainer.java:852) ~[?:?]
solr_1 | at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) ~[metrics-core-4.1.5.jar:4.1.5]
solr_1 | ... 5 more
solr_1 | 2021-06-05 16:12:47.915 INFO (main) [ ] o.e.j.s.h.ContextHandler Started o.e.j.w.WebAppContext@1cd201a8{/solr,file:///opt/solr-8.7.0/server/solr-webapp/webapp/,AVAILABLE}{/opt/solr-8.7.0/server/solr-webapp/webapp}
solr_1 | 2021-06-05 16:12:47.924 INFO (main) [ ] o.e.j.s.AbstractConnector Started ServerConnector@e84a8e1{HTTP/1.1, (http/1.1, h2c)}{0.0.0.0:8983}
solr_1 | 2021-06-05 16:12:47.924 INFO (main) [ ] o.e.j.s.Server Started @2331ms

But as soon as I run ./ingest.py the following gets logged:

solr_1 | 2021-06-05 16:07:35.730 ERROR (qtp532048323-23) [ ] o.a.s.s.HttpSolrCall null:org.apache.solr.core.SolrCoreInitializationException: SolrCore 'ocr' is not available due to init failure: Could not load conf for core ocr: Error loading solr config from /var/solr/data/ocr/conf/solrconfig.xml
solr_1 | at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1898)
solr_1 | at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1871)
solr_1 | at org.apache.solr.servlet.HttpSolrCall.init(HttpSolrCall.java:258)
solr_1 | at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:526)
solr_1 | at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:415)
solr_1 | at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
solr_1 | at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
solr_1 | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
solr_1 | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
solr_1 | at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
solr_1 | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
solr_1 | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
solr_1 | at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1610)
solr_1 | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
solr_1 | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1300)
solr_1 | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
solr_1 | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
solr_1 | at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1580)
solr_1 | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
solr_1 | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1215)
solr_1 | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
solr_1 | at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
solr_1 | at org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177)
solr_1 | at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
solr_1 | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
solr_1 | at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
solr_1 | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
solr_1 | at org.eclipse.jetty.server.Server.handle(Server.java:500)
solr_1 | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:383)
solr_1 | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:547)
solr_1 | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:375)
solr_1 | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:273)
solr_1 | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
solr_1 | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
solr_1 | at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
solr_1 | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
solr_1 | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
solr_1 | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
solr_1 | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
solr_1 | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
solr_1 | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
solr_1 | at java.base/java.lang.Thread.run(Unknown Source)
solr_1 | Caused by: org.apache.solr.common.SolrException: Could not load conf for core ocr: Error loading solr config from /var/solr/data/ocr/conf/solrconfig.xml
solr_1 | at org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:88)
solr_1 | at org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1393)
solr_1 | at org.apache.solr.core.CoreContainer.lambda$load$14(CoreContainer.java:852)
solr_1 | at com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202)
solr_1 | at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
solr_1 | at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:218)
solr_1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
solr_1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
solr_1 | ... 1 more
solr_1 | Caused by: org.apache.solr.common.SolrException: Error loading solr config from /var/solr/data/ocr/conf/solrconfig.xml
solr_1 | at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:159)
solr_1 | at org.apache.solr.core.ConfigSetService.createSolrConfig(ConfigSetService.java:111)
solr_1 | at org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:83)
solr_1 | ... 8 more
solr_1 | Caused by: org.apache.solr.core.SolrResourceNotFoundException: Can't find resource 'solrconfig.xml' in classpath or '/var/solr/data/ocr'
solr_1 | at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:388)
solr_1 | at org.apache.solr.core.XmlConfigFile.<init>(XmlConfigFile.java:124)
solr_1 | at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:175)
solr_1 | at org.apache.solr.core.SolrConfig.readFromResourceLoader(SolrConfig.java:151)
solr_1 | ... 10 more
solr_1 |

Screenshot of the solr dashboard in my browser

@javiermanzano hi, I think your issue is not with the OCR highlighting plugin but with your Solr core initialization. The ocr core is not being created/started because /var/solr/data/ocr/conf/solrconfig.xml cannot be found (or may be incorrect). How are you mounting the initial configuration (conf) and data into your Docker container so that Solr can create/initialize the core?

Your logs show that Solr initially "finds" the core, but that may be just the base folder; loading then fails because conf/* is missing. You may also have permission issues: on Linux, the mounted host directory needs to be owned by the uid/gid that Solr runs as inside the container, which is 8983 (the same number as the default port).

https://github.com/docker-solr/docker-solr#running-solr-with-host-mounted-directories

If you are running this via docker-compose there is info in that GitHub readme too.
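To check the ownership point on a Linux host, something like the following sketch could be used. It walks a host directory and lists every entry not owned by uid/gid 8983. The helper name `ownership_mismatches` is hypothetical, and the directory to check depends on how the volume is mounted in your setup.

```python
import os
from typing import List

# uid/gid of the solr user in the official Solr image.
SOLR_ID = 8983


def ownership_mismatches(root: str, uid: int = SOLR_ID,
                         gid: int = SOLR_ID) -> List[str]:
    """Return all paths under `root` whose owner or group differs."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself plus every file directly inside it.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            info = os.stat(path)
            if info.st_uid != uid or info.st_gid != gid:
                mismatched.append(path)
    return mismatched
```

An empty result for the mounted data directory would rule out ownership as the cause; a non-empty one points to the entries that would need a `chown`.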

Here is an example Docker-compose snippet:

  solr:
    container_name: your-solr
    restart: always
    image: "solr:8.8.2"
    tty: true
    ports:
      - "8983:8983"
    networks:
      - host-net
      - internal-net
    volumes:
      - ${PWD}/persistent/solrcore:/var/solr/data:cached
      - ${PWD}/persistent/solrconfig:/ocrconfig:cached
      - ${PWD}/persistent/solrlib:/opt/solr/contrib/ocrhighlight/lib:cached
    entrypoint:
      - docker-entrypoint.sh
      - solr-precreate
      - ocr
      - /ocrconfig

This snippet:

  • reads the core config from ${PWD}/persistent/solrconfig, which is mounted as /ocrconfig inside the container
  • precreates (if it does not exist yet) a core named ocr from the core config found in /ocrconfig
  • stores the created core back on the host in ${PWD}/persistent/solrcore, which is mounted as /var/solr/data
  • loads the solr-ocrhighlighting plugin from ${PWD}/persistent/solrlib, which is mounted at /opt/solr/contrib/ocrhighlight/lib so that Solr can find it
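Since the Solr log complains about a missing `/var/solr/data/ocr/conf/solrconfig.xml`, it can help to confirm on the host that the directory backing the core actually contains that file. A minimal sketch, where `missing_core_files` is a hypothetical helper and the directory you pass depends on your own volume mapping:

```python
from pathlib import Path
from typing import Iterable, List


def missing_core_files(
    core_dir: str,
    required: Iterable[str] = ("conf/solrconfig.xml",),
) -> List[str]:
    """Return the required files (relative paths) absent from `core_dir`."""
    root = Path(core_dir)
    return [rel for rel in required if not (root / rel).is_file()]
```

With the snippet above, the directory to check would be something like `persistent/solrcore/ocr` on the host; an empty list means the file named in the log is present.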

What @DiegoPino said is correct: the 500 is caused by a missing Solr configuration. Are you running the example as described in the README?
Maybe try these steps to tear down any existing containers and rebuild them from scratch:

$ cd ./example
$ docker-compose down -v
$ docker-compose up --build --force-recreate

Let me know if this helps!

Closing this due to inactivity; will reopen if there are updates.