dbpedia/virtuoso-sparql-endpoint-quickstart

Clarification about docker configuration beyond Quickstart

Closed this issue · 2 comments

The Quickstart instructions are easy and straightforward. But after the data is successfully downloaded and loaded, should this docker-compose configuration be used as-is every time to start the endpoint? Or should the download and/or load containers be disabled?

If I restart the default docker-compose configuration after everything has been downloaded and loaded, it starts re-downloading everything. If I disable the download container, the load container still seems to assume there is work to do. If I disable both the download and load containers, the store container still pushes two CPU cores to 100% utilization for 20-30 minutes with zero server requests being made (see the "High disk read" log messages below). All of this happens after a previous run logged a "successfully loaded" message. Is there something I need to do to tell the store container that nothing more needs to be done?

store_1  | 14:24:04 Server online at 1111 (pid 1)
store_1  | 14:34:08 * Monitor: High disk read (1)
store_1  | 14:36:10 * Monitor: High disk read (1)
store_1  | 14:38:12 * Monitor: High disk read (1)
store_1  | 14:40:16 * Monitor: High disk read (1)
store_1  | 14:42:17 * Monitor: High disk read (1)
...

This is a very good question, and I wondered the same thing when I ran a local instance about a year ago. I believe the load container reads all files in your downloads folder and processes them. So for subsequent starts, you are right not to start the download container, but you will also want to move or remove the downloaded files beforehand. Then store and load work alongside each other just fine, without any noticeable idle CPU usage.

Confirmed! After successfully loading everything, I did as you suggested:

  1. Moved the downloaded files.
  2. Disabled the download container, leaving just store and load enabled.
  3. Ran docker-compose up again.

Now it starts up without any extra CPU usage.
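
For anyone landing here later, the steps above might look roughly like the sketch below. It assumes the dumps sit in a local `./downloads` folder and that the compose services are named `store` and `load` (inferred from the log prefix and this discussion, not checked against the repository's docker-compose.yml). The function name `archive_downloads` and both paths are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: move already-loaded dump files out of the downloads folder
# so the load container finds nothing left to process on the next start.
# The function name and the paths in the usage lines are hypothetical.

archive_downloads() {
  local src="$1" dest="$2"
  mkdir -p "$dest"
  # Move every top-level entry (files and subfolders) out of the source folder.
  find "$src" -mindepth 1 -maxdepth 1 -exec mv {} "$dest"/ \;
}

# Usage (assumed paths and service names, adjust to your setup):
#   archive_downloads ./downloads ./downloads-archive
#   docker-compose up store load
```

Naming the services on the `docker-compose up` command line is an alternative to commenting the download container out of the compose file. If `load` declares a dependency on `download` in your compose file, the download container may still be started, so check that before relying on this.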