neo4j/docker-neo4j

Docker container 4.0.7 failing to start

Closed this issue · 5 comments

j616 commented

Problem

The neo4j Docker image has started failing to start with a permissions error on logs. I'm specifying version 4.0 which currently maps to 4.0.7. I am not mapping any volumes. My builds worked yesterday when 4.0.6 was the latest 4.0 version.

Steps to reproduce

Relevant docker-compose:-

version: '3.8'

services:
  neo4j:
    image: neo4j:4.0
    environment:
      - NEO4J_AUTH=foo/bar
      - NEO4J_apoc_export_file_enabled=true
      - NEO4J_apoc_import_file_enabled=true
      - NEO4J_apoc_import_file_use__neo4j__config=true
      - NEO4JLABS_PLUGINS=["apoc"]

Expected behaviour

Neo4j Docker container starts.

Actual behaviour

Container fails to start with the following output:-

neo4j_1  | Warning: Folder mounted to "/data/dbms" is not writable from inside container. Changing folder owner to neo4j.
neo4j_1  | Changed password for user 'neo4j'.
neo4j_1  | Fetching versions.json for Plugin 'apoc' from https://neo4j-contrib.github.io/neo4j-apoc-procedures/versions.json
neo4j_1  | Installing Plugin 'apoc' from https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.0.0.17/apoc-4.0.0.17-all.jar to /var/lib/neo4j/plugins/apoc.jar 
neo4j_1  | Applying default values for plugin apoc to neo4j.conf
neo4j_1  | Directories in use:
neo4j_1  |   home:         /var/lib/neo4j
neo4j_1  |   config:       /var/lib/neo4j/conf
neo4j_1  |   logs:         /logs
neo4j_1  |   plugins:      /var/lib/neo4j/plugins
neo4j_1  |   import:       /var/lib/neo4j/import
neo4j_1  |   data:         /var/lib/neo4j/data
neo4j_1  |   certificates: /var/lib/neo4j/certificates
neo4j_1  |   run:          /var/lib/neo4j/run
neo4j_1  | Starting Neo4j.
neo4j_1  | 2020-07-22 08:09:05.792+0000 INFO  ======== Neo4j 4.0.7 ========
neo4j_1  | 2020-07-22 08:09:05.798+0000 INFO  Starting...
neo4j_1  | 2020-07-22 08:09:06.138+0000 ERROR Failed to start Neo4j: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabaseService@38af9828' was successfully initialized, but failed to start. Please see the attached cause exception "/logs/debug.log (Permission denied)". Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabaseService@38af9828' was successfully initialized, but failed to start. Please see the attached cause exception "/logs/debug.log (Permission denied)".
neo4j_1  | org.neo4j.server.ServerStartupException: Starting Neo4j failed: Component 'org.neo4j.server.database.LifecycleManagingDatabaseService@38af9828' was successfully initialized, but failed to start. Please see the attached cause exception "/logs/debug.log (Permission denied)".
neo4j_1  | 	at org.neo4j.server.exception.ServerStartupErrors.translateToServerStartupError(ServerStartupErrors.java:45)
neo4j_1  | 	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:164)
neo4j_1  | 	at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:114)
neo4j_1  | 	at org.neo4j.server.ServerBootstrapper.start(ServerBootstrapper.java:89)
neo4j_1  | 	at org.neo4j.server.CommunityEntryPoint.main(CommunityEntryPoint.java:35)
neo4j_1  | Caused by: org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.server.database.LifecycleManagingDatabaseService@38af9828' was successfully initialized, but failed to start. Please see the attached cause exception "/logs/debug.log (Permission denied)".
neo4j_1  | 	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:465)
neo4j_1  | 	at org.neo4j.kernel.lifecycle.LifeSupport.start(LifeSupport.java:111)
neo4j_1  | 	at org.neo4j.server.AbstractNeoServer.start(AbstractNeoServer.java:157)
neo4j_1  | 	... 3 more
neo4j_1  | Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /logs/debug.log (Permission denied)
neo4j_1  | 	at org.neo4j.graphdb.factory.module.GlobalModule.createLogService(GlobalModule.java:350)
neo4j_1  | 	at org.neo4j.graphdb.factory.module.GlobalModule.<init>(GlobalModule.java:162)
neo4j_1  | 	at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.createGlobalModule(DatabaseManagementServiceFactory.java:214)
neo4j_1  | 	at org.neo4j.graphdb.facade.DatabaseManagementServiceFactory.build(DatabaseManagementServiceFactory.java:119)
neo4j_1  | 	at org.neo4j.server.database.CommunityGraphFactory.newDatabaseManagementService(CommunityGraphFactory.java:36)
neo4j_1  | 	at org.neo4j.server.database.LifecycleManagingDatabaseService.start(LifecycleManagingDatabaseService.java:88)
neo4j_1  | 	at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.start(LifeSupport.java:444)
neo4j_1  | 	... 5 more
neo4j_1  | Caused by: java.io.FileNotFoundException: /logs/debug.log (Permission denied)
neo4j_1  | 	at java.base/java.io.FileOutputStream.open0(Native Method)
neo4j_1  | 	at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
neo4j_1  | 	at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
neo4j_1  | 	at org.neo4j.io.fs.DefaultFileSystemAbstraction.openAsOutputStream(DefaultFileSystemAbstraction.java:78)
neo4j_1  | 	at org.neo4j.io.fs.FileSystemUtils.createOrOpenAsOutputStream(FileSystemUtils.java:54)
neo4j_1  | 	at org.neo4j.logging.RotatingFileOutputStreamSupplier.openOutputFile(RotatingFileOutputStreamSupplier.java:338)
neo4j_1  | 	at org.neo4j.logging.RotatingFileOutputStreamSupplier.<init>(RotatingFileOutputStreamSupplier.java:137)
neo4j_1  | 	at org.neo4j.logging.RotatingFileOutputStreamSupplier.<init>(RotatingFileOutputStreamSupplier.java:121)
neo4j_1  | 	at org.neo4j.logging.internal.StoreLogService.<init>(StoreLogService.java:191)
neo4j_1  | 	at org.neo4j.logging.internal.StoreLogService$Builder.build(StoreLogService.java:131)
neo4j_1  | 	at org.neo4j.graphdb.factory.module.GlobalModule.createLogService(GlobalModule.java:346)
neo4j_1  | 	... 11 more
neo4j_1  | 2020-07-22 08:09:06.139+0000 INFO  Neo4j Server shutdown initiated by request
docker-tests_neo4j_1 exited with code 1

Neo4j image tag being used, eg neo4j:latest, neo4j:enterprise-3.5 etc

neo4j:4.0

The output of the docker version command

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:45:36 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:44:07 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Operating system: (for example Windows 95/Ubuntu 16.04)

Ubuntu 18.04

@j616 I tried running your reproduction code, but Neo4j 4.0.7 started as expected:

docker-compose up
Starting 269-permission-onstart_neo4j_1 ... done
Attaching to 269-permission-onstart_neo4j_1
neo4j_1  | Changed password for user 'neo4j'.
neo4j_1  | Fetching versions.json for Plugin 'apoc' from https://neo4j-contrib.github.io/neo4j-apoc-procedures/versions.json
neo4j_1  | Installing Plugin 'apoc' from https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/4.0.0.17/apoc-4.0.0.17-all.jar to /var/lib/neo4j/plugins/apoc.jar 
neo4j_1  | Applying default values for plugin apoc to neo4j.conf
neo4j_1  | Skipping dbms.security.procedures.unrestricted for plugin apoc because it is already set
neo4j_1  | Directories in use:
neo4j_1  |   home:         /var/lib/neo4j
neo4j_1  |   config:       /var/lib/neo4j/conf
neo4j_1  |   logs:         /logs
neo4j_1  |   plugins:      /var/lib/neo4j/plugins
neo4j_1  |   import:       /var/lib/neo4j/import
neo4j_1  |   data:         /var/lib/neo4j/data
neo4j_1  |   certificates: /var/lib/neo4j/certificates
neo4j_1  |   run:          /var/lib/neo4j/run
neo4j_1  | Starting Neo4j.
neo4j_1  | 2020-07-22 12:03:10.199+0000 INFO  ======== Neo4j 4.0.7 ========
neo4j_1  | 2020-07-22 12:03:10.205+0000 INFO  Starting...
neo4j_1  | 2020-07-22 12:03:18.717+0000 INFO  Called db.clearQueryCaches(): Query cache already empty.
neo4j_1  | 2020-07-22 12:03:18.787+0000 INFO  Bolt enabled on 0.0.0.0:7687.
neo4j_1  | 2020-07-22 12:03:18.788+0000 INFO  Started.
neo4j_1  | 2020-07-22 12:03:19.516+0000 INFO  Remote interface available at http://localhost:7474/

There are a couple of similar sounding active issues that you may be hitting though:

  • #223 where neo4j hits mysterious unreproducable permission denied errors when starting and it seems related to apoc somehow.
  • #266 where uid/gid changes between 4.0.6 and 4.0.7 caused permission errors on a named volume (doesn't happen on file mounts for some reason). Although you say you aren't mounting anything, the messages Warning: Folder mounted to "/data/dbms" is not writable from inside container. Changing folder owner to neo4j. and Caused by: java.io.FileNotFoundException: /logs/debug.log would suggest that there are /data and /logs folder being mounted.

So, my questions:

  1. are you SURE you're not mounting any volume/s, and if you are, are they named volumes or file mounts?
  2. what if you try starting your container without setting NEO4JLABS_PLUGINS?
j616 commented

Hello, thanks for your reply. I've managed to solve my issue by pruning containers. But for completeness:-

  1. Yes, I'm sure I'm not mounting any volumes/files. Or, to be specific, I'm not specifying any volume/file mounts in my docker-compose.yml file or in the commands I'm using to start the containers.
  2. I reduced my docker-compose file to the following and still had the issue
version: '3.8'

services:
  neo4j:
    image: neo4j:4.0
    environment:
      - NEO4J_AUTH=foo/bar

The specific commands I was using to start the container:-

docker-compose -f ./docker-tests/docker-compose.yml pull
docker-compose -f ./docker-tests/docker-compose.yml up

The pull being there to make sure I get the latest version, even if I already have on that satisfies the requirements available on my machine.

If you have any thoughts on why the prune would be needed in this case, I'd be grateful. But appreciate that "go and google it" is a legitimate reply 😅

Thanks for your help.

That is weird. Maybe one of the docker layers got corrupted/corrupted itself somehow?
I would have thought that if there are no shared mounts, then two running containers would be isolated and not break each other, but I don't know the specifics of how docker caching works.
Maybe I've been making an assumption that containerisation (and linux namespacing) automatically gives you compartmentalistaion for free.

j616 commented

I didn't have any other instances running so I don't think it'd be that. If anything, it'd be a between launches issue rather than a between running instances issue.

It could be that one of the Dockerfiles up the stack isn't invalidating the layer cache correctly? So a new version isn't being pulled because docker doesn't think there's been any changes at that layer? But purging will force the whole stack to be pulled. I know that can cause issues when building a container locally. I'm not sure if it can also cause an issue once deployed to the repositories.

Same happened to me, pruning solved the issue