MariaDB/mariadb-docker

Clarification about import via /docker-entrypoint-initdb.d

Closed this issue · 6 comments

I maintain the Alpine Linux version of MariaDB https://github.com/wodby/mariadb and use the entrypoint script from this repository for init. Recently I've identified the issue when MariaDB for some reason does not rebuild indexes and acts differently when a dump imported via /docker-entrypoint-initdb.d bind mount as opposed to mysql db < file.sql import after the full init/start. Described in details at wodby/docker4drupal#341. I was also able to reproduce this issue with vanilla WordPress.

So I'm not sure whether the import after init and before start is a safe method of importing. But maybe you have some explanation for this behavior.

If you look at the entrypoint, what you describe is exactly how the import behavior works today:

  1. we start mysqld (normally, with all arguments and flags the user passed and an explicit --socket value and --skip-networking so mysqld doesn't become available outside the container prematurely): https://github.com/docker-library/mariadb/blob/e62e2b5bd3b34da3596bc80b40513e9bded8ec5a/10.3/docker-entrypoint.sh#L97-L99

  2. import *.sql files via mysql foo < bar.sql: https://github.com/docker-library/mariadb/blob/e62e2b5bd3b34da3596bc80b40513e9bded8ec5a/10.3/docker-entrypoint.sh#L170-L179

  3. shut down our temporary mysqld instance: https://github.com/docker-library/mariadb/blob/e62e2b5bd3b34da3596bc80b40513e9bded8ec5a/10.3/docker-entrypoint.sh#L181-L184

  4. fire up the "real" mysqld instance (now with networking appropriately enabled): https://github.com/docker-library/mariadb/blob/e62e2b5bd3b34da3596bc80b40513e9bded8ec5a/10.3/docker-entrypoint.sh#L192

Thank you for the response.

Yes, I know how it works, let me show you how to reproduce the issue with the latest MariaDB 10.2. I'm attaching a vanilla Drupal 7 clean db dump after a standard installation. d7.sql.gz, nothing special in this dump, just a real world example, it may be any db dump that have enough records in a table and the structure to build indexes.

Import via init bind mount

First, import the dump via the init bind mount:

docker run --rm -d -e MYSQL_DATABASE=drupal -e MYSQL_ROOT_PASSWORD=password -v ./d7.sql:/docker-entrypoint-initdb.d/d7.sql mariadb:10.2
docker exec -ti [CID] sh
mysql -uroot -ppassword drupal
show index from system;
# Cardinality column is all 0
# Run select from the table for a few times to trigger index rebuild:
select * from system;
# Run a few times.
show index from system;
# Cardinality is still 0

As you can see indexes weren't built.

Manual post-start import

Mount the db dump as .txt to skip init

docker run --rm -d -e MYSQL_DATABASE=drupal -e MYSQL_ROOT_PASSWORD=password -v ./d7.sql:/docker-entrypoint-initdb.d/d7.txt mariadb:10.2
docker exec -ti [CID] sh
# Manually import dump after MariaDB will initialize and start.
mysql -uroot -ppassword drupal < /docker-entrypoint-initdb.d/d7.txt
mysql -uroot -ppassword drupal

Run SQL queries from the previous example, you'll see that after a few selects (and a few seconds) MariaDB will build indexes and cardinality column won't be 0.

I believe it may have something to do with the full start of MariaDB.

My docker version:

Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        e68fc7a
 Built:             Tue Aug 21 17:21:31 2018
 OS/Arch:           darwin/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       e68fc7a
  Built:            Tue Aug 21 17:29:02 2018
  OS/Arch:          linux/amd64
  Experimental:     true
 Kubernetes:
  Version:          v1.10.3
  StackAPI:         v1beta2

My docker info:

Server Version: 18.06.1-ce
Storage Driver: overlay2
 Backing Filesystem: extfs
 Supports d_type: true
 Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: fec3683
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.9.93-linuxkit-aufs
Operating System: Docker for Mac
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 1.952GiB
Name: linuxkit-025000000001
ID: Q4AW:SBZG:NWA4:TIRY:MLDW:TGVL:MWMH:H26Y:PEET:W2XJ:6TXM:4YEP
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 172
 Goroutines: 187
 System Time: 2018-11-14T05:08:29.0688101Z
 EventsListeners: 2
HTTP Proxy: gateway.docker.internal:3128
HTTPS Proxy: gateway.docker.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

Ok, figured out what's going on here.

The dump you've provided explicitly locks the table and disables indexes while dumping data (which greatly speeds up the data insertion process), after which it then re-enables indexes. Normally, mysqld would stay running after doing so, which gives it time to slowly rebuild those indexes over time.

In our case, we almost immediately stop (gracefully) mysqld to start it back up again, and it seems to forget that it was trying to rebuild indexes and doesn't get back around to it.

If I run ANALYZE TABLE system; (which appears to be the official way to ask MySQL to rebuild indexes; https://stackoverflow.com/a/30051747/433558), the indexes have appropriate Cardinality values again.

So all-in-all, I'm thinking there's really not much we can do here except recommend that you run ANALYZE TABLE after your imports (or adjust them to not disable indexes) to force mysqld to actually build those indexes properly.

For mysqldump, that'd be:

  -K, --disable-keys  '/*!40000 ALTER TABLE tb_name DISABLE KEYS */; and
                      '/*!40000 ALTER TABLE tb_name ENABLE KEYS */; will be put
                      in the output.
                      (Defaults to on; use --skip-disable-keys to disable.)

(But I think adding ANALYZE TABLE lines explicitly is the better solution -- I wish mysqldump had a flag for that.)

Closing, since I don't think there's anything else we can reasonably do here automatically (as far as I know, mysqld doesn't have an alternative "clean shutdown" method that also allows any in-progress reindexing to complete first -- we're already using the only "clean shutdown" method we're aware of).

Thank you for looking at this.

Regarding the --disable-keys option:

'/*!40000 ALTER TABLE tb_name DISABLE KEYS /; and '/!40000 ALTER TABLE tb_name ENABLE KEYS */; will be put in the output. This makes loading the dump file faster because the indexes are created after all rows are inserted. This option is effective only for non-unique indexes of MyISAM tables. Disable with --skip-disable-keys.

So assumming most people use InnoDB, ANALYZE TABLE is the only good option.

from https://mariadb.com/kb/en/library/mysqldump/