ChangeSet applied twice
matesebi opened this issue · 2 comments
Last week I ran into the problem of DATABASECHANGELOGLOCK
getting stuck, and a service failing to restart, because Liquibase times out waiting for the lock. The service is running on Kubernetes with multiple replicas, a replica was restarted by the system, which is normal in Kubernetes, and for some reason it terminated in the middle of checking Liquibase change log, leaving the lock in the DB, so after that no new replicas could start.
I started to look for solutions to this, and this library seems like the perfect solution for the problem. I created a test scenario to reproduce the error I ran into. At first adding the dependency seemed to resolve the issue as expected, but on the second run a change set I added for testing that runs slow on purpose was applied twice by two different replicas.
Version info:
Spring Boot: 2.5.8
Liquibase: 4.3.5
liquibase-sessionlock: 1.6.3
mysql-connector-java: 8.0.27
MySQL: 5.7.42
The test scenario:
- Scale the older version of the service without the slow change set to
3
replicas, withRecreate
strategy, which means, all 3 replicas will start at the same time, and wait for the 3 replicas to start successfully - Add the slow change set and deploy a new version, so 1 replica will start to apply the change set, and the other two will wait for the lock
- Terminate the replica, that started the migration, and see what happens
Whithout liquibase-sessionlock the lock in DATABASECHANGELOGLOCK
got stuck, so no replicas could start successfully, so I could reproduce the problem.
I deleted the slow change set from the DATABASECHANGELOG
table, added the liquibase-sessionlock dependency, and deployed a new version.
After adding liquibase-sessionlock when the replica that started the migration was terminated, the other two replicas tried to acquire the session lock, both saw that it was successful, both read the DATABASECHANGELOG
table, then one of them returned to wait for the lock, the other applied the change set, and released the lock, the first one acquired the lock again, and applied the same change set, then released the lock, the third replica that started after the termination, acquired the lock, did not apply any changes and then released the lock.
The on purpose slow change set:
select sleep(120);
The logs after adding liquibase-sessionlock:
The first replica, that was terminated:
INFO 2023-08-29T11:04:51.853Z HikariPool-1 - Starting...
INFO 2023-08-29T11:04:52.315Z HikariPool-1 - Start completed.
INFO 2023-08-29T11:04:52.685Z Successfully acquired change log lock
INFO 2023-08-29T11:04:55.101Z Reading from rfid.DATABASECHANGELOG
INFO 2023-08-29T11:04:55.333Z Successfully released change log lock
INFO 2023-08-29T11:04:55.359Z Successfully acquired change log lock
INFO 2023-08-29T11:04:55.360552323Z Skipping auto-registration
WARNING 2023-08-29T11:04:55.361Z Skipping auto-registration
WARNING 2023-08-29T11:06:02.194Z HikariPool-1 - Connection com.mysql.cj.jdbc.ConnectionImpl@45e22def marked as broken because of SQLSTATE(08S01), ErrorCode(0) com.mysql.cj.jdbc.exceptions.CommunicationsException: Communications link failure The last packet successfully received from the server was 66,750 milliseconds ago. The last packet sent successfully to the server was 66,769 milliseconds ago. at com.mysql.cj.jdbc.exceptions.SQLError.createCommunicationsException(SQLError.java:174) at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:64) at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:768) at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:653) at com.zaxxer.hikari.pool.ProxyStatement.execute(ProxyStatement.java:94) at com.zaxxer.hikari.pool.HikariProxyStatement.execute(HikariProxyStatement.java) at liquibase.executor.jvm.JdbcExecutor$ExecuteStatementCallback.doInStatement(JdbcExecutor.java:389) at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:82) at liquibase.executor.jvm.JdbcExecutor.execute(JdbcExecutor.java:150) at liquibase.database.AbstractJdbcDatabase.execute(AbstractJdbcDatabase.java:1278) at liquibase.database.AbstractJdbcDatabase.executeStatements(AbstractJdbcDatabase.java:1260) at liquibase.changelog.ChangeSet.execute(ChangeSet.java:637) at liquibase.changelog.visitor.UpdateVisitor.visit(UpdateVisitor.java:49) at liquibase.changelog.ChangeLogIterator$2.lambda$null$0(ChangeLogIterator.java:111) at liquibase.Scope.lambda$child$0(Scope.java:160) at liquibase.Scope.child(Scope.java:169) at liquibase.Scope.child(Scope.java:159) at liquibase.Scope.child(Scope.java:138) at liquibase.changelog.ChangeLogIterator$2.lambda$run$1(ChangeLogIterator.java:110) at liquibase.Scope.lambda$child$0(Scope.java:160) at liquibase.Scope.child(Scope.java:169) at liquibase.Scope.child(Scope.java:159) at liquibase.Scope.child(Scope.java:138) at liquibase.Scope.child(Scope.java:222) at liquibase.changelog.ChangeLogIterator$2.run(ChangeLogIterator.java:94) at liquibase.Scope.lambda$child$0(Scope.java:160) at liquibase.Scope.child(Scope.java:169) at liquibase.Scope.child(Scope.java:159) at liquibase.Scope.child(Scope.java:138) at liquibase.Scope.child(Scope.java:222) at liquibase.Scope.child(Scope.java:226) at liquibase.changelog.ChangeLogIterator.run(ChangeLogIterator.java:66) at liquibase.Liquibase.lambda$null$0(Liquibase.java:272) at liquibase.Scope.lambda$child$0(Scope.jav…
WARNING 2023-08-29T11:06:02.260Z Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'liquibase' defined in class path resource [org/springframework/boot/autoconfigure/liquibase/LiquibaseAutoConfiguration$LiquibaseConfiguration.class]: Invocation of init method failed; nested exception is liquibase.exception.DatabaseException: java.sql.SQLException: Connection is closed
INFO 2023-08-29T11:06:02.263Z HikariPool-1 - Shutdown initiated...
INFO 2023-08-29T11:06:02.530Z HikariPool-1 - Shutdown completed.
WARNING 2023-08-29T11:06:02.681Z Invocation of destroy method failed on bean with name 'reactorServerResourceFactory': java.lang.NullPointerException: Cannot invoke "org.springframework.cloud.sleuth.CurrentTraceContext.context()" because the return value of "org.springframework.cloud.sleuth.internal.LazyBean.get()" is null
The second replica, that took over first, and applied the change set:
INFO 2023-08-29T11:05:15.978Z HikariPool-1 - Starting...
INFO 2023-08-29T11:05:16.893Z HikariPool-1 - Start completed.
INFO 2023-08-29T11:05:22.861Z Waiting for changelog lock....
INFO 2023-08-29T11:05:32.878Z Waiting for changelog lock....
INFO 2023-08-29T11:05:42.898Z Waiting for changelog lock....
INFO 2023-08-29T11:05:52.915Z Waiting for changelog lock....
INFO 2023-08-29T11:06:02.931Z Waiting for changelog lock....
INFO 2023-08-29T11:06:07.947Z Successfully acquired change log lock
INFO 2023-08-29T11:06:10.489Z Reading from rfid.DATABASECHANGELOG
INFO 2023-08-29T11:06:10.705Z Successfully released change log lock
INFO 2023-08-29T11:06:13.385Z Successfully acquired change log lock
WARNING 2023-08-29T11:06:13.385Z Skipping auto-registration
INFO 2023-08-29T11:06:13.392805017Z Skipping auto-registration
INFO 2023-08-29T11:08:13.512Z SQL in file db/V1_31_DELETEME_test_liquibase_session_lock.sql executed
INFO 2023-08-29T11:08:13.528Z ChangeSet db/liquibase-changelog.xml::31::mate.sebestyen ran successfully in 120096ms
INFO 2023-08-29T11:08:13.657Z Successfully released change log lock
INFO 2023-08-29T11:08:14.127Z HHH000204: Processing PersistenceUnitInfo [name: default]
INFO 2023-08-29T11:08:14.347Z HHH000412: Hibernate ORM core version 5.4.33
INFO 2023-08-29T11:08:14.824Z HCANN000001: Hibernate Commons Annotations {5.1.2.Final}
INFO 2023-08-29T11:08:15.155Z HHH000400: Using dialect: org.hibernate.dialect.MySQL5InnoDBDialect
INFO 2023-08-29T11:08:18.589Z HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]
INFO 2023-08-29T11:08:18.615Z Initialized JPA EntityManagerFactory for persistence unit 'default'
The third replica, that applied the change set again:
INFO 2023-08-29T11:05:17.652Z HikariPool-1 - Starting...
INFO 2023-08-29T11:05:18.644Z HikariPool-1 - Start completed.
INFO 2023-08-29T11:05:24.182Z Waiting for changelog lock....
INFO 2023-08-29T11:05:34.202Z Waiting for changelog lock....
INFO 2023-08-29T11:05:44.219Z Waiting for changelog lock....
INFO 2023-08-29T11:05:54.236Z Waiting for changelog lock....
INFO 2023-08-29T11:06:04.255Z Waiting for changelog lock....
INFO 2023-08-29T11:06:10.703Z Successfully acquired change log lock
INFO 2023-08-29T11:06:13.105Z Reading from rfid.DATABASECHANGELOG
INFO 2023-08-29T11:06:13.382Z Successfully released change log lock
INFO 2023-08-29T11:06:18.430Z Waiting for changelog lock....
INFO 2023-08-29T11:06:28.446Z Waiting for changelog lock....
INFO 2023-08-29T11:06:38.465Z Waiting for changelog lock....
INFO 2023-08-29T11:06:48.480Z Waiting for changelog lock....
INFO 2023-08-29T11:06:58.495Z Waiting for changelog lock....
INFO 2023-08-29T11:07:08.511Z Waiting for changelog lock....
INFO 2023-08-29T11:07:18.527Z Waiting for changelog lock....
INFO 2023-08-29T11:07:28.547Z Waiting for changelog lock....
INFO 2023-08-29T11:07:38.563Z Waiting for changelog lock....
INFO 2023-08-29T11:07:48.578Z Waiting for changelog lock....
INFO 2023-08-29T11:07:58.596Z Waiting for changelog lock....
INFO 2023-08-29T11:08:08.611Z Waiting for changelog lock....
INFO 2023-08-29T11:08:13.663Z Successfully acquired change log lock
INFO 2023-08-29T11:08:13.666923296Z Skipping auto-registration
WARNING 2023-08-29T11:08:13.669Z Skipping auto-registration
INFO 2023-08-29T11:10:13.809Z SQL in file db/V1_31_DELETEME_test_liquibase_session_lock.sql executed
INFO 2023-08-29T11:10:13.824Z ChangeSet db/liquibase-changelog.xml::31::mate.sebestyen ran successfully in 120115ms
INFO 2023-08-29T11:10:13.978Z Successfully released change log lock
INFO 2023-08-29T11:10:14.399Z HHH000204: Processing PersistenceUnitInfo [name: default]
INFO 2023-08-29T11:10:14.584Z HHH000412: Hibernate ORM core version 5.4.33
INFO 2023-08-29T11:10:14.999Z HCANN000001: Hibernate Commons Annotations {5.1.2.Final}
INFO 2023-08-29T11:10:15.320Z HHH000400: Using dialect: org.hibernate.dialect.MySQL5InnoDBDialect
INFO 2023-08-29T11:10:18.753Z HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]
INFO 2023-08-29T11:10:18.776Z Initialized JPA EntityManagerFactory for persistence unit 'default'
The last replica, that was created, because of the termination:
INFO 2023-08-29T11:06:33.037Z HikariPool-1 - Starting...
INFO 2023-08-29T11:06:33.563Z HikariPool-1 - Start completed.
INFO 2023-08-29T11:06:39.004Z Waiting for changelog lock....
INFO 2023-08-29T11:06:49.019Z Waiting for changelog lock....
INFO 2023-08-29T11:06:59.039Z Waiting for changelog lock....
INFO 2023-08-29T11:07:09.057Z Waiting for changelog lock....
INFO 2023-08-29T11:07:19.074Z Waiting for changelog lock....
INFO 2023-08-29T11:07:29.092Z Waiting for changelog lock....
INFO 2023-08-29T11:07:39.108Z Waiting for changelog lock....
INFO 2023-08-29T11:07:49.126Z Waiting for changelog lock....
INFO 2023-08-29T11:07:59.146Z Waiting for changelog lock....
INFO 2023-08-29T11:08:09.165Z Waiting for changelog lock....
INFO 2023-08-29T11:08:19.185Z Waiting for changelog lock....
INFO 2023-08-29T11:08:29.203Z Waiting for changelog lock....
INFO 2023-08-29T11:08:39.220Z Waiting for changelog lock....
INFO 2023-08-29T11:08:49.238Z Waiting for changelog lock....
INFO 2023-08-29T11:08:59.254Z Waiting for changelog lock....
INFO 2023-08-29T11:09:09.271Z Waiting for changelog lock....
INFO 2023-08-29T11:09:19.287Z Waiting for changelog lock....
INFO 2023-08-29T11:09:29.304Z Waiting for changelog lock....
INFO 2023-08-29T11:09:39.324Z Waiting for changelog lock....
INFO 2023-08-29T11:09:49.340Z Waiting for changelog lock....
INFO 2023-08-29T11:09:59.357Z Waiting for changelog lock....
INFO 2023-08-29T11:10:09.373Z Waiting for changelog lock....
INFO 2023-08-29T11:10:14.390Z Successfully acquired change log lock
INFO 2023-08-29T11:10:16.502Z Reading from rfid.DATABASECHANGELOG
INFO 2023-08-29T11:10:16.740Z Successfully released change log lock
INFO 2023-08-29T11:10:16.760Z Successfully acquired change log lock
INFO 2023-08-29T11:10:16.763864829Z Skipping auto-registration
WARNING 2023-08-29T11:10:16.766Z Skipping auto-registration
INFO 2023-08-29T11:10:16.803Z Successfully released change log lock
INFO 2023-08-29T11:10:17.221Z HHH000204: Processing PersistenceUnitInfo [name: default]
INFO 2023-08-29T11:10:17.506Z HHH000412: Hibernate ORM core version 5.4.33
INFO 2023-08-29T11:10:18.172Z HCANN000001: Hibernate Commons Annotations {5.1.2.Final}
INFO 2023-08-29T11:10:18.556Z HHH000400: Using dialect: org.hibernate.dialect.MySQL5InnoDBDialect
INFO 2023-08-29T11:10:21.790Z HHH000490: Using JtaPlatform implementation: [org.hibernate.engine.transaction.jta.platform.internal.NoJtaPlatform]
INFO 2023-08-29T11:10:21.822Z Initialized JPA EntityManagerFactory for persistence unit 'default'
Conclusion
I believe this library could be the ideal solution for the initial problem, but there seems to be a concurrency issue, and the lock can somehow be acquired by two replicas at the same time, and both can read the DATABASECHANGELOG
table, to determine which change sets to apply.
Any help is appreciated, I'll try to find the issue, and submit a PR, if I'm successful.
This seems like a problem in Liquibase itself (not the lock implementation). You'll notice these lines in pod #3:
INFO 2023-08-29T11:06:10.703Z Successfully acquired change log lock
INFO 2023-08-29T11:06:13.105Z Reading from rfid.DATABASECHANGELOG
INFO 2023-08-29T11:06:13.382Z Successfully released change log lock
At this time, the second pod has not finished, so the DATABASECHANGELOG will not have the changeset marked as done.
Then later, after it acquired the lock again, it will execute the changesets which it think still need execution, but it does not reread from the DATABASECHANGELOG table.
Either Liquibase should not release the lock between the read and execute, or it should reread from the DATABASECHANGELOG table. Are you able to upgrade to the latest version of Liquibase, to see if the problem is still present in the latest version?
It would be nice to include an integration test for your scenario, but I also realize it might be hard to implement.
Thank you for your answer, after some more testing, debugging and digging through Liquibase release notes, I think I found why the lock is released after readnig the rfid.DATABASECHANGELOG
table: Liquibase Issue.
For some reason without liquibase-sessionlock
the change set was never applied twice in my tests, but with it it was always applied twice.
The releasing of the lock was fixed in Liquibase v4.4.1, so I upgraded our Spring Boot version to 2.7.15
which uses Liquibase version 4.9.1
, and ran my tests again, the lock was not released between reading the rfid.DATABASECHANGELOG
table and applying the change set, with or without liquibase-sessionlock
.
It was just a coincidence, that after adding the liquibase-sessionlock
dependency I started to notice the change set applying twice.
Thank you again for your answer, I'll close this issue.