DmitryKey/luke

Make unit tests stable

mocobeta opened this issue · 4 comments

Some of the unit tests are still unstable.

I posted a question about the usage of LuceneTestCase to the Lucene java-user mailing list.
Here is the thread archive: http://mail-archives.apache.org/mod_mbox/lucene-java-user/201808.mbox/browser

Here are the random seeds (i.e. test parameter combinations) that produce assertion errors:

mvn test -Dtest=CommitsImplTest -Dtests.method=testGetSegmentAttributes -Dtests.seed=CA70FBB57042AF34

NOTE: test params are: codec=DummyCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=DUMMY, chunkSize=22543, maxDocsPerChunk=448, blockSize=3), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=DUMMY, chunkSize=22543, blockSize=3)), sim=RandomSimilarity(queryNorm=true): {}, locale=el-CY, timezone=Pacific/Noumea
mvn test -Dtest=CommitsImplTest -Dtests.method=testGetSegmentAttributes -Dtests.seed=B00CDBB7AA2F2EA9

NOTE: test params are: codec=DummyCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=DUMMY, chunkSize=1, maxDocsPerChunk=243, blockSize=229), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=DUMMY, chunkSize=1, blockSize=229)), sim=RandomSimilarity(queryNorm=true): {}, locale=ar-YE, timezone=America/Montreal
mvn test -Dtest=CommitsImplTest -Dtests.method=testGetSegmentAttributes -Dtests.seed=6F3AD2CDDC481719

NOTE: test params are: codec=DummyCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=DUMMY, chunkSize=14493, maxDocsPerChunk=9, blockSize=10), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=DUMMY, chunkSize=14493, blockSize=10)), sim=RandomSimilarity(queryNorm=true): {}, locale=ar-LB, timezone=Asia/Aqtau
mvn test -Dtest=CommitsImplTest -Dtests.method=testGetCommit_generation_notfound -Dtests.seed=CCFC7423CFDCC16A

NOTE: test params are: codec=Lucene70, sim=RandomSimilarity(queryNorm=false): {}, locale=sr-RS, timezone=America/Santiago
mvn test -Dtest=CommitsImplTest -Dtests.method=testGetSegmentAttributes -Dtests.seed=C2F6CB0A0E641E29

NOTE: test params are: codec=FastCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST, chunkSize=3, maxDocsPerChunk=675, blockSize=817), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST, chunkSize=3, blockSize=817)), sim=RandomSimilarity(queryNorm=true): {}, locale=nl, timezone=Asia/Manila
mvn test -Dtest=CommitsImplTest -Dtests.method=testListCommits -Dtests.seed=12EB3E3B19E3AC89

NOTE: test params are: codec=Asserting(Lucene70): {f1=PostingsFormat(name=Direct)}, docValues:{}, maxPointsInLeafNode=359, maxMBSortInHeap=5.195673810400375, sim=RandomSimilarity(queryNorm=false): {}, locale=tr, timezone=Indian/Cocos
mvn test -Dtest=CommitsImplTest -Dtests.method=testGetSegmentAttributes -Dtests.seed=A6ABF40276C979CF

NOTE: test params are: codec=DummyCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=DUMMY, chunkSize=16938, maxDocsPerChunk=320, blockSize=258), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=DUMMY, chunkSize=16938, blockSize=258)), sim=RandomSimilarity(queryNorm=false): {}, locale=es-PY, timezone=Asia/Jakarta
mvn test -Dtestcase=CommitsImplTest -Dtests.method=testGetSegmentAttributes -Dtests.seed=FB9BFAF87DBD805D

NOTE: test params are: codec=HighCompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=HIGH_COMPRESSION, chunkSize=6289, maxDocsPerChunk=1, blockSize=576), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=HIGH_COMPRESSION, chunkSize=6289, blockSize=576)), sim=RandomSimilarity(queryNorm=true): {}, locale=de, timezone=Asia/Jerusalem
mvn test -Dtestcase=CommitsImplTest -Dtests.method=testGetSegmentAttributes -Dtests.seed=4D0A3252A6F3F2F3

NOTE: test params are: codec=FastCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST, chunkSize=16755, maxDocsPerChunk=249, blockSize=1), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST, chunkSize=16755, blockSize=1)), sim=RandomSimilarity(queryNorm=false): {}, locale=id-ID, timezone=Europe/Ljubljana
mvn test -Dtestcase=CommitsImplTest -Dtests.method=testGetSegmentAttributes -Dtests.seed=90B0C1FA836B2017

NOTE: test params are: codec=FastDecompressionCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST_DECOMPRESSION, chunkSize=1, maxDocsPerChunk=499, blockSize=9), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST_DECOMPRESSION, chunkSize=1, blockSize=9)), sim=RandomSimilarity(queryNorm=true): {}, locale=en-IE, timezone=Canada/Yukon

@msokolov I think you are an expert in this area :), so I would just like to share the results above with you. We'd appreciate it if you could share your intuitions or suggestions. I'm in no rush to resolve this, so please don't feel obliged to respond.

I found the cause of the testGetSegmentAttributes() failures: subclasses of CompressingCodec do not store any segment attributes.
So can we exclude all subclasses of CompressingCodec from our tests (with the @LuceneTestCase.SuppressCodecs annotation)? They are part of the test-framework, but its documentation does not mention segment attributes.
http://lucene.apache.org/core/7_5_0/test-framework/org/apache/lucene/codecs/compressing/CompressingCodec.html
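
A minimal sketch of what that suppression could look like, assuming CommitsImplTest extends LuceneTestCase (not confirmed in this thread). The codec names are copied from the "NOTE: test params" lines above; whether they cover every CompressingCodec subclass is an assumption.

```java
import org.apache.lucene.util.LuceneTestCase;
import org.apache.lucene.util.LuceneTestCase.SuppressCodecs;

// Sketch only: exclude the compressing test codecs seen in the failing runs
// from random codec selection for this test class.
@SuppressCodecs({
    "DummyCompressingStoredFields",
    "FastCompressingStoredFields",
    "FastDecompressionCompressingStoredFields",
    "HighCompressionCompressingStoredFields"
})
public class CommitsImplTest extends LuceneTestCase {
  // Existing test methods (testGetSegmentAttributes, testListCommits, ...)
  // would stay unchanged.
}
```

This would only address the testGetSegmentAttributes failures. The testGetCommit_generation_notfound and testListCommits seeds above ran with Lucene70 and Asserting(Lucene70), so they would need to be looked at separately.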

Nice work!