opensanctions/yente

elasticsearch.helpers.BulkIndexError: 37 document(s) failed to index.

jbothma opened this issue · 3 comments

Seen in logs after following docker-compose instrutions at https://www.opensanctions.org/docs/yente/deploy/

yente-tut-app-1  | 2023-06-27T10:16:35.283539Z [error    ] Indexing error: BulkIndexError('37 document(s) failed to index.') [yente.search.indexer] dataset=default entities_url=https://data.opensanctions.org/datasets/20230627/default/entities.ftm.json index=yente-entities-default-00320230627063008
yente-tut-app-1  | Traceback (most recent call last):
yente-tut-app-1  |   File "/app/yente/search/indexer.py", line 115, in index_entities
yente-tut-app-1  |     await async_bulk(es, docs, yield_ok=False, stats_only=True, chunk_size=1000)
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 337, in async_bulk
yente-tut-app-1  |     async for ok, item in async_streaming_bulk(
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 252, in async_streaming_bulk
yente-tut-app-1  |     async for data, (ok, info) in azip(  # type: ignore
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 156, in azip
yente-tut-app-1  |     yield tuple([await x.__anext__() for x in aiters])
yente-tut-app-1  |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 156, in <listcomp>
yente-tut-app-1  |     yield tuple([await x.__anext__() for x in aiters])
yente-tut-app-1  |                  ^^^^^^^^^^^^^^^^^^^
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 127, in _process_bulk_chunk
yente-tut-app-1  |     for item in gen:
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/helpers/actions.py", line 274, in _process_bulk_chunk_success
yente-tut-app-1  |     raise BulkIndexError(f"{len(errors)} document(s) failed to index.", errors)
yente-tut-app-1  | elasticsearch.helpers.BulkIndexError: 37 document(s) failed to index.

matching still works fine

two indexes were loading at the same time, then I think the first completed and the second failed?

more logs from just before

yente-tut-app-1  | 2023-06-27T10:16:28.123512Z [info     ] Index: 335000 entities...      [yente.search.indexer] index=yente-entities-default-00320230627063008
yente-tut-app-1  | 2023-06-27T10:16:28.315509Z [info     ] Index: 2157000 entities...     [yente.search.indexer] index=yente-entities-default-00320230626183008
yente-tut-app-1  | 2023-06-27T10:16:29.612188Z [info     ] Index: 2158000 entities...     [yente.search.indexer] index=yente-entities-default-00320230626183008
yente-tut-app-1  | 2023-06-27T10:16:30.304867Z [info     ] Index: 336000 entities...      [yente.search.indexer] index=yente-entities-default-00320230627063008
yente-tut-app-1  | 2023-06-27T10:16:31.307114Z [info     ] Index: 2159000 entities...     [yente.search.indexer] index=yente-entities-default-00320230626183008
yente-tut-app-1  | 2023-06-27T10:16:32.216707Z [info     ] Index: 337000 entities...      [yente.search.indexer] index=yente-entities-default-00320230627063008
yente-tut-app-1  | 2023-06-27T10:16:32.329842Z [info     ] Index is now aliased to: yente-entities [yente.search.indexer] index=yente-entities-default-00320230626183008
yente-tut-app-1  | 2023-06-27T10:16:32.339328Z [info     ] Delete other index             [yente.search.indexer] index=yente-entities-default-00320230627063008
index            | {"@timestamp":"2023-06-27T10:16:32.341Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/S79Og3JaRmqF7Eh0Rzu-AQ] deleting index", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataDeleteIndexService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
yente-tut-app-1  | 2023-06-27T10:16:32.476391Z [info     ] Index update complete.         [yente.search.indexer] changed=True
yente-tut-app-1  | 2023-06-27T10:16:32.476725Z [info     ] Closing elasticsearch client   [yente.search.base] 
yente-tut-app-1  | 2023-06-27T10:16:34.133746Z [info     ] Index: 338000 entities...      [yente.search.indexer] index=yente-entities-default-00320230627063008
index            | {"@timestamp":"2023-06-27T10:16:34.184Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008] creating index, cause [auto(bulk api)], templates [], shards [1]/[1]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataCreateIndexService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.528Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] create_mapping", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.579Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.626Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.688Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.730Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.778Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.818Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.863Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:34.903Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:35.069Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:35.140Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
index            | {"@timestamp":"2023-06-27T10:16:35.184Z", "log.level": "INFO", "message":"[yente-entities-default-00320230627063008/PhUtJZRXS_W4tl_kC81NxA] update_mapping [_doc]", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[index][masterService#updateTask][T#1]","log.logger":"org.elasticsearch.cluster.metadata.MetadataMappingService","elasticsearch.cluster.uuid":"hOqHptlAT-ezXFwQK84TfQ","elasticsearch.node.id":"ZrtQG-U2S1miZBgej2ZZYQ","elasticsearch.node.name":"index","elasticsearch.cluster.name":"opensanctions-index"}
yente-tut-app-1  | 2023-06-27T10:16:35.283539Z [error    ] Indexing error: BulkIndexError('37 document(s) failed to index.') [yente.search.indexer] dataset=default entities_url=https://data.opensanctions.org/datasets/20230627/default/entities.ftm.json index=yente-entities-default-00320230627063008
yente-tut-app-1  | Traceback (most recent call last):
yente-tut-app-1  |   File "/app/yente/search/indexer.py", line 115, in index_entities
yente-tut-app-1  |     await async_bulk(es, docs, yield_ok=False, stats_only=True, chunk_size=1000)
yente-tut-app-1  |   File "/venv/lib/python3.11/site-packages/elasticsearch/_async/helpers.py", line 337, in async_bulk
yente-tut-app-1  |     async for ok, item in async_streaming_bulk(

will keep it running and see if the next update indexes or also has an issue

it later tried again and successfully indexed the new dataset.

pudo commented

Nice catch on this :)