GNS-Science/nshm-toshi-api

Feature: Search Reindex


We have found that some content was not indexed, and that this went undetected (a new issue is required for the detection gap). This means Search is not working as intended.

So we want a script that can be run to check and/or re-index the entire Toshi collection.

Done when:

  • #207
  • Toshi dev_ops can run a script, requiring only AWS credentials with permission to replace the index
  • TEST mode builds a new index, without replacing the current one
  • report useful information about progress, any errors, and final statistics
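
As a rough illustration of the last two criteria, here is a minimal Python sketch of the control flow. It is not the Toshi implementation: `run_reindex`, `ReindexStats`, `index_item` and `promote` are all hypothetical names, the items are assumed to arrive as `(id, document)` pairs from whatever scans the tables, and the choice to skip promotion when any item fails is an assumption rather than a stated requirement.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, List, Tuple


@dataclass
class ReindexStats:
    """Counters reported at the end of a run (hypothetical structure)."""
    scanned: int = 0
    indexed: int = 0
    failed: int = 0
    errors: List[Tuple[str, str]] = field(default_factory=list)


def run_reindex(
    items: Iterable[Tuple[str, Any]],
    index_item: Callable[[str, Any], None],
    promote: Callable[[], None],
    test_mode: bool = True,
    report_every: int = 100_000,
    log: Callable[[str], None] = print,
) -> ReindexStats:
    """Build a new index from `items`, then optionally promote it.

    `index_item` writes one document into the NEW index and `promote`
    swaps the new index in for the current one. Both are placeholders
    for whatever the real search backend provides.
    """
    stats = ReindexStats()
    for item_id, document in items:
        stats.scanned += 1
        try:
            index_item(item_id, document)
            stats.indexed += 1
        except Exception as exc:
            # Record the failure and keep going, so one bad item
            # does not abort a multi-million item run.
            stats.failed += 1
            stats.errors.append((item_id, repr(exc)))
            log(f"ERROR {item_id}: {exc!r}")
        if stats.scanned % report_every == 0:
            log(f"progress: scanned={stats.scanned} failed={stats.failed}")

    log(
        f"done: scanned={stats.scanned} "
        f"indexed={stats.indexed} failed={stats.failed}"
    )
    if test_mode:
        log("TEST mode: new index built, current index left in place")
    elif stats.failed:
        # Assumption: a partially built index should not replace the live one.
        log("errors found: current index left in place")
    else:
        promote()
        log("new index promoted")
    return stats
```

Calling `run_reindex(items, index_item, promote, test_mode=True)` exercises the whole scan and reports statistics without ever calling `promote`, which is the behaviour the TEST mode criterion asks for.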

Statistics at 2024/04/12 (from AWS Console)

ToshiIdentity-PROD

Item count: 3
Table size: 236 bytes
Average item size: 78.67 bytes

ToshiFileObject-PROD

Item count: 6,798,698
Table size: 2.3 gigabytes
Average item size: 343.55 bytes

ToshiThingObject-PROD

Item count: 6,826,637
Table size: 16 gigabytes
Average item size: 2,339.86 bytes

ToshiTableObject-PROD

Item count: 2,932
Table size: 43.1 megabytes
Average item size: 14,696.82 bytes

S3 stats

Item count: 7,464,421 (includes pre-DynamoDB objects)
Bucket size: 7.7 TB