internetarchive/openlibrary

Reindex documents into a new Solr on OJF

cdrini opened this issue · 8 comments

Subtask of #1067

  • Create a Solr environment on server.openjournal.foundation
  • Index latest openlibrary dump into OJF Solr
  • [-] Replay missing edits from Infobase onto OJF Solr until up-to-date (partially done)
    • Skipping: should be done once on ol-solr0
  • Test OJF Solr by linking dev.openlibrary.org to OJF Solr
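One simple sanity check for the indexing and testing steps above is to ask each Solr instance how many documents of a given type it holds, without fetching any rows. This is a minimal sketch. It assumes a standard Solr `/select` handler at a hypothetical base URL and a `type` field with values such as `work` or `author`. Both are assumptions, not confirmed by this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_query_url(base_url: str, doc_type: str) -> str:
    """Build a Solr /select URL that returns only the hit count for one doc type."""
    # rows=0: we only want response.numFound, not the documents themselves.
    # Assumption: documents carry a `type` field (e.g. "work", "author").
    params = urlencode({"q": f"type:{doc_type}", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/select?{params}"


def parse_num_found(body: str) -> int:
    """Extract response.numFound from a Solr JSON response body."""
    return int(json.loads(body)["response"]["numFound"])


def count_docs(base_url: str, doc_type: str) -> int:
    """Fetch the document count for one type from a live Solr instance."""
    with urlopen(count_query_url(base_url, doc_type)) as resp:
        return parse_num_found(resp.read().decode("utf-8"))
```

For example, `count_docs("http://localhost:8983/solr/openlibrary", "work")` would query a hypothetical local core. Running the same call against the old and the new Solr gives "before" and "after" figures like the ones reported below.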

Can I work on this issue?

@cdrini could guide you through the subtasks if you need any help :)

So, @cdrini, can I collaborate with you?

Hey @viragumathe5! Unfortunately, this task is already underway (adding the WIP label!). There's already a pretty long backlog of related changes enqueued (#1843, #2246), so I can't think of a way to add you to this task :(

What kinds of things are you interested in? I'm sure we can find a good issue for you to work on :)

No problem at all; I was just asking in case collaboration was needed.

I would like to contribute in any way: documentation, the codebase, etc.
I'm not able to do design work, though :)
I feel lucky to work for the Internet Archive.
Thank you!

If you'd like a small task, these would be good:

  • #2330 (just have to edit a string)
  • #2239 (edit our issue/PR templates)
  • #2279 (small HTML change)

If you want something larger, this would be good:

Reindex complete; here are the numbers (using the 2020-01-31 dump, and querying the 02-14 Solr for the "before" values):

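| Type | # in Postgres | # in old Solr | # in new Solr | psql diff (new Solr - Postgres) | solr diff (new Solr - old Solr) |
|---|---:|---:|---:|---:|---:|
| Works | 18891263 | 16934104 | 18891032 | -231 | 1956928 |
| Orphans | 3117594 | 2093485 | 3115125 | -2469 | 1021640 |
| Authors | 7247819 | 6982935 | 7247631 | -188 | 264696 |
| Subjects | 0 | 1514064 | 1514068 | 1514068 | 4 |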
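Both diff columns can be derived from the three counts. "psql diff" is the new Solr count minus the Postgres count, and "solr diff" is the new Solr count minus the old Solr count. Every row of the table above is consistent with this. The following sketch recomputes them from the 2020-01-31 figures. The dictionary layout and the `diffs` helper are illustrative choices, not part of the Open Library codebase.

```python
# Counts from the 2020-01-31 reindex table: (postgres, old solr, new solr)
COUNTS = {
    "Works": (18891263, 16934104, 18891032),
    "Orphans": (3117594, 2093485, 3115125),
    "Authors": (7247819, 6982935, 7247631),
    "Subjects": (0, 1514064, 1514068),
}


def diffs(postgres: int, old_solr: int, new_solr: int) -> tuple[int, int]:
    """Return (psql diff, solr diff) for one row of the table."""
    # psql diff: how far the new Solr is from the Postgres counts
    # solr diff: how many more documents the new Solr has than the old one
    return new_solr - postgres, new_solr - old_solr


for name, row in COUNTS.items():
    psql_diff, solr_diff = diffs(*row)
    print(f"{name:<9} {psql_diff:>8} {solr_diff:>9}")
```

Because the Postgres column is 0 for Subjects, its "psql diff" simply equals the new Solr count.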

Reindex complete; here are the numbers (using the 2020-02-29 dump, and querying the 03-03 Solr for the "before" values):

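| Type | # in Postgres | # in old Solr | # in new Solr | psql diff (new Solr - Postgres) | solr diff (new Solr - old Solr) |
|---|---:|---:|---:|---:|---:|
| Works | 18895253 | 16937045 | 18895021 | -232 | 1957976 |
| Orphans | 3116995 | 2093378 | 3114527 | -2468 | 1021149 |
| Authors | 7248307 | 6983408 | 7248115 | -192 | 264707 |
| Subjects | 0 | 1514064 | 1514068 | 1514068 | 4 |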

-> ~3.2M records that were missing from the old Solr will be made visible 🎉 Next step: #1067
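The "3.2M" figure appears to correspond to the sum of the "solr diff" column in the 2020-02-29 table, which counts the documents the new Solr has that the old one did not. Here is a quick check, with the per-type values copied from that table:

```python
# "solr diff" values from the 2020-02-29 reindex table (new solr - old solr)
SOLR_DIFF = {"Works": 1957976, "Orphans": 1021149, "Authors": 264707, "Subjects": 4}

# Total number of documents present in the new Solr but not the old one
newly_visible = sum(SOLR_DIFF.values())
print(f"{newly_visible:,} records (~{newly_visible / 1e6:.1f}M)")
# prints "3,243,836 records (~3.2M)"
```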