desci-labs/nodes

Create script to pull updates from OpenAlex every day

hubsmoke opened this issue · 0 comments

Create a standalone project that contains scripts to pull from the OpenAlex API (filtered by date) and import all the entities into a Postgres database. Ensure the job runs exactly once per day and doesn't miss records updated since the last pull. Keep track of when records were pulled, perhaps by adding an import time to the DB, or by adding an ImportLog table and marking each entity with an ImportLogId.
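One way to avoid missing records is to derive each run's range from the end of the last successful import rather than from "yesterday". The sketch below is only an illustration of that idea: `next_import_window` is a hypothetical helper, and it assumes the last successful window end is read from the ImportLog table and that all timestamps are timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def next_import_window(
    last_window_end: Optional[datetime],
    now: datetime,
    default_lookback: timedelta = timedelta(days=1),
) -> Optional[Tuple[datetime, datetime]]:
    """Return the (start, end) range the next run should pull, or None.

    last_window_end is the end of the last *successful* import (assumed to
    come from the ImportLog table); None means no import has run yet.
    Both datetimes must be timezone-aware.
    """
    # Only cover complete days: the window ends at today's midnight UTC.
    end = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    # Resume where the last successful run stopped, so skipped days are covered.
    start = last_window_end if last_window_end is not None else end - default_lookback
    if start >= end:
        return None  # today's window was already imported
    return start, end
```

If the job was down for two days, the next run gets a three-day window instead of silently skipping the gap, and a second run on the same day returns `None`.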

Use the date filter of the OpenAlex API and import each entity into the Postgres DB:

https://docs.openalex.org/how-to-use-the-api/get-lists-of-entities/filter-entity-lists (requires a premium subscription; we are getting a quote)
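A rough sketch of a date-filtered pull with cursor paging might look like the following. The `from_updated_date` filter, the `cursor` and `per-page` parameters, and the `meta.next_cursor` field follow the OpenAlex documentation as I understand it; `build_list_url`, `iter_records`, and the injected `fetch_json` callable are hypothetical names, and the `api_key` parameter assumes that is how the premium key is passed.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

OPENALEX_BASE = "https://api.openalex.org"


def build_list_url(
    entity: str,
    from_date: str,
    cursor: str = "*",
    api_key: Optional[str] = None,
    per_page: int = 200,
) -> str:
    """Build a list URL for one entity type (e.g. "works") filtered by update date."""
    params = {
        "filter": f"from_updated_date:{from_date}",
        "per-page": per_page,
        "cursor": cursor,  # "*" requests the first page
    }
    if api_key:
        params["api_key"] = api_key  # assumption: premium key sent as a query parameter
    return f"{OPENALEX_BASE}/{entity}?{urlencode(params)}"


def iter_records(
    entity: str,
    from_date: str,
    fetch_json: Callable[[str], dict],
    api_key: Optional[str] = None,
) -> Iterator[dict]:
    """Yield every record updated since from_date, following next_cursor."""
    cursor: Optional[str] = "*"
    while cursor:
        page = fetch_json(build_list_url(entity, from_date, cursor, api_key))
        yield from page.get("results", [])
        # A missing or null next_cursor means there are no more pages.
        cursor = page.get("meta", {}).get("next_cursor")
```

Passing the HTTP call in as `fetch_json` keeps the paging logic testable without network access; in the real job it could be a thin wrapper around whichever HTTP client the project uses.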

The Postgres DB structure is as follows: https://github.com/ourresearch/openalex-documentation-scripts/blob/main/openalex-pg-schema.sql

Ask sina for db creds.

Feel free to modify the schema by adding fields/tables, and integrate Prisma if helpful.
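As one possible shape for the schema change, the sketch below adds an import log table and stamps each imported row with the id of the run that last wrote it. It is illustrative only: it uses an in-memory-friendly SQLite connection as a stand-in for Postgres so it can run anywhere, the `works` table is cut down to two columns rather than the real OpenAlex schema, and `import_log`, `import_log_id`, and `run_import` are hypothetical names.

```python
import sqlite3
from typing import Iterable

# Trimmed-down stand-in schema; the real tables come from openalex-pg-schema.sql.
SCHEMA = """
CREATE TABLE import_log (
    id INTEGER PRIMARY KEY,
    window_start TEXT NOT NULL,
    window_end TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'running',
    finished_at TEXT
);
CREATE TABLE works (
    id TEXT PRIMARY KEY,
    display_name TEXT,
    import_log_id INTEGER REFERENCES import_log(id)
);
"""


def run_import(
    conn: sqlite3.Connection,
    window_start: str,
    window_end: str,
    records: Iterable[dict],
) -> int:
    """Record one import run and upsert its records, returning the log id."""
    cur = conn.execute(
        "INSERT INTO import_log (window_start, window_end) VALUES (?, ?)",
        (window_start, window_end),
    )
    log_id = cur.lastrowid
    # Upsert so that re-pulling an overlapping date range is harmless.
    conn.executemany(
        """
        INSERT INTO works (id, display_name, import_log_id) VALUES (?, ?, ?)
        ON CONFLICT(id) DO UPDATE SET
            display_name = excluded.display_name,
            import_log_id = excluded.import_log_id
        """,
        [(r["id"], r.get("display_name"), log_id) for r in records],
    )
    conn.execute(
        "UPDATE import_log SET status = 'success', finished_at = datetime('now') "
        "WHERE id = ?",
        (log_id,),
    )
    conn.commit()  # the log row and its records are committed together
    return log_id
```

Postgres supports the same `ON CONFLICT ... DO UPDATE` form, so the upsert should carry over largely unchanged; only rows with `status = 'success'` would count when deciding where the next pull starts.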

The OpenAlex API allows us to import on an hourly basis if we upgrade to premium. Keep in mind we may want to enable hourly updates in the future.
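To keep the door open for hourly updates, the pull interval could be a parameter rather than a hard-coded day. The sketch below is a minimal illustration under that assumption: `split_windows` and `filter_value` are hypothetical helpers, and whether the date filter accepts a full timestamp (as opposed to a plain date) should be confirmed against the OpenAlex docs before relying on it.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def split_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows of at most `step`."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        nxt = min(cursor + step, end)  # the last window may be shorter
        yield cursor, nxt
        cursor = nxt


def filter_value(ts: datetime, hourly: bool) -> str:
    """Format a window boundary for the date filter.

    Assumption: hourly pulls pass an ISO 8601 timestamp, daily pulls a date.
    """
    return ts.strftime("%Y-%m-%dT%H:%M:%S") if hourly else ts.strftime("%Y-%m-%d")
```

Switching from daily to hourly would then mean changing `step` from `timedelta(days=1)` to `timedelta(hours=1)` and the job schedule, without touching the import logic.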