quantified-uncertainty/metaforecast

Independent update schedules for different platforms


What I have in mind here: let's keep a platform_status DB table with the timestamp of the last update for each platform, have the scheduler ask every platform to update every minute, and skip the update for a platform if its data is fresh enough (each platform's code can define "fresh enough" separately).

This would allow us to e.g. update polymarket once per day and metaculus every 30 minutes, or vice versa.

It would also give us the ability to force an update from the web UI: mark a platform with a "needs-to-be-updated" flag with a single button click, without having to worry about Heroku.

It would also give us better observability: it'd be easy to build a (secret? login-only?) page listing all platforms and their current update status. We could also catch the exception if a platform update fails, store it in the same table, and display it on that web page.
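To make the status-tracking idea concrete, here is a minimal sketch of what one platform_status row and its update rule might look like. The `StatusRow` fields, `UpdateResult`, and `recordResult` are all hypothetical names chosen for illustration, not an existing schema; persisting the row to the actual DB is left out.

```typescript
// Hypothetical shape of one platform_status row (field names are assumptions).
interface StatusRow {
  platform: string;
  lastAttemptAt: Date | null; // last time an update was tried
  lastSuccessAt: Date | null; // last time an update succeeded
  lastError: string | null; // message of the most recent failure, if any
  consecutiveFailures: number;
}

type UpdateResult = { ok: true } | { ok: false; error: unknown };

// Pure function: given the previous row and the outcome of an update attempt,
// return the row that should be written back to the table.
const recordResult = (
  prev: StatusRow,
  result: UpdateResult,
  now: Date
): StatusRow => {
  if (result.ok) {
    return {
      ...prev,
      lastAttemptAt: now,
      lastSuccessAt: now,
      lastError: null,
      consecutiveFailures: 0,
    };
  }
  const message =
    result.error instanceof Error ? result.error.message : String(result.error);
  return {
    ...prev,
    lastAttemptAt: now,
    lastError: message,
    consecutiveFailures: prev.consecutiveFailures + 1,
  };
};

// Runs a fetcher and converts a thrown exception into a recorded failure,
// so one broken platform does not crash the whole scheduler tick.
const runAndRecord = async (
  prev: StatusRow,
  fetcher: () => Promise<void>,
  now: Date = new Date()
): Promise<StatusRow> => {
  try {
    await fetcher();
    return recordResult(prev, { ok: true }, now);
  } catch (error) {
    return recordResult(prev, { ok: false, error }, now);
  }
};
```

A status page would then only need to read these rows and render `lastSuccessAt` and `lastError` per platform.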

Why every minute?

That's the easiest way to get periodic jobs running every N hours, automatic retries and custom business logic for "should we run this code now?", managed by configuration in code and by DB state, instead of manually configuring heroku jobs or managing crontab files.

Something like:

// polymarket.ts
export const polymarket: Platform = {
  name: 'polymarket',
  fetcher: ...,
  period: 8 * 3600, // seconds after success
  retryPolicy: {
    minDelay: 3600, // seconds after failure
    // after a few failures we won't spam polymarket with requests too often
    doubleDelayOnFailureUntil: 24 * 3600,
  },
};

// index.ts
export const fetchPlatformIfNeeded = (platform: Platform) => {
  const needsRefetch = ...; // based on db state, `period` and `retryPolicy` options
  if (needsRefetch) fetchPlatform(platform);
}

...And then we call platforms.map(fetchPlatformIfNeeded) every minute. If we ever get to multiple workers, we'd also need a DB lock per platform, or some other way to avoid fetching the same platform twice in parallel.
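One way the elided `needsRefetch` check could work, as a rough sketch: wait `period` seconds after a success, and after failures wait `minDelay` doubled per consecutive failure, capped at `doubleDelayOnFailureUntil`. The `PlatformStatus` shape, the `forceUpdate` flag, and the `retryDelay` helper are assumptions made for this example; the exact backoff rule is one possible reading of the options above.

```typescript
interface RetryPolicy {
  minDelay: number; // seconds to wait after the first failure
  doubleDelayOnFailureUntil: number; // upper bound for the doubled delay, in seconds
}

interface PlatformSchedule {
  period: number; // seconds to wait after a success
  retryPolicy: RetryPolicy;
}

// Hypothetical view of the DB state for one platform.
interface PlatformStatus {
  lastAttemptAt: number | null; // unix seconds; null if never attempted
  consecutiveFailures: number; // 0 means the last attempt succeeded
  forceUpdate: boolean; // the "needs-to-be-updated" flag set from the web UI
}

// minDelay after 1 failure, 2x after 2, 4x after 3, ... never above the cap.
const retryDelay = (policy: RetryPolicy, failures: number): number => {
  const doubled = policy.minDelay * 2 ** Math.max(0, failures - 1);
  return Math.min(doubled, policy.doubleDelayOnFailureUntil);
};

const needsRefetch = (
  schedule: PlatformSchedule,
  status: PlatformStatus,
  now: number // unix seconds
): boolean => {
  // A forced or first-ever update always runs.
  if (status.forceUpdate || status.lastAttemptAt === null) return true;
  const wait =
    status.consecutiveFailures > 0
      ? retryDelay(schedule.retryPolicy, status.consecutiveFailures)
      : schedule.period;
  return now - status.lastAttemptAt >= wait;
};
```

With the polymarket options above (`minDelay: 3600`, cap `24 * 3600`), the delays after successive failures would be 1h, 2h, 4h, 8h, 16h, and then 24h from the sixth failure onward.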

I might be overcomplicating the options just to show an idea, but it's nice to have some room for further flexibility.

Seems a bit hardcore, but also pretty nice :)

Well, we could also run a long-running server which calculates what to run and manages a job queue as necessary. That's less hackish than "try everything every minute", but harder to implement properly and more fragile (unless there's a good node lib for that; for Python there's apscheduler, which is comprehensive and works well; I'm not sure if there's anything similar in the node world, but I'll look around).

Both of these approaches would be costly on Heroku, but this is a task for later and we might leave Heroku by then. Also, we can start with "every hour" instead of "every minute".