djsegal/julia_observer

Caching technique has not scaled with ecosystem size

Closed this issue · 1 comment

This code treats every package that's been modified in the last 5 days as new. However, this leads to around 20x more API hits per recently updated package (e.g., with the batch running a few times a day, each such package gets a full re-fetch on every run for the whole 5-day window instead of just once).

A simple fix would be to read the cache and check whether the stored response for the URL is more than 5 days old? (See the sketch after the code block below.)

// the original purpose of this was that API hits with caching are tricky in the 1-2 days ago range


def is_new_response? url
  # first batch run: nothing is cached yet, so everything counts as new
  return true \
    unless Batch.active_marker_date.present?

  # conditional GET with a fixed 5-day If-Modified-Since window, so any
  # package modified within the last 5 days comes back 200 instead of 304
  url_response = HTTParty.get url, \
    query: @client_info, \
    headers: @url_headers.merge({
      "If-Modified-Since" => 5.days.ago.rfc822
    })

  validate_response url, url_response
  return false if url_response.code == 304 # not modified: keep cached copy

  # modified: overwrite the cache and report the response as new
  Rails.cache.write url, url_response.to_yaml
  true
end
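
A minimal sketch of the proposed fix, in the same style. The `"#{url}/fetched_at"` companion cache key and the 5-day fall-back are assumptions for illustration, not code from the repo; the idea is to base If-Modified-Since on when we last fetched the URL, so a recently modified package triggers a full fetch once rather than on every batch run.


def is_new_response? url
  return true \
    unless Batch.active_marker_date.present?

  # hypothetical companion key recording when this url was last fetched;
  # fall back to the old fixed window if we've never fetched it
  last_fetched_at = Rails.cache.read "#{url}/fetched_at"
  last_fetched_at ||= 5.days.ago

  url_response = HTTParty.get url, \
    query: @client_info, \
    headers: @url_headers.merge({
      "If-Modified-Since" => last_fetched_at.rfc822
    })

  validate_response url, url_response
  return false if url_response.code == 304

  Rails.cache.write url, url_response.to_yaml
  Rails.cache.write "#{url}/fetched_at", Time.current
  true
end

With this, a package modified 3 days ago returns 200 on the first run after the change and 304 on every run after that.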

This was a false alert caused by a bug.

// bug was fixed in 0f31af8


Not saying this won't happen in the future; it just doesn't seem to be the case now.