purescript/registry-dev

Incorrect time measurements in unpublish workflow prevents unpublishing

thomashoneyman opened this issue · 5 comments

As seen in:
purescript/registry#346 (comment)

...a package published just minutes prior cannot be unpublished because, and I quote:

Cannot unpublish web-fetch@4.0.0 because it was published 6063.176484722222 hours ago. Packages can only be unpublished for 48 hours after publication.

This error arises here:

now <- nowUTC
published <- case Operation.Validation.validateUnpublish now payload.version metadata of
Left NotPublished ->
Except.throw $ "Cannot unpublish " <> formatted <> " because this version has not been published."
Left AlreadyUnpublished ->
Except.throw $ "Cannot unpublish " <> formatted <> " because it has already been unpublished."
Left InternalError -> do
Log.error $ formatted <> " is listed as both published and unpublished, which should be impossible!"
Except.throw $ "Cannot unpublish " <> formatted <> " due to an internal error."
Left (PastTimeLimit { difference, limit }) ->
Except.throw $ Array.fold
[ "Cannot unpublish " <> formatted <> " because it was published " <> Number.Format.toString (unwrap difference) <> " hours ago. "
, "Packages can only be unpublished for " <> Number.Format.toString (unwrap limit) <> " hours after publication."
]

But the real implementation is deeper, in the registry library itself:

-- | Verifies that a package version is eligible for unpublishing.
-- | The version must have been published before, must not have been unpublished before, and unpublishing
-- | must happen within the 48 hours following publishing.
validateUnpublish :: DateTime -> Version -> Metadata -> Either UnpublishError PublishedMetadata
validateUnpublish now version (Metadata metadata) = do
let
inPublished = Map.lookup version metadata.published
inUnpublished = Map.lookup version metadata.unpublished
case inPublished, inUnpublished of
Nothing, Nothing ->
Left NotPublished
Just published, Nothing -> do
-- We only pass through the case where the user is unpublishing a
-- package that has been published and not yet unpublished.
let diff = DateTime.diff now published.publishedTime
if (diff > hourLimit) then
Left $ PastTimeLimit { limit: hourLimit, difference: diff }
else
Right published
Nothing, Just _ ->
Left AlreadyUnpublished
Just _, Just _ ->
Left InternalError
where
hourLimit = Hours 48.0

We're comparing the current time in the UTC time zone to the published time associated with the package, which is calculated here:

let
getRefTime = do
timestamp <- Except.rethrow =<< Run.liftAff (Git.gitCLI [ "log", "-1", "--date=iso8601-strict", "--format=%cd", ref ] (Just repoDir))
jsDate <- Run.liftEffect $ JSDate.parse timestamp
dateTime <- Except.note "Failed to convert JSDate to DateTime" $ JSDate.toDateTime jsDate
pure dateTime
-- Cloning will result in the `repo` name as the directory name
publishedTime <- Except.runExcept getRefTime >>= case _ of
Left error -> do
Log.error $ "Failed to get published time: " <> error
Except.throw $ "Cloned repository " <> owner <> "/" <> repo <> " at ref " <> ref <> ", but could not read the published time from the ref."
Right value -> pure value
pure { path: repoDir, published: publishedTime }

The problem in this specific case:

  1. A commit to the repository was made in 2022
  2. The tag associated with that commit wasn't made until today
  3. We published the package, looked up the ref time, and voila: we see 2022 and assign that as the 'published time'
  4. Now the publish time is long in the past, and you can't unpublish.

The reason why we use the ref time as the publish time is to support the legacy importer, in which we need to associate old publish times so the registry doesn't say everything was published in August 2023. But we probably should be using the current time in the publish pipeline instead.

f-f commented

But we probably should be using the current time in the publish pipeline instead

I think there's value in preserving timestamps of old packages. We can take care of this issue by:

  • distinguishing when something is coming from the legacy importer vs the new pipeline (in which it's fair to take the current time)
  • allowing trustees to unpublish without any time limits - we'll need to enable this anyways for DMCA reasons.

Good point on the second note there, and I agree with you we ought to preserve time stamps for old packages

  • distinguishing when something is coming from the legacy importer vs the new pipeline

Would this just amount to distinguishing between when this handle is called from the LegacyImporter module vs elsewhere? Effectively something like this diff here (15d747f) or would it be more involved than that?

I think you're right that it's basically choosing between two scenarios: one in which this is an import of a recent package initiated by a user's action, and one in which this is an automatic import of a legacy package. In fact, we already have a type for the distinction between the two:

-- | Operations can be exercised for old, pre-registry packages, or for packages
-- | which are on the 0.15 compiler series. If a true legacy package is uploaded
-- | then we do not require compilation to succeed and we don't publish docs.
data Source = Legacy | Current

The easiest thing is probably to move this data type into Prelude.purs and then have the fetch function take it as an argument:

fetch :: forall r. FilePath -> Location -> String -> Run (SOURCE + EXCEPT String + r) FetchedSource
fetch destination location ref = Except.rethrow =<< Run.lift _source (Fetch destination location ref identity)

(The comment on the Source data type isn't quite right, because with #255 we'll require all packages to compile; no exemption of "legacy" packages. But the idea is the same: in some ways we treat legacy packages differently than 'current' ones.)