facebookresearch/fairseq

Use symbolic link for saving best/last checkpoints

ZeroRin opened this issue · 0 comments

🚀 Feature Request

When preparing checkpoint_best.pt and checkpoint_last.pt, create a symbolic link instead of making an exact copy of the checkpoint.

Motivation

I'm running on a machine with poor I/O performance, and writing the same file three times seems extremely inefficient.

It seems there is a plan to implement asynchronous copying, but I believe a symlink would be much more efficient:

if len(checkpoints) > 0 and trainer.should_save_checkpoint_on_current_rank:
    saved_cp = trainer.save_checkpoint(checkpoints[0], extra_state)
    for cp in checkpoints[1:]:
        if cfg.write_checkpoints_asynchronously:
            # TODO[ioPath]: Need to implement a delayed asynchronous
            # file copying/moving feature.
            logger.warning(
                f"ioPath is not copying {checkpoints[0]} to {cp} "
                "since async write mode is on."
            )
        else:
            assert PathManager.copy(
                checkpoints[0], cp, overwrite=True
            ), f"Failed to copy {checkpoints[0]} to {cp}"
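To make the request concrete, here is a rough sketch of what the copy step could do instead. It is not fairseq code: `link_or_copy` is a hypothetical helper, it uses only the Python standard library rather than `PathManager`, and it assumes the checkpoints live on a local filesystem that supports symlinks (otherwise it falls back to a plain copy).

```python
import os
import shutil
import tempfile


def link_or_copy(src: str, dst: str) -> str:
    """Hypothetical helper: make dst a relative symlink to src.

    Falls back to copying the file if a symlink cannot be created.
    Returns "symlink" or "copy" so the caller can log what happened.
    """
    # os.symlink refuses to overwrite, so remove a stale file or link first.
    if os.path.lexists(dst):
        os.remove(dst)
    # A relative target keeps the link valid if the save directory is moved.
    target = os.path.relpath(src, start=os.path.dirname(os.path.abspath(dst)))
    try:
        os.symlink(target, dst)
        return "symlink"
    except (OSError, NotImplementedError):
        # Assumption: the platform or filesystem does not allow symlinks.
        shutil.copyfile(src, dst)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as save_dir:
        first = os.path.join(save_dir, "checkpoint1.pt")
        with open(first, "wb") as f:
            f.write(b"fake checkpoint bytes")
        for name in ("checkpoint_best.pt", "checkpoint_last.pt"):
            how = link_or_copy(first, os.path.join(save_dir, name))
            print(f"{name}: {how}")
```

One caveat with this approach: a link dangles if its target is deleted, so any logic that prunes older checkpoints would have to skip files that checkpoint_best.pt or checkpoint_last.pt still point to.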

Pitch

Alternatives

Additional context