hardlinking kinda fails, sometimes..
boyanpenkov opened this issue · 8 comments
OK, this is like some inside baseball here, but is worth noting I have occasionally started to see the following test failure on one machine:
self = <tests.test_add.TestAdd testMethod=test_add_rename_copy>
def test_add_rename_copy(self):
> paperscmd(f'add -rc --bibtex {self.mybib} --filesdir {self.filesdir} {self.pdf}')
tests/test_add.py:76:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/common.py:59: in paperscmd
return speedy_paperscmd(cmd, *args, **kwargs)
tests/common.py:53: in speedy_paperscmd
return call(main, args, check=check, check_output=check_output)
tests/common.py:34: in call
return f(*args, **kwargs)
papers/__main__.py:1071: in main
check_install(subp, o, config) and addcmd(subp, o, config)
papers/__main__.py:452: in addcmd
biblio.add_pdf(file, attachments=o.attachment, rename=o.rename, copy=o.copy,
papers/bib.py:432: in add_pdf
self.insert_entry(entry, update_key=True, **kw)
papers/bib.py:288: in insert_entry
self.insert_entry_check(entry, update_key=update_key, rename=rename, copy=copy, **checkopt)
papers/bib.py:320: in insert_entry_check
self.insert_entry(entry, update_key, rename=rename, copy=copy)
papers/bib.py:311: in insert_entry
if rename: self.rename_entry_files(entry, copy=copy)
papers/bib.py:529: in rename_entry_files
self.move(file, newfile, copy)
papers/bib.py:223: in move
return _move(file, newfile, copy=copy, dryrun=papers.config.DRYRUN)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
f1 = '/home/boyan/boyanshouse/Vazhno/Work/papers/tests/downloadedpapers/bg-8-515-2011.pdf'
f2 = '/tmp/papers.files1tk1bilj/perrette_et_al_2011_near-ubiquity-of-ice-edge-blooms-in-the-arctic.pdf', copy = True
interactive = True, dryrun = False, hardlink = True
def move(f1, f2, copy=False, interactive=True, dryrun=False, hardlink=True):
maybe = 'dry-run:: ' if dryrun else ''
dirname = os.path.dirname(f2)
if dirname and not os.path.exists(dirname):
logger.info(f'{maybe}create directory: {dirname}')
if not dryrun: os.makedirs(dirname)
if f1 == f2:
logger.info('dest is identical to src: '+f1)
return
if os.path.exists(f2):
# if identical file, pretend nothing happened, skip copying
if os.path.samefile(f1, f2) or checksum(f2) == checksum(f1):
if not copy and not dryrun:
logger.info(f'{maybe}rm {f1}')
os.remove(f1)
return
elif interactive:
ans = input(f'dest file already exists: {f2}. Replace? (y/n) ')
if ans.lower() != 'y':
return
else:
logger.info(f'{maybe}rm {f2}')
if not dryrun:
os.remove(f2)
if copy:
# If we can do a hard-link instead of copy-ing, let's do:
if hardlink:
cmd = f'{maybe}ln {f1} {f2}'
logger.info(cmd)
if not dryrun:
> os.link(f1, f2)
E OSError: [Errno 18] Invalid cross-device link: '/home/boyan/boyanshouse/Vazhno/Work/papers/tests/downloadedpapers/bg-8-515-2011.pdf' -> '/tmp/papers.files1tk1bilj/perrette_et_al_2011_near-ubiquity-of-ice-edge-blooms-in-the-arctic.pdf'
papers/utils.py:119: OSError
This is not reproducible on two other machines -- and I think the filesystems are all flat! -- but one of the solutions seems to be here: higlass/higlass-manage#3 (tldr: replace os.link
with shutil.copy2
).
There's some back and forth of increasing dubiousness about hardlinks being bad and somewhat dangerous, and re-considering my behavior in the past, I'm tending to agree; I tend to just copy the files, and let the filesystem then dedupe them if if needs be -- the option here may be just not supporting the --hardlink option...
But again, this only rears it's head on one machine (and when I run tests manually, not though tox, cf #54 ).
This is on python 3.11, btw..
Thanks for checking @boyanpenkov
The hardlinking was meant primarily for git-lfs mediated backup. I was a little careless in making the change. Here is what I plan to do for a fix:
- set hardlink=False as default parameter of the move function
- explicitly pass hardlink=True in the backup function in main
- have a try / except close that makes a copy instead in case of failure
ASAP
Super, sounds like a plan....
It should be solved now. Do you mind testing? (I have no tests for hardlinks vs copy specifically). Full test would involve installation with git and git-lfs... Not urgent (I'm working on a more involved PR concerning backup anyway).
I just did something dirty -- sorry for that -- I amended a commit ( main.py was not saved !) and forced-pushed it. Hopefully you were not so fast to pull in the 2 min interval.
Seems like I was really not focused. I had not saved not commited utils.py which contains the move
command. Should be fixed now...
Pulled and confirming I see all passes