perrette/papers

hardlinking kinda fails, sometimes..

boyanpenkov opened this issue · 8 comments

OK, this is like some inside baseball here, but is worth noting I have occasionally started to see the following test failure on one machine:

self = <tests.test_add.TestAdd testMethod=test_add_rename_copy>

    def test_add_rename_copy(self):
    
>       paperscmd(f'add -rc --bibtex {self.mybib} --filesdir {self.filesdir} {self.pdf}')

tests/test_add.py:76: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/common.py:59: in paperscmd
    return speedy_paperscmd(cmd, *args, **kwargs)
tests/common.py:53: in speedy_paperscmd
    return call(main, args, check=check, check_output=check_output)
tests/common.py:34: in call
    return f(*args, **kwargs)
papers/__main__.py:1071: in main
    check_install(subp, o, config) and addcmd(subp, o, config)
papers/__main__.py:452: in addcmd
    biblio.add_pdf(file, attachments=o.attachment, rename=o.rename, copy=o.copy,
papers/bib.py:432: in add_pdf
    self.insert_entry(entry, update_key=True, **kw)
papers/bib.py:288: in insert_entry
    self.insert_entry_check(entry, update_key=update_key, rename=rename, copy=copy, **checkopt)
papers/bib.py:320: in insert_entry_check
    self.insert_entry(entry, update_key, rename=rename, copy=copy)
papers/bib.py:311: in insert_entry
    if rename: self.rename_entry_files(entry, copy=copy)
papers/bib.py:529: in rename_entry_files
    self.move(file, newfile, copy)
papers/bib.py:223: in move
    return _move(file, newfile, copy=copy, dryrun=papers.config.DRYRUN)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

f1 = '/home/boyan/boyanshouse/Vazhno/Work/papers/tests/downloadedpapers/bg-8-515-2011.pdf'
f2 = '/tmp/papers.files1tk1bilj/perrette_et_al_2011_near-ubiquity-of-ice-edge-blooms-in-the-arctic.pdf', copy = True
interactive = True, dryrun = False, hardlink = True

    def move(f1, f2, copy=False, interactive=True, dryrun=False, hardlink=True):
        maybe = 'dry-run:: ' if dryrun else ''
        dirname = os.path.dirname(f2)
        if dirname and not os.path.exists(dirname):
            logger.info(f'{maybe}create directory: {dirname}')
            if not dryrun: os.makedirs(dirname)
        if f1 == f2:
            logger.info('dest is identical to src: '+f1)
            return
    
        if os.path.exists(f2):
            # if identical file, pretend nothing happened, skip copying
            if os.path.samefile(f1, f2) or checksum(f2) == checksum(f1):
                if not copy and not dryrun:
                    logger.info(f'{maybe}rm {f1}')
                    os.remove(f1)
                return
    
            elif interactive:
                ans = input(f'dest file already exists: {f2}. Replace? (y/n) ')
                if ans.lower() != 'y':
                    return
            else:
                logger.info(f'{maybe}rm {f2}')
                if not dryrun:
                    os.remove(f2)
    
        if copy:
            # If we can do a hard-link instead of copy-ing, let's do:
            if hardlink:
                cmd = f'{maybe}ln {f1} {f2}'
                logger.info(cmd)
                if not dryrun:
>                   os.link(f1, f2)
E                   OSError: [Errno 18] Invalid cross-device link: '/home/boyan/boyanshouse/Vazhno/Work/papers/tests/downloadedpapers/bg-8-515-2011.pdf' -> '/tmp/papers.files1tk1bilj/perrette_et_al_2011_near-ubiquity-of-ice-edge-blooms-in-the-arctic.pdf'

papers/utils.py:119: OSError

This is not reproducible on two other machines -- and I think the filesystems are all flat! -- but one of the solutions seems to be here: higlass/higlass-manage#3 (tldr: replace os.link with shutil.copy2).

There's some back and forth of increasing dubiousness about hardlinks being bad and somewhat dangerous, and re-considering my behavior in the past, I'm tending to agree; I tend to just copy the files, and let the filesystem then dedupe them if if needs be -- the option here may be just not supporting the --hardlink option...

But again, this only rears it's head on one machine (and when I run tests manually, not though tox, cf #54 ).

This is on python 3.11, btw..

Thanks for checking @boyanpenkov
The hardlinking was meant primarily for git-lfs mediated backup. I was a little careless in making the change. Here is what I plan to do for a fix:

  • set hardlink=False as default parameter of the move function
  • explicitly pass hardlink=True in the backup function in main
  • have a try / except close that makes a copy instead in case of failure
    ASAP

Super, sounds like a plan....

It should be solved now. Do you mind testing? (I have no tests for hardlinks vs copy specifically). Full test would involve installation with git and git-lfs... Not urgent (I'm working on a more involved PR concerning backup anyway).

I just did something dirty -- sorry for that -- I amended a commit ( main.py was not saved !) and forced-pushed it. Hopefully you were not so fast to pull in the 2 min interval.

Confirming I still see #54 -- however, on 4a4bb62 with pytest --cov=papers --cov-append --cov-report=term-missing -xv directly on python3.11, I see all tests pass.

Seems like I was really not focused. I had not saved not commited utils.py which contains the move command. Should be fixed now...

Pulled and confirming I see all passes