MESAHub/mesa

unnecessary (?) installation failure when git lfs detects "short" files

mjoyceGR opened this issue · 5 comments

current in-dev MESA version at time of posting: mesa/data/version_number = acd763a
remote installation on OzSTAR HPC cluster

since I don't have root access here, I manually installed git lfs using these instructions: git-lfs/git-lfs#4134

this got me past the "git lfs not installed" issue successfully, but I kept encountering versions of "Some data files were not successfully retrieved" no matter how many times I pulled. I finally went into the install script and commented out exit 1 in the block:

        FILE_LIST=$(git-lfs ls-files | awk '{print $NF}')
        for LFS_FILE in ${FILE_LIST}; do
            # the checks have this form to handle uncommitted changes to LFS files
            # if no file exists, that means the file was probably deleted and that's OK
            # if the file exists, but is short, then it is probably the bare LFS pointer
            #   and that indicates a problem retrieving the file
            if [ -f "${LFS_FILE}" ] && [ $(du -k "${LFS_FILE}" | cut -f1) -le 4 ];
            then
                echo
                echo "${LFS_FILE} is smaller than expected for a file tracked by git LFS"
                echo
                echo "****************************************************************"
                echo "*        Some data files were not successfully retrieved       *"
                echo "*                                                              *"
                echo "*                         Try running:                         *"
                echo "*                   git lfs install --force                    *"
                echo "*                           and then                           *"
                echo "*                         git lfs pull                         *"
                echo "*                                                              *"
                echo "****************************************************************"
                echo
                #exit 1
            fi
        done

when I ran ./install after this, I got a warning (but no exit) for seemingly every large file, but then the MESA installation proceeded as normal and concluded successfully.

To me, it seems like the above code block assumes that detecting a "short" LFS-tracked file always means a botched install, but that isn't necessarily the case. If so, we're causing installation failures unnecessarily. Can the install script be modified to avoid this, perhaps by replacing the forced exit with a warning?
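
One possible middle ground between a hard exit and no check at all is to look for the pointer itself rather than for a small file. A git LFS pointer is a short text file whose first line begins with "version https://git-lfs.github.com/spec/", so it can be recognized by content instead of by disk usage. The sketch below assumes that approach; is_lfs_pointer and check_lfs_files are hypothetical names, not functions from the MESA install script.

```shell
# Sketch only: these helpers are hypothetical and not part of MESA.

# Return 0 if the file looks like an unfetched git LFS pointer.
is_lfs_pointer() {
    # Only the first bytes are read, so large data files are cheap to test.
    [ -f "$1" ] && head -c 100 "$1" 2>/dev/null |
        grep -q '^version https://git-lfs.github.com/spec/'
}

# Read file names on stdin (one per line) and warn about each pointer.
# Missing files are skipped, matching the original check's behavior.
# Returns 1 if at least one pointer was found, 0 otherwise.
check_lfs_files() {
    status=0
    while IFS= read -r f; do
        if is_lfs_pointer "$f"; then
            echo "WARNING: $f is still a git LFS pointer" >&2
            status=1
        fi
    done
    return "$status"
}
```

It could be fed the same list the install script already builds, for example git-lfs ls-files | awk '{print $NF}' | check_lfs_files, with the caller deciding whether a non-zero return should stop the install or only print a warning.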

Did you try git lfs install --force and git lfs pull, as suggested in the error message? That usually fixes things for me when it occasionally comes up.

I just had a look at OzSTAR, and it seems to use Environment Modules for loading software. Have you tried to module load git-lfs or something similar? I can't find a list of available modules online but you can try module avail as a first try at seeing what's available. The OzSTAR docs make it sound like you might first have to load a compiler before module avail will show you what's built with that compiler.

Yes, I tried all of those things, many times (hence "no matter how many times I pulled"). Simply commenting out the exit command got me through the installation, which I think means that the exit command doesn't need to be there. We could be causing more failed installations than necessary.

I encountered a filesystem issue a while back where I would get installation errors of this nature because the filesystem took a bit of time to update the file properties, such that it would trigger this failure even though lfs was properly installed.

At least for me, an adequate solution on Cannon was to set this environment variable:
export MESA_GIT_LFS_SLEEP=60

When that is set, the install script pauses for 60 seconds to give the filesystem a chance to catch up, and then it usually passes the file size test. Any chance that solves your issue?
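
For anyone curious how such a delay can work, here is a minimal sketch of a script honoring an optional sleep variable before its size check. It is an illustration under my own assumptions, not a copy of the MESA install script; lfs_settle_delay and pause_for_filesystem are hypothetical names.

```shell
# Sketch only: hypothetical helpers, not taken from the MESA install script.

# Print the number of seconds to wait: the value of MESA_GIT_LFS_SLEEP if it
# is a non-negative integer, otherwise 0.
lfs_settle_delay() {
    delay="${MESA_GIT_LFS_SLEEP:-0}"
    case "$delay" in
        ''|*[!0-9]*) delay=0 ;;   # unset, empty, or not a plain integer
    esac
    echo "$delay"
}

# Pause so a slow filesystem can update file metadata before sizes are read.
pause_for_filesystem() {
    sleep "$(lfs_settle_delay)"
}
```

Calling pause_for_filesystem just before the file size loop would give the behavior described above, and it costs nothing when the variable is unset.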

interesting! I definitely didn't try that. My issue is resolved in the short term with my (possibly stupid but seemingly also not dangerous) work-around.

I'll investigate, but regardless: is it worth terminating the installation over this short-file detection issue? Wouldn't a warning suffice?

In the case where git-lfs actually isn't installed/loaded, I think you might end up getting weird errors much later in the installation that are difficult to interpret. Maybe one of us should try that out to see how bad it is.
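
A quick way to separate the two cases (git-lfs missing versus files that only look short) is to check for git-lfs directly before looking at any file sizes. The sketch below assumes only that git lfs version succeeds when git can find the git-lfs binary; have_git_lfs is a hypothetical helper name.

```shell
# Sketch only: have_git_lfs is a hypothetical helper, not part of MESA.

# Return 0 if both git and its lfs extension can be run.
have_git_lfs() {
    command -v git >/dev/null 2>&1 && git lfs version >/dev/null 2>&1
}

if have_git_lfs; then
    echo "git-lfs found: $(git lfs version)"
else
    echo "git-lfs not found; load or install it before running ./install" >&2
fi
```

If this reports that git-lfs is present and the size check still fails, the problem is more likely the check itself (or the filesystem) than a missing installation.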