pyiron/pyiron_atomistics

pr.import_from_path does not correctly import VASP jobs

jyang2009 opened this issue · 7 comments

Dear all,

I recently realized that import_from_path cannot import VASP jobs that were successfully completed with pyiron. The reason is that after a job finishes, pyiron compresses its files into job_name.tar.bz2. However, import_from_path uses the function _calculation_validation, which checks whether the folder contains one of the files "OUTCAR", "vasprun.xml", "OUTCAR.gz", "vasprun.xml.bz2", or "vasprun.xml.gz" to decide whether the job should be imported.
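For reference, the failure shows up in a call like the following (a minimal sketch; the project name and path are placeholders):

from pyiron_atomistics import Project

pr = Project("imported_jobs")
# After pyiron's post-run compression the job directory only contains
# job_name.tar.bz2, so _calculation_validation finds none of the files it
# looks for and the job is silently skipped.
pr.import_from_path(path="/path/to/finished_vasp_job")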
I wonder if there is an easy fix for this problem.

Best,
Jing

Hey Jing,

I don't think there is an easy fix for this. The fastest solution would be to extract the archives and then import. If you are dealing with a large number of these files, I can help with some personal utilities that are not part of pyiron, depending on what you actually want to do with the data.

Han

I guess the pyiron-side solution would be to pack your data in a project and then go through unpack, as sketched below. If you have calculations that you haven't run through pyiron, then you're out of luck and have to go through the import (and extraction).
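A minimal sketch of that route (the pack/unpack argument names here are assumed, so check them against your pyiron version):

from pyiron_atomistics import Project

# On the machine where the jobs were run: bundle the project into an archive.
pr = Project("my_calculations")
pr.pack(destination_path="my_archive", compress=True)

# On the target machine: restore the archive into a fresh project.
pr_new = Project("restored_calculations")
pr_new.unpack(origin_path="my_archive")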

Hi,

Do you have Python code that goes through all subdirectories in a folder and decompresses the files? That would be useful to me.

Best,
Jing

pmrv commented

Using find on the command line should be enough, try some variation of

find -L calculations -type f -name '*.tar.bz2' -execdir tar xf '{}' \;

where calculations is the folder that contains your VASP runs.
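If you'd rather stay in Python, the same thing as a short standard-library sketch (each tarball is extracted next to where it lives, mirroring -execdir):

import tarfile
from pathlib import Path

def extract_all_tarballs(parent_dir, suffix=".tar.bz2"):
    # Recursively find matching tarballs under parent_dir and extract each in place.
    for tarball in Path(parent_dir).rglob(f"*{suffix}"):
        with tarfile.open(tarball, mode="r:bz2") as tar:
            tar.extractall(path=tarball.parent)

extract_all_tarballs("calculations")  # placeholder: your VASP runs folder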

What Marvin said should work.

However, I'm impatient and have parallelised it so that it works for large numbers of files here:

https://github.com/ligerzero-ai/utils

Clone it into your PYTHONPATH (somewhere Python can see it) and call:

from utils.generic import find_and_extract_tarballs_parallel

# Note the trailing comma: (".tar.bz2") is just a string, (".tar.bz2",) is a one-element tuple.
find_and_extract_tarballs_parallel(
    parent_dir="/root/personal_python_utilities/utils/development/test_decompress",
    extensions=(".tar.bz2",),
)

Please be VERY careful with this tool, since it can generate millions of files in a matter of seconds if you call it on a large number of tarballs. I haven't built in any safeguards, so test it on a small number of files first. Also note that it extracts in place, so if the tarballs don't contain a top-level directory but instead dump their contents directly, make sure each one sits in its own folder.

But, it should work.
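If you want a cheap sanity check first, something like this lists what would be touched without extracting anything (a generic sketch, not part of the repository):

from pathlib import Path

# Count and preview the tarballs before letting the parallel extraction loose.
tarballs = sorted(Path("/path/to/calculations").rglob("*.tar.bz2"))
print(f"{len(tarballs)} tarballs found")
for t in tarballs[:10]:
    print(t)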

Got it. Thanks a lot, guys.

One last comment: the above repository also has functionality that supports compression in parallel, in case you ever need to quickly archive all the directories containing specific files (the opposite of what you are trying to do here).

So if some file management task ever feels like it takes too long, this can help.

It also supports extraction of specific files from a tarball in case you only need a few of them (also parallelised).
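For reference, the parallel pattern behind this is roughly the following (a generic standard-library sketch, not the repository's actual API):

import tarfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def compress_dir(directory):
    # Compress one directory into <name>.tar.bz2 placed next to it.
    archive = directory.parent / (directory.name + ".tar.bz2")
    with tarfile.open(archive, mode="w:bz2") as tar:
        tar.add(directory, arcname=directory.name)
    return archive

if __name__ == "__main__":
    parent = Path("calculations")  # placeholder: folder holding the run dirs
    dirs = [d for d in parent.iterdir() if d.is_dir()]
    # bz2 compression is CPU-bound, so separate processes scale across cores.
    with ProcessPoolExecutor() as pool:
        for archive in pool.map(compress_dir, dirs):
            print("wrote", archive)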