pyiron/pyiron_base

`GenericJob.save` just keeps copying things

Opened this issue · 6 comments

Repeated calls to save silently create new database items while duplicating the job name, i.e. they are a sort of silent "save-as" in database space. The job name however, stays the same, so you get an error if you try to load by job name:

from pyiron_base import Project

pr = Project("tmp")
pr.remove_jobs(silently=True, recursive=True)

job = pr.create.job.TableJob("my_table")
job.save()

print(len(pr.job_table()))
# 1

job.save()
print(len(pr.job_table()))
# 2

try:
    loaded = pr.load("my_table")
except ValueError as e:
    print(e.args[0])
# job name 'my_table' in this project '...' is not unique...

Is this intentional behaviour? I find it very counter-intuitive to other programs, where "save" updates my save file for the current object but doesn't create a new save file (or database entry in this case).

To further complicate things, PythonFunctionContainerJob does behave in the way I would expect by loading instead of saving if it finds the job. I like this better, but the difference in behaviour is definitely bad -- whatever behaviour we decide on should be consistent.

save() is intended as internal function which serialises the job object, consisting of two parts, the storage in HDF5 and the creation of a SQL database entry. The function was never intended to be called by the user directly. I would be open to renaming it to a private function as we also had other users misusing it before.

Renaming it is reasonable to me, but is there a reason we want the duplicating behaviour even for internal use?

pmrv commented

I have run into this before, but never deeply looked into it. Moving the linked section from PythonFunctionContainerJob to GenericJob should fix the issue and would be a-ok with me.

I haven't gotten to checking but one of you might know offhand -- is the status also stored in hdf? If so then I guess the snippet needs to update that bit of serialized data? Otherwise I agree, I think the new code should be promoted up to the parent class.

pmrv commented

It is stored, but not sure where, might be just job['status'] even.

It is stored, but not sure where, might be just job['status'] even.

We probably want to update this in the stored version too then.

IMO it would also be good to give a

warnings.warn(f"{self.job_name} already exists and was not re-saved. Use `Project.remove_job` to remove the stored version and re-save it if you have really made changes you want to save")

(or similar)