`GenericJob.save` just keeps copying things
Opened this issue · 6 comments
Repeated calls to save
silently create new database items while duplicating the job name, i.e. they are a sort of silent "save-as" in database space. The job name however, stays the same, so you get an error if you try to load by job name:
from pyiron_base import Project
pr = Project("tmp")
pr.remove_jobs(silently=True, recursive=True)
job = pr.create.job.TableJob("my_table")
job.save()
print(len(pr.job_table()))
# 1
job.save()
print(len(pr.job_table()))
# 2
try:
loaded = pr.load("my_table")
except ValueError as e:
print(e.args[0])
# job name 'my_table' in this project '...' is not unique...
Is this intentional behaviour? I find it very counter-intuitive to other programs, where "save" updates my save file for the current object but doesn't create a new save file (or database entry in this case).
To further complicate things, PythonFunctionContainerJob
does behave in the way I would expect by loading instead of saving if it finds the job. I like this better, but the difference in behaviour is definitely bad -- whatever behaviour we decide on should be consistent.
save()
is intended as internal function which serialises the job object, consisting of two parts, the storage in HDF5 and the creation of a SQL database entry. The function was never intended to be called by the user directly. I would be open to renaming it to a private function as we also had other users misusing it before.
Renaming it is reasonable to me, but is there a reason we want the duplicating behaviour even for internal use?
I have run into this before, but never deeply looked into it. Moving the linked section from PythonFunctionContainerJob
to GenericJob
should fix the issue and would be a-ok with me.
I haven't gotten to checking but one of you might know offhand -- is the status also stored in hdf? If so then I guess the snippet needs to update that bit of serialized data? Otherwise I agree, I think the new code should be promoted up to the parent class.
It is stored, but not sure where, might be just job['status']
even.
It is stored, but not sure where, might be just
job['status']
even.
We probably want to update this in the stored version too then.
IMO it would also be good to give a
warnings.warn(f"{self.job_name} already exists and was not re-saved. Use `Project.remove_job` to remove the stored version and re-save it if you have really made changes you want to save")
(or similar)