uqfoundation/dill

dill.load_session truncates/empties a file when loading a file object

Closed this issue · 3 comments

I tried the following code. After dill.load_session, the file content on disk prints as empty:

import dill
with open("a.csv", "w") as f:
    f.write("foo")

dill.dump_session("session.pk1")
with open("a.csv", "r") as f2:
    print("f2 content:", f2.read())

dill.load_session("session.pk1")
with open("a.csv", "r") as f3:
    print("f3 content:", f3.read())

output:

> python3 a.py
f2 content: foo
f3 content: 

The strace log shows that dill.load_session truncated my a.csv to empty.
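
A truncation like the one in the strace log is what you get when an existing file is reopened in "w" mode. Here is a minimal standard-library sketch of that effect. The helper reopen_like_handle is hypothetical, and the idea that dill's default handle mode restores the closed f this way is an assumption, not something confirmed in this thread:

```python
import os
import tempfile


def reopen_like_handle(path, mode):
    """Reopen path with the given mode and close it right away.

    Hypothetical helper: it imitates restoring a closed file handle
    by name and mode, without restoring any contents.
    """
    with open(path, mode):
        pass


path = os.path.join(tempfile.mkdtemp(), "a.csv")
with open(path, "w") as f:
    f.write("foo")

reopen_like_handle(path, "r")  # read mode leaves the file alone
with open(path) as fh:
    after_read = fh.read()

reopen_like_handle(path, "w")  # write mode truncates on open
with open(path) as fh:
    after_write = fh.read()

print(repr(after_read), repr(after_write))
```

If this is what happens, then the mode the file object was opened with matters more than whether it is still open when the session is dumped.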


meta:

> python3 --version
Python 3.8.10
> pip3 show dill
Name: dill
Version: 0.3.5.1
> cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.4 LTS (Focal Fossa)"

The intent of dump_session and load_session is to preserve and resume an interpreter session. It's not intended to revert to an earlier version of the same session. Something like this:

Python 3.7.14 (default, Sep 10 2022, 11:17:06) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open("a.csv", "w") as f:
...     f.write("foo")
... 
3
>>> del f
>>> dill.dump_session("session.pk1")

then resume:

Python 3.7.14 (default, Sep 10 2022, 11:17:06) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session("session.pk1")
>>> with open("a.csv", "r") as f2:
...     print("f2 content:", f2.read())
... 
f2 content: foo
>>> with open("a.csv", "r") as f3:
...     print("f3 content:", f3.read())
... 
f3 content: foo
>>> 

Deleting the files and starting over...
You'll notice that if we start by writing the files to disk, then reading them, the file is not affected:

Python 3.7.14 (default, Sep 10 2022, 11:17:06) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> with open("a.csv", "w") as f:
...     f.write("foo")
... 
3
>>> with open("a.csv", "r") as f2:
...     print("f2 content:", f2.read())
... 
f2 content: foo
>>> with open("a.csv", "r") as f3:
...     print("f3 content:", f3.read())
... 
f3 content: foo

Then reading... and using dump_session and load_session:

Python 3.7.14 (default, Sep 10 2022, 11:17:06) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> with open("a.csv", "r") as f2:
...     print("f2 content:", f2.read())
... 
f2 content: foo
>>> import dill
>>> dill.dump_session("session.pk1")
>>> 
>>> with open("a.csv", "r") as f3:
...     print("f3 content:", f3.read())
... 
f3 content: foo
>>> with open("a.csv", "r") as f2:
...     print("f2 content:", f2.read())
... 
f2 content: foo
>>> dill.load_session("session.pk1")
>>> 
>>> with open("a.csv", "r") as f3:
...     print("f3 content:", f3.read())
... 
f3 content: foo
>>> 

I'm assuming the issue is somehow due to the closed f being found in the original session when dump_session or load_session is called. It needs some more investigation. I'm not sure what you are attempting is the intended usage of load_session...
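
If the closed f is indeed the trigger, one way to check a session before dumping it is to look for names that are still bound to file objects. A rough sketch, using a plain dict as a stand-in for the session namespace (find_file_objects is a hypothetical helper, not part of dill):

```python
import io
import os
import tempfile


def find_file_objects(namespace):
    """Return sorted names bound to file objects, open or closed."""
    return sorted(
        name for name, value in namespace.items()
        if isinstance(value, io.IOBase)
    )


path = os.path.join(tempfile.mkdtemp(), "a.csv")
with open(path, "w") as f:
    f.write("foo")

# Stand-in for the interpreter session; f is closed but still bound.
session = {"f": f, "x": 1}
leftover = find_file_objects(session)
for name in leftover:
    del session[name]

print(leftover, find_file_objects(session))
```

In a real session you would pass globals() instead of the stand-in dict and delete the reported names before dump_session, which is the same thing the `del f` in the first transcript does by hand.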

Set dill.settings["fmode"] = dill.CONTENTS_FMODE before calling dill.dump_session("session.pkl"); then the file is not truncated when the file object is loaded. I also tested this with dill.dump_module and dill.load_module. This is my code:

import dill

if __name__ == '__main__':
    with open("a.csv", "w") as f:
        f.write("foo")
    # Pickle file objects in contents mode so loading does not truncate a.csv
    dill.settings["fmode"] = dill.CONTENTS_FMODE
    dill.dump_module("session.pkl")
    with open("a.csv", "r") as f2:
        print("f2 content:", f2.read())
    dill.load_module("session.pkl")
    with open("a.csv", "r") as f3:
        print("f3 content:", f3.read())
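
The workaround above changes dill.settings for the rest of the process. If that is not wanted, the setting could be changed only around the dump and then restored. A small sketch of that idea, where temporary_setting is a hypothetical helper and a plain dict with placeholder values stands in for dill.settings so the snippet runs without dill:

```python
from contextlib import contextmanager


@contextmanager
def temporary_setting(settings, key, value):
    """Set settings[key] to value inside the block, then restore it."""
    missing = object()
    previous = settings.get(key, missing)
    settings[key] = value
    try:
        yield settings
    finally:
        if previous is missing:
            del settings[key]
        else:
            settings[key] = previous


# Stand-in dict with placeholder values. With dill, this would be
# dill.settings and the new value would be dill.CONTENTS_FMODE.
settings = {"fmode": "handle"}
with temporary_setting(settings, "fmode", "contents"):
    inside = settings["fmode"]  # the dump call would go here
after = settings["fmode"]

print(inside, after)
```

With dill, the call inside the with block would be dill.dump_session or dill.dump_module, as in the code above.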

I'm going to close this issue. Feel free to reopen if the above isn't sufficient.