your-tools/pycp

cp --reflink=auto feature

manujchandra opened this issue · 3 comments

I am using Fedora 35 XFCE edition. One thing I have noticed:

I copied a 27 GB folder to another location on the same drive using Thunar, the GUI file manager of XFCE. It took 4 seconds to copy.

If I run sudo btrfs fi du ~

It shows 27 GB is shared. In other words, only metadata was copied, not the actual data.

But if I use pycp on the same folder, it takes a few minutes to copy to the same drive. This means pycp is duplicating the file data, and I can see the free space decrease as well.

I found here that we can use cp --reflink=auto to copy with deduplication.

Is it possible to integrate this behavior of Thunar in pycp where it copies with de-duplication?

Thanks! 🙏🏽
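
For illustration only, one way a Python tool could get this behavior is to shell out to cp --reflink=auto. The sketch below is not pycp code: the helper name copy_with_cp_reflink is hypothetical, and it assumes a GNU cp on PATH, falling back to a plain copy otherwise.

```python
import shutil
import subprocess


def copy_with_cp_reflink(src, dst):
    """Copy src to dst via `cp --reflink=auto` (hypothetical helper).

    Falls back to shutil.copy2 when cp is missing or rejects the flag.
    """
    try:
        # --reflink=auto asks GNU cp to share data blocks when the
        # filesystem supports it, and to do a regular copy otherwise.
        subprocess.run(
            ["cp", "--reflink=auto", "--", src, dst],
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        # No cp on PATH, or a cp that does not understand the flag
        # (for example a non-GNU cp): do a plain copy instead.
        shutil.copy2(src, dst)
```

This only covers single files. A real integration would also need to handle directories, progress reporting, and platforms without cp.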

Is it possible to integrate this behavior [...] in pycp where it copies with de-duplication?

I don't think so. It seems cp --reflink uses some very specific, Linux-only C code to achieve that, and pycp is very high-level and cross-platform.

That being said, feel free to try and write a patch.

I am leaving these links here just for future reference:

https://stackoverflow.com/questions/65492317/copy-file-in-python-with-copy-on-write-cow

https://bugs.python.org/issue37157

I will investigate them later.

Leaving this here for future reference:

https://pypi.org/project/reflink/

from reflink import reflink

# Reflink copy 'large_file.img' to 'copy_of_file.img'
reflink("large_file.img", "copy_of_file.img")
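
As a point of comparison, the same idea can be sketched with only the standard library on Linux, using the FICLONE ioctl. This is a rough sketch under assumptions, not something pycp or the reflink package is known to do: the function name clone_or_copy is hypothetical, the FICLONE constant is the value used on common architectures such as x86 and ARM, and the fcntl module is not available on Windows.

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
# Assumption: this numeric value holds on x86 and ARM; some
# architectures encode ioctl numbers differently.
FICLONE = 0x40049409


def clone_or_copy(src, dst):
    """Try to reflink src to dst; fall back to a regular copy.

    Returns True if the data was cloned, False if it was copied.
    (Hypothetical helper, single files only.)
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the kernel to make dst share the data blocks of src.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # The filesystem does not support reflinks, or src and dst
            # are on different filesystems: copy the bytes instead.
            shutil.copyfileobj(fsrc, fdst)
            return False
```

The fallback keeps the call usable on filesystems without copy-on-write support, which mirrors what cp --reflink=auto does.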