remove_source does not remove records from dataset.json
Hello,
I add an image source and then want to delete it before adding another image with the same source name. I remove the source in the following way:
dataset_metadata = mobie.metadata.read_dataset_metadata(dataset_folder)
sources = dataset_metadata["sources"]
if source_name in sources:
    print("Source already exists, remove before adding again")
    mobie.remove_source(dataset_folder, source_name, remove_data=True)
Adding it again with add_image then fails with the following error:
Traceback (most recent call last):
File "/g/kreshuk/buglakova/projects/platy_registration/pipeline_steps/platybrowser_export/add_image.py", line 88, in <module>
main()
File "/g/kreshuk/buglakova/projects/platy_registration/pipeline_steps/platybrowser_export/add_image.py", line 66, in main
mobie.remove_source(dataset_folder, args.source_name, remove_data=True)
File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/source_utils.py", line 97, in remove_source
_remove_image_data(storage_type, os.path.join(dataset_folder, rel_path))
File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/source_utils.py", line 13, in _remove_image_data
rmtree(data_path) if storage_type.endswith("n5") else os.remove(data_path)
^^^^^^^^^^^^^^^^^
File "/home/buglakov/miniconda3/envs/mobie/lib/python3.11/shutil.py", line 722, in rmtree
onerror(os.lstat, path, sys.exc_info())
File "/home/buglakov/miniconda3/envs/mobie/lib/python3.11/shutil.py", line 720, in rmtree
orig_st = os.lstat(path, dir_fd=dir_fd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/g/kreshuk/buglakova/projects/platy_registration/data/platybrowser-smfish-project/data/1.0.1/images/bdv-n5/prox-brn3-sfg_pl3_dapi.n5'
I figured out that dataset.json still has entries for this deleted source, and if I remove them manually, I can add it again. What would be the correct way to completely delete a source?
Another problem is that after adding the source, temporary directories with names like "tmp_dataset_source" remain in the directory where I ran the script. If I delete the source and its entries in dataset.json but this temporary directory is still there, I can't add an image with the same name; it fails with the following error:
Traceback (most recent call last):
File "/g/kreshuk/buglakova/projects/platy_registration/pipeline_steps/platybrowser_export/add_image.py", line 88, in <module>
main()
File "/g/kreshuk/buglakova/projects/platy_registration/pipeline_steps/platybrowser_export/add_image.py", line 69, in main
mobie.add_image(
File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/image_data.py", line 246, in add_image
metadata.add_source_to_dataset(dataset_folder, "image", image_name, image_metadata_path,
File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/metadata/source_metadata.py", line 274, in add_source_to_dataset
source_metadata = get_image_metadata(dataset_folder, image_metadata_path,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/metadata/source_metadata.py", line 192, in get_image_metadata
return _get_image_metadata(dataset_folder, metadata_path, "image",
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/metadata/source_metadata.py", line 164, in _get_image_metadata
file_format = _get_file_format(path) if file_format is None else file_format
^^^^^^^^^^^^^^^^^^^^^^
File "/g/kreshuk/buglakova/libraries/mobie-utils-python/mobie/metadata/source_metadata.py", line 153, in _get_file_format
raise ValueError(f"{path} does not exist.")
ValueError: data/platybrowser-smfish-project/data/1.0.1/images/bdv-n5/prox-brn3-sfg_pl3_dapi.xml does not exist.
This is not such an issue, because I can just delete that directory after adding the source, but it's quite unexpected.
> I figured that the dataset.json still has entries about this deleted source and if I go and remove them manually, I can add it again. What would be the correct way to completely delete a source?
remove_source will remove the entry from dataset.json if it runs through completely. I just checked, and the unit test covers this, so I am pretty sure it works. However, in your case remove_source fails with an error, so the function never reaches the point where the updated metadata is written out. You can see in the source code that this is the last thing that happens in the function.
The function fails because the folder '/g/kreshuk/buglakova/projects/platy_registration/data/platybrowser-smfish-project/data/1.0.1/images/bdv-n5/prox-brn3-sfg_pl3_dapi.n5' that it tries to remove (because of remove_data=True) is not there. You probably have an inconsistent state left over from a previous failed attempt to remove the source. Can you try again with a source where you are sure that the image data is there? Then it should work.
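To illustrate the ordering problem, here is a minimal sketch using only the standard library. It is not the actual mobie code: the function remove_source_sketch, its ignore_missing flag, and the folder layout are hypothetical and only mimic "delete the data first, write the metadata last". It shows why the entry stays in dataset.json when the data folder is already gone, and how skipping a missing folder would let the metadata update go through.

```python
import json
import os
import shutil
import tempfile


def remove_source_sketch(dataset_folder, source_name, rel_data_path, ignore_missing=False):
    """Hypothetical stand-in for the order of operations described above."""
    metadata_path = os.path.join(dataset_folder, "dataset.json")
    with open(metadata_path) as f:
        metadata = json.load(f)

    # Step 1: delete the image data. Unless ignore_missing is set, rmtree
    # raises FileNotFoundError when the folder is already gone.
    data_path = os.path.join(dataset_folder, rel_data_path)
    if os.path.exists(data_path) or not ignore_missing:
        shutil.rmtree(data_path)

    # Step 2 (last): drop the entry and write the metadata back.
    metadata["sources"].pop(source_name, None)
    with open(metadata_path, "w") as f:
        json.dump(metadata, f, indent=2)


def read_sources(dataset_folder):
    """Return the "sources" mapping stored in dataset.json."""
    with open(os.path.join(dataset_folder, "dataset.json")) as f:
        return json.load(f)["sources"]


def demo():
    with tempfile.TemporaryDirectory() as dataset_folder:
        # A dataset whose metadata lists a source, but whose data is missing.
        with open(os.path.join(dataset_folder, "dataset.json"), "w") as f:
            json.dump({"sources": {"my_image": {}}}, f)

        # Strict removal fails in step 1, so step 2 never runs.
        try:
            remove_source_sketch(dataset_folder, "my_image", "images/my_image.n5")
        except FileNotFoundError:
            pass
        stale = "my_image" in read_sources(dataset_folder)

        # Tolerating the missing folder lets the metadata update happen.
        remove_source_sketch(dataset_folder, "my_image", "images/my_image.n5",
                             ignore_missing=True)
        cleaned = "my_image" not in read_sources(dataset_folder)
    return stale, cleaned
```

With the real library, a comparable workaround would be to check whether the data folder still exists before deciding what to pass as remove_data.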
> This is not such an issue, because I can just delete that directory after adding the source, but it's quite unexpected.
Yes, that is true. You have to remove this tmp folder in order to re-add the source. I agree that it's a bit unexpected, but this folder contains important debug information, so I don't want to remove it automatically.
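If you want to do this cleanup explicitly in your own script, a small helper along these lines could work. This is only a sketch: the helper name remove_leftover_tmp is made up, and the folder name pattern tmp_<dataset>_<source> is an assumption based on the name quoted above, so check it against the folder that actually appears in your working directory.

```python
import os
import shutil


def remove_leftover_tmp(workdir, dataset_name, source_name):
    """Delete the leftover temporary folder for one source, if present.

    The name pattern tmp_<dataset>_<source> is an assumption; adjust it to
    match the folder that actually shows up next to your script.
    Returns True if a folder was removed, False if there was nothing to do.
    """
    tmp_folder = os.path.join(workdir, f"tmp_{dataset_name}_{source_name}")
    if os.path.isdir(tmp_folder):
        shutil.rmtree(tmp_folder)
        return True
    return False
```

Calling it right before re-adding a source keeps the deletion an explicit decision, so the debug information is only discarded when you no longer need it.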
I figured out that it happened because I run it with Snakemake and the n5 file was the target of the rule. When the rule is rerun, Snakemake deletes the target file before running any commands. Thanks for the help!