Project-MONAI/tutorials

Missing file in `FileNotFoundError in profiling_camelyon_pipeline.ipynb`

KumoLiu opened this issue · 5 comments

08:55:28  Running ./pathology/tumor_detection/ignite/profiling_camelyon_pipeline.ipynb
08:55:28  Checking PEP8 compliance...
08:55:29  Running notebook...
08:55:37  MONAI version: 1.3.0+57.gd7137cf4
08:55:37  Numpy version: 1.22.2
08:55:37  Pytorch version: 2.1.0a0+29c30b1
08:55:37  MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
08:55:37  MONAI rev id: d7137cf410617f4ec54c621d9511e982855a2892
08:55:37  MONAI __file__: /home/jenkins/agent/workspace/Monai-notebooks/MONAI/monai/__init__.py
08:55:37  
08:55:37  Optional dependencies:
08:55:37  Pytorch Ignite version: 0.4.11
08:55:37  ITK version: 5.3.0
08:55:37  Nibabel version: 5.2.0
08:55:37  scikit-image version: 0.22.0
08:55:37  scipy version: 1.11.1
08:55:37  Pillow version: 9.2.0
08:55:37  Tensorboard version: 2.9.0
08:55:37  gdown version: 4.7.1
08:55:37  TorchVision version: 0.16.0a0
08:55:37  tqdm version: 4.65.0
08:55:37  lmdb version: 1.4.1
08:55:37  psutil version: 5.9.4
08:55:37  pandas version: 1.5.2
08:55:37  einops version: 0.6.1
08:55:37  transformers version: 4.36.2
08:55:37  mlflow version: 2.9.2
08:55:37  pynrrd version: 1.0.0
08:55:37  clearml version: 1.14.0rc0
08:55:37  
08:55:37  For details about installing the optional dependencies, please visit:
08:55:37      https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies
08:55:37  
08:55:39  papermill  --progress-bar -k python3
08:55:40  /usr/local/lib/python3.10/dist-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
08:55:40    warnings.warn(
08:56:12  
Executing:   0%|          | 0/19 [00:00<?, ?cell/s]
Executing:   5%|▌         | 1/19 [00:01<00:27,  1.50s/cell]
Executing:  32%|███▏      | 6/19 [00:20<00:46,  3.56s/cell]
Executing:  42%|████▏     | 8/19 [00:28<00:40,  3.65s/cell]
Executing:  53%|█████▎    | 10/19 [00:30<00:25,  2.80s/cell]
Executing:  53%|█████▎    | 10/19 [00:32<00:29,  3.23s/cell]
08:56:12  /usr/local/lib/python3.10/dist-packages/papermill/iorw.py:153: UserWarning: the file is not specified with any extension : -
08:56:12    warnings.warn(
08:56:12  Traceback (most recent call last):
08:56:12    File "/usr/local/bin/papermill", line 8, in <module>
08:56:12      sys.exit(papermill())
08:56:12    File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
08:56:12      return self.main(*args, **kwargs)
08:56:12    File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
08:56:12      rv = self.invoke(ctx)
08:56:12    File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
08:56:12      return ctx.invoke(self.callback, **ctx.params)
08:56:12    File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
08:56:12      return __callback(*args, **kwargs)
08:56:12    File "/usr/local/lib/python3.10/dist-packages/click/decorators.py", line 33, in new_func
08:56:12      return f(get_current_context(), *args, **kwargs)
08:56:12    File "/usr/local/lib/python3.10/dist-packages/papermill/cli.py", line 254, in papermill
08:56:12      execute_notebook(
08:56:12    File "/usr/local/lib/python3.10/dist-packages/papermill/execute.py", line 134, in execute_notebook
08:56:12      raise_for_execution_errors(nb, output_path)
08:56:12    File "/usr/local/lib/python3.10/dist-packages/papermill/execute.py", line 241, in raise_for_execution_errors
08:56:12      raise error
08:56:12  papermill.exceptions.PapermillExecutionError: 
08:56:12  ---------------------------------------------------------------------------
08:56:12  Exception encountered at "In [3]":
08:56:12  ---------------------------------------------------------------------------
08:56:12  AttributeError                            Traceback (most recent call last)
08:56:12  Cell In[3], line 4
08:56:12        2 dataset_url = "https://drive.google.com/uc?id=1uWS4CXKD-NP_6-SgiQbQfhFMzbs0UJIr"
08:56:12        3 dataset_path = "training.csv"
08:56:12  ----> 4 gdown.download(dataset_url, dataset_path, quiet=False)
08:56:12        6 # Download images
08:56:12        7 # by default the images expect to be under training/images/
08:56:12        8 image_dir = os.path.join("training", "images", "")
08:56:12  
08:56:12  File /usr/local/lib/python3.10/dist-packages/gdown/download.py:259, in download(url, output, quiet, proxy, speed, use_cookies, verify, id, fuzzy, resume, format)
08:56:12      255     content_disposition = six.moves.urllib_parse.unquote(
08:56:12      256         res.headers["Content-Disposition"]
08:56:12      257     )
08:56:12      258     m = re.search(r"filename\*=UTF-8''(.*)", content_disposition)
08:56:12  --> 259     filename_from_url = m.groups()[0]
08:56:12      260     filename_from_url = filename_from_url.replace(osp.sep, "_")
08:56:12      261 else:
08:56:12  
08:56:12  AttributeError: 'NoneType' object has no attribute 'groups'

Same as #1598.
The link to the dataset appears to be broken: it points to the wrong .txt file.
Hi Stephen, could you please help take a look at this? If the link is changed, I could help update it. Thanks!
@aylward

This may not be a permission problem but rather the issue described here.
I tried again locally, and the file downloaded successfully.

The gdrive files used in that jupyter notebook are set so that anyone with the link can view them. See figure below.

The strange thing is that the file being downloaded (https://drive.google.com/uc?id=1uWS4CXKD-NP_6-SgiQbQfhFMzbs0UJIr) is only 149 KB: the small annotation file. The issue linked by @KumoLiu seems more related to large collections of files, although perhaps it can also arise from messages accumulating in the cookie over repeated downloads across multiple sessions rather than from a single large download? Interesting. Glad it works locally. Might have been a glitch in the matrix :)

[image: Google Drive sharing settings for the files]

Is there something wrong with the assumptions about what the "Content-Disposition" value is in the response headers? Playing around with requests, the value I get for that header entry is 'attachment; filename="tumor_091.annotation.txt"'. This doesn't appear to match the regex pattern here in the library. This has already been raised here on gdown itself.
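
To make the mismatch concrete, here is a minimal sketch that applies the pattern quoted in the traceback to the header value reported above. The helper `filename_from_content_disposition` is hypothetical (it is not part of gdown); it only illustrates one more tolerant way to read the header, using the standard library's `email.message.Message`.

```python
import re
from email.message import Message

# Header value reported in this thread for the annotation file.
header = 'attachment; filename="tumor_091.annotation.txt"'

# Pattern quoted in the traceback (gdown/download.py, line 258).
strict = re.search(r"filename\*=UTF-8''(.*)", header)
print(strict)  # None, so calling .groups() on it raises AttributeError


def filename_from_content_disposition(value):
    """Hypothetical helper: return the filename parameter, or None if absent."""
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_filename()


print(filename_from_content_disposition(header))  # tumor_091.annotation.txt
```

The header here carries a plain `filename="..."` parameter rather than the `filename*=UTF-8''...` form the pattern expects, which would explain the `'NoneType' object has no attribute 'groups'` error in the log.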

Pinned the gdown version as a workaround; closing for now.
Project-MONAI/MONAI#7384