sign-language-processing/datasets

ImportError: "attempted relative import with no known parent package" when attempting to locally test rwth_phoenix2014_t

Closed this issue · 19 comments

I thought I'd try the testing instructions from https://tensorflow.google.cn/datasets/add_dataset?hl=en#unit-test_your_dataset on a known-correct dataset, rwth_phoenix2014_t. However, I keep getting errors like:

> tfds build
(log truncated)
sign-language\datasets\sign_language_datasets\datasets\rwth_phoenix2014_t\rwth_phoenix2014_t.py", line 9, in <module>
 from ..warning import dataset_warning
ImportError: attempted relative import with no known parent package

Basically I've been trying to figure out how to locally test a dataset, for #29. The README for this repo at https://github.com/sign-language-processing/datasets?tab=readme-ov-file#adding-a-new-dataset describes the basic idea of how to create and register a new dataset. tl;dr you follow this guide: https://tensorflow.google.cn/datasets/add_dataset?hl=en#default_template_tfds_new. But I wanted to try and find a way to test locally, so I thought I'd try it on a dataset I know to be correct, which is when I encountered this error.

If I adjust the code to remove relative imports, replacing them with absolute ones, e.g. from sign_language_datasets.datasets.warning import dataset_warning, then I am able to run the tfds build command without getting this error. I'm not yet sure whether python rwth_phoenix2014_t_test.py will work, as the tfds build command estimates it will take 4 hours to finish.
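One common way to hit this ImportError is running a file that lives inside a package as a standalone script, so Python has no parent package against which to resolve `..`. The sketch below reproduces that with a throwaway package. The names `demo_pkg` and `my_dataset` are made up for illustration, and whether `tfds build` loads the dataset file in this way is an assumption, not something confirmed in this thread.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_demo():
    """Build a throwaway package and run one of its modules two different ways."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg = root / "demo_pkg"      # hypothetical package name
        sub = pkg / "my_dataset"     # hypothetical dataset folder
        sub.mkdir(parents=True)
        (pkg / "__init__.py").write_text("")
        (pkg / "warning.py").write_text("def dataset_warning():\n    return 'ok'\n")
        (sub / "__init__.py").write_text("")
        (sub / "my_dataset.py").write_text(
            "from ..warning import dataset_warning\nprint(dataset_warning())\n"
        )

        # 1) Run the file directly: there is no parent package, so '..' cannot resolve.
        as_script = subprocess.run(
            [sys.executable, str(sub / "my_dataset.py")],
            capture_output=True, text=True, cwd=root,
        )
        # 2) Run it as a module of the package: the relative import resolves.
        as_module = subprocess.run(
            [sys.executable, "-m", "demo_pkg.my_dataset.my_dataset"],
            capture_output=True, text=True, cwd=root,
        )
    return as_script, as_module


as_script, as_module = run_demo()
print(as_script.stderr.strip().splitlines()[-1])
print(as_module.stdout.strip())
```

Switching to absolute imports, as described above, sidesteps the problem because absolute imports do not depend on how the file was launched, only on the package being importable.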

The same issue occurs if I attempt to run this in dgs_corpus.

tfds --version
TensorFlow Datasets: 4.9.4+nightly

pip list shows:

pip list
Package                      Version
---------------------------- ---------------------
absl-py                      1.4.0
astunparse                   1.6.3
cachetools                   5.3.2
certifi                      2023.11.17
charset-normalizer           3.3.2
click                        8.1.7
colorama                     0.4.6
dill                         0.3.8
dm-tree                      0.1.8
docopt                       0.6.2
etils                        1.6.0
flatbuffers                  23.5.26
fsspec                       2023.12.2
gast                         0.5.4
google-auth                  2.27.0
google-auth-oauthlib         1.2.0
google-pasta                 0.2.0
googleapis-common-protos     1.62.0
grpcio                       1.60.1
h5py                         3.10.0
idna                         3.6
importlib-resources          6.1.1
keras                        2.15.0
langcodes                    3.3.0
language-data                1.1
libclang                     16.0.6
marisa-trie                  0.7.8
Markdown                     3.5.2
MarkupSafe                   2.1.4
ml-dtypes                    0.2.0
numpy                        1.26.3
oauthlib                     3.2.2
opencv-python                4.5.5.64
opt-einsum                   3.3.0
packaging                    23.2
pandas                       2.2.0
pillow                       10.2.0
pip                          23.3.1
pose_format                  0.3.2
promise                      2.3
protobuf                     3.20.3
psutil                       5.9.8
pyarrow                      15.0.0
pyasn1                       0.5.1
pyasn1-modules               0.3.0
pympi-ling                   1.70.2
python-dateutil              2.8.2
python-dotenv                1.0.1
pytz                         2023.4
requests                     2.31.0
requests-oauthlib            1.3.1
rsa                          4.9
scipy                        1.12.0
setuptools                   68.2.2
sign-language-datasets       0.2.0
six                          1.16.0
tensorboard                  2.15.1
tensorboard-data-server      0.7.2
tensorflow                   2.15.0
tensorflow-estimator         2.15.0
tensorflow-intel             2.15.0
tensorflow-io-gcs-filesystem 0.31.0
tensorflow-metadata          1.14.0
termcolor                    2.4.0
tfds-nightly                 4.9.4.dev202402010044
toml                         0.10.2
tqdm                         4.66.1
typing_extensions            4.9.0
tzdata                       2023.4
urllib3                      2.2.0
webvtt-py                    0.4.6
Werkzeug                     3.0.1
wheel                        0.41.2
wrapt                        1.14.1
zipp                         3.17.0

I am on Windows 11, using Anaconda, with pip installed inside the conda environment.

Looks like you are running the tests using python instead of pytest.
Please try using pytest instead.

I will say that in this repository, tests have been widely neglected, due to frequent changes in the dataset creations, and the need to store a small, dummy file for different dataset features.

In our other repositories, you will see tests are running in CI on every commit.

I'll try pytest!

Commands tried so far are based on the instructions in https://tensorflow.google.cn/datasets/add_dataset?hl=en#unit-test_your_dataset, which gives two commands that I noticed:

  • tfds build
  • python my_dataset_test.py

I attempted using pytest and pip-installed pytest-8.0.1 into my conda env. I'm rusty on pytest, so I tried a few commands and looked at https://realpython.com/pytest-python-testing/#how-to-install-pytest.

All of these gave me an identical error:

  • pytest rwth_phoenix2014_t_test.py
  • pytest . (in <where I cloned the repo>\datasets\sign_language_datasets\datasets\rwth_phoenix2014_t)
  • pytest . (in the parent of that folder)
  • in the parent of that folder


OK, now I get 22 errors from pytest, but we are making progress!

OK, digging in a bit deeper, here's an example stack trace:

_____________________________ ERROR collecting sign_language_datasets/datasets/rwth_phoenix2014_t/rwth_phoenix2014_t_test.py ______________________________
..\..\..\miniconda3\envs\jw_sign_create\Lib\importlib\__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
<frozen importlib._bootstrap>:1204: in _gcd_import
    ???
<frozen importlib._bootstrap>:1176: in _find_and_load
    ???
<frozen importlib._bootstrap>:1126: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1204: in _gcd_import
    ???
<frozen importlib._bootstrap>:1176: in _find_and_load
    ???
<frozen importlib._bootstrap>:1126: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
    ???
<frozen importlib._bootstrap>:1204: in _gcd_import
    ???
<frozen importlib._bootstrap>:1176: in _find_and_load
    ???
<frozen importlib._bootstrap>:1147: in _find_and_load_unlocked
    ???
<frozen importlib._bootstrap>:690: in _load_unlocked
    ???
<frozen importlib._bootstrap_external>:940: in exec_module
    ???
<frozen importlib._bootstrap>:241: in _call_with_frames_removed
    ???
sign_language_datasets\datasets\__init__.py:1: in <module>
    from .aslg_pc12 import AslgPc12
sign_language_datasets\datasets\aslg_pc12\__init__.py:3: in <module>
    from .aslg_pc12 import AslgPc12
sign_language_datasets\datasets\aslg_pc12\aslg_pc12.py:26: in <module>
    class AslgPc12(tfds.core.GeneratorBasedBuilder):
<frozen abc>:106: in __new__
    ???
..\..\..\miniconda3\envs\jw_sign_create\Lib\site-packages\tensorflow_datasets\core\registered.py:209: in __init_subclass__
    raise ValueError(f'Dataset with name {cls.name} already registered.')
E   ValueError: Dataset with name aslg_pc12 already registered.

Apparently that originates from this file: https://github.com/sign-language-processing/datasets/blob/master/sign_language_datasets/datasets/__init__.py.

When this file gets imported, does that trigger a "register" process of some kind?

Possibly related to tensorflow/datasets#552

Another possible issue: I have sign_language_datasets installed via pip, not from source. There may be some interference between the installed copy and the cloned source tree.
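One way to check for that kind of interference is to ask Python where it would load a package from. This is a small sketch using only the standard library. The helper name `package_origin` is made up, and `json` is used only so the snippet runs anywhere; inside the environment in question you would pass "sign_language_datasets" instead.

```python
import importlib.util


def package_origin(package_name):
    """Return the file Python would load for a top-level package, or None if it is not importable."""
    spec = importlib.util.find_spec(package_name)
    return None if spec is None else spec.origin


print(package_origin("json"))
```

A path under site-packages would suggest the pip-installed copy is being picked up, while a path inside the cloned repo would suggest the source tree is.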

Trying with source installation:

conda create -n sign_language_datasets_source pip python=3.10 # if I do 3.11 on Windows then there's no compatible tensorflow
conda activate sign_language_datasets_source
# navigate to the repo
git pull # to make sure it's up to date
python -m pip install . # python -m pip ensures we're using the pip inside the conda env
python -m pip install pytest pytest-cov
pytest

I ran this and got a bunch of errors about the missing "dill" package.

python -m pip install dill # https://pypi.org/project/dill/
pytest

OK, I am finally getting useful errors out of pytest.

Submitted a pull request to fix the errors preventing the pytest tests from running.

I believe the original question is answered. I will separately make an issue/pull request with instructions on how to test a new dataset.

OK, I tried doing the following and I'm getting this again: #53 (comment)

conda create -n sign_language_datasets_source pip python=3.10 # if I do 3.11 on Windows then there's no compatible tensorflow
conda activate sign_language_datasets_source 
# navigate to the repo
git pull # to make sure it's up to date
python -m pip install . # python -m pip ensures we're using the pip inside the conda env
python -m pip install pytest pytest-cov dill
pytest .

How come I'm getting the "already registered" error again? I'm very confused.

A guess out of the blue, without actually testing it: if one dataset imports from another dataset, the class could end up being declared twice.
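That guess can be illustrated without TFDS: if the same source file ends up loaded under two different module names, its class body runs twice and yields two distinct class objects. The sketch below shows this with a temporary file. The module names `my_dataset` and `some_pkg.my_dataset` and the helper `load_as` are made up, and it is only an assumption that something similar happens in this repo.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_as(module_name, path):
    """Execute the file at `path` as a module called `module_name`."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_dataset.py"
    path.write_text("class MyDataset:\n    pass\n")
    # Same file, two different module names: the class statement runs each time.
    first = load_as("my_dataset", path)
    second = load_as("some_pkg.my_dataset", path)

same_class = first.MyDataset is second.MyDataset
print(same_class)  # False: two separate class objects were created
```

With a self-registering base class, the second execution of the class body is the point at which an "already registered" error would be raised.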