rapidsai/cugraph-pg

`data_store/test_property_graph_mg.py::test_vertex_vector_property` shows `RuntimeError` in worker with `-s`


`data_store/test_property_graph_mg.py::test_vertex_vector_property` passes, but it should not, because a worker throws an exception.

We should fix the problem that results in the exception, and ideally also fix the test so that it does not pass when a worker fails this way.

`data_store/test_property_graph_mg.py::test_vertex_vector_property` shows a `RuntimeError` in a worker when pytest is run with `-s`. This is normal behavior, but to a user unfamiliar with how the test is written, it looks like a worker is failing and the test is incorrectly passing. The command to run the test (on a 2-GPU workstation) and its output:

```
(rapids) root@c0355200a652:/Projects/cugraph# pytest -sv python/cugraph/cugraph/tests/data_store/test_property_graph_mg.py::test_vertex_vector_property
=================================================================================== test session starts ===================================================================================
platform linux -- Python 3.10.11, pytest-7.3.1, pluggy-1.0.0 -- /opt/conda/envs/rapids/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/Projects/cugraph/.hypothesis/examples')
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=1 min_time=0.000005 max_time=0 calibration_precision=10 warmup=False warmup_iterations=100000)
rapids_pytest_benchmark: 0.0.15
rootdir: /Projects/cugraph/python/cugraph
configfile: pytest.ini
plugins: xdist-3.3.1, asyncio-0.20.3, cov-4.0.0, timeout-2.1.0, anyio-3.6.2, hypothesis-6.75.3, cases-3.6.14, benchmark-4.0.0, rapids-pytest-benchmark-0.0.15
asyncio: mode=strict
collected 1 item

python/cugraph/cugraph/tests/data_store/test_property_graph_mg.py::test_vertex_vector_property 2023-06-02 18:02:30,317 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2023-06-02 18:02:30,317 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2023-06-02 18:02:30,317 - distributed.preloading - INFO - Creating preload: dask_cuda.initialize
2023-06-02 18:02:30,317 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize

Dask client/cluster created using LocalCUDACluster
2023-06-02 18:02:45,843 - distributed.worker - WARNING - Compute Failed
Key:       ('_vector_series_to_array_partition-ea4ea07252e113b475f9f2be21958c8c', 3, 0)
Function:  subgraph_callable-9dde3b9b-4235-41a1-95d4-97380b05
args:      (3, None, 'error', 'getitem-3fffde68d8f35d9cafe178e3984f82cb', 'vec1', 'reset_index-index-8c0c5f2355f565a5cfe79bdad82c4bb8', 'index',          merchant_sales merchant_name _TYPE_  vec1  vertical        vec2
_VERTEX_
32431              <NA>          <NA>  users  None         1  [78750, 1]
78634              <NA>          <NA>  users  None         0  [47906, 0]
89021              <NA>          <NA>  users  None         0  [78757, 0]
89216              <NA>          <NA>  users  None         1  [78757, 1], 'fillna-233c2fa9633293bed6cc617234c5bd0e')
kwargs:    {}
Exception: 'RuntimeError(\'Vector property \\\'vec1\\\' has empty rows! Provide a fill value or use `missing="ignore"` to ignore empty rows.\')'

PASSED
Dask client closed.
```
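
For context, the exception comes from converting the `vec1` vector property to an array when some vertices never received that property. The sketch below assumes the single-GPU `PropertyGraph` API, with method and argument names inferred from the error message (they may not match the real signatures exactly); it shows how the error arises and the two workarounds the message suggests. The MG test exercises the same path through dask_cudf.

```python
import cudf
from cugraph.experimental import PropertyGraph  # import path is an assumption

pG = PropertyGraph()

# One vertex type gets the "vec1" vector property...
pG.add_vertex_data(
    cudf.DataFrame({"id": [0, 1], "a": [10, 20], "b": [30, 40]}),
    vertex_col_name="id",
    type_name="merchants",
    vector_properties={"vec1": ["a", "b"]},
)
# ...and another does not, so its "vec1" rows are empty (None), as in the
# dataframe dumped in the worker log above.
pG.add_vertex_data(
    cudf.DataFrame({"id": [2, 3], "c": [50, 60]}),
    vertex_col_name="id",
    type_name="users",
)

df = pG.get_vertex_data()

# Default (missing="error"): raises
#   RuntimeError: Vector property 'vec1' has empty rows! ...
# which is what the worker reports as "Compute Failed" in the MG case.
# pG.vertex_vector_property_to_array(df, "vec1")

# Workaround 1: provide a fill value for the empty rows
# (a per-element fill; it may need to match the vector length).
arr = pG.vertex_vector_property_to_array(df, "vec1", fill_value=0)

# Workaround 2: drop the empty rows instead of raising.
arr = pG.vertex_vector_property_to_array(df, "vec1", missing="ignore")
```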

I tracked this down to the lines in the test that assert the call raises `RuntimeError`. The test output shows the worker failing as intended, and the test passes because it expects this.
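
For reference, the pattern being described looks roughly like this (an illustrative sketch, not the actual lines from `test_property_graph_mg.py`; the names `mpG` and `df` are stand-ins):

```python
import pytest

# Illustrative sketch only. In the real test, the property graph is an
# MGPropertyGraph backed by the LocalCUDACluster and the data is dask_cudf.
def check_empty_rows_raise(mpG, df):
    with pytest.raises(RuntimeError):
        # The empty-row error is raised inside a dask worker and re-raised on
        # the client when the result is materialized, so pytest.raises catches
        # it here even though the worker also logs it as "Compute Failed"
        # (the noise seen with -s).
        mpG.vertex_vector_property_to_array(df, "vec1")
```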

@eriknw - should this issue be closed, or do you think we need to change anything?

discussed this offline, copied with permission:

Erik: Confirmed, it looks like this is deliberately tested. It's unfortunate that the output shown with `-s` is messy because of this; I wonder if a warning would be better than raising a `RuntimeError`?

Rick: I was thinking that too, since I use `RuntimeError` as more of a catch-all for things the user has no control over.
I don't know if this is worth a PR, but another improvement could be to add prints around those assertions saying something along the lines of "testing missing rows, expect a RuntimeError from worker(s)", followed by "done testing missing rows, exceptions should no longer be seen". These would only be shown when they are needed (i.e., when `-s` is used).
Yet another option could be to make those assertions separate tests with self-explanatory names, although I doubt that would work as well as a print.
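
A minimal sketch of that print idea (hypothetical; the assertion shown is a stand-in for the actual lines in the test):

```python
import pytest

# Hypothetical sketch of the suggestion above, not the actual test code.
def check_empty_rows_raise(mpG, df):
    # These prints are only visible when pytest is run with -s, which is
    # exactly when the misleading worker traceback appears.
    print("Testing missing rows: expect a RuntimeError from the worker(s).")
    with pytest.raises(RuntimeError):
        mpG.vertex_vector_property_to_array(df, "vec1")
    print("Done testing missing rows: worker exceptions should no longer be seen.")
```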

Rick: It's tricky because the context here is a failing MG test suite, with users just scanning the logs for any traceback or error as a starting point. I think a print would help, but only if the user is reading closely.
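
For completeness, here is a sketch of the warn-instead-of-raise idea Erik floated above, written with plain Python/NumPy stand-ins rather than the real cugraph internals (entirely hypothetical):

```python
import warnings

import numpy as np

def vector_rows_to_array(rows, fill_value=None, missing="warn"):
    """Convert a list of equal-length vectors (some possibly None) to a 2D array."""
    if fill_value is not None:
        # Replace empty rows with the fill value.
        length = next(len(r) for r in rows if r is not None)
        rows = [r if r is not None else [fill_value] * length for r in rows]
    elif any(r is None for r in rows):
        if missing == "error":
            # Current behavior: the RuntimeError seen in the worker log above.
            raise RuntimeError("Vector property has empty rows!")
        # The alternative: warn (still visible with -s, but much less alarming
        # than a worker traceback) and drop the empty rows.
        warnings.warn("Vector property has empty rows; ignoring them.")
        rows = [r for r in rows if r is not None]
    return np.array(rows)

print(vector_rows_to_array([[1, 2], None, [3, 4]]))  # warns, returns a 2x2 array
```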