ONSdigital/csvcubed

Implement pandas v2.1.1


Upgrading to pandas v2.1.0 raises the following Pyright errors:

/home/runner/work/csvcubed/csvcubed/src/csvcubed/inspect/sparql_handler/data_cube_repository.py
  /home/runner/work/csvcubed/csvcubed/src/csvcubed/inspect/sparql_handler/data_cube_repository.py:534:21 - error: Argument of type "ArrayLike | Unknown | Any" cannot be assigned to parameter "maybe_columnar_data" of type "PandasDataTypes" in function "pandas_input_to_columnar_optional_str"
    Type "ArrayLike | Unknown | Any" cannot be assigned to type "PandasDataTypes"
      Type "ExtensionArray" cannot be assigned to type "PandasDataTypes"
        "ExtensionArray" is incompatible with "DataFrame"
        "ExtensionArray" is incompatible with "Series"
        Type cannot be assigned to type "None" (reportGeneralTypeIssues)
/home/runner/work/csvcubed/csvcubed/src/csvcubed/utils/pandas.py
  /home/runner/work/csvcubed/csvcubed/src/csvcubed/utils/pandas.py:41:19 - error: Argument of type "Set[str]" cannot be assigned to parameter "na_values" of type "Sequence[str] | Mapping[str, Sequence[str]] | None" in function "read_csv"
    Type "Set[str]" cannot be assigned to type "Sequence[str] | Mapping[str, Sequence[str]] | None"
      "Set[str]" is incompatible with "Sequence[str]"
      "Set[str]" is incompatible with "Mapping[str, Sequence[str]]"
      Type cannot be assigned to type "None" (reportGeneralTypeIssues)
/home/runner/work/csvcubed/csvcubed/src/csvcubed/utils/qb/validation/observations.py
  /home/runner/work/csvcubed/csvcubed/src/csvcubed/utils/qb/validation/observations.py:134:78 - error: Cannot access member "isna" for type "ndarray[Any, Unknown]"
    Member "isna" is unknown (reportGeneralTypeIssues)
  /home/runner/work/csvcubed/csvcubed/src/csvcubed/utils/qb/validation/observations.py:134:78 - error: Cannot access member "isna" for type "NDArray[Unknown]"
    Member "isna" is unknown (reportGeneralTypeIssues)
  /home/runner/work/csvcubed/csvcubed/src/csvcubed/utils/qb/validation/observations.py:134:78 - error: Cannot access member "isna" for type "NDArray[Any]"
    Member "isna" is unknown (reportGeneralTypeIssues)
  /home/runner/work/csvcubed/csvcubed/src/csvcubed/utils/qb/validation/observations.py:141:62 - error: Cannot access member "index" for type "ndarray[Any, Unknown]"
    Member "index" is unknown (reportGeneralTypeIssues)
  /home/runner/work/csvcubed/csvcubed/src/csvcubed/utils/qb/validation/observations.py:141:62 - error: Cannot access member "index" for type "NDArray[Unknown]"
    Member "index" is unknown (reportGeneralTypeIssues)

See the "Other API changes" section of the pandas release notes for potential ideas to fix the Pyright errors.

Once these are fixed, revert the test_in_environments jobs in pull-request.yaml, release.yaml and main-push.yaml to pandas-version: ['pandas@latest'].

Update 25/9/23

  • pandas v2.1.1 has already been released, so I've updated to this version.
  • Added np.ndarray and pd.api.extensions.ExtensionArray to PandasDataTypes in inputs.py. This resolved the Type "ArrayLike | Unknown | Any" cannot be assigned to type "PandasDataTypes" and Type "ExtensionArray" cannot be assigned to type "PandasDataTypes" errors.
  • Converted SPECIFIED_NA_VALUES to a List in utils/pandas.py, and changed the type of na_values in def read_csv() to Sequence.
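
The two changes above can be sketched roughly as follows. This is an illustration only, not the actual csvcubed code: the exact members of PandasDataTypes, the contents of SPECIFIED_NA_VALUES and the helper name read_csv_sketch are assumptions made for the example.

```python
import io
from typing import List, Sequence, Union

import numpy as np
import pandas as pd

# Assumed shape of the alias in inputs.py after the change: the two new
# members let Pyright accept values typed as ArrayLike or ExtensionArray.
PandasDataTypes = Union[
    pd.DataFrame,
    pd.Series,
    np.ndarray,
    pd.api.extensions.ExtensionArray,
    None,
]

# Illustrative values only; the real SPECIFIED_NA_VALUES is not shown in this issue.
# A list (rather than a set) satisfies the Sequence[str] annotation that the
# pandas stubs declare for read_csv's na_values parameter.
SPECIFIED_NA_VALUES: List[str] = ["", "NA"]


def read_csv_sketch(
    csv, na_values: Sequence[str] = SPECIFIED_NA_VALUES
) -> pd.DataFrame:
    """Hypothetical stand-in for the read_csv wrapper in utils/pandas.py."""
    return pd.read_csv(csv, na_values=na_values)


df = read_csv_sketch(io.StringIO("a,b\n1,NA\n2,x\n"))
missing_mask = df["b"].isna().tolist()
```

Here the "NA" cell in column b is read as a missing value, so only the first row of that column is flagged by isna().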

Still outstanding:

  • In utils/qb/validation/observations.py, there is still no solution for Cannot access member "isna" for type "ndarray[Any, Unknown]" and Cannot access member "index" for type "ndarray[Any, Unknown]" in _validate_missing_observation_values (L115).
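
One possible direction for the outstanding errors, not tested against csvcubed itself, is to narrow the value to pd.Series with an isinstance check before calling isna or reading index, so that Pyright no longer considers the ndarray branch of the union. The function name and column name below are hypothetical and do not reflect the real body of _validate_missing_observation_values.

```python
from typing import Any, List

import pandas as pd


def rows_with_missing_values(data: pd.DataFrame, column: str) -> List[Any]:
    """Return the index labels of rows whose value in `column` is missing."""
    maybe_series = data[column]

    # Explicit narrowing: after this check a type checker treats the value
    # as a Series, so .isna() and .index are known members.
    if not isinstance(maybe_series, pd.Series):
        raise TypeError(f"Expected column '{column}' to be a pandas Series.")

    # Convert the boolean mask to a NumPy array before indexing the Index.
    missing_mask = maybe_series.isna().to_numpy()
    return maybe_series.index[missing_mask].tolist()


example = pd.DataFrame({"obs": [1.0, None, 3.0, None]})
missing_rows = rows_with_missing_values(example, "obs")
```

Whether this removes the errors depends on where the ndarray type enters the real code; if the value genuinely can be an array at runtime, wrapping it with pd.Series(...) would be the alternative to consider.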