🐛 group_tolerance/clp_link_tolerance appears to be incorrectly applied

Question

jsnel opened this issue 3 years ago · 1 comments

Version information

pyglotaran version: main

Describe the bug

The group_tolerance/clp_link_tolerance appears not to be applied to the first dataset (at least when there is more than one dataset), and for non-overlapping datasets the behavior is unpredictable (possibly for the same reason).

To Reproduce

Steps to reproduce the behavior:

1. Run an example with 1 dataset, with the clp_link_tolerance set to a value (much) larger than the spacing between 2 elements on the global_axis.
2. Run an example with at least 2 datasets, with the clp_link_tolerance set to a value 1.01x larger than the spacing between 2 elements on the global_axis. Additionally, make sure that one dataset contains 1 or more additional elements on the global_axis.

In the specific example showing the issues the datasets have global_axis:
dataset1: 401.025543, 402.901245, 404.941254, 407.021240, ..., 796.960022, 798.969971 (len=159)
dataset2: 400.997772, 402.833344, 404.940002, 407.091125, ..., 796.099976, 798.049988 (len=160)
With clp_link_tolerance=1.9 this fails (other values fail as well, but with different errors).

To test I would simulate data with a fixed offset between the global_axis plus a small random offset, e.g.
np.linspace(400,800,51)+(4*np.random.random_sample((1,51))-2)
np.linspace(404,800,51)+(4*np.random.random_sample((1,51))-2)
with clp_link_tolerance=3.9
which ensures a random grouping across the board.
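
The simulation suggested above could be sketched roughly as follows. This is only an illustration with plain NumPy: the helper name `simulate_axes`, the fixed seed, and the use of 1-D arrays instead of shape `(1, 51)` are assumptions made for this sketch, and the link count is a simple pairwise distance check, not pyglotaran's grouping logic.

```python
import numpy as np


def simulate_axes(n_points=51, jitter=2.0, seed=0):
    """Return two jittered global axes with a fixed offset between them.

    Hypothetical helper for this sketch; not part of pyglotaran.
    """
    rng = np.random.default_rng(seed)
    # Same grids as in the issue text, with a uniform offset in [-jitter, jitter).
    axis1 = np.linspace(400, 800, n_points) + rng.uniform(-jitter, jitter, n_points)
    axis2 = np.linspace(404, 800, n_points) + rng.uniform(-jitter, jitter, n_points)
    return axis1, axis2


axis1, axis2 = simulate_axes()
clp_link_tolerance = 3.9

# Pairwise distances: rows are points of axis1, columns are points of axis2.
pairwise = np.abs(axis1[:, None] - axis2[None, :])

# For each point of axis1, count how many axis2 points fall within the tolerance.
links_per_point = (pairwise <= clp_link_tolerance).sum(axis=1)

# Histogram of link counts (how many axis1 points have 0, 1, 2, ... candidates).
print(np.bincount(links_per_point))
```

Points with zero or more than one candidate within the tolerance are the interesting cases for reproducing the problem.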

Error messages

A non-exhaustive list of error messages:

ValueError: conflicting sizes for dimension 'spectral': length <some_number_larger_than_global_axis_1> on 'matrix' and length <length_global_axis_1> on {'time': 'time', 'spectral': 'spectral', 'left_singular_value_index': 'data_left_singular_vectors', 'singular_value_index': 'data_singular_values', 'right_singular_value_index': 'data_right_singular_vectors'}

    full_index_clp_labels = full_clp_labels[i + offset]
IndexError: list index out of range

    mask = [full_index_clp_labels.index(clp_label) for clp_label in clp_labels]
ValueError: 'c1' is not in list

Expected behavior

If clp_link_tolerance is set, it should affect grouping across all datasets.
If the two datasets do not have equidistant global_axis values, and/or their axes are offset from each other, pyglotaran should not crash.

Screenshots

Additional context

Tracebacks

  File "NTDCK_CK_694_40_20uWQA.py", line 65, in main
    result = optimize(scheme)
  File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 76, in optimize
    return _create_result(
  File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 137, in _create_result
    group.create_result_data(parameter_history, success=success, add_svd=scheme.add_svd)
  File "c:\src\pyglotaran\glotaran\analysis\optimization_group.py", line 327, in create_result_data
    self._calculator.prepare_result_creation()
  File "c:\src\pyglotaran\glotaran\analysis\optimization_group_calculator_linked.py", line 420, in prepare_result_creation
    full_index_clp_labels = full_clp_labels[i + offset]
IndexError: list index out of range

  File "NTDCK_CK_694_40_20uWQA.py", line 65, in main
    result = optimize(scheme)
  File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 76, in optimize
    return _create_result(
  File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 137, in _create_result
    group.create_result_data(parameter_history, success=success, add_svd=scheme.add_svd)
  File "c:\src\pyglotaran\glotaran\analysis\optimization_group.py", line 330, in create_result_data
    result_data[label] = self.create_result_dataset(label, copy=copy)
  File "c:\src\pyglotaran\glotaran\analysis\optimization_group.py", line 345, in create_result_dataset      
    dataset = self._calculator.create_index_dependent_result_dataset(label, dataset)
  File "c:\src\pyglotaran\glotaran\analysis\optimization_group_calculator_linked.py", line 447, in create_index_dependent_result_dataset
    dataset["matrix"] = (
  File "C:\Anaconda3\envs\py39gta\lib\site-packages\xarray\core\dataset.py", line 1563, in __setitem__      
    self.update({key: value})
  File "C:\Anaconda3\envs\py39gta\lib\site-packages\xarray\core\dataset.py", line 4208, in update
    merge_result = dataset_update_method(self, other)
  File "C:\Anaconda3\envs\py39gta\lib\site-packages\xarray\core\merge.py", line 984, in dataset_update_method
    return merge_core(
  File "C:\Anaconda3\envs\py39gta\lib\site-packages\xarray\core\merge.py", line 640, in merge_core
    dims = calculate_dimensions(variables)
  File "C:\Anaconda3\envs\py39gta\lib\site-packages\xarray\core\dataset.py", line 208, in calculate_dimensions
    raise ValueError(
ValueError: conflicting sizes for dimension 'spectral': length 234 on 'matrix' and length 159 on {'time': 'time', 'spectral': 'spectral', 'left_singular_value_index': 'data_left_singular_vectors', 'singular_value_index': 'data_singular_values', 'right_singular_value_index': 'data_right_singular_vectors'}

  File "NTDCK_CK_694_40_20uWQA.py", line 65, in main
    result = optimize(scheme)
  File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 76, in optimize
    return _create_result(
  File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 137, in _create_result
    group.create_result_data(parameter_history, success=success, add_svd=scheme.add_svd)
  File "c:\src\pyglotaran\glotaran\analysis\optimization_group.py", line 327, in create_result_data
    self._calculator.prepare_result_creation()
  File "c:\src\pyglotaran\glotaran\analysis\optimization_group_calculator_linked.py", line 422, in prepare_result_creation
    mask = [full_index_clp_labels.index(clp_label) for clp_label in clp_labels]
  File "c:\src\pyglotaran\glotaran\analysis\optimization_group_calculator_linked.py", line 422, in <listcomp>
    mask = [full_index_clp_labels.index(clp_label) for clp_label in clp_labels]
ValueError: 'c1' is not in list

Answer 1 · 2022-04-10T14:52:44.000Z

After investigating, I found that the error here occurs because two values of the second axis map to the same value in the first axis. With #1060 you will get a proper error for that, plus a way to cope with it, namely forward and backward alignment, which should help here.
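
To illustrate the situation described above, here is a minimal sketch of how two values of one axis can end up mapped to the same value of another axis under a tolerance. The function names and the axis values are hypothetical and chosen only for this example; the nearest-neighbour matching shown here is an assumption for illustration and is not taken from pyglotaran or from #1060.

```python
import numpy as np


def nearest_within_tolerance(reference_axis, other_axis, tolerance):
    """Map each value of other_axis to the index of its nearest reference value.

    Values farther than `tolerance` from every reference value map to -1.
    Hypothetical helper for this sketch.
    """
    reference_axis = np.asarray(reference_axis, dtype=float)
    other_axis = np.asarray(other_axis, dtype=float)
    # Rows are values of other_axis, columns are values of reference_axis.
    distances = np.abs(other_axis[:, None] - reference_axis[None, :])
    nearest = distances.argmin(axis=1)
    within = distances[np.arange(other_axis.size), nearest] <= tolerance
    return np.where(within, nearest, -1)


def find_collisions(mapping):
    """Return the reference indices that more than one value was mapped to."""
    matched = mapping[mapping >= 0]
    indices, counts = np.unique(matched, return_counts=True)
    return indices[counts > 1]


# Made-up axes: the second axis has one extra point close to 402.
first_axis = [400.0, 402.0, 404.0]
second_axis = [400.1, 401.9, 402.2, 404.1]

mapping = nearest_within_tolerance(first_axis, second_axis, tolerance=1.9)
collisions = find_collisions(mapping)

print(mapping)     # [0 1 1 2]
print(collisions)  # [1]
```

Here both 401.9 and 402.2 land on index 1 of the first axis, which is the kind of many-to-one mapping that a check like `find_collisions` could report before the result datasets are built.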