group_tolerance/clp_link_tolerance appears to be incorrectly applied
jsnel opened this issue · 1 comment
Version information
- pyglotaran version: main
Describe the bug
The group_tolerance/clp_link_tolerance appears not to be applied to the first dataset (at least when there is more than one dataset), and for non-overlapping datasets the behavior is unpredictable (possibly for the same reason).
To Reproduce
Steps to reproduce the behavior:
- Run an example with 1 dataset, with the clp_link_tolerance set to a value (much) larger than the spacing between 2 elements on the global_axis.
- Run an example with at least 2 datasets, with the clp_link_tolerance set to a value about 1.01x larger than the spacing between 2 elements on the global_axis. Additionally, make sure that one dataset contains 1 or more additional elements on the global_axis.
In the specific example showing the issue, the datasets have the following global_axis values:
dataset1: 401.025543, 402.901245, 404.941254, 407.021240, ..., 796.960022, 798.969971 (len=159)
dataset2: 400.997772, 402.833344, 404.940002, 407.091125, ..., 796.099976, 798.049988 (len=160)
With clp_link_tolerance=1.9 this causes problems (other values fail as well, but with different errors).
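To see why a tolerance of 1.9 is delicate for these axes, the following minimal sketch counts, for each of the first four dataset2 values quoted above, how many dataset1 values lie within the tolerance. It is plain NumPy and only an illustration; it is not how pyglotaran groups axes internally, and the helper name `count_candidates` is hypothetical.

```python
import numpy as np

# First four global_axis values of each dataset, as quoted in the report.
axis1 = np.array([401.025543, 402.901245, 404.941254, 407.021240])
axis2 = np.array([400.997772, 402.833344, 404.940002, 407.091125])
clp_link_tolerance = 1.9


def count_candidates(reference, other, tolerance):
    """For each value in `other`, count the `reference` values within `tolerance`."""
    distances = np.abs(other[:, None] - reference[None, :])
    return (distances <= tolerance).sum(axis=1)


candidates = count_candidates(axis1, axis2, clp_link_tolerance)
print(candidates.tolist())  # [1, 2, 1, 1]
```

The second dataset2 value (402.833344) has two dataset1 values within the tolerance, so any tolerance-based grouping has to choose between them.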
To test, I would simulate data with a fixed offset between the global_axis values plus a small random offset, e.g.
np.linspace(400,800,51)+(4*np.random.random_sample((1,51))-2)
np.linspace(404,800,51)+(4*np.random.random_sample((1,51))-2)
with clp_link_tolerance=3.9, which ensures a random grouping across the board.
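The simulated axes suggested above can be sketched as follows. This is an assumption-laden variant of the two one-liners: it uses a seeded `np.random.default_rng` for reproducibility and 1-D arrays of shape `(51,)` instead of `(1, 51)`, and the helper name `jittered_axis` is hypothetical. It only builds the axes and inspects them; it does not call pyglotaran.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible


def jittered_axis(start, stop, num=51, jitter=2.0):
    """Evenly spaced axis plus a uniform random offset in [-jitter, jitter) per point."""
    return np.linspace(start, stop, num) + rng.uniform(-jitter, jitter, size=num)


axis1 = jittered_axis(400, 800)
axis2 = jittered_axis(404, 800)
clp_link_tolerance = 3.9

# Spacing between neighbouring points of the first axis.
spacing1 = np.diff(axis1)

# For each point on axis2, how many points on axis1 lie within the tolerance.
distances = np.abs(axis2[:, None] - axis1[None, :])
candidates_per_point = (distances <= clp_link_tolerance).sum(axis=1)

print("min spacing on axis1:", spacing1.min())
print("axis2 points with 0/1/2 candidates:", np.bincount(candidates_per_point, minlength=3))
```

Because neighbouring points on the first axis are more than 4 apart, a tolerance of 3.9 yields at most two candidates for any point, and which points end up with zero, one, or two candidates varies with the random offsets.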
Error messages
A non-exhaustive list of error messages:
ValueError: conflicting sizes for dimension 'spectral': length <some_number_larger_than_global_axis_1> on 'matrix' and length <length_global_axis_1> on {'time': 'time', 'spectral': 'spectral', 'left_singular_value_index': 'data_left_singular_vectors', 'singular_value_index': 'data_singular_values', 'right_singular_value_index': 'data_right_singular_vectors'}
full_index_clp_labels = full_clp_labels[i + offset]
IndexError: list index out of range
mask = [full_index_clp_labels.index(clp_label) for clp_label in clp_labels]
ValueError: 'c1' is not in list
Expected behavior
If clp_link_tolerance is set, it should affect grouping across all datasets.
If the two datasets do not have an equidistant global_axis, and/or have an offset axis, pyglotaran should not crash.
Additional context
Tracebacks
File "NTDCK_CK_694_40_20uWQA.py", line 65, in main
result = optimize(scheme)
File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 76, in optimize
return _create_result(
File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 137, in _create_result
group.create_result_data(parameter_history, success=success, add_svd=scheme.add_svd)
File "c:\src\pyglotaran\glotaran\analysis\optimization_group.py", line 327, in create_result_data
self._calculator.prepare_result_creation()
File "c:\src\pyglotaran\glotaran\analysis\optimization_group_calculator_linked.py", line 420, in prepare_result_creation
full_index_clp_labels = full_clp_labels[i + offset]
IndexError: list index out of range
File "NTDCK_CK_694_40_20uWQA.py", line 65, in main
result = optimize(scheme)
File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 76, in optimize
return _create_result(
File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 137, in _create_result
group.create_result_data(parameter_history, success=success, add_svd=scheme.add_svd)
File "c:\src\pyglotaran\glotaran\analysis\optimization_group.py", line 330, in create_result_data
result_data[label] = self.create_result_dataset(label, copy=copy)
File "c:\src\pyglotaran\glotaran\analysis\optimization_group.py", line 345, in create_result_dataset
dataset = self._calculator.create_index_dependent_result_dataset(label, dataset)
File "c:\src\pyglotaran\glotaran\analysis\optimization_group_calculator_linked.py", line 447, in create_index_dependent_result_dataset
dataset["matrix"] = (
File "C:\Anaconda3\envs\py39gta\lib\site-packages\xarray\core\dataset.py", line 1563, in __setitem__
self.update({key: value})
File "C:\Anaconda3\envs\py39gta\lib\site-packages\xarray\core\dataset.py", line 4208, in update
merge_result = dataset_update_method(self, other)
File "C:\Anaconda3\envs\py39gta\lib\site-packages\xarray\core\merge.py", line 984, in dataset_update_method
return merge_core(
File "C:\Anaconda3\envs\py39gta\lib\site-packages\xarray\core\merge.py", line 640, in merge_core
dims = calculate_dimensions(variables)
File "C:\Anaconda3\envs\py39gta\lib\site-packages\xarray\core\dataset.py", line 208, in calculate_dimensions
raise ValueError(
ValueError: conflicting sizes for dimension 'spectral': length 234 on 'matrix' and length 159 on {'time': 'time', 'spectral': 'spectral', 'left_singular_value_index': 'data_left_singular_vectors', 'singular_value_index': 'data_singular_values', 'right_singular_value_index': 'data_right_singular_vectors'}
File "NTDCK_CK_694_40_20uWQA.py", line 65, in main
result = optimize(scheme)
File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 76, in optimize
return _create_result(
File "c:\src\pyglotaran\glotaran\analysis\optimize.py", line 137, in _create_result
group.create_result_data(parameter_history, success=success, add_svd=scheme.add_svd)
File "c:\src\pyglotaran\glotaran\analysis\optimization_group.py", line 327, in create_result_data
self._calculator.prepare_result_creation()
File "c:\src\pyglotaran\glotaran\analysis\optimization_group_calculator_linked.py", line 422, in prepare_result_creation
mask = [full_index_clp_labels.index(clp_label) for clp_label in clp_labels]
File "c:\src\pyglotaran\glotaran\analysis\optimization_group_calculator_linked.py", line 422, in <listcomp>
mask = [full_index_clp_labels.index(clp_label) for clp_label in clp_labels]
ValueError: 'c1' is not in list
After investigating, I found that the error here occurs because 2 values of the second axis map to the same value in the first axis. With #1060 you will get a proper error for that, plus a way to cope with it, namely forward and backward alignment, which should help here.
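The situation described in this comment, two values of the second axis mapping to the same value of the first, can be illustrated with a small sketch. It assumes a simple nearest-neighbour match within the tolerance, which is a stand-in for illustration and not pyglotaran's actual grouping code; the axis values and the helper names `nearest_within_tolerance` and `find_collisions` are hypothetical, and the forward and backward alignment from #1060 is not shown.

```python
import numpy as np


def nearest_within_tolerance(reference, other, tolerance):
    """Map each value in `other` to the index of its nearest `reference` value.

    Values with no reference value within `tolerance` are mapped to -1.
    """
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    distances = np.abs(other[:, None] - reference[None, :])
    nearest = distances.argmin(axis=1)
    within = distances[np.arange(len(other)), nearest] <= tolerance
    return np.where(within, nearest, -1)


def find_collisions(mapping):
    """Return the reference indices that more than one value was mapped to."""
    matched = mapping[mapping >= 0]
    indices, counts = np.unique(matched, return_counts=True)
    return indices[counts > 1].tolist()


first_axis = [400.0, 402.0, 404.0]
second_axis = [400.1, 401.9, 402.3, 404.2]  # one element more than first_axis

mapping = nearest_within_tolerance(first_axis, second_axis, tolerance=1.9)
print(mapping.tolist())          # [0, 1, 1, 2]
print(find_collisions(mapping))  # [1]
```

In the reported data the axes have 159 and 160 elements, so if every element of the longer axis is matched to some element of the shorter one, at least one such collision is unavoidable.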