Issue/Question about the output of `lstsq`
loiseaujc opened this issue · 3 comments
Hej,
I'm really happy that the linalg
module is on its way to being production-ready. I have already played a bit with the latest additions in v0.6.0
and observed something an undesirable (in my opinion) behaviour regarding the output of lstsq
. Below is a MWE.
program main
use iso_fortran_env, only: output_unit
use stdlib_kinds, only: dp
use stdlib_linalg, only: lstsq, solve_lstsq
implicit none
integer :: m, n
real(dp), allocatable :: A(:, :), b(:), x(:), x_lstsq(:)
! Create the problem.
m = 10 ; n = 5
allocate(A(m, n)) ; call random_number(A)
allocate(x(n)) ; call random_number(x)
b = matmul(A, x)
! Solve the least-squares problem.
x_lstsq = lstsq(A, b)
write(output_unit, *) shape(A), shape(b), shape(x), shape(x_lstsq)
write(output_unit, *) "Norm of the error :", norm2(x_lstsq(1:n)-x)
end program main
When running this code, the vector x_lstsq
computed by lstsq
is of size m
rather than being of size n
. I know that, by default, *gelsd
takes A
and b
as input and overwrites the first n
entries of b
with the solution x
. I believe this is quite misleading and can potentially cause issues if the user is directly using solve_lstsq
. Calling solve_lstsq
with x
being an n
-vector instead of an m
-vector actually raises an error since the following test then fails
if (lda<1 .or. n<1 .or. ldb<1 .or. ldb/=m .or. ldx/=m) then
err0 = linalg_state_type(this,LINALG_VALUE_ERROR,'invalid sizes: a=',[lda,n], &
'b=',[ldb,nrhs],' x=',[ldx,nrhsx])
call linalg_error_handling(err0,err)
if (present(rank)) rank = 0
return
end if
in e.g. stdlib_linalg_s_solve_lstsq_one
. Are there any reasons why we should keep the output vector as an m
-vector rather than an n
-vector ? From a mathematical and practical point of view, I would expect the following behaviour:
lstsq
takes anm
xn
matrixA
and anm
-vectorb
as entry and returns ann
-vector.solve_lstsq
takes as argumentsm
xn
matrixA
, anm
-vectorb
and ann
-vectorx
.
Seems like the issue is here:
stdlib/src/stdlib_linalg_least_squares.fypp
Line 138 in 42182b0
x
should not be allocated using mold=b
but by extraction of size(A,dim=2), and if rank(b)>1 then also the size of b in the second dimension.
@perazz what are your thoughts?
Sorry I hadn't thought about this. *GELSD
require that x
is sized at least as b
, because they take the rhs [n]
on input and return x[m]
on output.
I totally agree that this is misleading. So, there is no way to avoid an additional allocation, in order to return an appropriately sized x
. The only thing we can do to save an allocation is maybe in the subroutine
interface, to let the user provide a larger x
, and only return the appropriate part in that case.
Awesome. I'll try it out as soon as it's merged. Closing the issue. Thx.