brandondube/linalg

Idea for refactor

Opened this issue · 5 comments

I understand linalg aims to be a compact linear algebra library. This is something that is also of use to me, since gonum does not see binary bloat and overhead as issues that need fixing.

However, this library is a bit too low-level for my needs. I have created a fork with a draft of what works for me. I don't want to convince you to change this repo if my fork is not in line with your vision, since I understand you deal with real-time stuff.

However, if you find that maybe with a few changes my fork can be acceptable for use in your applications maybe we can merge some of it to this repo and see how it goes from there. As I said, it's all good either way :)

Main differences with this repo:

  • Types for matrices and vectors. Functions are then sectioned off as methods.
  • Transpose function renamed to T

Some pointers on my vision:

  • Let users provide their own scratch and don't have package wide scratch pools.
  • Loosely coupled interfaces. Like I mentioned in the issue above, avoid coupling interfaces to the package; tight coupling results in binary bloat and long compile times.
  • Have types for dense matrices, sparse matrices, and vectors.
  • Correctness: Avoid letting users do incorrect operations. Right now this repo lets users operate on overlapping memory!
  • Compactness: Be usable with tinygo with small binary size. Prefer compact algorithms to optimized algorithms.

Some things I am willing to reconsider:

  • Maybe the DenseM type could be of underlying type [][]float64; worth benchmarking to see if the performance hit is big (it probably is not).

Let me know what you think!

Size comparisons. Keep in mind I have two more functions (RowView, ColView). I'd say they are pretty similar or identical in size. (3 kB of flash is hardly identical)

soypat/linalg# tinygo test  -o=test.bin -size=short .
   code    data     bss |   flash     ram
 225872   62744    3840 |  288616   66584
brandondube/linalg# tinygo test  -o=test.bin -size=short .
   code    data     bss |   flash     ram
 222775   62696    3840 |  285471   66536

I have the following thoughts:

  • Transpose -> T should be paired with Eye -> I if the intent is to reduce verbosity
  • I do not think types for matrices and vectors provide value; actually, I think there are two elements of anti-value:
    • having to use an At accessor is no good. I can't think of any examples of a successful numerics environment that works this way; Fortran, Matlab, Numpy, etc., all allow some kind of bracket based indexing. This is a successful design to replicate.
    • Perhaps something like type Mat2 [][]float64 is less verbose, but not by very many keystrokes. Something like this would head in the direction of Rust's analogs to Numpy, where there are Mat2, Mat3, Mat4, [...].
    • Simply using slices as the common type makes the package compatible with any other code that works this way. For example, you might have a camera that emits struct Image {data []uint16; w, h int}; then you convert the DNs to floats to do some math on them. There are linear algebra operations that view the image as a vector ([]float64), but there are operations that view it as a matrix. For example, any linear least squares problem wants the image as a vector. If you have to copy the data to view it as a vector vs. as a matrix, you've reduced performance to the point that the tool is not viable for real-time use.
  • There is nothing incorrect about operating on overlapping memory. While it can be taken to the level of a dark art (numpy.lib.stride_tricks), there are many, many use-cases where operating on overlapping memory is desirable. A (type) system that precludes that requires more copies, which hurts performance.
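
To make that concrete, a tiny sketch (names are mine, not anything in this repo): the same flat buffer serves as a vector and as a row-major matrix with zero copies.

```go
// dot treats data as a plain vector.
func dot(a, b []float64) float64 {
	var s float64
	for i := range a {
		s += a[i] * b[i]
	}
	return s
}

// rowSum treats the same data as a row-major matrix with w columns
// and sums row i; the subslice is a view, not a copy.
func rowSum(data []float64, w, i int) float64 {
	var s float64
	for _, v := range data[i*w : (i+1)*w] {
		s += v
	}
	return s
}
```

Converted image data can be passed to either function unchanged; any type system that separates "vector" from "matrix" forces a copy or a conversion at exactly this boundary.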

About things that do not yet exist:

  • At least for what I want this for, there is no use for sparse arrays. In general, any hypothetical linalg/sparse package is going to have even more code than the non-sparse package. You might want lower and upper triangular matrices for something like a QR factorization routine, etc., but at the sizes of arrays I can think of for this package, the difference in runtime is probably not going to be worth writing all the extra code. The only serious exception would be a DiagonalMatProd function or similar, which takes a vector for the diagonal elements of a diagonal matrix. You would want this in, e.g., using an SVD to compute a pseudo-inverse.
  • I never envisioned that this package would have some kind of internal scratch semantics. None of the things linalg is an alternative to (~= LAPACK) work that way. Libraries that do allocate their own scratch space typically allocate, use, free. There are packages that have their own memory pool (e.g., numpy, cupy), but that is just to reduce "actual" memory allocations and is not tied directly to scratch space.
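
Such a DiagonalMatProd is small enough to sketch here (name and signature are hypothetical, nothing like this exists in the repo yet):

```go
// DiagonalMatProd computes dst = D*x where the diagonal matrix D is
// stored only as its diagonal d; the product is an elementwise multiply.
func DiagonalMatProd(dst, d, x []float64) {
	if len(d) != len(x) || len(dst) != len(x) {
		panic("linalg: dimension mismatch")
	}
	for i := range d {
		dst[i] = d[i] * x[i]
	}
}
```

In the SVD pseudo-inverse case, d would hold the reciprocal singular values (with the small ones zeroed out).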

When I look at your changes, they just look like the pure Go implementation of Gonum. I would suggest instead that you fork Gonum and just start deleting everything that isn't pure Go until you get the binary size down to where you want it.

Alright! Good to see we can talk about it. I'm not sure I agree with you on all the things you put forth though. I'm going to go ahead and document them here for posterity/good measure, whichever one comes last.

having to use an At accessor is no good. [...] bracket based indexing. This is a successful design to replicate.

Yes, it's true bracket based indexing is king, there's no doubt about it. Sadly Go does not as of yet support multidimensional slicing (a.k.a. strided slices). See golang/go#13253. It seems they will not be supported. That means you won't get exactly the same functionality as the tools you mentioned. To get it, one needs these ColView and RowView functions I've added, and a special strided vector type (DenseV).
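
The strided-vector idea behind those view functions looks roughly like this in my fork (treat the names and layout as a draft, not a settled API):

```go
// DenseV is a strided view into some backing slice; it owns no memory.
type DenseV struct {
	data   []float64
	stride int // distance between consecutive elements in data
}

func (v DenseV) AtVec(i int) float64 { return v.data[i*v.stride] }
func (v DenseV) Len() int            { return (len(v.data)-1)/v.stride + 1 }

// ColView returns column j of a row-major matrix with c columns,
// viewing (not copying) the backing slice.
func ColView(data []float64, c, j int) DenseV {
	return DenseV{data: data[j:], stride: c}
}
```

A RowView is the degenerate case with stride 1 over a subslice, so it costs nothing extra.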

As for the At accessor: I've mainly included it to create the Matrix interface, to be able to define other kinds of matrices, such as sparse matrices. Overall, if there were true bracket strided access to [][]T types then we wouldn't be having this conversation, but alas, bracket indexing is very weak in Go. Go arguably shines with interfaces. Define an Eye(1000) with the Matrix interface and the total space taken is 8 bytes. Try the same in Matlab with G=eye(1000);whos G and what you get is 8000000 bytes. You can take this a step further and compose matrices easily (or lazily generate the elements) and put Matlab's memory usage to shame. So yes, At is fugly, but gonum wasn't totally wrong with this abstraction. It's a powerful one. They just had to go ahead and add the darn T method, gosh darn them.
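
To illustrate the 8-byte claim, a minimal sketch (the names mirror my draft, so treat them as illustrative):

```go
// Matrix is the lazy-access abstraction; implementations may generate
// elements on the fly instead of storing them.
type Matrix interface {
	Dims() (r, c int)
	At(i, j int) float64
}

// Eye is an n-by-n identity matrix storing one int, not n*n floats.
type Eye int

func (e Eye) Dims() (r, c int) { return int(e), int(e) }
func (e Eye) At(i, j int) float64 {
	if i == j {
		return 1
	}
	return 0
}
```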

Mat2

meh, I'd let higher-than-second-order tensors be their own type. I don't use them much, or when I do they usually have a second-order model representation 🤷 As for the [][]T underlying type representation, I think the only advantage would be "cleaner" look for users, as the internals of what I envision would still use the Matrix interface for operations due to the benefits outlined above and below.

If you have to copy the data to view it as a vector vs as an array,

I don't think I follow you? You may operate on the Image however you wish. What's more, you don't have to do any copying if the Image implements the Matrix interface. That's the beauty of it: you can pre-transform the image by implementing the Matrix interface with any pre-processor you'd like, with (dare I say) globally minimal memory usage and copying. i.e.

// Gauss wraps raw sensor data and lazily applies a toy smoothing
// kernel while satisfying the Matrix interface; no float64 copy is made.
type Gauss struct {
	data []uint16
	w    int
}

func (m Gauss) At(i, j int) float64 {
	current := float64(m.data[i*m.w+j])
	up := float64(m.data[max(i-1, 0)*m.w+j]) // max(int, int) is builtin since Go 1.21
	left := float64(m.data[i*m.w+max(j-1, 0)])
	gaussUp := current * up / 2
	gaussLeft := current * left / 2
	return gaussUp + gaussLeft
}

Put a basic cache in front of those mem accesses and pretty soon you're cooking with electricity, or so I'd believe. You just avoided allocating w*h float64s onto the heap!

You can do the same by implementing the Vector interface for vector access.

There is nothing incorrect about operating on overlapping memory.

That is true as long as you are not writing where you are reading during the operation. MatMul and VecProd can suffer from this if the user is not careful. Here's a test that exposes a silently wrong result due to this behaviour. Feel free to add it, as MatMul is not yet tested afaict.

func TestMatMul(t *testing.T) {
	shared := NewDenseMatrix(3, 3, []float64{
		26, 37, 48,
		16, 23, 30,
		6, 9, 12,
	})
	sharedSq := NewDenseMatrix(3, 3, []float64{
		1556, 2245, 2934,
		964, 1391, 1818,
		372, 537, 702,
	})
	for _, tC := range []struct {
		A, B, expect Matrix // the concrete type NewDenseMatrix returns satisfies this
	}{
		{
			A: NewDenseMatrix(3, 2, []float64{
				6, 5,
				4, 3,
				2, 1,
			}),
			B: NewDenseMatrix(2, 3, []float64{
				1, 2, 3,
				4, 5, 6,
			}),
			expect: NewDenseMatrix(3, 3, []float64{
				26, 37, 48,
				16, 23, 30,
				6, 9, 12,
			}),
		},
		{
			A:      shared,
			B:      shared,
			expect: sharedSq,
		},
	} {
		got := MatMul(tC.A, tC.B, nil)
		if !matrixEqual(got, tC.expect) {
			t.Error("bad result", got, tC.expect)
		}
	}
	// test shared memory
	got := MatMul(shared, shared, shared)
	if !matrixEqual(got, sharedSq) {
		t.Error("bad shared memory result", got, sharedSq)
	}
}

I'm not saying don't allow using overlapping memory, just saying if the result is invalid the program should let the user know. Or else you're going to find yourself in a world of hurt when debugging these things.

I would suggest instead that you fork Gonum and just start deleting everything that isn't pure Go until you get the binary size down to where you want it.

gooby pls. I dunno if you've looked at gonum, but that stuff is not even close to linalg. It's basically a framework. Worth mentioning that gonum also has a pure Go implementation: it's called gonum, but built with the noasm tag.

I was actually flabbergasted when I found that tinygo readily compiles the whole lot of the blas and lapack packages. That means most LAPACK routines are usable on microcontrollers without needing a Makefile system. Dunno, thought it was a pretty cool fact.

Anyways, I'll get out of your hair. I'll probably regret my decision of making gonum-lite eventually and come crawling back to this repo. Godspeed Brandon!

Yes, it's true bracket based indexing is king, there's no doubt about it. Sadly Go does not as of yet support multidimensional slicing (a.k.a. strided slices)

C does not either, but numpy does. The slicing "engine" is about 3,000 lines of C. Python's dunders enable it to exist in ergonomic form, but there is an implementation deep down.

As for the At accessor: I've mainly included it to create the Matrix interface, to be able to define other kinds of matrices, such as sparse matrices. Overall, if there were true bracket strided access to [][]T types then we wouldn't be having this conversation, but alas, bracket indexing is very weak in Go. Go arguably shines with interfaces. Define an Eye(1000) with the Matrix interface and the total space taken is 8 bytes. Try the same in Matlab with G=eye(1000);whos G and what you get is 8000000 bytes. You can take this a step further and compose matrices easily (or lazily generate the elements) and put Matlab's memory usage to shame. So yes, At is fugly, but gonum wasn't totally wrong with this abstraction. It's a powerful one. They just had to go ahead and add the darn T method, gosh darn them.

It is true there are certain party tricks that can be done with interfaces, but many decades of computing have been done without those tricks, with more than adequate performance. In your identity matrix example, the same hypothetical DiagonalMatVecProd reduces the storage from 8 MB to 8 KB. Chances are, if 8 KB is prohibitive in space, it is also prohibitive in time. And if not, you are in an awfully unusual circumstance where some specialized code is a perfectly viable solution. Many years ago, in Python, I hid all sorts of computations behind @property on classes. After rewriting all of that code without dynamically computed properties, the performance was 8x greater. That experience taught me to just write plain dumb code without the tricks, although it makes the style more Fortran-esque, which not everyone necessarily likes.

For example, try doing a matrix vector product with your At() identity and a vector of some numbers. Compare it to the naive approach, and to one that is smart and just computes the elementwise product between the vector of diagonal elements and the vector to be "MatVecProd"'d. You'll probably find that the plain dumb code is twice as fast as At(), and that the pseudo-sparse code is a few times faster than the plain dumb code.
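
To spell out the three variants (all names here are illustrative, none are an existing API):

```go
type atMatrix interface{ At(i, j int) float64 }

// ident is an n-by-n identity matrix behind the At abstraction.
type ident int

func (ident) At(i, j int) float64 {
	if i == j {
		return 1
	}
	return 0
}

// Interface-based: n*n dynamic At calls, none of them inlinable.
func matVecAt(m atMatrix, n int, x, y []float64) {
	for i := 0; i < n; i++ {
		var s float64
		for j := 0; j < n; j++ {
			s += m.At(i, j) * x[j]
		}
		y[i] = s
	}
}

// Plain dumb dense: the same loop over a flat row-major slice.
func matVecDense(a, x, y []float64, n int) {
	for i := 0; i < n; i++ {
		var s float64
		for j := 0; j < n; j++ {
			s += a[i*n+j] * x[j]
		}
		y[i] = s
	}
}

// Pseudo-sparse: exploit diagonality, O(n) work instead of O(n^2).
func matVecDiag(d, x, y []float64) {
	for i := range d {
		y[i] = d[i] * x[i]
	}
}
```

Benchmark them at your problem sizes; my expectation is that the ordering above holds, from slowest to fastest.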

I'd let higher-than-second-order tensors be their own type. I don't use them much, or when I do they usually have a second-order model representation

speaking at least for computational diffraction, up to 5D arrays of complex-valued numbers are used. 3D cubes are very very common, 4D structures a bit less common, and 5D again a bit less common than 4D. 5D arises when you have a thin film stack which has some spatial structure (2D, or 3D if you count the stack layers) and you want to compute the complex reflectance or transmittance as a function of wavelength (now 4D), and if you want both s and p polarizations it is a 5D output.

Put a basic cache in front of those mem accesses and pretty soon you're cooking with electricity, or so I'd believe. You just avoided allocating w*h float64s onto the heap!

At least looking at the body of that function, almost any cache lookup is as expensive as the operation itself, so it will be no faster.

I'm not saying don't allow using overlapping memory, just saying if the result is invalid the program should let the user know. Or else you're going to find yourself in a world of hurt when debugging these things.

Conversely, the more safety rails you put on the code, the slower it goes. I did not make this package for math with training wheels; I made it to have code for real-time systems. Written or not, the way I use it is to prototype in Python and then convert the implementation to Go, and my silent expectation is that others use it this way too.


I never had grand visions for this package. I made it because I needed a few of the functions in it now, and will probably put more things in it in the future (or accept PRs, etc.) as more is needed. But I never intended there to be anything other than dumb-as-a-bag-of-bricks, easy-to-understand code in it, because it isn't trying to be something like Numpy. Anything more clever requires careful design, and is subject to being reverted later anyway. Unfortunately, of all of the "hobby" code I write, prysm is the most serious, and I have not even had the motivation/desire to finish v0.21 of that in the last year or so. I just do not have the cycles to spend any time on linalg right now.

It is true there are certain party tricks that can be done with interfaces

You got me heh, I'm all about those party tricks.

I just do not have the cycles to spend any time on linalg right now.

Gosh dern, I need a vacation myself too