arkanis/single-header-file-c-libs

Matrix uses anonymous structs which are a gnu extension.

Closed this issue · 4 comments

Hello,

I noticed that your matrix structure uses anonymous structs so that data can be accessed either by index or through member names. I am currently searching for the best way to make a similar C structure, but discovered that this method relies on GNU extensions or C11. Were you already aware of this? Do you have any thoughts on how it can be avoided? I am currently leaning towards simple float arrays for all vector inputs.

Hi there,

It was actually a deliberate decision since anonymous / unnamed struct and union members are one of those ubiquitous extensions supported by the C frontends of GCC and clang and probably a host of other C compilers (e.g. TinyCC). Just like the "pragma once" directive. Unfortunately only GCC properly documents extensions for their C frontend. For clang I only found a note in the LLVM 3.2 change log that they support this. Their documentation seems to focus almost exclusively on C++.

Also note that as far as I'm aware MSVC isn't a C compiler, just C++. That gives it compatibility with C89 and bits and pieces after that but advanced C stuff will probably create problems. So this library will probably not work there. And I don't see the need to support C++ since there are better math libraries in C++ (e.g. glm).

The reasoning behind using this extension was to make the code more readable during development, more like the stuff we wrote on the whiteboard. With arrays you always have to worry about the correct memory layout and change your indices accordingly. We also had great trouble just working with indices on the whiteboard and in the code. It's damn easy to accidentally flip one here and there. And if you additionally have to worry about flipping them for the right memory layout it can really screw your head.

We used the named fields so we didn't have to worry about the memory layout. We defined them with the names used in the mathematical matrix definition. So we could write proper algorithms and move the named fields in memory later on. Sounds like a small thing but it helped to keep our heads from spinning. Accidental index flips were by far the error source number one in this project.

I would like to keep the code the way it is. But since this project is in maintenance mode right now that feature isn't all that helpful now. If there is a compelling reason to remove that extension from the source code it wouldn't hurt much. The easiest way is to give the struct a short name like "c" and use that name to access all of the struct's members (e.g. replace matrix.m01 with matrix.c.m01). Not that easy to read but works.
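To make the two variants concrete, here is a minimal sketch using a 2x2 matrix. The type names, the 2x2 size and the mapping of field names to array positions are assumptions made for illustration; they are not the library's actual definitions.

```c
/* Variant A: anonymous struct inside a union.
   Standard in C11; a GNU/clang extension in older language modes. */
typedef union {
    float m[2][2];
    struct {
        float m00, m01;   /* aliases m[0][0], m[0][1] */
        float m10, m11;   /* aliases m[1][0], m[1][1] */
    };
} mat2_anon_t;

/* Variant B: same layout, but the struct is named "c".
   No extension needed; every field access gains a ".c". */
typedef union {
    float m[2][2];
    struct {
        float m00, m01;
        float m10, m11;
    } c;
} mat2_named_t;

/* Same computation, written against each variant. */
float mat2_anon_det(mat2_anon_t a) {
    return a.m00 * a.m11 - a.m01 * a.m10;
}

float mat2_named_det(mat2_named_t a) {
    return a.c.m00 * a.c.m11 - a.c.m01 * a.c.m10;
}
```

Both variants rely on the struct of floats having no padding so that the names line up with the array elements. That holds on common compilers but is worth checking with a sizeof assertion.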

If you're absolutely unshakably sure about your memory layout you can also use a 1D array. But the confusion on the mathematical side is bad enough. There is no universal way to access a matrix with indices or in which order the indices have to be written. We always had to double check the notation used by every source. With all that going on just one extra step to translate indices into proper array positions became a source of errors for us. I tried that in an earlier version and can't recommend it (the library here is actually the 5th rewrite). I personally would at least use a 2D array.

But I suppose this strongly depends on how much you like index based algorithms and how much training you have in juggling indices around without making errors. So use whatever works best with your way of programming.

Sorry for the wall of text. Short version: I would like to keep the code the way it is. If there is a use case for removing the extension we can do it. For example to support a special C compiler. But I wouldn't remove it just because it's an extension.

I appreciate the time you spent to write an in depth reply. I am actually researching how to best design a vector/matrix library in C. I sent this message to see if you were aware of the fact that it is an extension and whether you had considered that in your design. You do not need to change anything in your project for me. Since you mentioned it was a deliberate decision, that answers my question.

I have found the following options (referring specifically to vector):

  1. Struct with named fields only. The main drawback is that they cannot be indexed, so no generic for loops unless you cast to a float pointer.
  2. Typedef array of 3 floats. I don't like this approach because it appears to be a struct when it is really an array. The type cannot be returned from functions, etc.
  3. Array of 3 floats with no typedef. Similar to 2, a little uglier but at least the array type isn't hidden.
  4. Union of named fields and array (similar to what you did for matrix). The only downside here is that you either have to rely on compiler extensions or add extra typing, such as vector.v.x. (as you mentioned)
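For reference, the four options above might look roughly like this in code. All type and function names are made up for this sketch.

```c
/* Option 1: named fields only. Copyable and returnable, but not indexable. */
typedef struct { float x, y, z; } vec3_fields_t;

float vec3_fields_dot(vec3_fields_t a, vec3_fields_t b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Option 2: typedef'd array. Looks like a value type, but as a parameter it
   is adjusted to a pointer, and it cannot be returned from a function. */
typedef float vec3_array_t[3];

void vec3_array_scale(const vec3_array_t in, float s, vec3_array_t out) {
    for (int i = 0; i < 3; i++)
        out[i] = in[i] * s;
}

/* Option 3: plain array without a typedef, same behavior as option 2:
   void scale(const float in[3], float s, float out[3]); */

/* Option 4: union of array and named fields. The anonymous struct needs C11
   or a compiler extension; naming it avoids that at the cost of u.v.x. */
typedef union {
    float v[3];
    struct { float x, y, z; };
} vec3_union_t;

float vec3_union_sum(vec3_union_t a) {
    float sum = 0.0f;
    for (int i = 0; i < 3; i++)   /* index access for loops */
        sum += a.v[i];
    return sum;
}
```

With option 4 the same value can be read as u.x in element by element formulas and as u.v[i] in loops.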

As you decided in your project, the best option is probably 4 with compiler extensions. I agree that it is unlikely that a major compiler would have a problem, however it still feels a little shaky to me (probably an unjustified fear).

I agree that your matrix design of including named fields along with the array is very convenient. I personally prefer to use a 1D array and then create a macro for 2d indexing. I have no problem with indexing tricks for a 1D array.
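As a rough illustration of the 1D array plus indexing macro approach, assuming a 4x4 matrix stored column by column (the macro name, the function name and the column-major order are my own choices for this sketch):

```c
/* Flat array of 16 floats, column-major: element (row, col) lives at col*4 + row. */
#define MAT4_AT(m, row, col) ((m)[(col) * 4 + (row)])

/* out = a * b. out must not point to the same storage as a or b. */
void mat4_mul(const float a[16], const float b[16], float out[16]) {
    for (int col = 0; col < 4; col++) {
        for (int row = 0; row < 4; row++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += MAT4_AT(a, row, k) * MAT4_AT(b, k, col);
            MAT4_AT(out, row, col) = sum;
        }
    }
}
```

The layout decision lives in exactly one place, the macro. Switching to row-major would only mean changing it to (row) * 4 + (col).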

Thank you for the reply. Let me know if you have any other thoughts.

Ah, now I think I understand the original scope of your question. :)

For vectors I've pretty much always gone with option 1 so I lack experience with the other options. In computer graphics definitions of vector operations are often written explicitly, written element by element. Meaning they don't use loops that often to define vector operations. Or if they do they also write a non-loop element by element version.

But I've only looked at 2D and 3D math for computer graphics. In higher dimensions this will be quite different I guess.

With matrices that's another story. There I've tried option 1, done 2, tried 3 and settled with 4.

Option 1 went out of the window pretty early because the matrix multiplication is defined via 2 nested loops and I wanted to keep it that way. Today I'm no longer so sure about that. It's nice to loop over matrix elements but ultimately we're currently only using that in the matrix multiplication and matrix print implementations. Nowhere else I think. In all other cases we've written the matrices on the whiteboard and worked with the matrix elements by name. m4_invert_affine() is a good example of that.

Option 2 is pretty nice if you have many loop-based algorithms or very large matrices (or very large vectors I guess). The big difference between it and struct based options (1 & 4) is that you only pass the array pointer to functions. The user has to allocate the array somewhere. The user also has to keep track of storage needed for temporary vector or matrix variables. These reference semantics also make it difficult to nest function calls so it's easier to just return nothing and write each operation on an extra line. But you can imagine that this makes complex mathematical operations quite long-winded to read.

Those reference semantics were the main reason we chose a struct instead (option 4). Structs are copied on the stack so we can return new matrices by value and the user can nest function calls easily without overwriting anything by accident. This is ok for a 4x4 matrix (needs 4 of the 16 SSE registers) but won't work for very large dimensions.
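A small sketch of the difference, using 3D vectors to keep it short. The function names are hypothetical and not taken from the library.

```c
typedef struct { float x, y, z; } vec3_t;

/* Value semantics: arguments and results are copied, so calls nest freely. */
vec3_t vec3_add(vec3_t a, vec3_t b) {
    return (vec3_t){ a.x + b.x, a.y + b.y, a.z + b.z };
}

vec3_t vec3_scale(vec3_t a, float s) {
    return (vec3_t){ a.x * s, a.y * s, a.z * s };
}

vec3_t midpoint_by_value(vec3_t a, vec3_t b) {
    return vec3_scale(vec3_add(a, b), 0.5f);   /* one nested expression */
}

/* Reference semantics: the caller provides storage for every result. */
void vec3p_add(const float a[3], const float b[3], float out[3]) {
    for (int i = 0; i < 3; i++)
        out[i] = a[i] + b[i];
}

void vec3p_scale(const float a[3], float s, float out[3]) {
    for (int i = 0; i < 3; i++)
        out[i] = a[i] * s;
}

void midpoint_by_pointer(const float a[3], const float b[3], float out[3]) {
    float tmp[3];                 /* temporary storage managed by hand */
    vec3p_add(a, b, tmp);         /* one operation per line */
    vec3p_scale(tmp, 0.5f, out);
}
```

Both midpoint functions compute the same thing. The pointer version needs an explicit temporary and one statement per operation, which is the long-winded style described above.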

For higher dimensions I think you have to expose the user to some kind of memory management. Either you let the user do it by hand or you malloc() new vectors within functions. But then the user has to free them later on. Reference counting might work since you won't get cyclic dependencies (unless you can create and combine slices). A library-internal stack of vectors might also work. The user would then only work with indices to that stack. This allows your library to do the memory management but would force the user to write stack based code (push a, then b, then add them, etc.). I haven't done either of the two so I have no idea if that works. But mind you, this is only stuff relevant for very large vectors with millions of elements where memory management is a primary concern.
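To give a rough idea of what stack based code could look like, here is an untested-in-practice sketch. Everything in it is an assumption: the names, the fixed capacity and the tiny fixed vector length (a real version for huge vectors would allocate storage dynamically). It also works on the top of the stack implicitly instead of handing out indices.

```c
#include <stddef.h>

#define VS_CAPACITY 8   /* max number of vectors on the stack (arbitrary) */
#define VS_DIM      4   /* fixed vector length, only to keep the sketch short */

/* Library-owned storage: the user never allocates or frees vectors. */
static float  vs_data[VS_CAPACITY][VS_DIM];
static size_t vs_count = 0;

/* Copies v onto the stack. Returns 0 on success, -1 if the stack is full. */
int vs_push(const float v[VS_DIM]) {
    if (vs_count == VS_CAPACITY)
        return -1;
    for (size_t i = 0; i < VS_DIM; i++)
        vs_data[vs_count][i] = v[i];
    vs_count++;
    return 0;
}

/* Replaces the top two vectors with their sum.
   Returns -1 if fewer than two vectors are on the stack. */
int vs_add(void) {
    if (vs_count < 2)
        return -1;
    for (size_t i = 0; i < VS_DIM; i++)
        vs_data[vs_count - 2][i] += vs_data[vs_count - 1][i];
    vs_count--;
    return 0;
}

/* Copies the top vector into out and removes it. Returns -1 if empty. */
int vs_pop(float out[VS_DIM]) {
    if (vs_count == 0)
        return -1;
    vs_count--;
    for (size_t i = 0; i < VS_DIM; i++)
        out[i] = vs_data[vs_count][i];
    return 0;
}
```

User code then reads like push a, push b, add, pop: the library owns all storage, but expressions can no longer be written as nested function calls.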

In the end I would take into account how you want users to write code with the library. Do you want to expose memory management or use the stack? Should users be able to nest function calls? If that's not enough to decide on an approach then I would look at the mathematical definitions. Do they use loops often to define operations? If not you might get by without index access.

I hope that isn't too confusing...

I appreciate you making such a thorough response. Your insight is very helpful.

I think I just need to keep it simple and concrete when using C and leave the general "solve everything" solutions to C++.