ajhynes7/scikit-spatial

Return metric indicating the "goodness" of a best-fit function

Closed this issue · 6 comments

I've noticed that while the best_fit functions for shapes such as the Cylinder return the fitted object, there is currently no metric or function that helps determine the "goodness" of the fit.

The current situation is as follows: I have a point cloud, but I do not know in advance whether it represents a cylinder or a cuboid. In my particular case, the cuboid is luckily always aligned with the axes of the Cartesian coordinate system, so I created a new class that simply returns a Cuboid object, which is essentially the bounding box of the point cloud.

So if I don't know the shape in advance, I would use some kind of metric that indicates how good each fit was. I can probably build this metric for an always-aligned Cuboid by checking how many points lie within a specific distance of the nearest side plane of the cuboid, but I don't know if there is a plan to add something similar for e.g. the Cylinder.
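To make the cuboid idea concrete, here is a minimal sketch using only NumPy. The function name `cuboid_inlier_fraction` and the `tol` parameter are hypothetical and not part of scikit-spatial; the sketch assumes the cuboid is the axis-aligned bounding box of the points, as described above.

```python
import numpy as np


def cuboid_inlier_fraction(points, tol):
    """Fraction of points within `tol` of the nearest face of their bounding box.

    Hypothetical helper, not a scikit-spatial function.
    """
    points = np.asarray(points, dtype=float)
    lower = points.min(axis=0)
    upper = points.max(axis=0)

    # Every point lies inside the bounding box, so its distance to each
    # face plane is simply the offset from the lower or upper bound.
    dist_to_faces = np.concatenate([points - lower, upper - points], axis=1)
    nearest = dist_to_faces.min(axis=1)

    return float(np.mean(nearest <= tol))
```

A value close to 1 would suggest that most points sit on the surface of the box; points scattered through the interior would pull the value down.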

An example can be seen in one of the repositories referenced in the best_fit function of the Cylinder class:

https://github.com/xingjiepan/cylinder_fitting
It returns the following values for its fit function:

from cylinder_fitting import fit
w_fit, C_fit, r_fit, fit_err = fit(data)

In this case, it would likely be the "fitting error" (fit_err) that I'm interested in.
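For illustration, one possible error for a fitted cylinder is the RMS of the radial residuals, i.e. how far each point is from the cylinder surface. This is only a sketch under that assumption; it is not necessarily how cylinder_fitting defines fit_err, and `cylinder_rms_error` is a hypothetical name.

```python
import numpy as np


def cylinder_rms_error(points, axis_point, axis_direction, radius):
    """RMS of (distance to axis - radius) over all points.

    Hypothetical helper; ignores the cylinder's finite length.
    """
    points = np.asarray(points, dtype=float)
    direction = np.asarray(axis_direction, dtype=float)
    direction = direction / np.linalg.norm(direction)

    rel = points - np.asarray(axis_point, dtype=float)
    # Remove the component along the axis; what remains is the radial vector.
    radial = rel - np.outer(rel @ direction, direction)
    radial_dist = np.linalg.norm(radial, axis=1)

    return float(np.sqrt(np.mean((radial_dist - radius) ** 2)))
```

Because the result is in the same units as the point coordinates, it can be compared against a distance tolerance chosen for the point cloud.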

Would love to see this!

For my current situation, I've managed to find a solution that fits my needs, though it currently only works for the Cylinder and my custom Cuboid class:

https://stackoverflow.com/questions/79204030/determine-most-accurate-best-fit-algorithm-for-scattered-points

It would still be nice to have fitting errors returned as well, or to add a method to the shape that calculates the error once and returns it directly via a getter.

If you want to define an error between points and any given line, a simple method that comes to mind is to sum up the distances (or squared distances) between the points and the line.

Sum of squared distances:

sum(line.distance_point(point) ** 2 for point in points)

But if you specifically want an error for the line of best fit, I think we can get that from SVD (which is used in Line.best_fit). I can work on adding that to the Line.best_fit method if you'd like.
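The link between the SVD and the sum of squared distances can be checked numerically. The sketch below uses plain NumPy rather than the scikit-spatial internals, and the function names are hypothetical. It assumes the line passes through the centroid along the first right singular vector, in which case the sum of squared distances equals the sum of the squared remaining singular values.

```python
import numpy as np


def fit_line_svd(points):
    """Fit a line through the centroid via SVD; return (point, direction, error)."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    _, singular_values, vh = np.linalg.svd(points - centroid, full_matrices=False)

    direction = vh[0]  # unit vector along the largest spread
    # Variation not captured by the first singular vector is perpendicular
    # to the line, so it equals the sum of squared point-line distances.
    error = float(np.sum(singular_values[1:] ** 2))
    return centroid, direction, error


def sum_squared_distances(points, point_on_line, direction):
    """Directly sum squared perpendicular distances to a line (unit direction)."""
    rel = np.asarray(points, dtype=float) - point_on_line
    perpendicular = rel - np.outer(rel @ direction, direction)
    return float(np.sum(perpendicular ** 2))
```

Calling both functions on the same points should give the same error up to floating-point rounding, so the SVD route saves a second pass over the points.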

Hi @ajhynes7! I would really like to have the error returned as well, and I am available to help. Would it be fine for the classmethod to return a tuple with the instance and the error value? Or would you suggest a different approach? Thanks!

I'll work on it this week. I'm planning to add a boolean kwarg to the best_fit method to toggle returning a tuple.

@ajhynes7 Would you add this new boolean kwarg to all best_fit methods, including e.g. Cylinder?

I'm just modifying Line.best_fit and Plane.best_fit, as they use SVD. I'm not as familiar with the implementation of Cylinder.best_fit, since @CristianoPizzamiglio wrote it.
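As a rough sketch of what the boolean kwarg could look like for a plane fit, here is a standalone NumPy function. The names `best_fit_plane` and `return_error` are assumptions for illustration and may differ from what ends up in Plane.best_fit. It takes the normal as the last right singular vector, so the error is the squared smallest singular value.

```python
import numpy as np


def best_fit_plane(points, return_error=False):
    """Fit a plane to 3D points via SVD.

    Returns (point, normal), or (point, normal, error) if `return_error`
    is True. Hypothetical sketch, not the scikit-spatial implementation.
    """
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    _, singular_values, vh = np.linalg.svd(points - centroid, full_matrices=False)

    normal = vh[-1]  # direction of least spread
    if not return_error:
        return centroid, normal

    # Sum of squared point-plane distances.
    error = float(singular_values[-1] ** 2)
    return centroid, normal, error
```

Keeping the default at False means existing callers that expect only the fitted object would not need to change.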