PeculiarVentures/GammaCV

Documenting perspectiveProjection

jamesmfriedman opened this issue · 7 comments

I will happily add this to the documentation, but I need help figuring out how to use it ;).

Basically I want to extract a quadrilateral region from an image and perspective transform it to be a rectangle. Per my other open issue, I've been pointed to the undocumented gm.perspectiveProjection.

Per this operation

export default (tSrc, tTransform, shape = [10, 10, 4], dtype = tSrc.dtype) => new RegisterOperation('PerspectiveProjection')

I'm doing a lot of guessing here, but I'm assuming you use the perspective transform util to pass the transform into the perspectiveProjection op.

const input = await gm.imageTensorFromURL(...); // 1024 x 1024 input image

// I get my quadrilateral by some method... TL, TR, BL, BR.
const pts = [
	[85.03672790527344, 228.44911193847656],
	[893.9627685546875, 234.818603515625],
	[49.670997619628906, 758.93505859375],
	[982.0530395507812, 724.961669921875]
];

// The Rect TL, TR, BR, BL
const rect = new gm.Rect(
	pts[0][0],
	pts[0][1],
	pts[1][0],
	pts[1][1],
	pts[3][0],
	pts[3][1],
	pts[2][0],
	pts[2][1]
);

const transform = gm.generateTransformMatrix(
	rect, // pass the rect
	[1024, 1024], // are these the bounds? I'm assuming maxWidth and maxHeight
	new gm.Tensor('uint8', input.shape) // this I am unsure of. It's called transformMatrix, but it appears to just need an empty Tensor to save the transform data into. Not sure about the type or shape.
);

const operation = gm.perspectiveProjection(
	input, // pretty sure about this.
	transform, // also pretty sure about this.
	transform.shape // not so sure about this
);

const output = gm.tensorFrom(operation);
sess.init(operation);
sess.runOp(operation, 0, output);
gm.canvasFromTensor(document.getElementById('my-canvas'), output);

The good news is I'm getting some output. The bad news is it is just a giant single color canvas. Any help would be greatly appreciated.

@jamesmfriedman, here are some comments; I hope they help.

const input = await gm.imageTensorFromURL(...); // 1024 x 1024 input image

const pts = [
	[85.03672790527344, 228.44911193847656],
	[893.9627685546875, 234.818603515625],
	[49.670997619628906, 758.93505859375],
	[982.0530395507812, 724.961669921875]
];

// The Rect TL, TR, BR, BL (order is correct)
const rect = new gm.Rect(
	pts[0][0],
	pts[0][1],
	pts[1][0],
	pts[1][1],
	pts[3][0],
	pts[3][1],
	pts[2][0],
	pts[2][1]
);
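A note on the point order: `gm.Rect` above expects TL, TR, BR, BL. If your detector returns the four corners in an arbitrary order, a plain helper like this can sort them (a hypothetical helper, not part of GammaCV; it assumes a roughly upright, non-degenerate quadrilateral):

```javascript
// Hypothetical helper (not a GammaCV function): order four arbitrary
// [x, y] corner points as TL, TR, BR, BL, the order gm.Rect expects.
function orderCorners(points) {
  // TL has the smallest x + y, BR the largest;
  // TR has the largest x - y, BL the smallest.
  const bySum = [...points].sort((a, b) => (a[0] + a[1]) - (b[0] + b[1]));
  const byDiff = [...points].sort((a, b) => (a[0] - a[1]) - (b[0] - b[1]));
  return [bySum[0], byDiff[3], bySum[3], byDiff[0]]; // TL, TR, BR, BL
}
```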

const tTransform = new gm.Tensor('float32', [3, 1, 4]); // this is a placeholder for a transformation matrix
// generated by `gm.generateTransformMatrix`.
// The detailed description of this tensor: 3 rows, 1 column, depth 4; it stores a 3x3 [affine transformation](https://en.wikipedia.org/wiki/Affine_transformation) matrix. Each row of the tensor is a row of the matrix, and each depth slot of the tensor is a matrix column.
// Example: (3x3 matrix to tensor)
// |1 0 0|
// |0 1 0| => new gm.Tensor('float32', [3, 1, 4], new Float32Array([1, 0, 0, 0,    0, 1, 0, 0,   0, 0, 1, 0]));
// |0 0 1|
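To make that layout concrete, here is a plain-JS sketch (no GammaCV required) of the flat index math for a `[rows = 3, cols = 1, depth = 4]` tensor; `matrixToTensorData` is an illustrative helper, not a library function:

```javascript
// Illustration of the flat layout of a [3, 1, 4] tensor:
// flat index = row * (cols * depth) + col * depth + d  →  here row * 4 + d,
// and matrix column c is stored in depth slot d = c. Slot d = 3 stays 0.
function matrixToTensorData(m) {
  const data = new Float32Array(3 * 1 * 4); // zero-filled, 12 values
  for (let r = 0; r < 3; r += 1) {
    for (let c = 0; c < 3; c += 1) {
      data[r * 4 + c] = m[r][c];
    }
  }
  return data;
}
```

Feeding the identity matrix through this mapping reproduces the `Float32Array` from the example above.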

gm.generateTransformMatrix(
	rect, // pass the rect
	[480, 640], // this is the output shape you want [height, width],
// e.g. you want your quadrilateral to be warped to a rectangle with
// width 640 and height 480
	tTransform, // we use a placeholder instead of returning a new Tensor,
// because we want to reuse already allocated memory and change the value
// from call to call without reconstructing the graph
);

const operation = gm.perspectiveProjection(
	input, // input Tensor
	tTransform, // Transformation matrix Tensor
	[480, 640] // the output shape
);
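For intuition, this is the standard perspective-warp math such an operation is based on (an illustration of the math only, not GammaCV's actual GPU implementation): a coordinate is mapped through the 3x3 matrix homogeneously and then divided by `w`:

```javascript
// Standard perspective (homography) mapping, shown for clarity — not
// GammaCV's shader code. m is a flat 9-element row-major 3x3 matrix.
function applyHomography(m, x, y) {
  const w = m[6] * x + m[7] * y + m[8]; // homogeneous divisor
  return [
    (m[0] * x + m[1] * y + m[2]) / w,
    (m[3] * x + m[4] * y + m[5]) / w,
  ];
}
```

With the identity matrix every point maps to itself; a non-trivial bottom row is what produces the perspective (non-affine) effect.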

const output = gm.tensorFrom(operation);
sess.init(operation);
sess.runOp(operation, 0, output);
gm.canvasFromTensor(document.getElementById('my-canvas'), output);

Also, you may want to run the graph with a different rect, e.g. because the quadrilateral position changes in real time:

// For this you must keep the same session and operations
gm.generateTransformMatrix(
	rect, // pass updated rect
	[480, 640], // output shape
	tTransform, // reuse the already created Tensor
); // this will modify the data of tTransform
// you don't need to re-init the session
sess.runOp(operation, frameNumber, output); // the frame number should be unique for each call, because the output value may be cached by this parameter
gm.canvasFromTensor(document.getElementById('my-canvas'), output);
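The frame-number requirement can be illustrated with a toy cache (hypothetical code, not GammaCV's implementation): if results are keyed by frame index, reusing an index returns a stale result even though the inputs changed:

```javascript
// Toy illustration of caching by frame number — not GammaCV's code.
function makeCachedOp(compute) {
  const cache = new Map();
  return (frame, input) => {
    // A repeated frame index returns the cached value and ignores input.
    if (!cache.has(frame)) cache.set(frame, compute(input));
    return cache.get(frame);
  };
}
```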

Thank you so much for your quick response! I definitely could not have come up with that on my own. Here is my actual code; I copied your example verbatim. Sorry if these are naive questions; I'm still trying to orient myself in the library's concepts, as well as in computer vision in general.

(screenshot of the output attached)

const rect = new gm.Rect(
	pts[0][0],
	pts[0][1],
	pts[1][0],
	pts[1][1],
	pts[3][0],
	pts[3][1],
	pts[2][0],
	pts[2][1]
);

const tTransform = new gm.Tensor('float32', [3, 1, 4]);

gm.generateTransformMatrix(
	rect, // pass the rect
	[480, 640], // this is the output shape you want [height, width],
	// e.g. you want your quadrilateral to be warped to a rectangle with
	// width 640 and height 480
	tTransform // we use a placeholder instead of returning a new Tensor,
	// because we want to reuse already allocated memory and change the value
	// from call to call without reconstructing the graph
);

const operation = gm.perspectiveProjection(input, tTransform, [480, 640]);

let output2 = gm.tensorFrom(operation);
sess.init(operation);
sess.runOp(operation, 1, output2);
gm.canvasFromTensor(document.getElementById('contours'), output2);

@jamesmfriedman, sorry for the huge delay. There was a small mistake in my code example:

const operation = gm.perspectiveProjection(
	input, // input Tensor
	tTransform, // Transformation matrix Tensor
	[480, 640] // the output shape
);

to

const operation = gm.perspectiveProjection(
	input, // input Tensor
	tTransform, // Transformation matrix Tensor
	[480, 640, 4] // the output shape, since the shape requires 3 components [height, width, channels]
);

Here is an example:
pespective_projection_example.zip

Thanks for the response :). I had eventually figured that out, thankfully. I still need to add this to the docs.

@WorldThirteen
The output of the image transform from your example comes out a bit fuzzy (you can see jaggy edges around all the app icons, and really any UI element). Is there any way to have it come out a bit crisper?

(screenshot of the fuzzy output attached)

@calebbergman, Hi! Thanks for your interest in GammaCV! Our perspective correction algorithm doesn't currently include an antialiasing strategy; we plan to add one in the near future. Until then, this is the expected result for this example: for a source with less skew, the output will be crisper.

@WorldThirteen
Ok. Well, I'll be on the lookout for the anti-aliasing implementation then ;) Thanks so much!