This module implements Spatial Pyramid Pooling (SPP) and Temporal Pyramid Pooling (TPP), as described in the papers cited below.
Sudholt, Fink: Evaluating Word String Embeddings and Loss Functions for CNN-based Word Spotting
Given a 2D input tensor, Temporal Pyramid Pooling divides the input into x vertical stripes, each spanning the full height of the image with a width of roughly (input_width / x). Each stripe is then pooled with max or average pooling to produce the output.
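
The stripe division can be illustrated with a minimal pure-Python sketch. It is not this module's implementation: the names `stripe_bounds` and `temporal_pool` are hypothetical, the input is a plain list of rows rather than a tensor, and the floor/ceil rule for stripe boundaries is an assumption (one common way to get "roughly" equal widths, which lets neighbouring stripes overlap by a column when the width is not divisible by x).

```python
def stripe_bounds(width, x):
    """Column ranges [start, end) for x stripes.

    Assumed rule: start = floor(i * width / x), end = ceil((i + 1) * width / x),
    computed with integer arithmetic.
    """
    return [((i * width) // x, ((i + 1) * width + x - 1) // x) for i in range(x)]


def temporal_pool(image, x, mode="max"):
    """Reduce each full-height stripe of a 2D list to one value."""
    width = len(image[0])
    out = []
    for start, end in stripe_bounds(width, x):
        # A stripe covers every row, but only columns start..end-1.
        values = [row[c] for row in image for c in range(start, end)]
        out.append(max(values) if mode == "max" else sum(values) / len(values))
    return out


image = [[1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10]]

bounds = stripe_bounds(5, 2)                   # [(0, 3), (2, 5)]
tpp_max = temporal_pool(image, 2)              # [8, 10]
tpp_avg = temporal_pool(image, 2, mode="avg")  # [4.5, 6.5]
```

With a width of 5 and x = 2, both stripes are three columns wide and share the middle column, which is what "roughly (input_width / x)" looks like under this boundary rule.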
He et al.: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
Given a 2D input tensor, Spatial Pyramid Pooling divides the input into x² rectangles, each with a height of roughly (input_height / x) and a width of roughly (input_width / x). Each rectangle is then pooled with max or average pooling to produce the output.
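
The x² grid can be sketched the same way in pure Python. As before, this is an illustration under stated assumptions rather than this module's implementation: `bin_bounds` and `spatial_pool` are hypothetical names, the input is a plain list of rows, and bin edges follow an assumed floor/ceil rule.

```python
def bin_bounds(size, x):
    """Index ranges [start, end) for x bins along one axis.

    Assumed rule: start = floor(i * size / x), end = ceil((i + 1) * size / x).
    """
    return [((i * size) // x, ((i + 1) * size + x - 1) // x) for i in range(x)]


def spatial_pool(image, x, mode="max"):
    """Reduce each cell of an x-by-x grid over a 2D list to one value."""
    row_bins = bin_bounds(len(image), x)
    col_bins = bin_bounds(len(image[0]), x)
    out = []
    for r0, r1 in row_bins:          # grid rows, top to bottom
        for c0, c1 in col_bins:      # grid columns, left to right
            values = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            out.append(max(values) if mode == "max" else sum(values) / len(values))
    return out


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

spp_max = spatial_pool(image, 2)              # [6, 8, 14, 16]
spp_avg = spatial_pool(image, 2, mode="avg")  # [3.5, 5.5, 11.5, 13.5]
```

The output length is always x², regardless of the input size, which is what makes the pooled result usable as a fixed-length representation.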