Hi. Great job! I just have a question about increasing inference speed.

If I understand correctly, this code processes one crop at a time in a for loop, so that part is not optimized? As it stands, I don't see any difference from the sahi library.

To increase inference speed, you need to reduce the number of patches that are created. To make this easier to tune, the MakeCropsDetectThem class accepts the argument show_crops=True, which shows how the image is split into patches.
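To get a rough feel for how patch size and overlap drive the patch count before running anything, here is a minimal sketch. It assumes a plain sliding-window tiling with overlap given in percent; the function names are hypothetical and the library's own grid logic may differ, so treat the numbers as estimates and confirm them with show_crops=True.

```python
def patches_per_axis(frame_size: int, patch_size: int, overlap_percent: int) -> int:
    """Count sliding-window positions along one axis.

    Generic tiling assumption, not necessarily the library's exact grid.
    """
    if not 0 <= overlap_percent < 100:
        raise ValueError("overlap_percent must be in [0, 100)")
    if frame_size <= patch_size:
        return 1
    # Distance between the left edges of two neighbouring patches.
    step = max(1, patch_size * (100 - overlap_percent) // 100)
    # Ceiling division: enough steps for the last window to reach the far edge.
    return -(-(frame_size - patch_size) // step) + 1


def estimate_patch_count(
    frame_w: int,
    frame_h: int,
    shape_x: int,
    shape_y: int,
    overlap_x: int,
    overlap_y: int,
) -> int:
    """Estimate how many patches one frame is split into."""
    return patches_per_axis(frame_w, shape_x, overlap_x) * patches_per_axis(
        frame_h, shape_y, overlap_y
    )


if __name__ == "__main__":
    # Compare a few patch sizes for a Full HD frame with 25% overlap.
    for size in (480, 640, 960):
        print(size, estimate_patch_count(1920, 1080, size, size, 25, 25))
```

Larger patches or a smaller overlap give fewer patches per frame, which is the main lever for speed, at the cost of accuracy on small objects.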

Regarding speed, you are correct: it is not exceptionally high, because patch-based inference is designed primarily for processing high-resolution photos rather than for real-time tasks.

Our key distinctions from SAHI are: support for instance segmentation tasks with two levels of detection quality (a less accurate mode that is light on RAM, and a more accurate but more resource-intensive mode); an improved algorithm for suppressing duplicate detections where crops overlap (thanks to additional sorting by the sizes of detected objects); a user-friendly interface with extensive options for selecting optimal parameters; and support for the most current models (everything provided by ultralytics: YOLOv8, YOLOv8-seg, YOLOv9, YOLOv9-seg, YOLOv10, FastSAM, and RTDETR).
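To illustrate why sorting by object size can help with duplicates, here is a small self-contained sketch. It is not the library's actual implementation: it is one possible reading of the idea, in which greedy suppression visits boxes ordered by coarse confidence first and by area second, so a whole object wins over a fragment cut off at a patch border when their scores are close. The function names, the rounding to one decimal, and the IoU metric are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def box_area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def size_aware_nms(
    boxes: Sequence[Box], scores: Sequence[float], threshold: float = 0.3
) -> List[int]:
    """Greedy NMS that prefers larger boxes among similarly confident ones.

    Illustrative only: boxes are ordered by confidence rounded to one
    decimal, then by area, both descending.
    """
    order = sorted(
        range(len(boxes)),
        key=lambda i: (round(scores[i], 1), box_area(boxes[i])),
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [
        (0, 0, 100, 100),      # whole object seen in one patch
        (0, 0, 100, 60),       # same object, cut off by a patch border
        (200, 200, 250, 250),  # unrelated object
    ]
    scores = [0.82, 0.84, 0.50]
    print(size_aware_nms(boxes, scores))
```

With a plain confidence sort the truncated box (score 0.84) would suppress the complete one; with the size-aware ordering the complete box is kept.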

@poriop Good afternoon!

I have some great news: the new version of the library now includes the ability to process all patches (crops) in one batch, which significantly increases inference speed. To take advantage of this functionality, you need to update the library:

pip install --upgrade patched_yolo_infer

and then, when initializing the MakeCropsDetectThem class, pass the parameter batch_inference=True. This improvement has increased fps by approximately 1.5 times.

To use the library for processing video streams, you first need to configure all parameters on one frame (adjust the patch size based on the original image size and required accuracy) and then apply them to all frames in the stream. To conveniently set everything up initially, I recommend using the show_crops=True parameter when initializing the MakeCropsDetectThem class to get an image with a clear example of the patches obtained.
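As a rough sketch of the "tune once, apply to every frame" workflow: the settings are kept in one dictionary and reused unchanged for the whole stream, with the patch preview optionally enabled for the first frame only. MakeCropsDetectThem, show_crops and batch_inference come from this thread; CombineDetections and the remaining parameter names (model_path, segment, shape_x, overlap_x, conf, iou, nms_threshold) are written from memory of the library README and should be checked against your installed version. The helper names detect_patched and process_stream, and all the values, are placeholders for the example.

```python
from typing import Any, Callable, Dict, Iterable, Iterator

# Tuned once on a representative frame, then reused for the whole stream.
# Values are placeholders; parameter names are assumed from the README.
PATCH_SETTINGS: Dict[str, Any] = {
    "model_path": "yolov8m-seg.pt",
    "segment": True,
    "shape_x": 640,
    "shape_y": 640,
    "overlap_x": 25,
    "overlap_y": 25,
    "conf": 0.5,
    "iou": 0.7,
    "batch_inference": True,  # process all patches of a frame in one batch
}


def detect_patched(frame: Any, show_crops: bool = False, **settings: Any) -> Any:
    """Run patch-based inference on one frame (needs patched_yolo_infer)."""
    # Imported lazily so the rest of the sketch works without the library.
    from patched_yolo_infer import CombineDetections, MakeCropsDetectThem

    crops = MakeCropsDetectThem(image=frame, show_crops=show_crops, **settings)
    return CombineDetections(crops, nms_threshold=0.25)


def process_stream(
    frames: Iterable[Any],
    settings: Dict[str, Any],
    detect_fn: Callable[..., Any] = detect_patched,
    preview_first: bool = False,
) -> Iterator[Any]:
    """Apply the same patch settings to every frame of a stream."""
    for index, frame in enumerate(frames):
        # Optionally show the patch grid once, on the first frame only.
        show = preview_first and index == 0
        yield detect_fn(frame, show_crops=show, **settings)
```

Here frames can be any iterable of images, for example frames read one by one from a video file. Passing detect_fn as an argument is only a convenience so the loop can be tried out with a stub before the real model is plugged in.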

show_crops=True example

Logically, the more patches there are per frame, the slower each frame is processed. For example, with 16 patches per frame on an RTX 3080 Laptop, it is possible to achieve 8 fps for both yolov8 detection and yolov8-seg instance segmentation.

Video result example:

Important information: the latest update to the library also makes it possible to use any ultralytics detection or instance segmentation model that has been converted to TensorRT, which increases fps by a further 1.5 times (more than 12 fps for the detection task with 16 patches). When converting, set the batch parameter to the number of patches generated per frame:

!yolo export model=yolov8m.pt format=engine half=True device=0 batch=16
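If the patch count changes while you tune the crop settings, the export command has to change with it. A small hypothetical helper (the name build_export_command is made up for this example) can keep the two in sync by building the same command with batch taken from the number of patches per frame:

```python
from typing import List


def build_export_command(
    weights: str,
    patches_per_frame: int,
    half: bool = True,
    device: int = 0,
) -> List[str]:
    """Build the yolo export command with batch equal to the patch count."""
    if patches_per_frame < 1:
        raise ValueError("patches_per_frame must be at least 1")
    return [
        "yolo",
        "export",
        f"model={weights}",
        "format=engine",
        f"half={half}",
        f"device={device}",
        f"batch={patches_per_frame}",
    ]


if __name__ == "__main__":
    # Print the command; run it in a shell or pass the list to subprocess.run.
    print(" ".join(build_export_command("yolov8m.pt", patches_per_frame=16)))
```

The leading "!" in the command above is only needed inside a notebook cell; in a terminal the command is run without it.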

Below is an example of how you can write code to implement video stream processing and save the final processed video with the results of patch-based instance segmentation inference (as in the example from the GIF of the previous comment):

