Koldim2001/YOLO-Patch-Based-Inference

Inference speed

poriop opened this issue · 5 comments

Hi. Great job! I just have a question about increasing inference speed.

If I understand correctly, this code processes one crop at a time in a for loop, so this part is not optimized? Right now I don't see any difference from the sahi library.

To increase inference speed, you need to reduce the number of patches that are created. To make this easier to tune, the MakeCropsDetectThem class provides a show_crops=True argument, which visualizes the generated patches (see the sketch below).
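A minimal sketch of such a tuning run (the model name, image path, and patch-size values here are placeholders; the parameter names are the same ones used in the full example later in this thread):

import cv2
from ultralytics import YOLO
from patched_yolo_infer import MakeCropsDetectThem

# Placeholder model and image -- substitute your own
model = YOLO("yolov8m.pt")
image = cv2.imread("image.jpg")

element_crops = MakeCropsDetectThem(
    image=image,
    model=model,
    shape_x=640,      # patch width: larger patches mean fewer crops and faster inference
    shape_y=640,      # patch height
    overlap_x=25,     # overlap between adjacent patches: smaller overlap also reduces the patch count
    overlap_y=25,
    show_crops=True,  # display an image of the generated patch layout for inspection
)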

Regarding speed, you are correct; it is not exceptionally high because patch-based inference technologies are primarily developed not for real-time tasks but for projects involving high-resolution photo processing.

Our key distinctions from SAHI:

- support for instance segmentation tasks with two levels of detection quality (less accurate but light on memory, or more resource-intensive but accurate);
- an improved algorithm for suppressing duplicate detections at crop intersections (thanks to additional sorting by the sizes of detected objects);
- a user-friendly interface with extensive options for selecting optimal parameters;
- support for the most current models (everything provided by ultralytics: YOLOv8, YOLOv8-seg, YOLOv9, YOLOv9-seg, YOLOv10, FastSAM, and RTDETR).

Thanks for the great library @Koldim2001!

I am the creator of SAHI and an advisor at Ultralytics. I wanted to share some updates: SAHI now supports instance segmentation with a memory-efficient implementation for Ultralytics models and includes non-maximum merging (NMM) for eliminating duplicate detections. 👍🏻

@fcakyon It is very nice to receive a positive review from you. My colleague and I were inspired by your project when creating this library! Thanks for adding instance segmentation. We will be in touch! 👍🏻

@poriop Good afternoon!

I have some great news: the new version of the library can now process all patches (crops) in a single batch, which significantly increases inference speed. To take advantage of this functionality, update the library:

pip install --upgrade patched_yolo_infer

and then, when initializing the MakeCropsDetectThem class, pass the parameter batch_inference=True. This improvement increases FPS by approximately 1.5x.
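A minimal sketch of the batched initialization (the model and frame here are placeholders; all other parameters are unchanged):

import cv2
from ultralytics import YOLO
from patched_yolo_infer import MakeCropsDetectThem

model = YOLO("yolov8m.pt")
frame = cv2.imread("frame.jpg")  # placeholder input frame

# Identical to the earlier initialization, except that all generated
# patches are now passed through the model as a single batch
element_crops = MakeCropsDetectThem(
    image=frame,
    model=model,
    batch_inference=True,
)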

To use the library for processing video streams, first configure all parameters on a single frame (adjust the patch size to the original image size and the required accuracy) and then apply them to every frame in the stream. To set everything up conveniently, I recommend passing show_crops=True when initializing the MakeCropsDetectThem class to get an image that clearly shows the resulting patches.

[Image: show_crops=True example]

Logically, the more patches there are, the slower the inference on each frame. For example, with 16 patches per frame on an RTX 3080 Laptop GPU, it is possible to achieve 8 FPS for YOLOv8 detection and YOLOv8-seg instance segmentation.

Video result example: [GIF]


Important information: additionally, the latest update to the library has added the ability to use any Ultralytics detection or instance segmentation model converted to TensorRT, which further increases FPS by roughly another 1.5x (more than 12 FPS for the detection task with 16 patches). When converting, set the batch parameter to the number of patches generated per frame:

!yolo export model=yolov8m.pt format=engine half=True device=0 batch=16
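For reference, a sketch of the same export via the Ultralytics Python API (this assumes TensorRT is installed; batch must match the number of patches produced per frame, 16 in this example):

from ultralytics import YOLO

# Export the checkpoint to a TensorRT engine (Python equivalent of the CLI call above)
model = YOLO("yolov8m.pt")
model.export(format="engine", half=True, device=0, batch=16)

# The resulting .engine file is then loaded exactly like a .pt checkpoint
trt_model = YOLO("yolov8m.engine")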

Below is an example of how to implement video stream processing and save the final processed video with the results of patch-based instance segmentation inference (as in the GIF from the previous comment):

import cv2
from ultralytics import YOLO
from patched_yolo_infer import MakeCropsDetectThem, CombineDetections, visualize_results

# Load the YOLOv8 segmentation model
model = YOLO("yolov8m-seg.pt")  # or yolov8m-seg.engine in the case of TensorRT

# Open the video file
cap = cv2.VideoCapture("video.mp4")

# Check if the video file was successfully opened
if not cap.isOpened():
    print("Error: could not open the video file")
    exit()

# Get the frames per second (fps) of the video
fps = cap.get(cv2.CAP_PROP_FPS)
# Get the width and height of the video frames
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Define the codec and create VideoWriter object
fourcc = cv2.VideoWriter_fourcc(*'mp4v')  # Codec for MP4
out = cv2.VideoWriter('output.mp4', fourcc, fps, (width, height))

while True:
    # Read a frame from the video
    ret, frame = cap.read()

    # Break the loop if there are no more frames
    if not ret:
        break

    # Detect elements in the frame using the YOLOv8 model
    element_crops = MakeCropsDetectThem(
        image=frame,
        model=model,
        segment=True,
        shape_x=640,
        shape_y=500,
        overlap_x=35,
        overlap_y=35,
        conf=0.2,
        iou=0.75,
        imgsz=640,
        resize_initial_size=True,
        show_crops=False,
        batch_inference=True,
        classes_list=[0, 1, 2, 3, 4, 5, 6]
    )

    # Combine the detections from the different crops
    result = CombineDetections(element_crops, nms_threshold=0.2, match_metric='IOS')

    # Visualize the results on the frame
    frame = visualize_results(
        img=result.image,
        confidences=result.filtered_confidences,
        boxes=result.filtered_boxes,
        polygons=result.filtered_polygons,
        classes_ids=result.filtered_classes_id,
        classes_names=result.filtered_classes_names,
        segment=True,
        thickness=3,
        show_boxes=False,
        fill_mask=True,
        show_class=False,
        alpha=1,
        return_image_array=True
    )

    # Resize the frame for display
    scale = 0.5
    frame_resized = cv2.resize(frame, None, fx=scale, fy=scale)

    # Display the frame
    cv2.imshow('video', frame_resized)

    # Write the frame to the output video file
    out.write(frame)

    # Break the loop if 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the video capture and writer objects
cap.release()
out.release()

# Close all OpenCV windows
cv2.destroyAllWindows()