juxtapose

juxtapose: Multi-Person Pose Tracking Inference SDK with RTMDet, YOLOv8, GDino, RTMPose (ONNX) & Trackers (ByteTrack & BotSORT) & Tapnet with custom ROIs + FastAPI GPU exe


🍿 Intro

Juxtapose is a 2D multi-person pose detection, tracking, and estimation inference toolbox for sports and kinematics analysis. Visit the Docs.

See how we integrated juxtapose into this app: Juxt Space

🍄 Overview

Code is mostly adapted from four repos -> ultralytics, mmdeploy, mmdetection, mmpose.

Supported Detectors: rtmdet-s, rtmdet-m, rtmdet-l, groundingdino, yolov8
Supported Pose Estimators: rtmpose-s, rtmpose-m, rtmpose-l
Supported Trackers: bytetrack, botsort
Supported Point Trackers: Tapnet
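
These names map one-to-one onto the RTM constructor arguments used throughout the examples below; as a quick sketch, any supported combination can be mixed and matched by name (device="cpu" here is just an illustration):

from juxtapose import RTM

# Pick any supported detector, pose estimator, and tracker by name.
model = RTM(det="groundingdino", pose="rtmpose-m", tracker="botsort", device="cpu")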

🥒 Updates

  • 2024/05/16 Removed the ultralytics dependency and ported YOLOv8 to run directly on ONNX to improve speed.
  • 2024/04/27 Added a FastAPI-to-EXE example with the ONNX GPU Runtime in examples/fastapi-pyinstaller.
  • 2024/01/11 Added Nextra docs and deployed them to Vercel at sdk.juxt.space.
  • 2024/01/07 Reduced dependencies by removing the MMCV, MMDet, and MMPose SDKs; inference now runs fully on ONNX.
  • 2023/11/01 Added juxtapose to the PyPI repository so it can be installed with pip install juxtapose.
  • 2023/08/25 Added custom region of interest (ROI) drawing tools that enable multi-ROI filtering while performing pose estimation/tracking. See usage below.
  • 2023/08/15 Added the GroundingDino and YOLOv8 object detectors.
  • 2023/08/09 Added keypoint streaming to a CSV file using the csv module.
  • 2023/07/31 Added ByteTrack and BotSORT. Completed the engineering effort for top-down inference on any source. See supported sources below.
  • 2023/06/15 Converted RTMDet (s/m/l) and RTMPose (s/m/l) to ONNX using MMDeploy.

👉 Getting Started

Install Using PIP

pip install juxtapose

Note: If you face any issues, kindly review this GitHub issue
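
To quickly verify the installation, a minimal smoke test (a sketch assuming a CPU-only environment; constructing the model may download the ONNX weights on first use):

from juxtapose import RTM

# Constructing the model without errors confirms the install works.
model = RTM(det="rtmdet-m", pose="rtmpose-m", tracker="bytetrack", device="cpu")
print("juxtapose is installed correctly")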

🧀 Local Development

git clone https://github.com/ziqinyeow/juxtapose
cd juxtapose
pip install .
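
If you plan to modify the source, an editable install is a common alternative (assuming the repo's standard setuptools/pyproject layout):

pip install -e .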

🤩 Feel The Magic

🌄 Basic Usage

from juxtapose import RTM

# Initialize an RTM model (includes rtmdet, rtmpose, and a tracker)
model = RTM(
    det="rtmdet-m", # see type hinting
    pose="rtmpose-m", # see type hinting
    tracker="bytetrack", # see type hinting
    device="cpu",  # see type hinting
)

# Inference on a directory (all images and videos in the dir are processed sequentially)
model("data")

# Inference on an image
model("data/football.jpeg", verbose=False)  # verbose=False -> disable terminal printing

# Inference on a video
model("data/bike.mp4")

# Inference on a YouTube source
model("https://www.youtube.com/watch?v=1vYvTbDJuFs&ab_channel=PeterGrant", save=True)

🎨 Select Region of Interests (ROIs)

It will first prompt the user to draw the ROIs; press r to remove the most recently drawn ROI. After drawing, press SPACE, ENTER, or q to accept the ROIs. The model will then filter out bounding boxes based on the ROIs.

😁 Note: Press SPACE again to redraw the bounding boxes. See the custom implementation with cv2 here.

from juxtapose import RTM

model = RTM(det="groundingdino", pose="rtmpose-l", tracker="none")
model("data/bike.mp4", roi="rect") # rectangle roi

# 1. Draw the ROI first
# 2. Press r or R to reset the ROI
# 3. Press SPACE, Enter, q, or Q to continue with the ROI

🚴‍♂️ Accessing results for each frame: More Flexibility

# Adding custom plot
import cv2
from juxtapose import RTM, Annotator

model = RTM()
annotator = Annotator(thickness=3, font_color=(128, 128, 128)) # see rtm.utils.plotting

# set show=True -> cv2.imshow the frame (you can use cv2 to plot anything on the frame)
# set plot=False -> skip the default plot (see rtm.rtm, line `if plot:`)
for result in model("data/bike.mp4", show=True, plot=False, stream=True):
    # do whatever you want with the data
    im, bboxes, kpts = result.im, result.bboxes, result.kpts

    # e.g. plot anything custom using the cv2 API
    cv2.putText(
        im, "custom text", (100, 100), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (128, 128, 128)
    )

    # use the annotator class -> see rtm.utils.plotting
    annotator.draw_bboxes(
        im, bboxes, labels=[f"children_{i}" for i in range(len(bboxes))]
    )
    annotator.draw_kpts(im, kpts, thickness=4)
    annotator.draw_skeletons(im, kpts)
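
The same streaming loop can also persist keypoints to disk. Below is a minimal sketch that writes each frame's keypoints to a CSV file with the standard csv module (the column layout is illustrative, assuming result.kpts holds one (17, 2) array of x/y keypoints per detected person, as in the custom forward pass example below):

import csv
from juxtapose import RTM

model = RTM()

with open("keypoints.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["frame", "person", "keypoint", "x", "y"])  # illustrative header
    for frame_idx, result in enumerate(model("data/bike.mp4", stream=True, verbose=False)):
        for person_idx, person in enumerate(result.kpts):  # one (17, 2) array per person
            for kpt_idx, (x, y) in enumerate(person):
                writer.writerow([frame_idx, person_idx, kpt_idx, float(x), float(y)])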

⚽️ Custom Forward Pass: Full Flexibility

# Custom model forward pass
import cv2
import torch
from juxtapose import RTMDet, RTMPose, Annotator

frame = cv2.imread("data/football.jpeg")
device = "cuda" if torch.cuda.is_available() else "cpu"

# model sizes: s, m, l
rtmdet = RTMDet("l", device=device)
rtmpose = RTMPose("l", device=device)
annotator = Annotator()


bboxes, scores, labels = rtmdet(frame)  # bboxes: [[x1, y1, x2, y2], ...], plus scores and labels
kpts = rtmpose(frame, bboxes=bboxes)  # shape: (number of humans, 17, 2)

annotator.draw_bboxes(frame, bboxes, labels=[f"person_{i}" for i in range(len(bboxes))])
annotator.draw_kpts(frame, kpts, thickness=4)
annotator.draw_skeletons(frame, kpts)

cv2.imshow("frame", frame)
cv2.waitKey(0)
cv2.destroyAllWindows()

Supported Sources

Adapted from the ultralytics repository -> see https://docs.ultralytics.com/modes/predict/

| Source | Argument | Type | Notes |
| --- | --- | --- | --- |
| image | 'image.jpg' | str or Path | Single image file. |
| URL | 'https://ultralytics.com/images/bus.jpg' | str | URL to an image. |
| screenshot | 'screen' | str | Capture a screenshot. |
| PIL | Image.open('im.jpg') | PIL.Image | HWC format with RGB channels. |
| OpenCV | cv2.imread('im.jpg') | np.ndarray of uint8 (0-255) | HWC format with BGR channels. |
| numpy | np.zeros((640,1280,3)) | np.ndarray of uint8 (0-255) | HWC format with BGR channels. |
| torch | torch.zeros(16,3,320,640) | torch.Tensor of float32 (0.0-1.0) | BCHW format with RGB channels. |
| CSV | 'sources.csv' | str or Path | CSV file containing paths to images, videos, or directories. |
| video | 'video.mp4' | str or Path | Video file in formats like MP4, AVI, etc. |
| directory | 'path/' | str or Path | Path to a directory containing images or videos. |
| glob | 'path/*.jpg' | str | Glob pattern to match multiple files. Use the * character as a wildcard. |
| YouTube | 'https://youtu.be/Zgi9g1ksQHc' | str | URL to a YouTube video. |
| stream | 'rtsp://example.com/media.mp4' | str | URL for streaming protocols such as RTSP, RTMP, or an IP address. |
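
As a quick illustration, any of these source types can be passed straight to the model call (a sketch reusing the RTM interface from Basic Usage above):

import cv2
from juxtapose import RTM

model = RTM()

model("data/football.jpeg")              # single image file
model("data/")                           # directory of images/videos
model("data/*.jpg")                      # glob pattern
model(cv2.imread("data/football.jpeg"))  # np.ndarray in BGR HWC format
model("https://youtu.be/Zgi9g1ksQHc")    # YouTube URL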