This repository is a fork of Ultralytics' repository for YOLOv5, an open-source object detection method. Please check the original repo for up-to-date code, information on pretrained checkpoints, and tutorials on reproducing their environment or training on custom data. The main changes here are a focus on inference only and making sure it works on Windows 10.
YOLOv5 provides state-of-the-art (SOTA) real-time object detection. For more information, see Roboflow's blog post and their guide to training it on a custom dataset. This Hacker News thread offers additional insights and discusses the controversy around the model's name.
This repo was tested on Windows 10 Home (version 2004) with an Nvidia GeForce RTX 2060.
Credit to Joseph Redmon for YOLO: https://pjreddie.com/darknet/yolo/.
The YOLOv5 release includes four model sizes: YOLOv5s (smallest), YOLOv5m, YOLOv5l and YOLOv5x (largest).
This code does not download any models automatically, so make sure you have downloaded the pretrained checkpoints first, either from the GDrive link or from the original repo's links. There is a tradeoff between AP and latency: larger models are more accurate but slower at inference.
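Because nothing is downloaded for you, a quick pre-flight check can save a failed run. The sketch below assumes the checkpoints follow the naming pattern yolov5{s,m,l,x}.pt and sit in the folder you run detect.py from; only yolov5x.pt is confirmed by the example command further down, the other names are an assumption, and missing_checkpoints is a hypothetical helper, not part of this repo.

```python
from pathlib import Path

# Assumed checkpoint naming: yolov5s.pt, yolov5m.pt, yolov5l.pt, yolov5x.pt.
MODEL_SIZES = ("s", "m", "l", "x")  # smallest -> largest


def missing_checkpoints(weights_dir="."):
    """Return the expected checkpoint filenames that are not present in weights_dir."""
    folder = Path(weights_dir)
    expected = [f"yolov5{size}.pt" for size in MODEL_SIZES]
    # Keep only the names that do not exist as regular files.
    return [name for name in expected if not (folder / name).is_file()]
```

Calling missing_checkpoints() from the repository root returns an empty list once all four checkpoints are in place; you only need the size you actually pass to --weights.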
Setting up a virtual environment is recommended.
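A minimal sketch of creating one with Python's standard venv module; the folder name yolov5-env and the helper create_env are hypothetical choices for illustration.

```python
import venv
from pathlib import Path


def create_env(env_dir="yolov5-env", with_pip=True):
    """Create a virtual environment and return the path to its pyvenv.cfg."""
    # "yolov5-env" is a hypothetical folder name; any name works.
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"
```

This does the same as running python -m venv yolov5-env. On Windows, activate the environment with yolov5-env\Scripts\activate before running the pip commands below.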
With Python 3.7 or later and CUDA 10.1 on Windows 10, install torch==1.5 and torchvision==0.6. To install them, run:
$ pip install torch==1.5.0+cu101 torchvision==0.6.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
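To confirm the pinned versions actually landed in the active environment, a sketch like the one below compares installed versions against the pins from the command above, ignoring the local build tag such as +cu101. It uses importlib.metadata, which needs Python 3.8 or later (an assumption beyond the 3.7 minimum stated above); check_pins and base_version are hypothetical helpers.

```python
from importlib import metadata

# Versions pinned in the pip command above.
PINNED = {"torch": "1.5.0", "torchvision": "0.6.0"}


def base_version(version):
    """Drop the local build tag, e.g. '1.5.0+cu101' -> '1.5.0'."""
    return version.split("+", 1)[0]


def check_pins(pinned=PINNED):
    """Map each package to (installed version or None, whether it matches the pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, base_version(installed) == wanted)
    return report
```

Once torch imports cleanly, torch.cuda.is_available() should also return True on a CUDA-capable setup like the one this repo was tested on.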
Once that is done, install the remaining dependencies listed in requirements.txt by running:
$ pip install -U -r requirements.txt
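If you want to verify that step, the sketch below reads requirements.txt and lists any distribution that is not installed. It uses a deliberately simplified parser (name up to the first version operator, extra, marker or space) rather than full PEP 508 parsing, which packaging.requirements.Requirement would handle properly; requirement_names and missing_requirements are hypothetical helpers, and importlib.metadata again assumes Python 3.8 or later.

```python
import re
from importlib import metadata


def requirement_names(lines):
    """Extract bare distribution names from requirements.txt-style lines (simplified)."""
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -f
            continue
        # The name ends at the first version operator, extra, marker or whitespace.
        names.append(re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0])
    return names


def missing_requirements(path="requirements.txt"):
    """Return the names from the requirements file that are not installed."""
    with open(path, encoding="utf-8") as fh:
        names = requirement_names(fh)
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```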
Inference can be run on images, videos and directories, as well as directly on webcam, RTSP or HTTP streams. Results are saved to ./inference/output.
$ python detect.py --source file.jpg  # image
                            file.mp4  # video
                            ./dir  # directory
                            0  # webcam
                            rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa  # rtsp stream
                            http://qthttp.apple.com.edgesuite.net/1010qwoeiuryfg/sl.m3u8  # http stream
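The single --source flag covers all of the input kinds listed above. To make the distinction concrete, here is a hypothetical classifier that maps a --source value to those labels; it is not detect.py's own dispatch logic, and the extension sets are assumptions chosen for illustration.

```python
from pathlib import Path

# Assumed extension lists for illustration; detect.py's own lists may differ.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}
VIDEO_EXTS = {".mp4", ".avi", ".mov", ".mkv"}


def classify_source(source):
    """Label a --source value with one of the categories listed above (hypothetical helper)."""
    if source.isdigit():
        return "webcam"  # a bare integer is a camera index, e.g. 0
    lowered = source.lower()
    if lowered.startswith("rtsp://"):
        return "rtsp stream"
    if lowered.startswith(("http://", "https://")):
        return "http stream"
    suffix = Path(source).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in VIDEO_EXTS:
        return "video"
    return "directory" if suffix == "" else "unknown"
```

Checking the stream prefixes before the file extension matters here: the HTTP example ends in .m3u8, which would otherwise be mistaken for a file.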
To run inference on the examples in the ./inference/images folder:
$ python detect.py --source ./inference/images/ --weights yolov5x.pt --conf 0.4
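To drive the same command from a Python script, for example to loop over several sources, a sketch like this builds the argument list and runs it with subprocess. It only uses the three flags shown in the command above; build_detect_cmd and run_detect are hypothetical helpers, and run_detect assumes it is called from the repository root where detect.py lives.

```python
import subprocess
import sys


def build_detect_cmd(source, weights="yolov5x.pt", conf=0.4):
    """Assemble the detect.py command shown above as an argument list."""
    return [sys.executable, "detect.py",
            "--source", str(source),
            "--weights", weights,
            "--conf", str(conf)]


def run_detect(source, **kwargs):
    """Run detect.py (from the repository root) and return its exit code."""
    return subprocess.run(build_detect_cmd(source, **kwargs), check=False).returncode
```

Passing a list instead of a single shell string sidesteps quoting problems on Windows when paths contain spaces, and sys.executable makes sure the interpreter from the active virtual environment is used.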
Namespace(agnostic_nms=False, augment=False, classes=None, conf_thres=0.4, device='', fourcc='mp4v', half=False, img_size=640, iou_thres=0.5, output='inference/output', save_txt=False, source='./inference/images/', view_img=False, weights='yolov5x.pt')
Using CUDA device0 _CudaDeviceProperties(name='GeForce RTX 2060', total_memory=6144MB)
image 1/5 inference\images\bus.jpg: 640x512 4 persons, 1 buss, Done. (0.072s)
image 2/5 inference\images\dog.jpg: 448x640 1 persons, 1 dogs, 1 couchs, Done. (0.057s)
image 3/5 inference\images\fashion.jpg: 640x448 2 persons, 2 handbags, Done. (0.058s)
image 4/5 inference\images\tokyo.jpg: 512x640 6 persons, 3 handbags, 3 ties, 1 books, Done. (0.056s)
image 5/5 inference\images\zidane.jpg: 384x640 2 persons, 2 ties, Done. (0.047s)
Results saved to \inference\output
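The per-image lines in this log are regular enough to summarize programmatically. The sketch below parses lines in the format shown above; the regular expression is written against this sample output only, so other versions of detect.py may print differently. The log appends "s" to every class name regardless of count ("1 buss", "1 dogs"), so labels are kept exactly as printed. parse_detect_line and total_counts are hypothetical helpers.

```python
import re

# Written against lines such as:
# image 1/5 inference\images\bus.jpg: 640x512 4 persons, 1 buss, Done. (0.072s)
LINE_RE = re.compile(
    r"image (?P<idx>\d+)/(?P<total>\d+) (?P<path>.+?): "
    r"(?P<a>\d+)x(?P<b>\d+) (?P<dets>.*?)Done\. \((?P<secs>[\d.]+)s\)"
)


def parse_detect_line(line):
    """Return path, printed input shape, per-label counts and time, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    counts = {}
    for item in m.group("dets").split(","):
        item = item.strip()
        if item:
            n, label = item.split(" ", 1)
            counts[label] = int(n)  # label kept as printed, e.g. "buss"
    return {
        "path": m.group("path"),
        "shape": (int(m.group("a")), int(m.group("b"))),  # order as printed in the log
        "counts": counts,
        "seconds": float(m.group("secs")),
    }


def total_counts(lines):
    """Sum detections per label over all parseable lines."""
    totals = {}
    for line in lines:
        parsed = parse_detect_line(line)
        if parsed:
            for label, n in parsed["counts"].items():
                totals[label] = totals.get(label, 0) + n
    return totals
```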