Whereabouts Ascertainment for Low-lying Detectable Objects (get it?)
WALDO is a trained bounding-box detection deep neural network that enables overhead detection of land-based objects!
These are the current detection classes:
1 --> car
2 --> van
3 --> truck
4 --> building
5 --> human
6 --> gastank
7 --> digger
8 --> container
9 --> bus
10 --> pylon
11 --> boat
12 --> bike
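For convenience, here is the same mapping as a Python dict. This is just a convenience snippet, not part of the repo, and note that the raw model output may be 0-indexed (i.e. 0 = car ... 11 = bike) depending on the export, so verify against your own detections:

```python
# Class-ID -> label mapping, copied from the list above.
# NOTE: the raw model output may be 0-indexed (0 = car ... 11 = bike)
# depending on how the weights were exported; check your own results.
WALDO_CLASSES = {
    1: "car",
    2: "van",
    3: "truck",
    4: "building",
    5: "human",
    6: "gastank",
    7: "digger",
    8: "container",
    9: "bus",
    10: "pylon",
    11: "boat",
    12: "bike",
}
```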
This AI system is primarily designed for ground-risk mitigation for large flying objects traveling over populated areas, but it can also be useful for all sorts of other things like search-and-rescue operations, disaster relief, and so on... it's up to you!
If you need ground-risk segmentation instead of object detection make sure to check out OpenLander here: https://github.com/stephansturges/OpenLander/
Check out the video below for a high-level idea of the performance of the default 960px full-size model.
Multiple other versions of the model are coming very soon, including (of course!) versions that are optimized to be embedded for real-time ground-risk mitigation.
Some detection metrics from training of the yolov7 NOMS 960px model (20230420_yolov7_12class_960px_noms):
An alternative yolov7 model (20230502_yolov7_12class_960px_noms_adam_30b):
Confusion matrix of the yolov7x NOMS 960px:
Confusion matrix of the yolov7_e6e NOMS 960px (largest model):
Confusion matrix of the yolov7_tiny NOMS 960px (smallest model):
As you can see, classes that look similar, like "car" and "van", suffer from some confusion. The same goes for objects that often need more context to identify, like "truck" / "container" / "gastank"... and some classes simply need more data or more training to improve. Feel free to donate with the Ko-Fi link to help me make it better!
Yes, WALDO is 100% free-as-in-beer. It comes with no warranties, but you can do whatever you want with it. See the license below.
Sure, send me an email. I do a lot of commercial perception work for UAVs and other applications.
There are many ways you can use WALDO!
Download the model weights and run with yolov7 straight from the command-line like this:
python3 detect.py --weights best.pt --img-size 960 --save-txt --source /your_frames/ --project /your_save_folder/
Open the ONNX model from the /ONNX folder with OpenCV and run your inference from there. Just take a look at the sample script run_local_onnx_boilerplate.py provided in the repo: if you have an Nvidia GPU, set "cuda" in there to True, and point it at your files. If you're missing any dependencies, you can use the provided requirements.txt to set them up... and that's all there is to it!
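To give a rough idea of what that script does, here is a minimal sketch of running the ONNX model with OpenCV's DNN module. The file path, the 960x960 input size, and the (batch_id, x0, y0, x1, y1, class_id, score) output layout are assumptions here, so treat run_local_onnx_boilerplate.py as the reference and adjust to your export:

```python
import cv2
import numpy as np

USE_CUDA = False  # set to True only if your OpenCV build has CUDA support

# Load the ONNX model with OpenCV's DNN module (path is an assumption).
net = cv2.dnn.readNetFromONNX("ONNX/model.onnx")
if USE_CUDA:
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# Preprocess: resize to the 960px training resolution, scale to 0-1, BGR->RGB.
img = cv2.imread("your_frame.png")
blob = cv2.dnn.blobFromImage(img, scalefactor=1 / 255.0, size=(960, 960),
                             swapRB=True, crop=False)
net.setInput(blob)
detections = net.forward()

# With an end2end-style export each row is roughly
# (batch_id, x0, y0, x1, y1, class_id, score) in 960px coordinates;
# verify this against your own export before trusting the parsing.
for det in np.array(detections).reshape(-1, 7):
    _, x0, y0, x1, y1, cls_id, score = det
    if score > 0.3:
        print(f"class {int(cls_id)} @ ({x0:.0f},{y0:.0f},{x1:.0f},{y1:.0f}) score {score:.2f}")
```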
Use the provided script called run_local_onnx_largeinput_tiled_process.py. Set "CUDA" to "True" if you have an Nvidia GPU, and point it at the image you want to tile and process by changing this line: img = cv2.imread('./Columbus_COWC_1.png')
The script will spit out a re-built full-size image (below are crops):
...along with a count of detected objects printed to the console:
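If you are curious about the tiling idea itself, here is a minimal sketch under stated assumptions: the real logic lives in run_local_onnx_largeinput_tiled_process.py, the 960px tile size matches the model input, and run_inference() is a hypothetical helper standing in for whatever detector call you use:

```python
import cv2

TILE = 960  # tile size matching the model's input resolution (assumption)

def tile_image(path):
    """Split a large image into TILE x TILE crops, padding the border tiles."""
    img = cv2.imread(path)
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            crop = img[y:y + TILE, x:x + TILE]
            # Pad edge tiles so every crop is exactly TILE x TILE.
            pad_y, pad_x = TILE - crop.shape[0], TILE - crop.shape[1]
            if pad_y or pad_x:
                crop = cv2.copyMakeBorder(crop, 0, pad_y, 0, pad_x,
                                          cv2.BORDER_CONSTANT, value=0)
            tiles.append(((x, y), crop))
    return img, tiles

# Usage sketch: run each tile through the detector, then shift the boxes
# back by the tile's (x, y) offset before drawing them on the full image.
# full_img, tiles = tile_image("./Columbus_COWC_1.png")
# for (ox, oy), crop in tiles:
#     for x0, y0, x1, y1, cls_id, score in run_inference(crop):  # hypothetical helper
#         cv2.rectangle(full_img, (int(x0) + ox, int(y0) + oy),
#                       (int(x1) + ox, int(y1) + oy), (0, 255, 0), 2)
```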
Load the .pt model and convert it to ONNX with your own settings using yolov7's export script (something like export.py --weights /best.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.45 --conf-thres 0.3 --img-size 960 960 --max-wh 960)... read the export docs to figure out what you need.
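After exporting, a quick sanity check that the ONNX file is well formed can save debugging time. This is just a sketch assuming the onnx and onnxruntime packages are installed and that your exported file is named best.onnx:

```python
import onnx
import onnxruntime as ort

# Structural check of the exported graph.
model = onnx.load("best.onnx")
onnx.checker.check_model(model)

# Print the input/output names and shapes onnxruntime sees, which tells you
# what preprocessing and output parsing your inference code will need.
session = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])
for i in session.get_inputs():
    print("input:", i.name, i.shape, i.type)
for o in session.get_outputs():
    print("output:", o.name, o.shape, o.type)
```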
Alternatively you can use the included ONNX model and deploy it wherever you want!
In the UAV space there is a need for FOSS AI tools that are usable by all for safety and ground-risk mitigation! Additionally, the space needs a reference system to serve as a benchmark for private-source comparables when defining security use cases with regulatory authorities such as the FAA or EASA.
Over the coming weeks I'm going to update this repository with variants of the main model optimized for specific use cases, along with some boilerplate code for deployment! DONE!
If you'd like to help me make this better please consider donating a few $$$ to keep my GPUs running using Ko-Fi:
All code written by me and GPT4 😄
Thanks to https://gdo152.llnl.gov/cowc/ for the satellite image of Columbus, Ohio used in the tiling demo.
Copyright Stephan Sturges 2023
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.