A team from the Defence Science and Technology Laboratory (Dstl) in the UK.
The challenge we were tackling was the Small UAV Visual Navigation problem: developing an alternative approach to autonomous UAV navigation in a GPS-denied environment, for example where there is GPS jamming.
Our solution is to use the camera available on the drone to georeference the drone's current location against a reference map (for example, satellite imagery). This gives an approximate location, which can be used in place of GPS. Whilst it won't have the same accuracy as GPS, it should provide enough accuracy to navigate back to a safe space or even complete wide-area reconnaissance missions.
The georeferencing uses a Siamese neural network model to enable matching between images taken at different angles, at different times of day/year, and at different zoom levels. The model was trained on the LandCover.ai dataset. It produces a number of candidate locations with associated probabilities, which can be combined with previously known positions to help choose the most likely location.
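As an illustration of the matching step, the sketch below (PyTorch) shows how a Siamese model can score a drone image against candidate map tiles: both pass through the same shared encoder, and the cosine similarities between embeddings are normalised into match probabilities with a softmax. The encoder architecture, function names, and temperature value here are illustrative assumptions, not the exact model we trained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Shared CNN that embeds an image patch into a fixed-length vector.
    (Illustrative architecture; the trained model may differ.)"""

    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x).flatten(1)
        return F.normalize(self.fc(x), dim=1)  # unit-length embeddings

def match_probabilities(encoder, drone_image, map_tiles):
    """Score one drone image against N candidate map tiles.

    drone_image: (3, H, W) tensor; map_tiles: (N, 3, H, W) tensor.
    Returns a length-N tensor of match probabilities.
    """
    with torch.no_grad():
        query = encoder(drone_image.unsqueeze(0))   # (1, D)
        candidates = encoder(map_tiles)             # (N, D)
        similarity = candidates @ query.squeeze(0)  # cosine similarity, (N,)
        # Temperature-scaled softmax turns similarities into probabilities
        return F.softmax(similarity / 0.1, dim=0)
```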
One key advantage of this approach is that errors do not propagate between iterations: if the model produces a bad location estimate, it will not degrade future predictions.
The pseudocode below outlines our solution workflow:
```python
# Use our start location (i.e. where we lost GPS signal) as our initial position
location = start_location
# Keep track of previous locations
previous_locations = [location]

# While we have new data, retrieve the latest image and sensor data
# (e.g. altimeter, yaw, pitch, roll)
while (data := get_latest_data()) is not None:
    image, drone_metadata = data

    # Get reference mapping for a reasonable area around our last known location
    reference_map = get_reference_map(location, search_radius)

    # This step is a future improvement: project the image from the camera
    # into an orthogonal (i.e. overhead) image
    # image = correct_perspective(image, drone_metadata)

    # Use the neural network to determine the location, weighting based on
    # the last 5 locations
    image_location = determine_location_nn(image, reference_map, previous_locations[-5:])

    # Use sensor data (e.g. sensor direction) to calculate the offset of the
    # image compared to the drone and account for this
    drone_location = calculate_offset(image_location, drone_metadata)

    # Update the list of previous locations
    location = drone_location
    previous_locations.append(location)
```
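To make two of these steps concrete: the network returns several candidate locations with probabilities that need combining with the recent track, and the matched image centre sits ahead of the airframe when the camera is tilted forward. Below is a minimal sketch of both steps, assuming planar (x, y) coordinates in metres. The `choose_location` helper, the Gaussian down-weighting, the `max_speed_m` scale, and the metadata field names are all illustrative assumptions rather than our exact implementation; for simplicity it also weights against only the most recent fix, whereas the pseudocode passes the last five.

```python
import math

def choose_location(candidates, last_location, max_speed_m=50.0):
    """Combine model probabilities with the last known position.

    This is the candidate-selection step inside determine_location_nn.
    candidates: list of ((x, y), probability) pairs from the network.
    last_location: most recent known (x, y) position.
    max_speed_m: rough distance the drone could plausibly have moved
        since the last fix (illustrative assumption).
    """
    last_x, last_y = last_location

    def score(candidate):
        (x, y), prob = candidate
        distance = math.hypot(x - last_x, y - last_y)
        # Gaussian fall-off: implausibly distant candidates keep little
        # of their model probability
        plausibility = math.exp(-0.5 * (distance / max_speed_m) ** 2)
        return prob * plausibility

    best_location, _probability = max(candidates, key=score)
    return best_location


def calculate_offset(image_location, drone_metadata):
    """Shift the matched image centre back to the drone's own position.

    Assumes a forward-tilted camera over flat ground, so the image centre
    lies ahead of the drone by altitude * tan(pitch from vertical) along
    the heading. Metadata field names are illustrative assumptions.
    """
    ahead = drone_metadata["altitude_m"] * math.tan(
        math.radians(drone_metadata["camera_pitch_deg"]))
    yaw = math.radians(drone_metadata["yaw_deg"])  # clockwise from north
    x, y = image_location
    # The drone sits behind the image centre along its heading (x east, y north)
    return (x - ahead * math.sin(yaw), y - ahead * math.cos(yaw))
```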
The following is a list of caveats and known limitations of our approach:
- Our approach is likely to work best in "feature rich" areas, where there are lots of features within the imagery to match to the reference map. These could be roads, buildings, waterways, field boundaries or any other unique features. Where the image is very generic (e.g. a large empty field, or a single straight road with no other features) the approach will struggle. However, as errors don't propagate with our approach, as soon as there are discernible features again, the approach will be able to re-locate the drone regardless of the earlier loss of position.
- Our solution is currently limited by the amount of time and data we had available to train the model. We have provided the code to train new models, and with more time and some targeted data collection this would likely significantly improve performance.
The following is a list of ideas for improving the approach further in the future:
- Rectification of images (i.e. correcting for perspective) prior to feeding them into the neural network - see the `orthorectification` folder (a simplified sketch of the idea follows this list).
- Improving the training data to include a wider range of perspectives and weather conditions.
- The model is currently only trained on aerial imagery, as there was no suitable dataset of drone imagery linked to aerial imagery. If such a dataset can be found or built, it would likely significantly improve the model quality. The model training code would need some small changes to support this; this is described in the `mapping` folder.
- Context injection, as described in its folder, to supplement the visual information with input from the other sensors on the drone.
- With increased training data and time, a vision transformer could be trained for this task instead of the Siamese network; this is how state-of-the-art methods approach the problem. However, this would increase the hardware requirements beyond what is likely feasible on a drone.
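As a sketch of the rectification idea from the first bullet above: if the four image corners can be projected onto the ground plane (using the drone's altitude, attitude and camera intrinsics; that projection is assumed to be computed elsewhere), OpenCV can warp the oblique view into an approximate overhead image. Function and parameter names here are illustrative.

```python
import cv2
import numpy as np

def correct_perspective(image, ground_corners, output_size=(512, 512)):
    """Warp an oblique drone image into an approximate overhead view.

    ground_corners: the four image corners projected onto the ground plane
        (any consistent planar units), in the order top-left, top-right,
        bottom-right, bottom-left. Computing this projection from camera
        intrinsics, altitude and pitch/roll is assumed to happen elsewhere.
    output_size: (width, height) of the rectified image.
    """
    w_out, h_out = output_size
    # Source corners in pixel coordinates, matching the ground corner order
    src = np.float32([[0, 0], [image.shape[1], 0],
                      [image.shape[1], image.shape[0]], [0, image.shape[0]]])
    # Scale the ground footprint to fill the output image
    dst = np.float32(ground_corners)
    dst -= dst.min(axis=0)
    dst *= np.float32([w_out, h_out]) / dst.max(axis=0)
    # Homography mapping the oblique view onto the ground plane
    H = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, H, (w_out, h_out))
```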
Below are some of the other approaches that we tried during the Hackathon, which were abandoned either due to time constraints or because we decided they weren't suitable for the problem.
We looked at trying to simplify the images by segmenting them into land cover usage (road, water, buildings, other). A write-up of this approach can be seen in the `segmentation` folder.
We briefly looked at getting some existing Visual Inertial Navigation and SLAM tools running, with the idea of feeding our data in and seeing what we could get out of them. However, they proved difficult to get working on our laptops, and once we had, it wasn't easy to feed our data into those tools, so we decided to abandon this approach. Some details of these methods can be found in the `vio_and_slam` folder.