Resources + Ideas Thread for Vision
Just a thread for project management and for recording how much time was spent, for documentation in the report.
Also a place to save ideas and work out the functional and non-functional requirements for Vision.
Useful Links:
- Provided Example for Camera: https://github.com/edstott/EEE2Rover.
- Video output from example: https://youtu.be/PopaFoKaFRg
Functional Requirements
- Ping pong ball detection (colour-based), see Q46 on Piazza
- Terrain boundary detection: use the optical sensor to map an area defined by Command
- Sending commands to Drive (need more info on interfacing)
- Receiving commands from Command via the ESP32?? (Need to clarify with Command and Control)
Project Log 1
Spent 4-5 hours today just trying to run the example project. Eventually managed to run it and get some footage to see what the output looks like. Uploaded demo SOF and ELF in commit 1c007c0
Lessons learnt:
- Only compile on the EE servers if you have to; writing code can mostly be done with Quartus Lite
- For Linux: use `scp` to transfer files from the remote server to the local machine for flashing on the DE10. Run it on the remote server, pushing to the local machine, e.g. `scp D8M_Camera_Test.elf josiahmendes@129.31.246.43:~/remoteDir`. Find out the local machine's IP address by running `ip addr`
- Use the Mac for remote Quartus, and the Linux VM for transferring files between machines. Connect both to the VPN, and make sure the VM is in bridged mode so that it has its own IP address.
- Maybe go to uni at some point for faster compilation; this small project took 5 minutes per compile, not counting the laggy interface.
Ideas from Chat 14/05
"Ping Pong balls are considered obstacles and that by the end we should be able to detect the balls by their colors and the distance from their size in pixels and do the rover do "stuff" accordingly"
"slightly confused on what vision is responsible for, I'm guessing there's some autonomous driving, but doesn't the movement commands also come from the app? Is vision meant to handle processing those commands as well?"
Project Log 2
Spent quite a while trying to install Quartus 16.1 locally (3-4 hours wasted). Following on from Tianyi's post on Piazza, I can confirm that compilation works on Linux as well. Can't use 20.1 because of outdated components, but I wonder if the updated IP components from 20.1 could be used; performance might be better.
Also added the project spec as a wiki page on GitHub for easy reference. The spec says that the rover should be autonomous, which I assume is a Vision/Command requirement. It also says that the rover should be able to build a map of its local working area, so Vision needs to send data back about obstacles as well.
Key Questions:
- Data protocols: how do we communicate with the Drive subsystem and the ESP32?
- How do we move from a picture to depth sensing, i.e. figuring out distances?
- How do we redirect UART comms to the ESP32?
- Local compilation is pretty slow; need to figure out a fast CI workflow
Project Log 3 17/05 am
Thoughts on Q46 on Piazza
Mapping
As the Drive submodule can determine the rover's location using the optical flow sensor, mapping could work as follows: the user sends a command through the web/app telling the rover to cover a certain area; Drive/Command then issues the movement commands to cover all of that area; and the Vision module detects objects along the way, sending messages that an object has been detected so it can be added to the map (stored where? Either in Command or locally). This would be a high level of autonomy. Mapped balls could then be used as reference points for other distances. A sketch of the idea is below.
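A minimal sketch of what "adding a detection to the map" could look like, assuming Drive can report the rover pose (x, y, heading) from the optical flow sensor and Vision can estimate a range and bearing to a detected ball. All names here are hypothetical:

```python
import math

ball_map = []  # (x, y, colour) tuples in world coordinates

def record_detection(rover_x, rover_y, heading_rad,
                     range_mm, bearing_rad, colour):
    """Convert a rover-relative detection into a world coordinate and store it."""
    angle = heading_rad + bearing_rad
    ball_map.append((rover_x + range_mm * math.cos(angle),
                     rover_y + range_mm * math.sin(angle),
                     colour))
```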
Object Detection
Seems like only ball detection is necessary. Wonder if we could combine it with an ultrasonic sensor for better depth perception, improving object detection and the avoidance of non-ball objects? Still not completely sure why we need colour detection, but I guess it could be used as extra info on the map - this object is green, this object is pink, etc.
Ed uploaded an updated version of the rover example with exposure and autofocus controls as well; will try those and upload new SOFs and ELFs today. EDIT: Done, 3348984
Commands and Info on Hardware
- KEY1 - on-board button, controls crop/zoom (not sure if digital or physical)
- KEY0 - on-board button, toggles autofocus
- SW0 - toggles between the camera view and bounding boxes
- HEX0 and HEX1 show the FPS of the camera
Commands to be entered in nios2-terminal
- 'e' & 'd' - increase and decrease exposure respectively, range 0-2200
- 't' & 'g' - increase and decrease gain (?) respectively, range 0-800
- 'r' & 'f' - increase and decrease manual focus respectively
Project Log 4 18/05
Worked today on understanding potential methods for object detection. Two main schools of thought: (1) scanning all the pixels and trying to work something out directly, and (2) using neural networks and machine learning to detect objects. The second method lends itself better to hardware, so I have been watching 3Blue1Brown's series on neural networks - https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
Also, for translating from software neural networks to hardware: https://www.youtube.com/watch?v=Qgjawf20v7Y&list=PLGzeDuLmmxDpEsCAjf_sYrMC6p-Y0Ummk&index=3 - very useful and relevant as they are also doing image processing.
Things to Figure Out
- What parameters to use for object detection: pixel by pixel, RGB, etc.
- Pixel by pixel will probably give the best accuracy, but might be slow and require too many MACs, so some compression may be needed to reduce the size.
- RGB would work well depending on the test environment; speed-wise it should be quick, but it may be inaccurate (see the sketch after this list).
- The above is purely speculative; need to do more research.
- Obtain test footage from @GeorgiosChaimalas with the rover, to test and train a neural network without a hardware implementation first.
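To make the RGB option concrete, a minimal sketch of a pure colour-threshold classifier. The bounds are made-up values for an orange ball and would need tuning against real footage:

```python
# Classify a pixel as "ball" if it falls inside an RGB box.
LO = (180, 60, 0)    # assumed lower R, G, B bounds for an orange ball
HI = (255, 160, 90)  # assumed upper bounds

def is_ball_pixel(r: int, g: int, b: int) -> bool:
    return all(lo <= c <= hi for lo, c, hi in zip(LO, (r, g, b), HI))
```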
Project Log 5 20/05 am - Chat with Ed
How do we send data from the FPGA to the ESP32? Would full live video need video encoding?
Very difficult to do this unless we significantly reduce the frame rate - it takes 8 seconds to send 1 frame of the video. Might have to scrap this idea, but still looking into libraries; wonder if we can dedicate a core on the ESP32 to video processing.
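A back-of-envelope check on why this is hard, under assumed numbers (a raw 640x480 frame at 16 bits per pixel over a 115200-baud UART; the real link and frame size may differ, which is presumably why Ed quoted 8 s rather than this figure):

```python
frame_bytes = 640 * 480 * 2            # ~614 kB per raw RGB565 frame (assumed format)
link_bytes_per_s = 115200 // 8         # 115200-baud UART ~ 14.4 kB/s
print(frame_bytes / link_bytes_per_s)  # ~42.7 s per frame, before overheads
```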
Can we reduce the fps/resolution to get more time for processing locally?
Resolution can be controlled by ignoring pixels, but that is also quite complicated. Frame rate can be controlled by skipping frames selectively while just running the camera at 60 fps.
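A minimal sketch of the frame-skipping idea, with `SKIP = 6` as an assumed ratio (giving an effective 10 fps from a 60 fps camera):

```python
SKIP = 6  # process 1 in every 6 frames (assumed ratio)

def frames_to_process(frame_stream):
    """Pass through every SKIP-th frame, leaving the rest of the time for processing."""
    for i, frame in enumerate(frame_stream):
        if i % SKIP == 0:
            yield frame
```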
Can we run image processing blocks in parallel?
With the streaming pipeline we need to keep up with the data coming in, which is governed by the video input and the video output. The recommendation is to design the video processing so it can handle 1 pixel every cycle: buffer a pixel, do the compare, and store the result in a register. With the current setup it needs to accept at least 1 pixel per cycle.
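A software model of that constraint: every pixel gets exactly one compare and a constant-time state update as it streams past in raster order. This is a behavioural sketch (reusing the hypothetical `is_ball_pixel` threshold from above), not the eventual hardware:

```python
def stream_bounding_box(pixels, width, is_ball_pixel):
    """Track a bounding box of ball pixels over one frame of raster-order pixels."""
    box = None  # (x0, y0, x1, y1)
    for i, (r, g, b) in enumerate(pixels):
        x, y = i % width, i // width
        if is_ball_pixel(r, g, b):          # the single per-pixel compare
            if box is None:
                box = (x, y, x, y)
            else:
                x0, y0, x1, y1 = box
                box = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return box
```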
How are pixels processed in the order that they arrive - what is the order, and how long does it take?
They follow a raster pattern, like reading a book in a left-to-right script. The output clock is 25 MHz; the input clock runs at a different speed. The blanking period is caused by the camera scanning line by line. Data arrives in bursts, and the video streaming pipeline is controlled at the input by the camera and at the output by the VGA output.
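Rough numbers behind that, assuming standard 640x480 VGA timing (800x525 total pixel slots per frame, including blanking) at the 25 MHz output clock; the exact D8M figures may differ:

```python
total_slots = 800 * 525                      # visible pixels + blanking intervals
frame_time_s = total_slots / 25e6            # ~16.8 ms per frame (~60 fps)
active_fraction = (640 * 480) / total_slots  # ~73% of cycles carry visible pixels
print(frame_time_s, active_fraction)
```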
Project Log 6 21/05
Worked on SPI comms between the FPGA and the ESP32, testing with an Arduino for now; will check with Raghav when ready.
Neural Network Understanding
A basic NN would work on a pixel-by-pixel basis and just use the colours to detect balls; this is what is described in Prof Marco Winzker's introductory videos on machine learning on FPGAs. Source code is also provided, in VHDL. A sketch of the idea is below.
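A minimal sketch of such a per-pixel network: a tiny MLP that maps one (r, g, b) value to a ball/not-ball score. The weights below are random placeholders; in practice they would come from offline training before being fixed into hardware:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)  # hidden layer, 8 units
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)  # single output unit

def pixel_score(rgb):
    """Ball/not-ball score in [0, 1] for a single pixel."""
    h = np.maximum(0, np.array(rgb) / 255.0 @ W1 + b1)  # ReLU hidden layer
    z = h @ W2 + b2
    return 1.0 / (1.0 + np.exp(-z[0]))                  # sigmoid output
```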
A CNN (convolutional neural network) works by convolving parts of the image with a matrix. This is useful for detecting lines/shapes in images, but because each kernel needs the pixels above and below the current one to compare against, those rows would have to be stored in registers (line buffers). Disregarding compute power, this is obviously the best option.
Wonder if we can combine the Circle Hough Transform with the basic NN colour detection to recognise ball regions. But again, this requires storing multiple pixels.
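For offline experiments on test footage, OpenCV's implementation of the Circle Hough Transform could be a quick way to try the idea before worrying about hardware. The parameter values and the file name here are guesses and would need tuning:

```python
import cv2

img = cv2.imread("frame.png")  # hypothetical test frame from the rover
grey = cv2.medianBlur(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), 5)
circles = cv2.HoughCircles(grey, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=100, param2=30, minRadius=5, maxRadius=60)
# circles is None or an array of (x, y, radius); the colour inside each
# circle could then be sampled to tag the ball on the map.
```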
The challenge in the computer vision is recognising the shape of the circle. Colour is quite easy, as we can just check each pixel's RGB value, but shapes are difficult. (Let me know if anyone has any ideas?) Shape is important because, for the dark balls, if classification is by colour alone, the background may contain colours similar to the ball and hence make ball detection difficult.