nchudleigh/vimac

Linux port using a neural net

Opened this issue · 3 comments

joihn commented

I would be interested in a linux port.
Since "accessibility API" (uses to retrieve button location) are not available on linux, one needs something else to gather the buttons location.

I'm thinking about building a small neural net that performs object detection to retrieve the button locations.
Recent advances have made lightweight, CPU-run networks possible, with acceptable detection performance and input-to-output delay.
example
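Roughly, the per-frame loop would be: grab a screenshot, run the detector, and reduce each detection to a click point. A minimal sketch, assuming an ONNX export of some lightweight single-class detector (the model file, input size, and output layout below are placeholders, not a real model):

```python
import numpy as np
import cv2
import mss
import onnxruntime as ort

INPUT_SIZE = 320  # assumed detector input resolution

session = ort.InferenceSession("button_detector.onnx")  # placeholder model file
input_name = session.get_inputs()[0].name

# Grab the primary monitor (mss returns BGRA).
with mss.mss() as sct:
    shot = np.array(sct.grab(sct.monitors[1]))
frame = cv2.cvtColor(shot, cv2.COLOR_BGRA2BGR)
h, w = frame.shape[:2]

blob = cv2.resize(frame, (INPUT_SIZE, INPUT_SIZE)).astype(np.float32) / 255.0
blob = blob.transpose(2, 0, 1)[None]  # HWC -> NCHW, batch of 1

# Assumed output layout: [N, 5] rows of (x1, y1, x2, y2, score) in input coords.
(dets,) = session.run(None, {input_name: blob})

for x1, y1, x2, y2, score in dets:
    if score < 0.5:
        continue
    # Bounding boxes aren't needed downstream, only one click point per button.
    cx = int((x1 + x2) / 2 * w / INPUT_SIZE)
    cy = int((y1 + y2) / 2 * h / INPUT_SIZE)
    print("button at", cx, cy)
```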

Alternatively, since bounding boxes are not really needed (only one coordinate per button), a segmentation neural net trained to output a heatmap of button locations could be another approach,
kinda like this paper (ignoring the segmentation part, of course).
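One nice property of the heatmap route is that decoding is trivial: threshold the map and take local maxima, one click point per button. A sketch, assuming a heatmap in [0, 1] at screen resolution:

```python
import numpy as np
from scipy.ndimage import maximum_filter

def heatmap_to_clicks(heatmap, threshold=0.5, min_dist=12):
    """Return one (x, y) per local maximum of a [0, 1] button heatmap."""
    # A pixel is a peak if it equals the max over its neighborhood
    # and is above the confidence threshold.
    local_max = maximum_filter(heatmap, size=min_dist) == heatmap
    peaks = np.argwhere(local_max & (heatmap > threshold))
    return [(int(x), int(y)) for y, x in peaks]  # argwhere yields (row, col)
```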

The most challenging aspect would be collecting a good dataset of various GUIs with labeled buttons.
Web scraping and HTML parsing could be used to find the button locations, giving a big dataset for cheap.
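For the scraping, a headless browser such as Playwright could render each page, screenshot it, and dump bounding boxes for anything button-like in one pass. A sketch (the URL list, CSS selector, and label format are only placeholders):

```python
from playwright.sync_api import sync_playwright

URLS = ["https://example.com"]  # placeholder list of pages to harvest

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page(viewport={"width": 1280, "height": 800})
    for i, url in enumerate(URLS):
        page.goto(url)
        page.screenshot(path=f"sample_{i}.png")
        with open(f"sample_{i}.txt", "w") as f:
            # Real buttons plus elements styled to behave like buttons.
            for el in page.query_selector_all("button, input[type=submit], a[role=button]"):
                box = el.bounding_box()  # None if the element isn't rendered
                if box:
                    f.write(f'{box["x"]} {box["y"]} {box["width"]} {box["height"]}\n')
    browser.close()
```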

However, one would only have "web-looking" buttons and no "desktop-looking" buttons.
One could use macOS GUIs + the accessibility API to further diversify the dataset.

The advantage of such an approach is that the tool "should" be compatible with all apps out of the box.
What are your thoughts on such an approach?

joihn commented

Relevant link, haven't tested yet:
https://github.com/phil294/vimium-everywhere

garywill commented

Last year I made this: https://github.com/garywill/vimouse
It uses OpenCV to do vision-recognition-based clicking.
The screenshots may seem ugly right now, and the algorithm and parameters may need changing. I haven't implemented any AI; currently it just finds any "object" on screen (at least almost every button is found, lol).
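The core idea fits in a few lines: edge detection plus contour filtering finds button-sized rectangles. A sketch in the same spirit (not the actual vimouse code; the size thresholds are arbitrary guesses):

```python
import cv2
import numpy as np
import mss

# Grab the primary monitor (mss returns BGRA).
with mss.mss() as sct:
    img = np.array(sct.grab(sct.monitors[1]))

gray = cv2.cvtColor(img, cv2.COLOR_BGRA2GRAY)
edges = cv2.Canny(gray, 50, 150)
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    # Keep roughly button-sized rectangles; thresholds are arbitrary.
    if 15 < w < 400 and 10 < h < 80:
        print("candidate at", x + w // 2, y + h // 2)
```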

It is at a very, very early stage.

Cross-platform & lightweight. I made it in ~300 lines of Python code.

BTW, I listed many similar projects in that readme