VisionLocator

A UI automator based on Appium, YOLO, and OCR


This project will be reorganized into the links below.




Overview

  • A UI automation testing tool based on YOLO, OCR, Appium, and Allure.

Features

  • Detect UI elements with YOLO.
  • Query elements by their text with Tesseract OCR.
  • Interact with UI elements through Appium.

Repository Structure

└── VisionLocator
    └── vision_locator
        ├── detect.py
        ├── detect_attribute.py
        └── remote.py

Getting Started

Requirements

Installation

pip install -i https://test.pypi.org/simple/ VisionLocator


Basic Usage

  • Initiate:
from appium import webdriver                 # Appium Python client
from vision_locator.remote import Remote

Remote.driver = webdriver.Remote(appium_server, capabilities)   # initiate the Appium web driver.
Remote.windows_size = Remote.driver.get_window_size()           # get the Appium window size.
Remote.device_resolution = {'width': 1290, 'height': 2796}      # set the testing device's real screen resolution.
Remote.ai_model = 'pre_trained_model_folder'                    # set the path to the folder that holds the .pt files.
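
In the snippet above, appium_server and capabilities are placeholders. A minimal sketch of what they might contain is shown below; the URL and capability values are illustrative and must match your own Appium setup (depending on your Appium version, capability keys may need the appium: vendor prefix):

appium_server = 'http://127.0.0.1:4723'   # local Appium server URL (illustrative)
capabilities = {
    'platformName': 'iOS',                # target platform (illustrative)
    'automationName': 'XCUITest',         # Appium driver to use (illustrative)
    'deviceName': 'iPhone 14 Pro Max',    # device name (illustrative)
    'udid': 'your-device-udid',           # real device identifier (placeholder)
}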
  • Detect element:
from vision_locator.detect import ai_detect
element = ai_detect(label=Label.example,
                    model='best',
                    numbers=1,
                    sort_axis=['y','x'],
                    sort_group=None,
                    timeout=None,
                    show=False)

| Argument   | Description                                                              |
| ---------- | ------------------------------------------------------------------------ |
| label      | Enum, a class mapping each label name to its index.                      |
| model      | Str, name of the pre-trained weights file, default='best'.               |
| numbers    | Int, the expected number of YOLO detections, default=1.                  |
| sort_axis  | List, axis order used to sort the detected elements, default=['y', 'x']. |
| sort_group | Int, sort the detected elements by group, default=None.                  |
| timeout    | Int, detection timeout; default=None (use the default timeout).          |
| show       | Bool, whether to show the YOLO prediction screen, default=False.         |
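
The label argument expects an Enum whose members map UI element names to the class indices used when the YOLO model was trained. A minimal sketch of such an enum (the class name Label and its members here are illustrative):

from enum import Enum

class Label(Enum):
    # each value is the YOLO class index assigned during training (illustrative)
    login_button = 0
    username_field = 1
    password_field = 2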
  • Detect element by text:
from vision_locator.detect import ai_detect_text
element = ai_detect_text(label=Label.example,
                         model='best',
                         numbers=1,
                         text='example',
                         segment=False,
                         timeout=None,
                         show=False)

| Argument   | Description                                                                       |
| ---------- | --------------------------------------------------------------------------------- |
| label      | Enum, a class mapping each label name to its index.                                |
| model      | Str, name of the pre-trained weights file, default='best'.                         |
| numbers    | Int, the expected number of YOLO detections, default=1.                            |
| text       | Str or List[Str], regex pattern(s) to search for in the detected elements' text.   |
| segment    | Bool, OCR configuration for multi-line text, default=False.                        |
| timeout    | Int, detection timeout; default=None (use the default timeout).                    |
| show       | Bool, whether to show the YOLO prediction screen, default=False.                   |
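
For example, to find a single button whose OCR text matches either of two spellings (Label.button is an illustrative enum member):

element = ai_detect_text(label=Label.button,
                         text=['Sign in', 'Log in'])  # regex patterns matched against the OCR text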
  • Detect that an element does not exist:
from vision_locator.detect import ai_detect_not_exist
ai_detect_not_exist(label=Label.example,
                    model='best',
                    numbers=1,
                    delay_start=1,
                    timeout=1)

| Argument    | Description                                                       |
| ----------- | ----------------------------------------------------------------- |
| label       | Enum, a class mapping each label name to its index.               |
| model       | Str, name of the pre-trained weights file, default='best'.        |
| numbers     | Int, the expected number of YOLO detections, default=1.           |
| delay_start | Int, waiting time before the YOLO prediction, default=1.          |
| timeout     | Int, detection timeout; default=None (use the default timeout).   |
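
A typical use is verifying that an element has disappeared after an action, for example a loading spinner (Label.spinner is illustrative, and the exact timeout semantics are assumed):

# wait 1 second, then confirm that no spinner is detected within the timeout
ai_detect_not_exist(label=Label.spinner, delay_start=1, timeout=3)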
  • Slide:
from vision_locator.detect import slide_up, slide_down, slide_e2e

| Function   | Description                                                      |
| ---------- | ---------------------------------------------------------------- |
| slide_up   | Action by Appium to slide up.                                    |
| slide_down | Action by Appium to slide down.                                  |
| slide_e2e  | Action by Appium to slide from element to element.               |
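
The slide signatures are not documented here, so the calls below are only a sketch, assuming slide_up and slide_down take no required arguments and slide_e2e takes the start and end elements:

from vision_locator.detect import slide_up, slide_down, slide_e2e

slide_up()     # scroll the screen content up (assumed zero-argument call)
slide_down()   # scroll the screen content down (assumed zero-argument call)

# slide from one detected element to another (argument order is an assumption)
start = ai_detect(label=Label.list_item_top)
end = ai_detect(label=Label.list_item_bottom)
slide_e2e(start, end)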
  • Click, input, text:
from vision_locator.detect import ai_detect, ai_detect_text

Elements returned by ai_detect and ai_detect_text support click(), input(), and text():

| Function   | Description                                                      |
| ---------- | ---------------------------------------------------------------- |
| click      | Action by Appium to click.                                       |
| input      | Action by Appium to send_keys.                                   |
| text       | Action by OCR to query the text inside the element.              |
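
Putting the pieces together, a minimal login flow might look like the sketch below; all Label members and the typed string are illustrative:

# type into the username field detected by YOLO
username = ai_detect(label=Label.username_field)
username.input('tester@example.com')

# tap the button whose OCR text matches 'Log in'
login = ai_detect_text(label=Label.button, text='Log in')
login.click()

# read back the text of a result banner with OCR
banner = ai_detect(label=Label.result_banner)
print(banner.text())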

Project Roadmap

  • ► First release
  • ► Optimize compatibility with Android
  • ► Improve performance
  • ► Optimize compatibility with Selenium

Contributing

Contributions are welcome! Here are several ways you can contribute:

Contributing Guidelines
  1. Fork the Repository: Start by forking the project repository to your GitHub account.
  2. Clone Locally: Clone the forked repository to your local machine using a Git client.
    git clone https://github.com/linyeh1129/VisionLocator
  3. Create a New Branch: Always work on a new branch, giving it a descriptive name.
    git checkout -b new-feature-x
  4. Make Your Changes: Develop and test your changes locally.
  5. Commit Your Changes: Commit with a clear message describing your updates.
    git commit -m 'Implemented new feature x.'
  6. Push to GitHub: Push the changes to your forked repository.
    git push origin new-feature-x
  7. Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.

Once your PR is reviewed and approved, it will be merged into the main branch.


License

This project is licensed under the GNU General Public License v3.0. For more details, refer to the LICENSE file.


Acknowledgments
