VisionLocator

A UI automator based on Appium, YOLO, and OCR


This project will be reorganized into the links below.




Overview

  • A UI automation testing tool based on YOLO, OCR, Appium, and Allure.

Features

  • Detect UI elements with YOLO.
  • Query elements by their text with Tesseract OCR.
  • Interact with UI elements through Appium.

Repository Structure

└── VisionLocator
    └── vision_locator
        ├── detect.py
        ├── detect_attribute.py
        └── remote.py

Getting Started

Requirements

Installation

pip install -i https://test.pypi.org/simple/ VisionLocator


Basic Usage

  • Initiate:
from appium import webdriver                 # Appium Python client
from vision_locator.remote import Remote

Remote.driver = webdriver.Remote(appium_server, capabilities)   # initiate the Appium web driver.
Remote.windows_size = Remote.driver.get_window_size()           # get the Appium window size.
Remote.device_resolution = {'width': 1290, 'height': 2796}      # set the testing device's real screen resolution.
Remote.ai_model = 'pre_trained_model_folder'                    # set the path to the folder that holds the .pt files.
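
In the snippet above, appium_server and capabilities are placeholders. A minimal sketch of what they might contain is shown below; the URL and capability values are illustrative and must match your own Appium setup (depending on your Appium version, capability keys may need the appium: vendor prefix):

appium_server = 'http://127.0.0.1:4723'   # local Appium server URL (illustrative)
capabilities = {
    'platformName': 'iOS',                # target platform (illustrative)
    'automationName': 'XCUITest',         # Appium driver to use (illustrative)
    'deviceName': 'iPhone 14 Pro Max',    # device name (illustrative)
    'udid': 'your-device-udid',           # real device identifier (placeholder)
}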
  • Detect element:
from vision_locator.detect import ai_detect
element = ai_detect(label=Label.example,
                    model='best',
                    numbers=1,
                    sort_axis=['y','x'],
                    sort_group=None,
                    timeout=None,
                    show=False)

| Argument   | Description                                                              |
| ---------- | ------------------------------------------------------------------------ |
| label      | Enum, a class mapping each label name to its index.                      |
| model      | Str, name of the pre-trained weights file, default='best'.               |
| numbers    | Int, the expected number of YOLO detections, default=1.                  |
| sort_axis  | List, axis order used to sort the detected elements, default=['y', 'x']. |
| sort_group | Int, sort the detected elements by group, default=None.                  |
| timeout    | Int, detection timeout; default=None (use the default timeout).          |
| show       | Bool, whether to show the YOLO prediction screen, default=False.         |
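
The label argument expects an Enum whose members map UI element names to the class indices used when the YOLO model was trained. A minimal sketch of such an enum (the class name Label and its members here are illustrative):

from enum import Enum

class Label(Enum):
    # each value is the YOLO class index assigned during training (illustrative)
    login_button = 0
    username_field = 1
    password_field = 2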
  • Detect element by text:
from vision_locator.detect import ai_detect_text
element = ai_detect_text(label=Label.example,
                         model='best',
                         numbers=1,
                         text='example',
                         segment=False,
                         timeout=None,
                         show=False)

| Argument   | Description                                                                       |
| ---------- | --------------------------------------------------------------------------------- |
| label      | Enum, a class mapping each label name to its index.                                |
| model      | Str, name of the pre-trained weights file, default='best'.                         |
| numbers    | Int, the expected number of YOLO detections, default=1.                            |
| text       | Str or List[Str], regex pattern(s) to search for in the detected elements' text.   |
| segment    | Bool, OCR configuration for multi-line text, default=False.                        |
| timeout    | Int, detection timeout; default=None (use the default timeout).                    |
| show       | Bool, whether to show the YOLO prediction screen, default=False.                   |
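
For example, to find a single button whose OCR text matches either of two spellings (Label.button is an illustrative enum member):

element = ai_detect_text(label=Label.button,
                         text=['Sign in', 'Log in'])  # regex patterns matched against the OCR text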
  • Detect that an element does not exist:
from vision_locator.detect import ai_detect_not_exist
ai_detect_not_exist(label=Label.example,
                    model='best',
                    numbers=1,
                    delay_start=1,
                    timeout=1)

| Argument    | Description                                                       |
| ----------- | ----------------------------------------------------------------- |
| label       | Enum, a class mapping each label name to its index.               |
| model       | Str, name of the pre-trained weights file, default='best'.        |
| numbers     | Int, the expected number of YOLO detections, default=1.           |
| delay_start | Int, waiting time before the YOLO prediction, default=1.          |
| timeout     | Int, detection timeout; default=None (use the default timeout).   |
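
A typical use is verifying that an element has disappeared after an action, for example a loading spinner (Label.spinner is illustrative, and the exact timeout semantics are assumed):

# wait 1 second, then confirm that no spinner is detected within the timeout
ai_detect_not_exist(label=Label.spinner, delay_start=1, timeout=3)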
  • Slide:
from vision_locator.detect import slide_up, slide_down, slide_e2e

| Function   | Description                                                      |
| ---------- | ---------------------------------------------------------------- |
| slide_up   | Action by Appium to slide up.                                    |
| slide_down | Action by Appium to slide down.                                  |
| slide_e2e  | Action by Appium to slide from element to element.               |
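
The slide signatures are not documented here, so the calls below are only a sketch, assuming slide_up and slide_down take no required arguments and slide_e2e takes the start and end elements:

from vision_locator.detect import slide_up, slide_down, slide_e2e

slide_up()     # scroll the screen content up (assumed zero-argument call)
slide_down()   # scroll the screen content down (assumed zero-argument call)

# slide from one detected element to another (argument order is an assumption)
start = ai_detect(label=Label.list_item_top)
end = ai_detect(label=Label.list_item_bottom)
slide_e2e(start, end)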
  • Click, input, text:
from vision_locator.detect import ai_detect, ai_detect_text

Elements returned by ai_detect and ai_detect_text support click(), input(), and text():

| Function   | Description                                                      |
| ---------- | ---------------------------------------------------------------- |
| click      | Action by Appium to click.                                       |
| input      | Action by Appium to send_keys.                                   |
| text       | Action by OCR to query the text inside the element.              |
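
Putting the pieces together, a minimal login flow might look like the sketch below; all Label members and the typed string are illustrative:

# type into the username field detected by YOLO
username = ai_detect(label=Label.username_field)
username.input('tester@example.com')

# tap the button whose OCR text matches 'Log in'
login = ai_detect_text(label=Label.button, text='Log in')
login.click()

# read back the text of a result banner with OCR
banner = ai_detect(label=Label.result_banner)
print(banner.text())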

Project Roadmap

  • ► First release
  • ► Optimize compatibility with Android
  • ► Improve performance
  • ► Optimize compatibility with Selenium

Contributing

Contributions are welcome! Here are several ways you can contribute:

Contributing Guidelines
  1. Fork the Repository: Start by forking the project repository to your GitHub account.
  2. Clone Locally: Clone the forked repository to your local machine using a Git client.
    git clone https://github.com/linyeh1129/VisionLocator
  3. Create a New Branch: Always work on a new branch, giving it a descriptive name.
    git checkout -b new-feature-x
  4. Make Your Changes: Develop and test your changes locally.
  5. Commit Your Changes: Commit with a clear message describing your updates.
    git commit -m 'Implemented new feature x.'
  6. Push to GitHub: Push the changes to your forked repository.
    git push origin new-feature-x
  7. Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.

Once your PR is reviewed and approved, it will be merged into the main branch.


License

This project is licensed under the GNU General Public License v3.0. For more details, refer to the LICENSE file.


Acknowledgments
