English / 简体中文
Android Pinyin IME for port of Facebook's LLaMA model in C/C++
Demo:
demo.mp4
llama-jni further encapsulates the common functions of llama.cpp with JNI, enabling mobile applications on Android devices to directly use large language models (LLMs) stored locally on the device.
llama-pinyinIME is a typical use case of llama-jni. By adding an input field component to the Google Pinyin IME, llama-pinyinIME provides a localized AI-assisted input service based on an LLM, with no need for internet connectivity.
The goals of llama-pinyinIME include:
- Developing new features for the Google Pinyin IME so that users can enter tasks that require an LLM to complete in the input field of the IME. After submission, users can watch the results stream into the actual target input position.
- Calling the built-in Prompt feature in the input field, allowing users to quickly switch between Prompt texts stored in local files for tasks such as translation, grammar correction, and general Q&A.
- Supporting input mode switching in the input field. In addition to general input and local LLM input, llama-pinyinIME also provides an internet-assisted input mode based on the OpenAI API and supports custom local Prompt texts.
- Providing support for the various models and input parameters related to llama.cpp in the application settings of llama-pinyinIME.
- Optimizing necessary adaptation features in the Google Pinyin IME, such as Chinese input, cursor control, and candidate words.
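The streaming printout described above can be sketched as a token callback: the native inference loop reports each generated token, and the IME commits it to the target input position immediately. The following is a minimal illustration, not the project's actual code; all names (`TokenListener`, `runInference`) are assumptions, and the fake generator stands in for the JNI-backed call.

```java
public class StreamingDemo {
    /** Callback invoked once per generated token (an assumed interface). */
    interface TokenListener {
        void onToken(String token);
    }

    // Stand-in for the JNI-backed inference call: here it just streams a
    // fixed answer word by word to show the callback flow.
    static void runInference(String prompt, TokenListener listener) {
        for (String token : "Hello from the local LLM".split(" ")) {
            listener.onToken(token + " ");
        }
    }

    public static void main(String[] args) {
        StringBuilder committed = new StringBuilder();
        // In a real IME, onToken would call InputConnection.commitText(...)
        // so the text appears in the target input position as it is generated.
        runInference("translate: 你好", token -> committed.append(token));
        System.out.println(committed.toString().trim());
    }
}
```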
llama-pinyinIME supports the creation of an Android application package (APK). The packaging process requires NDK and CMake; the relevant tool configuration is as follows.
apply plugin: 'com.android.application'

android {
    // Keep consistent with libs/android.jar
    compileSdkVersion 25
    buildToolsVersion "26.0.2"

    aaptOptions {
        noCompress 'dat'
    }

    defaultConfig {
        minSdkVersion 23
        // Keep consistent with libs/android.jar
        // noinspection ExpiredTargetSdkVersion
        targetSdkVersion 25
        versionCode 1
        versionName "1.0.0"
        applicationId "com.sx.llama.pinyinime"
        externalNativeBuild {
            cmake {
                cppFlags "-Wall"
                abiFilters 'armeabi-v7a', 'arm64-v8a', 'x86', 'x86_64'
            }
        }
    }

    buildTypes {
        release {
            minifyEnabled false
            proguardFiles getDefaultProguardFile('proguard-android.txt'), 'proguard-rules.txt'
        }
    }

    externalNativeBuild {
        cmake {
            path 'src/main/cpp/CMakeLists.txt'
            version '3.22.1'
        }
    }

    lintOptions {
        checkReleaseBuilds false
        // Or, if you prefer, you can continue to check for errors in release builds,
        // but continue the build even when errors are found:
        abortOnError false
    }

    compileOptions {
        sourceCompatibility JavaVersion.VERSION_1_8
        targetCompatibility JavaVersion.VERSION_1_8
    }

    android.applicationVariants.all { variant ->
        variant.outputs.all {
            outputFileName = "ST_Pinyin_V${defaultConfig.versionName}.apk"
        }
    }
}

dependencies {
    testImplementation 'junit:junit:4.13.2'
    androidTestImplementation 'androidx.test.ext:junit:1.1.5'
    implementation 'com.alibaba:fastjson:1.2.83'
    implementation 'cz.msebera.android:httpclient:4.5.8'
    provided files('libs/android.jar')
}
llama-pinyinIME does not contain model files. Please prepare your own LLM, which needs to be supported by the specified version of llama.cpp.
The necessary LLM (e.g. GPT4All) needs to be stored in the mobile application's dedicated folder on the Android external storage device, assuming the path is
/storage/emulated/0/Android/data/com.sx.llama.pinyinime/ggml-vic7b-q5_0.bin
Then the following line of code in PinyinIME.java needs to match its filename:
private String modelName = "ggml-vic7b-q5_0.bin";
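Before the JNI layer tries to load the model, it is worth verifying that the file actually exists at the expected path. The helper below is a hypothetical sketch (not part of the project sources); on a real device the base directory would come from `context.getExternalFilesDir(...)` rather than a string literal.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: resolves the model file under the app's dedicated
// external-storage folder and checks it exists before loading.
public class ModelLocator {
    public static Path resolveModelPath(String baseDir, String modelName) {
        return Paths.get(baseDir, modelName);
    }

    public static boolean isModelPresent(String baseDir, String modelName) {
        return Files.isRegularFile(resolveModelPath(baseDir, modelName));
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temporary directory standing in for the app folder.
        Path tmp = Files.createTempDirectory("llama-demo");
        String modelName = "ggml-vic7b-q5_0.bin";
        System.out.println(isModelPresent(tmp.toString(), modelName)); // false: no file yet
        Files.createFile(tmp.resolve(modelName));
        System.out.println(isModelPresent(tmp.toString(), modelName)); // true
    }
}
```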
The dedicated folder also needs to store the necessary Prompt text file (e.g. 1.txt), assuming the path is
/storage/emulated/0/Android/data/com.sx.llama.pinyinime/1.txt
If you need to use the OpenAI API over the network, the following code in PinyinIME.java needs to be filled in with a valid API Key before creating the APK:
private String openaiKey = "YOUR_OPENAI_API_KEY";
Select an AVD in Android Studio and click the Run icon. After the necessary system settings, users can quickly adjust the running mode of llama-pinyinIME with the button on the left side of the input field.
- Note: The following demonstration is based on a virtual emulator with 12GB of RAM; tests on real physical devices show that the inference speed of existing hardware is still far from practical. llama-pinyinIME can only serve as a validation prototype of the technical route, and its level of completeness is for reference only.
The general usage of llama-pinyinIME is similar to other IMEs on Android devices: it also supports Chinese, English, and punctuation input, and the mobile application built on this basis can also be installed and used on real physical devices.
mode-close.mp4
Based on the Prompt text files stored in the mobile application's dedicated folder on the Android external storage device, the user simply enters the text content in the llama-pinyinIME input field preceded by the filename and a space (1.txt is used by default if no filename is given), then clicks the submit icon on the far left.
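The "filename + space" convention can be sketched as a small parsing step: split the submission at the first space and treat the head as the Prompt filename if it looks like one, otherwise fall back to 1.txt. This is an illustrative sketch only; the class and method names are assumptions, not the project's actual code.

```java
// Hypothetical sketch: routes an input-field submission to a Prompt file.
public class PromptRouter {
    public static final String DEFAULT_PROMPT = "1.txt";

    /** Returns {promptFileName, userText}. */
    public static String[] parse(String input) {
        int space = input.indexOf(' ');
        if (space > 0) {
            String head = input.substring(0, space);
            // Assumption: a Prompt filename is recognized by its .txt suffix.
            if (head.endsWith(".txt")) {
                return new String[] { head, input.substring(space + 1) };
            }
        }
        return new String[] { DEFAULT_PROMPT, input };
    }

    public static void main(String[] args) {
        String[] r1 = parse("2.txt translate this sentence");
        System.out.println(r1[0] + " | " + r1[1]); // 2.txt | translate this sentence
        String[] r2 = parse("fix my grammar please");
        System.out.println(r2[0] + " | " + r2[1]); // 1.txt | fix my grammar please
    }
}
```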
In fact, users can customize as many Prompt text files as they want to handle a variety of input reasoning tasks and scenarios; each filename corresponds to an equivalent llama.cpp command, such as:
./main -m "/storage/emulated/0/Android/data/com.sx.llama.pinyinime/ggml-vic7b-q5_0.bin" -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f "/storage/emulated/0/Android/data/com.sx.llama.pinyinime/1.txt"
./main -m "/storage/emulated/0/Android/data/com.sx.llama.pinyinime/ggml-vic7b-q5_0.bin" -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f "/storage/emulated/0/Android/data/com.sx.llama.pinyinime/2.txt"
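Inside the app, the equivalent of these commands is an argument list passed to the llama.cpp entry point. The sketch below builds that list from the model path and the chosen Prompt file; the flag values are taken verbatim from the commands above, while the helper itself is an assumption, not the project's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: assembles llama.cpp main arguments for a given
// model file and Prompt text file.
public class LlamaArgs {
    public static List<String> build(String modelPath, String promptPath) {
        return new ArrayList<>(Arrays.asList(
                "-m", modelPath,            // model file
                "-n", "256",                // max tokens to generate
                "--repeat_penalty", "1.0",  // repetition penalty
                "--color",
                "-i",                       // interactive mode
                "-r", "User:",              // reverse prompt
                "-f", promptPath            // Prompt text file
        ));
    }

    public static void main(String[] args) {
        String base = "/storage/emulated/0/Android/data/com.sx.llama.pinyinime/";
        System.out.println(build(base + "ggml-vic7b-q5_0.bin", base + "1.txt"));
    }
}
```

Switching tasks then amounts to passing a different Prompt file path (1.txt, 2.txt, ...) to the same builder.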
Taking the grammar-correction task of 1.txt and the translation task of 2.txt as examples, the actual printing effect of llama-pinyinIME is as follows:
mode-local-1.mp4
mode-local-2.mp4
This mode achieves a usable level of responsiveness because the text inference is delegated to the OpenAI API, and it also supports direct calls to local Prompt text files.
mode-cloud-zh.mp4
mode-cloud-en.mp4
- LLaMA — Inference code for LLaMA models.
- llama.cpp — Port of Facebook's LLaMA model in C/C++.
- llama-jni — Android JNI for port of Facebook's LLaMA model in C/C++.
Feel free to dive in! Open an issue or submit PRs.
This project exists thanks to all the people who contribute.
MIT © shixiangcap