Make SwiftUI apps more accessible + powerful with conversation

Primary language: Swift

ConvoKit

(Image: ConvoKit Header.png)

Goal:

Make SwiftUI apps more accessible + powerful with conversation

Why:

ConvoKit makes apps easier to use for everyone, but especially for people with disabilities.

Idea:

Use Whisper to transcribe spoken natural-language requests, then use a tiny LLM to interpret the request and decide which Swift function to call.

The development experience is meant to be simple: add a macro to each class that has at least one public function you'd like to expose to the LLM, then initialize the framework within your SwiftUI View.
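To make the flow above concrete, here is a minimal, self-contained sketch of the three stages: transcribe, choose, return a call to dispatch. It is not ConvoKit's implementation. Every type and function name in it (`FunctionCall`, `Transcriber`, `FunctionChooser`, `CannedTranscriber`, `KeywordChooser`, `handle`) is hypothetical. A canned string stands in for Whisper, and a keyword matcher stands in for the LLM, only so the example runs without a model.

```swift
// Hypothetical names throughout; none of these types are part of ConvoKit.

/// A function the chooser selected, plus its arguments as raw strings.
struct FunctionCall: Equatable {
    let functionName: String
    let args: [String]
}

/// Stand-in for speech-to-text (Whisper in the real pipeline).
protocol Transcriber {
    func transcribe(_ audio: [Float]) -> String
}

/// Stand-in for the small LLM that maps a request to one of the options.
protocol FunctionChooser {
    func chooseFunction(text: String, options: [String]) -> FunctionCall?
}

/// Fake transcriber that ignores the audio and returns a canned sentence.
struct CannedTranscriber: Transcriber {
    let sentence: String
    func transcribe(_ audio: [Float]) -> String { sentence }
}

/// Keyword matcher used in place of an LLM, only to keep the sketch runnable.
struct KeywordChooser: FunctionChooser {
    func chooseFunction(text: String, options: [String]) -> FunctionCall? {
        let lowered = text.lowercased()
        if lowered.contains("home"), options.contains("navigateToHome") {
            return FunctionCall(functionName: "navigateToHome", args: [])
        }
        if options.contains("printNumber") {
            // Treat the first run of ASCII digits in the request as the argument.
            let asciiDigits: ClosedRange<Character> = "0"..."9"
            if let digits = lowered.split(whereSeparator: { !asciiDigits.contains($0) }).first {
                return FunctionCall(functionName: "printNumber", args: [String(digits)])
            }
        }
        return nil
    }
}

/// Runs the whole flow: audio -> text -> chosen function (or nil if nothing matches).
func handle(audio: [Float],
            transcriber: Transcriber,
            chooser: FunctionChooser,
            options: [String]) -> FunctionCall? {
    let text = transcriber.transcribe(audio)
    return chooser.chooseFunction(text: text, options: options)
}

let options = ["navigateToHome", "printNumber"]
let call = handle(audio: [],
                  transcriber: CannedTranscriber(sentence: "Please print the number 42"),
                  chooser: KeywordChooser(),
                  options: options)
```

Keeping the transcriber and the chooser behind small protocols is one possible design choice: it lets the dispatch logic be exercised in tests without loading a Whisper model or calling an LLM.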

Setup:

@Observable
// Exposes all public functions to ConvoKit
@ConvoAccessible
class ContentViewModel {
        
    var navigationPath = NavigationPath()
    
    public func navigateToHome() {
        navigationPath.append(ViewType.home)
    }
    // Public functions are exposed to ConvoKit
    public func printNumber(number: Int) {
        print(number)
    }
}
// Inside your SwiftUI View: initializes a view model that can interpret natural language through voice.
// It can also speak back if you have a backend endpoint.
// The force unwrap assumes ggml-tiny.en.bin is bundled with the app.
@StateObject var streamer = ConvoStreamer(baseThinkURL: "", baseSpeakURL: "",
                                          localWhisperURL: Bundle.main.url(forResource: "ggml-tiny.en", withExtension: "bin")!)

func request(text: String) async {
    // A string that holds all options (the functions you marked as public)
    let options = #GetConvoState
    // LLM decides which function to call here
    if let function = await streamer.llmManager.chooseFunction(text: text, options: options) {
        callFunctionIfNeeded(functionName: function.functionName, args: function.args)
    }
}
    
func callFunctionIfNeeded(functionName: String, args: [String]) {
    // A macro that injects a bunch of if statements to handle the called function
    #ConvoConnector
}
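The comment above describes `#ConvoConnector` as injecting a series of if statements. The actual expansion is not shown here, so the following is only a hand-written guess at an equivalent dispatcher for the two functions in the example view model. `DemoViewModel`, the `on:` parameter, and the `Bool` return value are assumptions made for illustration, and the view model records calls in arrays instead of navigating or printing so the behavior is easy to check.

```swift
// Hypothetical hand-written equivalent of what a connector macro could generate.
// DemoViewModel is a simplified stand-in for the ContentViewModel shown above.
final class DemoViewModel {
    private(set) var visited: [String] = []
    private(set) var printed: [Int] = []

    func navigateToHome() { visited.append("home") }
    func printNumber(number: Int) { printed.append(number) }
}

/// Matches the function name against known functions and calls the first match.
/// Returns true when a function was matched and called, false otherwise.
@discardableResult
func callFunctionIfNeeded(functionName: String,
                          args: [String],
                          on viewModel: DemoViewModel) -> Bool {
    if functionName == "navigateToHome" {
        viewModel.navigateToHome()
        return true
    }
    if functionName == "printNumber" {
        // Arguments arrive as strings, so convert before calling.
        guard let first = args.first, let number = Int(first) else { return false }
        viewModel.printNumber(number: number)
        return true
    }
    // Unknown function names are ignored.
    return false
}

let viewModel = DemoViewModel()
callFunctionIfNeeded(functionName: "printNumber", args: ["7"], on: viewModel)
callFunctionIfNeeded(functionName: "navigateToHome", args: [], on: viewModel)
```

Because the arguments are plain strings, each branch has to validate and convert them itself. In this sketch a malformed argument makes the call a no-op instead of crashing.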

Basic Input + Output:

(Image: ConvoKit Example.png)

Future Use Cases (ConvoKit isn't at this complexity yet, but this is the vision for it):

YouTube

  1. Open YouTube, say "Find a video that is a college-level course on SQL databases"
  2. YouTube finds a video based on your spoken words and plays it

Apple Health

  1. Open Apple Health, say "How was my walking this week compared to last week?"
  2. Apple Health responds with, "Your overall mobility has declined but your step count is up!"
  3. It then shows graphs explaining this data

Live Simple Example:

(Video: RPReplay_Final1704503781.mov)