huggingface/tflite-android-transformers

Has the demo gif for text generation been sped up?

farazk86 opened this issue · 1 comments

Hi,

I cannot achieve the speed demonstrated in the gif: https://github.com/huggingface/tflite-android-transformers/tree/master/gpt2

It takes about 7 seconds to generate a single word on my build. I am even using gpuDelegate to run the interpreter on the GPU rather than the CPU, and it's still slow.

Has the gif been sped up? Am I the only one seeing this poor performance?

Thanks

Hi @farazk86,

I know that this message (and repo) is rather old, but I'm testing this demo and struggling to find a way to make it work with the GPU delegate. Would you mind sharing what you did?

My understanding is that the model is not adapted to run on GPU, but I can't even start the app without a crash, so I'm curious to know how you did it. Without the modification below, the app runs perfectly and outputs about 1 word/sec.

If anyone else has insights about this, I would be really grateful as well (@Pierrci? @sayakpaul?). Sorry if that's a very noob question!

I had some difficulties related to Gradle / TF versions, but now I can build a valid APK supporting GPU with the following modifications:

GPT2Client.kt

    import org.tensorflow.lite.gpu.CompatibilityList
    import org.tensorflow.lite.gpu.GpuDelegate
    // ...
    // Original CPU-only setup, commented out:
    //val opts = Interpreter.Options()
    //opts.setNumThreads(NUM_LITE_THREADS)

    val compatList = CompatibilityList()

    val opts = Interpreter.Options().apply {
        if (compatList.isDelegateSupportedOnThisDevice) {
            // if the device has a supported GPU, add the GPU delegate
            val delegateOptions = compatList.bestOptionsForThisDevice
            this.addDelegate(GpuDelegate(delegateOptions))
        } else {
            // if the GPU is not supported, run on 4 threads
            this.setNumThreads(NUM_LITE_THREADS)
        }
    }

and, of course, adding the following to build.gradle:

    implementation 'org.tensorflow:tensorflow-lite:2.5.0'
    implementation 'org.tensorflow:tensorflow-lite-gpu:2.5.0'

But when I run the app, it crashes on startup with the following error (tflite 2.3):

12-12 10:55:56.204  3214  3214 D Launcher: onStop
12-12 10:55:56.241 15950 15956 I zygote64: Do partial code cache collection, code=59KB, data=38KB
12-12 10:55:56.241 15950 15956 I zygote64: After code cache collection, code=57KB, data=37KB
12-12 10:55:56.241 15950 15956 I zygote64: Increasing code cache capacity to 256KB
12-12 10:55:56.461 15950 15969 D libGLESv3: Successfully load libGLESv2_oneplus.so, this=0x7581a5c008
12-12 10:55:56.463 15950 15969 I tflite  : Created TensorFlow Lite delegate for GPU.
12-12 10:55:56.466 15950 15969 I tflite  : Initialized TensorFlow Lite runtime.
12-12 10:55:56.477 15950 15969 I tflite  : Created 0 GPU delegate kernels.
12-12 10:16:41.414  8335  8335 E AndroidRuntime: FATAL EXCEPTION: main
12-12 10:16:41.414  8335  8335 E AndroidRuntime: Process: co.huggingface.android_transformers.gpt2, PID: 8335
12-12 10:16:41.414  8335  8335 E AndroidRuntime: java.lang.IllegalArgumentException: ByteBuffer is not a valid flatbuffer model
12-12 10:16:41.414  8335  8335 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.createModelWithBuffer(Native Method)
12-12 10:16:41.414  8335  8335 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.<init>(NativeInterpreterWrapper.java:60)
12-12 10:16:41.414  8335  8335 E AndroidRuntime: 	at org.tensorflow.lite.Interpreter.<init>(Interpreter.java:224)
12-12 10:16:41.414  8335  8335 E AndroidRuntime: 	at co.huggingface.android_transformers.gpt2.ml.GPT2Client$loadModel$2.invokeSuspend(GPT2Client.kt:137)
12-12 10:16:41.414  8335  8335 E AndroidRuntime: 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
12-12 10:16:41.414  8335  8335 E AndroidRuntime: 	at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
12-12 10:16:41.414  8335  8335 E AndroidRuntime: 	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
12-12 10:16:41.414  8335  8335 E AndroidRuntime: 	at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
12-12 10:16:41.414  8335  8335 E AndroidRuntime: 	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: FATAL EXCEPTION: main
12-12 10:55:56.488 15950 15950 E AndroidRuntime: Process: co.huggingface.android_transformers.gpt2, PID: 15950
12-12 10:55:56.488 15950 15950 E AndroidRuntime: java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: Following operations are not supported by GPU delegate:
12-12 10:55:56.488 15950 15950 E AndroidRuntime: DEQUANTIZE: 
12-12 10:55:56.488 15950 15950 E AndroidRuntime: DIV: Op can only handle 1 or 2 operand(s).
12-12 10:55:56.488 15950 15950 E AndroidRuntime: GATHER: Operation is not supported.
12-12 10:55:56.488 15950 15950 E AndroidRuntime: PACK: Operation is not supported.
12-12 10:55:56.488 15950 15950 E AndroidRuntime: POW: Op can only handle 1 or 2 operand(s).
12-12 10:55:56.488 15950 15950 E AndroidRuntime: SPLIT: Operation is not supported.
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 106 operations will run on the GPU, and the remaining 2317 operations will run on the CPU.
12-12 10:55:56.488 15950 15950 E AndroidRuntime: TfLiteGpuDelegate Init: SLICE: Output batch don't match
12-12 10:55:56.488 15950 15950 E AndroidRuntime: TfLiteGpuDelegate Prepare: delegate is not initialized
12-12 10:55:56.488 15950 15950 E AndroidRuntime: Node nu
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegate(Native Method)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegates(NativeInterpreterWrapper.java:351)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.init(NativeInterpreterWrapper.java:82)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.<init>(NativeInterpreterWrapper.java:63)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at org.tensorflow.lite.Interpreter.<init>(Interpreter.java:266)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at co.huggingface.android_transformers.gpt2.ml.GPT2Client$loadModel$2.invokeSuspend(GPT2Client.kt:155)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
12-12 10:55:56.488 15950 15950 E AndroidRuntime: 	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)
12-12 10:55:56.494 15950 15979 D OSTracker: OS Event: crash
12-12 10:55:56.496  1222  2239 W ActivityManager:   Force finishing activity co.huggingface.android_transformers.gpt2/.MainActivity
12-12 10:55:56.498  1222  1748 I ActivityManager: Showing crash dialog for package co.huggingface.android_transformers.gpt2 u0
12-12 10:55:56.502  1222  1747 D RestartProcessManager: Duration is too short, ignore : 696 in co.huggingface.android_transformers.gpt2
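
To put the summary line of that error in perspective, here is a minimal Kotlin sketch that extracts the GPU/CPU operation counts from the message and computes the share that would be delegated. The names `DelegateSplit` and `parseDelegateSplit` are hypothetical helpers made up for this illustration; they are not part of TensorFlow Lite or of the demo app.

```kotlin
// Hypothetical helper types for reading the delegate summary line.
// Nothing here comes from TensorFlow Lite itself.
data class DelegateSplit(val gpuOps: Int, val cpuOps: Int) {
    val total: Int get() = gpuOps + cpuOps
    val gpuShare: Double get() = gpuOps.toDouble() / total
}

// Matches the wording of the summary line as it appears in the log above.
private val SPLIT_PATTERN = Regex(
    """(\d+) operations will run on the GPU, and the remaining (\d+) operations will run on the CPU"""
)

// Returns the parsed counts, or null if the line does not contain the summary.
fun parseDelegateSplit(logLine: String): DelegateSplit? {
    val match = SPLIT_PATTERN.find(logLine) ?: return null
    val (gpu, cpu) = match.destructured
    return DelegateSplit(gpu.toInt(), cpu.toInt())
}

fun main() {
    val line = "106 operations will run on the GPU, and the remaining 2317 operations will run on the CPU."
    val split = parseDelegateSplit(line)
    if (split != null) {
        println("GPU ops: ${split.gpuOps} of ${split.total}")
        println("GPU share: ${split.gpuShare * 100} %")
    }
}
```

For the line in the log above, that is 106 of 2423 operations, or a little over 4%, so even if the delegate initialized successfully, most of the graph would still run on the CPU.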

With tflite 2.4, the error is a bit different:

12-12 11:08:18.914 17407 17426 I tflite  : Created TensorFlow Lite delegate for GPU.
12-12 11:08:18.917 17407 17426 I tflite  : Initialized TensorFlow Lite runtime.
12-12 11:08:18.928 17407 17426 I tflite  : Created 0 GPU delegate kernels.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: FATAL EXCEPTION: main
12-12 11:08:18.959 17407 17407 E AndroidRuntime: Process: co.huggingface.android_transformers.gpt2, PID: 17407
12-12 11:08:18.959 17407 17407 E AndroidRuntime: java.lang.IllegalArgumentException: Internal error: Failed to apply delegate: Following operations are not supported by GPU delegate:
12-12 11:08:18.959 17407 17407 E AndroidRuntime: DEQUANTIZE: 
12-12 11:08:18.959 17407 17407 E AndroidRuntime: GATHER: Operation is not supported.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: MEAN: Mean operation supports only HW plane
12-12 11:08:18.959 17407 17407 E AndroidRuntime: SPLIT: Operation is not supported.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 147 operations will run on the GPU, and the remaining 2276 operations will run on the CPU.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: TfLiteGpuDelegate Init: Tensor "Identity_8" has bad input dims size: 5.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: TfLiteGpuDelegate Prepare: delegate is not initialized
12-12 11:08:18.959 17407 17407 E AndroidRuntime: Node number 2423 (TfLiteGpuDelegateV2) failed to prepare.
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 
12-12 11:08:18.959 17407 17407 E AndroidRuntime: Restored
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegate(Native Method)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.applyDelegates(NativeInterpreterWrapper.java:367)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.init(NativeInterpreterWrapper.java:85)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at org.tensorflow.lite.NativeInterpreterWrapper.<init>(NativeInterpreterWrapper.java:63)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at org.tensorflow.lite.Interpreter.<init>(Interpreter.java:277)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at co.huggingface.android_transformers.gpt2.ml.GPT2Client$loadModel$2.invokeSuspend(GPT2Client.kt:155)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at kotlinx.coroutines.DispatchedTask.run(Dispatched.kt:241)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:594)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at kotlinx.coroutines.scheduling.CoroutineScheduler.access$runSafely(CoroutineScheduler.kt:60)
12-12 11:08:18.959 17407 17407 E AndroidRuntime: 	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:740)
12-12 11:08:18.966 17407 17436 D OSTracker: OS Event: crash
12-12 11:08:18.967  1222  3147 W ActivityManager:   Force finishing activity co.huggingface.android_transformers.gpt2/.MainActivity
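
In both traces the `IllegalArgumentException` is thrown from the `Interpreter` constructor, so one way to at least avoid the startup crash might be to catch it and retry with a CPU-only configuration. The sketch below shows only that fallback pattern in plain Kotlin: `buildWithFallback` is a hypothetical helper, and the two lambdas are stand-ins for code that would build an `Interpreter` with GPU options and with CPU-only options. It does not make the model itself GPU-compatible, and it does not cover cleaning up a delegate that failed to apply.

```kotlin
// Hypothetical helper, not part of TensorFlow Lite or the demo app.
// Tries the primary builder first; if it throws IllegalArgumentException
// (the exception type seen in the stack traces above), it uses the fallback.
fun <T> buildWithFallback(primary: () -> T, fallback: () -> T): T =
    try {
        primary()
    } catch (e: IllegalArgumentException) {
        println("Primary configuration failed: ${e.message}")
        fallback()
    }

fun main() {
    // Stand-ins for the real builders: the first simulates the delegate
    // failure, the second simulates a successful CPU-only setup.
    val backend = buildWithFallback<String>(
        primary = { throw IllegalArgumentException("Internal error: Failed to apply delegate") },
        fallback = { "cpu" }
    )
    println("Selected backend: $backend")
}
```

With this pattern the app would start and run at CPU speed whenever the delegate cannot be applied, which matches the behaviour without the GPU modification.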