continuedev/continue

Autocomplete does not seem to work



Relevant environment info

- OS: macOS Tahoe
- Continue version: 1.2.4
- IDE version: 1.104.1 
- Model: Gemini-2.5 Flash
- config:
  
name: Local Agent
version: 1.0.0
schema: v1
models:
  - name: OpenRouter Gemini 2.5 Flash
    provider: openrouter
    model: google/gemini-2.5-flash
    roles:
      - chat
      - edit
      - apply
      - autocomplete
    apiBase: https://openrouter.ai/api/v1
  

Description

Hi there.
I am new here. I use OpenRouter as the backend; chat and edit work fine, but autocomplete never pops up any candidate code, even though the Continue Debug Console shows that Gemini has already replied with code, as shown in the following image:

[Image: Continue Debug Console showing Gemini's autocomplete reply]

I've tried the recommended troubleshooting approaches mentioned in the docs, but they did not help. Hope someone can help me. Thanks in advance.

To reproduce

No response

Log output

I also tried Continue in JetBrains PyCharm and it does not work there either. Here is the log:

Code: undefined
Error number: undefined
Syscall: undefined
Type: aborted

wze: The operation was aborted.
    at I (/snapshot/continue/binary/out/index.js:8055:16904)
    at AbortSignal.u (/snapshot/continue/binary/out/index.js:8055:17091)
    at [nodejs.internal.kHybridDispatch] (node:internal/event_target:645:20)
    at AbortSignal.dispatchEvent (node:internal/event_target:587:26)
    at abortSignal (node:internal/abort_controller:292:10)
    at AbortController.abort (node:internal/abort_controller:322:5)
    at suA.cancel (/snapshot/continue/binary/out/index.js:8456:2845)
    at auA._createListenableGenerator (/snapshot/continue/binary/out/index.js:8456:3583)
    at auA.getGenerator (/snapshot/continue/binary/out/index.js:8456:4055)
    at getGenerator.next (<anonymous>)

SoLoHiC commented

I've encountered a similar situation. After some debug runs of the JetBrains plugin, I found the culprit to be the completion options shared across roles: the contextLength and maxTokens set in defaultCompletionOptions add a lot of latency when autocomplete is triggered, even though autocomplete should carry much less context and respond much faster than the chat/edit roles. At least that was the case for me. So I'd recommend setting up an independent model entry, named something like 'Gemini AutoComplete Only', with the single role 'autocomplete' and a smaller contextLength (e.g. 1024) and maxTokens (e.g. 512), which should be enough context input and code output for the autocomplete scenario. A demo config is below, hope it helps 🙂:

  - name: DeepSeek Autocomplete
    provider: deepseek
    model: deepseek-chat
    apiBase: https://api.deepseek.com
    apiKey: $MY_API_KEY
    useLegacyCompletionsEndpoint: false
    roles:
      - autocomplete
    defaultCompletionOptions:
      temperature: 0
      stream: true
      contextLength: 1024
      maxTokens: 512
  - name: DeepSeek Common
    provider: deepseek
    model: deepseek-chat
    apiBase: https://api.deepseek.com
    apiKey: $MY_API_KEY
    useLegacyCompletionsEndpoint: false
    roles:
      - chat
      - edit
      - summarize
    defaultCompletionOptions:
      temperature: 0
      stream: true
      contextLength: 131072
      maxTokens: 8192
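
Applied to the OpenRouter/Gemini config from the original report, the same split might look roughly like this (a sketch that only reuses the provider, model, and apiBase values already shown above; the entry names are placeholders, and you may still need an apiKey depending on how your OpenRouter credentials are set up):

  - name: Gemini Autocomplete Only        # placeholder name; autocomplete role only
    provider: openrouter
    model: google/gemini-2.5-flash
    apiBase: https://openrouter.ai/api/v1
    roles:
      - autocomplete
    defaultCompletionOptions:
      contextLength: 1024
      maxTokens: 512
  - name: OpenRouter Gemini 2.5 Flash     # keeps the remaining roles
    provider: openrouter
    model: google/gemini-2.5-flash
    apiBase: https://openrouter.ai/api/v1
    roles:
      - chat
      - edit
      - apply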

@SoLoHiC I see. Thank you very much and I'll check it out later. :D

LFd3v commented

Just to add more info to what @SoLoHiC reported, it seems that Continue has a low tolerance for high latency. When using Ollama for completion locally, while editing/deleting text I usually get a bunch of these errors in the Ollama log:

set 27 21:29:36 ollama[2595044]: time=2025-09-27T21:29:36.420-03:00 level=ERROR source=server.go:1459 msg="post predict" error="Post \"http://127.0.0.1:34119/completion\": context canceled"

Then Continue stops responding altogether, not just completion; only restarting the editor fixes it. Until the editor is restarted, CPU usage stays high and the built-in VS Code inline suggestion icon in the status bar keeps spinning (by the way, perhaps this is what was reported in #7372 as "conflicts with copilot"?).

In my case, increasing debounceDelay in the LLM configuration helps a bit; it is worth trying if the model you use does not respond fast enough.
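
For reference, a minimal sketch of where such a setting could go (the value is in milliseconds; the only debounce option I know is documented is tabAutocompleteOptions.debounceDelay in the legacy config.json format, so the key name and placement below for config.yaml are an assumption and should be checked against the Continue docs for your version; the model entry is a placeholder for whatever you run locally):

  - name: Ollama Autocomplete             # placeholder entry name
    provider: ollama
    model: qwen2.5-coder:1.5b             # placeholder; use your local model
    roles:
      - autocomplete
    # Assumed key, mirroring the legacy config.json tabAutocompleteOptions.debounceDelay.
    # Milliseconds to wait after typing stops before requesting a completion;
    # raising it reduces how aggressively completions are requested.
    autocompleteOptions:
      debounceDelay: 500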