quarkiverse/quarkus-langchain4j

Preloading ollama model should only happen if the model doesn't already exist

Closed this issue · 6 comments

When preloading an Ollama model, it should first check whether the model already exists.

For example, if I'm using the mixtral model, it takes almost 10 minutes to download/install. Isn't there an API call to detect whether the model is already present?

Maybe additional logic in

@Override
public void preloadChatModel(String modelName) {
    String serverUrl = String.format("http://%s:%d/api/chat", options.host(), options.port());
    try {
        HttpRequest httpRequest = HttpRequest.newBuilder()
                .uri(new URI(serverUrl))
                .POST(HttpRequest.BodyPublishers.ofString(String.format("{\"model\": \"%s\"}", modelName)))
                .build();
        HttpResponse<String> httpResponse = HttpClient.newHttpClient().send(httpRequest,
                HttpResponse.BodyHandlers.ofString());
        if (httpResponse.statusCode() != 200) {
            throw new RuntimeException(
                    "Unexpected response code: " + httpResponse.statusCode() + " response body: "
                            + httpResponse.body());
        }
    } catch (URISyntaxException e) {
        throw new IllegalStateException("Unable to convert " + serverUrl + " to URI", e);
    } catch (ConnectException e) {
        throw new OllamaClient.ServerUnavailableException(options.host(), options.port());
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
}
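
Something along these lines might work: a hedged sketch of an existence check against Ollama's /api/tags endpoint (which lists locally installed models) before triggering a preload. The class and method names here are hypothetical, not the extension's actual API, and the JSON matching is deliberately naive. Note that Ollama reports untagged models as name:latest, so both forms are tried.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: ask Ollama which models are already installed and
// only preload when the requested one is missing.
public class ModelPreloadCheck {

    // Pure check, split out so it can be exercised without a running server:
    // does the /api/tags response body list the given model? Ollama tags
    // untagged models as "<name>:latest", so that form is also tried.
    static boolean listsModel(String tagsJson, String modelName) {
        String normalized = tagsJson.replace(" ", "");
        return normalized.contains("\"name\":\"" + modelName + "\"")
                || normalized.contains("\"name\":\"" + modelName + ":latest\"");
    }

    // Illustrative network call against the real endpoint.
    static boolean modelExists(String host, int port, String modelName) throws Exception {
        String url = String.format("http://%s:%d/api/tags", host, port);
        HttpRequest request = HttpRequest.newBuilder().uri(new URI(url)).GET().build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200 && listsModel(response.body(), modelName);
    }

    public static void main(String[] args) {
        String sample = "{\"models\":[{\"name\":\"mixtral:latest\",\"size\":26000000000}]}";
        System.out.println(listsModel(sample, "mixtral")); // already installed
        System.out.println(listsModel(sample, "llama3"));  // would need a pull
    }
}
```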

Also, why is Ollama-specific stuff inside the core deployment module? Shouldn't it belong in the ollama extension?

Isn't there an API call to detect whether the model is already present?

We already do that, so I'm interested in how you reproduced the behavior you mention.

If I already have the model present

╰─ ollama ls
NAME                    ID              SIZE    MODIFIED       
nomic-embed-text:latest 0a109f422b47    274 MB  39 minutes ago  
mixtral:latest          d39eb76ed9c5    26 GB   39 minutes ago  

When I run quarkus dev I see

Ollama model pull: 2024-05-31 14:39:40,154 INFO  [io.qua.lan.dep.dev.DevServicesOllamaProcessor] (build-59) Preloading model mixtral

And it sits there for about 15 minutes.

There isn't any code which reaches out to see if the model is already present. It's instructing ollama to re-pull the model.

if ((ollamaChatModels.size() == 1) && (config.devservices().preload())) {
    String modelName = ollamaChatModels.get(0).getModelName();
    LOGGER.infof("Preloading model %s", modelName);
    client.preloadChatModel(modelName);
}

If I also add -Dquarkus.langchain4j.devservices.preload=false, it skips that step and immediately starts my app, which works fine because the model is already loaded.

It looks like the processor tries to see what local models are available:

Set<ModelName> localModels = client.localModels().stream().map(mi -> ModelName.of(mi.name()))
                    .collect(Collectors.toSet());

I'm not sure what this returns. All I know is that this block of code in the processor

            if ((ollamaChatModels.size() == 1) && (config.devservices().preload())) {
                String modelName = ollamaChatModels.get(0).getModelName();
                LOGGER.infof("Preloading model %s", modelName);
                client.preloadChatModel(modelName);
            }

is triggering ollama to re-pull the model, which on my machine takes 15 minutes.
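
To illustrate the kind of guard I'd expect between those two snippets, here is a hedged, standalone sketch (names are mine, not the processor's): given the set of names the client reports locally, decide whether a preload is needed at all. A subtle point worth checking is tag normalization, since Ollama lists an untagged model as name:latest, so comparing the raw configured name "mixtral" against the set would miss it.

```java
import java.util.Set;

// Hypothetical standalone version of the guard: given the model names
// Ollama already has locally, decide whether a preload is required.
public class PreloadGuard {

    // Normalize "mixtral" and "mixtral:latest" to the same key, mirroring
    // how Ollama tags untagged models with ":latest".
    static String normalize(String name) {
        return name.contains(":") ? name : name + ":latest";
    }

    static boolean needsPreload(Set<String> localModels, String modelName) {
        return !localModels.contains(normalize(modelName));
    }

    public static void main(String[] args) {
        Set<String> local = Set.of("mixtral:latest", "nomic-embed-text:latest");
        System.out.println(needsPreload(local, "mixtral")); // present, skip preload
        System.out.println(needsPreload(local, "llama3"));  // missing, must pull
    }
}
```

If the existing ModelName.of() comparison already does this normalization, then this is indeed hard to square with what I'm seeing, but it would explain a re-pull if it doesn't.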

That sounds like an Ollama bug TBH, but I'll try it on Monday

I tried this and preloading a model works exactly as expected, I could not reproduce the behavior you are seeing.

Closing as I cannot reproduce.

Feel free to add more information and I can have another look.

Sorry, I've been at an f2f this week. I'll be back in the office tomorrow.