kherud/java-llama.cpp

Questions and Feedback

Solido opened this issue · 3 comments

Solido commented

Hi!

I'm starting to use your library.
Here are some points to discuss.

  • It's great that autofill is already present. I would prefer a more explicit name than generate, ideally a more dedicated one like the name of the executable.

  • I would like to get some stats about the tokens, and right now the signature only exposes them as String.

  • I'm testing my prompts with main, then converting them to java-llama.cpp, and even with the same parameters I get very different responses, to the point that the generations are unusable. How is that possible?

I hope to complete some more feedback with time. Thank you.

kherud commented

It's great that autofill is already present. I would prefer a more explicit name than generate, ideally a more dedicated one like the name of the executable.

Yeah, I agree it isn't optimal. There is a combinatorial blowup from generation/completion, answering/infilling, and default/custom parameters. Currently, both generate() and complete() have four different signatures. On the other hand, overly verbose method names wouldn't be great either, something like generateInfilling(), generateAnswer(), completeInfilling(), ...

I would like to get some stats about the tokens, and right now the signature only exposes them as String.

Since yesterday (version 2.2.1), generated outputs also contain the token id and probabilities. Note that you have to configure InferenceParameters#nProbs in order for probabilities to be returned. Let me know if you need any other information. In general, I try to keep the communication overhead from C++ to Java as low as possible to optimize performance.
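For illustration, a rough sketch of how reading those probabilities could look. Only the package name and the nProbs requirement are taken from the comment above; the setter setNProbs, the Output field names, and the generate overload are assumptions, so check the actual classes for the exact API:

    import de.kherud.llama.InferenceParameters;
    import de.kherud.llama.LlamaModel;
    import de.kherud.llama.ModelParameters;

    // Hedged sketch: setNProbs(), the Output fields, and the constructor/overload
    // signatures are assumptions and may differ from the real 2.2.1 API.
    class TokenStatsExample {
        public static void main(String[] args) {
            try (LlamaModel model = new LlamaModel("/path/to/model.gguf", new ModelParameters())) {
                InferenceParameters inferParams = new InferenceParameters()
                        .setNProbs(5); // without this, no probabilities are returned
                for (LlamaModel.Output output : model.generate("Tell me a joke.", inferParams)) {
                    // each streamed output carries the token text plus a map of
                    // candidate tokens to their probabilities (assumed field name)
                    System.out.println(output + " -> " + output.probabilities);
                }
            }
        }
    }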

I'm testing my prompts with main, then converting them to java-llama.cpp, and even with the same parameters I get very different responses, to the point that the generations are unusable. How is that possible?

Hm, I'm not sure what the problem could be. Is it possible for you to share a prompt that I could try to reproduce this with? And which model did you use? If you can't share anything, that's fine, I will make a comparison with my own setup.

Solido commented
  • I can confirm that the new distribution fixes the missing Metal file, and token stats are now exposed.
  • Concerning the naming, I would stick to the executable names and avoid using generate as a prefix, keeping generate itself only because main is not clear enough. We would have generate(...), infill(...), embeddings(...), finetune(...), llava(...), so that when new executables are released we stay aligned.
  • For comparison, I'm using Mistral and ported every parameter from main to the inference and model params (rough sketch below), but I'm not sure those are actually respected, temperature for example. It's obvious when instructing with a structured format and few-shot examples: those are totally ignored in the output.
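For illustration, a rough sketch of such a port, assuming setters named setTemperature, setTopK, setTopP, and setNPredict exist; the corresponding main flags are noted in the comments:

    import de.kherud.llama.InferenceParameters;
    import de.kherud.llama.LlamaModel;
    import de.kherud.llama.ModelParameters;

    // Sketch of porting an invocation such as:
    //   ./main -m mistral-7b-instruct.Q4_K_M.gguf --temp 0.2 --top-k 40 --top-p 0.9 -n 256 -p "..."
    // The setter names are assumptions; the point is that every flag passed to main needs
    // an explicit counterpart on ModelParameters/InferenceParameters, otherwise defaults apply.
    class PortedPromptExample {
        public static void main(String[] args) {
            try (LlamaModel model = new LlamaModel("mistral-7b-instruct.Q4_K_M.gguf", new ModelParameters())) {
                InferenceParameters inferParams = new InferenceParameters()
                        .setTemperature(0.2f)  // --temp 0.2
                        .setTopK(40)           // --top-k 40
                        .setTopP(0.9f)         // --top-p 0.9
                        .setNPredict(256);     // -n 256
                String answer = model.complete("### Instruction: ...\n### Response:", inferParams);
                System.out.println(answer);
            }
        }
    }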

kherud commented

I just released version 3.0 and decided to simplify the API. There are no longer different overloads of LlamaModel#complete(...) and LlamaModel#generate(...); instead, the kind of task is set via InferenceParameters, e.g. InferenceParameters#setInputPrefix(String) and InferenceParameters#setInputSuffix(String). This simplifies the library internally, improves code re-use, and better aligns with the llama.cpp server API.
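For illustration, a minimal sketch of what an infilling call could look like under this 3.0-style API. Only setInputPrefix/setInputSuffix are taken from the description above; the ModelParameters setter, the InferenceParameters constructor, and the complete(...) signature are assumptions:

    import de.kherud.llama.InferenceParameters;
    import de.kherud.llama.LlamaModel;
    import de.kherud.llama.ModelParameters;

    // Hedged sketch of the 3.0-style API: the task (here infilling) is implied by the
    // parameters rather than by a dedicated method overload. Names other than
    // setInputPrefix/setInputSuffix are assumptions.
    class InfillExample {
        public static void main(String[] args) {
            ModelParameters modelParams = new ModelParameters()
                    .setModelFilePath("/path/to/codellama-7b.Q4_K_M.gguf"); // assumed setter
            try (LlamaModel model = new LlamaModel(modelParams)) {
                InferenceParameters inferParams = new InferenceParameters("") // assumed constructor
                        .setInputPrefix("public static int add(int a, int b) {\n")
                        .setInputSuffix("\n}");
                System.out.println(model.complete(inferParams));
            }
        }
    }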

Also, the library was upgraded to the newest llama.cpp features, and almost all of the Java binding code was reworked. I'm curious whether you still experience degraded LLM quality.

I'll close this issue, but feel free to re-open if you have feedback.