Issues
Some observations and questions on Google FRAMES Benchmark readurls&memory-gpt-4o-mini method evaluation
#106 opened by RGSmirnov - 2
How to reproduce RTC Eval 100% locally?
#96 opened by botelhorui - 5
Implement cot decoding with llama.cpp
#65 opened by codelion - 0
parse_conversation reports an error
#84 opened by femto - 14
I get the following error: list index out of range
#67 opened by ErykCh - 1
Feature: make it easy to add new approaches
#72 opened by ErykCh - 6
Setting the default approach doesn't work
#69 opened by ErykCh - 6
Ambiguous configuration for mcts
#68 opened by ErykCh - 3
Resulting Docker image size (6.36 GB) is quite large - is there any opportunity to reduce this?
#71 opened by sammcj - 2
Thanks for adding entropy-based sampling; by any chance, do you have a comparison with other alternative methods?
#70 opened by shamanez - 6
(MOA) Fails with "List Index Out of Range" Error on OpenAI-Compatible Ollama API Endpoint
#60 opened by chrisoutwright - 1
Add a Lightning template for running optillm
#56 opened by codelion - 1
Is there any possibility we could align our interests?
#57 opened by femto - 1
Issue using llama-server with the 'no_key' API key
#61 opened by s-hironobu - 1
Scripts to reproduce benchmark results
#63 opened by zhxieml - 1
Implement routing
#37 opened by codelion - 13
I can see the cot_decode method has been implemented, but we can't use it with the proxy.
#59 opened by shamanez - 6
When I tried optillm with my own OpenAI API-compatible hosted model, I got this error
#58 opened by shamanez - 2
Integration with Gemini 1.5 models
#54 opened by tranhoangnguyen03 - 2
token counting
#52 opened by darkacorn - 1
[Question]: Which paper is mcts.py based on?
#51 opened by RomanKoshkin - 20
Add support for logging with --log=debug
#44 opened by codelion - 1
Add support for sympy in solver approach
#41 opened by codelion - 1
Add support for passing the slug as an extra_body argument instead of as a model-name prefix
#39 opened by codelion - 4
Clarification: proxy or library for cot_decoding?
#35 opened by lee-b - 0
Change api-key to optillm-api-key
#24 opened by codelion - 8
Use with llama.cpp
#8 opened by scalar27 - 5
Flask import fails
#23 opened by vanetreg - 0
Support AzureOpenAI client
#13 opened by codelion - 1
GSM8K bad test
#16 opened by Tostino - 3
Minimal working MCTS example
#5 opened by RomanKoshkin - 2
Too many tokens
#4 opened by integral-llc - 2