Issues
[REQUEST] Don't error when a max_tokens request is too long, causing "Job required pages too small"; just generate up to the available pages.
#262 opened by Originalimoc - 1
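The behavior #262 asks for can be sketched as follows; the function and parameter names are hypothetical, not TabbyAPI's actual API:

```python
def clamp_max_tokens(requested: int, prompt_len: int, max_seq_len: int) -> int:
    """Instead of erroring when prompt_len + requested exceeds max_seq_len,
    generate only up to the context that is actually available."""
    available = max_seq_len - prompt_len
    return max(0, min(requested, available))
```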
[BUG] OAI docs recommend the parameter max_completion_tokens over max_tokens. Support it alongside max_tokens.
#256 opened by Originalimoc - 4
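For #256, a minimal sketch (assumed request-handling logic, not TabbyAPI's actual implementation) of accepting the newer OpenAI parameter name while keeping max_tokens as a fallback:

```python
def resolve_max_tokens(body: dict):
    """Prefer max_completion_tokens (the name OpenAI's docs now recommend)
    and fall back to the legacy max_tokens if it is absent."""
    value = body.get("max_completion_tokens")
    if value is None:
        value = body.get("max_tokens")
    return value
```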
[BUG] json_schema not always enforced
#258 opened by afoland - 11
[BUG] Llama 3.3 models not working
#260 opened by wypiki - 7
[BUG] EXL2 v0.2.5+ is broken with YaRN RoPE.
#261 opened by Originalimoc - 6
[REQUEST] Better document rope_scaling/rope_alpha in the wiki, and add config options for yarn_rope_factor/yarn_rope_original_max_position_embeddings
#239 opened by Originalimoc - 7
[REQUEST] Vision Models
#235 opened by bdashore3 - 3
[REQUEST] Automatic Model Unloading while idling
#216 opened by TetrisBlack - 0
[REQUEST] Auto-switch the draft model on and off according to context length (prompt + completion).
#255 opened by Originalimoc - 1
[BUG] "Disabling GPU split because one GPU is in use" - then Tabby only uses 1 GPU
#250 opened by sammcj - 5
[BUG] Tool Calling not working for Llama 3.2 3B
#234 opened by raisbecka - 1
[BUG] [Dev branch] Failing to load a non-vision model
#248 opened by TyraVex - 4
[BUG] Two concurrent requests both stream FASTER than a single stream? ExLlamaV2 issue, or something else?
#247 opened by Originalimoc - 1
[BUG] max_seq_len cannot be <= 2047
#240 opened by Originalimoc - 11
[BUG] The ability to ignore the model field in a request and just use the currently loaded model. (Skip admin key checking if inline_model_loading is set to false)
#236 opened by Originalimoc - 2
[REQUEST] Remove the default value of draft_model_dir so it can be defined in tabby_config.yml
#242 opened by Originalimoc - 2
[REQUEST] Better Infinity Embeddings support
#211 opened by arbi-dev - 0
[REQUEST] Document tabby_config.yml in wiki
#241 opened by Originalimoc - 2
[REQUEST] Nested model_name key
#231 opened by SinanAkkoyun - 1
[REQUEST(Maybe)] Is the yarn_rope_factor/yarn_rope_original_max_position_embeddings config passed/loaded to ExLlama?
#237 opened by Originalimoc - 2
[REQUEST] Vision support.
#229 opened by Ph0rk0z - 0
Exceptions when shutting down with no model loaded
#202 opened by awatuna - 3
[BUG] Inline loading doesn't respect config.yml
#226 opened by Async0x42 - 4
[REQUEST] Modify string probabilities, rather than outright banning with banned_strings
#223 opened by atisharma - 2
[REQUEST] FlashAttention 1 Support.
#221 opened by Abdulhanan535 - 0
[BUG] Structured Outputs?
#219 opened by ExtinctionEvent - 9
Very strange OOM errors across multiple GPUs: OOMs, BSODs, and extreme driver crashes, all stemming from TabbyAPI
#187 opened by SytanSD - 2
[BUG] TabbyAPI uses 100% of a CPU core after a request fails due to an excessively long prompt
#203 opened by NeoChen1024 - 1
[BUG] Docker cannot find cuda-toolkit
#205 opened by ultranationalism - 2
[BUG] Completions are broken
#179 opened by TyraVex - 2
[REQUEST] Make docker build action faster
#186 opened by bdashore3 - 1
[BUG] v1/template/switch is broken
#198 opened by SecretiveShell - 2
[BUG] 'TabbyConfig' object has no attribute 'from_file'. Did you mean: '_from_file'?
#196 opened by atisharma - 1
[REQUEST] Update the docker section in the wiki
#182 opened by AmgadHasan - 9
[BUG] After updating to exllamav2-0.1.9 (from 0.1.8) cannot load Mistral Large 2 123B with a draft model
#177 opened by Lissanro