Issues
[Feature Request] OpenAI-compatible `stop` param
#1731 opened by josephrocca - 5
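For context on the request above, a minimal sketch of what an OpenAI-style `stop` parameter looks like when sent to lmdeploy's `api_server` (the port, model name, and stop strings are illustrative placeholders):

```python
# Hedged sketch: an OpenAI-compatible request carrying `stop`.
# Whether lmdeploy honors the field is exactly what issue #1731 requests.
import requests

resp = requests.post(
    "http://localhost:23333/v1/chat/completions",  # lmdeploy api_server default port
    json={
        "model": "internlm2-chat-7b",              # placeholder model name
        "messages": [{"role": "user", "content": "Count from 1 to 10."}],
        "stop": ["5", "\n\n"],                     # halt before emitting either string
    },
)
print(resp.json())
```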
[Bug] Many concurrent requests with `--enable-prefix-caching` AND `--quant-policy 8` crashes with: `CUDA runtime error: an illegal memory access was encountered /opt/lmdeploy/src/turbomind/utils/allocator.h:231`
#1744 opened by josephrocca - 1
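The crash above pairs two engine options; a hedged sketch of the same configuration through the Python API, assuming `TurbomindEngineConfig` exposes fields matching the CLI flags (model path is a placeholder):

```python
# Sketch of the combination from issue #1744: prefix caching plus
# 8-bit KV cache quantization. Field names are assumed to mirror the
# CLI flags `--enable-prefix-caching` and `--quant-policy 8`.
from lmdeploy import TurbomindEngineConfig, pipeline

engine_cfg = TurbomindEngineConfig(
    enable_prefix_caching=True,  # reuse KV cache across shared prompt prefixes
    quant_policy=8,              # quantize the KV cache to 8 bits
)
pipe = pipeline("internlm/internlm2-chat-7b", backend_config=engine_cfg)
```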
[Bug] Problem with logits output
#1742 opened by GZL11 - 2
[Feature] Qwen 2 Support
#1746 opened by suptejas - 1
[Bug] xcomposer 4khd LoRA weight error in lmdeploy
#1747 opened by ztfmars - 0
[Bug] Space is incorrectly removed from start of generated text for `/v1/completion` endpoint
#1743 opened by josephrocca - 0
[Feature] `min_p` sampling parameter
#1745 opened by josephrocca - 2
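`min_p` sampling is commonly defined as discarding every token whose probability falls below `min_p` times the top token's probability. A minimal PyTorch sketch of that rule (a generic illustration, not lmdeploy's implementation):

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float) -> torch.Tensor:
    # Mask out tokens with probability below min_p * (highest token probability).
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max(dim=-1, keepdim=True).values
    return logits.masked_fill(probs < threshold, float("-inf"))

logits = torch.randn(1, 32000)        # stand-in vocabulary logits
filtered = min_p_filter(logits, min_p=0.05)
next_token = torch.multinomial(torch.softmax(filtered, dim=-1), num_samples=1)
```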
[Bug] `detokenize_incrementally`: OverflowError: out of range integral type conversion attempted
#1739 opened by josephrocca - 20
[Bug] AttributeError: 'InternVLChatConfig' object has no attribute 'hidden_size'
#1725 opened by DefTruth - 1
[Docs] Guidance on setting `num_tokens_per_iter` and `max_prefill_iters` to optimal values
#1740 opened by josephrocca - 2
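For orientation, a hedged sketch of where those two knobs are set through the Python API; the values are illustrative stand-ins, not the optimal settings the issue asks to be documented:

```python
# Assumes TurbomindEngineConfig exposes the fields named in issue #1740.
from lmdeploy import TurbomindEngineConfig, pipeline

engine_cfg = TurbomindEngineConfig(
    num_tokens_per_iter=256,  # illustrative: tokens processed per engine iteration
    max_prefill_iters=4,      # illustrative: cap on iterations prefilling one request
)
pipe = pipeline("internlm/internlm2-chat-7b", backend_config=engine_cfg)
```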
[Feature] Speculative Decoding
#1738 opened by josephrocca - 3
[Docs] Where is prefix cache data stored?
#1737 opened by josephrocca - 4
[Bug] No output when quantizing the model
#1735 opened by NB-Group - 4
[Feature] InternVL-Chat-V1-5-AWQ merge LoRA adapter
#1691 opened by isongxw - 2
About InternVL-Chat-V1.5 8-bit quantization
#1727 opened by tairen99 - 7
[Bug] key_stats.pth not found when using 4-bit KV quantization with the internlm2-chat-1_8b model
#1720 opened by jxfruit - 4
lmdeploy 0.4.2: no response when running llama7-70b-instruct inference on 8 GPUs
#1712 opened by yak9meat - 3
[Bug] failed to set temperature 1.2
#1732 opened by zhyncs - 2
[Bug] CUDA OOM during calibration even with 5x 4090s? Falling back to `--device cpu` also fails (with a different error)
#1729 opened by josephrocca - 16
[Bug] When serving cogvlm2, concurrent requests interfere with each other: later requests use the image passed by earlier requests
#1730 opened by LRHstudy - 1
[Feature] Support for THUDM/glm-4v-9b
#1726 opened by Iven2132 - 1
High GPU memory usage when running InternVL-Chat-V1-5-AWQ
#1728 opened by tairen99 - 4
[Feature] Create Cuda 12 docker images
#1709 opened by nickmitchko - 7
[Bug] torch.cuda.OutOfMemoryError when loading the 4-bit InternVL-Chat-V1-5 vision model
#1704 opened by tairen99 - 5
[Feature] Any plan to support MiniCPM-V?
#1723 opened by HaoLiuHust - 0
How to trace multiple GPUs using Nsight Systems
#1722 opened by sleepwalker2017 - 2
[Feature] Make torchvision optional
#1717 opened by zhyncs - 1
batch inference
#1689 opened by dirtycomputer - 6
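For the question above: lmdeploy's `pipeline` accepts a list of prompts and batches them internally. A minimal sketch (model name is a placeholder):

```python
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-7b")
responses = pipe([
    "What is 2 + 2?",
    "Name a prime number greater than 10.",
])
for r in responses:
    print(r.text)
```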
[Bug] qwen1.5 inference is not supported
#1697 opened by zzc0208 - 1
[Bug] ModuleNotFoundError: No module named '_turbomind' when loading llava Mistral 7B
#1699 opened by Alexis-IMBERT - 1
[Feature] Quantized inference on V100
#1711 opened by QwertyJack - 12
[Bug] RuntimeError: [TM][ERROR] Assertion fail: D:\a\lmdeploy\lmdeploy\src/turbomind/models/llama/Barrier.h:20
#1703 opened by NB-Group - 1
[Feature] Are there plans to support the GLM4V model?
#1713 opened by will-wiki - 2
AWQ optimization for small batches
#1707 opened by zhyncs - 1
[Bug]
#1695 opened by xiaoajie738 - 2
[Bug] InternVL-1.5 API server launched with LM cannot recognize images
#1701 opened by BigWhiteFox - 0
How is the difference in RoPE between `hf llama` and `meta llama` handled?
#1700 opened by sleepwalker2017 - 1
[Feature] Support for LLaVA-NeXT
#1685 opened by deece - 2
Encountered a core dump when quantizing the model
#1698 opened by zzc0208 - 3
[Bug] Output differs when temperature is set to zero
#1688 opened by zhyncs - 5
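Background for the report above: at temperature 0, softmax sampling should degenerate to argmax, making output deterministic. A generic sketch of the usual special case (not lmdeploy's code):

```python
import torch

def sample(logits: torch.Tensor, temperature: float) -> torch.Tensor:
    if temperature == 0.0:
        return logits.argmax(dim=-1)  # greedy decoding: fully deterministic
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```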
[Docs] How are multiple images handled?
#1686 opened by pseudotensor - 3
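The question above concerns multi-image inputs; in the OpenAI-style message format, multiple images are sent as separate content parts. A sketch of such a payload (URLs are placeholders; how lmdeploy consumes the parts is what the issue asks to be documented):

```python
# One user message carrying text plus two images, OpenAI vision style.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Compare these two images."},
        {"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
        {"type": "image_url", "image_url": {"url": "https://example.com/b.png"}},
    ],
}
```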
[Feature] support for MiniCPM-Llama3-V 2.5
#1693 opened by LRHstudy - 0
[Feature] The pinned peft<=0.9.0 requirement is too low and conflicts with many environments that require peft>0.10; can it be relaxed?
#1682 opened by OKC13 - 11
[Bug] Error when importing a local model while running internvl-v1.5 quantization from the downloaded code
#1681 opened by qingchunlizhi