Issues
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
#3204 opened by aalbersk - 4
torchserve bloom7b1 demo Load model failed
#3202 opened by zqc2011hy - 0
Handling of subsequent RegisterModel calls to Management gRPC endpoint with same model & version
#3199 opened by mihaidusmanu - 6
TorchServe crashes in production with `WorkerThread - IllegalStateException` error
#3087 opened by MaelitoP - 4
How to send a torch array via request
#3195 opened by lschaupp - 1
pytest: test_example_torch_compile.py is failing
#3189 opened by agunapal - 5
Follow up on token authentication PR comments
#3185 opened by mreso - 8
question to model inference optimization
#3134 opened by geraldstanje - 0
Locust monkey patching leads to test cross-talking
#3193 opened by mreso - 0
Running segment_anything_fast example locally
#3186 opened by yousofaly - 1
Support "model-control-mode" in configuration
#3158 opened by lxning - 0
TorchServe linux aarch64 plan
#3072 opened by agunapal - 0
[RFC] Token Authorization by default
#3184 opened by udaij12 - 0
"Model mode control" for gRPC
#3180 opened by udaij12 - 0
"Token Authorization" for gRPC
#3181 opened by udaij12 - 0
NotImplementedError: Cannot copy out of meta tensor; no data! + Models not generating output text
#3167 opened by bjorquera1 - 0
Two-way authentication/Mutual SSL in gRPC
#3172 opened by MohamedAliRashad - 1
Update LLM/llama2 to Llama3
#3099 opened by mreso - 0
Make torchserve-kfs docker image multiplatform
#3161 opened by DanielTemesgen - 0
install dependency via conda
#3156 opened by lxning - 0
Enable token authentication as default
#3157 opened by lxning - 0
model archiver example very long 1 liner
#3154 opened by GeeCastro - 2
Limit resource in docker compose and worker in model
#3150 opened by ToanLyHoa - 0
CUDA out of Memory with low Memory Utilization (CUDA error: device-side assert triggered)
#3114 opened by emilwallner - 6
Load model failed - error: Worker died
#3104 opened by geraldstanje - 1
Duplicate base_neuronx_continuous_batching_handler.py
#3136 opened by mreso - 1
If micro_batch_size of micro-batch is set to 1, then model inference is still batch processing?
#3120 opened by pengxin233 - 4
How to pass parameters from preprocessing to postprocessing when using micro-batch operations
#3103 opened by pengxin233 - 1
Update cpp/llamacpp to Llama 3
#3098 opened by mreso - 0
Exchange Llama2 against Llama3 in HF_accelerate example
#3107 opened by mreso - 2
Whether the pre- and post-processing operations of batch processing are parallel
#3096 opened by pengxin233 - 1
Update large_models/gpt_fast to llama3
#3102 opened by mreso - 0
Update large_models/tp_llama to llama3
#3101 opened by mreso - 0
Update large_models/inferentia2/llama2 to Llama3
#3100 opened by mreso - 1
Server crashes in production with `WorkerThread - IllegalStateException` error
#3091 opened by MaelitoP - 0
improve security doc for model security check
#3065 opened by lxning - 1
Metrics collector crashes when NVIDIA MIGs are present
#3090 opened by UrkoAT - 3
Serve multiple models with both CPU and GPU
#3078 opened by hungtrieu07 - 2