Issues
- Qwen2: cannot run with the latest Qwen2 models (#2256, opened by zensh, 3 comments)
- Since cudarc 0.11.4, error with PTX: CUDA_ERROR_UNSUPPORTED_PTX_VERSION (#2237, opened by CoffeeVampir3, 0 comments)
- How to implement new operators using CUDA host functions along with the Thrust and CUB libraries (#2258, opened by chenwanqq, 26 comments)
- Linear layer with the same weights, biases, and inputs gives different output than PyTorch (#2250, opened by EricLBuehler, 0 comments)
- ONNX: MaxPool with pads != 0 (#2255, opened by limymy, 0 comments)
- Unsupported op_type STFT for op (#2254, opened by mzdk100, 1 comment)
- Unsupported op_type Pad for op (#2196, opened by mzdk100, 1 comment)
- Improve extracting values from `gguf_file::Value` (#2245, opened by polarathene, 0 comments)
- Dynamic linking feature breaks pyo3 wrappers (#2252, opened by qooba, 1 comment)
- nvcc fatal: Cannot find compiler 'cl.exe' in PATH (#2241, opened by kdletters, 1 comment)
- [question] Difference from tvm-unity / mlc-llm (#2249, opened by louis030195, 6 comments)
- Automatically upcasting GGUF values (#2243, opened by EricLBuehler, 0 comments)
- Improving the versatility of `Tensor::slice_assign` (#2242, opened by EricLBuehler, 5 comments)
- `Tensor::to_scalar` has very high latency (#2239, opened by RoggeOhta, 1 comment)
- MetaVoice WASM example? (#2232, opened by overheat, 2 comments)
- CUBLAS_STATUS_NOT_SUPPORTED for Conv2d (#2218, opened by EricLBuehler, 0 comments)
- Misleading `Tensor::matmul` documentation (#2228, opened by kckeiks, 3 comments)
- Unsupported CUDA toolkit version: `12050` (#2210, opened by Gadersd, 3 comments)
- Unable to convert a T5 model to GGUF (#2215, opened by niranjanakella, 1 comment)
- SeparableConv2d implementation (#2219, opened by PacoDu, 2 comments)
- Implement `torch.bucketize` (#2185, opened by EricLBuehler, 2 comments)
- Quantization issue with Mixtral 8x22B (#2201, opened by edesalve, 0 comments)
- Error: cannot seed the CPU RNG with set_seed (#2216, opened by siddthartha, 2 comments)
- Unsupported CUDA toolkit version: `12040` (#2169, opened by kdletters, 2 comments)
- "Using MKL" documentation link goes to a 404 (#2198, opened by CoffeeVampir3, 1 comment)
- How to slice a tensor? (#2197, opened by Gadersd, 6 comments)
- Problem loading metadata from a GGUF file (#2152, opened by cnlancehu, 6 comments)
- Example of loading a model via `include_bytes!`? (#2186, opened by boustrophedon, 2 comments)
- Whisper microphone example outputs gibberish (#2182, opened by krzysztofwos, 0 comments)
- `sort_last_dim` fails on CUDA (#2181, opened by lucasavila00, 6 comments)
- `VarBuilder::from_bytes`? (#2177, opened by boustrophedon, 0 comments)
- Upgrade the cudarc dependency to v0.11.1 (#2173, opened by sidharthrajaram, 2 comments)
- Transparent Huge Pages support (#2149, opened by michaeleisel, 2 comments)
- How to write an Axum SSE handler for Candle? (#2167, opened by sunnyregion, 4 comments)
- Why is the answer from my Gemma example not as expected? Did I miss something? (#2170, opened by coolbeevip, 0 comments)
- No backward pass for `RmsNorm` if the tensor is contiguous (#2168, opened by agerasev, 9 comments)
- `broadcast_as` error when processing multiple tokens at once in the quantized example (#2153, opened by EricLBuehler, 2 comments)
- Metal error while loading function: "Function 'cast_bf16_f16' does not exist" with Llama 3 (#2163, opened by yIllusionSky, 2 comments)
- Model-to-architecture mapping (#2161, opened by BDUG, 4 comments)
- Quantized Phi-3 example fails with "cannot find llama.attention.head_count in metadata" (#2154, opened by MoonKraken, 3 comments)
- Top-p halves the generation speed in the Llama example (#2147, opened by Ayuei, 0 comments)
- Tensor filtering (#2148, opened by michaeleisel, 3 comments)