Lightning-AI/lightning-thunder
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.
Python · Apache-2.0
Issues
LitGPTSDPABenchmark runs incorrect configs
#317 opened by vedaanta (0 comments)
NumberProxy is no Number
#272 opened by jjsjann123 (2 comments)
Thunder + Inductor gives OOM for stablecode-completion-alpha-3b model from LitGPT
#246 opened by mpatel31415 (3 comments)
Expose `torch.compile` arguments as compile options
#281 opened by carmocca (6 comments)
implement zip lookaside in Python interpreter (enables e.g. thunder.jit with zip from LitGPT LLaMAMoE)
#284 opened by IvanYashchuk (3 comments)
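For background on the issue above: Thunder's Python interpreter can substitute a "lookaside" for an opaque callable, i.e. a pure-Python replacement it can step through instead of the C builtin. A minimal conceptual sketch of the idea, assuming a hypothetical `LOOKASIDES` table and `interpret_call` entry point (these names are illustrative, not Thunder's actual API):

```python
def zip_lookaside(*iterables):
    # Pure-Python reimplementation of builtins.zip (stops at the
    # shortest iterable), which an interpreter could trace through.
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:
            try:
                result.append(next(it))
            except StopIteration:
                return
        yield tuple(result)

# Hypothetical table mapping opaque callables to traceable replacements.
LOOKASIDES = {zip: zip_lookaside}

def interpret_call(fn, *args):
    # Route through the lookaside table when a replacement is registered.
    return LOOKASIDES.get(fn, fn)(*args)

print(list(interpret_call(zip, [1, 2, 3], "ab")))  # [(1, 'a'), (2, 'b')]
```

The replacement must match the builtin's semantics exactly (here: truncation at the shortest input) so that traced and eager execution agree.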
`thunder.jit` fails with `nn.Softmax` raising `got an unexpected keyword argument '_stacklevel'`
#258 opened by ptrblck (0 comments)
Add the torch.compile executor as a test executor
#299 opened by carmocca (0 comments)
Support FSDP and torch.compile
#298 opened by carmocca (0 comments)
Dynamic constraints and NumberProxies
#262 opened by jjsjann123 (1 comment)
torch.unflatten not supported by thunder.jit
#288 opened by Fuzzkatt (1 comment)
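For reference, `torch.unflatten(input, dim, sizes)` replaces one dimension with several whose product equals the original extent, inferring at most one `-1` entry. A pure-Python sketch of that shape arithmetic, with no PyTorch dependency (the helper name `unflatten_shape` is made up for illustration):

```python
import math

def unflatten_shape(shape, dim, sizes):
    # Sketch of the shape rule behind torch.unflatten: replace shape[dim]
    # with `sizes`, inferring at most one -1 entry from the remainder.
    dim = dim % len(shape)          # support negative dims
    sizes = list(sizes)
    known = math.prod(s for s in sizes if s != -1)
    if -1 in sizes:
        if shape[dim] % known:
            raise ValueError("cannot infer -1: sizes do not divide extent")
        sizes[sizes.index(-1)] = shape[dim] // known
    if math.prod(sizes) != shape[dim]:
        raise ValueError("sizes do not multiply to the original extent")
    return tuple(shape[:dim]) + tuple(sizes) + tuple(shape[dim + 1:])

print(unflatten_shape((2, 12, 5), 1, (3, 4)))    # (2, 3, 4, 5)
print(unflatten_shape((2, 12, 5), 1, (-1, 6)))   # (2, 2, 6, 5)
```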
torch.nn.MultiheadAttention with thunder.jit error
#287 opened by Fuzzkatt (6 comments)
Remove all occurrences of thunder.compile and TestExecutor.make_callable_legacy
#198 opened by IvanYashchuk (1 comment)
jit: `torch.cuda.stream` and other related functionality are silently ignored when jitting.
#280 opened by kshitij12345 (1 comment)
`thunder.distributed.utils.sort_waits` is broken
#277 opened by IvanYashchuk (3 comments)
Support NeMo StableDiffusion network
#266 opened by athitten (0 comments)
Implement TensorBase.gather
#267 opened by athitten (0 comments)
Implement TensorBase.long
#268 opened by athitten (0 comments)
Implement _VariableFunctionsClass.randint of torch
#269 opened by athitten (5 comments)
[ci]: Add a CI flow with TransformerEngine installed so that we can run the relevant tests.
#196 opened by kshitij12345 (0 comments)
Timeout for Platypus-30B and Thunder compile
#294 opened by mpatel31415 (0 comments)
getitem grad is not calculated properly on Windows
#296 opened by mruberry (8 comments)
`torch.Tensor.numel` method: Don't know how to interpret a callable with type <class 'int'>
#240 opened by kshitij12345 (0 comments)
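For context on the issue above: `Tensor.numel()` simply returns the product of the tensor's shape (1 for a zero-dimensional tensor), so the semantics being interpreted are trivial. A pure-Python sketch, independent of PyTorch (the helper name `numel` is illustrative):

```python
import math

def numel(shape):
    # Number of elements implied by a shape tuple; math.prod(()) == 1,
    # matching a zero-dimensional (scalar) tensor.
    return math.prod(shape)

print(numel((2, 3, 4)))  # 24
print(numel(()))         # 1
print(numel((5, 0, 7)))  # 0 -- any zero-sized dim makes the tensor empty
```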
increase of GPU memory footprint
#216 opened by mpatel31415 (1 comment)
Long compilation time
#229 opened by mpatel31415 (1 comment)
Feature request: Support sharding parameters where first dimension is not divisible by 8
#248 opened by mpatel31415 (3 comments)
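One common way to handle a leading dimension that does not divide evenly by the shard count is to pad it up to the next multiple before splitting, then slice the padding off after gathering. A sketch of that padding arithmetic (pure Python; the function name is made up, and this is not necessarily how Thunder's sharding would implement it):

```python
def padded_shard_extents(dim0, world_size):
    # Pad dim0 up to the next multiple of world_size, then split evenly.
    pad = (-dim0) % world_size            # 0 when already divisible
    chunk = (dim0 + pad) // world_size    # per-rank shard extent
    return pad, chunk

pad, chunk = padded_shard_extents(50, 8)
print(pad, chunk)  # 6 7 -> 50 + 6 = 56 = 8 * 7
```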
cuDNN SDPA executor has CPU overhead
#241 opened by parthmannan (0 comments)
Add support for torch.gather
#223 opened by IvanYashchuk (2 comments)
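For reference, `torch.gather(input, dim, index)` selects elements along `dim` using an index tensor of the output's shape: for `dim=1` in 2-D, `out[i][j] = input[i][index[i][j]]`. A nested-list sketch of that rule for the 2-D case (pure Python, illustrative only):

```python
def gather2d(inp, dim, index):
    # 2-D gather: out has index's shape; each entry selects along `dim`.
    # dim == 0: out[i][j] = inp[index[i][j]][j]
    # dim == 1: out[i][j] = inp[i][index[i][j]]
    if dim == 0:
        return [[inp[index[i][j]][j] for j in range(len(index[0]))]
                for i in range(len(index))]
    return [[inp[i][index[i][j]] for j in range(len(index[0]))]
            for i in range(len(index))]

t = [[1, 2], [3, 4]]
print(gather2d(t, 1, [[0, 0], [1, 0]]))  # [[1, 1], [4, 3]]
```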
Add sanity check for primitive inplace copy operator
#265 opened by kiya00 (2 comments)
benchmarking — create a notebook showing how to work with the single gpu benchmarks
#205 opened by mruberry (4 comments)
Add support for FP8E4M3 and FP8E5M2 dtypes
#254 opened by IvanYashchuk (10 comments)
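As background, the two FP8 formats trade precision for range differently: E4M3 (4 exponent bits, bias 7, 3 mantissa bits) tops out at 448 because only its all-ones mantissa pattern at the top exponent is reserved for NaN, while E5M2 (5 exponent bits, bias 15, 2 mantissa bits) keeps IEEE-style inf/NaN and tops out at 57344. A small sketch computing those limits (constants per the OCP FP8 formats; background only, not Thunder code):

```python
def fp8_max(exp_bits, man_bits, bias, reserve_top_exponent):
    # Largest finite value of an FP8 format. If the top exponent code is
    # reserved for inf/NaN (IEEE-style, as in E5M2), the max usable
    # exponent code is one lower; E4M3 instead gives up only the
    # all-ones mantissa pattern (its single NaN encoding).
    top = (1 << exp_bits) - 1 - (1 if reserve_top_exponent else 0)
    man_max = (1 << man_bits) - 1 - (0 if reserve_top_exponent else 1)
    return 2.0 ** (top - bias) * (1 + man_max / (1 << man_bits))

print(fp8_max(4, 3, 7, reserve_top_exponent=False))  # 448.0   (E4M3)
print(fp8_max(5, 2, 15, reserve_top_exponent=True))  # 57344.0 (E5M2)
```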
If saved_for_backward returns NumberProxy, the value is taken from compile time, not runtime
#231 opened by kiya00 (0 comments)
Benchmarking suite that runs scripts
#224 opened by riccardofelluga (0 comments)
Enable xfailed tests from test_apex_executor.py
#220 opened by IvanYashchuk (0 comments)
optimizer: jitting the optimizer step
#204 opened by kshitij12345 (0 comments)
Support non_blocking in Tensor.to
#197 opened by kshitij12345 (3 comments)
Mixtral 8x7B network support
#194 opened by riccardofelluga (0 comments)
Support `torch.nonzero`
#195 opened by riccardofelluga (0 comments)