Issues
Update nvFuser number math to be in float
#2583 opened by mruberry - 14
Example of Gather/Scatter from CrossEntropyLoss
#2556 opened by kevinstephano - 1
warp reduction for complex numbers
#2526 opened by liqiangxl - 1
Welford for complex numbers
#2527 opened by liqiangxl - 2
remove pytorch core code-base from nvfuser repo
#2388 opened by jjsjann123 - 1
codegen error: RuntimeError: producer->getMemoryType() == MemoryType::Global || producer->getMemoryType() == MemoryType::Shared INTERNAL ASSERT FAILED
#2559 opened by jjsjann123 - 1
Indexing failure
#2560 opened by naoyam - 0
Compilation failure with real()/imag() on complex64
#2564 opened by jacobhinkle - 0
ValReplacementMutator fails to mutate an expression even when its output is mutated
#2554 opened by naoyam - 6
broadcast_in_dim: The size of contiguity must equal to the number of non-broadcasting IterDomains
#2549 opened by IvanYashchuk - 0
Failed test cases on H100, node cl1-2547
#2550 opened by liqiangxl - 2
Expose Normal and Uniform nvFuser functions through the Python Frontend
#2386 opened by kevinstephano - 0
`replay_swizzle` argument should come after `error_on_failure`
#2531 opened by naoyam - 0
Improve shared memory reuse
#2529 opened by naoyam - 0
cache var used by each iteration in grid persistent kernel, e.g. weight in layer norm backward
#2525 opened by liqiangxl - 0
Segfault when using addcmul in the frontend
#2506 opened by jacobhinkle - 0
disable welford translate if the original scheduler before translate is not persistent
#2508 opened by liqiangxl - 0
`var_mean` fails when reduction occurs across all possible tensor dimensions
#2486 opened by kevinstephano - 0
Add complex specific support for `sign` operation
#2492 opened by kevinstephano - 0
Persistent buffer with L2 cache
#2469 opened by naoyam - 0
Persistent buffer with Local and Shared
#2468 opened by naoyam - 0
Heuristics tuning of outer grid persistence with iteration domain sizes that are not evenly distributed
#2467 opened by naoyam - 0
Limited vectorization with mixed input types
#2466 opened by naoyam - 0
Outer grid reduction kernels should be optimized as outer grid welford kernels are
#2465 opened by naoyam - 0
Move vectorized domains innermost always
#2464 opened by naoyam - 1
thread predication error when dim0 of a tensor is parallelized by tidy, the temp result is written to gmem and reloaded, then parallelized by tidx
#2458 opened by liqiangxl - 0
`views` in HF-Bart Self-Attention
#2456 opened by kevinstephano - 0
Feature request: iota prim
#2419 opened by mruberry - 0
Python where op floatxfloat promotes to float64
#2380 opened by mruberry - 0
squeeze prim should accept a sequence of dimensions to squeeze, not just one
#2421 opened by mruberry - 0
TypeError when defining complex constants
#2425 opened by jacobhinkle - 1
Extend Tensor Dimension support from 8 to 32
#2424 opened by kevinstephano - 0
Feature request: slice prim
#2396 opened by mruberry - 0
reload manually modified cuda file from python script
#2410 opened by liqiangxl - 1