csarofeen/pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

C++ · NOASSERTION

Issues

Update nvFuser number math to be in float
#2583 opened a year ago by mruberry
0
Example of Gather/Scatter from CrossEntropyLoss
#2556 opened 2 years ago by kevinstephano
14
`full` python API is broken and doesn't print a proper definition
#2502 opened a year ago by kevinstephano
1
warp reduction for complex numbers
#2526 opened a year ago by liqiangxl
0
Welford for complex numbers
#2527 opened a year ago by liqiangxl
1
Get Segmentation fault when I compile the following FusionDefinition
#2582 opened a year ago by ftxj
2
remove pytorch core code-base from nvfuser repo
#2388 opened a year ago by jjsjann123
5
detect parallel pattern error and provide a more helpful message
#2566 opened 2 years ago by liqiangxl
1
codegen error: RuntimeError: producer->getMemoryType() == MemoryType::Global || producer->getMemoryType() == MemoryType::Shared INTERNAL ASSERT FAILED
#2559 opened 2 years ago by jjsjann123
3
Indexing failure
#2560 opened 2 years ago by naoyam
1
Compilation failure with real()/imag() on complex64
#2564 opened 2 years ago by jacobhinkle
0
Take internal buffer size into account to decide on indexing type size
#2558 opened 2 years ago by mmigdal-nv
0
ValReplacementMutator fails to mutate an expression even when its output is mutated
#2554 opened 2 years ago by naoyam
1
broadcast_in_dim: The size of contiguity must equal to the number of non-broadcasting IterDomains
#2549 opened 2 years ago by IvanYashchuk
6
Failed test cases on H100, node cl1-2547
#2550 opened 2 years ago by liqiangxl
0
Expose Normal and Uniform nvFuser functions through the Python Frontend
#2386 opened 2 years ago by kevinstephano
2
`replay_swizzle` argument should come after `error_on_failure`
#2531 opened 2 years ago by naoyam
0
Improve shared memory reuse
#2529 opened 2 years ago by naoyam
0
cache var used by each iteration in grid persistent kernel, e.g. weight in layer norm backward
#2525 opened 2 years ago by liqiangxl
0
nearbyint fails with NVRTC compile error for integer inputs
#2524 opened 2 years ago by IvanYashchuk
0
Segfault when using addcmul in the frontend
#2506 opened 2 years ago by jacobhinkle
0
disable welford translate if the original scheduler before translate is not persistent
#2508 opened 2 years ago by liqiangxl
0
Return a copy for integer data types with 'trunc' operation.
#2499 opened 2 years ago by rdspring1
0
`var_mean` fails when reduction occurs across all possible tensor dimensions
#2486 opened 2 years ago by kevinstephano
2
Add complex specific support for `sign` operation
#2492 opened 2 years ago by kevinstephano
0
Initialization of reduction output may need to be predicated
#2487 opened 2 years ago by naoyam
0
lower_double_buffer.cpp is generating wrong sync for cp.async
#2463 opened 2 years ago by zasdfgbnm
0
Persistent buffer with L2 cache
#2469 opened 2 years ago by naoyam
0
Persistent buffer with Local and Shared
#2468 opened 2 years ago by naoyam
0
Heuristics tuning of outer grid persistence with iteration domain sizes that are not evenly distributed
#2467 opened 2 years ago by naoyam
0
Limited vectorization with mixed input types
#2466 opened 2 years ago by naoyam
0
Outer grid reduction kernels should be optimized as outer grid welford kernels are
#2465 opened 2 years ago by naoyam
0
Move vectorized domains innermost always
#2464 opened 2 years ago by naoyam
0
thread predication error when dim0 of a tensor is parallelized by tidy, the temp result is written to gmem and reloaded, then parallelized by tidx
#2458 opened 2 years ago by liqiangxl
1
Thread predicate map should be cleared if a RAW sync is inserted
#2459 opened 2 years ago by naoyam
0
`views` in HF-Bart Self-Attention
#2456 opened 2 years ago by kevinstephano
2
Feature request: iota prim
#2419 opened 2 years ago by mruberry
0
Runtime error when the shape of `index` tensor is [1] in `index_select`
#2442 opened 2 years ago by ftxj
0
Runtime error when using multiple fusion output of `native_dropout` op
#2440 opened 2 years ago by ftxj
0
Python where op floatxfloat promotes to float64
#2380 opened 2 years ago by mruberry
7
squeeze prim should accept a sequence of dimensions to squeeze, not just one
#2421 opened 2 years ago by mruberry
0
TypeError when defining complex constants
#2425 opened 2 years ago by jacobhinkle
0
Extend Tensor Dimension support from 8 to 32
#2424 opened 2 years ago by kevinstephano
1
Feature request: slice prim
#2396 opened 2 years ago by mruberry
0
reload manually modified cuda file from python script
#2410 opened 2 years ago by liqiangxl
0
Compile error in `where(x, a, b)` with single precision `a` or `b`
#2403 opened 2 years ago by jacobhinkle
1
Cuda Kernel and Scheduled IR print functions from FusionDefinition
#2387 opened 2 years ago by kevinstephano
3
python view (reshape) op doesn't support zero dim tensors
#2383 opened 2 years ago by mruberry
1
combined inner outer reduction used in layer norm backward
#2399 opened 2 years ago by liqiangxl
0
Double counting of a tensor size in the calculation of projected buffers
#2381 opened 2 years ago by naoyam
0