Issues
- Try running inference on an ARM CPU (#7185, opened by duncantech, 3 comments)
- Incomplete Checkpoints for Non-Sharded Parameters During SPMD Training in PyTorch XLA (#7215, opened by huzama, 5 comments)
- How do I know which PyTorch parameter corresponds to which parameter in the HLO IR? (#7191, opened by yao-jz, 1 comment)
- Create a glossary (#7181, opened by duncantech, 2 comments)
- [torch-xla 2.1] When functionalization is on, there is no aliasing for gradients when using gradient accumulation (#7174, opened by jeffhataws, 1 comment)
- Try running the ResNet example on GPU (#7182, opened by duncantech, 1 comment)
- Add a table on hardware compatibility (#7186, opened by duncantech, 1 comment)
- Test export of HLO instructions (#7188, opened by duncantech, 5 comments)
- Try using the CPU PJRT plugin (#7184, opened by duncantech, 0 comments)
- [RFC] PR Cherry-Picking Process After a Release Branch Cut (#7203, opened by lsy323, 4 comments)
- Distributed SPMD training with multiple compilations (#7196, opened by mars1248, 13 comments)
- Fatal error when training on TPU litepod16 (#7138, opened by tengomucho, 4 comments)
- multi_tensor_sgd triggers extra XLA execution (#7051, opened by shenh10, 3 comments)
- Run and suggest improvements for GPU setup (#7178, opened by duncantech, 2 comments)
- RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED when using torch.repeat (#7197, opened by ajayvohra2005, 1 comment)
- Update diagrams to work with dark mode (#7187, opened by duncantech, 0 comments)
- Select a model to train and run on TPUs (#7190, opened by duncantech, 2 comments)
- Improve the examples README (#7179, opened by duncantech, 1 comment)
- Add an example for training a small LLM (#7189, opened by duncantech, 0 comments)
- Create a distributed and a single-device example (#7183, opened by duncantech, 1 comment)
- Adding a new arg to a PyTorch op (#7180, opened by davidberard98, 7 comments)
- Persistent cache does not recompile when `XLA_IR_DEBUG` and `XLA_HLO_DEBUG` are changed (#7169, opened by JackCaoG, 2 comments)
- General PJRT device support for torchbench (#7167, opened by lingzhi98, 3 comments)
- A large number of tensors (>8000) in the graph triggers an SPMD sharding error (#7161, opened by mars1248, 1 comment)
- Problem with mesh shape in HybridMesh on TPU (#7102, opened by manh3152924, 11 comments)
- [Feature] Add cp311 and cp312 support for the XLA device (#7100, opened by Mon-ius, 4 comments)
- torch.matmul output buffer dtype is not respected when the output dtype differs from the input dtype (#7160, opened by HahTK, 8 comments)
- Mismatch between XLA tensor and PyTorch native tensor results for `torch.matmul` in FP16 precision on NVIDIA GPU (#7077, opened by lausannel, 4 comments)
- Core dump on TPU when using transformers' generate (#7122, opened by tengomucho, 4 comments)
- Distributed checkpointing saves empty files with 2D SPMD (#7118, opened by huzama, 9 comments)
- Cannot import _XLAC (#7070, opened by DarkenStar, 6 comments)
- `upsample_bilinear2d` HLO returns an unexpected data type (#7095, opened by ysiraichi, 2 comments)
- TPU freezes on loss.backward() within the same epoch (#7101, opened by axhero7, 3 comments)
- DDP Hangs on TPU v3-8 (#7109, opened by vivekjoshy, 0 comments)
- `is_master_ordinal` breaks dataloader (#7074, opened by Jiayi-Pan, 1 comment)
- torchdynamo + XLA crash (#7053, opened by pritamdamania87, 7 comments)
- SPMD: is expert parallelism supported? (#7049, opened by mars1248, 2 comments)
- Export nn.Module.forward with kwargs to StableHLO (#7056, opened by johnmatter, 1 comment)
- The behavior of `torch.einsum` significantly differs between TPU and other devices (#7050, opened by jqhoogland, 4 comments)