Issues
- Try running inference on an ARM CPU (#7185, opened by duncantech, 3 comments)
- Incomplete Checkpoints for Non-Sharded Parameters During SPMD Training in PyTorch XLA (#7215, opened by huzama, 5 comments)
- How do I know which PyTorch parameter corresponds to which parameter in the HLO IR? (#7191, opened by yao-jz, 1 comment)
- Create a glossary (#7181, opened by duncantech, 2 comments)
- [torch-xla 2.1] When functionalization is on, there is no aliasing for gradients when using gradient accumulation (#7174, opened by jeffhataws, 1 comment)
- Try running the ResNet example on GPU (#7182, opened by duncantech, 1 comment)
- Add a table on hardware compatibility (#7186, opened by duncantech, 1 comment)
- Test export of HLO instructions (#7188, opened by duncantech, 5 comments)
- Try using the CPU PJRT plugin (#7184, opened by duncantech, 0 comments)
- [RFC] PR Cherry-Picking Process After a Release Branch Cut (#7203, opened by lsy323, 4 comments)
- Distributed SPMD training with multiple compilations (#7196, opened by mars1248, 13 comments)
- Fatal error when training on TPU litepod16 (#7138, opened by tengomucho, 4 comments)
- multi_tensor_sgd triggers extra XLA execution (#7051, opened by shenh10, 3 comments)
- Run and suggest improvements for GPU setup (#7178, opened by duncantech, 2 comments)
- RuntimeError: isDifferentiableType(variable.scalar_type()) INTERNAL ASSERT FAILED when using torch.repeat (#7197, opened by ajayvohra2005, 1 comment)
- Update diagrams to work with dark mode (#7187, opened by duncantech, 0 comments)
- Select a model to train and run on TPUs (#7190, opened by duncantech, 2 comments)
- Improve the examples README (#7179, opened by duncantech, 1 comment)
- Add an example for training a small LLM (#7189, opened by duncantech, 0 comments)
- Create a distributed and a single-device example (#7183, opened by duncantech, 1 comment)
- Adding a new arg to a PyTorch op (#7180, opened by davidberard98, 7 comments)
- Persistent cache does not recompile when `XLA_IR_DEBUG` and `XLA_HLO_DEBUG` are changed (#7169, opened by JackCaoG, 2 comments)
- General PJRT device support for torchbench (#7167, opened by lingzhi98, 3 comments)
- A large number of tensors (>8000) in the graph triggers an SPMD sharding error (#7161, opened by mars1248, 1 comment)
- Problem with mesh shape in HybridMesh on TPU (#7102, opened by manh3152924, 11 comments)
- [Feature] Add cp311 and cp312 support for the XLA device (#7100, opened by Mon-ius, 4 comments)
- torch.matmul output buffer dtype is not respected when the output dtype differs from the input dtype (#7160, opened by HahTK, 8 comments)
- Mismatch between XLA tensor and PyTorch native tensor results for `torch.matmul` in FP16 precision on NVIDIA GPU (#7077, opened by lausannel, 4 comments)
- Core dump on TPU when using transformers' generate (#7122, opened by tengomucho, 4 comments)
- Distributed checkpointing saves empty files with 2D SPMD (#7118, opened by huzama, 9 comments)
- Cannot import _XLAC (#7070, opened by DarkenStar, 6 comments)
- `upsample_bilinear2d` HLO returns an unexpected data type (#7095, opened by ysiraichi, 2 comments)
- TPU freezes on loss.backward() within the same epoch (#7101, opened by axhero7, 3 comments)
- DDP Hangs on TPU v3-8 (#7109, opened by vivekjoshy, 0 comments)
- `is_master_ordinal` breaks dataloader (#7074, opened by Jiayi-Pan, 1 comment)
- torchdynamo + XLA crash (#7053, opened by pritamdamania87, 7 comments)
- SPMD: is expert parallelism supported? (#7049, opened by mars1248, 2 comments)
- Export nn.Module.forward with kwargs to StableHLO (#7056, opened by johnmatter, 1 comment)
- The behavior of `torch.einsum` significantly differs between TPU and other devices (#7050, opened by jqhoogland, 4 comments)