XTPU

Boost AI and LLM application dev on TPU.

Overview

🚧 Building in 2024… 🚧


1. ⚙️ New VM instance and user

# Run on Cloud Shell Terminal
curl -fsSL bit.ly/new-gcp-vm-instance | sh
## Here, USER=m0nius ZONE=asia-east1-b TEMPLATE=xvm
curl -fsSL bit.ly/new-gcp-vm-instance | sh -s -- m0nius asia-east1-b xvm
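
The shortlink script isn't shown here, but it presumably wraps gcloud. A hand-rolled sketch of the equivalent call, reusing the example names above (the instance name is an assumption):

# Hypothetical equivalent of the shortlink script, assuming it wraps gcloud
gcloud compute instances create xvm-1 \
  --zone=asia-east1-b \
  --source-instance-template=xvm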

# Generate a new SSH key
curl -fsSL bit.ly/ssh-vm-gen | sh
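
If you prefer to do this by hand, the usual OS Login flow looks like the sketch below (key path and comment are assumptions):

# Generate an ed25519 key pair and register the public key via OS Login
ssh-keygen -t ed25519 -f ~/.ssh/gcp_vm -C m0nius
gcloud compute os-login ssh-keys add --key-file ~/.ssh/gcp_vm.pub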

2. 💽 Attach a VM disk

# Run on Cloud Shell Terminal
curl -fsSL bit.ly/attach-gcp-vm-disk  | sh
## Here, DISK=disk-1 ZONE=asia-east1-b VM_NAME=xvm-1
curl -fsSL bit.ly/attach-gcp-vm-disk | sh -s -- disk-1 asia-east1-b xvm-1
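
The script presumably combines gcloud's attach-disk with an in-VM mount; a manual sketch with the same example names (the device path is an assumption, check lsblk first):

# Attach the disk, then format (first time only!) and mount it inside the VM
gcloud compute instances attach-disk xvm-1 --disk=disk-1 --zone=asia-east1-b
sudo mkfs.ext4 -F /dev/sdb   # destroys existing data; skip if already formatted
sudo mkdir -p /mnt/disk-1
sudo mount /dev/sdb /mnt/disk-1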

3. ⛓ TPUv2, TPUv3, TPUv4, TPUv5 nodes

# Clean up all queued TPU nodes
curl -fsSL bit.ly/clean-tpu-nodes | sh -s -- proj_name asia-east1-b
# Run on Cloud Shell Terminal, TPUv2
curl -fsSL bit.ly/new-tpu-v2-node | sh -s -- -y
# Run on Cloud Shell Terminal, queued TPUv4
curl -fsSL bit.ly/new-tpu-v4-queue | sh -s -- -y
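
These likely wrap the TPU VM and queued-resources APIs. Illustrative manual equivalents (node names, zones, accelerator types, and runtime versions are all assumptions):

# On-demand TPUv2 node
gcloud compute tpus tpu-vm create tpu-v2-node \
  --zone=us-central1-f --accelerator-type=v2-8 --version=tpu-ubuntu2204-base
# Queued TPUv4 request
gcloud compute tpus queued-resources create tpu-v4-queue \
  --node-id=tpu-v4-node --zone=us-central2-b \
  --accelerator-type=v4-8 --runtime-version=tpu-ubuntu2204-base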

4. 🫧 LLM Training

4.1 Miniconda Environment

TPU

curl -fsSL bit.ly/tpu-torch-xla | sh
# OR
curl -fsSL bit.ly/tpu-rootless-xla | sh
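
A minimal manual version of the TPU environment, following the upstream PyTorch/XLA install instructions (env name and Python version are assumptions):

# Conda env with PyTorch/XLA built for TPU
conda create -y -n xla python=3.10
conda activate xla
pip install torch 'torch_xla[tpu]' -f https://storage.googleapis.com/libtpu-releases/index.html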

CUDA

curl -fsSL bit.ly/cuda-torch-xla | sh
# OR
curl -fsSL bit.ly/cuda-rootless-xla | sh
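
For the CUDA path, the core of the setup is a CUDA build of PyTorch (wheel index and CUDA version are assumptions):

# Conda env with a CUDA 12.1 build of PyTorch
conda create -y -n cuda python=3.10
conda activate cuda
pip install torch --index-url https://download.pytorch.org/whl/cu121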

4.2 Model Training

# Run on Cloud Shell Terminal, TPUv2
curl -fsSL bit.ly/new-LLM-TPUv2-train | sh -s -- -y
# Run on Cloud Shell Terminal, queued TPUv4
curl -fsSL bit.ly/new-LLM-TPUv4-train | sh -s -- -y
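
Whatever the script launches internally, a PyTorch/XLA training run on a TPU VM typically selects the runtime via PJRT; a minimal hedged sketch (the entry point name is a placeholder):

# Select the TPU runtime, then start training
export PJRT_DEVICE=TPU
python train_llm.py   # placeholder; the real entry point lives inside the script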

5. 🥋 Optimize HW

# Replace the VM's OS with Alpine Linux
curl -fsSL bit.ly/os-LLM-Alpine-acc | sh -s -- 3.19
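
After the swap, confirming the release from inside the VM is a one-liner:

# Verify the running release
cat /etc/os-release   # should report Alpine Linux v3.19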

6. 🪢 Dataset Mount

# Mount a remote dataset
curl -fsSL bit.ly/remote-LLM-dataset-mount | sh -s -- dataset
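
One plausible implementation, assuming the dataset lives in a GCS bucket, is a gcsfuse mount (bucket name and mount point are placeholders):

# FUSE-mount a dataset bucket onto a local directory
mkdir -p ~/dataset
gcsfuse my-dataset-bucket ~/dataset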

7. API Create

curl -fsSL bit.ly/new-gcp-api | sh -s -- project_name api_num api_target
curl -fsSL bit.ly/new-gcp-dns | sh -s -- cf_token cf_domain cf_zone
curl -fsSL bit.ly/new-gcp-sb | sh -s -- cf_token cf_domain cf_zone
curl -fsSL bit.ly/new-gcp-wg | sh -s
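
The GCP half of this presumably reduces to enabling services, and the Cloudflare half to its v4 DNS API. An illustrative sketch (service name, record type, and IP are assumptions; 203.0.113.1 is a documentation address):

# Enable a GCP API for the project
gcloud services enable aiplatform.googleapis.com --project=project_name
# Create a DNS record through the Cloudflare v4 API
curl -X POST "https://api.cloudflare.com/client/v4/zones/$cf_zone/dns_records" \
  -H "Authorization: Bearer $cf_token" -H "Content-Type: application/json" \
  --data '{"type":"A","name":"api.'"$cf_domain"'","content":"203.0.113.1","proxied":true}'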

8. API Test

curl -fsSL bit.ly/vertex-test | sh -s -- project_name model_name
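
A direct way to smoke-test a Vertex AI model is a raw REST call with an access token (region and request payload are assumptions):

# Call the Vertex AI predict endpoint directly
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://us-central1-aiplatform.googleapis.com/v1/projects/project_name/locations/us-central1/publishers/google/models/model_name:predict" \
  -d '{"instances": [{"prompt": "hello"}]}'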

Reference

Basic

  1. https://pytorch.org/blog/scaling-pytorch-models-on-cloud-tpus-with-fsdp
  2. https://huggingface.co/blog/accelerate-large-models
  3. https://pytorch.org/blog/path-achieve-low-inference-latency

SPMD

FSDP

  1. https://github.com/ronghanghu/vit_10b_fsdp_example
  2. https://pytorch.org/blog/large-scale-training-hugging-face
  3. https://github.com/pytorch/xla/blob/master/docs/fsdp.md

Finetune

  1. https://huggingface.co/blog/gemma-peft