TWCC Container Service Tutorials

The following is a series of tutorials on creating GPGPU containers on TWCC and running compute jobs in them:

Preparation

Prerequisites shared by all tutorials.

Tutorial 1 -- MNIST (handwritten digit dataset)

AI training on the MNIST handwritten digit images.

Tutorial 2 -- GPU Burn Testing

Use GPU Burn to stress-test the GPU and check that it works normally.

Tutorial 3 -- InceptionV3 Training

Mount an S3 bucket in the container, upload the dataset to the bucket with an S3 tool, and run image-recognition training; the results are saved back to the S3 bucket for external access.

Tutorial 4 -- InceptionV3 Inference

Image recognition with InceptionV3.

Preparation

Step 1. Sign up and Sign in

Apply for an iService account and a project that is authorized to use TWCC resources.
Sign in to TWCC.

Step 2. Create Container

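Create a GPGPU container by following the user guide, and choose TensorFlow as the Solution.
For Image, choose a version that supports Python 3; for Hardware, a single-GPU configuration is sufficient.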
Mount an S3 bucket (only for Tutorial 3)

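Under Storage, mount the S3 bucket that you will upload the dataset to. Note: after selecting an existing S3 bucket under Storage on the left (①), click the plus sign on the right to complete the mount (②); the result is shown below (③).
After creating the container, wait until its status shows Ready; the container has then been created successfully.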

Step 3. Download S3 Tool (only for Tutorial 3)

Download an S3 tool (an S3 storage client) such as S3 Browser (for Windows) or Cyberduck (for macOS).
Launch the S3 tool and connect using the URL, Access Key, and Secret Key provided on the TWCC S3 Storage Overview page.

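Using S3 Browser as an example:
- Open it and click Account → Add new account in the upper-left corner.
- Enter an account name (①), select S3 Compatible Storage (②), and fill in the URL (③), Access Key (④), and Secret Key (⑤).
- If the account was created successfully, the left pane lists all buckets under the same project.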
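If you prefer a command line to a GUI, the same connection can be sketched with the AWS CLI. This is a minimal sketch that assumes the AWS CLI is installed and that TWCC S3 accepts standard S3 API calls (which the S3 Compatible Storage option above suggests). `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the AWS CLI's standard credential variables; `TWCC_S3_ENDPOINT`, `twcc_s3`, and the `DRY_RUN` switch are hypothetical names.

```bash
# Minimal sketch (assumption): use the AWS CLI as an alternative S3 client.
# Values come from the TWCC S3 Storage Overview page:
#   export TWCC_S3_ENDPOINT="<URL>"             # hypothetical variable name
#   export AWS_ACCESS_KEY_ID="<Access Key>"     # standard AWS CLI variable
#   export AWS_SECRET_ACCESS_KEY="<Secret Key>" # standard AWS CLI variable

# Run an "aws s3 ..." subcommand against the TWCC endpoint.
# DRY_RUN=1 (hypothetical switch) prints the command instead of running it.
twcc_s3() {
  if [ -z "${TWCC_S3_ENDPOINT:-}" ]; then
    echo "TWCC_S3_ENDPOINT is not set" >&2
    return 2
  fi
  local cmd=(aws --endpoint-url "$TWCC_S3_ENDPOINT" s3 "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage: list the buckets visible to these keys
# (this should match the bucket list in S3 Browser's left pane).
#   twcc_s3 ls
```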

Step 4. Clone the Git Repository

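From the container details page, open a Jupyter terminal to enter the container (or connect via SSH).

To use a Jupyter terminal, click New (①) on the right, then Terminal (②). To connect via SSH, log in with your iService host account and password.
Run the following command to clone the training code from NCHC's GitHub into the container.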
git clone https://github.com/TW-NCHC/AI-Services.git
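If the steps are re-run in the same container, git refuses to clone into an existing non-empty directory. The sketch below guards the clone and then checks for the Tutorial_One, Tutorial_Two, and Tutorial_Three directories this guide refers to; it assumes the repository still uses those names, and `clone_tutorials` is a hypothetical helper.

```bash
# Hypothetical helper: clone the tutorial repo once, then check the expected directories.
REPO_URL="https://github.com/TW-NCHC/AI-Services.git"

clone_tutorials() {
  local dir="${1:-AI-Services}" t missing=0
  if [ -d "$dir/.git" ]; then
    echo "already cloned: $dir"
  else
    git clone "$REPO_URL" "$dir" || return 1
  fi
  # Directories used by this guide (Tutorial 4 also runs from Tutorial_Three).
  for t in Tutorial_One Tutorial_Two Tutorial_Three; do
    if [ ! -d "$dir/$t" ]; then
      echo "missing: $dir/$t" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (from the home directory inside the container):
#   clone_tutorials && cd AI-Services
```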

Tutorial 1 -- MNIST

Create a GPGPU container on TWCC and use Jupyter Notebook to run AI training on MNIST (the handwritten digit dataset).

Step 1. Start & Run Jupyter Notebook

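Return to the container details page and connect to Jupyter Notebook.
Open AI-Services/Tutorial_One, click New on the right, and select Python 3 to open a notebook.
Once the notebook is open, copy the code from Keras_MNIST.txt in that directory into the notebook.
After pasting the code, click Run to start training.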

The training results are shown below the code.
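As an alternative to pasting the code into a notebook, the same file can likely be run as a plain Python script from the Jupyter terminal, since Python ignores the .txt extension. This assumes Keras_MNIST.txt contains ordinary Python without notebook-only syntax (such as % magics); `run_logged` is a hypothetical helper that keeps a copy of the training output in a log file.

```bash
# Hypothetical helper: run a Python source file and keep a log of its output.
# Returns the Python process's exit status, not tee's.
run_logged() {
  local script="$1" log="$2"
  # -u: unbuffered output, so progress appears immediately when piped through tee.
  python3 -u "$script" 2>&1 | tee "$log"
  return "${PIPESTATUS[0]}"
}

# Usage (assumes the repository was cloned into the current directory):
#   run_logged AI-Services/Tutorial_One/Keras_MNIST.txt mnist_train.log
```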

Tutorial 2 -- GPU Burn Testing

Step 1. Run GPU Burn

Run the following command to enter the Tutorial_Two directory.
cd AI-Services/Tutorial_Two
Run the following command; it downloads the GPU Burn program and starts the GPU test.
bash gpu_testing.sh
The test is complete when the message shown in the screenshot appears.
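Besides watching for that message, the run can be saved to a log and scanned for a per-GPU verdict. This sketch assumes gpu_testing.sh runs the open-source gpu-burn tool, whose final summary prints one `GPU <n>: OK` or `GPU <n>: FAULTY` line per GPU; if the script prints something different, adjust the patterns. `gpu_burn_verdict` is a hypothetical helper.

```bash
# Hypothetical helper: summarize a saved GPU Burn log.
# Assumes per-GPU summary lines of the form "GPU <n>: OK" or "GPU <n>: FAULTY".
gpu_burn_verdict() {
  local log="$1" ok faulty
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps the count.
  ok=$(grep -cE '^[[:space:]]*GPU [0-9]+: OK[[:space:]]*$' "$log" || true)
  faulty=$(grep -cE '^[[:space:]]*GPU [0-9]+: FAULTY[[:space:]]*$' "$log" || true)
  if [ "$faulty" -gt 0 ]; then
    echo "FAIL: $faulty GPU(s) reported FAULTY"
    return 1
  elif [ "$ok" -gt 0 ]; then
    echo "PASS: $ok GPU(s) reported OK"
    return 0
  else
    echo "UNKNOWN: no per-GPU summary found"
    return 2
  fi
}

# Usage:
#   bash gpu_testing.sh 2>&1 | tee gpu_burn.log
#   gpu_burn_verdict gpu_burn.log
```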

Tutorial 3 -- InceptionV3 Training

Step 1. Download the CIFAR-10 Dataset

Download the dataset (CIFAR-10 python version) from the CIFAR-10 page (https://www.cs.toronto.edu/~kriz/cifar.html).
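On a Linux machine, the download can also be scripted and checked before uploading. The URL and MD5 checksum below are the ones listed for the python version on the CIFAR-10 page; compare them with that page before relying on them. The helper names are hypothetical, and `md5sum` assumes GNU coreutils (macOS ships `md5` instead).

```bash
# Location and MD5 of "CIFAR-10 python version" (check against the CIFAR-10 page).
CIFAR10_URL="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
CIFAR10_MD5="c58f30108f718f92721af3b95e74349a"

# Hypothetical helper: compare a file's MD5 with an expected value.
verify_md5() {
  local file="$1" expected="$2" actual
  if [ ! -r "$file" ]; then
    echo "missing: $file" >&2
    return 2
  fi
  # Hash from stdin so unusual file names cannot change md5sum's output format.
  actual=$(md5sum < "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file ($actual)" >&2
    return 1
  fi
}

# Hypothetical helper: download into a directory (default: current one);
# wget -c resumes a partial download, -P sets the target directory.
download_cifar10() {
  local dest="${1:-.}"
  wget -c -P "$dest" "$CIFAR10_URL" &&
    verify_md5 "$dest/cifar-10-python.tar.gz" "$CIFAR10_MD5"
}

# Usage:
#   download_cifar10 ~/datasets
```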

Step 2. Upload the Dataset to the S3 Bucket with the S3 Tool

Launch the S3 tool and upload the data file (cifar-10-python.tar.gz).
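To confirm that the upload succeeded, check whether the used space of the bucket in the TWCC S3 bucket list has increased accordingly, or search for the file with Search Metadata.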
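A command-line version of the upload and check can be sketched with the AWS CLI, assuming TWCC S3 accepts standard S3 API calls and the CLI has the Access Key and Secret Key from the S3 Storage Overview page (for example via `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`). Instead of watching the bucket's used space, it compares the object size reported by `aws s3 ls` with the local file size. `TWCC_S3_ENDPOINT`, `s3_ls_size`, and `upload_and_check` are hypothetical names.

```bash
# Hypothetical helper: print the size column of the first object line of
# `aws s3 ls` output ("<date> <time> <size> <key>"); "PRE <prefix>/" lines are skipped.
s3_ls_size() {
  awk 'NF >= 4 { print $3; exit }'
}

# Hypothetical helper: upload a file, then compare remote and local sizes.
upload_and_check() {
  local file="$1" bucket="$2" key local_size remote_size
  if [ ! -r "$file" ] || [ -z "${TWCC_S3_ENDPOINT:-}" ]; then
    echo "need a readable file and TWCC_S3_ENDPOINT" >&2
    return 2
  fi
  key=$(basename "$file")
  aws --endpoint-url "$TWCC_S3_ENDPOINT" s3 cp "$file" "s3://$bucket/$key" || return 1
  local_size=$(wc -c < "$file" | tr -d '[:space:]')
  # `aws s3 ls` matches by prefix; the first listed object is taken.
  remote_size=$(aws --endpoint-url "$TWCC_S3_ENDPOINT" s3 ls "s3://$bucket/$key" | s3_ls_size)
  if [ "$local_size" = "$remote_size" ]; then
    echo "uploaded $key ($local_size bytes)"
  else
    echo "size mismatch for $key: local=$local_size remote=${remote_size:-none}" >&2
    return 1
  fi
}

# Usage:
#   upload_and_check cifar-10-python.tar.gz <your_S3_bucket_name>
```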

Step 3. AI Training

3-1 Prepare the training program

Run the following command to enter the Tutorial_Three directory.
cd AI-Services/Tutorial_Three
Run the following command to stage the dataset on GPFS. The data is moved from the S3 bucket to the GPFS mount path and prepared for training.
bash V3_training.sh --path <your_S3_bucket_name>

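The terminal shows a message like the one in the screenshot, indicating that model training is about to start.
During training, you can monitor CPU/GPU, memory, and network usage on the MONITORING page.
The training results are stored in the weights folder of the S3 bucket (as shown in S3 Browser in the screenshot) for external access.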
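Besides the MONITORING page, GPU load can be checked from a second terminal in the container with nvidia-smi's query mode. The `--query-gpu`, `--format=csv,noheader,nounits`, and `-l` options are standard nvidia-smi flags; `summarize_gpu_csv` is a hypothetical formatter for its output.

```bash
# Hypothetical formatter: turn nvidia-smi CSV rows
# ("index, utilization.gpu, memory.used, memory.total") into readable lines.
summarize_gpu_csv() {
  local idx util used total
  # IFS=', ' treats ", " as a single separator between fields.
  while IFS=', ' read -r idx util used total || [ -n "$idx" ]; do
    [ -n "$total" ] || continue
    printf 'GPU %s: util %s%%, mem %s/%s MiB\n' "$idx" "$util" "$used" "$total"
  done
}

# Usage: print a snapshot every 5 seconds until Ctrl-C.
#   nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total \
#              --format=csv,noheader,nounits -l 5 | summarize_gpu_csv
```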

Step 4. Terminate Container

Delete the container from the TWCC container list. If you do not need to keep the files in the S3 bucket, delete them with the S3 tool.
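The files can also be removed from the command line. This sketch assumes the AWS CLI works against TWCC S3 with the keys from the S3 Storage Overview page, and uses the CLI's `--dryrun` option so that nothing is deleted until the listing has been reviewed. `TWCC_S3_ENDPOINT`, `CONFIRM`, and `remove_prefix` are hypothetical names.

```bash
# Hypothetical helper: delete everything under s3://<bucket>/<prefix>/.
# Without CONFIRM=yes it only shows what would be deleted (aws s3 rm --dryrun).
remove_prefix() {
  local bucket="${1:-}" prefix="${2:-}"
  prefix="${prefix%/}"
  # Refuse empty values so a typo cannot target the whole bucket.
  if [ -z "$bucket" ] || [ -z "$prefix" ]; then
    echo "usage: remove_prefix <bucket> <prefix>" >&2
    return 2
  fi
  if [ -z "${TWCC_S3_ENDPOINT:-}" ]; then
    echo "TWCC_S3_ENDPOINT is not set" >&2
    return 2
  fi
  local cmd=(aws --endpoint-url "$TWCC_S3_ENDPOINT" s3 rm "s3://$bucket/$prefix/" --recursive)
  if [ "${CONFIRM:-no}" = yes ]; then
    "${cmd[@]}"
  else
    "${cmd[@]}" --dryrun
  fi
}

# Usage: review first, then delete for real.
#   remove_prefix <your_S3_bucket_name> weights
#   CONFIRM=yes remove_prefix <your_S3_bucket_name> weights
```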

Tutorial 4 -- InceptionV3 Inference

Step 1. AI Inference

1-1 Prepare the inference program

Open a command prompt (cmd) and connect to the container via SSH:
ssh -p <container_port> -L 5000:127.0.0.1:5001 <computer_account>@<container_ip>

The command parameters can be found on the container details page: ① container_port, ② computer_account, ③ container_ip.
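In this command, `-L 5000:127.0.0.1:5001` forwards port 5000 on your own machine to 127.0.0.1:5001 as seen from inside the container, which is why the browser later opens localhost:5000; the command implies that the inference service listens on port 5001 in the container. For bash users (Linux, macOS, or WSL), a hypothetical wrapper that assembles the command from the three values on the details page might look like this; `twcc_tunnel`, `LOCAL_PORT`, `REMOTE_PORT`, and `DRY_RUN` are assumed names. From Windows cmd, type the original command directly.

```bash
# Hypothetical wrapper around the ssh command above.
# LOCAL_PORT / REMOTE_PORT default to the 5000 / 5001 used in this guide.
twcc_tunnel() {
  local port="${1:-}" account="${2:-}" ip="${3:-}"
  local local_port="${LOCAL_PORT:-5000}" remote_port="${REMOTE_PORT:-5001}"
  case "$port" in
    ''|*[!0-9]*) echo "container_port must be a number" >&2; return 2 ;;
  esac
  if [ -z "$account" ] || [ -z "$ip" ]; then
    echo "usage: twcc_tunnel <container_port> <computer_account> <container_ip>" >&2
    return 2
  fi
  # -L <local_port>:127.0.0.1:<remote_port> forwards a local port into the container.
  local cmd=(ssh -p "$port" -L "${local_port}:127.0.0.1:${remote_port}" "${account}@${ip}")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Usage:
#   twcc_tunnel <container_port> <computer_account> <container_ip>
```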
Run the following command to enter the Tutorial_Three directory.
cd AI-Services/Tutorial_Three
Run the following command to start the AI inference service.
bash V3_inference.sh
Open a browser and enter the following address to start using the AI inference service.
localhost:5000

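When the page shown in the screenshot appears, choose an image to classify and upload it.
The prediction results are displayed in the browser.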
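If the browser cannot reach localhost:5000, a first check is whether the tunnel's local port accepts connections at all. This sketch relies on bash's /dev/tcp pseudo-device (so it assumes bash on Linux, macOS, or WSL); `wait_for_port` is a hypothetical helper.

```bash
# Hypothetical helper: succeed once host:port accepts a TCP connection,
# retrying up to <tries> times, one second apart.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-10}" i
  for ((i = 1; i <= tries; i++)); do
    # Redirecting from /dev/tcp/<host>/<port> makes bash attempt a TCP connect.
    if (: <"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "$host:$port is accepting connections"
      return 0
    fi
    if [ "$i" -lt "$tries" ]; then
      sleep 1
    fi
  done
  echo "$host:$port is not reachable after $tries attempt(s)" >&2
  return 1
}

# Usage, on your own machine while the ssh session is connected:
#   wait_for_port 127.0.0.1 5000 && echo "open http://localhost:5000 in a browser"
```

If the port is open but the page still does not load, the inference service inside the container may not have started yet.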
