l294265421/alpaca-rlhf

How to run it? Need more details


And how do I install alpaca-rlhf?


  1. Download (clone) this repo.
  2. Enter the ./alpaca_rlhf directory.
  3. Run the step 1, step 2 and step 3 commands from the "Step by Step" section of the README.
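For step 1, the command used later in this thread can be rebuilt as an argument list, which makes it easy to print, tweak, or hand to subprocess. This is a minimal sketch: the flags and paths are copied from the user's invocation, while the helper name step1_command is hypothetical, /hf_model/llama-7b-hf must be replaced by your own checkpoint, and the run.sh wrapper is assumed to sit in the current directory as it does in that session. The step 2 and step 3 commands from the README are not reproduced here.

```python
import shlex
from typing import List

# Path and flags copied from the step-1 command in this thread (hypothetical helper).
STEP1_SCRIPT = "./alpaca_rlhf/deepspeed_chat/training/step1_supervised_finetuning/main.py"


def step1_command(model_path: str, num_gpus: int = 1) -> List[str]:
    """Build the argv for step 1 (supervised fine-tuning with LoRA) via run.sh."""
    if num_gpus < 1:
        # The DeepSpeed launcher refuses to start without at least one GPU.
        raise ValueError("num_gpus must be at least 1")
    return [
        "sh", "run.sh", "--num_gpus", str(num_gpus), STEP1_SCRIPT,
        "--sft_only_data_path", "MultiTurnAlpaca",
        "--data_output_path", "./rlhf-tmp/",
        "--model_name_or_path", model_path,
        "--per_device_train_batch_size", "2",
        "--per_device_eval_batch_size", "2",
        "--max_seq_len", "128",
        "--learning_rate", "3e-4",
        "--num_train_epochs", "1",
        "--gradient_accumulation_steps", "8",
        "--num_warmup_steps", "100",
        "--output_dir", "./rlhf/actor",
        "--lora_dim", "8",
        "--lora_module_name", "q_proj,k_proj",
        "--only_optimize_lora",
        "--deepspeed", "--zero_stage", "2",
    ]


if __name__ == "__main__":
    cmd = step1_command("/hf_model/llama-7b-hf")
    # Print a shell-safe command line; to launch it instead,
    # use subprocess.run(cmd, check=True) from the directory containing run.sh.
    print(shlex.join(cmd))
```

Keeping num_gpus as a parameter (default 1, matching a single RTX 3090) avoids hand-editing the long command line.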

(gh_alpaca-rlhf) amd00@asus00:~/llm_dev/alpaca-rlhf$
(gh_alpaca-rlhf) amd00@asus00:~/llm_dev/alpaca-rlhf$ sh run.sh --num_gpus 1 ./alpaca_rlhf/deepspeed_chat/training/step1_supervised_finetuning/main.py --sft_only_data_path MultiTurnAlpaca --data_output_path ./rlhf-tmp/ --model_name_or_path /hf_model/llama-7b-hf --per_device_train_batch_size 2 --per_device_eval_batch_size 2 --max_seq_len 128 --learning_rate 3e-4 --num_train_epochs 1 --gradient_accumulation_steps 8 --num_warmup_steps 100 --output_dir ./rlhf/actor --lora_dim 8 --lora_module_name q_proj,k_proj --only_optimize_lora --deepspeed --zero_stage 2
start 20230602162350--------------------------------------------------
[2023-06-02 16:23:51,869] [WARNING] [runner.py:191:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/amd00/anaconda3/envs/gh_alpaca-rlhf/bin/deepspeed:6 in <module> │
│ │
│ 3 from deepspeed.launcher.runner import main │
│ 4 │
│ 5 if __name__ == '__main__': │
│ ❱ 6 │ main() │
│ 7 │
│ │
│ /home/amd00/anaconda3/envs/gh_alpaca-rlhf/lib/python3.8/site-packages/deepspeed/launcher/runner. │
│ py:407 in main │
│ │
│ 404 │ │ resource_pool = {} │
│ 405 │ │ device_count = get_accelerator().device_count() │
│ 406 │ │ if device_count == 0: │
│ ❱ 407 │ │ │ raise RuntimeError("Unable to proceed, no GPU resources available") │
│ 408 │ │ resource_pool['localhost'] = device_count │
│ 409 │ │ args.master_addr = "127.0.0.1" │
│ 410 │ │ multi_node_exec = False │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Unable to proceed, no GPU resources available
20230602162352
(gh_alpaca-rlhf) amd00@asus00:~/llm_dev/alpaca-rlhf$ nvidia-smi
Fri Jun 2 16:24:04 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.43.04    Driver Version: 515.43.04    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   45C    P8    18W / 350W |    768MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1085      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A      1967      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A    259783      C   ...Speed-Chat/bin/python3.10      755MiB |
+-----------------------------------------------------------------------------+
(gh_alpaca-rlhf) amd00@asus00:~/llm_dev/alpaca-rlhf$

I have one RTX 3090, and I changed num_gpus to 1.
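The traceback shows the launcher's rule at runner.py:404-408: DeepSpeed asks its accelerator how many devices are visible and aborts when the answer is 0, so setting --num_gpus 1 does not get past it. The sketch below mirrors that check and prints what PyTorch in the active environment reports. It assumes DeepSpeed's default CUDA accelerator counts devices with torch.cuda.device_count(); the helper names are hypothetical. A plausible cause, not confirmed in this thread, is a CPU-only PyTorch build in the gh_alpaca-rlhf environment (torch.version.cuda is then None) or an empty CUDA_VISIBLE_DEVICES; nvidia-smi would still list the card in either case because it queries the driver directly.

```python
import importlib.util
import os
from typing import Dict


def launcher_resource_pool(device_count: int) -> Dict[str, int]:
    """Simplified mirror of the check in deepspeed/launcher/runner.py from the traceback."""
    if device_count == 0:
        raise RuntimeError("Unable to proceed, no GPU resources available")
    return {"localhost": device_count}


def torch_device_count() -> int:
    """GPUs visible to PyTorch; 0 when torch is not installed in this env."""
    # Assumption: DeepSpeed's default CUDA accelerator reports torch.cuda.device_count().
    if importlib.util.find_spec("torch") is None:
        return 0
    import torch
    return torch.cuda.device_count()


def torch_report() -> str:
    """One-line summary of the PyTorch build in the active environment."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    return (f"torch {torch.__version__}, CUDA build: {torch.version.cuda}, "
            f"cuda available: {torch.cuda.is_available()}")


if __name__ == "__main__":
    print(torch_report())
    # If this is set to an empty string, CUDA hides every GPU (nvidia-smi is unaffected).
    print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
    try:
        print("resource pool:", launcher_resource_pool(torch_device_count()))
    except RuntimeError as err:
        print("launcher would fail:", err)
```

Run it with the same interpreter the launcher uses (the one under /home/amd00/anaconda3/envs/gh_alpaca-rlhf), since a different environment may have a different PyTorch build.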