hollowstrawberry/kohya-colab

Worked 1 time, then stopped


So I tried the LoRA colab to make one. It worked, but the results weren't good (as I was expecting). When doing a new one following steps from a friend, I end up with this:

No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Loading settings from /content/drive/MyDrive/lora_training/config/CinderFall/training_config.toml...
/content/drive/MyDrive/lora_training/config/CinderFall/training_config
prepare tokenizer
Downloading (…)olve/main/vocab.json: 100% 961k/961k [00:00<00:00, 5.46MB/s]
Downloading (…)olve/main/merges.txt: 100% 525k/525k [00:00<00:00, 6.65MB/s]
Downloading (…)cial_tokens_map.json: 100% 389/389 [00:00<00:00, 80.5kB/s]
Downloading (…)okenizer_config.json: 100% 905/905 [00:00<00:00, 228kB/s]
update token length: 225
Load dataset config from /content/drive/MyDrive/lora_training/config/CinderFall/dataset_config.toml
prepare images.
found directory /content/drive/MyDrive/lora_training/datasets/CinderFall contains 95 image files
380 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 2
resolution: (512, 512)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: False

[Subset 0 of Dataset 0]
image_dir: "/content/drive/MyDrive/lora_training/datasets/CinderFall"
image_count: 95
num_repeats: 4
shuffle_caption: True
keep_tokens: 1
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: None
caption_extension: .txt

[Dataset 0]
loading image sizes.
100% 95/95 [00:00<00:00, 596.50it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (320, 768), count: 8
bucket 1: resolution (384, 640), count: 120
bucket 2: resolution (448, 576), count: 164
bucket 3: resolution (512, 512), count: 36
bucket 4: resolution (576, 448), count: 24
bucket 5: resolution (640, 384), count: 28
mean ar error (without repeats): 0.05451873417765707
prepare accelerator
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/kohya-trainer/train_network.py:760 in <module> │
│ │
│ 757 │ args = parser.parse_args() │
│ 758 │ args = train_util.read_config_from_file(args, parser) │
│ 759 │ │
│ ❱ 760 │ train(args) │
│ 761 │
│ │
│ /content/kohya-trainer/train_network.py:140 in train │
│ │
│ 137 │ │
│ 138 │ # acceleratorを準備する │
│ 139 │ print("prepare accelerator") │
│ ❱ 140 │ accelerator, unwrap_model = train_util.prepare_accelerator(args) │
│ 141 │ is_main_process = accelerator.is_main_process │
│ 142 │ │
│ 143 │ # mixed precisionに対応した型を用意しておき適宜castする │
│ │
│ /content/kohya-trainer/library/train_util.py:2693 in prepare_accelerator │
│ │
│ 2690 │ │ log_prefix = "" if args.log_prefix is None else args.log_pref │
│ 2691 │ │ logging_dir = args.logging_dir + "/" + log_prefix + time.strf │
│ 2692 │ │
│ ❱ 2693 │ accelerator = Accelerator( │
│ 2694 │ │ gradient_accumulation_steps=args.gradient_accumulation_steps, │
│ 2695 │ │ mixed_precision=args.mixed_precision, │
│ 2696 │ │ log_with=log_with, │
│ │
│ /usr/local/lib/python3.9/dist-packages/accelerate/accelerator.py:355 in │
__init__
│ │
│ 352 │ │ if self.state.mixed_precision == "fp16" and self.distributed_ │
│ 353 │ │ │ self.native_amp = True │
│ 354 │ │ │ if not torch.cuda.is_available() and not parse_flag_from_ │
│ ❱ 355 │ │ │ │ raise ValueError(err.format(mode="fp16", requirement= │
│ 356 │ │ │ kwargs = self.scaler_handler.to_kwargs() if self.scaler_h │
│ 357 │ │ │ if self.distributed_type == DistributedType.FSDP: │
│ 358 │ │ │ │ from torch.distributed.fsdp.sharded_grad_scaler impor │
╰──────────────────────────────────────────────────────────────────────────────╯
ValueError: fp16 mixed precision requires a GPU
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /usr/local/bin/accelerate:8 in <module> │
│ │
│ 5 from accelerate.commands.accelerate_cli import main │
│ 6 if __name__ == '__main__': │
│ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │
│ ❱ 8 │ sys.exit(main()) │
│ 9 │
│ │
│ /usr/local/lib/python3.9/dist-packages/accelerate/commands/accelerate_cli.py │
│ :45 in main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if __name__ == "__main__": │
│ │
│ /usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py:1104 in │
│ launch_command │
│ │
│ 1101 │ elif defaults is not None and defaults.compute_environment == Com │
│ 1102 │ │ sagemaker_launcher(defaults, args) │
│ 1103 │ else: │
│ ❱ 1104 │ │ simple_launcher(args) │
│ 1105 │
│ 1106 │
│ 1107 def main(): │
│ │
│ /usr/local/lib/python3.9/dist-packages/accelerate/commands/launch.py:567 in │
│ simple_launcher │
│ │
│ 564 │ process = subprocess.Popen(cmd, env=current_env) │
│ 565 │ process.wait() │
│ 566 │ if process.returncode != 0: │
│ ❱ 567 │ │ raise subprocess.CalledProcessError(returncode=process.return │
│ 568 │
│ 569 │
│ 570 def multi_gpu_launcher(args): │
╰──────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/usr/bin/python3', 'train_network.py',
'--dataset_config=/content/drive/MyDrive/lora_training/config/CinderFall/dataset
_config.toml',
'--config_file=/content/drive/MyDrive/lora_training/config/CinderFall/training_c
onfig.toml']' returned non-zero exit status 1.

I tried doing the same as I did when it worked the first time, without success. I'm lost and don't know what to do now.

Hello, it seems you're running the colab in CPU mode; this can happen if you run out of free GPU time. It means you won't be able to train for a few hours (probably up to a couple of days) unless you buy Google's credits.

To make sure you didn't just disable the GPU accidentally, click Edit -> Notebook settings at the top left and check there.
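
For reference, here is a quick way to check from a notebook cell whether the runtime actually has a GPU before launching training. This is only a minimal sketch using PyTorch (which the trainer already depends on), not part of the colab itself:

import torch

# fp16 mixed precision needs a CUDA device, so check for one up front.
if torch.cuda.is_available():
    print("GPU detected:", torch.cuda.get_device_name(0))
else:
    print("No GPU available - switch the runtime to GPU (Edit -> Notebook settings)")
    print("or wait until your free GPU quota comes back.")

If this reports no GPU even with the GPU runtime selected, your free quota has most likely run out for now.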

Hey, thank you for the quick reply! I will look into this a little later. How does GPU time work? Does it come randomly?

Yes, GPU time varies a lot by usage and demand.