The training process cannot continue

Question

xgySTATISICT opened this issue a year ago · 7 comments

I tried to train, but the logs stopped updating at this step, even after 12 hours.

Answer 1 · 2023-07-04T13:22:46.000Z

@xgySTATISICT Can you post your configurations used to train the model?

Answer 2 · 2023-07-28T09:51:57.000Z

I encountered the same problem. I tried both CPU and GPU, but training would not continue on either. Here is my configuration.

model = ClassificationModel(
    Model1,
    Model2,
    args={
        'num_train_epochs': 1,
        'overwrite_output_dir': True,
        'use_early_stopping': False,
        'use_cuda': False,
        'train_batch_size': 50,
        'do_lower_case': True,
        'silent': False,
        'no_cache': True,
        'no_save': True,
    },
)

# Train the model
model.train_model(train_df)

Answer 3 · 2023-07-28T10:30:12.000Z

@songzetao I ran into a similar problem, and the following workaround helped. You may want to try it too. Add the following to your configuration; it turns off multiprocessing.

use_multiprocessing = False
use_multiprocessing_for_evaluation = False
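
For readers unsure where these two options go, here is a minimal sketch, assuming they are passed as keys of the same `args` dictionary used in the configuration posted earlier in this thread. The helper `with_multiprocessing_disabled` and the names `base_args` and `train_args` are hypothetical and exist only for illustration. The model construction is left as a comment because the model type and name are not given in the thread.

```python
# Hypothetical helper: return a copy of an args dict with multiprocessing turned off.
def with_multiprocessing_disabled(args):
    patched = dict(args)  # copy so the caller's dict is left unchanged
    patched["use_multiprocessing"] = False
    patched["use_multiprocessing_for_evaluation"] = False
    return patched


# A subset of the settings posted earlier in this thread.
base_args = {
    "num_train_epochs": 1,
    "overwrite_output_dir": True,
    "use_cuda": False,
    "train_batch_size": 50,
}
train_args = with_multiprocessing_disabled(base_args)

# Assumed usage, mirroring the configuration above:
# model = ClassificationModel(Model1, Model2, args=train_args)
# model.train_model(train_df)
```

Copying the dictionary instead of mutating it keeps the original settings available if you later want to compare runs with and without multiprocessing.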

Answer 4 · 2023-07-28T10:40:12.000Z

@DamithDR Thank you very much for your answer. It really worked. Thank you again!😊

Answer 5 · 2023-07-28T10:42:21.000Z

@songzetao Glad it worked :)

Answer 6 · 2023-08-18T14:51:43.000Z

I encountered the same problem. I have tried several fixes suggested by others, as shown below.

args.use_multiprocessing = False
args.use_multiprocessing_for_evaluation = False
args.process_count = 1

os.environ["TOKENIZERS_PARALLELISM"] = "false"

But training is still stuck at: Converting to features started. Cache is not used.

Answer 7 · 2023-08-18T15:26:04.000Z

@swardiantara Can you post any logs you get, and maybe a screenshot of where it got stuck?
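
If the run hangs without printing anything useful, one way to collect logs is Python's standard `faulthandler` module, which can periodically write the stack trace of every thread to a file. The sketch below is only a suggestion, not something proposed in the thread: the file name `hang_traceback.log` and the 600-second interval are arbitrary assumptions, and the training call is shown as a comment because it depends on your own setup.

```python
import faulthandler

LOG_PATH = "hang_traceback.log"  # assumed file name, pick any writable path
DUMP_INTERVAL_SECONDS = 600      # assumed interval between stack dumps

log_file = open(LOG_PATH, "w")

# Every DUMP_INTERVAL_SECONDS, write the traceback of all threads to log_file
# until the timer is cancelled. This shows which call the process is blocked in.
faulthandler.dump_traceback_later(DUMP_INTERVAL_SECONDS, repeat=True, file=log_file)
try:
    # Place the training call here, for example:
    # model.train_model(train_df)
    pass
finally:
    # Stop the periodic dumps and release the file once training returns.
    faulthandler.cancel_dump_traceback_later()
    log_file.close()
```

If training stalls, the tracebacks accumulated in the log file can be posted here in place of a screenshot.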