ThilinaRajapakse/simpletransformers

The training process cannot continue

xgySTATISICT opened this issue · 7 comments

I tried to train a model, but the logs stopped updating at this step and had not moved even after 12 hours.
[image]

@xgySTATISICT Can you post the configuration you used to train the model?

I encountered the same problem. I tried both CPU and GPU, but training would not continue on either. Here is my configuration.

model = ClassificationModel(
    Model1,
    Model2,
    args={
        'num_train_epochs': 1,
        'overwrite_output_dir': True,
        'use_early_stopping': False,
        'use_cuda': False,
        'train_batch_size': 50,
        'do_lower_case': True,
        'silent': False,
        'no_cache': True,
        'no_save': True,
    },
)

# Train the model
model.train_model(train_df)

@songzetao I encountered a similar problem and used the following workaround; you may want to try it too. Add the following to your configuration. Basically, we are turning off multiprocessing.

use_multiprocessing = False
use_multiprocessing_for_evaluation = False
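
For anyone using an args dict like the configuration above, here is a minimal sketch of how the two settings could be merged in. The helper name `disable_multiprocessing` is hypothetical and not part of simpletransformers; only the two option names come from this thread, and the base values are copied from the configuration posted earlier.

```python
def disable_multiprocessing(args: dict) -> dict:
    """Return a copy of args with both multiprocessing flags turned off.

    Hypothetical helper for illustration; the original dict is left untouched.
    """
    patched = dict(args)
    patched["use_multiprocessing"] = False
    patched["use_multiprocessing_for_evaluation"] = False
    return patched


# A subset of the configuration posted earlier in this thread.
base_args = {
    "num_train_epochs": 1,
    "overwrite_output_dir": True,
    "use_cuda": False,
    "train_batch_size": 50,
}

train_args = disable_multiprocessing(base_args)
print(train_args)
```

The resulting dict would then be passed as `args=train_args` when constructing the `ClassificationModel`.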

@DamithDR Thank you very much for your answer. It really worked. Thank you again! 😊

@songzetao Glad it worked :)

I encountered the same problem. I have tried several fixes suggested by others, as below.

args.use_multiprocessing = False
args.use_multiprocessing_for_evaluation = False
args.process_count = 1

os.environ["TOKENIZERS_PARALLELISM"] = "false"

But the training is still stuck at: Converting to features started. Cache is not used.

@swardiantara Can you post any logs you get, and maybe a screenshot of where it got stuck?
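
If the process hangs without printing anything further, one possible way to collect something to post is Python's standard `faulthandler` module, which can dump the stack of every thread after a timeout. This is only a sketch: `run_with_hang_dump` and `fake_training` are hypothetical names, the 600-second timeout is an arbitrary assumption, and in a real run the callable would wrap `model.train_model(train_df)`.

```python
import faulthandler
import logging
import tempfile

# INFO-level logging makes progress messages visible up to the point of the hang.
logging.basicConfig(level=logging.INFO)


def run_with_hang_dump(fn, timeout_seconds, dump_file):
    """Run fn(); if it is still running after timeout_seconds, write the
    traceback of all threads to dump_file, repeating every timeout_seconds."""
    faulthandler.dump_traceback_later(timeout_seconds, repeat=True, file=dump_file)
    try:
        return fn()
    finally:
        # Always cancel the watchdog so it does not fire after fn() returns.
        faulthandler.cancel_dump_traceback_later()


def fake_training():
    # Stand-in for model.train_model(train_df); returns immediately here.
    return "done"


# A temporary file keeps the sketch self-contained; for a real run, a named
# log file or sys.stderr would be more useful.
with tempfile.TemporaryFile(mode="w+") as dump_file:
    result = run_with_hang_dump(fake_training, 600, dump_file)

print(result)
```

The dumped traceback shows which call each thread is blocked in, which is usually more informative than a screenshot of a stalled progress bar.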