More iterations than the num_iteration setting
wt12318 opened this issue · 6 comments
Hi,
When I set num_iteration to 50, the actual number of iterations run is more than 50:
config = dict()
config["optimizer"] = "Bayesian"
config["num_iteration"] = 50
tuner = Tuner(HYPERPARAMETERS,
              objective=run_one_training,
              conf_dict=config)
results = tuner.minimize()
Hi,
Thanks for asking this question.
Internally, Mango runs a few random iterations to do a proper initialization of the optimizer.
By default, the number of these random iterations is 2.
You can modify this with the config parameter 'initial_random'.
So, in most cases, your total number of iterations will be num_iteration + initial_random.
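For illustration, a minimal sketch (reusing the config dict and Tuner call from the question above, with 'initial_random' left at its default of 2) of how the two settings combine:

config = dict()
config["optimizer"] = "Bayesian"
config["num_iteration"] = 50
config["initial_random"] = 2  # default; total evaluations are then usually 50 + 2 = 52
tuner = Tuner(HYPERPARAMETERS,
              objective=run_one_training,
              conf_dict=config)
results = tuner.minimize()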
However, initial_random is only a suggestion to the optimizer, and in some cases it may run more random iterations to achieve a proper initialization. This happens for problems where the variation in the objective value is very small; Mango may internally decide to run more random iterations to make sure it finds good regions in the hyperparameter space. For most problems, setting initial_random will bound the iterations as needed.
This may also happen when some of the random iterations did not succeed and your objective function reported their failures, in which case Mango runs more random iterations to make sure that 2 random iterations succeed.
Thank you
Hi,
When I set initial_random to one, it still runs more iterations than I set. Also, the total number of combinations of all my parameters is 36, but it runs more than 36 iterations. Why does this happen?
Thank you.
Can you share more details about your parameter space and the definition of your objective function?
Thank you for the reply. Here are my objective function and parameter space:
@scheduler.parallel(n_jobs=36)
def run_one_training(**params):
    with mlflow.start_run() as run:
        # Log parameters used in this experiment
        for key in params.keys():
            mlflow.log_param(key, params[key])
        # Loading the dataset
        print("Loading dataset...")
        train_dataset = TCRpMHCDataset(root="/public/slst/home/wutao2/TCR_neo/data/", filename="train_dt.csv", aaindex=aaindex, test=False, val=False)
        test_dataset = TCRpMHCDataset(root="/public/slst/home/wutao2/TCR_neo/data/", filename="val_dt.csv", aaindex=aaindex, test=False, val=True)
        # Prepare training
        train_loader = DataLoader(train_dataset, batch_size=params["batch_size"], shuffle=True)
        test_loader = DataLoader(test_dataset, batch_size=params["batch_size"], shuffle=True)
        # Loading the model
        print("Loading model...")
        model_params = {k: v for k, v in params.items() if k.startswith("model_")}
        model = GNN(feature_size=train_dataset[0].x.shape[1], model_params=model_params)
        model = model.to(device)
        print(f"Number of parameters: {count_parameters(model)}")
        mlflow.log_param("num_params", count_parameters(model))
        # < 1 increases precision, > 1 recall
        loss_fn = torch.nn.BCEWithLogitsLoss()
        optimizer = torch.optim.Adam(model.parameters(),
                                     lr=params["learning_rate"],
                                     weight_decay=0)
        # scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=params["scheduler_gamma"])
        # Start training
        best_loss = 1000
        early_stopping_counter = 0
        for epoch in range(20):
            if early_stopping_counter <= 5:  # = x * 5
                # Training
                model.train()
                loss = train_one_epoch(epoch, model, train_loader, optimizer, loss_fn)
                print(f"Epoch {epoch} | Train Loss {loss}")
                mlflow.log_metric(key="Train loss", value=float(loss), step=epoch)
                # Testing
                model.eval()
                if epoch % 1 == 0:
                    loss = test(epoch, model, test_loader, loss_fn)
                    print(f"Epoch {epoch} | Test Loss {loss}")
                    mlflow.log_metric(key="Test loss", value=float(loss), step=epoch)
                    # Update best loss
                    if float(loss) < best_loss:
                        best_loss = loss
                        # Save the currently best model
                        mlflow.pytorch.log_model(model, "model", signature=SIGNATURE)
                        early_stopping_counter = 0
                    else:
                        early_stopping_counter += 1
            else:
                print("Early stopping due to no improvement.")
                return [best_loss]
        print(f"Finishing training with best test loss: {best_loss}")
        return [best_loss]
HYPERPARAMETERS = {
    "batch_size": [32, 64, 128],
    "learning_rate": [0.001, 0.0001],
    "model_embedding_size": [32, 64, 128],
    "model_layers": [2, 3],
    "model_dropout_rate": [0.5]
}
torch.set_num_threads(36)
torch.manual_seed(2022060801)
print("Running hyperparameter search...")
config = dict()
config["optimizer"] = "Bayesian"
config["num_iteration"] = 36
config["initial_random"] = 1
tuner = Tuner(HYPERPARAMETERS,
              run_one_training,
              config)
results = tuner.minimize()
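As a side check on the "36 combinations" figure mentioned above (an editorial sketch, not part of the original report): enumerating the Cartesian product of the lists in the HYPERPARAMETERS dict gives 3 x 2 x 3 x 2 x 1 = 36 distinct configurations.

import itertools

# Count the distinct configurations in the HYPERPARAMETERS grid above.
n_combinations = len(list(itertools.product(*HYPERPARAMETERS.values())))
print(n_combinations)  # 36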
Hi,
Thanks for providing the details. I have been a little busy for the last few days due to an immediate deadline.
I will work on reproducing this issue next week and will update you with a solution or more information.