ThilinaRajapakse/simpletransformers

Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

MuvvaThriveni opened this issue · 5 comments

I am facing an issue when I try to build a multiclass classification model.
Here is my code from the start:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv('/content/Normalized_Data_PBLD.csv')

y = data['label'].tolist()
# train/test split
X_train, X_test, y_train, y_test = train_test_split(data['comment'].tolist(), y, random_state=5, test_size=0.2)
# class weights from the training labels (note: class_weights is not used later in this snippet)
from sklearn.utils import class_weight
class_weights = class_weight.compute_class_weight(class_weight="balanced",
                                                  classes=np.unique(y_train),
                                                  y=y_train)
# validation split
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, random_state=5, test_size=0.1)
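The "balanced" setting weights each class by n_samples / (n_classes * count), so rarer classes get larger weights. Below is a minimal dependency-free sketch of that formula, handy for sanity-checking the values in class_weights. The helper name balanced_class_weights is hypothetical; it is not part of sklearn or simpletransformers.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: weight} using n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # sorted() mirrors the sorted class order that np.unique produces
    return {cls: n_samples / (n_classes * count) for cls, count in sorted(counts.items())}

print(balanced_class_weights(['NEG', 'NEG', 'NEG', 'POS']))
# {'NEG': 0.6666666666666666, 'POS': 2.0}
```

As noted in the comment above, the computed weights never reach the model. simpletransformers' ClassificationModel documents a `weight` argument (a list of per-label weights for the loss), but check that your installed version supports it before relying on it.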

list_of_class = {'NEG': 0, 'NTL': 1, 'POS': 2}
y_val = [list_of_class[i.strip()] for i in y_val]
y_train = [list_of_class[i.strip()] for i in y_train]
y_test = [list_of_class[i.strip()] for i in y_test]
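The dictionary lookups above raise a bare KeyError if the CSV contains a label outside NEG/NTL/POS (a typo or different casing, for example). Here is a small sketch that fails with a clearer message and confirms that every encoded id fits the `num_labels=3` the model is built with. encode_labels is a hypothetical helper name.

```python
LABEL_TO_ID = {'NEG': 0, 'NTL': 1, 'POS': 2}

def encode_labels(raw_labels, mapping=LABEL_TO_ID):
    """Map string labels to integer ids, stripping stray whitespace first."""
    encoded = []
    for raw in raw_labels:
        key = str(raw).strip()
        if key not in mapping:
            raise ValueError(f"Unknown label {raw!r}; expected one of {sorted(mapping)}")
        encoded.append(mapping[key])
    return encoded

ids = encode_labels([' NEG', 'POS ', 'NTL'])
# every id must fall in range(num_labels); the model above uses num_labels=3
assert set(ids) <= set(range(3))
print(ids)  # [0, 2, 1]
```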

d1 = {'comment': X_train, 'label': y_train}
df_train = pd.DataFrame(d1)

d2 = {'comment': X_val, 'label': y_val}
df_val = pd.DataFrame(d2)

d3 = {'comment': X_test, 'label': y_test}
df_test = pd.DataFrame(d3)

Calling the BERT model:

from simpletransformers.classification import ClassificationModel

model = ClassificationModel('bert', 'bert-base-multilingual-cased', num_labels=3, args={'learning_rate': 1e-5, 'num_train_epochs': 2, 'reprocess_input_data': True, 'overwrite_output_dir': True})

model.train_model(df_train)

result, model_outputs, wrong_predictions = model.eval_model(df_val)
When running this line, I get the following error:

ERROR:
ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].
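For context, sklearn's f1_score defaults to average='binary', which reports F1 for a single positive class and only applies when the targets have two classes. With three labels (0, 1, 2) there is one F1 per class, computed one-vs-rest, and the `average` setting decides how to combine them (average=None returns them unaveraged). Below is a dependency-free sketch of those per-class scores. per_class_f1 is a hypothetical helper, not a library function.

```python
def per_class_f1(labels, preds):
    """One-vs-rest F1 for every class seen in labels or preds."""
    scores = {}
    for c in sorted(set(labels) | set(preds)):
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = sum(1 for y, p in zip(labels, preds) if y == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); 0.0 if the class has no TP, FP or FN at all
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores

print(per_class_f1([0, 1, 2, 2, 1, 0], [0, 2, 2, 2, 1, 1]))
# {0: 0.6666666666666666, 1: 0.5, 2: 0.8}
```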

Then I tried another way:
from sklearn.metrics import f1_score, accuracy_score

def f1_multiclass(labels, preds):
    return f1_score(labels, preds, average='weighted')

result, model_outputs, wrong_predictions = model.eval_model(df_val, f1=f1_multiclass, acc=accuracy_score)
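average='weighted' averages the per-class F1 scores, weighting each one by its support (the number of true instances of that class). The sketch below computes the same quantity without sklearn. It uses the same (labels, preds) signature as f1_multiclass, so it can cross-check the value sklearn returns. weighted_f1 is a hypothetical name.

```python
from collections import Counter

def weighted_f1(labels, preds):
    """Support-weighted mean of one-vs-rest F1 (what average='weighted' computes)."""
    support = Counter(labels)  # classes that appear only in preds get weight 0
    total = 0.0
    for c, n_true in support.items():
        tp = sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp = sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        total += (2 * tp / denom if denom else 0.0) * n_true
    return total / len(labels)

# per-class F1 here is 0.8 (support 3), 2/3 (support 1), 1.0 (support 1)
print(weighted_f1([0, 0, 0, 1, 2], [0, 0, 1, 1, 2]))
```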

but I still get the same error:

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

Could anyone please help me solve this?

I am also encountering a similar ValueError [1] when evaluating a DistilBERT model for multiclass classification.

As a workaround, I computed the F1 score directly, outside the eval_model method [2].

[1] Error

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

[2] Workaround (computing the F1 score directly, outside the eval_model method)

from sklearn.metrics import f1_score
import numpy as np

# This part depends on how your model outputs predictions
predictions, raw_outputs = model.predict(valid_df['text'].tolist())

# This assumes `valid_df['labels']` contains the true class labels for each sample
true_labels = valid_df['labels'].values

# Calculate F1 Score
f1 = f1_score(true_labels, predictions, average='weighted')
print(f"Weighted F1 Score: {f1}")
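On choosing among the averages listed in the error message: for single-label multiclass data, micro-averaging pools TP, FP and FN over all classes. Every wrong prediction counts as one false positive (for the predicted class) and one false negative (for the true class), so micro F1 comes out equal to plain accuracy. If accuracy is already reported, micro F1 adds nothing new. A small sketch that checks this (micro_f1 and accuracy here are illustrative helpers, not sklearn's functions):

```python
def micro_f1(labels, preds):
    """F1 from TP/FP/FN pooled over all classes."""
    tp = fp = fn = 0
    for c in set(labels) | set(preds):
        tp += sum(1 for y, p in zip(labels, preds) if y == c and p == c)
        fp += sum(1 for y, p in zip(labels, preds) if y != c and p == c)
        fn += sum(1 for y, p in zip(labels, preds) if y == c and p != c)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def accuracy(labels, preds):
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)

labels, preds = [0, 0, 0, 1, 2], [0, 0, 1, 1, 2]
print(micro_f1(labels, preds), accuracy(labels, preds))  # 0.8 0.8
```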

It would be greatly appreciated if someone could provide input regarding this issue.

I'm downgrading to 0.64.3