SVM: evaluate on same train set

Question

ai2010 opened this issue 5 years ago · 2 comments

Hello,

I am using the structured HDFS file you provide, data/HDFS/HDFS_100k.log_structured.csv. I split it into training and test sets and train an SVM:

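from loglizer import dataloader, preprocessing
from loglizer.models import SVM

struct_log = 'data/HDFS/HDFS_100k.log_structured.csv'  # structured HDFS sample log
# label_file must point to the HDFS anomaly label CSV

(x_train, y_train), (x_test, y_test) = dataloader.load_HDFS(struct_log,
                                                            label_file=label_file,
                                                            window='session',
                                                            train_ratio=0.5,
                                                            split_type='uniform')

# Turn each session into a tf-idf weighted event-count vector, then fit the SVM
feature_extractor = preprocessing.FeatureExtractor()
x_train = feature_extractor.fit_transform(x_train, term_weighting='tf-idf')
model = SVM()
model.fit(x_train, y_train)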
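
For readers unfamiliar with the load_HDFS arguments: as I understand them, window='session' groups log lines into one event sequence per HDFS block ID, and split_type='uniform' with train_ratio=0.5 splits anomalous and normal sessions separately, so each class contributes half of its sessions to training. The sketch below is a minimal pure-Python illustration under those assumptions, not loglizer's implementation. The log rows, event IDs, and labels are made up, and session_windows and uniform_split are hypothetical helpers.

```python
import re

# Hypothetical structured-log rows as (Content, EventId) pairs; real rows would
# come from a *_structured.csv file. All values here are made up.
ROWS = [
    ("Receiving block blk_1 src: /10.0.0.1", "E5"),
    ("Receiving block blk_2 src: /10.0.0.2", "E5"),
    ("PacketResponder 1 for block blk_1 terminating", "E11"),
    ("Exception in receiveBlock for block blk_-3", "E7"),
    ("Receiving block blk_-3 src: /10.0.0.3", "E5"),
    ("Receiving block blk_4 src: /10.0.0.4", "E5"),
]
# Hypothetical per-block labels: 1 = anomaly, 0 = normal.
LABELS = {"blk_1": 0, "blk_2": 0, "blk_-3": 1, "blk_4": 1}

BLOCK_RE = re.compile(r"blk_-?\d+")


def session_windows(rows):
    """Session window: one event-ID sequence per HDFS block ID, in log order."""
    sessions = {}
    for content, event_id in rows:
        # dict.fromkeys de-duplicates block IDs on a line while keeping their order.
        for blk in dict.fromkeys(BLOCK_RE.findall(content)):
            sessions.setdefault(blk, []).append(event_id)
    return sessions


def uniform_split(x, y, train_ratio):
    """Split anomalous and normal sessions separately, taking train_ratio of each."""
    pos = [(xi, yi) for xi, yi in zip(x, y) if yi > 0]
    neg = [(xi, yi) for xi, yi in zip(x, y) if yi == 0]
    n_pos = int(train_ratio * len(pos))
    n_neg = int(train_ratio * len(neg))
    train = pos[:n_pos] + neg[:n_neg]
    test = pos[n_pos:] + neg[n_neg:]
    return (
        ([s for s, _ in train], [lab for _, lab in train]),
        ([s for s, _ in test], [lab for _, lab in test]),
    )


sessions = session_windows(ROWS)
x = list(sessions.values())
y = [LABELS[blk] for blk in sessions]
(x_train, y_train), (x_test, y_test) = uniform_split(x, y, train_ratio=0.5)
print(x_train, y_train)  # [['E7', 'E5'], ['E5', 'E11']] [1, 0]
print(x_test, y_test)    # [['E5'], ['E5']] [1, 0]
```

Because each class is split on its own, the share of anomalies in the training half matches the share in the whole dataset.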
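
The term_weighting='tf-idf' option reweights each session's event-count vector. A common formulation, assumed here, scales each count by idf = log(N / df), where N is the number of training sessions and df is the number of training sessions containing that event; loglizer's exact formula may differ (for example, by a small smoothing constant). The hypothetical helpers below learn the weights from training sessions only and reuse them for new sessions.

```python
import math


def build_vocab(sessions):
    """Sorted list of event IDs seen in the training sessions."""
    return sorted({e for seq in sessions for e in seq})


def count_matrix(sessions, vocab):
    """One event-count vector per session; events outside vocab are ignored."""
    index = {e: i for i, e in enumerate(vocab)}
    rows = []
    for seq in sessions:
        row = [0] * len(vocab)
        for e in seq:
            if e in index:
                row[index[e]] += 1
        rows.append(row)
    return rows


def fit_idf(counts):
    """idf_j = log(N / df_j), learned from training counts only (0 if df_j is 0)."""
    n = len(counts)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(counts[0]))]
    return [math.log(n / d) if d > 0 else 0.0 for d in df]


def apply_idf(counts, idf):
    """Scale each event count by that event's idf weight."""
    return [[c * w for c, w in zip(row, idf)] for row in counts]


train_sessions = [["E5", "E5", "E11"], ["E5", "E7"], ["E5"]]
vocab = build_vocab(train_sessions)  # ['E11', 'E5', 'E7']
# E5 occurs in every training session, so its idf (and tf-idf weight) is 0.
idf = fit_idf(count_matrix(train_sessions, vocab))
x_train_tfidf = apply_idf(count_matrix(train_sessions, vocab), idf)

# New sessions reuse the training vocabulary and idf; 'E9' was never seen, so it is dropped.
x_new_tfidf = apply_idf(count_matrix([["E7", "E9"]], vocab), idf)
print(vocab, idf)
print(x_new_tfidf)
```

Fitting the weights once on training data and reusing them keeps training and test vectors in the same feature space.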

To check that everything is right, I evaluate the model on the same training data (x_train):
precision, recall, f1 = model.evaluate(x_train, y_train)

However, the metrics are:

Precision: 1.000, recall: 0.365, F1-measure: 0.535

I would expect near-perfect metrics, since I am predicting on the training set itself. Do you know what the issue is?
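
Reading these numbers, and assuming evaluate reports the standard binary scores for the anomaly class: precision 1.000 means none of the normal training sessions was flagged as anomalous, while recall 0.365 means only about 36.5% of the anomalous training sessions were detected. The F1 of 0.535 is the harmonic mean of the two. The minimal sketch below uses a hand-written helper (not loglizer's evaluate) and made-up labels to show how these scores follow from the confusion counts.

```python
def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1 for the anomaly class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # anomalies caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal sessions flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # anomalies missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example (made-up labels): no false positives, 3 of 4 anomalies missed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
print(p, r, round(f, 3))  # 1.0 0.25 0.4

# The reported precision 1.000 and recall 0.365 give F1 = 2*0.365/1.365.
print(round(2 * 1.0 * 0.365 / (1.0 + 0.365), 3))  # 0.535
```

So the reported scores are internally consistent: the model is conservative on this data, missing most anomalies rather than raising false alarms.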

Answer 1 · 2019-07-31T06:48:50.000Z

Hi. I ran into the same problem a few days ago. It's caused by the HDFS_100k sample data: if you use the full HDFS dataset, the performance will be similar to what the paper reports.

Answer 2 · 2019-08-02T05:09:03.000Z

Yes, the benchmarks were run on the full dataset. The benchmarking code is in benchmarks/, not demo/.