bazingagin/npc_gzip

Accuracy issue

lx-zg opened this issue · 4 comments

lx-zg commented

I tested the Filipino and AG_NEWS datasets, but I couldn't reach the accuracy reported in your paper. I'm not sure where I went wrong.

lx-zg commented

To add some detail on the experimental results: performance was best with k set to 2. However, across the 9 runs I did with k=2, the accuracy was not stable and did not consistently reach above 80%. On average, the accuracy was about 73%.

kts commented

What command did you run? Use --all_train and --all_test

lx-zg commented

Thank you very much for your response!
When I set all_train to True, I got an unclear error message, so I extracted your core functions; the exact code is below, and the result above was generated from it. (I assume this corresponds to all_train = False and all_test = False.)

like this:

```python

import matplotlib.pyplot as plt

# Helper functions (pick_n_sample_from_each_class_given_dataset, non_neural_knn_exp,
# agg_by_concat_space, NCD) are extracted from the npc_gzip repo.

# Experiment parameters
num_test = 100
test_idx_fn = None
num_train = 100
train_idx_fn = "./save"
compressor = "gzip"
k = 5
para = False

# Print out the dataset pair, number of test samples, and test index file name
print("dataset_pair:", dataset_pair[1], "args.num_test:", num_test, "args.test_idx_fn:",
      test_idx_fn)

# Get the training data and labels by selecting a certain number of samples from each class in the dataset
train_data, train_labels = pick_n_sample_from_each_class_given_dataset(dataset_pair[0], num_train,
                                                                       train_idx_fn)

# Get the test data and labels by selecting a certain number of samples from each class in the dataset
test_data, test_labels = pick_n_sample_from_each_class_given_dataset(dataset_pair[1], num_test, test_idx_fn)

# Run the k-NN experiment without using neural networks
non_neural_knn_exp(compressor, test_data, test_labels, train_data, train_labels, agg_by_concat_space, NCD, k, para=para)

# Set the range of k values from 1 to 9
k_values = range(1, 10)
accuracies = []  # Store the accuracies for different k values

# Iterate through each k value
for k in k_values:
    # Run the k-NN experiment without using neural networks and get the predictions and correctness
    pred, correct = non_neural_knn_exp(compressor, test_data, test_labels, train_data, train_labels, agg_by_concat_space, NCD, k, para=para)
    # Calculate the accuracy by dividing the number of correct predictions by the total number of predictions
    accuracy = sum(correct) / len(correct)
    accuracies.append(accuracy)
    print("Accuracy:", accuracy)

# Plot the accuracy vs. k graph
plt.plot(k_values, accuracies)
plt.xlabel('k')
plt.ylabel('Accuracy')
plt.title('Accuracy vs. k')
plt.xlim(1, 9)
plt.ylim(0, 1)
plt.show()
```
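For reference, the distance and voting steps the script relies on can be sketched standalone. This is a minimal sketch under my own helper names (`gzip_len`, `knn_predict` are illustrative, not the repo's exact implementation); the NCD formula matches the one used in the paper, and the space-joined concatenation mirrors `agg_by_concat_space`:

```python
import gzip
from collections import Counter

def gzip_len(text: str) -> int:
    """Length of the gzip-compressed byte string, i.e. C(text)."""
    return len(gzip.compress(text.encode("utf-8")))

def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = gzip_len(x), gzip_len(y)
    cxy = gzip_len(x + " " + y)  # join with a space, like agg_by_concat_space
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_predict(test_text, train_data, train_labels, k=2):
    """Majority vote over the k training samples closest to test_text in NCD."""
    order = sorted(range(len(train_data)), key=lambda i: ncd(test_text, train_data[i]))
    top_labels = [train_labels[i] for i in order[:k]]
    return Counter(top_labels).most_common(1)[0][0]
```

With tiny strings the fixed gzip header dominates, so absolute NCD values are inflated, but the ranking between similar and dissimilar texts still holds, which is all the kNN vote needs.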

Hi @lx-zg, as @kts pointed out, the issue is that you are running in the 100-shot setting instead of on the whole training set.

What's your error message when setting all_train=True? I don't think your extracted code runs on the whole training set.
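For context on why the numbers differ: with num_train = 100, the script classifies against a random subset of 100 samples per class, whereas the whole-training-set run has no such randomness. A minimal sketch of that kind of sampling (`pick_n_per_class` is a hypothetical helper, not the repo's code):

```python
import random
from collections import defaultdict

def pick_n_per_class(data, labels, n, seed=0):
    """Keep at most n examples per class (the few-shot setting)."""
    by_class = defaultdict(list)
    for text, label in zip(data, labels):
        by_class[label].append(text)
    rng = random.Random(seed)
    picked = []
    for label, texts in sorted(by_class.items()):
        for text in rng.sample(texts, min(n, len(texts))):
            picked.append((text, label))
    return picked
```

Different seeds (or unseeded runs) yield different 100-shot subsets, which would explain the run-to-run accuracy variance reported above; using the full training set removes that source of noise.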