Rocionightwater/ML-NIDS-for-SCADA

ValueError


When I run the following code:
payload_features = payload_features.view(np.float64).reshape(payload_features.shape + (-1,))
An error occurs:
ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original dtype

Can you tell me how to solve this problem?
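For reference, NumPy raises this particular ValueError when .view() is asked to reinterpret an array with a dtype whose itemsize does not evenly divide the original itemsize; for a structured payload_features array that happens whenever the combined size of its fields is not a multiple of the 8 bytes of a float64. Below is a minimal sketch on hypothetical toy data (not the repository's actual features) that reproduces the error and shows one possible workaround using numpy.lib.recfunctions:

import numpy as np
from numpy.lib import recfunctions as rfn

# Toy structured array whose total itemsize (8 + 4 = 12 bytes) is not a
# multiple of float64's 8 bytes, so .view(np.float64) raises the ValueError.
rec = np.zeros(3, dtype=[('a', np.float64), ('b', np.int32)])
try:
    rec.view(np.float64).reshape(rec.shape + (-1,))
except ValueError as err:
    print(err)  # "When changing to a smaller dtype, its size must be a divisor ..."

# Possible workaround: convert each field to float64 explicitly instead of
# reinterpreting the raw bytes with .view().
flat = rfn.structured_to_unstructured(rec).astype(np.float64)
print(flat.shape)  # (3, 2)

structured_to_unstructured copies the data field by field, so it does not require the byte layout to line up the way a raw .view() does.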

Hi,
I tried, but I am not able to replicate your error.
Unfortunately, I'm having a hard time finding the time to maintain the repository, so I decided to archive it.
Here are some tips on how to run the code: to run "preprocess-data_svm.py" you need to run "create_datasets.sh" (which triggers "preprocess-data_svm.py"). To do the same for LSTM and RF, just change the name of the script inside "create_datasets.sh".
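For example, assuming the script takes no arguments and is run from the repository root (adapt the path to your checkout):
bash create_datasets.sh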

You can also find the data already preprocessed for SVM, Random Forest and LSTM (ready to be used with those models) under the results folder. It contains the datasets for all of the different missing-values strategies* and for a split of 60% training set, 20% validation set and 20% testing set.

*Missing-values strategies:
1- Clustering - Gaussian Mixture Model (GMM)
2- Clustering - K-means
3- Zeros imputation & indicator technique
4- Keeping the closest preceding non-missing feature value ("keep") (strategies 3 and 4 are illustrated in the sketch after this list)
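To make strategies 3 and 4 concrete, here is a small illustrative sketch on a toy feature column (pandas-based; this is not the repository's actual preprocessing code):

import numpy as np
import pandas as pd

# Toy feature column with missing values (illustrative only).
col = pd.Series([1.0, np.nan, 3.0, np.nan])

# 3- Zeros imputation & indicator: replace NaNs with 0 and add a binary
#    column marking which entries were originally missing.
indicator = col.isna().astype(int)
zeros_imputed = col.fillna(0.0)

# 4- "Keep": carry forward the closest preceding non-missing value.
forward_filled = col.ffill()

print(pd.DataFrame({'zeros': zeros_imputed,
                    'missing_indicator': indicator,
                    'keep': forward_filled}))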

To run the code, here is an example. Let's say you want to run the random_forest_hyperparameters.py program with the binary dataset (1 for malicious packets, 0 for benign), the missing-values strategy "keep" (take the value of the previous observation in the dataset), mean and standard deviation normalization, and the number of hyperparameter-search iterations set to 100. Then you just need to call on the command line:
python src/random_forest_hyperparameters.py -d results/processed_data_RF/time-series-datasets/binary-ts-mean-keep -i 100

src/random_forest_hyperparameters.py is the path to the program and results/processed_data_RF/time-series-datasets/binary-ts-mean-keep is the path to the dataset (you may need to adapt both depending on where you host the code and dataset files). Do not forget to change the output directory as well (otherwise the output of the program is saved where the dataset is located by default).
Reminder: the datasets for Long Short Term Memory, SVM and Random Forests are all in the folder results (I used the dataset split configuration: 60% training set, 20% validation set and 20% test set).

This is the link to my paper, which could be of help too --> https://eudl.eu/pdf/10.4108/eai.25-1-2019.159348

Best regards,
Rocio