Training Data Format and Class Label for kmeans
GoogleCodeExporter commented
Hi,
I have converted my training data into the sparse data format you mentioned.
./sofia-kmeans --k 1000 --init_type random --opt_type batch_kmeans --iterations
1000 --objective_after_init --training_file demo/SMLFAutoTrain1s512val.txt
--model_out demo/CSMLFAutoTrain1s512val.txt
However, I am getting the following errors:
Reading data from: demo/SMLFAutoTrain1s512val.txt
Error reading file demo/SMLFAutoTrain1s512val.txt
I opened your demo.train and saw that there is a square box at the end of every
vector. How can I change my data format to match yours, given that the square
box at the end may not be the only difference? I also tried to load your
demo.train file in MATLAB, and it would not let me do that either.
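A square box at the end of a line is often how an editor displays a line-ending character it does not render, although that is only a guess for this file. A minimal Python sketch for checking which line terminators a file actually contains follows; the function name is made up for illustration.

```python
def line_ending_counts(data: bytes) -> dict:
    """Count line terminators in raw bytes: CRLF pairs, lone LF, lone CR."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF bytes that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CR bytes that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


def ends_with_newline(data: bytes) -> bool:
    """True if the last line is terminated by LF."""
    return data.endswith(b"\n")
```

To compare the two files, read each one in binary mode, for example `open("demo/demo.train", "rb").read()`, and pass the bytes to both functions. A file with no LF at all, or with only lone CR terminators, would look like a single very long line to a reader that splits on '\n'.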
For the example of kmeans:
> ./sofia-kmeans --k 5 --init_type random --opt_type mini_batch_kmeans
--mini_batch_size 100 --iterations 500 --objective_after_init
--objective_after_training --training_file demo/demo.train --model_out
demo/clusters.txt
The above command will return the five centroid locations, right?
In that case, since it only produces the five cluster center locations, the
class label in the training data (demo.train) can be assigned any value, right?
For instance, I chose 1 for every row, out of the values 1, 0, -1.
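For illustration, here is a minimal Python sketch of writing dense vectors as sparse lines with a constant placeholder label. It rests on assumptions this thread does not confirm: that each line has the layout `<label> <id>:<value> ...`, that feature ids start at 1, that zero entries may be omitted, and that the label is ignored during clustering. The helper names are hypothetical.

```python
def to_sparse_line(values, label=1):
    """Format one dense vector as '<label> <id>:<value> ...' ending in '\\n'."""
    # Assumption: feature ids start at 1 and zero-valued entries can be skipped.
    pairs = [f"{i}:{float(v)!r}" for i, v in enumerate(values, start=1) if v != 0]
    return " ".join([str(label)] + pairs) + "\n"


def write_training_file(path, rows, label=1):
    """Write every row as one sparse line, using the same placeholder label."""
    # newline="\n" stops Python from translating '\n' into '\r\n' on Windows.
    with open(path, "w", newline="\n") as f:
        for row in rows:
            f.write(to_sparse_line(row, label))
```

Calling `to_sparse_line([0.5, 0, 2.0])` gives `1 1:0.5 3:2.0` followed by a newline. Whether 1, 0, or -1 is used as the label makes no difference to this writer; whether it makes a difference to the clustering is the open question above.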
I look forward to your clarification.
Thank you,
Fred
Original issue reported on code.google.com by fredro.h...@gmail.com
on 23 Sep 2011 at 3:56
GoogleCodeExporter commented
I have solved the training data problem by ending every line of my training
data (SMLFAutoTrain1s512val.txt) with '\n'. But I found a lot of zeros on every
line after my 78 dimensions in each vector in the output file
(CSMLFAutoTrain1s512val.txt). How can I run the kmeans program without getting
so many zeros on every line? What is the first field on every line of my output
data, since they are all zeros? I assume that is the class label. Please correct
me if I am wrong here.
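As a possible workaround on the reading side, the padding zeros can be dropped after the fact. This is a minimal Python sketch, assuming each line of the output file is one cluster center stored as whitespace-separated numbers; the thread does not confirm that layout, the function name is hypothetical, and the sketch does not settle what the leading field means.

```python
def trim_trailing_zeros(line, keep=None):
    """Parse one whitespace-separated line of numbers and drop the padding.

    If keep is given, return the first `keep` values; otherwise strip
    every zero from the end of the line.
    """
    values = [float(tok) for tok in line.split()]
    if keep is not None:
        return values[:keep]
    while values and values[-1] == 0.0:
        values.pop()
    return values
```

For example, `trim_trailing_zeros("0 1.5 2 0 0 0\n")` returns `[0.0, 1.5, 2.0]`. Passing `keep` is the safer choice when a center may legitimately end in zeros; with 78 real dimensions plus the leading field, that would be `keep=79`, assuming the leading field is an extra column rather than the first dimension.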
Original comment by fredro.h...@gmail.com
on 23 Sep 2011 at 4:48