- Python 2.7.x
$ python main.py --datafile <path_to_data> --datainfofile <path_to_data_info> --threshold <threshold>
- The values given to --datafile and --datainfofile can be either absolute or relative paths.
- The value of --threshold should be between 0 and 1.
The resulting clusters are printed to stdout by default. To write them to a file instead, pass a path with the --outputfile option:
$ python main.py --datafile <path_to_data> --datainfofile <path_to_data_info> --threshold <threshold> --outputfile <path_to_outputfile>
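For readers who want to wrap or reimplement this command line, the options above could be declared roughly as follows. This is only a sketch: the option names come from this README, but the helper names unit_interval and build_parser are hypothetical, main.py may handle its arguments differently, and treating 0 and 1 themselves as valid thresholds is an assumption.

```python
import argparse


def unit_interval(text):
    """Convert text to a float and reject values outside [0, 1].

    Assumption: the bounds are inclusive; the README only says "between 0 and 1".
    """
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError("threshold must be between 0 and 1")
    return value


def build_parser():
    """Build a parser for the four options described in this README."""
    parser = argparse.ArgumentParser(description="Hypothetical sketch of the CLI")
    parser.add_argument("--datafile", required=True)
    parser.add_argument("--datainfofile", required=True)
    parser.add_argument("--threshold", required=True, type=unit_interval)
    # None stands for "no output file given", i.e. print results to stdout.
    parser.add_argument("--outputfile", default=None)
    return parser


# Placeholder file names, used only to exercise the parser.
args = build_parser().parse_args(
    ["--datafile", "data.csv", "--datainfofile", "info.csv", "--threshold", "0.5"]
)
```

Validating the threshold in the type callable means an out-of-range value is reported as a normal usage error before any data is read.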
As stated above, the script expects two files:
- --datafile - the data set, which must be a CSV with each line of the form:
  <row_identifier>,<attr1_value>,<attr2_value>,...
  Missing attribute values must be represented with the value ?.
- --datainfofile - metadata that describes the data set, which must be a CSV with one line describing each attribute available in the data set:
  <attr_name>,<attr_type>,<attr_possible_values>,...
  The supported values for <attr_type> are nominal, ordinal, binary_symmetric, binary_asymmetric, and numeric. The value for <attr_possible_values> may be omitted.
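The two file formats described above can be read with Python's csv module. The following is a minimal sketch and not the code used by main.py: the function names parse_data_rows and parse_info_rows are hypothetical, and mapping ? to None is one possible way to represent a missing value.

```python
import csv

MISSING = "?"
SUPPORTED_TYPES = (
    "nominal", "ordinal", "binary_symmetric", "binary_asymmetric", "numeric",
)


def parse_data_rows(lines):
    """Yield (row_identifier, values) pairs; '?' entries become None."""
    for fields in csv.reader(lines):
        if not fields:  # skip blank lines
            continue
        values = [v.strip() for v in fields[1:]]
        yield fields[0].strip(), [None if v == MISSING else v for v in values]


def parse_info_rows(lines):
    """Yield (attr_name, attr_type, possible_values) triples.

    possible_values is an empty list when the optional columns are omitted.
    """
    for fields in csv.reader(lines):
        if not fields:  # skip blank lines
            continue
        name, attr_type = fields[0].strip(), fields[1].strip()
        if attr_type not in SUPPORTED_TYPES:
            raise ValueError("unsupported attribute type: %s" % attr_type)
        possible = [f.strip() for f in fields[2:] if f.strip()]
        yield name, attr_type, possible


# Made-up sample lines for illustration; they are not taken from example_data.
data = list(parse_data_rows(["row1,red,?,3.5"]))
info = list(parse_info_rows(["color,nominal,red,green", "weight,numeric"]))
```

Both functions accept any iterable of text lines, so an open file object can be passed in place of the sample lists.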
Example data can be found in the example_data folder, retrieved from http://archive.ics.uci.edu/ml/datasets/Sponge.