- Python 2.7.x
$ python main.py --datafile <path_to_data> --datainfofile <path_to_data_info> --threshold <threshold>
- The values given to --datafile and --datainfofile can be either absolute or relative paths.
- The value of --threshold should be between 0 and 1.
The resulting clusters are printed to stdout by default. To write them to a file instead, pass the file's path with the --outputfile option:
$ python main.py --datafile <path_to_data> --datainfofile <path_to_data_info> --threshold <threshold> --outputfile <path_to_outputfile>
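For readers who want to wrap or reimplement the command line, the options above could be parsed and checked roughly as follows. This is a minimal sketch using the standard argparse module, not the actual contents of main.py: the names `unit_interval` and `build_parser` are hypothetical, and treating the threshold bounds as inclusive is an assumption.

```python
import argparse


def unit_interval(text):
    """Convert text to a float and reject values outside [0, 1]."""
    value = float(text)
    # Assumption: both 0 and 1 are accepted as valid thresholds.
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError("threshold must be between 0 and 1")
    return value


def build_parser():
    """Build a parser for the four options described in this README."""
    parser = argparse.ArgumentParser(description="Hypothetical CLI sketch")
    parser.add_argument("--datafile", required=True)
    parser.add_argument("--datainfofile", required=True)
    parser.add_argument("--threshold", required=True, type=unit_interval)
    # No output file given means results go to stdout.
    parser.add_argument("--outputfile", default=None)
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(
    ["--datafile", "data.csv", "--datainfofile", "info.csv", "--threshold", "0.5"]
)
```

Validating the threshold in the `type` callback means an out-of-range value is reported as a normal usage error before any data is read.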
As stated above, the script expects two files:
- --datafile - the data set, which must be a CSV with each line of the form: <row_identifier>,<attr1_value>,<attr2_value>,...
  Missing attribute values must be represented with the value ?.
- --datainfofile - metadata that describes the data set, which must be a CSV with one line describing each attribute available in the data set: <attr_name>,<attr_type>,<attr_possible_values>,...
  The supported values for <attr_type> are nominal, ordinal, binary_symmetric, binary_asymmetric, and numeric. The value for <attr_possible_values> may be omitted.
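The two file layouts described above can be read with the standard csv module. The sketch below is illustrative only and is not taken from main.py: `read_data` and `read_data_info` are hypothetical helper names, the sample lines are made up, and it assumes that the possible values occupy all remaining fields of a metadata line.

```python
import csv

MISSING = "?"  # marker this README specifies for missing attribute values


def read_data(lines):
    """Map each row identifier to its attribute values (None if missing)."""
    rows = {}
    for record in csv.reader(lines):
        if not record:
            continue  # skip blank lines
        rows[record[0]] = [None if v == MISSING else v for v in record[1:]]
    return rows


def read_data_info(lines):
    """Return one (name, type, possible_values) tuple per attribute."""
    info = []
    for record in csv.reader(lines):
        if not record:
            continue  # skip blank lines
        # Assumption: everything after the type is a possible value.
        info.append((record[0], record[1], record[2:]))
    return info


# Made-up sample content, not taken from the example_data folder.
data = read_data(["r1,red,?,3.5", "r2,blue,yes,1.0"])
info = read_data_info(
    ["colour,nominal,red,blue", "flag,binary_symmetric", "size,numeric"]
)
```

Both helpers accept any iterable of lines, so an open file object can be passed in place of the lists used here.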
Example data can be found in the example_data folder, retrieved from http://archive.ics.uci.edu/ml/datasets/Sponge.