
Classifiers

A small collection of algorithms for building classification models from data. This library uses dataset-utils; a further classification algorithm is provided in libsvm.

Exported Procedures

Each classifier is constructed with a procedure of the form (build-X training-data target-class). Training data are a relation, as defined in dataset-utils, and target-class is the name of the attribute to be used for the target classification.

Generic procedures

(classify-instance classifier instance) procedure

Given a classifier and a data instance, returns a classification.

(to-string classifier) procedure

Given a classifier, returns a string representation of the classifier model.

ZeroR

A simple rule: always predicts the majority class of the training data.

(build-zero-r training-data target-class) procedure

Constructs an instance of a ZeroR classifier. Training data are a relation, as defined in dataset-utils, and target-class is the name of the attribute to be used for the target classification.
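The majority-class rule can be sketched in a few lines of plain Scheme. This is only an illustration of the idea, not the code used by build-zero-r; the helpers tally and majority-class are hypothetical names, and the class labels are assumed to be given as a simple list rather than as a dataset-utils relation.

```scheme
;; Illustrative sketch only: tally and majority-class are hypothetical
;; helpers and are not exported by the classifiers egg.

;; Count the items of a list into an association list of (item . count).
(define (tally items)
  (let loop ((items items) (counts '()))
    (cond ((null? items) counts)
          ((assoc (car items) counts)
           => (lambda (entry)
                (set-cdr! entry (+ 1 (cdr entry)))
                (loop (cdr items) counts)))
          (else (loop (cdr items)
                      (cons (cons (car items) 1) counts))))))

;; Return the most frequent label, or #f for an empty list.
;; Ties are broken arbitrarily.
(define (majority-class labels)
  (let loop ((counts (tally labels)) (best #f))
    (cond ((null? counts) (and best (car best)))
          ((or (not best) (> (cdar counts) (cdr best)))
           (loop (cdr counts) (car counts)))
          (else (loop (cdr counts) best)))))

;; The 'play column of the weather data used in the Example section.
(define play-labels
  '(no no yes yes yes no yes no yes yes yes yes yes no))
```

Evaluating (majority-class play-labels) on this column selects yes, since yes occurs nine times and no five times.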

OneR

Finds the single attribute which best predicts the target class on the training data, and classifies instances using that attribute alone.

(build-one-r training-data target-class) procedure

Constructs an instance of a OneR classifier. Training data are a relation, as defined in dataset-utils, and target-class is the name of the attribute to be used for the target classification.
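As a rough illustration of how a single best attribute might be chosen, the sketch below scores each attribute by the number of training rows it misclassifies when every value of the attribute predicts its own majority class. It is written in plain Scheme over lists of symbols and makes no claim to match the internals of build-one-r. All helper names (tally, keep, column, one-r-errors, best-attribute) are hypothetical, attributes are addressed by position rather than by name, and ties are assumed to go to the earliest attribute.

```scheme
;; Illustrative sketch only: none of these helpers belong to the
;; classifiers egg, and rows are plain lists, not a relation.

(define (tally items)                 ; alist of (item . count)
  (let loop ((items items) (counts '()))
    (cond ((null? items) counts)
          ((assoc (car items) counts)
           => (lambda (entry)
                (set-cdr! entry (+ 1 (cdr entry)))
                (loop (cdr items) counts)))
          (else (loop (cdr items)
                      (cons (cons (car items) 1) counts))))))

(define (keep pred items)             ; items satisfying pred, in order
  (cond ((null? items) '())
        ((pred (car items)) (cons (car items) (keep pred (cdr items))))
        (else (keep pred (cdr items)))))

(define (column rows index)           ; one attribute across all rows
  (map (lambda (row) (list-ref row index)) rows))

;; Rows misclassified when each attribute value predicts its majority class.
(define (one-r-errors rows attr-index class-index)
  (let loop ((vals (map car (tally (column rows attr-index))))
             (errors 0))
    (if (null? vals)
        errors
        (let* ((subset (keep (lambda (row)
                               (equal? (list-ref row attr-index) (car vals)))
                             rows))
               (counts (tally (column subset class-index))))
          (loop (cdr vals)
                (+ errors (- (length subset)
                             (apply max (map cdr counts)))))))))

;; Index of the attribute with the fewest errors; the earliest wins ties.
(define (best-attribute rows attr-indices class-index)
  (let loop ((indices (cdr attr-indices))
             (best (car attr-indices))
             (best-errors (one-r-errors rows (car attr-indices) class-index)))
    (if (null? indices)
        best
        (let ((errors (one-r-errors rows (car indices) class-index)))
          (if (< errors best-errors)
              (loop (cdr indices) (car indices) errors)
              (loop (cdr indices) best best-errors))))))

;; Weather data from the Example section:
;; outlook, temperature, humidity, windy, play.
(define weather-rows
  '((sunny hot high false no)      (sunny hot high true no)
    (overcast hot high false yes)  (rainy mild high false yes)
    (rainy cool normal false yes)  (rainy cool normal true no)
    (overcast cool normal true yes) (sunny mild high false no)
    (sunny cool normal false yes)  (rainy mild normal false yes)
    (sunny mild normal true yes)   (overcast mild high true yes)
    (overcast hot normal false yes) (rainy mild high true no)))
```

On this data outlook (index 0) and humidity (index 2) both misclassify four rows, so the result of (best-attribute weather-rows '(0 1 2 3) 4) depends on the tie-breaking rule; with the earliest-wins rule assumed here it is 0.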

ID3

(build-id3 training-data target-class) procedure

Constructs an instance of a decision-tree classifier using the ID3 algorithm. Training data are a relation, as defined in dataset-utils, and target-class is the name of the attribute to be used for the target classification. All attributes must be nominal.
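ID3 conventionally chooses the attribute to split on by information gain, that is, the reduction in class entropy obtained by partitioning the data on that attribute. The sketch below computes this quantity in plain Scheme for nominal attributes. It is an illustration of the standard criterion under stated assumptions, not the code behind build-id3: the helpers (tally, keep, column, entropy, information-gain) are hypothetical, rows are plain lists, and attributes are addressed by position.

```scheme
;; Illustrative sketch only: none of these helpers belong to the
;; classifiers egg, and rows are plain lists, not a relation.

(define (tally items)                 ; alist of (item . count)
  (let loop ((items items) (counts '()))
    (cond ((null? items) counts)
          ((assoc (car items) counts)
           => (lambda (entry)
                (set-cdr! entry (+ 1 (cdr entry)))
                (loop (cdr items) counts)))
          (else (loop (cdr items)
                      (cons (cons (car items) 1) counts))))))

(define (keep pred items)             ; items satisfying pred, in order
  (cond ((null? items) '())
        ((pred (car items)) (cons (car items) (keep pred (cdr items))))
        (else (keep pred (cdr items)))))

(define (column rows index)           ; one attribute across all rows
  (map (lambda (row) (list-ref row index)) rows))

;; Shannon entropy, in bits, of a list of class labels.
(define (entropy labels)
  (let ((n (length labels)))
    (let loop ((counts (tally labels)) (sum 0))
      (if (null? counts)
          sum
          (let ((p (exact->inexact (/ (cdar counts) n))))
            (loop (cdr counts)
                  (- sum (* p (/ (log p) (log 2))))))))))

;; Class entropy minus the size-weighted entropy of each partition
;; induced by the attribute at attr-index.
(define (information-gain rows attr-index class-index)
  (let ((n (length rows)))
    (let loop ((vals (map car (tally (column rows attr-index))))
               (gain (entropy (column rows class-index))))
      (if (null? vals)
          gain
          (let ((subset (keep (lambda (row)
                                (equal? (list-ref row attr-index) (car vals)))
                              rows)))
            (loop (cdr vals)
                  (- gain (* (/ (length subset) n)
                             (entropy (column subset class-index))))))))))

;; Weather data from the Example section:
;; outlook, temperature, humidity, windy, play.
(define weather-rows
  '((sunny hot high false no)      (sunny hot high true no)
    (overcast hot high false yes)  (rainy mild high false yes)
    (rainy cool normal false yes)  (rainy cool normal true no)
    (overcast cool normal true yes) (sunny mild high false no)
    (sunny cool normal false yes)  (rainy mild normal false yes)
    (sunny mild normal true yes)   (overcast mild high true yes)
    (overcast hot normal false yes) (rainy mild high true no)))
```

For this data (information-gain weather-rows 0 4) is larger than the gain of any other attribute, which is consistent with outlook appearing at the root of the tree printed in the Example section.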

Instance-based

(build-one-nn training-data target-class #!key metric) procedure

Constructs a simple instance-based classifier, which returns the class of the nearest neighbour for any test instance. Training data are a relation, as defined in dataset-utils, and target-class is the name of the attribute to be used for the target classification. metric is a function taking two instance definitions and returning a real number reflecting the distance between the instances; see dataset-utils for some standard metrics. metric defaults to euclidean-distance.
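The role of the metric argument can be illustrated with a minimal nearest-neighbour sketch in plain Scheme. This is not the implementation of build-one-nn and does not use the dataset-utils instance representation: here a training example is assumed to be a pair of (features . class), features are lists of numbers, and the names my-euclidean, my-manhattan and nearest-class are hypothetical.

```scheme
;; Illustrative sketch only: these procedures are hypothetical and are
;; not the metrics or classifier provided by dataset-utils/classifiers.

;; Straight-line distance between two equal-length lists of numbers.
(define (my-euclidean a b)
  (sqrt (apply + (map (lambda (x y)
                        (let ((d (- x y))) (* d d)))
                      a b))))

;; Sum of absolute coordinate differences.
(define (my-manhattan a b)
  (apply + (map (lambda (x y) (abs (- x y))) a b)))

;; Class of the training example closest to query under metric.
;; training must be a non-empty list of (features . class) pairs;
;; the earliest example wins ties.
(define (nearest-class training query metric)
  (let loop ((rest (cdr training))
             (best (car training))
             (best-dist (metric (caar training) query)))
    (if (null? rest)
        (cdr best)
        (let ((d (metric (caar rest) query)))
          (if (< d best-dist)
              (loop (cdr rest) (car rest) d)
              (loop (cdr rest) best best-dist))))))

;; A tiny made-up training set with two well-separated classes.
(define training
  '(((1 1) . a) ((1 2) . a) ((5 5) . b) ((6 5) . b)))
```

With this made-up data, (nearest-class training '(2 1) my-euclidean) selects a and (nearest-class training '(5 4) my-manhattan) selects b. Passing the metric as an argument keeps the search loop independent of how distance is measured, which is the same separation the metric keyword provides.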

Example

#;1> (use dataset-utils)
#;2> (use classifiers)
#;3> (use format)
#;4> (define weather 
           (make-relation 'plays-tennis
                 (list (make-nominal-attribute 'outlook 'sunny 'overcast 'rainy)
                       (make-nominal-attribute 'temperature 'hot 'mild 'cool)
                       (make-nominal-attribute 'humidity 'high 'normal)
                       (make-nominal-attribute 'windy 'true 'false)
                       (make-nominal-attribute 'play 'yes 'no))
                 '((sunny hot high false no)
                   (sunny hot high true no)
                   (overcast hot high false yes)
                   (rainy mild high false yes)
                   (rainy cool normal false yes)
                   (rainy cool normal true no)
                   (overcast cool normal true yes)
                   (sunny mild high false no)
                   (sunny cool normal false yes)
                   (rainy mild normal false yes)
                   (sunny mild normal true yes)
                   (overcast mild high true yes)
                   (overcast hot normal false yes)
                   (rainy mild high true no)))) 
#;5> (define one-r (build-one-r weather 'play))
#;6> (format #t "~a~&" (to-string one-r))
OneR: uses attribute outlook
  : sunny => no
  : rainy => yes
  : overcast => yes

#t
#;7> (define id3 (build-id3 weather 'play))
#;8> (format #t "~a~&" (to-string id3))
ID3
Node: outlook
 value rainy
  Node: windy
   value false
    Leaf: yes
   value true
    Leaf: no
 value overcast
  Leaf: yes
 value sunny
  Node: humidity
   value normal
    Leaf: yes
   value high
    Leaf: no
#;9> (for-each (lambda (item) 
                    (format #t "~a Prediction is: ~a~&"
                              item
                              (classify-instance id3 item)))
         (relation-data weather))
(sunny hot high false no) Prediction is: no
(sunny hot high true no) Prediction is: no
(overcast hot high false yes) Prediction is: yes
(rainy mild high false yes) Prediction is: yes
(rainy cool normal false yes) Prediction is: yes
(rainy cool normal true no) Prediction is: no
(overcast cool normal true yes) Prediction is: yes
(sunny mild high false no) Prediction is: no
(sunny cool normal false yes) Prediction is: yes
(rainy mild normal false yes) Prediction is: yes
(sunny mild normal true yes) Prediction is: yes
(overcast mild high true yes) Prediction is: yes
(overcast hot normal false yes) Prediction is: yes
(rainy mild high true no) Prediction is: no

Author

Peter Lane.

License

GPL version 3.0.

Requirements

Requires dataset-utils.

Version History

In trunk.
