Outdated egg!
This is an egg for CHICKEN 4, the unsupported old release. You're almost certainly looking for the CHICKEN 5 version of this egg, if it exists.
If it does not exist, there may be equivalent functionality provided by another egg; have a look at the egg index. Otherwise, please consider porting this egg to the current version of CHICKEN.
Dataset Utilities
A set of routines to load and manage datasets for machine learning / data mining tasks.
A dataset is a table:
Outlook | Temperature | Humidity | Windy | Plays |
---|---|---|---|---|
sunny | hot | high | false | no |
sunny | hot | high | true | no |
Each column in the table is an attribute, and each row is an instance. Instances have values for each attribute. The whole table is called a relation, and can be given a name.
Exported Procedures
Creating datasets
- make-nominal-attribute name value-1 ... [procedure]
Creates a nominal attribute with the given name and possible values, e.g.:
> (make-nominal-attribute 'outlook 'sunny 'overcast 'rainy)
- make-numeric-attribute name [procedure]
Creates a numeric attribute with the given name, e.g.:
> (make-numeric-attribute 'temperature)
- make-relation name attributes data [procedure]
Creates a relation with the given name. attributes must be a list of attributes, as created by make-nominal-attribute or make-numeric-attribute, and data must be a list of lists: each sublist represents one instance and gives that instance's value for every attribute.
> (make-relation 'plays-tennis
    (list (make-nominal-attribute 'outlook 'sunny 'overcast 'rainy)
          (make-nominal-attribute 'temperature 'hot 'mild 'cool)
          (make-nominal-attribute 'humidity 'high 'normal)
          (make-nominal-attribute 'windy 'true 'false)
          (make-nominal-attribute 'plays 'yes 'no))
    '((sunny hot high false no)
      (sunny hot high true no)
      (overcast hot high false yes)
      ...
      (rainy mild high true no)))
Managing datasets
- attribute-name attribute [procedure]
Returns the name of the given attribute.
- attribute-definition attribute [procedure]
Returns a definition of the type of the given attribute. This definition will be one of:
  - '(numeric) for numeric attributes
  - '(nominal value-1 ...) for nominal attributes, listing the possible values
- class-probability relation attribute-name value [procedure]
Returns the proportion of instances in relation that have the given value for attribute-name.
- entropy relation attribute-name [procedure]
Computes the entropy of the given relation, using attribute-name to divide the relation into groups. attribute-name should name a nominal attribute.
- filter-instances relation attribute-name value [procedure]
Returns a new relation containing those instances of relation which have the given value for attribute-name.
- find-attribute-index relation attribute-name [procedure]
Returns the index number of the given attribute name in relation.
- get-attribute-values relation attribute-name [procedure]
Returns the values taken by the instances in relation for the given attribute name.
- information-gain relation target-class attribute-name [procedure]
Computes the information gain obtained by splitting the data in relation on attribute-name, relative to the entropy of the data as they are; target-class is used to compute the entropy.
- relation-attributes relation [procedure]
Returns a list of the attributes of the given relation.
- relation-data relation [procedure]
Returns a list of the instances in the given relation.
- relation-name relation [procedure]
Returns the name of the given relation.
- split-instances relation attribute-name [procedure]
Given the name of a nominal attribute, returns a list of relations, each containing the instances of relation that share the same value for attribute-name.
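The entropy and information-gain entries above can be illustrated with a small self-contained sketch. It is not the egg's implementation: plain lists stand in for relations, attributes are addressed by column index rather than by name, every procedure name is hypothetical, and the base-2 logarithm (entropy in bits) is an assumption, since this page does not say which base the egg uses. Entropy is taken as the sum of -p * log2(p) over the proportions p of each value, and information gain as the entropy of the whole data minus the size-weighted entropy of each group produced by the split.

```scheme
;; Sketch only: plain lists stand in for the egg's relation objects.
;; All procedure names below are hypothetical, not part of the egg.

;; Count occurrences of each distinct symbol; returns ((value . count) ...).
(define (count-values values)
  (let loop ((rest values) (counts '()))
    (cond ((null? rest) counts)
          ((assq (car rest) counts)
           => (lambda (entry)
                (set-cdr! entry (+ 1 (cdr entry)))
                (loop (cdr rest) counts)))
          (else (loop (cdr rest)
                      (cons (cons (car rest) 1) counts))))))

;; Shannon entropy of a list of symbols, in bits (base 2 is an assumption).
(define (values-entropy values)
  (let ((n (length values)))
    (let loop ((counts (count-values values)) (sum 0))
      (if (null? counts)
          sum
          (let ((p (/ (cdar counts) n)))
            (loop (cdr counts)
                  (- sum (* p (/ (log p) (log 2))))))))))

;; Keep the rows whose column `index` holds `value`.
(define (rows-with rows index value)
  (let loop ((rest rows) (kept '()))
    (cond ((null? rest) (reverse kept))
          ((eq? (list-ref (car rest) index) value)
           (loop (cdr rest) (cons (car rest) kept)))
          (else (loop (cdr rest) kept)))))

;; Entropy of the rows, grouped by the values in column `target`.
(define (rows-entropy rows target)
  (values-entropy (map (lambda (row) (list-ref row target)) rows)))

;; Information gain of splitting `rows` on column `index`:
;; entropy before the split minus the weighted entropy after it.
(define (rows-gain rows target index)
  (let ((n (length rows)))
    (let loop ((groups (count-values
                        (map (lambda (row) (list-ref row index)) rows)))
               (weighted 0))
      (if (null? groups)
          (- (rows-entropy rows target) weighted)
          (loop (cdr groups)
                (+ weighted
                   (* (/ (cdar groups) n)
                      (rows-entropy (rows-with rows index (caar groups))
                                    target))))))))

;; The four instances written out in the make-relation example.
(define rows
  '((sunny hot high false no)
    (sunny hot high true no)
    (overcast hot high false yes)
    (rainy mild high true no)))

(rows-entropy rows 4)   ; entropy of the "plays" column
(rows-gain rows 4 3)    ; gain from splitting on "windy"
```

On these four rows the sketch gives roughly 0.81 bits of entropy for the plays column and a gain of roughly 0.31 bits for splitting on windy. Splitting on outlook leaves every group pure, so its gain equals the full entropy.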
Metrics
- euclidean-distance instance-1 instance-2 [procedure]
Computes the Euclidean distance between the two instances.
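As a rough illustration of the metric, the following sketch computes the Euclidean distance between two instances, assuming each is a list of numbers of equal length. The name distance-sketch is hypothetical, and this page does not say how the egg's own procedure treats nominal values, so the sketch covers numeric values only.

```scheme
;; Hypothetical stand-in for the egg's euclidean-distance; assumes both
;; instances are equal-length lists of numbers.
(define (distance-sketch instance-1 instance-2)
  (sqrt (apply + (map (lambda (x y)
                        (let ((d (- x y)))
                          (* d d)))   ; squared difference per attribute
                      instance-1 instance-2))))

(distance-sketch '(0 0) '(3 4))   ; the 3-4-5 triangle, so the distance is 5
```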
Importing Data
- read-arff filename [procedure]
Reads an ARFF definition from the given filename, and returns a relation. Currently supports only nominal and numeric attribute types; sparse ARFF files are not supported.
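For orientation, a minimal ARFF file using only nominal and numeric attributes might look like the following. The file name weather.arff and the data values are made up for illustration, and the example sticks to the basic @relation, @attribute and @data declarations, since this page does not say which other parts of the format the reader accepts.

```
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute plays {yes, no}
@data
sunny,85,no
overcast,83,yes
```

Assuming the file is saved under that name, it would be loaded with:
> (read-arff "weather.arff")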
Author
License
GPL version 3.0.
Version History
in trunk.