dataframe

Tabular data structure implementation for data analysis.

Documentation

The dataframe library provides an interface for representing numerical data in tables with rows and columns. It is inspired by the various dataframe implementations found in R, Python and Racket.

The dataframe library also provides functions for saving dataframes to ports and loading them back, as well as routines for descriptive statistics and linear regression.

Columns

Each dataframe consists of a collection of columns. Each column is an object consisting of a unique key, a data collection, and an association list of properties. The following operations are defined on columns.

[procedure] (column? obj)

Returns true if the given object is a column.

[procedure] (get-column-properties column)

Returns an association list with the column properties.

[procedure] (get-column-key column)

Returns the key of the column.

[procedure] (get-column-collection column)

Returns the data collection of the column.

[procedure] (column-deserialize column port)

Loads the data collection of a column from the given port.

[procedure] (column-serialize column port)

Stores the data collection of a column to the given port in an s-expression format.

Creating dataframes

[procedure] (make-data-frame [column-key-compare: compare-symbol])

Creates and returns a new dataframe. The optional keyword argument column-key-compare is a procedure that specifies how column keys are compared; the default, compare-symbol, compares keys as symbols.

[procedure] (df-insert-column df key collection properties)

Inserts a new column with the given key, data collection, and properties. Returns a new dataframe with the inserted column.

[procedure] (df-insert-derived df parent-key key proc properties)

Inserts a derived column, that is, a column whose data elements are obtained by mapping the procedure proc over the elements of an existing (parent) column. Returns a new dataframe with the inserted column.

[procedure] (df-insert-columns df lseq)

Inserts the columns contained in the given lseq of column objects.

[procedure] (df-from-rows column-keys source [column-key-compare: compare-symbol])

Creates a dataframe with the given column keys and populates it with data from the row generator source. The optional keyword argument column-key-compare has the same meaning as in make-data-frame.

Accessing dataframes

[procedure] (show df)

Displays a subset of the rows and columns contained in the dataframe.

[procedure] (row-count df)

Returns the number of rows in the dataframe.

[procedure] (df-column df key)

Returns the column indicated by the given key.

[procedure] (df-columns df)

Returns a lazy sequence containing the columns of the dataframe.

[procedure] (df-filter-columns df proc)

Returns an lseq of the columns of the dataframe that satisfy the predicate procedure proc.

[procedure] (df-select-columns df keys)

Returns an lseq of the columns of the dataframe whose keys appear in the given list of keys.

[procedure] (df-keys df)

Returns the keys of all columns in the dataframe.

[procedure] (df-items df)

Returns an lseq of the key-column pairs contained in the dataframe.

[procedure] (apply-collections proc df key ...)

Applies the given procedure to the data collections of the named columns of the dataframe and returns the result as a list.

[procedure] (apply-columns proc df key ...)

Applies the given procedure to the named columns of the dataframe and returns the result as a list.

[procedure] (map-collections proc df key ...)

Applies the given procedure to the data collections of the named columns of the dataframe and returns the result as a dataframe.

[procedure] (map-columns proc df key ...)

Applies the given procedure to the named columns of the dataframe and returns the result as a dataframe.

[procedure] (reduce-collections proc df seed key ...)

Folds proc over the data collections of the named columns, starting from the initial value seed.

Iterators

[procedure] (df-for-each-column df proc)

Applies proc to each column.

[procedure] (df-for-each-collection df proc)

Applies proc to the data collection of each column.

[procedure] (df-gen-rows df)

Returns a generator procedure that returns the dataframe rows in succession.

[procedure] (df-gen-columns df)

Returns a generator procedure that returns the dataframe columns in succession.
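For readers unfamiliar with generator procedures, the following is a minimal plain-Scheme sketch of the idea and does not use the dataframe egg. The names list->row-generator and generator->rows are hypothetical, rows are represented as lists purely for illustration, and the use of a caller-supplied sentinel to signal exhaustion is an assumption; the egg may signal the end of the sequence differently (for example with an end-of-file object, as SRFI 158 generators do).

```scheme
(import scheme (chicken base))

;; Hypothetical helper: turns a list into a generator, i.e. a procedure
;; of no arguments that returns the next element on each call.
;; Assumption: exhaustion is signalled by returning the sentinel `done`.
(define (list->row-generator rows done)
  (let ((rest rows))
    (lambda ()
      (if (null? rest)
          done
          (let ((row (car rest)))
            (set! rest (cdr rest))
            row)))))

;; Hypothetical helper: drains a generator into a list, stopping at the
;; sentinel.
(define (generator->rows gen done)
  (let loop ((acc '()))
    (let ((row (gen)))
      (if (eq? row done)
          (reverse acc)
          (loop (cons row acc))))))

(define gen (list->row-generator '((1 10) (2 20) (3 30)) 'done))

(gen)                        ; → (1 10)
(generator->rows gen 'done)  ; → ((2 20) (3 30))
```

Each call advances the internal state, so a generator can be consumed only once; a consumer that needs the rows again has to request a fresh generator.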

Descriptive statistics

[procedure] (describe df port)

Displays a table with the min/max/mean/sdev of each column in the dataframe.

[procedure] (cmin df)

Computes the minimum value of each column.

[procedure] (cmax df)

Computes the maximum value of each column.

[procedure] (mean df)

Computes the mean value of each column.

[procedure] (median df)

Computes the median value of each column.

[procedure] (mode df)

Computes the mode value of each column.

[procedure] (range df)

Computes the difference between the maximum and minimum values of each column.

[procedure] (percentile df)

Computes the percentile values of each column.

[procedure] (variance df)

Computes the variance of each column.

[procedure] (standard-deviation df)

Computes the standard deviation of each column.

[procedure] (coefficient-of-variation df)

Computes the coefficient of variation of each column.
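As a reminder of what these quantities mean, here is a small plain-Scheme sketch that computes them for a single list of numbers. It does not use the dataframe egg, and all helper names (sum, list-mean, list-variance, list-sdev, list-cv, list-range) are hypothetical. The sketch uses the sample variance (division by n - 1); the documentation does not state whether the egg divides by n or by n - 1, so its results may differ.

```scheme
(import scheme (chicken base))

;; Hypothetical helpers for illustration; not part of the dataframe egg.
(define (sum xs) (apply + xs))

(define (list-mean xs)
  (/ (sum xs) (length xs)))

;; Sample variance: sum of squared deviations divided by n - 1.
;; Assumption: the egg may divide by n instead.
(define (list-variance xs)
  (let ((m (list-mean xs)))
    (/ (sum (map (lambda (x) (let ((d (- x m))) (* d d))) xs))
       (- (length xs) 1))))

;; Standard deviation: square root of the variance.
(define (list-sdev xs) (sqrt (list-variance xs)))

;; Coefficient of variation: standard deviation relative to the mean.
(define (list-cv xs) (/ (list-sdev xs) (list-mean xs)))

;; Range: maximum minus minimum.
(define (list-range xs) (- (apply max xs) (apply min xs)))

(define sample '(2 4 6 8 10))

(list-mean sample)      ; → 6
(list-variance sample)  ; → 10
(list-range sample)     ; → 8
(list-cv sample)        ; sqrt(10) / 6, roughly 0.527
```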

Regression and correlation

[procedure] (linear-regression df x y)

Computes a linear regression between the columns with keys x and y.

[procedure] (correlation-coefficient df x y)

Computes the correlation coefficient between the columns with keys x and y.
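The following plain-Scheme sketch shows the textbook computations behind these two quantities: an ordinary least-squares line fit and the Pearson correlation coefficient. It does not use the dataframe egg; the helper names (sum-cross-dev, fit-line, pearson-r) and the (intercept . slope) return value are hypothetical. The documentation does not describe the format of the egg's results or which correlation coefficient it computes, so treat this as an illustration only.

```scheme
(import scheme (chicken base))

;; Hypothetical helpers for illustration; not the egg's implementation.
(define (sum xs) (apply + xs))
(define (list-mean xs) (/ (sum xs) (length xs)))

;; Sum of products of deviations from the means:
;; S_xy = sum over i of (x_i - mean(x)) * (y_i - mean(y)).
(define (sum-cross-dev xs ys)
  (let ((mx (list-mean xs))
        (my (list-mean ys)))
    (sum (map (lambda (x y) (* (- x mx) (- y my))) xs ys))))

;; Least-squares fit of y = intercept + slope * x.
;; Returns the pair (intercept . slope).
(define (fit-line xs ys)
  (let* ((slope (/ (sum-cross-dev xs ys) (sum-cross-dev xs xs)))
         (intercept (- (list-mean ys) (* slope (list-mean xs)))))
    (cons intercept slope)))

;; Pearson correlation coefficient: S_xy / sqrt(S_xx * S_yy).
(define (pearson-r xs ys)
  (/ (sum-cross-dev xs ys)
     (sqrt (* (sum-cross-dev xs xs) (sum-cross-dev ys ys)))))

(define xs '(0 1 2 3 4))
(define ys (map (lambda (x) (+ 1 (* 2 x))) xs)) ; exactly y = 1 + 2x

(fit-line xs ys)   ; intercept 1, slope 2
(pearson-r xs ys)  ; 1, up to floating-point rounding
```

Because the test data lie exactly on a line, the fit recovers the coefficients and the correlation is 1; data that are not linearly related give a coefficient closer to 0.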

I/O

[procedure] (df-serialize df port)

Stores the dataframe in an s-expression format to the given port.

[procedure] (df-deserialize df port)

Loads the data collections of the dataframe columns from the given port.

Examples

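;; srfi-1 is imported for list-tabulate
(import scheme srfi-1 yasos dataframe dataframe-statistics)

;; start from an empty dataframe
(define df (make-data-frame))

;; column 'base holds the integers -10 .. 89
(define df1
  (df-insert-column
   df
   'base
   (list-tabulate 100 (lambda (x) (- x 10)))
   '()))

;; exponential series: column 'exp is derived from 'base as 2 * e^(0.1 x)
(define df2
  (df-insert-derived
   df1 'base 'exp
   (lambda (x) (* 2.0 (exp (* 0.1 x))))
   '()))

;; display a subset of the data, then the summary statistics
(show df2 #f)
(describe df2 #f)

;; linear regression with x = 'base and y = 'exp
(linear-regression df2 'base 'exp)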

About this egg

Author

Ivan Raikov

Repository

https://github.com/iraikov/chicken-dataframe

Version history

0.1
Initial release

License

Copyright 2019 Ivan Raikov.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or (at
your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details.

A full copy of the GPL license can be found at
<http://www.gnu.org/licenses/>.
