generalized-arrays

Outdated egg!

This is an egg for CHICKEN 4, the unsupported old release. You're almost certainly looking for the CHICKEN 5 version of this egg, if it exists.

If it does not exist, there may be equivalent functionality provided by another egg; have a look at the egg index. Otherwise, please consider porting this egg to the current version of CHICKEN.

Generalized Arrays and Storage Classes

Provides a multidimensional array data type and associated reader and writer syntax. It also provides a storage class API, which backs the item storage for array types.

This egg is an evolution and implementation of the ArraysCowan API (see the relevant ArraysCowan.md file). I call it an evolution because, over the course of implementing the API, some interfaces were simplified, and I have used CHICKEN-specific language extensions (such as #!optional and #!rest) to make the code simpler where necessary. This egg is not specifically written to be portable across Scheme implementations, but it can likely be made so with minimal effort. I am not opposed to any patches that contribute towards R7RS compliance.

Why not SRFI-122?

One question that might arise, given that this is a CHICKEN-specific extension, is "Why did you choose not to use SRFI-122?" The purpose of the SRFI process is to provide a useful set of functionality for common data types and language extensions. It does this by providing a reference implementation and documentation describing a standard set of functionality, as in the case of SRFI-122. A natural expectation might be that as long as a standard API and reference implementation are available, it is preferable to use them rather than fragment the community across different, non-portable extensions. In principle, I agree. In this case, however, I have specific objections to SRFI-122 that made it difficult to adopt for my intended use case.

My primary objection to SRFI-122 is with regard to the API. Firstly, there is a schism between intervals and arrays. Both data types are provided by SRFI-122, and they represent different concepts. This schism can make it somewhat difficult to predict performance for array vs. interval operations. In addition, the array objects themselves are lazy: they do not evaluate any transformations applied to them until array-ref or some equivalent is called on an array cell. There are times when this is preferable (e.g. performing many operations on a single array and only having to loop through it once), but by and large I have avoided doing this here, because the cost of accessing a single element or many elements cannot be determined in the general case. I do not disparage the SRFI-122 authors for their approach. However, while implementing this egg I have made specific decisions that should make it easier to interoperate this array API with BLAS and LAPACK optimizations in the future, without requiring special interfaces and while reducing copies as much as is feasible.

On a slightly less relevant note, I found the documentation for SRFI-122 to be very math-heavy, and it may be somewhat inscrutable for beginners. I likewise find working with its interfaces burdensome compared to the ArraysCowan API, which is in some ways a lot closer to NumPy or APL, for users coming from those languages. As an aside, if the SRFI committee eventually wants to turn the ArraysCowan API into an SRFI, they are welcome to leverage the work I have done here.

This is incomplete compared to the ArraysCowan API!

This egg does not implement array-inner-product or array-outer-product. While I am open to any patch that contributes such functionality, I chose not to implement these initially because I expect that many users would prefer BLAS-optimized array/matrix products, and I did not want to spend forever implementing and releasing this library. If you notice anything obvious that seems to be missing, please contact me and I'd be happy to discuss whether it can be added.

Installation

   $ chicken-install -s generalized-arrays

This installs both the generalized-arrays module and storage-classes module. This egg depends on the following eggs:

   check-errors
   matchable
   numbers
   srfi-133
   typed-records

Note that numbers is only included as a dependency for complex number support, but otherwise these eggs should not affect regular usage.

Usage

   (use generalized-arrays storage-classes)

Note that while you may choose to use the storage class API on its own, it may be very difficult to use the arrays API without defining any storage classes.

Terminology

NOTE: Much of this is copied verbatim from the ArraysCowan description, because this egg originated as an implementation of that API.

An array is an object of a disjoint type, a container with elements arranged according to a rectilinear coordinate system. An array can have any number of dimensions or axes, including zero; this number is called the rank of the array. Arrays of rank zero contain exactly one element. Note that "rank" is a Fortran, Common Lisp, and APL term that has nothing to do with matrix ranks in the sense of linear algebra.

Each axis has an extent represented by two exact integers, the first representing the smallest possible coordinate for that axis, and the second representing the largest possible coordinate plus one. Extent is also used, by a mild abuse of language, for the difference between the two values. The smallest coordinates are collected into a Scheme vector known as the lower bound of the array; the largest coordinates plus one are collected into another Scheme vector known as the upper bound of the array. An index of the array is any Scheme vector of exact integers which has the same number of elements as the array's rank, and whose values each lie between the lower bound (inclusive) and the upper bound (exclusive) of the corresponding axis. Note that in this implementation, the lower bound of an array will always be zero for every axis.
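The index rule above can be sketched in plain Scheme. This is only an illustration: valid-index? is a hypothetical helper, not a procedure exported by this egg, and it takes the lower and upper bounds as explicit vectors.

```scheme
;; Hypothetical helper for illustration; not part of the egg's API.
;; index, lower and upper are Scheme vectors with one entry per axis.
(define (valid-index? index lower upper)
  (and (vector? index)
       (= (vector-length index) (vector-length lower))
       (let loop ((axis 0))
         (or (= axis (vector-length index))
             (let ((x (vector-ref index axis)))
               (and (integer? x)
                    (exact? x)
                    (<= (vector-ref lower axis) x)   ; lower bound is inclusive
                    (< x (vector-ref upper axis))    ; upper bound is exclusive
                    (loop (+ axis 1))))))))

;; A 2x3 array: lower bound #(0 0), upper bound #(2 3).
(valid-index? (vector 1 2) (vector 0 0) (vector 2 3))  ; => #t
(valid-index? (vector 2 0) (vector 0 0) (vector 2 3))  ; => #f, 2 is the exclusive upper bound
(valid-index? (vector 1)   (vector 0 0) (vector 2 3))  ; => #f, wrong rank
```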

An array can be a general array, meaning each element can be any Scheme object, or it can be a specialized array, meaning that each element can only belong to a given restricted type. This is accomplished by separating array objects from the underlying storage objects, which can be Scheme vectors or numeric vectors or other objects. Any object which can map a non-negative exact integer into an appropriate value can serve as a storage object by writing a storage class for it. Note that it is an error if the extent of any axis is non-positive.

In order to map an array's index (a Scheme vector of exact integers) into a storage index (a single exact integer), each array maintains another associated vector of exact integers, the stride, as well as a single additional exact integer, the offset. Multiplying each element of the stride by the corresponding element of the array index, summing the results, and adding the offset produces the corresponding storage index. Therefore, the offset is the storage index of the element whose array index consists of all zeros.
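As a minimal sketch of this mapping, the following plain Scheme procedure computes a storage index from an array index, a stride, and an offset. The name storage-index is hypothetical and is not part of this egg's API.

```scheme
;; Hypothetical helper for illustration; not part of the egg's API.
;; index and stride are Scheme vectors of exact integers of equal length.
(define (storage-index index stride offset)
  (let loop ((axis 0) (sum offset))
    (if (= axis (vector-length index))
        sum
        (loop (+ axis 1)
              (+ sum (* (vector-ref index axis)
                        (vector-ref stride axis)))))))

;; A 2x3 array laid out in row-major order has stride #(3 1).
(storage-index (vector 1 2) (vector 3 1) 0)  ; => 5
;; The all-zeros index maps to the offset itself.
(storage-index (vector 0 0) (vector 3 1) 4)  ; => 4
```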

Procedures that accept an array object and return a new array sharing the same storage object but with different upper bound, lower bound, stride, and/or offset are known as array transformations, and this egg provides a number of them. The egg also provides other procedures for operating on arrays, all of which have the property that they are meaningful no matter what the elements of the array are. So array-map can be used to sum two arrays, since that is done element-wise with the + operation; but there is no operation provided for ordinary matrix multiplication, because it depends on the array elements being solely numeric, which the elements of two arbitrary arrays may not be.
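To illustrate how a transformation can share storage, here is a small sketch in plain Scheme. It does not use the egg's array type: view-ref and shared-storage are hypothetical names, and a "view" is modeled as nothing more than a stride and an offset applied to one shared vector.

```scheme
;; Hypothetical model for illustration; not the egg's array record.
;; Reads the element at `index` through a (stride, offset) view of `storage`.
(define (view-ref storage stride offset index)
  (let loop ((axis 0) (k offset))
    (if (= axis (vector-length index))
        (vector-ref storage k)
        (loop (+ axis 1)
              (+ k (* (vector-ref index axis)
                      (vector-ref stride axis)))))))

;; One storage object holding a 2x3 array in row-major order.
(define shared-storage (vector 'a 'b 'c 'd 'e 'f))

;; Original 2x3 view: stride #(3 1), offset 0.
(view-ref shared-storage (vector 3 1) 0 (vector 1 2))  ; => f
;; Transposed 3x2 view: same storage, strides swapped.
(view-ref shared-storage (vector 1 3) 0 (vector 2 1))  ; => f
;; Second row as a rank-1 view: stride #(1), offset 3.
(view-ref shared-storage (vector 1) 3 (vector 0))      ; => d
```

No elements are copied in any of these cases; only the stride and offset differ between views.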

In the same way that the names start and end are applied to optional numerical indexes that default to the smallest element of a sequence (list, vector, string, or whatever) and the largest element plus one, in this egg they default to the lower bound and the upper bound of an array.

In certain procedures, the elements of an array are processed in lexicographic order, also known as row-major order. This means the order in which the highest-numbered axis changes most rapidly, and the other axes change only when the following axis returns to its lowest value.
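The ordering can be made concrete with a short plain Scheme sketch that lists every index of an array in row-major order. row-major-indices is a hypothetical helper, not part of this egg, and it assumes a lower bound of zero on every axis.

```scheme
;; Hypothetical helper for illustration; not part of the egg's API.
;; upper is the upper bound vector; the lower bound is assumed to be all zeros.
(define (row-major-indices upper)
  (define (build axis)
    (if (= axis (vector-length upper))
        (list '())
        (let ((tails (build (+ axis 1))))
          ;; Walk this axis from its last coordinate down to 0, prepending
          ;; each coordinate to every index tail of the remaining axes.
          (let loop ((i (- (vector-ref upper axis) 1)) (acc '()))
            (if (< i 0)
                acc
                (loop (- i 1)
                      (append (map (lambda (tail) (cons i tail)) tails)
                              acc)))))))
  (map list->vector (build 0)))

;; The last axis changes most rapidly.
(row-major-indices (vector 2 3))
;; => (#(0 0) #(0 1) #(0 2) #(1 0) #(1 1) #(1 2))
```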

NOTE: Arrays with high rank (>4) may have very poor performance compared to 1-4D arrays, because of some rank-specific algorithms used. I expect that few users will need 5D+ arrays, so this has been accepted as a trade-off in this implementation.

General Requirements

Where a procedure that is passed to any procedure defined in this egg receives an index as an argument, it is an error for that procedure to mutate the index.

NOTE: Array objects are immutable, but their storage objects are usually mutable. It is possible to create arrays that are prohibited from mutating their storage objects.

The procedures of this egg that are not transformations may return arrays whose stride is implementation-dependent (so that the order of elements in the storage object may be either row-major or column-major), but the offset is always 0.

API Description

Storage Classes

A storage class is a group of storage objects with the same behavior. A storage object maps a non-negative exact integer index into a storage location. There are standard storage classes, but it is also possible for programmers to create their own storage classes. Each storage class allows creating a storage object of a given size, accessing a location by its index, and mutating a location by its index to a new value. Note that the procedures used to do this need not be the standard procedures such as make-vector, vector-ref, and vector-set!; they may be more efficient equivalents. Storage classes allow arrays and similar objects to be polymorphic in the type of storage they use.
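The idea of polymorphism over storage can be sketched with a toy model in plain Scheme. Everything prefixed with toy- below is hypothetical and is not how this egg represents storage classes; the sketch only shows that generic code can work with any bundle of constructor, accessor, mutator, and sizer.

```scheme
;; Toy model for illustration only; not the egg's implementation.
;; A "storage class" here is just a vector of four procedures.
(define (toy-storage-class constructor accessor mutator sizer)
  (vector constructor accessor mutator sizer))

(define (toy-make class n fill) ((vector-ref class 0) n fill))
(define (toy-ref class obj k)   ((vector-ref class 1) obj k))
(define (toy-set! class obj k v) ((vector-ref class 2) obj k v))
(define (toy-length class obj)  ((vector-ref class 3) obj))

;; Two classes backed by different standard procedures.
(define toy-vector-class
  (toy-storage-class make-vector vector-ref vector-set! vector-length))
(define toy-string-class
  (toy-storage-class make-string string-ref string-set! string-length))

;; Generic code that never mentions vectors or strings directly.
(define (fill-and-read class fill new)
  (let ((obj (toy-make class 3 fill)))
    (toy-set! class obj 1 new)
    (list (toy-ref class obj 0)
          (toy-ref class obj 1)
          (toy-length class obj))))

(fill-and-read toy-vector-class 0 42)     ; => (0 42 3)
(fill-and-read toy-string-class #\a #\b)  ; => (#\a #\b 3)
```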

Constructors

make-storage-class short-id constructor accessor mutator sizer predicate default-fill [procedure]

Returns a storage class with the specified short-id, constructor, accessor, mutator, sizer, predicate, and default fill argument. An example might be as follows:

   (define vector-storage-class (make-storage-class 'v
                                                    make-vector
                                                    vector-ref
                                                    vector-set!
                                                    vector-length
                                                    vector?
                                                    (void)))

Predicates

storage-class? object [procedure]: Returns #t iff the object passed into the procedure is a storage class. Otherwise, returns #f.

Accessors

storage-class-short-id storage-class [procedure]: Returns the symbol representing the storage class's short ID (e.g. u32vector-storage-class has a short ID of u32).

storage-class-constructor storage-class [procedure]: Returns the constructor of storage-class. This procedure returns a storage object belonging to the storage class, and can be called with one or two arguments: the first is an exact non-negative integer specifying the size of the object. If objects of the class do not have a fixed size, the size must be specified as #f. The second is a value to fill all the elements with. If the second argument is omitted, the elements will have arbitrary contents.

storage-class-accessor storage-class [procedure]: Returns the accessor of storage-class as a procedure. This procedure takes two arguments, a storage object and an exact non-negative integer, and returns the value of the element indexed by the integer. It is an error if the index is greater than or equal to the size.

storage-class-mutator storage-class [procedure]: Returns the mutator of storage-class as a procedure. This procedure takes three arguments: a storage object, an exact non-negative integer, and a value. It mutates the element of the object specified by the index to be the value. It is an error if the index is greater than or equal to the size, or if the object is not capable of storing the value.

storage-class-sizer storage-class [procedure]: Returns the sizer of storage-class as a procedure. This procedure takes one argument, a storage object. It returns the size of the object specified when the object was created. This may be an exact non-negative integer.

storage-class-predicate storage-class [procedure]: Returns the predicate for testing whether an object is a storage object of storage-class. This procedure takes one argument, an object. It returns #t if the object is of the type described by the storage class, and #f otherwise.

storage-class-default-fill storage-class [procedure]: Returns the default fill value of storage-class. This is the value used to fill a newly created storage object of this class when no fill value is given.

Invokers

make-storage-object storage-class n #!optional fill [procedure]: Returns a newly allocated storage object with class storage-class and length n, filled with value fill, if specified. If fill is not specified, then the default fill value of storage-class is used.

storage-object-ref storage-class storage-object n [procedure]: Returns the nth element of storage-object as seen through the lens of storage-class. It is an error if n is not less than the size of storage-object.

storage-object-set! storage-class storage-object n value [procedure]: Mutates the nth element of storage-object as seen through the lens of storage-class so that its value is value. It is an error if n is not less than the size of storage-object.

storage-object-length storage-class storage-object [procedure]: Returns the size of storage-object as seen through the lens of storage-class.

Standard storage classes

vector-storage-class [constant]
u8vector-storage-class [constant]
s8vector-storage-class [constant]
u16vector-storage-class [constant]
s16vector-storage-class [constant]
u32vector-storage-class [constant]
s32vector-storage-class [constant]
f32vector-storage-class [constant]
f64vector-storage-class [constant]
c64vector-storage-class [constant]
c128vector-storage-class [constant]

Standard set of storage classes provided for convenience. Note that because CHICKEN does not provide u64 and s64 vectors in SRFI-4, these are not provided here. Further, 64 and 128-bit complex numbers are implemented as f32 and f64 vectors respectively. This is done because it is the form in which they would need to be passed to BLAS and LAPACK. Care has been taken so that these details are not leaked through the storage class API; however, they are exposed when using array-storage-object.
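To give a feel for what storing complex numbers in a real-valued vector can look like, here is a sketch in plain Scheme. It assumes an interleaved layout (real part followed by imaginary part), which is the convention BLAS and LAPACK use for complex data; whether this egg uses exactly this layout is an assumption. All toy- names are hypothetical, an ordinary vector stands in for an f32vector, and a complex value is modeled as a (real . imaginary) pair to avoid depending on the numbers egg.

```scheme
;; Toy model for illustration only; not the egg's implementation.
;; Complex element k occupies slots 2k (real part) and 2k+1 (imaginary part).
(define (make-toy-complex-storage n)
  (make-vector (* 2 n) 0.0))

(define (toy-complex-ref storage k)
  (cons (vector-ref storage (* 2 k))
        (vector-ref storage (+ (* 2 k) 1))))

(define (toy-complex-set! storage k re im)
  (vector-set! storage (* 2 k) re)
  (vector-set! storage (+ (* 2 k) 1) im))

;; The logical length is half the length of the underlying vector.
(define (toy-complex-length storage)
  (quotient (vector-length storage) 2))

(define toy-z (make-toy-complex-storage 2))
(toy-complex-set! toy-z 1 1.5 -2.0)
(toy-complex-ref toy-z 1)     ; => (1.5 . -2.0)
(toy-complex-length toy-z)    ; => 2
toy-z                         ; => #(0.0 0.0 1.5 -2.0), the raw layout
```

The last line shows the kind of detail that becomes visible once the raw storage object is inspected directly: the underlying vector holds twice as many slots as there are complex elements.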