This SRFI describes the API for a full-featured sort toolkit.
For more information see: SRFI-132: Sort Libraries
This SRFI is based on SRFI 32 by Olin Shivers, which was withdrawn twelve years ago.
There are two backward incompatible changes to the API. The most important one is that the comparison predicate now precedes the data rather than following it. Most of the SRFI 32 commenters thought that this is the better order, despite the then-widespread precedent. The other change is that the algorithm-specific procedures have been removed from the SRFI, though not necessarily from the sample implementation. For documentation on them, see SRFI 32 or the source code.
Editorially, the text has been reorganized, removing the massive redundancies. As in R5RS and R7RS, procedures that return nothing of interest, and in SRFI 32 returned zero or more unspecified values, are now specified to return one otherwise unspecified value. The multiple packages, most with only a few procedures, into which SRFI 32 is divided have been consolidated. References to SRFI 95 and R6RS equivalents have been added.
In SRFI 32 there were two different interfaces: "what" (simple) and "how" (detailed).
- Simple: you specified semantics: datatype (list or vector), mutability, and stability.
- Detailed: you specified the datatype and the actual algorithm (quick, heap, insert, merge).
However, the difficulty with exposing specific algorithms by name is that the local optima in the search space of algorithms changes over time. For example, the "qsort" in the musl C library is actually smooth sort, not quick sort; Python and Java have switched from quick sort to timsort; some implementations of the C++ STL use introsort rather than quick sort. Having procedure names that imply e.g. quick sort, that do not actually implement quick sort, is confusing.
Therefore, only the "what" interface has been retained. The sample implementation uses some of the original SRFI 32 algorithms, but this is not a requirement of this SRFI.
Procedures share common signatures wherever possible, to facilitate switching a given call from one procedure to another.
These procedures uniformly observe the following parameter order: the ordering, equality, or comparison argument precedes the argument(s) denoting the data to be sorted. In SRFI 32, the data-first convention was used, because it was consistent with all the implementations examined in 2002, with the sole exception of Chez Scheme. However, R6RS adopted the procedure-first convention, which is more consistent with other Scheme libraries that put the procedure first -- the "operation currying" convention, as in for-each or map or find. This SRFI uses the R6RS convention throughout.
A "stable" sort is one that preserves the pre-existing order of equal elements. Suppose, for example, that we sort a list of numbers by comparing their absolute values using abs< (defined below). If we sort a list that contains both 3 and -3, then a stable sort is an algorithm that will not swap the order of these two elements, that is, the answer will look like ... 3 -3 ..., not ... -3 3 ....
A "stable" merge is one that reliably favors one of its data sets when equal items appear in both data sets. All merge operations specified in this SRFI are stable, breaking ties between data sets in favor of the first data set -- elements of the first list come before equal elements in the second list.
So, if we are merging two lists of numbers ordered by absolute value using the stable merge operation list-merge and abs< compares numbers by their absolute values, then
- list-merge abs< '(0 -2 4 8 -10) '(-1 3 -4 7)procedure
reliably places the 4 of the first list before the equal-comparing -4 of the second list: (0 -1 -2 4 -4 7 8 -10).
Here's a definition of abs<, introduced just for the examples as it is not part of this SRFI:
The vector operations specified below all take optional start and end arguments indicating a selected subrange of a vector's elements. This compensates for the absence in most Schemes of shared subvectors.
The other procedures ending in !, on the other hand, explicitly commit to the use of side-effects on their input lists in order to guarantee their key algorithmic properties (e.g., linear-time operation, constant-space stack use).
Note that sorting lists involves chasing pointers through memory, which can be a loser on modern machine architectures because of poor cache and page locality. Pointer writing, which is what the set-cdr!s of a destructive list-sort algorithm do, is even worse, especially if your Scheme has a generational GC -- the writes will thrash the write-barrier. Sorting vectors has inherently better locality.
In particular, all complexity guarantees assume that the basic accessors and mutators of standard Scheme have O(1) space and time complexity.
The reference implementation's destructive list merge and merge sort implementations are opportunistic -- they avoid redundant set-cdr!s, and try to take long already-ordered runs of list structure as-is when doing the merges.
Most of the procedures described below are variants of two basic operations: sorting and merging. These procedures are consistently named by composing a set of basic lexemes to indicate what they do.
- vector - The procedure operates upon vectors.
- list - The procedure operates upon lists.
- stable - This lexeme indicates that the sort is a stable one.
- sort - The procedure sorts its input data set by some ordering function.
- merge - The procedure merges two ordered data sets into a single ordered result.
- ! - Procedures that end in ! are allowed, and sometimes required, to reuse their input storage to construct their answer.
In the procedures specified below:
- A lis parameter is a list.
- A v parameter is a vector.
- An = parameter is an equality predicate. See SRFI 128 for the requirements on equality predicates. Note that neither this SRFI nor its sample implementation depend on SRFI 128.
- A < parameter is an ordering predicate. See SRFI 128 for the requirements on ordering predicates.
- A start parameter or start and end parameter pair are exact non-negative integers such that 0 <= start <= end <= (vector-length v), where v is the related vector parameter. If not specified, they default to 0 and the length of the vector, respectively. They are interpreted to select the range [start, end), that is, all elements from index start (inclusive) up to, but not including, index end.
Passing values to procedures with these parameters that do not satisfy these constraints is an error.
If a procedure is said to return "an unspecified value", this means that nothing at all is said about what the procedure returns, except that it returns one value.
- list-sorted? < lisprocedure
- vector-sorted? < v #!optional start endprocedure
These procedures return true iff their input list or vector is in sorted order, as determined by <. Specifically, they return #f iff there is an adjacent pair ... X Y ... in the input list or vector such that Y < X in the sense of <. The optional start and end range arguments restrict vector-sorted? to examining the indicated subvector.
These procedures are equivalent to the SRFI 95 sorted? procedure when applied to lists or vectors respectively, except that they do not accept a key procedure.
These procedures provide basic sorting and merging functionality suitable for general programming. The procedures are named by their semantic properties, i.e., what they do to the data (sort, stable sort, and so forth).
- list-sort < lisprocedure
- list-stable-sort < lisprocedure
These procedures do not alter their inputs, but are allowed to return a value that shares a common tail with a list argument.
The list-stable-sort procedure is equivalent to the R6RS list-sort procedure. It is also equivalent to the SRFI 95 sort procedure when applied to lists, except that it does not accept a key procedure.
- list-sort! < lisprocedure
- list-stable-sort! < lisprocedure
These procedures are linear update operators -- they are allowed, but not required, to alter the cons cells of their arguments to produce their results. They return a sorted list containing the same elements as lis.
The list-stable-sort! procedure is equivalent to the SRFI 95 sort! procedure when applied to lists, except that it does not accept a key procedure.
- vector-sort < v #!optional start endprocedure
- vector-stable-sort < v #!optional start endprocedure
These procedures do not alter their inputs, but allocate a fresh vector as their result, of length end - start. The vector-stable-sort procedure with no optional arguments is equivalent to the R6RS vector-sort procedure. It is also equivalent to the SRFI 95 sort procedure when applied to vectors, except that it does not accept a key procedure.
- vector-stable-sort! < v #!optional start endprocedure
These procedures sort their data in-place. (But note that vector-stable-sort! may allocate temporary storage proportional to the size of the input -- there are no known O(n lg n) stable vector sorting algorithms that run in constant space.) They return an unspecified value.
All four merge operations are stable: an element of the initial list lis or vector v will come before an equal-comparing element in the second list lis or vector v in the result.
- (list-merge < lis lis)procedure
This procedure does not alter its inputs, and is allowed to return a value that shares a common tail with a list argument.
This procedure is equivalent to the SRFI 95 merge procedure when applied to lists, except that it does not accept a key procedure.
- (list-merge! < lis lis)procedure
This procedure makes only a single, iterative, linear-time pass over its argument lists, using set-cdr!s to rearrange the cells of the lists into the list that is returned -- it works "in place." Hence, any cons cell appearing in the result must have originally appeared in an input. It returns the sorted input.
Additionally, list-merge! is iterative, not recursive -- it can operate on arguments of arbitrary size without requiring an unbounded amount of stack space. The intent of this iterative-algorithm commitment is to allow the programmer to be sure that if, for example, list-merge! is asked to merge two ten-million-element lists, the operation will complete without performing some extremely (possibly twenty-million) deep recursion.
This procedure is equivalent to the SRFI 95 merge! procedure when applied to lists, except that it does not accept a key procedure.
- (vector-merge < v v [ start [ end [ start [ end ] ] ] ])procedure
This procedure does not alter its inputs, and returns a newly allocated vector of length (end - start) + (end - start).
This procedure is equivalent to the SRFI 95 merge procedure when applied to vectors, except that it does not accept a key procedure.
- (vector-merge! < to from from [ start [ start [ end [ start [ end ] ] ] ] ])procedure
This procedure writes its result into vector to, beginning at index start, for indices less than end, which is defined as start + (end - start) + (end - start). The target subvector to[start, end) may not overlap either of the source subvectors from[1|start, end] and from[2|start, end]. It returns an unspecified value.
This procedure is equivalent to the SRFI 95 merge! procedure when applied to lists, except that it does not accept a key procedure.
These procedures delete adjacent duplicate elements from a list or a vector, using a given element-equality procedure. The first/leftmost element of a run of equal elements is the one that survives. The list or vector is not otherwise disordered.
These procedures are linear time -- much faster than the O(n^2) general duplicate-element deletion procedures that do not assume any "bunching" of elements provided by SRFI 1. If you want to delete duplicate elements from a large list or vector, sort the elements to bring equal items together, then use one of these procedures, for a total time of O(n lg n).
The equality procedure is always invoked as (= x y), where x comes before y in the containing list or vector.
- list-delete-neighbor-dups = lisprocedure
This procedure does not alter its input list, but its result may share storage with the input list.
- list-delete-neighbor-dups! = lisprocedure
This procedure mutates its input list in order to construct its result. It makes only a single, iterative, linear-time pass over its argument, using set-cdr!s to rearrange the cells of the list into the final result -- it works "in place." Hence, any cons cell appearing in the result must have originally appeared in the input.
- vector-delete-neighbor-dups = v #!optional start endprocedure
This procedure does not alter its input vector, but rather newly allocates and returns a vector to hold the result.
- vector-delete-neighbor-dups! = v #!optional start endprocedure
This procedure reuses its input vector to hold the answer, packing it into the index range [start, newend), where newend is the non-negative exact integer that is returned as its value. The vector is not altered outside the range [start, newend).
(list-delete-neighbor-dups = '(1 1 2 7 7 7 0 -2 -2)) => (1 2 7 0 -2) (vector-delete-neighbor-dups = '#(1 1 2 7 7 7 0 -2 -2)) => #(1 2 7 0 -2) (vector-delete-neighbor-dups < '#(1 1 2 7 7 7 0 -2 -2) 3 7)) => #(7 0 -2) ;; Result left in v[3,9): (let ((v (vector 0 0 0 1 1 2 2 3 3 4 4 5 5 6 6))) (cons (vector-delete-neighbor-dups! < v 3) v)) => (9 . #(0 0 0 1 2 3 4 5 6 4 4 5 5 6 6))
These procedures do not have SRFI 32 counterparts. They find the median element of a vector after sorting it in accordance with an ordering procedure. If the number of elements in v is odd, the middlemost element of the sorted result is returned. If the number of elements is zero, knil is returned. Otherwise, mean is applied to the two middlemost elements in the order in which they appear in v, and whatever it returns is returned. If mean is omitted, then the default mean procedure is (lambda (a b) (/ (+ a b) 2), but this procedure is applicable to non-numeric values as well.
- vector-find-median < v knil #!optional meanprocedure
This procedure does not alter its input vector, but rather newly allocates a vector to hold the intermediate result. Runs in O(n) time.
- vector-find-median! < v knil #!optional meanprocedure
This procedure reuses its input vector to hold the intermediate result, leaving it sorted, but is otherwise the same as vector-find-median. Runs in O(n ln n) time.
These procedures do not have SRFI 32 counterparts.
- vector-select! < v k #!optional start endprocedure
This procedure returns the kth smallest element (in the sense of the < argument) of the region of a vector between start and end. Elements within the range may be reordered, whereas those outside the range are left alone. Runs in O(n) time.
- vector-separate! < v k #!optional start endprocedure
This procedure places the smallest k elements (in the sense of the < argument) of the region of a vector between start and end into the first k positions of that range, and the remaining elements into the remaining positions. Otherwise, the elements are not in any particular order. Elements outside the range are left alone. Runs in O(n) time. Returns an unspecified value.
The sample implementation is a modified version of the Scheme 48 implementation of the sorting structure, and is found in the repository of this SRFI. It will use the R6RS sorting library if it is available, but does not depend on it. This is close to the original SRFI 32 reference implementation, but includes some bug fixes and switches the < and = arguments to the initial position. It also adds implementations for the median and selection procedures. The code is very portable and freely reusable. It is tightly bummed, as far as could be done in portable Scheme, and is commented in Olin's usual voluminous style, including notes on porting and implementation-specific optimizations. The median and selection code is specific to this SRFI.
The only non-R4RS features in the code are the use of R5RS/R6RS/R7RS multiple-value return, with values and call-with-values procedures, and the use of R7RS-style error to report an assertion violation.
You could speed up the vector code a lot by error-checking the procedure parameters and then shifting over to fixnum-specific arithmetic and dangerous vector-indexing and vector-setting primitives. The comments in the code indicate where the initial error checks would have to be added. There are several (quotient n 2) calls that could be changed to a fixnum right-shift, as well, in both the list and vector code. The code is designed to enable this -- each file usually exports one or two "safe" procedures that end up calling an internal "dangerous" primitive. The little exported cover procedures are where you move the error checks.
This should provide big speedups. In fact, all the code bumming in the source pretty much disappears in the noise unless you have a good compiler and also can dump the vector-index checks and generic arithmetic -- so it's really set up for optimization rather than fully optimized.
The optional-arg parsing, defaulting, and error checking is done with a portable syntax-rules macro. But if the target Scheme has a faster mechanism (e.g., Chez), it's definitely better to switch to using it. Note that argument defaulting and error-checking are interleaved -- there's no need to error-check defaulted start and end args to see if they are fixnums that are legal vector indices for the corresponding vector, etc.
- delndups.scm - the delete-neighbor-dups procedures
- lmsort.scm - list merge sort
- median.scm - the find-median procedures
- selection.scm - the selection procedure
- sort.scm - generic sort and merge procedures
- sorting-test.scm - test file
- sortp.scm - sort predicates
- srfi-132.scm - a Chicken library providing this SRFI
- srfi-132.sld - an R7RS counterpart of srfi-132.scm
- vector-util.scm - vector utilities
- vhsort.scm - vector heap sort
- visort.scm - vector insert sort
- vmsort.scm - vector merge sort
- vqsort2.scm - vector quick sort
Olin thanked the authors of the open source consulted when designing this library, particularly Richard O'Keefe, Donovan Kolbly and the MIT Scheme Team. John thanks Will Clinger for his detailed comments, and both Will Clinger and Alex Shinn for their implementation efforts.
- John Cowan (based on SRFI 32 by Olin Shivers)
- Ported to Chicken Scheme 5 by Sergey Goldgaber
SRFI text copyright
This document is copyright (C) Olin Shivers (1998, 1999). All Rights Reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Short summary: no restrictions.
While Olin wrote all of this code himself, he read a lot of code before he began writing. However, all such code is, itself, either open source or public domain, rendering irrelevant any issue of "copyright taint."
Hence the sample implementation is Copyright © 1998 by Olin Shivers and made available under the same copyright as the SRFI text (see above).
- 0.1 - Ported to Chicken Scheme 5