- array-reduce op Aprocedure
We assume that A is an array and op is a procedure of two arguments that is associative, i.e., (op (op x y) z) is the same as (op x (op y z)).
Then (array-reduce op A) returns
(let ((box '()) (A_ (array-getter A))) (interval-for-each (lambda args (if (null? box) (set! box (list (apply A_ args))) (set-car! box (op (car box) (apply A_ args))))) (array-domain A)) (car box))
The implementation is allowed to use the associativity of op to reorder the computations in array-reduce. It is an error if the arguments do not satisfy these conditions.
As an example, we consider the finite sum: $$ S_m=\sum_{k=1}^m \frac 1{k^2}. $$ One can show that $$ \frac 1 {m+1}<\frac{\pi^2}6-S_m<\frac 1m. $$ We attempt to compute this in floating-point arithmetic in two ways. In the first, we apply array-reduce to an array containing the terms of the series, basically a serial computation. In the second, we divide the series into blocks of no more than 1,000 consecutive terms, apply array-reduce to get a new sequence of terms, and repeat the process. The second way is approximately what might happen with GPU computing.
We find with $m=1{,}000{,}000{,}000$:
(define A (make-array (make-interval '#(1) '#(1000000001)) (lambda (k) (fl/ (flsquare (inexact k)))))) (define (block-sum A) (let ((N (interval-volume (array-domain A)))) (cond ((<= N 1000) (array-reduce fl+ A)) ((<= N (square 1000)) (block-sum (array-map block-sum (array-tile A (vector (integer-sqrt N)))))) (else (block-sum (array-map block-sum (array-tile A (vector (quotient N 1000))))))))) (array-reduce fl+ A) => 1.644934057834575 (block-sum A) => 1.6449340658482325
Since $\pi^2/6\approx{}$1.6449340668482264, we see using the first method that the difference $\pi^2/6-{}$1.644934057834575${}\approx{}$9.013651380840315e-9 and with the second we have $\pi^2/6-{}$1.6449340658482325${}\approx{}$9.99993865491433e-10. The true difference should be between $\frac 1{1{,}000{,}000{,}001}\approx{}$9.99999999e-10 and $\frac 1{1{,}000{,}000{,}000}={}$1e-9. The difference for the first method is about 10 times too big, and, in fact, will not change further because any further terms, when added to the partial sum, are too small to increase the sum after rounding-to-nearest in double-precision IEEE-754 floating-point arithmetic.