chickadee » srfi-13 » string-hash-ci

(string-hash s [bound start end]) -> integerprocedure
(string-hash-ci s [bound start end]) -> integerprocedure

Compute a hash value for the string S. BOUND is a non-negative exact integer specifying the range of the hash function. A positive value restricts the return value to the range [0,BOUND).

If BOUND is either zero or not given, the implementation may use an implementation-specific default value, chosen to be as large as is efficiently practical. For instance, the default range might be chosen for a given implementation to map all strings into the range of integers that can be represented with a single machine word.

The optional start/end indices restrict the hash operation to the indicated substring of S.

string-hash-ci is the case-insensitive variant. Case-insensitive comparison is done by case-folding characters with the operation

(char-downcase (char-upcase C))

where the two case-mapping operations are assumed to be 1-1, locale- and context-insensitive, and compatible with the 1-1 case mappings specified by Unicode's UnicodeData.txt table:

ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt

Invariants:

(<= 0 (string-hash s b) (- b 1)) ; When B > 0.
(string=    s1 s2)  =>  (= (string-hash s1 b)    (string-hash s2 b))
(string-ci= s1 s2)  =>  (= (string-hash-ci s1 b) (string-hash-ci s2 b))

A legal but nonetheless discouraged implementation:

(define (string-hash    s . other-args) 1)
(define (string-hash-ci s . other-args) 1)

Rationale: allowing the user to specify an explicit bound simplifies user code by removing the mod operation that typically accompanies every hash computation, and also may allow the implementation of the hash function to exploit a reduced range to efficiently compute the hash value. E.g., for small bounds, the hash function may be computed in a fashion such that intermediate values never overflow into bignum integers, allowing the implementor to provide a fixnum-specific "fast path" for computing the common cases very rapidly.