chickadee » hash-trie

hash-trie

Documentation

Hash Array Mapped Tries (HAMT)

(Phil Bagwell, `Ideal Hash Trees', Technical Report, 2001)

Provided that the hash function have no collisions and yield hash values in a bounded interval, hash tries admit search and update in constant worst-case time (!), bounded by a somewhat larger constant than what one usually finds for the worst-case time of search or replacement, and the amortized time of insertion or deletion, in hash tables. There is no complicated incremental rehashing going on like in some real-time hash tables to attain these constant time bounds; in fact, there is never any rehashing. A little more precisely, `constant' means logarithmically proportional to the length of the interval of the hash values, or, practically, linearly proportional to the number of bits in the hash, with small constant factors.

Yes, this is the same data structure as Clojure uses to implement its hash maps and hash sets. Similarly to Clojure, this code uses hash collision buckets rather than the paper's suggestion of repeating hashes salted by the trie depth.

Although the pronunciation is identical, and despite the title of Bagwell's paper, a hash trie is not a hash tree. Sorry. Nor do these hash tries have any relation to what Knuth calls hash tries.

Except where this is obviously not the case (hash-table/fold, for example), every procedure here runs in constant expected time under the assumption of a good key hash function, ignoring the time taken by garbage collection and the time taken by the key hash function and the key equality predicate of the relevant hash trie type. When searching in a hash trie, the key equality predicate is applied to only as many keys extra as share a common hash value with the key whose association is sought. Thus in a hash trie with no collisions, every search involves at most one invocation of the key hash function, and at most one invocation of the key equality predicate.

Usage

(import hash-trie)

make-hash-trie-type

make-hash-trie-type KEY=? KEY-HASHprocedure

Constructor for hash trie types. KEY=? must be a key equality predicate, a procedure of two arguments that returns true to indicate that they are equal and false to indicate that they are not, and that behaves transitively, symmetrically, and reflexively. KEY-HASH must be a key hash function that preserves the key equality predicate, i.e. for keys A and B, it must be that if (KEY=? A B), then (= (KEY-HASH A) (KEY-HASH B)).

hash-trie-type?

hash-trie-type? OBJECTprocedure

Disjoint type predicate for hash trie types.

hash-trie-type/key-equality-predicate

hash-trie-type/key-hash-function

hash-trie-type/key-equality-predicate HASH-TRIE-TYPEprocedure
hash-trie-type/key-hash-function HASH-TRIE-TYPEprocedure

Accessors for the key equality predicates and key hash functions of hash trie types.

make-hash-trie

make-hash-trie HASH-TRIE-TYPEprocedure

Hash trie constructor.

hash-trie/type

hash-trie/type HASH-TRIEprocedure

Returns the <hash-trie-type> of HASH-TRIE.

hash-trie/count

hash-trie/count HASH-TRIEprocedure

Returns the number of associations in HASH-TRIE.

hash-trie/empty?

hash-trie/empty? HASH-TRIEprocedure

Returns true if HASH-TRIE has no associations, or false if it has any.

hash-trie/search

hash-trie/search HASH-TRIE KEY IF-FOUND IF-NOT-FOUNDprocedure

Searches for an association for KEY in HASH-TRIE. If there is one, tail-calls IF-FOUND with one argument, the datum associated with key. If not, tail-calls IF-NOT-FOUND with zero arguments.

hash-trie/lookup

hash-trie/lookup HASH-TRIE KEY DEFAULTprocedure

Searches for an association for KEY in HASH-TRIE. If there is one, returns its associated datum; otherwise returns DEFAULT.

hash-trie/member?

hash-trie/member? HASH-TRIE KEYprocedure

Returns true if HASH-TRIE has an association for KEY, or false if not.

hash-trie/update

hash-trie/update HASH-TRIE KEY IF-FOUND IF-NOT-FOUNDprocedure

Searches for an association for KEY in HASH-TRIE. If there is one, tail-calls IF-FOUND with three arguments:

  • the associated datum
  • a procedure (REPLACE DATUM) that returns a new hash trie with all the associations in HASH-TRIE, but with DATUM substituted for the datum associated with KEY
  • a procedure (DELETE) that returns a new hash trie with all the associations in HASH-TRIE excluding the association for KEY

If there is no such association, tail-calls IF-NOT-FOUND with one argument, a procedure (INSERT DATUM) that returns a new hash trie with all the associations in HASH-TRIE as well as an association of DATUM with KEY.

hash-trie/insert

hash-trie/insert HASH-TRIE KEY DATUMprocedure

Returns a hash trie with all the associations in HASH-TRIE, but associating DATUM with KEY, whether HASH-TRIE had an association for KEY or not.

hash-trie/modify

hash-trie/modify HASH-TRIE KEY DEFAULT MODIFIERprocedure

Returns a hash trie with all the associations in HASH-TRIE, but associating (MODIFIER D) with KEY if HASH-TRIE associated a datum D with KEY, or associating (MODIFIER DEFAULT) with KEY if HASH-TRIE had no association for KEY.

hash-trie/intern

hash-trie/intern HASH-TRIE KEY GENERATORprocedure

If HASH-TRIE has an association for KEY, returns its associated datum and HASH-TRIE. Otherwise, calls (GENERATOR KEY) to obtain a datum D, and returns D and a hash trie with all the associations in HASH-TRIE as well as an association of D with KEY.

hash-trie/delete

hash-trie/delete HASH-TRIE KEYprocedure

Returns a hash trie with all the associations in HASH-TRIE, excluding its association, if any, for KEY.

hash-trie/fold

hash-trie/fold HASH-TRIE INITIAL-VALUE COMBINATORprocedure

Folds HASH-TRIE by COMBINATOR, starting with an initial value V of INITIAL-VALUE and updating it for each association of a datum D with a key K, in no particular order, by (COMBINATOR K D V).

hash-trie->alist

hash-trie/key->alist

hash-trie/datum->alist

hash-trie->alist HASH-TRIEprocedure
hash-trie/key->alist HASH-TRIEprocedure
hash-trie/datum->alist HASH-TRIEprocedure

hash-trie->alist returns a list of pairs, in no particular order, corresponding with the associations in HASH-TRIE, with keys in the cars and associated data in the respective cdrs.

hash-trie/key-list returns a list of all the keys in HASH-TRIE, in no particular order.

hash-trie/datum-list returns a list of all the data in HASH-TRIE, in no particular order.

alist->hash-trie

alist->hash-trie ALIST HASH-TRIE-TYPEprocedure

Returns a hash trie of the given type with the associations listed in ALIST, taking keys from the cars and corresponding data from the respective cdrs.

string-hash

symbol-hash

exact-integer-hash

real-number-hash

complex-number-hash

string-hash STRINGprocedure
symbol-hash SYMBOLprocedure
exact-integer-hash EXACT-INTEGERprocedure
real-number-hash REAL-NUMBERprocedure
complex-number-hash COMPLEX-NUMBERprocedure

Hash functions for various types of data. exact-integer-hash, real-number-hash, and complex-number-hash all agree where their domains coincide. The current implementations of these hash functions are all based on the FNV (Fowler-Noll-Vo) family of hash functions, tweaked slightly so that it is more likely to fit into the range of fixnums (small exact integers that can be represented immediately) for most Scheme systems on 32-bit machines.

hash-trie-type:string

hash-trie-type:symbol

hash-trie-type:exact-integer

hash-trie-type:real-number

hash-trie-type:complex-number

hash-trie-type:stringconstant
hash-trie-type:symbolconstant
hash-trie-type:exact-integerconstant
hash-trie-type:real-numberconstant
hash-trie-type:complex-numberconstant

<hash-trie-type> of the above hash functions, with appropriate key equality predicates: string=? for strings, eq? for symbols, and = for the numeric types.

Iterator API

hash-trie-root HASH-TRIEprocedure
hash-trie-bucket? HASH-TRIE-NODEprocedure
hash-trie-bucket-list HASH-TRIE-BUCKET-NODEprocedure
hash-trie-branch? HASH-TRIE-NODEprocedure
hash-trie-branch-count HASH-TRIE-BRANCH-NODEprocedure
HASH-TRIE-NODE : (or HASH-TRIE-BUCKET-NODE HASH-TRIE-BRANCH-NODE)
HASH-TRIE-BRANCH-NODE
vector ; hash-trie-branch-count returns last index
Example
(import hash-trie)
(import (srfi 41))

;see https://mumble.net/~campbell/scheme/hash-trie.scm
(define (hash-trie->stream hash-trie)
  (define (bucket->stream list tail)
    (if (pair? list)
      (stream-cons (car list) ((stream-lambda () (bucket->stream (cdr list) tail))))
      tail ) )
  (define (branch->stream branch tail)
    (let ((vec (hash-trie-branch-vector branch)))
      (let recur ((index (sub1 (vector-length vec))) (tail tail))
        (if (positive? index)
          (node->stream (vector-ref branch index) ((stream-lambda () (recur (sub1 index) tail))))
          tail ) ) ) )
  (define (node->stream node tail)
    (cond
      ((hash-trie-branch? node) (branch->stream node tail))
      ((hash-trie-bucket? node) (bucket->stream (hash-trie-bucket-list node) tail))
      (else
        (error 'hash-trie->stream "(internal) invalid hash-trie node" node hash-trie)) ) )
  (let ((root (hash-trie-root hash-trie)))
    (if root
      ((stream-lambda () (node->stream root stream-null)))
      stream-null ) ) )

(define (stream->hash-trie stream hash-trie-type)
  (let loop ((stream stream) (hash-trie (make-hash-trie hash-trie-type)))
    (if (stream-pair? stream)
      (let ((cell (stream-car stream)))
        (loop (stream-cdr stream) (hash-trie/insert hash-trie (car cell) (cdr cell))) )
      hash-trie ) ) )

Author

Taylor R. Campbell

Maintainer

Kon Lovett

Version history

1.1.0
Provide Iterator API.
1.0.0
C5 release.

License

Copyright (c) 2009, Taylor R. Campbell

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

THIS SOFTWARE IS PROVIDED BY THE AUTHORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Contents »