A dictionary data structure based on counting Bloom filters.
The sfht library is an implementation of the Shared-node Fast Hash Table (SFHT) data structure described by Song, et al., in
Fast Hash Table Lookup Using Extended Bloom Filter: An Aid to Network Processing. (SIGCOMM'05)
The present implementation code defines an SFHT typeclass that implements an ordered dictionary mapping of keys to values.
The typeclass containts methods for creating new hash tables, querying, insertion of new elements, and deletion of existing elements. The SFHT interface is particularly suitable for situations where the keys are represented by bit vectors or vectors of fixnum values.
A counting Bloom filter is a Bloom filter that has been extended so that each bit of the filter has a counter associated with it. Upon insertion or deletion of an element, the counter is incremented or decremented, respectively. In order to find an element efficiently, we need to compute the k hash values, read the counters at the k locations, determine the smallest bucket size, and perform a linear search of that bucket for the element.
The SFHT type class is created by procedure sfht-map:
- make-sfht:: N P MAKE-RNG-STATE RANDOM! KEY->VECTOR KEY-VECTOR-REF KEY-VECTOR-LENGTH [KEY-EQUAL?] -> <SFHT> procedure
- is a user-supplied function that takes in an integer argument and returns an RNG state value.
- is a user-supplied function that generates a random positive integer, given a state value, which is expected to be mutated.
- is a user-supplied function that takes a key value and returns a vector.
- is a user-supplied function that retrieves an element from the vector returned by KEY-VECTOR.
- is a user-supplied function that returns the length of the key vector.
- is a user-supplied predicate that takes two keys and returns #t if they are equal. The default function used is equal?
The SFHT typeclass consists of the following methods:
- creates a new empty hash table
- procedure of the form SFHT KEY . DEFAULT-CLAUSE which searches the hash table for an association with a given KEY, and returns a (key . value) pair of the found association. If an association with KEY cannot be located in the hash table, the PROC returns the result of evaluating the DEFAULT-CLAUSE. If the default clause is omitted, an error is signaled. KEY must be comparable to the keys in the hash table by the KEY-EQUAL? predicate specified when the hash table was created)
- returns #t if the hash table is empty
- returns the size (the number of associations) in the hash table
- removes all associations from the hash table (thus making it empty)
- procedure of the form SFHT KEY VALUE which, given a KEY and a VALUE, adds the corresponding association to the hash table. If an association with the same KEY already exists, its value is replaced with the VALUE. The return value is #f.
- procedure of the form SFHT KEY . DEFAULT-CLAUSE which searches the hash table for an association with a given KEY, deletes it, and returns a (key . value) pair of the found and deleted association. If an association with the KEY cannot be located in the hash table, the PROC returns the result of evaluating DEFAULT-CLAUSE. If the default clause is omitted, an error is signaled.
(let ((m (sfht-map 100000 0.0001
(lambda (i) (make-swb-random-state i (fx+ i 17))) swb:random! integer->bit-vector (compose (lambda (x) (if x 1 0)) bit-vector-ref) bit-vector-length)))
(with-instance ((<SFHT> m)) (let ((t (empty)))
(put t 1 'one) (get t)
- Converted to type class interface
- Ensure test script returns proper exit status
- Documentation converted to wiki format
- Typo fix
- Ported to Chicken 4
- Build script updated for better cross-platform compatibility
- Introduced an API that is independent of the RNG used
- Documentation updates
- License upgrade to GPL v3
- Added random-swb to the list of dependencies
- Initial release
Copyright 2007-2013 Ivan Raikov. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or at your option any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. A full copy of the GPL license can be found at <http://www.gnu.org/licenses/>.