chickadee » srfi-14 » ucs-range->char-set

(ucs-range->char-set lower upper [error? base-cs]) -> char-setprocedure
ucs-range->char-set! lower upper error? base-csprocedure

LOWER and UPPER are exact non-negative integers; LOWER <= UPPER.

Returns a character set containing every character whose ISO/IEC 10646 UCS-4 code lies in the half-open range [LOWER,UPPER).

  • If the requested range includes unassigned UCS values, these are silently ignored (the current UCS specification has "holes" in the space of assigned codes).
  • If the requested range includes "private" or "user space" codes, these are handled in an implementation-specific manner; however, a UCS- or Unicode-based Scheme implementation should pass them through transparently.
  • If any code from the requested range specifies a valid, assigned UCS character that has no corresponding representative in the implementation's character type, then (1) an error is raised if ERROR? is true, and (2) the code is ignored if ERROR? is false (the default). This might happen, for example, if the implementation uses ASCII characters, and the requested range includes non-ASCII characters.

If character set BASE-CS is provided, the characters specified by the range are added to it. ucs-range->char-set! is allowed, but not required, to side-effect and reuse the storage in BASE-CS; ucs-range->char-set produces a fresh character set.

Note that ASCII codes are a subset of the Latin-1 codes, which are in turn a subset of the 16-bit Unicode codes, which are themselves a subset of the 32-bit UCS-4 codes. We commit to a specific encoding in this routine, regardless of the underlying representation of characters, so that client code using this library will be portable. I.e., a conformant Scheme implementation may use EBCDIC or SHIFT-JIS to encode characters; it must simply map the UCS characters from the given range into the native representation when possible, and report errors when not possible.