string-utils

Documentation

Memoized String

Usage

(import memoized-string)

make-string+

[procedure] make-string+ COUNT #!optional FILL

An interning make-string.

FILL is any valid char, including codepoints outside of the ASCII range, which produce UTF-8 strings.

string+

[procedure] string+ #!optional CHAR...

An interning string.

CHAR is any valid char, including codepoints outside of the ASCII range, which produce UTF-8 strings.

global-string

[procedure] global-string STR

Share common string space.

String Hexadecimal

Usage

(import string-hexadecimal)

string->hex

[procedure] string->hex STRING #!optional START END

Returns a hexadecimal representation of STRING. START and END are substring limits.

STRING is treated as a string of bytes, a byte-vector.

hex->string

[procedure] hex->string STRING #!optional START END

Returns the binary representation of a hexadecimal STRING. START and END are substring limits.
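
As a rough illustration, the two procedures should round-trip: converting a string to hexadecimal and back is expected to give the original bytes. The sketch below assumes the string-utils egg is installed; the sample input is arbitrary.

```scheme
(import scheme (chicken base) string-hexadecimal)

;; Encode an ASCII string as hexadecimal text: two hex digits per byte.
(define hex (string->hex "abc"))

;; Decoding the hexadecimal text is expected to give back the original string.
(define round-trip (hex->string hex))

(print hex)
(print round-trip)
```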

Hexadecimal Procedures

Usage

(import to-hex)

str_to_hex

[procedure] str_to_hex OUT IN OFF LEN

Writes the ASCII hexadecimal representation of IN to OUT.

IN is a nonnull-string.

OFF is the byte offset.

LEN is the length of the bytes at OFF.

OUT is a string of length >= (* LEN 2), since each input byte produces two hexadecimal digits.
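
A minimal sketch of calling the low-level routine directly, assuming the egg is installed and that OUT is filled in place as described above. The buffer is sized at two hexadecimal digits per input byte; the input string is arbitrary.

```scheme
(import scheme (chicken base) to-hex)

(define in "abc")
(define len (string-length in))      ; ASCII input, so characters = bytes
(define out (make-string (* 2 len))) ; room for two hex digits per byte

;; Write the hex digits for LEN bytes of IN, starting at byte offset 0, into OUT.
(str_to_hex out in 0 len)

(print out)
```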

blob_to_hex

[procedure] blob_to_hex OUT IN OFF LEN

Like str_to_hex except IN is a nonnull-blob.

u8vec_to_hex

[procedure] u8vec_to_hex OUT IN OFF LEN

Like str_to_hex except IN is a nonnull-u8vector.

s8vec_to_hex

[procedure] s8vec_to_hex OUT IN OFF LEN

Like str_to_hex except IN is a nonnull-s8vector.

mem_to_hex

[procedure] mem_to_hex OUT IN OFF LEN

Like str_to_hex except IN is a nonnull-c-pointer.

hex_to_str

[procedure] hex_to_str OUT IN OFF LEN

Decodes the ASCII hexadecimal representation in IN and writes the resulting bytes to OUT.

IN is a nonnull-string.

OFF is the byte offset.

LEN is the length of the bytes at OFF.

OUT is a string of length >= (/ LEN 2).

hex_to_blob

[procedure] hex_to_blob OUT IN OFF LEN

Like hex_to_str except OUT is a blob of size >= (/ LEN 2).

Unicode Utilities

The name of this extension is misleading. Only UTF-8 is currently supported.

For a better treatment of UTF-8 see the utf-8 extension.

Usage

(import unicode-utils)

ascii-codepoint?

[procedure] ascii-codepoint? CHAR

char->unicode-string

[procedure] char->unicode-string CHAR

Returns a string formed from Unicode codepoint CHAR.

Note that, except under utf-8, the string-length of the result may not be equal to 1.

Generates an error should the codepoint be out-of-range.

unicode-string

[procedure] unicode-string #!optional CHAR...

Returns a string formed from Unicode codepoints CHAR...

Note that, except under utf-8, the string-length of the result may not be equal to the number of CHAR... arguments.

Generates an error should the codepoint be out-of-range.

*unicode-string

[procedure] *unicode-string CHARS

Returns a string formed from Unicode codepoints CHARS, a (list-of char).

unicode-make-string

[procedure] unicode-make-string COUNT #!optional FILL

Returns a string formed from COUNT occurrences of the Unicode codepoint FILL. The FILL default is #\space.

Note that, except under utf-8, the string-length of the result may not be equal to COUNT.

Generates an error should the codepoint be out-of-range.

unicode-surrogate?

[procedure] unicode-surrogate? NUM

unicode-surrogates->codepoint

[procedure] unicode-surrogates->codepoint HIGH LOW

Returns the codepoint for the surrogate pair HIGH and LOW if the pair is valid; otherwise returns #f.
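
For reference, the arithmetic behind combining a UTF-16 surrogate pair can be sketched as follows. This is the standard UTF-16 decoding formula, not necessarily how the egg implements it, and surrogates->codepoint is a hypothetical name used only for this illustration.

```scheme
(import scheme (chicken base))

;; Hypothetical helper for illustration; not part of unicode-utils.
;; HIGH must be in #xD800..#xDBFF and LOW in #xDC00..#xDFFF,
;; otherwise the pair is invalid and #f is returned.
(define (surrogates->codepoint high low)
  (and (<= #xD800 high #xDBFF)
       (<= #xDC00 low #xDFFF)
       (+ #x10000
          (* (- high #xD800) #x400)   ; top 10 bits of the offset
          (- low #xDC00))))           ; low 10 bits of the offset

(print (surrogates->codepoint #xD83D #xDE00)) ; prints 128512, i.e. #x1F600
(print (surrogates->codepoint #x0041 #xDE00)) ; prints #f
```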

String Utilities

Usage

(import string-utils)

string-split-chars

[procedure] string-split-chars STR #!optional DELIMITERS

Returns a list of substrings of STR & a list of the characters, from DELIMITERS, separating those substrings.

STR
string ; version string.
DELIMITERS
string ; string of version component delimiter characters, default ".,".

(string-split-chars "a.2,c" "$,.")
;=> ("a" "2" "c") (#\. #\,)

string-unzip

[procedure] string-unzip STR #!optional DELIMITERS

Returns a list of substrings of STR & a list of the delimiters, from DELIMITERS, separating those substrings.

STR
string ; version string.
DELIMITERS
string ; string of version component delimiter characters, default ".,".

(string-unzip "a.2,c" "$,.")
;=> ("a" "2" "c") ("." ",")

string-zip

[procedure] string-zip PARTS PUNCS

Returns a string formed from the concatenation of the PARTS and the interspersion of the PUNCS.

PARTS
(list-of string) ; version components.
PUNCS
(list-of string) ; version component separators.

(string-zip '("a" "2" "c") '("." ","))
;=> "a.2,c"

string-trim-whitespace-both

[procedure] string-trim-whitespace-both S

Returns the string S with whitespace trimmed from both ends.

list-as-string

[procedure] list-as-string LS

Returns the list LS written to a string.

number->padded-string

[procedure] number->padded-string N WIDTH #!optional PADCHAR BASE

N
number ; source
WIDTH
fixnum ; field width
PADCHAR
char ; padding character
BASE
fixnum ; number conversion base

string-fixed-length

[procedure] (string-fixed-length S N [pad-char: #\space] [trailing: "..."]) -> string

Returns the string S with the string-length fixed to N.

A shorter string is padded with pad-char. A longer string is truncated and suffixed with the trailing string.
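
A short usage sketch, assuming the string-utils egg is installed. The inputs are arbitrary; going by the description above, both results should have a string-length of 6.

```scheme
(import scheme (chicken base) string-utils)

;; Shorter than N: the result is padded out to 6 characters.
(define padded (string-fixed-length "abc" 6 pad-char: #\.))

;; Longer than N: the result is cut down to 6 characters,
;; with the trailing string marking the truncation.
(define cut (string-fixed-length "abcdefghij" 6))

(print (string-length padded))
(print (string-length cut))
```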

string-longest-common-prefix

[procedure] string-longest-common-prefix STRINGS

Returns the longest common prefix of STRINGS.

STRINGS
(list-of string)
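
To make the idea concrete, here is a self-contained sketch of computing a longest common prefix in plain Scheme. It is not the egg's implementation; prefix-length and common-prefix are hypothetical names used only for this illustration.

```scheme
(import scheme (chicken base))

;; Number of leading characters that A and B share.
(define (prefix-length a b)
  (let ((n (min (string-length a) (string-length b))))
    (let loop ((i 0))
      (if (and (< i n) (char=? (string-ref a i) (string-ref b i)))
          (loop (+ i 1))
          i))))

;; Shrink the candidate prefix against each string in turn.
(define (common-prefix strings)
  (if (null? strings)
      ""
      (let loop ((acc (car strings)) (rest (cdr strings)))
        (if (null? rest)
            acc
            (loop (substring acc 0 (prefix-length acc (car rest)))
                  (cdr rest))))))

(print (common-prefix '("foobar" "foobaz" "foo"))) ; prints foo
```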

string-longest-common-suffix

[procedure] string-longest-common-suffix STRINGS

Returns the longest common suffix of STRINGS.

STRINGS
(list-of string)

string-longest-prefix

[procedure] string-longest-prefix CANDIDATE OTHERS

Returns the member of OTHERS with the longest common prefix with CANDIDATE, or #f.

CANDIDATE
string
OTHERS
(list-of string)

string-longest-suffix

[procedure] string-longest-suffix CANDIDATE OTHERS

Returns the member of OTHERS with the longest common suffix with CANDIDATE, or #f.

CANDIDATE
string
OTHERS
(list-of string)

String Interpolation

Extends the read-syntax with #"..." where tagged scheme expressions in the string are evaluated at runtime:

#"@ #(+ 1 2)## (#'and #1 #2) = #(and 1 2) trailing #"
;=> "@ 3# (and 1 2) = 2 trailing #"

Similar to the #<# multi-line string.

See Multiline String Constant with Embedded Expressions.

Note: Support for the #{<sexpr>} subform has been dropped so that SRFI 105 can work as expected:

(import (srfi-105 extra))
#"1 + 3 = #{1 + 3}"
;=> "1 + 3 = 4"
#"An \"#{string-append(\"Hello, \" \"World\")}\" example"
;=> "An \"Hello, World\" example"

Usage

(import string-interpolation)

or using UTF8

(import utf8-string-interpolation)

Compiler Command-Line

csc -extend [utf8-]string-interpolation ...

Interpreter Command-Line

csi -require-extension [utf8-]string-interpolation ...

Activates string-interpolation #"..." syntax.

String Interpolation Syntax

Usage

(import string-interpolation-syntax)

set-sharp-string-interpolation-syntax

[procedure] set-sharp-string-interpolation-syntax PROC

Extends the read-syntax with #"..." where the "..." is evaluated using (PROC "...").

PROC
#f ; read-syntax is cleared.
PROC
#t ; PROC is identity.
PROC
procedure ; interpolation function.

String Interpolator

Usage

(import string-interpolator)

or using UTF8

(import utf8-string-interpolator)

string-interpolate

[procedure] (string-interpolate STR [eval-tag: EVAL-TAG]) -> list

Performs substitution of embedded Scheme expressions, prefixed with EVAL-TAG. Two consecutive EVAL-TAGs are translated to a single EVAL-TAG. A trailing EVAL-TAG is taken literally.

STR
string.
EVAL-TAG
character, default #\#.

Rabin Karp

Usage

(import rabin-karp)
STRINGS
(list-of string) ;
COMPARE
(string string --> boolean) ;
HASH
(string [BOUNDS []]) ; SRFI-69 hash procedure.
SEARCHER
(string [START [END]]) --> RESULT
RESULT
(or #f (STRING . (START . END))) ; success or failure result

Performs an exhaustive search of the TARGET, returning a list of RESULT.

SEARCHER
from make-string-search
TARGET
string ; search within
RESULT
(or #f (STRING . (START . END))) ; success or failure result
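
The module name refers to the Rabin-Karp algorithm, which compares a rolling hash of each window of the target against the hash of the search string and only compares characters when the hashes agree. The following single-pattern sketch is an illustration under stated assumptions, not the egg's implementation: rk-search-all, window-hash, rk-base, and rk-modulus are hypothetical names, and the result shape merely mimics the RESULT pairs described above.

```scheme
(import scheme (chicken base))

;; Illustrative constants (assumptions, not taken from the egg).
(define rk-base 256)
(define rk-modulus 1000003)

;; Polynomial hash of the LEN characters of STR starting at START.
(define (window-hash str start len)
  (let loop ((i 0) (h 0))
    (if (= i len)
        h
        (loop (+ i 1)
              (modulo (+ (* h rk-base)
                         (char->integer (string-ref str (+ start i))))
                      rk-modulus)))))

;; Returns a list of (PATTERN . (START . END)) for every match in TARGET.
(define (rk-search-all pattern target)
  (let ((m (string-length pattern))
        (n (string-length target)))
    (if (or (= m 0) (> m n))
        '()
        (let ((phash (window-hash pattern 0 m))
              ;; rk-base^(m-1) mod rk-modulus: weight of the outgoing character.
              (high (let loop ((i 1) (p 1))
                      (if (= i m)
                          p
                          (loop (+ i 1) (modulo (* p rk-base) rk-modulus))))))
          (let loop ((i 0) (h (window-hash target 0 m)) (acc '()))
            ;; Compare characters only when the hashes agree.
            (let ((acc (if (and (= h phash)
                                (string=? pattern (substring target i (+ i m))))
                           (cons (cons pattern (cons i (+ i m))) acc)
                           acc)))
              (if (= (+ i m) n)
                  (reverse acc)
                  ;; Roll the hash: drop target[i], append target[i+m].
                  (loop (+ i 1)
                        (modulo (+ (* (- h (* (char->integer (string-ref target i))
                                              high))
                                      rk-base)
                                   (char->integer (string-ref target (+ i m))))
                                rk-modulus)
                        acc))))))))

(write (rk-search-all "ab" "abcab")) ; writes (("ab" 0 . 2) ("ab" 3 . 5))
(newline)
```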

Requirements

check-errors miscmacros srfi-1 srfi-13 srfi-69 utf8

test test-utils

Author

Kon Lovett

Version history

2.7.4
More fixnum, add default delimiter for string-split-chars/string-unzip.
2.7.3
Add tests, more fixnum, fix signatures.
2.7.2
Fix signatures, new test-runner.
2.7.1
Fix version.
2.7.0
Add rabin-karp module.
2.6.0
Remove #{...} support.
2.5.6
Reflow.
2.5.5
Update test-runner.
2.5.4
UTF8.
2.5.3
Add string-split-chars.
2.5.2
Fix potential buffer overflow in to-hex.
2.5.0
Add string-zip & string-unzip.
2.4.0
Add string-longest-common-prefix/suffix, string-longest-prefix/suffix, number->padded-string, list-as-string, string-trim-whitespace-both.
2.3.2
Deprecate unicode-char->string, fixes for memoized-string & string-utils modules, ascii-codepoint? & unicode-surrogate? are not predicates.
2.3.1
Minor optimization.
2.3.0
Deprecate #{...} support. Add string-interpolator modules.
2.2.0
Fix string-interpolation.
2.1.0
Add utf8-string-interpolation.
2.0.0
C5 release.

License

Copyright (C) 2010-2024 Kon Lovett. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

 Redistributions of source code must retain the above copyright notice, this list of conditions and the following
   disclaimer.
 Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following
   disclaimer in the documentation and/or other materials provided with the distribution.
 Neither the name of the author nor the names of its contributors may be used to endorse or promote
   products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
