char-set:whitespace    constant

In Unicode, a whitespace character is one of the following:

  • a character with one of the space, line, or paragraph separator categories (Zs, Zl or Zp) of the Unicode character database.
  • U+0009 Horizontal tabulation (\t control-I)
  • U+000A Line feed (\n control-J)
  • U+000B Vertical tabulation (\v control-K)
  • U+000C Form feed (\f control-L)
  • U+000D Carriage return (\r control-M)
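As a quick check of the five control characters enumerated above, the following sketch tests U+0009 through U+000D for membership in char-set:whitespace. It assumes CHICKEN 5 with the srfi-14 egg installed (other R7RS systems may spell the import as (import (srfi 14))). The name control-whitespace is hypothetical, introduced only for this illustration.

```scheme
(import srfi-14)

;; Hypothetical helper: the five control characters listed above, built from
;; their code points (R7RS has no standard names for vertical tab or form feed).
(define control-whitespace
  (map integer->char '(#x09 #x0A #x0B #x0C #x0D)))

;; Print each code point (in decimal) and whether it is in char-set:whitespace.
(for-each
 (lambda (c)
   (display (char->integer c))
   (display " ")
   (display (char-set-contains? char-set:whitespace c))
   (newline))
 control-whitespace)
```

Building the characters with integer->char avoids relying on implementation-specific character names such as a vertical-tab literal.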

There are 24 whitespace characters in Unicode 3.0:

Code  Name                   Notes
0009  HORIZONTAL TABULATION  \t control-I
000A  LINE FEED              \n control-J
000B  VERTICAL TABULATION    \v control-K
000C  FORM FEED              \f control-L
000D  CARRIAGE RETURN        \r control-M
0020  SPACE                  Zs
00A0  NO-BREAK SPACE         Zs
1680  OGHAM SPACE MARK       Zs
2000  EN QUAD                Zs
2001  EM QUAD                Zs
2002  EN SPACE               Zs
2003  EM SPACE               Zs
2004  THREE-PER-EM SPACE     Zs
2005  FOUR-PER-EM SPACE      Zs
2006  SIX-PER-EM SPACE       Zs
2007  FIGURE SPACE           Zs
2008  PUNCTUATION SPACE      Zs
2009  THIN SPACE             Zs
200A  HAIR SPACE             Zs
200B  ZERO WIDTH SPACE       Zs
2028  LINE SEPARATOR         Zl
2029  PARAGRAPH SEPARATOR    Zp
202F  NARROW NO-BREAK SPACE  Zs
3000  IDEOGRAPHIC SPACE      Zs
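The table above can be turned directly into a character set. This is a minimal sketch, assuming CHICKEN 5 with the srfi-14 egg and, importantly, a char-set implementation that accepts characters above U+00FF; an implementation limited to Latin-1 would not be able to represent most of these entries. The names unicode3-whitespace-code-points and unicode3-whitespace are hypothetical.

```scheme
(import srfi-14)

;; Hypothetical list: code points of the 24 whitespace characters in the
;; Unicode 3.0 table above.
(define unicode3-whitespace-code-points
  '(#x0009 #x000A #x000B #x000C #x000D
    #x0020 #x00A0 #x1680
    #x2000 #x2001 #x2002 #x2003 #x2004 #x2005
    #x2006 #x2007 #x2008 #x2009 #x200A #x200B
    #x2028 #x2029 #x202F #x3000))

;; Build a char set from those code points.
;; Assumes the implementation supports characters beyond Latin-1.
(define unicode3-whitespace
  (list->char-set (map integer->char unicode3-whitespace-code-points)))

;; Number of distinct characters in the set (24 if all code points are supported).
(display (char-set-size unicode3-whitespace))
(newline)
```

This set reflects the Unicode 3.0 table only; later Unicode versions reclassified some of these characters, so it should not be assumed equal to char-set:whitespace in every implementation.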

The ASCII whitespace characters are the first six characters in the above list: horizontal tabulation, line feed, vertical tabulation, form feed, carriage return, and space. These are also exactly the characters recognised by the POSIX isspace() function in the POSIX ("C") locale. Latin-1 adds the no-break space (U+00A0).
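The claims in the paragraph above can be checked directly. This sketch, assuming CHICKEN 5 with the srfi-14 egg, restricts char-set:whitespace to ASCII with char-set-intersection and compares the result to the six ASCII whitespace characters, then tests the Latin-1 no-break space. The name ascii-whitespace is hypothetical.

```scheme
(import srfi-14)

;; Hypothetical helper: the six ASCII whitespace characters
;; (HT, LF, VT, FF, CR and SPACE), built from their code points.
(define ascii-whitespace
  (list->char-set (map integer->char '(9 10 11 12 13 32))))

;; Restricting char-set:whitespace to ASCII should yield exactly those six.
(display (char-set= ascii-whitespace
                    (char-set-intersection char-set:whitespace char-set:ascii)))
(newline)

;; The Latin-1 addition, U+00A0 NO-BREAK SPACE, lies outside ASCII
;; but should still be a member of char-set:whitespace.
(display (char-set-contains? char-set:whitespace (integer->char #xA0)))
(newline)
```

Intersecting with char-set:ascii keeps the comparison valid regardless of how many non-ASCII whitespace characters a particular implementation includes.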