- char-set:whitespace constant
In Unicode, a whitespace character is any of the following:
- a character in one of the space, line, or paragraph separator categories (Zs, Zl or Zp) of the Unicode character database
- U+0009 Horizontal tabulation (\t control-I)
- U+000A Line feed (\n control-J)
- U+000B Vertical tabulation (\v control-K)
- U+000C Form feed (\f control-L)
- U+000D Carriage return (\r control-M)
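The definition above can be sketched as a predicate. This is an illustration in Python, not part of the char-set API; `is_whitespace` is a hypothetical helper name. Python's `unicodedata` module tracks a later Unicode version than 3.0, so category assignments for a few characters may differ from those this document assumes.

```python
import unicodedata

# The five control characters named in the list above.
CONTROL_WHITESPACE = frozenset("\t\n\v\f\r")

# General categories for space, line, and paragraph separators.
SEPARATOR_CATEGORIES = frozenset({"Zs", "Zl", "Zp"})


def is_whitespace(ch: str) -> bool:
    """Hypothetical helper: True if ch satisfies the definition above.

    Categories come from the Unicode version bundled with Python,
    which is newer than Unicode 3.0 (an assumption of this sketch).
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    return (ch in CONTROL_WHITESPACE
            or unicodedata.category(ch) in SEPARATOR_CATEGORIES)
```

For instance, `is_whitespace("\u00a0")` (no-break space, category Zs) and `is_whitespace("\t")` both hold, while `is_whitespace("a")` does not.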
There are 24 whitespace characters in Unicode 3.0:
0009  HORIZONTAL TABULATION  \t control-I
000A  LINE FEED              \n control-J
000B  VERTICAL TABULATION    \v control-K
000C  FORM FEED              \f control-L
000D  CARRIAGE RETURN        \r control-M
0020  SPACE                  Zs
00A0  NO-BREAK SPACE         Zs
1680  OGHAM SPACE MARK       Zs
2000  EN QUAD                Zs
2001  EM QUAD                Zs
2002  EN SPACE               Zs
2003  EM SPACE               Zs
2004  THREE-PER-EM SPACE     Zs
2005  FOUR-PER-EM SPACE      Zs
2006  SIX-PER-EM SPACE       Zs
2007  FIGURE SPACE           Zs
2008  PUNCTUATION SPACE      Zs
2009  THIN SPACE             Zs
200A  HAIR SPACE             Zs
200B  ZERO WIDTH SPACE       Zs
2028  LINE SEPARATOR         Zl
2029  PARAGRAPH SEPARATOR    Zp
202F  NARROW NO-BREAK SPACE  Zs
3000  IDEOGRAPHIC SPACE      Zs
The ASCII whitespace characters are the first six characters in the above list: horizontal tabulation, line feed, vertical tabulation, form feed, carriage return, and space. These are also exactly the characters recognised by the POSIX isspace() procedure. Latin-1 adds the no-break space.
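The list above can be written down as an explicit set, which makes the ASCII and Latin-1 subsets easy to check. This is a sketch in Python, not part of the char-set API. The names `WHITESPACE_3_0`, `ASCII_WHITESPACE`, and `LATIN1_WHITESPACE` are hypothetical, and the code points are copied from the list rather than taken from a Unicode database.

```python
import string

# The 24 code points listed above, hardcoded so that the set does not
# depend on the Unicode version bundled with the Python interpreter.
WHITESPACE_3_0 = frozenset(
    [chr(c) for c in (0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x20, 0xA0, 0x1680)]
    + [chr(c) for c in range(0x2000, 0x200C)]  # U+2000 through U+200B
    + [chr(c) for c in (0x2028, 0x2029, 0x202F, 0x3000)]
)

# Subsets by encoding range: ASCII is below U+0080, Latin-1 below U+0100.
ASCII_WHITESPACE = frozenset(c for c in WHITESPACE_3_0 if ord(c) < 0x80)
LATIN1_WHITESPACE = frozenset(c for c in WHITESPACE_3_0 if ord(c) < 0x100)

# Python's string.whitespace holds the same six ASCII characters.
assert ASCII_WHITESPACE == frozenset(string.whitespace)
# Latin-1 contributes exactly one more character, the no-break space.
assert LATIN1_WHITESPACE - ASCII_WHITESPACE == {"\u00a0"}
```

Hardcoding the code points keeps the sketch faithful to the Unicode 3.0 list even where later Unicode versions have reclassified a character.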