chickadee » rfc822 » rfc822-next-token

rfc822-next-token iport &optional tokenizer-specsprocedure

A basic tokenizer. First it skips whitespaces and/or comments (CFWS) from iport, if any. Then reads one token according to tokenizer-specs. If iport reaches EOF before any token is read, EOF is returned.

Tokenizer-specs is a list of tokenizer spec, which is either a char-set or a cons of a char-set and a procedure.

After skipping CFWS, the procedure peeks a character at the head of iport, and checks it against the char-sets in tokenizer-specs one by one. If a char-set that contains the character belongs to is found, then a token is retrieved as follows: If the tokenizer spec is just a char-set, a sequence of characters that belong to the char-set consists a token. If it is a cons, the procedure is called with iport to read a token.

If the head character doesn't match any char-sets, the character is taken from iport and returned.

The default tokenizer-specs is as follows:

(list (cons (char-set #") rfc822-quoted-string)
      (cons *rfc822-atext-chars* rfc822-dot-atom))

Where rfc822-quoted-string and rfc822-dot-atom are tokenizer procedures described below, and *rfc822-atext-chars* is bound to a char-set of atext specified in rfc2822. This means rfc822-next-token retrieves a token either quoted-string or dot-atom specified in rfc2822 by default.

Using tokenizer-specs, you can customize how the header field is parsed. For example, if you want to retrieve a token that is either (1) a word constructed by uppercase characters, or (2) a quoted string, then you can call rfc822-next-token by this:

(rfc822-next-token iport
   `(,char-set:uppercase (,(char-set #") . ,rfc822-quoted-string)))