SXML/HTML automatic ligature detection and quote smartening.
This egg has no hard dependencies, but its rulesets are meant to be applied with sxml-transforms, so you will need that egg in order to use it.
Fancypants is a fairly simple set of functions plus an SXSLT ruleset that automagically converts SXML containing plain-ASCII strings into typographically enhanced Unicode strings. Ligatures are added and quotes are educated, i.e., opening quotes are curled to the left while closing quotes are curled the other way. An example piece of SXML:
(sxml-apply-rules
  '(blockquote "\"The affable Estonian wasn't fired\","
               " said the --- strangely afflicted ---"
               " flying monkey at the office.")
  (make-fancy-rules)
  (make-smart-quote-rules))
When rendered, it looks like this:
“The aﬀable Estonian wasn’t ﬁred”, said the — strangely aﬄicted — ﬂying monkey at the oﬃce.
Without Fancypants, the same text looks like this:
"The affable Estonian wasn't fired", said the --- strangely afflicted --- flying monkey at the office.
As you can see, the quotes are curled correctly, the three minuses are converted to real em dashes (but this wiki renders them incorrectly, unfortunately) and the 'fi', 'ffi', 'ffl', 'fl' and 'ff' character sequences are replaced by ligatures that merge the characters in a nice way.
A word of warning: how the ligatures are displayed depends heavily on the particular font being used and on how that font is implemented. For example, on a Mac, most MS Corefonts are apparently modified by Apple to support all ligatures, while the basic Corefonts by Microsoft (as found under Windows and many Unix installations) lack ligatures in most fonts. Consider this before using Fancypants' ligature capability (the fi and ff ligatures are reasonably safe to use in most cases, though). Testing on a number of platforms is, unfortunately, still a good idea when doing web development.
There are two rulesets: one for auto-conversion of ligatures and other types of character combinations to Unicode and one for smartening quotes. Both rulesets are generated by functions.
- [procedure] (make-fancy-rules [exceptions] [character-map])
Create a ruleset that performs ASCII->Unicode mappings for all entries in the character-map argument. character-map defaults to default-map (see below).
Please note that the order of the entries matters because the replacement algorithm employs a non-greedy search. Place any entry that is a prefix of another entry after the longer one (for example, 'ff' after 'ffi') and there is no problem. The symbols in exceptions are the tags to leave alone (i.e., nothing below these is fancified); exceptions defaults to default-exceptions (see below).
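The ordering rule can be seen in isolation with CHICKEN's own string-translate*, which tries the entries of an alist in order at each position. It is used here purely as a stand-in; this is not a claim about how Fancypants is implemented. The sketch assumes CHICKEN 5 (on CHICKEN 4 the procedure lives in the data-structures unit) and assumes that character-map entries are (ascii-string . unicode-string) pairs. The names good-map and bad-map are made up for the illustration.

```scheme
(import (chicken string))   ; CHICKEN 5: provides string-translate*

;; Longer sequence first: "ffi" is tried before its prefix "ff".
(define good-map '(("ffi" . "ﬃ") ("ff" . "ﬀ") ("fi" . "ﬁ")))

;; Prefix first: "ff" always wins, so the "ffi" entry can never match.
(define bad-map '(("ff" . "ﬀ") ("ffi" . "ﬃ") ("fi" . "ﬁ")))

(define good (string-translate* "office" good-map))  ; "oﬃce"
(define bad  (string-translate* "office" bad-map))   ; "oﬀice"
```

With bad-map the word ends up with an 'ff' ligature followed by a plain 'i', which is exactly the situation the ordering advice avoids.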
- [procedure] (make-smart-quote-rules [exceptions] [quotes])
Create a ruleset that educates quotes. quotes defines the strategy for translating quotes to smart quotes. See the documentation for all-quotes for more info on the structure of this argument. Please note that here the order doesn't matter, because the replacement algorithm uses simple regexes. The symbols in exceptions are the tags to leave alone (i.e., nothing under these has its quotes changed).
default-exceptions is a constant holding the list of all the tags (symbols) that are ignored by default.
These are: (head script pre code kbd samp @).
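To make "nothing below these tags is touched" concrete, here is a minimal sketch of a tree walk that rewrites strings but leaves excepted subtrees alone. It is plain Scheme and does not use Fancypants or sxml-transforms; map-strings and mark are hypothetical names, and the real rulesets may well work differently.

```scheme
;; Apply f to every string in an SXML tree, except inside elements
;; whose tag is listed in exceptions (those subtrees are returned as-is).
(define (map-strings f sxml exceptions)
  (cond ((string? sxml) (f sxml))
        ((and (pair? sxml) (symbol? (car sxml)))
         (if (memq (car sxml) exceptions)
             sxml
             (cons (car sxml)
                   (map (lambda (child) (map-strings f child exceptions))
                        (cdr sxml)))))
        (else sxml)))

;; Stand-in for a real string transformation, so the effect is visible.
(define (mark s) (string-append "<" s ">"))

(define result
  (map-strings mark
               '(p (@ (class "a")) "hi " (code "x--y") (em "there"))
               '(code @)))
;; result is (p (@ (class "a")) "<hi >" (code "x--y") (em "<there>"))
```

Listing @ among the exceptions keeps attribute values such as class names from being rewritten, which is presumably why it appears in the default list.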
An alist of default ASCII sequences that are translated to ligatures by make-fancy-rules.
Contains mappings for 'ffi', 'ffl', 'ff', 'fi', 'fl' and 'ft'. The mapping for 'st' is intentionally left out because this ligature is too elaborate to use in body copy. You could easily define a ruleset for, e.g., headings that does include the 'st' ligature (Unicode character U+FB06).
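A heading-only map with the 'st' ligature could be built by prepending one entry to an existing map. The sketch below shows the idea with CHICKEN's string-translate* as a stand-in for the real substitution (CHICKEN 5 import shown). The entry shape (ascii-string . unicode-string) is an assumption, body-ligatures is a hand-written subset rather than the egg's actual default map, and heading-ligatures is a hypothetical name.

```scheme
(import (chicken string))   ; CHICKEN 5: provides string-translate*

;; A few entries in the spirit of the default ligature map (assumed shape).
(define body-ligatures
  '(("ffi" . "ﬃ") ("ff" . "ﬀ") ("fi" . "ﬁ") ("fl" . "ﬂ")))

;; Same map plus the 'st' ligature (U+FB06), intended for headings only.
(define heading-ligatures
  (cons '("st" . "ﬆ") body-ligatures))

(define body    (string-translate* "first office" body-ligatures))
(define heading (string-translate* "first office" heading-ligatures))
;; body    is "ﬁrst oﬃce"
;; heading is "ﬁrﬆ oﬃce"
```

With the egg itself, a map built this way would be passed as the character-map argument of make-fancy-rules when constructing the ruleset for headings.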
An alist of default ASCII punctuation sequences to translate to 'fancy' Unicode versions. Contains mappings for '...' => '…', '..' => '‥', '. . .' => '…', '---' => '—' and '--' => '–'.
An alist of default ASCII sequences to translate to 'fancy' Unicode versions. This contains several types of arrows. Useful mostly for mathematical texts and 'evaluates to' examples.
all-quotes lists the quote patterns that are translated by make-smart-quote-rules. Remove any you don't want to have handled.
The structure of an entry in this list is:
(pre match post how counts?)
pre is the part of the string before the quote to match, match is the quote itself, and post is the part of the string after the match. These are all irregex literals.
how is one of the following symbols: single, double, single-open, double-open, single-close or double-close.
counts? is a boolean describing whether the quote should influence the nesting of subsequent quotes or not. For example, the apostrophe in "isn't" gets #f, since it is not a quote that matches a preceding quote or that is matched by a subsequent quote.
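The role of the context before a quote can be illustrated with a deliberately naive sketch: a quote that follows whitespace or an opening parenthesis opens, any other quote closes. This is plain Scheme, not the regex-based strategy Fancypants uses, and it does not model post, counts? or nesting at all. educate-quotes is a hypothetical name.

```scheme
;; Naive quote education: decide open vs. close from the previous character.
(define (educate-quotes str)
  (let loop ((chars (string->list str))
             (prev #\space)          ; treat the start of the string like whitespace
             (acc '()))
    (if (null? chars)
        (apply string-append (reverse acc))
        (let* ((c (car chars))
               (opening? (or (char-whitespace? prev) (char=? prev #\()))
               (out (cond ((char=? c #\") (if opening? "“" "”"))
                          ((char=? c #\') (if opening? "‘" "’"))
                          (else (string c)))))
          (loop (cdr chars) c (cons out acc))))))

(define smart (educate-quotes "\"The affable Estonian wasn't fired\","))
;; smart is "“The affable Estonian wasn’t fired”,"
```

The apostrophe in "wasn't" comes out as a closing single quote only because it follows a letter; a pattern list with explicit pre and post contexts and a counts? flag can handle such cases deliberately rather than by accident.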
These procedures are used internally by Fancypants, but they are probably useful enough to export, so here they are.
- [procedure] (fancify string character-map)
Perform simple substitution within string, replacing every ASCII character sequence listed in the character-map alist with its Unicode counterpart.
- [procedure] (smarten-quotes sxml quotes exceptions)
Smarten the sxml. Translates only the quotes described by the quotes argument, and skips everything under the tag names in the exceptions list.
- 0.4.1 Fix small irregex bug. Thanks to Mario for finding it.
- 0.4 Updated to strictly use irregex API procedures so it works with Chicken 4.6.2+
- 0.3 Ported to Chicken 4 and use irregex's SRE syntax so regexes can be composed
- 0.2 Added testsuite, removed useless syntax-case dependency
- 0.1 initial release.
Copyright (c) 2006-2011, Peter Bex (firstname.lastname@example.org)
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of author nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.