7off

7off is a Chicken Scheme program that can convert from Markdown to Gemini's text format, based on lowdown.scm.

It can be used as a library, with a procedure named 7off that reads current-input-port and writes to current-output-port. There are three keyword arguments: allow-wack-headers, default-alt, and polish.

A stand-alone, command-line binary named 7off is also included.

By default, it reads from standard in and prints to standard out, but there are also the --input-file (a.k.a. -i) and --output-file (a.k.a. -o) options to set those.

    7off --input-file my-snazzy-example.md --output-file my-drab-output.gmi

There are five other options.

By default, it refuses to convert documents where there are skipped header levels. Use --allow-wack-headers (a.k.a. -w) to allow these.
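To make "skipped header levels" concrete, here is a rough sketch in Python (not 7off's own Scheme code). It assumes that a skip means a header more than one level deeper than the header before it, that the first header sets the baseline, and that only #-style headers outside code blocks matter. The function name is made up for illustration.

```python
import re

def skipped_header_levels(markdown_text):
    """Return (line_number, previous_level, level) for every header that is
    more than one level deeper than the header before it."""
    skips = []
    previous = None  # assumption: the first header sets the baseline
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        match = re.match(r"(#{1,6})\s+\S", line)
        if not match:
            continue
        level = len(match.group(1))
        if previous is not None and level > previous + 1:
            skips.append((number, previous, level))
        previous = level
    return skips
```

A document that goes from an H1 straight to an H3 would be reported here, which is the kind of document --allow-wack-headers is for.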

It also warns when it has flattened any lists, when there are headers at H4 or deeper, and when the references for reference links are missing. Use --disable-warnings (a.k.a. -q) to disable these warnings.

Changing straight quotes to curly quotes and consecutive hyphens to various dashes is off by default. Use --polish (a.k.a. -p) to enable it.
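As a hedged illustration of what this kind of polishing involves, here is a small Python sketch. The exact rules 7off applies are not spelled out above, so the mapping here is an assumption: two hyphens become an en dash, three become an em dash, and a quote opens when it follows whitespace or an opening bracket.

```python
import re

def polish(text):
    """Typographic polish under the assumptions stated above."""
    # Longest hyphen run first, so "---" is not consumed as "--" plus "-".
    text = text.replace("---", "\u2014").replace("--", "\u2013")
    # A double quote at the start or after whitespace/an opening bracket opens.
    text = re.sub(r'(^|[\s(\[{])"', "\\1\u201c", text)
    # Any double quote still left is treated as closing.
    text = text.replace('"', "\u201d")
    # Same idea for single quotes; the rest become apostrophes/closing quotes.
    text = re.sub(r"(^|[\s(\[{\u201c])'", "\\1\u2018", text)
    text = text.replace("'", "\u2019")
    return text
```

Heuristics like this get some cases wrong (a leading apostrophe in "'90s", for instance), which is one reason to keep such a feature off by default.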

You can have a list of specific URLs to swap out in a file that contains, as an s-expression, a list of dotted pairs where each car is a URL to change and each cdr is what to change it to.

Like this:

    (("http://boring.and-so-on/here" . "gemini://much-more-inter.esting/stuff-here")
     ("https://another-quite-boring.tedious/url" . "gemini://much-better-mirror-for.gemini-capsule/here"))

And point to that file with --swap-urls (a.k.a. -s).
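To show the shape of that lookup, here is a minimal Python sketch that reads such a list of dotted pairs and swaps exact matches. It is not how 7off itself reads the file (Scheme can simply read the s-expression); the regex, the function names, and the exact-match behavior are assumptions for illustration, and string escapes are left as they are.

```python
import re

# One dotted pair of two double-quoted strings: ("old" . "new")
PAIR = re.compile(r'\(\s*"((?:[^"\\]|\\.)*)"\s*\.\s*"((?:[^"\\]|\\.)*)"\s*\)')

def read_swap_table(sexp_text):
    """Collect every ("old" . "new") pair into a dict keyed by the old URL."""
    return dict(PAIR.findall(sexp_text))

def swap_url(url, table):
    """Return the replacement if url is listed, else url unchanged."""
    return table.get(url, url)

example = '''
(("http://boring.and-so-on/here" . "gemini://much-more-inter.esting/stuff-here")
 ("https://another-quite-boring.tedious/url" . "gemini://much-better-mirror-for.gemini-capsule/here"))
'''
table = read_swap_table(example)
```

URLs that are not in the table pass through untouched.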

Finally, the default alt tag for source code snippets is "Code", so a snippet opens like this (without the preceding space):

 ```Code

Use --default-alt (a.k.a. -a) to change that.

The plan in the future is to, I dunno, use tags in YAML preamble or something to be able to set specific alt tags, to support arbitrary ASCII art.

Paragraph & Quote Semantics

Standard Markdown rules: hardwrap text and 7off will softwrap it. Use a double space at the end of lines where you want to preserve a hardwrap.

This is also true inside blockquotes.
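The rule can be sketched roughly like this in Python. This is an illustration of the Markdown convention rather than 7off's actual code, and the function name is made up.

```python
def softwrap(paragraph):
    """Join hard-wrapped lines into long lines. A source line that ends in
    two spaces keeps its line break."""
    out = []
    current = []
    for line in paragraph.split("\n"):
        keep_break = line.endswith("  ")
        current.append(line.strip())
        if keep_break:
            out.append(" ".join(current))
            current = []
    if current:
        out.append(" ".join(current))
    return out
```

Each string in the result would become one long Gemini text line; inside a blockquote, each would additionally get the quote prefix.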

Nesting Semantics

Nested lists, it flattens (and prints a warning that it did so).

As for nested blockquotes, it currently passes them through. The plan is to at least start printing a warning when it does this. Please don't start giving nested lists and nested blockquotes special support in clients. Better practice is to break up the quote into separate blocks, giving attribution to each.

Gemini text supports only one level of list or blockquote.
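Flattening a nested list can be pictured with this small Python sketch. It assumes bullet markers of *, + or - and treats any indented item as nested; the names are hypothetical and the real parsing is done by lowdown.scm, not by a regex.

```python
import re

ITEM = re.compile(r"^(\s*)[*+-]\s+(.*)$")

def flatten_list(markdown_lines):
    """Return (flat_items, was_nested): every item becomes a top-level
    Gemini list line, and was_nested says whether a warning is due."""
    flat = []
    was_nested = False
    for line in markdown_lines:
        match = ITEM.match(line)
        if not match:
            continue  # not a list item; ignored in this sketch
        indent, text = match.groups()
        if indent:
            was_nested = True
        flat.append("* " + text)
    return flat, was_nested
```

The boolean is what a caller would use to decide whether to print the flattening warning.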

I realized that Markdown links do have two qualities that Markdown to Gemini translators can make use of.

The first is something that HTML really doesn't do, normally. It's a reference location separate from the inline link.

The other is shared by HTML, but a kind of rarely used feature, and it's a title distinct from the element's text. Both inline links (a.k.a. "explicit links") and reference links can have a title.

Putting those two things together, it becomes kind of natural to turn

    Hi, my [link] that I just casually mention

    [link]: gemini://my.boring/url "I like this link"

to

    Hi, my link that I just casually mention

    => gemini://my.boring/url I like this link

I.e. use the reference location to determine where the link line should go, and the link title to determine what it should be called.

So reference links have their Gemini semantics kind of given.

When there is no title

When there is no title, the prose element text is used.

    Hi, my [link] that I just casually mention

    [link]: gemini://my.boring/url

to

    Hi, my link that I just casually mention

    => gemini://my.boring/url link
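Both cases, with and without a title, fit in one small Python sketch. It only handles the shortcut form [link] shown above (not [text][label]), works line by line, and uses made-up names; 7off does this on lowdown.scm's parse tree instead.

```python
import re

# A reference definition: [label]: url "optional title"
DEFINITION = re.compile(r'^\[([^\]]+)\]:\s*(\S+)(?:\s+"([^"]*)")?\s*$')
# A shortcut reference in running prose: [label]
REFERENCE = re.compile(r"\[([^\]]+)\]")

def convert_reference_links(markdown_lines):
    """Turn definitions into Gemini link lines, in place, and unwrap the
    bracketed references in the prose."""
    out = []
    for line in markdown_lines:
        definition = DEFINITION.match(line)
        if definition:
            label, url, title = definition.groups()
            # The title names the link; fall back to the label without one.
            out.append(f"=> {url} {title or label}")
        else:
            out.append(REFERENCE.sub(r"\1", line))
    return out
```

The link line lands wherever the definition was written, which is the point of using the reference location.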

When there is no reference

This means links like this:

    Hi, my [link](gemini://my.boring/url) that I just casually mention

In a short text line or list line with just one link, let's turn the entire line into the link.

    => gemini://my.boring/url Hi, my link that I just casually mention

Otherwise, "extract" the link (keep the prose text in there), and then extracted links can show up before the next header (i.e. at the end of the section), or before the next non-extracted link (i.e. preceding them in the same link list), or at the end of the document.
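A rough Python sketch of that decision follows. What counts as a "short" line is not stated above, so the 80-character threshold is purely an assumption, as are the names. The sketch covers plain text lines only and does not model where extracted links are eventually placed; it just hands them back.

```python
import re

INLINE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")
SHORT = 80  # hypothetical threshold for a "short" line

def convert_inline_links(line):
    """Return (output_line, extracted_link_lines) for one text line."""
    links = INLINE.findall(line)         # list of (text, url) pairs
    prose = INLINE.sub(r"\1", line)      # keep the prose text in place
    if len(links) == 1 and len(prose) <= SHORT:
        # One link on a short line: the whole line becomes the link.
        return f"=> {links[0][1]} {prose}", []
    extracted = [f"=> {url} {text}" for text, url in links]
    return prose, extracted
```

A caller would hold on to the extracted link lines and emit them at one of the places described above.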

Supported Markdown

7off currently supports a much narrower range of markdown than any other markdown to gemini converter I know of. It currently doesn't support any extensions compared to Gruber-style basic Markdown.

It doesn't even support the ``` thing, ironically for something you want to publish as gemini text. You need to indent pre blocks by four spaces. (The git repo has a simple Unix text filter to help with that, anti-backticks.scm. It's just a small stdin/stdout toy; pipe to sponge if you want to edit in place.)

This is not a philosophy statement on my end—I use the heck out of such extensions when available, and backtick support would solve the alt text problem. It's just that the upstream library, lowdown.scm, doesn't support them yet.

That said, lowdown.scm has good support for HTML elements in the markdown text. I plan to develop the support for that further.

Currently, as far as HTML elements go, primarily I properly strip some of the inlines like <cite>, <i> etc, so you can freely use them in your source document.

I support the <h1>, <h2> etc series, <del> just because I think it's cute (it emits the matching number of ^W digraphs to indicate deleted text), and <table> with <th>, <tr> and <td> (although it currently can't understand colspan).
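For the <del> handling, here is one possible reading as a Python sketch. The text above only says "the matching number", so this assumes one ^W per deleted word, written after the struck text (the way Ctrl-W jokes are usually typed). Treat both the counting rule and the function name as guesses.

```python
import re

def render_del(text):
    """Replace <del>...</del> with the deleted words followed by one ^W
    per word (assumed counting rule)."""
    def replace(match):
        deleted = match.group(1)
        return deleted + "^W" * len(deleted.split())
    return re.sub(r"<del>(.*?)</del>", replace, text, flags=re.DOTALL)
```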

This version just outputs such tables as tab-separated values. That's in one sense a step back from the beautiful Unicode tables that md2gemini supports. Hopefully this is more accessible for low-vision technology until browsers catch up and make it easier to skip pre blocks.
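The table handling amounts to something like the following Python sketch, built on the standard library's html.parser. It is an illustration only: the class and function names are invented, colspan is ignored just as described above, and whitespace inside cells is collapsed as a simplification.

```python
from html.parser import HTMLParser

class TableToTsv(HTMLParser):
    """Collect <th>/<td> cell text row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self.cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            if not self.rows:  # tolerate a cell without an enclosing row
                self.rows.append([])
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            # Collapse runs of whitespace so cells never contain tabs.
            self.rows[-1].append(" ".join("".join(self.cell).split()))
            self.cell = None

def table_to_tsv(html):
    parser = TableToTsv()
    parser.feed(html)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)
```

Each table row comes out as one line, with a single tab between cells.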

In the future, I want to also support <dl>, <dt>, <dd>, <a>, and <img> elements.

The biggest flaw in 7off's markdown support currently is that it, unlike Gruber markdown but like kramdown, requires a blank line before blockquotes. In other words, it won't recognize this:

    Sandra wrote:
    > Whaddayamean, I thought Markdown's syntax was inspired
    > by how people used to write email in the nineties?

To sorta compensate for this, in a sort of half-thought-through, iffy decision, I decided to remove blockquote-preceding blank lines in the Gemini output. If we can't have beautiful input, we shall at least have beautiful output.
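In terms of the output lines, that decision looks roughly like this Python sketch. It is a simplification with an invented name, and it ignores preformatted blocks, where a line starting with > is not a quote.

```python
def drop_blank_before_quotes(gemtext_lines):
    """Remove blank lines that directly precede a quote line."""
    out = []
    for line in gemtext_lines:
        if line.startswith(">"):
            while out and out[-1].strip() == "":
                out.pop()
        out.append(line)
    return out
```

So an attribution line ends up sitting directly on top of the quote it introduces.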

Note that blank lines are considered part of paragraphs in Gemini text semantics. It's more idiomatic in Gemini to not need blank lines everywhere.

For example, this is also fine:

    * non-link lines
    => /page and link lines
    => /home all mixed up
    * together

To force extra blank lines, you can use a markdown hr (three or more hyphens on a line).

Linting Philosophy

Taking all that together, you see that I try to be very strict and drab in what I output. I remove inline markup such as emphasis.

The strict and drab version of Gemini text I support here is that way for a reason (accessibility, primarily), but this strictness is not something for Gemini clients to emulate.

To restate that: clients should not expect, want, or care about all documents being as strict and nerdy as the ones created here.

For example, clients and scrapers must not care about, or rely on, there not being any wack header levels.

That's part of the niceness of Gemini: it's really hard to mess up documents. Four supported line types, and optionally three advanced ones, and that's it. Seven types.
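To show how little a client has to do, here is a short Python sketch that sorts lines into those seven types by their prefix. The names are made up, and it is a simplified reading of the line types rather than a complete client.

```python
def classify_lines(gemtext):
    """Label each line with one of the seven Gemini line types."""
    kinds = []
    preformatted = False
    for line in gemtext.split("\n"):
        if line.startswith("```"):
            preformatted = not preformatted
            kinds.append("preformat toggle")
        elif preformatted:
            kinds.append("preformatted text")
        elif line.startswith("=>"):
            kinds.append("link")
        elif line.startswith("#"):
            kinds.append("heading")
        elif line.startswith("* "):
            kinds.append("list item")
        elif line.startswith(">"):
            kinds.append("quote")
        else:
            kinds.append("text")  # everything else, inline asterisks included
    return kinds
```

Anything that doesn't match a prefix falls through to plain text, so there is no line a client can fail to classify.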

Documents that don't conform to the strictness that 7off aspires to are not wrong.

It's not spec-breaking to put *asterisks* around a word for example, even if that's something 7off deliberately removes.

The key to a successful protocol language is to make as few and simple demands as possible on each other.

Client writers, please never support, for example, *asterisked* words by highlighting or bolding them (that'd be "embracing and extending"), but, also, please don't bork on them. It's just text. It's just seven line types.

People can do whatever. It's fine.

History & Future

This is my third attempt at this. For the longest time I used md2gemini, with some contributions by me (uh, that got lost in their git history somehow), and I started tacking on more and more preprocessing and postprocessing.

Then, I tried making a Gemini writer for pandoc, in Lua. I didn't get very far with that approach and never put it in actual production.

Finally, I made 7off. There are still issues to fix, but this is something I use in practice on hundreds of pages.

The source code, including a license file (AGPL) and a "Hacking" text (explaining the architecture, the separate parsing passes etc) is available via

    git clone https://idiomdrottning.org/7off

The name is sort of a, uh, it's a reference to Gemini having seven line types and to the original "markdown" name being a pun on discount pricing.