Contributed by pitrh on Sat Mar 4 16:47:50 2017 (GMT)
from the mark me up before you go-go dept.
If you follow commits closely, via source-changes@ or otherwise, you may already know that mandoc has grown another useful feature. Ingo Schwarze sent us this very nicely formatted article about the new mandoc to markdown converter:
New mandoc -mdoc -T markdown converter
output formatter to OpenBSD-current,
for converting manual pages written in the mdoc(7)
markup language to markdown.
The point is that in some contexts, documentation authors are
required by third-party policies to provide markdown versions of
their documentation.
This new output mode allows them to maintain only one copy of
their documentation in the well-known, simple, and high quality
mdoc(7) language while
still providing markdown versions for the purposes where those are
required, which may for example include pasting them into Wikis.
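As a rough sketch of how such a conversion could be scripted, the following Python helper wraps the mandoc -mdoc -T markdown invocation named in the title above. The function names and the file name foo.1 are hypothetical and only serve as illustration; the sketch assumes that a mandoc binary with the markdown formatter is installed and in the PATH.

```python
import subprocess


def mandoc_markdown_argv(manpage_path):
    """Build the argument vector for converting one manual page to markdown."""
    # -mdoc selects the mdoc(7) parser, -T markdown the markdown output mode.
    return ["mandoc", "-mdoc", "-T", "markdown", manpage_path]


def convert_to_markdown(manpage_path):
    """Run mandoc and return the generated markdown text.

    Raises subprocess.CalledProcessError if mandoc exits with an error.
    """
    result = subprocess.run(
        mandoc_markdown_argv(manpage_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Calling convert_to_markdown("foo.1") would then return the markdown version of that page as a string, which can be written to a file or pasted into a Wiki, while the mdoc(7) source remains the only copy that needs maintaining.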
Thanks to Reyk@
(OpenBSD) and to
for suggesting such an output mode, and to
for contributing several ideas to this writeup.
The reason for providing this output mode is not that i
consider markdown a good, or even a half-decent, markup language.
Quite to the contrary, I hereby officially declare it the shittiest
markup language i have seen so far.
Basically, it hasn't any strong point whatsoever, but the downsides
are numerous, scary, and cover practically every relevant aspect:
- Lack of expressiveness:
Markdown is pitifully weak and powerless even by its own
standard, which is: make formatting easy for anything that can be
expressed in a plain-text email.
For example, it doesn't provide any syntax for definition lists
<dl> in HTML,
.Bl -tag in mdoc(7),
.TP in man(7)) even though such lists can easily be
written in a plain-text email.
- Context sensitivity:
The syntax and semantics are extremely context sensitive.
Almost every token can take completely different meanings depending
on where it appears.
The syntax for emphasis by enclosing in asterisks or underscores
is terribly ill-designed because it gives rise to no end of ambiguity
— and not just the classic example of
but also confusion about start and end tags.
**bold***italic* works as expected, but
if you add another
**bold**, as in
**bold***italic***bold**, it may become
at least with some markdown compilers.
- Mixup of semantic and presentational markup:
You can't switch off filling (which is a presentational
manipulation) without getting
(which is semantic markup).
You can't get indentation (presentational!) without either
Admittedly, early versions of HTML had similar problems.
<i> was originally designed to be
presentational; in HTML 5, it is now properly semantic, and the
presentational aspects are relegated to CSS, where they belong.
Kristaps summed this up succinctly: "HTML 5 is (kinda) semantic;
markdown is not."
In theory, HTML code generated from markdown input could be
improved if parser maintainers would choose to generate HTML output
that is less encumbered with unintended semantic connotations.
But Kristaps tells me parser maintainers rarely do that, for two
reasons.
Through inertia, most CSS files for markdown-generated HTML
now expect these cruddy HTML constructs.
And so do some tools that check the output of markdown-to-HTML
converters for "correctness", checking that the emitted tags agree
with tradition rather than checking whether they make sense
- Lack of independence:
Markdown is not at all a self-contained language.
It allows embedding arbitrary HTML code, both at the block and at the
That makes writing any parser for it very hard because you basically
have to include a full HTML parser and then add context sensitive
complications on top of it.
You also have to worry about all the security caveats of HTML.
Fortunately, i did not have to implement a markdown parser:
the new output mode only needs to write markdown, not read it.
Reading markdown code is the job for
So far, so bad: you get all the downsides of HTML for sure.
But you get almost none of the benefits of HTML because markdown
imposes lots of arbitrary and crippling restrictions on how you can
use it.
For example, inside unfilled text, you can neither use named or
numbered character references, nor flow-level elements like
<em>, nor even native markdown formatting
You can't use any block-level HTML elements inside any text that
is to be indented.
You can't use any kind of markdown formatting inside block-level
HTML elements.
As an example, even if you are willing to write definition lists
in HTML syntax, their list items cannot contain nested markdown
lists or displays, nor can the items of markdown lists contain
While markdown list elements can contain paragraph breaks, that no
longer works when the list as a whole is indented.
In that case, a paragraph break terminates the list.
And so on and so forth, no end of traps here...
Of course, you can work around such nesting restrictions by
writing all parent and child elements of the HTML block you want
to nest in HTML rather than in markdown syntax, even if markdown
syntax exists for these parent and child elements when they appear
But that mostly defeats the purpose of the whole exercise, making
you wonder why you ever chose markdown over HTML in the first
place.
In addition, markdown was originally intended for autogenerating
exactly one target language: HTML. Having only one target language
in mind when designing a new meta-language is obviously already a
bad idea, but choosing HTML as a target language is even worse,
because HTML is notoriously difficult to translate into other
formats. So even leaving the many design failures listed above
aside, the basic approach of mainly targeting HTML already curtails
most of the potential benefit of inventing a simpler markup
language.
- Syntax inspired by
A line break without a paragraph break requires whitespace
at the end of the preceding line, but the number of trailing blanks
is semantically significant: there must be at least two.
So, the two line endings "foo " (one trailing blank) and
"foo  " (two trailing blanks) have different meanings.
- Lack of standardization:
The most official reference manual for markdown is the
original one written by John Gruber in 2004.
It has been unmaintained since that time and leaves various ambiguities,
such that different parsers tend to parse input somewhat differently.
In a language starved for features, that's particularly unfortunate:
you usually can't use any alternative syntax to avoid the
ambiguities because usually there aren't any alternatives at all.
- Lack of extensibility:
The language clearly wasn't designed with extensibility in
mind, and it shows.
That alone would not necessarily be an important downside: if a
language is well-designed in the first place, even if it is not
extensible, it easily beats an ill-designed extensible language.
But unfortunately, markdown is both ill-designed and lacks many
important features, so this language would really need extensibility
to become usable at all.
Consequently, many different people went ahead and implemented
("designed" would probably be the wrong word here: i don't think
software design is part of the picture when it comes to markdown)
their own ad-hoc extensions.
Some of them are no doubt useful, but all the various versions of
the language that exist in the wild are now incompatible with each
other.
Some people say this is the main weakness of markdown as a language,
but i don't agree.
Sure, it is one annoying weakness, but there are many others that
are even worse.
For example, i ranted above about the lack of definition lists.
PHP Markdown Extra, Python Markdown, and pandoc appear to support
a syntax for them, and so may Github, although it doesn't appear
to be documented for Github.
To avoid the mess of extensions that may or may not be supported,
the new output mode only generates code according to John Gruber's
original specification and does not rely on any extensions.
Of course, that does not avoid the danger that some plain text
in the markdown code generated from your document may accidentally
trigger some extension handling in whatever markdown compiler you
use.
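To make the trailing-blank rule for line breaks from the list above concrete, here is a minimal Python sketch. It assumes the rule as given in John Gruber's original description (two or more blanks at the end of a line produce a line break, emitted as a br element) and deliberately ignores everything else, such as code blocks, so it is an illustration of that one rule rather than a markdown parser. The function name apply_hard_breaks is made up for this example.

```python
import re

# Two or more trailing spaces directly before a newline mark a line break;
# a single trailing space does not.
_HARD_BREAK = re.compile(r" {2,}\n")


def apply_hard_breaks(text):
    """Replace markdown-style hard line breaks with an HTML <br /> tag."""
    return _HARD_BREAK.sub("<br />\n", text)


# One trailing blank: the text is left alone.
one_blank = apply_hard_breaks("foo \nbar")
# Two trailing blanks: a line break is inserted.
two_blanks = apply_hard_breaks("foo  \nbar")
```

The two inputs differ by a single invisible character, yet only the second one yields a line break in the output, which is exactly the kind of trap the rule sets for authors and editors.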
In case you wonder —
here is how i think that a few other markup languages compare:
- LaTeX: Very good.
Very powerful in the first place, and very easy to extend.
Extension mechanisms are so strong that it is almost usable as a
Little context sensitivity and ambiguity.
The syntax is slightly cumbersome, but still more palatable
The excessive size of the TeX
Live distribution is a serious nuisance.
official death of the
project implies that LaTeX has become irrelevant for software
- roff(7): Very good.
Very simple and friendly syntax, works well even with
Reasonably powerful in the first place, and the extension mechanisms
are very powerful.
Some context sensitivity, but not too bad.
Unfortunately, while extensibility is powerful, it requires unusual,
fragile, and sometimes
downright ugly syntax — but that is of little importance
because it rarely affects end-users.
- HTML: Acceptable.
The basics are very easy to learn.
But HTML without CSS
is of limited use, and CSS is terribly overengineered, while at the
same time lacking important features — the landmark symptom
of a botched design.
Even though designed for extensibility, that is almost unusable in
and its sub-languages are among the most hostile languages
on the planet.
- DocBook: Abominable.
Overengineered beyond absurdity, ridiculously slow toolchain, syntax
encumbers the source code to the point of making it unreadable.
The man(7) output of the
standard tool chain is by far the lowest quality autogenerated man
code of any tool that i'm aware of.
Absolutely never use DocBook for anything.
As a language, in theory, it is probably better designed than
markdown, but that is irrelevant because it is even more unusable
than markdown in practice.
Abominable. See above.
- There are a few others (for example AsciiDoc, reStructuredText,
...) but i dare not judge them because i have too little experience
with them.
Oh did i really mention that? How stupid of me.
It's not April 1st yet.
So, the bottom line is: Do not use markdown. Do not use DocBook.
Do not use Texinfo. Use
to maintain your source documents, and
to convert them when needed (including to simple PostScript
or PDF output), or use
if you need to convert them to high-quality PostScript
or PDF output.