Using Pandoc to well below its fullest potential!

Pandoc is a Swiss Army knife that I exclusively use to cut cheese.

Every time I begin to write for this site, I open Codium and create a new file in my repo. Then I realise I meant to create a new folder, call myself an idiot and start over. This is the first step of my process, which currently concludes with running my anemic static site generator. I hope to soon replace this with a bespoke and iron proficient(?) site generator, but until then I will continue using Pandoc to fulfill my static site generator requirements.

My static site generator requirements:

Markdown Input

Pandoc can convert documents between myriad formats, most relevantly Markdown to HTML.

pandoc inputfile.md -o output.html --to=html5

Code Highlighting

There are plenty of tools that can do this, but Pandoc was the first I found that highlights code without any dependencies like highlight.js; it just saves the colour information in inline CSS.

Any fenced Markdown code blocks with a language specified (like the following) will have the syntax automatically highlighted by Pandoc.

# Title

## Subtitle

Lorem ipsum amirite.  Hey look a fenced code block:

```Python
if x == 5:
    print("x is equal to 5")
```

Pandoc will turn this Markdown into the following HTML:

<h1 id="title">Title</h1>
<h2 id="subtitle">Subtitle</h2>
<p>Lorem ipsum amirite. Hey look a fenced code block:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> x <span class="op">==</span> <span class="dv">5</span>:</span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>    <span class="bu">print</span>(<span class="st">"x is equal to 5"</span>)</span></code></pre></div>

This spaghetti of HTML classes is what highlights the code, though the CSS required to actually make it work is not included by the standard Pandoc invocation. To generate the CSS behind these classes you need the --standalone flag, à la:

pandoc inputfile.md -o output.html --to=html5 --standalone

This will include some inline CSS, along the lines of:

code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */

This CSS is great, but it only needs to be generated once. I copied it into a single file; once that is done, the --standalone flag can be omitted.
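The copy step can also be scripted. The following is a minimal sketch, not something Pandoc provides: `extract_code_css` is a hypothetical helper name, and it assumes each highlighting rule sits on its own line starting with `code span.`, as in the listing above. It keeps only those colour rules; any other rules in the standalone output would still need copying by hand if you want them.

```shell
# Hypothetical helper (not part of Pandoc): read a standalone HTML page on
# stdin and print only the "code span.xx { ... }" highlighting rules.
extract_code_css() {
    grep -E '^[[:space:]]*code span\.[a-z]+ \{'
}

# Example usage, assuming pandoc is installed and styles/ exists:
#   pandoc inputfile.md --to=html5 --standalone | extract_code_css > styles/code.css
```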

with theming?

The colours used in the highlighting can be customised! As with everything else in my life at the moment, I use Dracula, which thankfully already has a theme for Pandoc.

Once the theme is installed, Pandoc can highlight using the theme with the --defaults flag:

pandoc inputfile.md -o output.html --to=html5 --standalone --defaults path/to/theme/dracula.yaml

The HTML class names are the same regardless of the theme, but the inline CSS depends on it. Again, this only needs to be run once, and the inline CSS can be copied and saved to be used later.

Simple Templating

Pandoc makes it simple, as long as you only need simple template insertions! HTML templates can be created easily. A basic template could look like the following:

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <link rel="stylesheet" href="/styles/code.css">
    <title>$title$</title>
</head>

<body>
    <header>
        <h1>$title$</h1>
        <p>$subtitle$</p>
        <p>$date$</p>
    </header>

    <article>
        $body$
    </article>
</body>
</html>

Note the inclusion of the stylesheet. code.css includes the CSS for code highlighting.

The body variable is automatically assigned to the content of the Markdown file, and the rest of the variables can be assigned manually using a YAML header in the Markdown file.

---
title: A Titular Title
subtitle: A subtitular subtitle
date: 2022-06-12
---

## Chapter 1: Where it all began

Lorem ipsum amirite.  Hey look a fenced code block:

```Python
if x == 5:
    print("x is equal to 5")
```

To use the template, you use the --template flag in the Pandoc invocation:

pandoc inputfile.md -o outputfile.html --to=html5 --template=path/to/template.html
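One thing to watch: Pandoc renders an undefined template variable as an empty string rather than failing, so a post with a forgotten subtitle quietly gets an empty paragraph. A small pre-flight check can catch that. This is only a sketch; `missing_meta` is a hypothetical helper, and it assumes the YAML header starts on the first line and uses simple top-level `key: value` lines like the example above.

```shell
# Hypothetical helper: print every required key that is absent from the
# YAML header (the lines between the first two '---') of a Markdown file.
missing_meta() {
    file=$1
    shift
    for key in "$@"; do
        awk -v k="$key" '
            NR == 1 { if ($0 != "---") exit; next }  # no header: nothing found
            $0 == "---" { exit }                     # end of header
            index($0, k ":") == 1 { found = 1; exit }
            END { exit (found ? 0 : 1) }
        ' "$file" || echo "$key"
    done
}

# Example usage:
#   missing_meta inputfile.md title subtitle date
```

It prints nothing when every key is present, so it slots in front of the pandoc call without adding noise.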

Dependencies

None! The outputted HTML has no external dependencies.

Automated?

It's a pain typing out that command for each post - so don't.

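#!/usr/bin/env bash

# Directories that are not posts. Bash array elements are separated by
# spaces, not commas; the trailing slash matches what the */ glob yields.
nonposts=("styles/" "templates/")

for d in */; do
    echo "$d"
    # Skip directories listed in nonposts (exact match, not substring).
    skip=false
    for n in "${nonposts[@]}"; do
        [[ $d == "$n" ]] && skip=true
    done
    if ! $skip; then
        echo "is post"
        echo "generating index.html"
        name=${d%/}  # strip the trailing slash
        echo "pandoc ${d}${name}.md -o ${d}index.html --to=html5 --template=templates/template.html"
        pandoc "${d}${name}.md" -o "${d}index.html" --to=html5 --template=templates/template.html
    fi
done
echo "done"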

This script is designed to work on the following directory structure:

blog
├── basic-seleniumbasic
│   ├── adding-reference.png
│   ├── basic-seleniumbasic.md
│   ├── dotnet-frameworks.png
│   ├── index.html
│   └── seleniumbasic-installer.png
├── covid-updates
│   ├── covid-updates.md
│   ├── index.html
│   └── media_release.png
├── crafting-recipes
│   ├── crafting-recipes.md
│   └── index.html
├── file-descriptors
│   ├── file-descriptors.md
│   └── index.html
├── generate_posts.sh
├── hep-001
│   ├── hep-001.md
│   └── index.html
├── index.html
├── poor-mans-site
│   ├── index.html
│   └── poor-mans-site.md
├── templates
│   └── template.html
└── usernames
    ├── index.html
    └── usernames.md

It opens every directory (except those in nonposts), runs pandoc on a Markdown file with the same name as the directory, and outputs index.html based on templates/template.html.
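An alternative to maintaining the nonposts list is to let the layout decide: a directory counts as a post only if it contains a Markdown file named after itself. The sketch below is one way to express that; `is_post` is a hypothetical name, and it relies on the naming convention shown in the tree above.

```shell
# Hypothetical helper: succeed only if <dir>/<dir>.md exists.
is_post() {
    dir=${1%/}                            # drop a trailing slash, if any
    [ -f "$dir/$(basename "$dir").md" ]
}

# Example usage, run from the blog directory:
#   for d in */; do is_post "$d" && echo "$d is a post"; done
```

With this check, templates/ and styles/ are skipped without being named anywhere, at the cost of silently ignoring a post whose Markdown file is misnamed.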

The top level index.html is not modified by the script at all. This page presumably would link to the subdirectories and must be manually modified. Ugh.
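The link list, at least, could be generated. What follows is a rough sketch rather than what this site does: `post_title` and `list_posts` are hypothetical helpers, the title is taken from the first line starting with `title:` in each post's Markdown file, and titles are not HTML-escaped.

```shell
# Hypothetical helper: print a post's title from its YAML header,
# falling back to the directory name when no title line is found.
post_title() {
    name=${1%/}
    title=$(sed -n 's/^title:[[:space:]]*//p' "$name/$name.md" 2>/dev/null | head -n 1)
    printf '%s\n' "${title:-$name}"
}

# Print one <li> link per post directory under the current directory.
list_posts() {
    for d in */; do
        name=${d%/}
        [ -f "$name/$name.md" ] || continue   # skip templates/ and the like
        printf '<li><a href="%s/">%s</a></li>\n' "$name" "$(post_title "$name")"
    done
}

# Example usage, run from the blog directory:
#   list_posts > post-links.html
```

The output still has to be pasted into (or templated into) the top-level index.html, but that beats writing each link by hand.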