Static Website

tbowan
September 27th 2017

Spoiler: After experimenting with PHP, we are going back to basics with a statically generated website. For the curious, here's how we did it. We use pandoc to compile markdown to HTML and a makefile to orchestrate all the operations, a few Apache modules to simplify some areas and, finally, GitLab to deploy automatically.

As you can imagine, in addition to materializing our presence in cyber-space, the Arsouyes' website serves as a platform for experimentation.

In the beginning, we chose static generation for the website, which had the advantage of not exposing application flaws. We then developed several versions in PHP with databases. Following fashion and going back to basics, we have now returned to a static system to offer you this new version of the arsouyes' website.

There are already many static site generators, more or less successful, and you can find a list on staticgen.com. However, they come with their own conventions (directory hierarchy, notions of blog, tags and so on) which are not really what we were trying to do here.

As we do not trust anyone and we like to experiment, we decided to build our own system, just for practice. And as we like to share, here are the small solutions that we have implemented for this new version of the site.

The arsouyes' solution

`Makefile` to orchestrate the generation

As we wanted something simple, we naturally turned to make to compile the website from source. Obvious as it may sound, it is far from the usual choice: the vast majority of generators rely on higher-level scripting languages such as Python, Ruby or Node.js.

So we have a makefile at the root that orchestrates the whole system, with generic rules for producing the different types of content and more specific ones for particular cases.

The first thing I often do in a makefile is declare the target all and make it depend on allatend which, as the name suggests, is defined at the end of the file. This lets me declare and build everything I need little by little. It also guarantees that calling make with no argument is equivalent to make all.

all: allatend

Then I continue by declaring macros for the directories I work in; here the source and target directories. Using ?= lets me override these macros on the command line and generate the content elsewhere (or generate other content).

SRC_DIR         ?= src
DST_DIR         ?= public

We can now tackle the generation of simple files, those that must be copied. The first two macros list the content to be generated and the third adds them to the global list. The recipe for these files is very simple since you just have to copy them. To avoid using dependencies for directories, I use mkdir -p.

SRC_RAW_FILES   := $(shell find $(SRC_DIR) -type f)
DST_RAW_FILES   := $(patsubst $(SRC_DIR)/%,$(DST_DIR)/%,$(SRC_RAW_FILES))
ALL             += $(DST_RAW_FILES)

$(DST_DIR)/% : $(SRC_DIR)/%
    mkdir -p $(dir $@)
    cp $< $@

And as promised, at the very end of the makefile, I add the allatend target:

allatend : $(ALL)

With this base, the makefile can generate all the static files (css, images,...).
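
As a rough illustration, the source-to-target mapping behind DST_RAW_FILES can be sketched in plain shell. This is not taken from the site's makefile: the `dst_path` helper is a hypothetical name, and it only strips the source directory when it is a leading prefix, so a file whose name happens to contain the source directory's name is left alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of the makefile): map a source path to its target.
# Usage: dst_path SRC_DIR DST_DIR FILE
dst_path() {
    # Remove the leading "SRC_DIR/" and prepend "DST_DIR/".
    printf '%s/%s\n' "$2" "${3#"$1"/}"
}

dst_path src public src/css/style.css         # prints public/css/style.css
dst_path src public src/images/src-logo.png   # prints public/images/src-logo.png
```

The file names above are made up for the example.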

`Pandoc` to generate html

To go further and generate HTML pages without having to write tags, I use pandoc, the all-purpose tool for converting between document formats, markdown in particular.

Again, the pattern is similar. I start with the macros to find the source files and list the files to produce.

SRC_PANDOC_FILES ?= $(shell find $(SRC_DIR) -type f -name "*.md")
DST_PANDOC_FILES ?= $(patsubst $(SRC_DIR)/%.md,$(DST_DIR)/%.html, \
                        $(SRC_PANDOC_FILES))

ALL              += $(DST_PANDOC_FILES)

As with any build system, I then proceed with the common arguments (PANDOC_FLAGS) and specific dependencies (PANDOC_TEMPLATE). Here again, the use of ?= allows me to override these macros directly on the command line when I want to test variants.

PANDOC_FLAGS    ?= \
                    -f markdown+definition_lists \
                    -t html \
                    --base-header-level 1 \
                    --email-obfuscation=javascript

PANDOC_TEMPLATE ?= assets/template.html

We then enter the heart of the matter, the generation of html files. As I want the site to be visible without a web server, I use realpath --relative-to to pass the path to the root to the template.

$(DST_DIR)/%.html : $(SRC_DIR)/%.md $(PANDOC_TEMPLATE)
    mkdir -p $(dir $@)
    pandoc\
        $< \
        $(PANDOC_FLAGS) \
        --template=$(PANDOC_TEMPLATE) \
        -o $@ \
        --variable=root:`realpath --relative-to=$(dir $@) $(DST_DIR)`

With this version, the html pages are also generated and we could stop there.
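
To make the root variable more concrete, here is a small sketch of what the realpath call returns for pages at different depths. It assumes GNU coreutils (for `realpath --relative-to`); the `relative_root` helper and the directory names are hypothetical and only serve the example.

```shell
#!/bin/sh
# Hypothetical helper: relative path from a page's directory back to the site root.
# Usage: relative_root PAGE_DIR DST_DIR   (requires GNU realpath)
relative_root() {
    realpath --relative-to="$1" "$2"
}

# Throwaway tree standing in for the generated site.
demo=$(mktemp -d)
mkdir -p "$demo/public/blog/2017"

relative_root "$demo/public/blog/2017" "$demo/public"   # nested page: ../..
relative_root "$demo/public" "$demo/public"             # page at the root: .
```

A template can then prefix its links with `$root$`, so that they resolve both through a web server and when the files are opened directly from disk.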

If you remember the previous section, you will notice that the source files are also copied to production. In our case, this is not a mistake: we find it practical to have access to the sources directly online; you can for example send us corrections ;-).

`mod_rewrite` to fix links

It turns out that when I write links in markdown pages, I don't like having to add the .html extension. At that point I only have .md files that haven't been compiled yet, and I prefer my links to say where to look rather than where they will point once everything is built.

As I cannot add these extensions at compile time, I have to do it at runtime, i.e. in the Apache web server. And that's convenient because it has a module made for that: mod_rewrite.

This time it happens in the src/.htaccess file. We start by enabling mod_rewrite:

RewriteEngine on

We can then add the rules to redirect visitors to the pages with the extension. For that, I first check that the link is not already valid, then that a matching html file exists. The third condition is there to capture the URL as requested by the visitor rather than a path in the file system.

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteCond %{THE_REQUEST} " ([^ ]+) "
RewriteRule ^(.*)$ %1.html [R=301,L]

Similarly, I don't like links that contain index.html so I'm going to change that as well. This time, rather than adding, we remove part of the URL but the method is the same.

RewriteCond %{REQUEST_FILENAME} -f
RewriteCond %{THE_REQUEST} " (.*/)index\.html "
RewriteRule ^.*$ %1 [R=301,L]
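
To check the intent of these two blocks of rules without a running server, the decision they encode can be mimicked in a few lines of shell. This is only a sketch of the logic, not how Apache evaluates it: `redirect_for` is a hypothetical helper, the document root is a throwaway directory, and query strings are ignored.

```shell
#!/bin/sh
# Hypothetical helper: print where a request would be redirected, or nothing.
# Usage: redirect_for DOCROOT URI
redirect_for() {
    # First block of rules: the URI is not a file, but URI + ".html" is.
    if [ ! -f "$1$2" ] && [ -f "$1$2.html" ]; then
        printf '%s\n' "$2.html"
        return 0
    fi
    # Second block: the file exists and the URI ends with index.html.
    if [ -f "$1$2" ]; then
        case $2 in
            */index.html) printf '%s\n' "${2%index.html}" ;;
        esac
    fi
    return 0
}

demo=$(mktemp -d)
mkdir -p "$demo/blog"
touch "$demo/blog/static.html" "$demo/blog/index.html"

redirect_for "$demo" /blog/static        # prints /blog/static.html
redirect_for "$demo" /blog/index.html    # prints /blog/
redirect_for "$demo" /blog/static.html   # prints nothing: already valid
```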

As a bonus, I declare a few error pages that I wrote in markdown, to keep a minimum of consistency with the visual identity of the site.

ErrorDocument 403 /error/403.html
ErrorDocument 404 /error/404.html

`mod_autoindex` for listings

Rather than having to keep listings up to date by hand, or write a script that generates them during compilation, we chose to leave that to the web server via mod_autoindex (which works very well and is definitely more old school).

So we start by activating the generation of indexes.

Options +Indexes

Since old school still has its limits, we enable a few very practical options for a nicer rendering. Apache does not accept a comment on the same line as a directive, so each comment goes on its own line.

# To be sure of the encoding
IndexOptions Charset=UTF-8
# Categories first
IndexOptions FoldersFirst
# A <table> rather than a <pre>
IndexOptions HTMLTable
# Suppress lines around listing
IndexOptions SuppressRules
# Truncate filenames
IndexOptions NameWidth=40
# Leave the length of the descriptions free
IndexOptions DescriptionWidth=*

We continue by customizing the icons according to file extensions. I'm not giving you all the AddIcon directives, you get the idea.

DefaultIcon /images/icons/folder_green.png
AddIcon /images/icons/folder.png ^^DIRECTORY^^
AddIcon /images/icons/ssl_certificates.png .crt
AddIcon /images/icons/file_extension_3gp.png .3gp
AddIcon /images/icons/file_extension_7z.png .7z
...

We can now get to the heart of the matter by really customizing the generated pages. For that, we will generate what comes before and after the listings and tell apache to use our files rather than generate its html code.

# Do not list files unnecessarily
IndexIgnore Readme.html favicon.ico *.md .[^.]*

# Do not add html code around the table
IndexOptions SuppressHTMLPreamble

# Insert the content of .Header.html before the table
HeaderName .Header.html

# Insert the contents of .Readme.html after the table
ReadmeName .Readme.html

The rest happens in the makefile, which handles compiling the two files .Header.html and .Readme.html when they are needed. Indeed, if a directory contains an index.md file, which is compiled to index.html, Apache does not use the auto-index for it.

SRC_DIRS       ?= $(shell find $(SRC_DIR) -type d)
DIR_WITH_INDEX ?= $(subst /index.md,, \
                      $(shell find $(SRC_DIR) -type f -name "index.md"))
DIR_WITHOUT    ?= $(filter-out $(DIR_WITH_INDEX), $(SRC_DIRS))
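
The same selection can be sketched directly in shell, which makes it easy to check which directories will get an auto-index. The `dirs_without_index` helper and the sample tree are hypothetical; only the logic (keep the directories that have no index.md) comes from the macros above.

```shell
#!/bin/sh
# Hypothetical helper: list the directories under SRC_DIR without an index.md.
# Usage: dirs_without_index SRC_DIR
dirs_without_index() {
    find "$1" -type d | sort | while IFS= read -r dir; do
        [ -f "$dir/index.md" ] || printf '%s\n' "$dir"
    done
}

demo=$(mktemp -d)
mkdir -p "$demo/src/blog" "$demo/src/files"
touch "$demo/src/index.md" "$demo/src/blog/index.md"

dirs_without_index "$demo/src"   # only .../src/files needs an auto-index
```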

Now that we have all the necessary directories, we can create the list of .Header.html and .Readme.html.

AUTOINDEX_HEADERS ?= $(addsuffix /.Header.html,$(DIR_WITHOUT))
AUTOINDEX_READMES ?= $(addsuffix /.Readme.html,$(DIR_WITHOUT))

ALL += $(subst $(SRC_DIR),$(DST_DIR),\
           $(AUTOINDEX_HEADERS) \
           $(AUTOINDEX_READMES))

All that remains is to compile these files. Our convention is that if a Readme.md file is present in the directory, it is used to generate the files for the auto-index.

$(DST_DIR)%/.Header.html : $(SRC_DIR)%/Readme.md $(PANDOC_HEADER)
    mkdir -p $(dir $@)
    pandoc\
        $< \
        $(PANDOC_FLAGS) \
        --template=$(PANDOC_HEADER) \
        -o $@ \
        --variable=root:`realpath --relative-to=$(dir $@) $(DST_DIR)`

$(DST_DIR)%/.Readme.html : $(SRC_DIR)%/Readme.md $(PANDOC_FOOTER)
    mkdir -p $(dir $@)
    pandoc\
        $< \
        $(PANDOC_FLAGS) \
        --template=$(PANDOC_FOOTER) \
        -o $@ \
        --variable=root:`realpath --relative-to=$(dir $@) $(DST_DIR)`

And if no file exists, then we use a default file and the directory name as the title.

AUTOINDEX_README_DEFAULT ?= assets/Readme.md

$(DST_DIR)%/.Header.html : $(AUTOINDEX_README_DEFAULT) $(PANDOC_HEADER)
    mkdir -p $(dir $@)
    pandoc\
        $< \
        $(PANDOC_FLAGS) \
        --template=$(PANDOC_HEADER) \
        -o $@ \
        --variable=root:`realpath --relative-to=$(dir $@) $(DST_DIR)` \
        --variable=title:$(shell basename $(dir $@))
        
$(DST_DIR)%/.Readme.html : $(AUTOINDEX_README_DEFAULT) $(PANDOC_FOOTER)
    mkdir -p $(dir $@)
    pandoc\
        $< \
        $(PANDOC_FLAGS) \
        --template=$(PANDOC_FOOTER) \
        -o $@ \
        --variable=root:`realpath --relative-to=$(dir $@) $(DST_DIR)` \
        --variable=title:$(shell basename $(dir $@))

Symbolic links for the lattice

At this stage, the notion of lattice is still missing. Rather than a simple tree structure, where each file can be reached by only one path, we prefer a lattice. Documents therefore appear in several directories and, to avoid copying them, we use symbolic links.

This time again, the makefile takes care of it, but with one limitation. As make cannot choose a rule according to the type of a file but only according to its name, we had to choose an extension for our symbolic links.

So we start by generating the list of links in the source and those that we will have to generate.

LINKS_EXT      ?= lnk
SRC_LINKS      ?= $(shell find $(SRC_DIR) -type l -name "*.$(LINKS_EXT)")
DST_LINKS      ?= $(subst $(SRC_DIR),$(DST_DIR),$(basename $(SRC_LINKS)))

All that is left is to copy the symbolic links, making sure they remain links rather than being followed (option -d).

$(DST_DIR)/%: $(SRC_DIR)/%.$(LINKS_EXT)
    mkdir -p $(dir $@)
    cp -d $< $@

ALL += $(DST_LINKS)

Links to markdown files are more subtle: since these files generate html pages, we also need to create links for the generated pages.

DST_LINKS_HTML ?= $(subst .md,.html,$(DST_LINKS))

$(DST_DIR)/%.html: $(SRC_DIR)/%.md.$(LINKS_EXT)
    mkdir -p $(dir $@)
    target=`realpath --relative-to=$(dir $<) $< \
            | sed -e 's/\.md$$/.html/'` ;\
            ln -f -s $$target $@

ALL += $(DST_LINKS_HTML)
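
Here is a small sketch of what that recipe computes for a single link, assuming GNU realpath. The `html_link_target` helper and the file names are hypothetical, and the sed pattern is anchored to the end of the name so that only the final .md extension is rewritten.

```shell
#!/bin/sh
# Hypothetical helper: print the target an HTML link should point to,
# given a source symbolic link to a markdown file (requires GNU realpath).
html_link_target() {
    realpath --relative-to="$(dirname "$1")" "$1" | sed -e 's/\.md$/.html/'
}

demo=$(mktemp -d)
mkdir -p "$demo/src/blog" "$demo/src/topics"
touch "$demo/src/blog/post.md"
ln -s ../blog/post.md "$demo/src/topics/post.md.lnk"

html_link_target "$demo/src/topics/post.md.lnk"   # prints ../blog/post.html
```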

Since the web server is not able to redirect visitors when it sees links (it can only follow them and provide the same content for both URIs), we will generate a list of redirects for mod_rewrite in the .htaccess file.

$(DST_DIR)/.htaccess : \
                        $(SRC_DIR)/.htaccess \
                        $(DST_LINKS) \
                        $(DST_LINKS_HTML)
    ( \
        cat $(SRC_DIR)/.htaccess ; \
        for i in $(DST_LINKS) $(DST_LINKS_HTML); do \
            source=$${i#$(DST_DIR)} ;\
            target=`realpath -m $$i` ;\
            target=$${target#$(DST_DIR_REAL)} ;\
            echo "Redirect \"$$source\" \"$$target\"" ;\
        done | sort -u \
    ) > $@
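
The body of that loop can be tried on its own with the sketch below. It rests on an assumption: DST_DIR_REAL is not defined in the excerpts shown here, and the sketch takes it to be the absolute, resolved path of the target directory. The `redirect_line` helper and the sample files are hypothetical; GNU realpath is required.

```shell
#!/bin/sh
# Hypothetical helper: print the Redirect directive for one link of the site.
# Usage: redirect_line DST_DIR LINK   (requires GNU realpath)
redirect_line() {
    dst_real=$(realpath "$1")           # assumed meaning of DST_DIR_REAL
    source_uri=${2#"$1"}                # URL of the link itself
    target=$(realpath -m "$2")          # absolute path the link resolves to
    target_uri=${target#"$dst_real"}    # same path, seen from the site root
    printf 'Redirect "%s" "%s"\n' "$source_uri" "$target_uri"
}

demo=$(mktemp -d)
mkdir -p "$demo/public/blog" "$demo/public/topics"
touch "$demo/public/blog/post.html"
ln -s ../blog/post.html "$demo/public/topics/post.html"

# prints: Redirect "/topics/post.html" "/blog/post.html"
redirect_line "$demo/public" "$demo/public/topics/post.html"
```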

All that remains is to let the web server follow the links so that its automatic index generation takes them into account and that's it.

Options +Indexes +FollowSymLinks

`GitLab` for continuous deployment

More often called Continuous Delivery, it simply means pushing changes to the website into production continuously, i.e. automatically and without human intervention.

For larger applications, it's a big deal, but for a static site, it's pretty simple. Static sites hosted on GitHub and GitLab already allow this.

The sources of the arsouyes' website, as well as the makefile and other files, are versioned on a GitLab server installed on our platform.

We use two branches: master, which contains the production version, and develop, which contains the next version of the site and is deployed on a test server. This allows us to play with the site and try it out in preview.

For deployment, we therefore use a GitLab runner on each of the two servers and a configuration file (.gitlab-ci.yml) at the root of the git repository.

As we keep it very simple, we only have one deploy stage and one job per server.

stages:
    - deploy

deploy-develop:
    stage: deploy
    only:
    - develop
    script:
    - make cleanall
    - make DST_DIR=public
    - rsync -avz -c --delete-after public/ /public/arsouyes.org/www
    tags:
    - preprod

deploy-master:
    stage: deploy
    only:
    - master
    script:
    - make cleanall
    - make DST_DIR=public
    - rsync -avz -c --delete-after public/ /public/arsouyes.org/www
    tags:
    - prod

Conclusion

After all these efforts, we have a rather light system that generates our site with a simple call to make when we want to test locally, and we leave it to GitLab to deploy to production automatically.

We could of course go further. We could generate the indexes at compile time to spare the web server from doing it, add a verification step, or even go all the way to blue/green deployment... But we would be drifting away from the goal: a light and practical system which does not take all our time.

The big trade-off of a static site, as you may have guessed, is that you cannot add content, comments or discussion threads here. But that is a deliberate choice: we're not here for that and these services are already available elsewhere (e.g. twitter).
