Static Website
Spoiler: after experimenting with PHP, we are going back to basics with a statically generated website. For the curious, here’s how we did it. We use pandoc to compile markdown to HTML and a makefile to orchestrate all operations, a few Apache modules to simplify a few areas and, finally, GitLab to deploy automatically.
As you can imagine, in addition to materializing our presence in cyber-space, the Arsouyes’ website serves as a platform for experimentation.
At the beginning, we had chosen static generation for the website, which had the advantage of not exposing application flaws. We then developed several versions in PHP with databases. Following fashion and going back to basics, we have now returned to a static system to offer you this new version of the arsouyes’ website.
There are already many systems for generating static sites, more or less successful, and you can find a list on staticgen.com. However, they have their own conventions (directory hierarchy, notions of blog, tags and others) which are not really what we were trying to do here.
As we do not trust anyone and we like to experiment, we decided to make our own system, just for practice. And as we like to share, here are the small solutions that we have implemented for this new version of the site.
The arsouyes’ solution
Makefile to orchestrate the generation
As we wanted something simple, we naturally turned to make to compile the website from source. It may sound like a Captain Obvious choice, but it is far from obvious, as the vast majority of generators rely on higher-level scripting languages like Python, Ruby or even Node.js.
So we have a makefile at the root that orchestrates the whole system, with generic rules for producing the different types of content and more specific ones for particular cases.
The first thing I often do in a makefile is declare the target all and make it depend on allatend which, as the name suggests, is defined at the end of the file. This lets me declare and build up everything I need little by little. Since make builds the first target of the file by default, it also guarantees that calling make with no argument is equivalent to make all.
all: allatend
Then I continue by declaring macros for the directories where I work; here the sources and the target directory. Using ?= only assigns a value when the macro is not already defined, which lets me override these macros from the environment or the command line and generate the content elsewhere (or generate other content).
SRC_DIR ?= src
DST_DIR ?= public
We can now tackle the generation of simple files, those that only need to be copied. The first two macros list the source files and the files to generate, and the third line adds the latter to the global list. The recipe for these files is very simple since you just have to copy them. To avoid declaring dependencies on directories, I use mkdir -p.
SRC_RAW_FILES := $(shell find $(SRC_DIR) -type f)
DST_RAW_FILES := $(subst $(SRC_DIR),$(DST_DIR),$(SRC_RAW_FILES))
ALL += $(DST_RAW_FILES)
$(DST_DIR)/% : $(SRC_DIR)/%
	mkdir -p $(dir $@)
	cp $< $@
And as promised, at the very end of the makefile, I add the allatend target:
allatend : $(ALL)
With this base, the makefile can generate all the static files (css, images,…).
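To make the copy rule more concrete, here is a minimal shell sketch of roughly what make ends up doing for the raw files. The function name copy_raw_files is hypothetical and not part of the site's makefile, and the timestamp check is only a rough stand-in for make's own up-to-date logic.

```shell
# copy_raw_files SRC DST: mirror every regular file from SRC into DST,
# creating parent directories on demand (a shell analogue of the copy rule).
# The body runs in a subshell "( ... )" so its variables stay local.
copy_raw_files() (
    src=$1
    dst=$2
    find "$src" -type f | while IFS= read -r file; do
        rel=${file#"$src"/}          # path relative to the source directory
        out=$dst/$rel                # same relative path under the target
        # Roughly like make: skip targets that are already up to date.
        # (-nt is not strictly POSIX but is supported by bash, dash and ksh.)
        if [ ! -e "$out" ] || [ "$file" -nt "$out" ]; then
            mkdir -p "$(dirname "$out")"
            cp "$file" "$out"
        fi
    done
)
```

Calling copy_raw_files src public would then populate public with the same tree as src.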
Pandoc to generate HTML
To go further, and generate HTML pages without having to write tags, I use pandoc, the all-purpose tool for converting between document formats, markdown in particular.
Again, the pattern is similar. I start with the macros to find the source files and list the files to produce.
SRC_PANDOC_FILES ?= $(shell find $(SRC_DIR) -type f -name "*.md")
DST_PANDOC_FILES ?= $(subst .md,.html, \
$(subst $(SRC_DIR),$(DST_DIR), \
$(SRC_PANDOC_FILES)))
ALL += $(DST_PANDOC_FILES)
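The mapping from a markdown source to its generated page can be sketched in shell as follows. The function name html_target is hypothetical. Note one difference with the macros above: $(subst) replaces every occurrence of its pattern, whereas the parameter expansions used here only touch the start and the end of the path.

```shell
# html_target SRC DST FILE: print where the page compiled from FILE will live.
# The body runs in a subshell "( ... )" so its variables stay local.
html_target() (
    src=$1
    dst=$2
    file=$3
    rel=${file#"$src"/}      # strip the source prefix, only at the start
    rel=${rel%.md}.html      # swap the extension, only at the end
    printf '%s\n' "$dst/$rel"
)

html_target src public src/blog/static.md    # prints public/blog/static.html
```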
As with any build system, I then declare the common arguments (PANDOC_FLAGS) and the specific dependencies (PANDOC_TEMPLATE). Here again, ?= allows me to override these macros directly on the command line when I want to test variants.
PANDOC_FLAGS ?= \
-f markdown+definition_lists \
-t html \
--base-header-level 1 \
--email-obfuscation=javascript
PANDOC_TEMPLATE ?= assets/template.html
We then get to the heart of the matter: the generation of the html files. As I want the site to be viewable without a web server, I use realpath --relative-to to pass the relative path to the site root to the template.
$(DST_DIR)/%.html : $(SRC_DIR)/%.md $(PANDOC_TEMPLATE)
	mkdir -p $(dir $@)
	pandoc $< \
		$(PANDOC_FLAGS) \
		--template=$(PANDOC_TEMPLATE) \
		-o $@ \
		--variable=root:`realpath --relative-to=$(dir $@) $(DST_DIR)`
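The value passed in the root variable can be tried on its own. This is a small sketch, assuming GNU realpath (coreutils) is available; the function name root_for is hypothetical.

```shell
# root_for PAGE_DIR SITE_DIR: relative path from a page's directory back to
# the site root, i.e. the value handed to the template as "root".
# Requires GNU realpath for the --relative-to option.
root_for() {
    realpath --relative-to="$1" "$2"
}

# For a page generated in public/blog/2019 (the directory must exist, which
# the recipe's mkdir -p guarantees), this would be used as:
#   root_for public/blog/2019 public
```

A template can then prefix its stylesheet and image paths with that value, so the pages work both behind a web server and when opened straight from disk.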
With this version, the html pages are also generated and we could stop there.
If you remember the previous section, you will notice that the markdown source files are also copied to production. In our case, this is not a mistake: we find it practical to have access to the sources directly online; you can, for example, send us corrections ;-).
mod_rewrite to fix links
It turns out that when I write links in markdown pages, I don’t like having to add the .html extension. At that point I only have .md files that haven’t been compiled yet, and I prefer my links to say where to look now rather than where they will point once everything is built.
As I cannot add these extensions at compile time, I have to do it at runtime, i.e. in the apache web server. And that’s good, because it has a module made for that: mod_rewrite.
This time it happens in the src/.htaccess file. We start by activating mod_rewrite:
RewriteEngine on
We can then add the rules that redirect visitors to the pages with the extension. For that, I first check that the requested path is not already an existing file, then that a file with the same name plus .html exists. The third condition is there to capture the URL as sent by the visitor rather than a path in the file system.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteCond %{THE_REQUEST} " ([^ ]+) "
RewriteRule ^(.*)$ %1.html [R=301,L]
Similarly, I don’t like links that contain index.html, so I’m going to change that as well. This time, rather than adding, we remove part of the URL, but the method is the same.
RewriteCond %{REQUEST_FILENAME} -f
RewriteCond %{THE_REQUEST} " (.*/)index\.html "
RewriteRule ^.*$ %1 [R=301,L]
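To reason about these two sets of rules without a running server, their decision logic can be mimicked in shell. This is only an illustration of the conditions above, not how Apache evaluates them internally; the function name redirect_for is hypothetical and query strings are ignored.

```shell
# redirect_for DOCROOT URI: print the URI a visitor would be redirected to,
# or nothing when the request would be served as-is.
# The body runs in a subshell "( ... )" so its variables stay local.
redirect_for() (
    docroot=$1
    uri=$2
    file=$docroot$uri
    if [ ! -f "$file" ] && [ -f "$file.html" ]; then
        # First set of rules: no such file, but its .html sibling exists.
        printf '%s.html\n' "$uri"
    elif [ -f "$file" ]; then
        case $uri in
            # Second set of rules: hide a trailing index.html.
            */index.html) printf '%s\n' "${uri%index.html}" ;;
        esac
    fi
)
```

With a document root containing blog/static.html, redirect_for "$docroot" /blog/static prints /blog/static.html, while a request for /blog/static.html itself prints nothing.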
As a bonus, I declare a few error pages, written in markdown, to keep a minimum of consistency with the look and feel of the site.
ErrorDocument 403 /error/403.html
ErrorDocument 404 /error/404.html
mod_autoindex for listings
Rather than having to maintain up-to-date listings by hand, or write a script that generates them during compilation, we have chosen here to leave the web server in charge via mod_autoindex (which works very well and is definitely more old school).
So we start by activating the generation of indexes.
Options +Indexes
As old school still has its limits, we enable a few very practical options for a nicer rendering.
# To be sure of the encoding
IndexOptions Charset=UTF-8
# Categories (directories) first
IndexOptions FoldersFirst
# A <table> rather than a <pre>
IndexOptions HTMLTable
# Suppress the horizontal rules around the listing
IndexOptions SuppressRules
# Truncate filenames
IndexOptions NameWidth=40
# Leave the length of the descriptions free
IndexOptions DescriptionWidth=*
We continue by customizing the icons according to file extensions. I’m not giving you all the AddIcon lines, you get the idea.
DefaultIcon /images/icons/folder_green.png
AddIcon /images/icons/folder.png ^^DIRECTORY^^
AddIcon /images/icons/ssl_certificates.png .crt
AddIcon /images/icons/file_extension_3gp.png .3gp
AddIcon /images/icons/file_extension_7z.png .7z
...
We can now get to the heart of the matter by really customizing the generated pages. For that, we will generate what comes before and after the listings and tell apache to use our files rather than generate its html code.
# Do not list files unnecessarily
IndexIgnore Readme.html favicon.ico *.md .[^.]*
# Do not add html code around the table
IndexOptions SuppressHTMLPreamble
# Insert the content of .Header.html before the table
HeaderName .Header.html
# Insert the contents of .Readme.html after the table
ReadmeName .Readme.html
The rest happens in the makefile, which handles the compilation of these two files, .Header.html and .Readme.html, where necessary. Indeed, if a directory contains an index.md file, which will be compiled to index.html, the auto-index is not used by apache for that directory, so we only need the directories without one.
SRC_DIRS ?= $(shell find $(SRC_DIR) -type d)
DIR_WITH_INDEX ?= $(subst /index.md,, \
$(shell find $(SRC_DIR) -type f -name "index.md"))
DIR_WITHOUT ?= $(filter-out $(DIR_WITH_INDEX), $(SRC_DIRS))
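The same selection can be checked by hand with a few lines of shell. This is a sketch under the same convention (a directory is listed by the server only when it has no index.md); the function name dirs_without_index is hypothetical.

```shell
# dirs_without_index SRC: list the directories under SRC that have no
# index.md and will therefore be rendered by the server-generated listing.
dirs_without_index() (
    find "$1" -type d | sort | while IFS= read -r dir; do
        [ -f "$dir/index.md" ] || printf '%s\n' "$dir"
    done
)
```

Running dirs_without_index src should print the same directories as the DIR_WITHOUT macro holds.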
Now that we have all the necessary directories, we can create the list of .Header.html and .Readme.html files.
AUTOINDEX_HEADERS ?= $(addsuffix /.Header.html,$(DIR_WITHOUT))
AUTOINDEX_READMES ?= $(addsuffix /.Readme.html,$(DIR_WITHOUT))
ALL += $(subst $(SRC_DIR),$(DST_DIR),\
$(AUTOINDEX_HEADERS) \
$(AUTOINDEX_READMES))
All that remains is to compile these files. We adopted the convention that if a Readme.md file is present in the directory, it is used to generate the files for the auto-index.
$(DST_DIR)%/.Header.html : $(SRC_DIR)%/Readme.md $(PANDOC_HEADER)
	mkdir -p $(dir $@)
	pandoc $< \
		$(PANDOC_FLAGS) \
		--template=$(PANDOC_HEADER) \
		-o $@ \
		--variable=root:`realpath --relative-to=$(dir $@) $(DST_DIR)`
$(DST_DIR)%/.Readme.html : $(SRC_DIR)%/Readme.md $(PANDOC_FOOTER)
	mkdir -p $(dir $@)
	pandoc $< \
		$(PANDOC_FLAGS) \
		--template=$(PANDOC_FOOTER) \
		-o $@ \
		--variable=root:`realpath --relative-to=$(dir $@) $(DST_DIR)`
And if no file exists, then we use a default file and the directory name as the title.
AUTOINDEX_README_DEFAULT ?= assets/Readme.md
$(DST_DIR)%/.Header.html : $(AUTOINDEX_README_DEFAULT) $(PANDOC_HEADER)
	mkdir -p $(dir $@)
	pandoc $< \
		$(PANDOC_FLAGS) \
		--template=$(PANDOC_HEADER) \
		-o $@ \
		--variable=root:`realpath --relative-to=$(dir $@) $(DST_DIR)` \
		--variable=title:$(shell basename $(dir $@))
$(DST_DIR)%/.Readme.html : $(AUTOINDEX_README_DEFAULT) $(PANDOC_FOOTER)
	mkdir -p $(dir $@)
	pandoc $< \
		$(PANDOC_FLAGS) \
		--template=$(PANDOC_FOOTER) \
		-o $@ \
		--variable=root:`realpath --relative-to=$(dir $@) $(DST_DIR)` \
		--variable=title:$(shell basename $(dir $@))
Symbolic links for the lattice
At this stage, the notion of lattice is still missing. Rather than a simple tree structure which, for each file, offers only one path to reach it, we prefer a lattice. Documents therefore appear in several directories and, to avoid copying them, we use symbolic links.
Once again, the makefile takes care of it, but with one limitation. As make cannot choose a rule according to the type of a file, only according to its name, we had to choose an extension for our symbolic links.
So we start by generating the list of links in the source and those that we will have to generate.
LINKS_EXT ?= lnk
SRC_LINKS ?= $(shell find $(SRC_DIR) -type l -name "*.$(LINKS_EXT)")
DST_LINKS ?= $(subst $(SRC_DIR),$(DST_DIR),$(basename $(SRC_LINKS)))
All that remains is to copy the symbolic links, making sure to preserve them as links (option -d).
$(DST_DIR)/%: $(SRC_DIR)/%.$(LINKS_EXT)
	mkdir -p $(dir $@)
	cp -d $< $@
ALL += $(DST_LINKS)
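As a rough shell equivalent of this rule, the sketch below finds the links by extension and recreates them under the target directory without the extension. The function name copy_links is hypothetical, and cp -d is assumed to be the GNU coreutils option.

```shell
# copy_links SRC DST EXT: recreate every symbolic link named *.EXT found
# under SRC at the same relative place under DST, dropping the extension and
# keeping the link itself (cp -d) instead of copying the file it points to.
# The body runs in a subshell "( ... )" so its variables stay local.
copy_links() (
    src=$1
    dst=$2
    ext=$3
    find "$src" -type l -name "*.$ext" | while IFS= read -r link; do
        rel=${link#"$src"/}           # path relative to the source directory
        out=$dst/${rel%."$ext"}       # same path under DST, extension removed
        mkdir -p "$(dirname "$out")"
        cp -d "$link" "$out"
    done
)
```

Because the link is copied as-is, a relative target such as ../files/tool.zip keeps pointing at the right place as long as the target tree mirrors the source tree.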
For links to markdown files, it is more subtle: these files generate html pages, so logically we also need to create links for the generated files.
DST_LINKS_HTML ?= $(subst .md,.html,$(DST_LINKS))
$(DST_DIR)/%.html: $(SRC_DIR)/%.md.$(LINKS_EXT)
	mkdir -p $(dir $@)
	target=`realpath --relative-to=$(dir $<) $< \
		| sed -e 's/\.md$$/.html/'` ;\
	ln -f -s $$target $@
ALL += $(DST_LINKS_HTML)
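The target computed by this recipe can be reproduced outside of make to check what a given link will point to. A minimal sketch, assuming GNU realpath; the function name html_link_target is hypothetical, and the sed expression is anchored so that only a final .md is rewritten.

```shell
# html_link_target LINK: given a symbolic link to a markdown file, print the
# relative target that the matching link to the generated page should use.
# Requires GNU realpath for the --relative-to option.
html_link_target() {
    realpath --relative-to="$(dirname "$1")" "$1" | sed -e 's/\.md$/.html/'
}

# Hypothetical usage, mirroring the recipe:
#   ln -f -s "$(html_link_target src/dir/b.md.lnk)" public/dir/b.html
```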
Since the web server is not able to redirect visitors when it sees links (it can only follow them and serve the same content for both URIs), we generate a list of redirects (Redirect directives, provided by mod_alias) in the .htaccess file.
# DST_DIR_REAL is not defined in this article; it presumably holds the
# absolute path of $(DST_DIR).
$(DST_DIR)/.htaccess : \
		$(SRC_DIR)/.htaccess \
		$(DST_LINKS) \
		$(DST_LINKS_HTML)
	( \
	cat $(SRC_DIR)/.htaccess ; \
	for i in $(DST_LINKS) $(DST_LINKS_HTML); do \
		source=$${i#$(DST_DIR)} ;\
		target=`realpath -m $$i` ;\
		target=$${target#$(DST_DIR_REAL)} ;\
		echo "Redirect \"$$source\" \"$$target\"" ;\
	done | sort -u \
	) > $@
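The loop of this recipe can also be run by hand to preview the redirects before deploying. A sketch assuming GNU realpath; the function name emit_redirects is hypothetical, and the absolute path of the target directory is computed inside the function rather than taken from a macro.

```shell
# emit_redirects DST_DIR LINK...: for each symbolic link below DST_DIR, print
# a "Redirect" line sending the link's URI to the URI of the file it points to.
# The body runs in a subshell "( ... )" so its variables stay local.
emit_redirects() (
    dst_dir=$1
    shift
    dst_real=$(realpath "$dst_dir")      # absolute path of the site root
    for link in "$@"; do
        source=${link#"$dst_dir"}        # URI of the link itself
        target=$(realpath -m "$link")    # absolute path of the real file
        target=${target#"$dst_real"}     # back to a URI under the site root
        printf 'Redirect "%s" "%s"\n' "$source" "$target"
    done | sort -u
)
```

For a link public/dir/b.html pointing to ../a.html, the function prints a single line redirecting /dir/b.html to /a.html.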
All that remains is to let the web server follow the links so that its automatic index generation takes them into account and that’s it.
Options +Indexes +FollowSymLinks
GitLab for continuous deployment
More often called Continuous Delivery, it simply means pushing modifications of the website into production continuously, i.e. automatically and without human intervention.
For larger applications, it’s a big deal, but for a static site, it’s pretty simple. Static site hosting on GitHub and GitLab already allows this.
The sources of the arsouyes’ website, as well as the makefile and other files, are versioned on a GitLab server installed on our platform.
We use two branches: master, which contains the production version, and develop, which contains the next version of the site and is deployed on a test server. This lets us play with the site and try it out in preview.
For deployment, we therefore use a GitLab runner on each of the two servers and a configuration file at the root of the git repository.
As we keep it very simple, we only have one deploy stage and one job per server.
stages:
  - deploy

deploy-develop:
  stage: deploy
  only:
    - develop
  script:
    - make cleanall
    - make DST_DIR=public
    - rsync -avz -c --delete-after public/ /public/arsouyes.org/www
  tags:
    - preprod

deploy-master:
  stage: deploy
  only:
    - master
  script:
    - make cleanall
    - make DST_DIR=public
    - rsync -avz -c --delete-after public/ /public/arsouyes.org/www
  tags:
    - prod
Conclusion
After all these efforts, we have a fairly lightweight system that generates our site with a simple call to make when we want to test locally, and we leave it to GitLab to deploy to production automatically.
We could of course go further. We could generate the indexes at compile time to spare the web server the work, add a verification step, or even go as far as blue/green deployment… But that would take us away from the goal: a light and practical system that does not take up all our time.
The big trade-off of a static site, as you may have guessed, is that you cannot add content, comments or discussion threads here. But that is a deliberate choice: we’re not here for that, and these services are already available elsewhere (e.g. twitter).