squeeze

A static site generator that can put the toothpaste back in the tube.
git clone https://git.stjo.hn/squeeze
Log | Files | Refs | README | LICENSE

commit 78c5475b805b496d32d9094c249276fdb01d1f61
parent 8d1aabd995ed7ada5ce5286217af6b6c80db779d
Author: St John Karp <contact@stjo.hn>
Date:   Tue, 16 Jun 2020 08:59:37 -0500

Move Markdown to HTML conversion into Prolog

Previously the call to convert Markdown to HTML was in the shell
script, which had the key disadvantage that we needed to separate
and then re-integrate the headers. The logical place to put this
has always been inside Prolog, the problem being that ISO Prolog
doesn't have any way to execute an external program.

I've worked around this limitation by writing an extensible
markdown_to_html predicate which can be implemented for different
Prolog dialects. Included are SWI-Prolog and GNU-Prolog.

The definition of the Markdown command itself can go in your
site.pl, allowing for different Markdown converters and different
arguments on a per-site basis.

Diffstat:
A dialects/gnu-prolog.pl | 24 ++++++++++++++++++++++++
A dialects/swi-prolog.pl | 20 ++++++++++++++++++++
M generate_rss.pl | 2 +-
M helpers.pl | 21 +++++++++++++++++----
M parse_entry.pl | 12 +++++++++---
M readme.md | 14 +++++++++++---
M squeeze.sh | 37 ++-----------------------------------
M unsqueeze.sh | 1 -
8 files changed, 84 insertions(+), 47 deletions(-)

diff --git a/dialects/gnu-prolog.pl b/dialects/gnu-prolog.pl @@ -0,0 +1,23 @@ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% Predicate implementations for GNU-Prolog dialects. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +% Detect GNU-Prolog +gnu_prolog:- + catch(current_prolog_flag(dialect, gprolog), _, fail). + +gnu_prolog:- + catch(current_prolog_flag(prolog_name, 'GNU Prolog'), _, fail). + + +% GNU-Prolog-specific predicate to run an external Markdown tool. +% The command itself should be specified in your site.pl. +markdown_to_html(MarkdownEntryCodes, HTMLEntryCodes):- + gnu_prolog, + markdown_command(CommandList), + join(CommandList, ' ', Command), + exec(Command, StreamIn, StreamOut, _), + write_codes(StreamIn, MarkdownEntryCodes), + close(StreamIn), + read_file(StreamOut, HTMLEntryCodes), + close(StreamOut). +\ No newline at end of file diff --git a/dialects/swi-prolog.pl b/dialects/swi-prolog.pl @@ -0,0 +1,19 @@ +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% +% Predicate implementations for SWI-Prolog dialects. +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +% Detect SWI-Prolog +swi_prolog:- + catch(current_prolog_flag(dialect, swi), _, fail). + + +% SWI-Prolog-specific predicate to run an external Markdown tool. +% The command itself should be specified in your site.pl. +markdown_to_html(MarkdownEntryCodes, HTMLEntryCodes):- + swi_prolog, + markdown_command([Exe|Args]), + process_create(Exe, Args, [stdin(pipe(StreamIn)), stdout(pipe(StreamOut))]), + write_codes(StreamIn, MarkdownEntryCodes), + close(StreamIn), + read_file(StreamOut, HTMLEntryCodes), + close(StreamOut). +\ No newline at end of file diff --git a/generate_rss.pl b/generate_rss.pl @@ -19,7 +19,7 @@ generate_rss(BuildDate, Filenames):- sort(Articles, SortedArticles), % Convert to RSS and write to stdout. rss(BuildDate, SortedArticles, RSSCodes, []), - write_codes(RSSCodes), + write_codes(user_output, RSSCodes), halt. 
diff --git a/helpers.pl b/helpers.pl @@ -56,12 +56,25 @@ append_lists([First|List1], List2, [First|Result]):- % character, and write them to the current output stream one at % a time. This is better than converting the whole list to an atom % with atom_codes/2, which can trigger a segfault if the atom is too long. -write_codes([]). +write_codes(_, []). -write_codes([X|Rest]):- +write_codes(Stream, [X|Rest]):- char_code(Char, X), - write(Char), - write_codes(Rest). + write(Stream, Char), + write_codes(Stream, Rest). + + +% join(?List, +Separator, ?Atom). +% Join elements of a list into an atom separated by a separator. +% Written specifically as a join predicate, but should work as a split. +join([], _, ''). + +join([A], _, A). + +join([First|Rest], Separator, Result):- + join(Rest, End), + atom_concat(First, Separator, FirstPlusSeparator), + atom_concat(FirstPlusSeparator, End, Result). anything([]) --> []. diff --git a/parse_entry.pl b/parse_entry.pl @@ -7,6 +7,11 @@ :- include('html.pl'). :- include('markdown.pl'). +% Include files for dialect-dependent predicates. +:- discontiguous(markdown_to_html/2). +:- include('dialects/gnu-prolog.pl'). +:- include('dialects/swi-prolog.pl'). + % parse_entry. % Read in an HTML file from stdin. parse_entry:- @@ -27,7 +32,7 @@ parse_entry(Filename):- parse_html(HTML):- page(EntryCodes, Title, Subtitle, Date, HTML, []), markdown(EntryCodes, Title, Subtitle, Date, MarkdownCodes, []), - write_codes(MarkdownCodes), + write_codes(user_output, MarkdownCodes), halt. @@ -50,6 +55,7 @@ generate_entry(Filename):- % Parse Markdown into an HTML file and write to stdout. generate_html(Markdown):- markdown(EntryCodes, Title, Subtitle, Date, Markdown, []), - page(EntryCodes, Title, Subtitle, Date, HTMLCodes, []), - write_codes(HTMLCodes), + markdown_to_html(EntryCodes, HTMLEntryCodes), + page(HTMLEntryCodes, Title, Subtitle, Date, HTMLCodes, []), + write_codes(user_output, HTMLCodes), halt. 
\ No newline at end of file diff --git a/readme.md b/readme.md @@ -4,7 +4,7 @@ A static site generator that can put the toothpaste back in the tube. ## What is this? -A few months ago I lost the source files I used to generate my static website. Fortunately there was no irreparable data loss because I still had the generated site up on my server. The problem was now I needed to write a script that would extract all the articles into source files again, and then I'd have to reconfigure the site generator. Then I went, "Oh. This is a Prolog problem." (But then I love Prolog so every problem is a Prolog problem but I don't care. Fight me.) A Prolog problem is basically a set of rules and the logic can be run in either direction. I figured if I could write a Prolog program that described my HTML template then I could use the same code both to un-generate and re-generate the website. +A few months ago I lost the source files I used to generate my static website. Fortunately there was no irreparable data loss because I still had the generated site up on my server. The problem was now I needed to write a script that would extract all the articles into source files again, and then I'd have to reconfigure the site generator. Then I went, "Oh. This is a Prolog problem." (But then I love Prolog so every problem is a Prolog problem. I don't care. Fight me.) A Prolog program is basically a set of rules and the logic that's guided by those rules can be run in either direction. I figured if I could write a Prolog program that described my HTML template then I could use the same code both to un-generate and re-generate the website. So the skinny is I wound up writing my own static website generator in Prolog. Well, the main components are in Prolog. I also wrote a bash script to make use of a bunch of common \*nix utilities (find, sed, grep, etc.) and to pipe output to some third-party programs where I needed them (Markdown and SmartyPants). 
Weirdest bit was that I just couldn't find anything decent to generate RSS feeds. I considered dropping the RSS all together, but I've spent enough time haranguing people for not supporting interoperable standards that I didn't want to be a hypocrite. I wound up writing my own RSS generator too, also in Prolog. @@ -39,6 +39,8 @@ site.pl contains DCG definitions of this site's specifics, such as title, author user_name --> "Harold Gruntfuttock". + markdown_command(['/usr/bin/hoedown', '--footnotes']). + ## Use Generate a static website from Markdown sources: @@ -47,4 +49,10 @@ Generate a static website from Markdown sources: Generate source files from a static website: - ./unsqueeze.sh /home/user/website -\ No newline at end of file + ./unsqueeze.sh /home/user/website + +## Notes + +The Markdown converter is called from inside Prolog, so the path to your Markdown (and any arguments) is specified in site.pl. This allows you to have different Markdown converters or arguments on a per-site basis. + +Because ISO Prolog doesn't support making calls to external programs, I've implemented a compatibility layer that allows you to define the `markdown_to_html` predicate for whatever dialect of Prolog you happen to use. Included are compatibility predicates for SWI-Prolog and GNU Prolog. +\ No newline at end of file diff --git a/squeeze.sh b/squeeze.sh @@ -5,18 +5,6 @@ SOURCE_DIR=source SITE_PATH=$1 -combine_headers () { - read -d "" HTML - - if [ "$1" = "" ]; then - echo "$HTML" - else - echo "$1" - echo "" - echo "$HTML" - fi -} - # Copy everything that's not Markdown or HTML. # This will also create the folder structure for the destination Markdown files. rsync --archive --delete --verbose --exclude "*.md" --exclude "*.html" --exclude "feeds" "$SITE_PATH/$SOURCE_DIR/" "$SITE_PATH/$OUTPUT_DIR/" @@ -42,25 +30,7 @@ find "$SITE_PATH/$SOURCE_DIR" -type f -name "*.md" | if [ ! 
-f "$NEW_PATH" ] || [[ $(find "$file" -mtime -7) ]]; then echo "$file" - # Get everything after the metadata. - if grep -q "^Title: " "$file"; then - HEADERS=$(sed "/^$/q" "$file") - MARKDOWN=$(sed "1,/^$/d" "$file") - else - HEADERS="" - MARKDOWN=$(cat "$file") - fi - - echo "$MARKDOWN" | - # Convert Markdown to HTML. - markdown | - # Recombine with the metadata and hand it to Prolog. - combine_headers "$HEADERS" | - #gprolog --consult-file parse_entry.pl --consult-file "$SITE_PATH/site.pl" --entry-goal "generate_entry" | - swipl --traditional --quiet -l parse_entry.pl -g "consult('$SITE_PATH/site.pl'), generate_entry." | - # Some Prolog variants will output banners and "compiling" output no matter how nicely you ask them not to. - # Strip everything before the doctype declaration. - awk "/<!DOCTYPE/{i++}i" | + swipl --traditional --quiet -l parse_entry.pl -g "consult('$SITE_PATH/site.pl'), generate_entry('$file')." | # Smarten punctuation. smartypants \ > "$NEW_PATH" @@ -87,8 +57,5 @@ ARTICLES=$(grep --recursive --include=\*.md "^Date: " "$SITE_PATH/$SOURCE_DIR" | sed "s|,|','|g") BUILD_DATE=$(date +"%Y-%m-%d %T") # Parse the articles and generate the RSS. -#gprolog --consult-file generate_rss.pl --consult-file "$SITE_PATH/site.pl" --entry-goal "generate_rss(\"$BUILD_DATE\", ['$ARTICLES'])" | -swipl --traditional --quiet -l generate_rss.pl -g "consult('$SITE_PATH/site.pl'), generate_rss(\"$BUILD_DATE\", ['$ARTICLES'])." | - # Strip everything before the XML declaration. - awk "/<?xml/{i++}i" \ +swipl --traditional --quiet -l generate_rss.pl -g "consult('$SITE_PATH/site.pl'), generate_rss(\"$BUILD_DATE\", ['$ARTICLES'])." 
\ > "$SITE_PATH/$OUTPUT_DIR/feeds/rss.xml" diff --git a/unsqueeze.sh b/unsqueeze.sh @@ -27,7 +27,6 @@ find "$SITE_PATH/$OUTPUT_DIR" -type f -name "*.html" | NEW_PATH=$(echo "$file" | sed "s|^$SITE_PATH/$OUTPUT_DIR|$SITE_PATH/$SOURCE_DIR|" | sed 's|.html$|.md|') - #gprolog --consult-file parse_entry.pl --consult-file "$SITE_PATH/site.pl" --entry-goal "parse_entry('$file')" | swipl --traditional --quiet -l parse_entry.pl -g "consult('$SITE_PATH/site.pl'), parse_entry('$file')." | # Unsmarten the punctuation. sed "s|&nbsp;| |g" |