Hot-wiring the Lisp machine
I know you're thinking about it.
The modern web is choking on its own exhaust. Somewhere along
the line, we traded the elegance of plain text for gigabytes of node_modules/,
labyrinthine JavaScript frameworks, and bloated Static Site Generators that
insist you learn their esoteric templating languages just to write a blog
post. Worse yet, some even force you to use the mouse. Gross.
I didn't want another framework, and I refused to hoard dependencies like some doomsday prepper. None of it held any appeal. I wanted the comfort of my text editor.
Specifically, I wanted Emacs. It's a well-documented reality that Emacs is a single-threaded Lisp machine masquerading as a text editor. Org-mode is a markup language with an Abstract Syntax Tree mapped directly into your prefrontal cortex, not yet another cheap Markdown knockoff dressed in hipster syntax. Saying that people organize their lives with org-mode is not an overstatement, not even an exaggeration; it's the baseline. Absolute maniacs run their finances, their spreadsheets, and their fragile grip on reality out of it.
I was already deep in the trenches of org-mode, juggling giant agendas and interconnected Zettelkasten-style notes, the works. Bending my workflow or reorganizing my notes to appease the rigid directory structure of some flavor-of-the-month SSG was a non-starter. I just wanted to render my thoughts into some damn HTML.
"I may generate a static website out of these notes at some point, not sure when though."
That line had been hovering in my README as a taunt for the better part of five years. It was time to call my own bluff. The goal was simple, perhaps dangerously so: publish my notes, written in org-mode, using zero external dependencies.
Like many before me, I stumbled into the gravitational pull of org-publish.
The allure of a native publishing solution built right into Emacs was
intoxicating. I spent hours tweaking, pruning, and watering my
org-publish-projects-alist, only to smash face-first into the cold, harsh
reality of its brittle API. For
all its promises of infinite extensibility, the publishing engine felt
agonizingly spartan. My code devolved into an abhorrent mass of duct tape and
fragile hooks, leaving me miles away from the HTML output I was after.
I lost count of the battles fought against the templating function, the broken URLs, and the damned sitemap (which, in org-publish parlance, refers to the index page listing all posts). Needless to say, building a paginated index was an exercise in futility that felt less like programming and more like negotiating a hostage release with a brick wall. Extensibility was a myth; it was turtles all the way down.
Perhaps "no dependencies whatsoever" was a suicide pact. I re-evaluated my options, seeking something that rode natively on Emacs's composability:
Disclaimer: This is not to say that any of these are bad. They simply don't fit my – admittedly draconian – constraints. If it sounds like I'm shitting all over the hard work of open-source contributors, I'm not. This is hyperbole meant to illustrate my descent into madness. No offense is meant.
- Worg? A wiki with a nice publishing script but not a generator. Pass.
- Hugo+ox-hugo? Forcing Org through a Markdown proxy defeats the purpose entirely. Sacrilege.
- org-publish? We already established that was unworkable. Trying to hide the gruesome details by wrapping it in a custom layer equates to putting lipstick on a pig.
- blorgit? Untouched in 14 years. I can't handle the Ruby bloat or the archeology.
- jorge? Written in Go. Bypasses the native Emacs export process entirely. Next.
- org-webring? Just a static RSS/Atom generator. Too narrow.
- org-site? "It's not designed to be a so-called static blog generator," says its own README. Unfinished and forces a specific directory layout. No thanks.
- … a graveyard of abandoned Go binaries and half-baked scripts.
- Weblorg? … Hmm. Now this looks interesting.
Weblorg. Nifty little thing. It ticked almost every box. Unopinionated. Composable. Just the tasty vibe I was hunting for:
;; route for rendering each post
(weblorg-route
 :name "posts"
 :input-pattern "posts/*.org"
 :template "post.html"
 :output "output/posts/{{ slug }}.html"
 :url "/posts/{{ slug }}.html")

;; route for rendering the index page
(weblorg-route
 :name "blog"
 :input-pattern "posts/*.org"
 :input-aggregate #'weblorg-input-aggregate-all-desc
 :template "blog.html"
 :output "output/index.html"
 :url "/")

;; route for static assets that also copies files to output directory
(weblorg-copy-static
 :output "static/{{ file }}"
 :url "/static/{{ file }}")

;; fire the engine and export all the files declared in the
;; routes above
(weblorg-export)
Beautiful.
Yet, it possessed the distinct, irritating friction of a pebble in a shoe; its dependency on string templating grated on my nerves. Why reinvent a mustachioed, jinja-flavored square wheel when I already had the ultimate Lisp machine purring beneath my fingertips? I didn't want another templating engine. I wanted pure, unadulterated Elisp. I demanded the raw power of Emacs.
So I scrapped the compromises. I exhumed the rotting corpse of my failed
org-publish wrapper – opd, the dumb org-publish distribution – and decided to
engineer my own way out of hell. Weblorg possessed the perfect architectural
skeleton, but its organs were weak. A violent transplant was in order.
This is the story of how I ripped out its core, broke it entirely, and rebuilt it into a mathematically pure, two-pass "compiler".
Long-ass introduction that will hopefully get you into the swing of things for this long-ass article. Buckle up. We're diving deep into Elisp weeds and parentheses.
The delusion of naïveté
Every SSG starts with the exact same assumption:
"I'll just read a file, swap out some variables, and write some HTML."
– Famous last words
My early prototypes were nothing short of filthy. I started by ripping out
Weblorg's core and only dependency, tempel, and substituted it with a
homegrown string-replacement pipeline using standard format specifiers. %s
became the slug, %t became the title, and %c was the compiled HTML content.
I spun up a fleet of with-temp-buffer instances, injected the raw text,
dumped the output, and called it a day.
It was a brute-force data pipeline that boiled down to this monstrosity:
(format-spec template
             `((?p . ,(or (org-html--build-pre/postamble 'preamble info) ""))
               (?P . ,(or (org-html--build-pre/postamble 'postamble info) ""))
               (?t . ,title-fragment)
               (?d . ,date-fragment)
               (?T . ,tags-fragment)
               (?r . ,reading-time)
               (?c . ,contents)))
Which was crudely mashed into an HTML template:
%x
%D
<html%a>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width,initial-scale=1">
    <title>%t</title>
    %O
    %H
  </head>
  <body>
    %p
    <main id="content">
      <article>
        <h1 class="title">%t</h1>
        <div class="page-meta">%d 🞄 %T 🞄 %r</div>
        %c
      </article>
    </main>
    <div class="back-home"><a href="/">← all articles</a></div>
    %P
  </body>
</html>
It worked… until it didn't.
The architecture shattered the second I strayed off the beaten path. What if
a post had a subtitle? Hardcode a %S. What if I wanted a custom canonical
URL? Hardcode a %U. God forbid I wanted to inject an inline <style> block
containing a CSS percentage like width: 100%; format-spec would mistake it
for a missing variable and vomit a backtrace all over my screen.
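A minimal repro of the escaping hazard, sketched here for illustration (the template string is hypothetical, not from opd): format-spec treats every bare % as a specifier, so a CSS percentage only survives if it is doubled.

```elisp
;; Hypothetical repro, assuming nothing beyond stock `format-spec'.
(require 'format-spec)

;; This signals an error, because "%;" is not in the spec alist:
;; (format-spec "<style>body { width: 100%; }</style>" '((?t . "Title")))

;; Doubling the percent sign keeps `format-spec' happy:
(format-spec "<style>body { width: 100%%; }</style><h1>%t</h1>"
             '((?t . "Title")))
;; => "<style>body { width: 100%; }</style><h1>Title</h1>"
```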
It felt too entrenched, hopelessly rigid. Every new piece of metadata required hardcoding another arbitrary format specifier. I fought a losing battle against the boundary between dynamically evaluated Lisp and dead text.
Strings are dumb. They have no context.
So I did the next (il)logical thing: I doubled down. If static strings were dumb, I'd make them smart. I started binding those string specifiers to evaluated closures in a desperate attempt to smuggle dynamic Lisp execution into a flat text pipeline.
I was so engrossed in my stupidity that I was deaf to the laments of the code actively fighting back, screaming at my folly. It was a spectacular failure; I was trying to build a primitive, context-blind string formatter inside an editor that already possessed one of the most sophisticated, yet ergonomic AST parsers on the planet. In other words, I was bolting a warp drive 🖖 onto a horse cart.
Meanwhile, the real solution had been sitting right under my nose the entire time, tapping its foot, waiting for me to remember to get the hell out of its way, eyeing me with a look that needn't much interpretation: I'm tired of your shit.
I needed a paradigm shift, so I threw the whole thing away…
Embracing the heritage
… well, not entirely.
The foundation, the core routing logic inspired by Weblorg, was sound. It just took a moment to decipher the whispers between the screams; what the code was really trying to tell me was that my string-mashing idiocy needed to be dragged out back and shot. The epiphany hit me like a Samsung truck (if you know, you know): I shouldn't even be touching the damn HTML.
I tossed the string templates into the fire. Instead of passing variables to
a magic string, I equipped the routes with a :template parameter expecting a
pure Lisp closure instead of a file path or a string. I stopped trying to
wrap the content. I handed the user the raw, parsed Org context and told
them: "Here is a temporary buffer. Knock yourself out."
The default :template collapsed from a massive HTML skeleton into a thing of
minimalistic beauty:
(lambda (ctx)
  (when-let ((path (alist-get 'abspath ctx)))
    (insert-file-contents path)))
This alone unlocked an entire dimension of extensibility. I could programmatically fill the temporary buffer with whatever I wanted using nothing but Lisp. The simplest case consisted of dumping the file in, but the ceiling had vanished; the possibilities were endless.
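As a taste of that freedom, here is a hypothetical :template closure sketched against the same context alist; abspath is the only key taken from the article, and the injected option line is invented for illustration.

```elisp
;; A hypothetical `:template' closure: anything Lisp can compute
;; can end up in the buffer before export.
(lambda (ctx)
  ;; Prepend an export option the source file doesn't carry.
  (insert "#+OPTIONS: toc:nil\n")
  ;; Then dump the file itself, as the default template does.
  (when-let ((path (alist-get 'abspath ctx)))
    (insert-file-contents path)))
```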
Getting the data into the pipeline was only half the battle, and shoving text into a temporary buffer doesn't magically get the damn HTML onto the disk. The pipeline needed an exit strategy.
Enter the :exporter parameter, the cherry on top. Its default value? An even
shorter lambda function that handed the final rendering back to org-export:
(lambda (_)
(org-export-as 'opd))
Hallelujah. The heavens parted and the holy grail of the org-centric design
I was chasing revealed itself: infinite extensibility through extreme
minimalism. Because opd was now just a routing layer feeding data into the
standard Org export pipeline, I didn't have to invent a bespoke
DSL, wrestle with hooks, or manage
bloated plugin registries. I could just use
org-export-define-derived-backend, inherit from a minimal opd base backend,
and write native ox.el transcoders. Margin notes? Custom blocks? RSS feeds?
They came for free.
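A sketch of what that extension story could look like, assuming the base opd backend exists as described; the backend name opd-fancy and the transcoder are hypothetical, but verse-block is a real Org element type and the derived-backend mechanism is stock ox.el.

```elisp
;; Hypothetical derived backend: render Org verse blocks as
;; margin notes instead of the default HTML markup.
(defun my/opd-verse-block (_verse-block contents _info)
  "Transcode a verse block into an <aside> margin note."
  (format "<aside class=\"margin-note\">%s</aside>" contents))

(org-export-define-derived-backend 'opd-fancy 'opd
  :translate-alist '((verse-block . my/opd-verse-block)))
```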
I had successfully shrunk the engine's surface area to near-zero. By
standing on the shoulders of ox.el, the entire public API collapsed into
just four primitives:
- opd-site
- opd-route
- opd-assets
- opd-export
But the real black magic? Cross-route linking.
Stitching the grid
I had the HTML output, but isolated pages fall short of a website. I needed
a web. Wiring up paths manually is a task for plebs. Classic
SSGs trap you into hardcoding deployed
paths into your source text, breaking local navigation, or worse, they force
your physical src/ directory to mirror the deployed URL structure exactly.
Frankly, I refused to be a slave to rigid directory layouts. I wanted to
drop a standard [[file:somewhere-far-away/some-file.org]] link into a post,
have Emacs follow it natively while I was editing, and trust the engine to
forge the exact permalink during build, regardless of its provenance or
route affiliation.
When you're trapped in a sterile, vacuum-sealed test chamber with a single routing pipeline, this works naturally. But weaving that web across the labyrinth of exponential crisscrossing links between entirely disparate, virtual routes? That required cartography. I needed to build a map before I could walk the territory; I needed to borrow the concept of a "two-pass compiler."
Pass 1 is the scouting mission: it discovers the files, evaluates their
destinations, and seeds a master URL :registry shared at the site
level. (Did I mention you could have multiple sites running in the same engine?)
Pass 2 executes the render. Having surrendered HTML generation
back to Org, all I had to do was slip a custom link transcoder into our
derived backend; a sleeper agent, if you will, intercepting Org's native
link resolution before it could act.
(defun opd-translate-link (link desc info)
  "Resolve cross-route Org links using the site's URL registry.
LINK is the link to resolve.  DESC is the description of the
link.  INFO is the plist used as a communication channel."
  (when-let* ((type (org-element-property :type link))
              (path (org-element-property :path link))
              ((and (string= "file" type)
                    (string-suffix-p ".org" path)))
              (base (plist-get info :opd-base-url))
              (site (opd--site-get base))
              (absp (expand-file-name path))
              (url (and site (gethash absp (gethash :registry site)))))
    ;; Use native `org-mode' link resolution to handle all
    ;; cases.  Extend, not replace.
    (org-element-put-property link :path url))
  ;; Fall back to native `org-mode' link resolution.
  ;; FIXME: Goodness this is dirty.
  (replace-regexp-in-string
   "\\(src\\|href\\|poster\\)=\"\\(?:file://\\)\\([^\"]+\\)\""
   "\\1=\"\\2\""
   (org-html-link link desc info)))
Yes, it's an ugly regex. No, I am not apologizing. It's a load-bearing hack.
By dynamically binding default-directory inside our temporary template
buffer right before calling org-export-as, I was gaslighting Emacs into
natively resolving the relative path against the source file's directory.
All that's left for the transcoder is to query the registry, swap the link,
and step back into the shadows. It bridges the gap between separate routes
transparently, without you ever having to think about it. ✨ Magic ✨
But where did that :opd-base-url come from? That's not something you'd usually find in the standard info plist. (Case in point: this very link points to another route entirely. See what I did there?)
Magic requires sleight of hand. Because the org-export process operates in a
vacuum, completely detached from opd, I had to figure out a way to smuggle
the site context across the boundary and into the transcoder. After dumping
the content into the temporary buffer, I sneak a tracking bug directly into
the file's metadata, a tracer round that correlates the file back to its
route for link resolution.
(with-temp-buffer
  ;; Setting the `default-directory' to the output
  ;; path of the file naturally makes the export
  ;; come together.
  (let ((default-directory (if abspath
                               (file-name-directory abspath)
                             base))
        (buffer-file-name (or abspath
                              (expand-file-name "index.org" base))))
    ;; Look, it's the template function!
    (funcall template context)
    (goto-char (point-min))
    ;; Place point after potential property drawers to avoid breaking them.
    (when (looking-at org-property-drawer-re)
      (goto-char (match-end 0))
      (forward-line))
    ;; Here's the sleight of hand: Inject state directly into the buffer.
    (insert "#+OPD_BASE_URL: " (gethash :base-url site) "\n")
    ;; Look, it's the exporter!
    (let ((rendered (funcall exporter context)))
      (mkdir (file-name-directory final) t)
      (write-region rendered nil final))))
The Org AST eats the #+ keyword natively,
ferrying the injected data down the pipeline directly into the transcoder's
info channel.
(org-export-define-derived-backend 'opd 'html ;; Greasing Org's paw right here. :options-alist '((:opd-base-url "OPD_BASE_URL")) :translate-alist '((headline . opd-translate-headline) (link . opd-translate-link)))
Just like that, magical cross-route linking, achieved.
Containing the blast radius
I was riding high, and as the routing table scaled, a new, insidious villain emerged: State bleeding.
Emacs, and by extension org-mode, is a mansion built on a precarious
foundation of global variables. A sprawling estate wired to a single fuse
box overloaded with exposed copper. Turn on the toaster in the kitchen, and
the master bedroom catches fire. Handling a ToC on the main blog alongside an RSS feed devoid of one required
finesse. Setting org-export-with-toc bleeds over, contaminating the XML.
The naïve approach of manually let-binding variables around the route fails
with a pathetic whimper.
(let ((org-export-with-toc t))
  ;; Some route with ToC.
  (opd-route ...))

;; Another route without.
(opd-site ...)
(opd-export)
Because opd-route merely registers the configuration, the variables are
actually evaluated much later, deep inside the bowels of the opd-export
loop. By the time the export spits out HTML, your pristine let block has
long since evaporated into the ether. Your local overrides are dead on
arrival.
Okay, fine. I could set the option per-file instead.
Right. Good luck managing that metadata across ten thousand Zettelkasten files without losing your mind.
The fallback was to let-bind variables manually inside every single custom
exporter lambda. Sure, that works. But it feels dirty, disgustingly
redundant. You're writing boilerplate closures just to flip switches
unrelated to opd. Nah. We could do better. I needed isolated execution
chambers, not the five stages of grief.
The solution lay buried deep in the Emacs source code: an arcane Common Lisp
macro called cl-progv. By introducing an :env property to the route, I
could command the engine to dynamically bind and unroll execution state on a
strict, per-route basis.
(defmacro opd--with-env (route &rest body)
  "Evaluate BODY with dynamic bindings specified in ROUTE's `:env'."
  (declare (indent 1))
  (let ((r-var (make-symbol "route")))
    `(let* ((,r-var ,route)
            (env (plist-get ,r-var :env))
            (vars (mapcar #'car env))
            (vals (mapcar #'cdr env)))
       (cl-progv vars vals
         ,@body))))
Then, inside the orchestrator, instead of blindly executing the export trigger:
(funcall (plist-get route :export) route)
I drop it in the containment field:
(opd--with-env route
  (funcall (plist-get route :export) route))
I could wrap the entire build script in a master let block for global
defaults and declare an entire suite of variables locally to each route to
surgically override the globals with zero side effects. This trick turned out to be our savior against a deeply buried, hardcoded
quirk in ox-rss.el, where it blindly prepends ./ to your permalinks unless
you explicitly define org-html-link-home.
This trick turned out to be our savior against a deeply buried, hardcoded
quirk in ox-rss.el, where it blindly prepends ./ to your permalinks unless
you explicitly define org-html-link-home.
The stack unrolls, the state resets, and the bulkhead holds.
(let ((org-export-with-toc nil)
      ;; A bunch of other globally applied defaults.
      (org-html-html5-fancy t))
  ;; Some route for posts
  (opd-route ...)
  ;; The RSS route with its local override
  (opd-route
   :name "rss"
   :pattern "*.org"
   :output "public/rss.xml"
   :url "/rss.xml"
   :env '((org-html-link-home . "http://localhost:8080")) ; < The containment field
   :exporter (lambda (_) (org-export-as 'rss))))
What happens in the route, stays in the route.
Warring on permalinks
I had quarantined the state, everything started working perfectly, tests were passing, greenery all around. I assumed the architecture was bulletproof; I spawned a local HTTP server to witness the fruits of my labors. Majestic. I clicked on a link to a post and… Huh?
Why the hell was I staring at a raw XML feed like an idiot?
I dug into the pipeline and realized my execution order was entirely backwards. The engine was computing and registering URLs before evaluating filters.
(defun opd--route-collect-and-aggregate (route)
  "Find and aggregate input files for a ROUTE."
  (let* ((filter (plist-get route :filter))
         (base (plist-get route :base-dir))
         (site (plist-get route :site))
         ;; Find all files that match the input pattern but don't
         ;; match the exclude pattern.
         (files (opd--find-source-files (plist-get route :name)
                                        base
                                        (plist-get route :pattern)
                                        (plist-get route :exclude)))
         ;; Parse Org-mode files.
         (parsed-files
          (mapcar
           (lambda (input-path)
             (let* ((parsed (funcall (plist-get route :parser)
                                     input-path base site))
                    (route-name (plist-get route :name))
                    (tmp (cons `(route . ((name . ,route-name))) parsed))
                    (url (opd--render-route-prop route :url tmp)))
               ;; Register the URL to resolve it later.
               (puthash input-path url (gethash :registry site))
               ;; Add these properties to each parsed Org-mode.
               (append parsed
                       `((route . ((name . ,route-name)))
                         (url . ,url)))))
           files))
         ;; Apply filters that depend on data read from parser.
         (filtered (if (null filter)
                       parsed-files
                     (seq-filter filter parsed-files))))
    ;; Aggregate the input list into either a single group with
    ;; all the files or multiple groups.
    (funcall (plist-get route :aggregate) filtered)))
I flipped the script immediately: evaluate the filter first, touch the registry second. If a file doesn't have clearance, it gets dropped before it even looks at the routing table.
(defun opd--route-accepts-p (route file site)
  "Parse FILE and return its context if it exists and satisfies ROUTE's filter."
  (when (file-exists-p file)
    (let* ((parsed (funcall (plist-get route :parser)
                            file (plist-get route :base-dir) site))
           (context (append `((route . ((name . ,(plist-get route :name)))))
                            parsed))
           (filter (plist-get route :filter)))
      (when (or (null filter)
                (funcall filter context))
        context))))
It was mathematically correct, but it only stopped the superficial bleeding. The true, underlying rot remained. The engine was still silently cannibalizing itself.
During Pass 1, it discovers files and registers their computed URLs. But
what happens when you have a blog route that matches *.org files, and an rss
route that also matches *.org?
They rip each other to shreds.
The blog route would scan a file, stake its claim, and carve /blog/my-post/
into the registry. A bit later, the rss route would descend like a vulture,
eviscerate the original entry, and brand the exact same file as /rss.xml. It
was a routing table bloodbath, a mess of conflicting closures silently
slitting each other's throats in the dark. Internal links were resolving to
XML files. The registry was poisoned.
The actual fix was a stroke of genius, as smooth and satisfying as
swallowing a roll of sandpaper: a new :canonical flag.
I had to learn architectural humility. Not all routes are created equal. I
introduced the concept of link ownership. Only a primary route, like the
main blog route, had the authority to mint a URL and write it to the
registry. Secondary routes, your rss and tag lists, were stripped of their
write-access. They became unprivileged scavengers, strictly read-only.
With the carnage stopped, I turned to the configuration API. It was metastasizing into a landfill.
Combinatorial explosion
I was still hoarding Weblorg-era functions like opd-input-filter-drafts and
opd-input-aggregate-all-desc. It was functional, but aesthetically
offensive. It lacked mathematical purity and the convenience of dropping
something that just works.
Passing hardcoded function symbols around is a rigid, brittle, and unimaginative way to build an API; you either bloat the engine with endless primitives, or you shove the burden onto the user. I needed a grammar, a syntax of pure intent, not a script. So, I gutted the monolithic filters and abstracted the logic into boolean combinators and higher-order closures. I separated the logic of combination:
- opd-filter-any
- opd-filter-all
- opd-filter-omit
… from the logic of matching:
- opd-match-tag
- opd-match-prop
- opd-match-has-prop
The API's surface area dissolved almost entirely, replaced by pure, declarative composition. A typical "posts" route collapsed into something that reads like plain English:
(opd-route
 :name "posts"
 :pattern "posts/*.org"
 :url "/%s.html"
 :aggregate (opd-aggregate-each)
 :filter (opd-filter-omit (opd-match-tag "draft")
                          (opd-match-tag "archive")))
But the true power of functional composition is the absence of a ceiling. Nothing prevents you from chaining these primitives into highly specific data pipelines:
(opd-route
 :name "rss"
 :pattern "*.org"
 :output "public/rss.xml"
 :url "/rss.xml"
 :canonical nil ; Don't forget the canonical flag.
 :aggregate (opd-aggregate-all (opd-sort-date))
 :filter (opd-filter-all
          (opd-filter-any (opd-match-tag "blog")
                          (opd-match-tag "lore"))
          (opd-match-has-prop 'date)
          (opd-filter-omit (opd-match-tag "draft")
                           (opd-match-tag "archive"))))
For the untrained eye: this matches anything tagged blog or lore, but not
draft or archive, that has a date, and aggregates everything into a single
file, sorted by date. Powerful.
… and it executes with the cold efficiency of a guillotine, exactly as intended.
This was the absolute pinnacle of functional design. I could compose
infinitely complex filtering rules using native closures, finally escaping
the cognitive rot of manual if/and/or chains or the boilerplate of one-off
lambda functions. opd no longer dictated how data should be processed; it
merely provided the primitives for you to process it yourself.
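One plausible implementation of two of these primitives, assuming each parsed file is the alist context shown earlier; the filetags key and the exact internals are assumptions, not opd's real code.

```elisp
;; Hypothetical combinator sketches: each returns a predicate
;; over a parsed-file context alist.
(defun opd-filter-omit (&rest matchers)
  "Return a predicate rejecting any context matched by MATCHERS."
  (lambda (context)
    (not (seq-some (lambda (m) (funcall m context)) matchers))))

(defun opd-match-tag (tag)
  "Return a predicate checking whether a context carries TAG."
  (lambda (context)
    (member tag (alist-get 'filetags context))))
```

Because everything is just a closure over the same context shape, the combinators nest arbitrarily deep with no special support from the engine.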
With the architecture crystallized, I decided to punish it.
10,000 posts
Ten thousand posts. That was the benchmark. A wildly unscientific one, a synthetic stress test designed to break the machine's spirit before I'd even written my first real post. If it couldn't survive ten thousand clones, it didn't deserve to host a single word of mine.
for i in {1..10000}; do cp test.org "test-${i}.org"; done
I ran the loop, cloned a test file 10,000 times, cranked the garbage collection threshold to the moon and fired the engine. Emacs redlined a single CPU core at 99%.
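For the record, the measurement can be sketched with the built-in benchmark-run; a similar, not identical, harness produced the numbers below.

```elisp
;; Crank the GC threshold so collection pauses don't pollute the
;; timing, then time one full export.
(let ((gc-cons-threshold most-positive-fixnum))
  (benchmark-run 1
    (opd-export)))
;; => (total-elapsed-seconds gc-run-count gc-elapsed-seconds)
```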
Five and a half minutes.
It was agonizingly slow, processing files at around 32 milliseconds per file. In the compiled world of Go or Rust, five minutes for 10,000 files is an eternity.
But then I did the math. For a single-threaded interpreter dynamically building Abstract Syntax Trees on-the-fly, natively resolving cross-route links, and transcoding HTML, 32 milliseconds per file is actually an absolute marvel.
Yet, I demanded blood. Acceptance is failure dressed in a suit, and copium isn't in my vocabulary. There had to be a way to drop these timings. The first rule of optimization is simple: make the machine do less work. To find the friction, you have to follow the heat. Let's trace the exact lifecycle of a single file and find out where the CPU is gnawing on itself.
Wait a minute. I was paying a double tax.
The first pass was grinding through the entire AST just to find a title and a date at the top of the file. The second was doing it all over again to spit out the HTML. I was making the machine walk the same mile twice, carrying the same heavy luggage, for no reason other than my own laziness. It was an industrial-grade waste of CPU cycles.
I could amortize the performance hit by invoking a very specific voodoo incantation:
(let ((org-inhibit-startup t))
  (org-unmodified
   ;; Using `org-with-file-buffer' is slower than
   ;; `with-temp-buffer' + `insert-file-contents', but it is
   ;; correct, less memory hungry, and potentially faster
   ;; when run from an instance that already has the file
   ;; open in a buffer.
   (org-with-file-buffer filename
     ;; Rest of Org operations go here.
     )))
The pipeline is purely read-only; it never mutates anything, permitting this aggressive bypass unconditionally.
But that wasn't enough. The bottleneck remained locked within
org-element-parse-buffer. I could've used a faster alternative instead:
(org-collect-keywords '("TITLE" "DATE" "FILETAGS" "SLUG"))
Or even:
(while (re-search-forward org-keyword-regexp nil t)
  ;; Capture the metadata.
  )
These approaches would've been orders of magnitude faster than parsing the entire buffer, but they come at a cost: accuracy.
org-collect-keywords matches only the explicitly requested keywords. Custom
keywords slip through the cracks; that won't do. Regular expressions are
notoriously finicky and aren't guaranteed to match all keywords, they
silently drop edge cases. Trading accuracy for speed puts you on a highway
to the nearest mental asylum.
If only there was a way to not parse the entire Org buffer…
(org-element-parse-buffer)
I stared at org-element-parse-buffer. This was the gatekeeper standing
between me and sub-second compilation times. It greedily held the keys to
the kingdom. It stared back with unblinking contempt. I cracked open its
docstring for the thousandth time.
org-element-parse-buffer is a native-comp-function in ‘org-element.el’.
(org-element-parse-buffer &optional GRANULARITY VISIBLE-ONLY KEEP-DEFERRED)
Inferred type: (function (&optional t t t) t)
Recursively parse the buffer and return structure.
If narrowing is in effect, only parse the visible part of the
buffer.
Optional argument GRANULARITY determines the depth of the
recursion. It can be set to the following symbols:
‘headline’ Only parse headlines.
‘greater-element’ Don’t recurse into greater elements except
headlines and sections. Thus, elements
parsed are the top-level ones.
‘element’ Parse everything but objects and plain text.
‘object’ Parse the complete buffer (default).
When VISIBLE-ONLY is non-nil, don’t parse contents of hidden
elements.
When KEEP-DEFERRED is non-nil, do not resolve deferred properties.
Closer.
‘greater-element’ Don’t recurse into greater elements except
headlines and sections. Thus, elements
parsed are the top-level ones.
Motherf—
Are you telling me I could've spared millions of wasted CPU cycles just by
telling the parser to skim the damn surface? A single flag to grab the
top-level metadata while completely bypassing the recursive hellscape of
inline parsing? I didn't know whether to weep, laugh, or hunt down the
authors of org-element.el and buy them a drink.
(org-element-parse-buffer 'greater-element t)
That's it. Let's run the benchmark again.
Three minutes and forty-four seconds later, it spat out 10,000 HTML posts and a monolithic index containing 10,000 links. That's twenty-two milliseconds per file.
Not too shabby. The times plummeted; not by an order of magnitude, but still a respectable amount. I can compile an entire decade of daily blogging from scratch before I finish brewing a shot of espresso.
Hot reloading, anybody?
Because there's no such thing as absolute immunity from frontend envy, the stench of Vite.js eventually crept into my terminal. The humiliation of hitting "save" on a CSS file and having to manually run a build script just to see a background color change was too much. The web-devs were mocking me and my ancient Lisp machine from their React ivory towers. I needed instant hot-rebuilding.
Adding insult to injury, I naturally shot myself in the foot. In a misguided attempt to shave a single line of code, I swapped:
(with-temp-buffer
  (insert-file-contents file)
  ;; Rest of operations.
  )
For:
(with-temp-file file
  ;; Rest of operations.
  )
Emacs happily obliged, truncating my source files to absolute emptiness the
second the build started. Years of notes, gone in a blink. I stared at the
abyss. The abyss stared back. I owe my continued unmedicated state to git
reflog and Emacs's paranoid backup system, otherwise I would be writing this
from a padded cell.
Once I recovered from my own absurdity, I tapped into Emacs's native
filenotify. Sunk-cost fallacy be damned, the premise was simple: watch the
directory, detect a save, and trigger opd-export.
It worked, technically. But for a relatively big blog – say, 500 posts – a full build takes about 11 seconds. That's a fast cold-start for a compiler, but triggering a full 11-second build on every typo fix was masochism.
I started, as one does, with memoization, slapping a blunt, lazy cache across the routes.
(defun opd--route-posts (route)
  "Pull all posts found for a given ROUTE.

This function will run the find, filter, aggregate pipeline and
cache the results.  When it's called again with the same
parameters it should use the cache and not really run the
pipeline again."
  (let* ((site (plist-get route :site))
         (cache (gethash :cache site))
         (key (plist-get route :name))
         (val (gethash key cache)))
    (or val
        (puthash key (opd--route-collect-and-aggregate route) cache))))
But caching at the route level is like using a shotgun to kill a mosquito. If you change a single comma in post.org, the entire route's cache invalidates, and the engine obediently burns cycles rebuilding the other 499 posts right alongside it. The cache had to be pushed deeper. Down to the file level.
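The shape of that deeper cache, sketched under assumptions (the variable and parser names here are illustrative, not the final implementation): key each entry on the file's path and invalidate it when the modification time drifts.

```elisp
;; Hedged sketch of a file-level cache keyed on mtime.
;; `opd--file-cache' and `opd--parse-org-file' stand in for the real thing.
(defvar opd--file-cache (make-hash-table :test 'equal))

(defun opd--parse-with-file-cache (path)
  "Return the parsed chunk for PATH, reparsing only when its mtime changed."
  (let* ((mtime (file-attribute-modification-time (file-attributes path)))
         (hit (gethash path opd--file-cache)))
    (if (and hit (equal (car hit) mtime))
        (cdr hit)                       ; cache hit: reuse the stored chunk
      (let ((chunk (opd--parse-org-file path)))
        (puthash path (cons mtime chunk) opd--file-cache)
        chunk))))
```

Now editing one post invalidates exactly one entry instead of torching the whole route.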
I needed to go truly incremental. But wait… how do I know what to rebuild?
Before I could even build the cache, I had to figure out what the hell to
actually watch. My first instinct was a brute-force directory sweep,
wrestling with project-ignore and regex patterns to blacklist .git folders
and whatnot. It was tedious, error-prone garbage. Then it hit me: why am I
guessing? I already know exactly which files matter. I had an entire
registry of that. All I had to do was walk the registry backwards, tracing
every valid file up to its base directory, and attach the native OS watchers
exclusively to the directories that mattered. Silent, precise, zero-config.
(defun opd--collect-watch-dirs ()
  "Retrieve a list of directories to watch based on registered files."
  (let ((watch-dirs (make-hash-table :test 'equal)))
    (maphash
     (lambda (_ site)
       (maphash
        (lambda (_ route)
          (let ((base (file-name-as-directory
                       (expand-file-name (plist-get route :base-dir)))))
            ;; Always watch the route's base directory.
            (puthash base t watch-dirs)
            ;; Trace back from every registered file up to the base.
            (maphash
             (lambda (path _)
               (let ((dir (file-name-directory path)))
                 (while (and dir
                             (string-prefix-p base dir)
                             (not (string= base dir)))
                   (puthash dir t watch-dirs)
                   ;; Move one directory up: "/a/b/c/" -> "/a/b/"
                   (setq dir (file-name-directory (directory-file-name dir))))))
             (gethash :registry site))))
        (gethash :routes site)))
     opd--sites)
    (hash-table-keys watch-dirs)))
Because of my oh-so-brilliant functional refactor earlier, the aggregators became opaque, unidentifiable closures. The engine couldn't tell what was what, so it blindly rebuilt everything anyway. I couldn't rely on them even if I wanted to; besides, an aggregator is a user-provided value, and humans are entropy incarnate. They will always break your assumptions.
To solve this, I had to confront a demon I'd been ignoring. A wild
:canonical flag appears. I was still harboring a grudge against this thing
and I'm not one to swallow bitter pills without a fight. It was parading
around as a necessary architectural evil, but in reality, it was a bear trap
in disguise, armed and ready to bite the foot of some unsuspecting user. I
needed to kill it with fire. But how?
I stared at the aggregators' code.
(defun opd-aggregate-each ()
  "Aggregate each post as a single collection.

This is the default aggregation, it generates one collection per
input file.  It returns a list containing each post."
  (lambda (posts) posts))
"Ackshually, you could've used #'identity for that." – No shit, Sherlock.
Read on.
The pattern finally clicked. I had gotten lazy and forgotten a minor detail
during the functional rewrite. There was an accidental, fundamental
difference between the return types of the aggregation closures.
opd-aggregate-each returns the post properties as-is, right at the
top-level. Other aggregators wrapped their payloads in nested posts or
category lists.
(defun opd-aggregate-all (&optional sorter)
  "Aggregate all posts within a single collection.

This aggregation generates a single collection for all the input
files.  It is useful for index pages, RSS pages, etc.

If SORTER is nil, posts are kept in the order they're found,
otherwise SORTER is applied to the posts."
  (lambda (posts)
    `(((posts . ,(if sorter (seq-sort sorter posts) posts))))))
I didn't need a manual flag to tell the engine which route was primary, nor did I need to interrogate opaque closures to figure out what to rebuild. I could rely entirely on the shape of the chunk instead.
And out the window goes :canonical. Sayonara, and I hope never to see you
again.
A wild duck-typed router appears.
Instead of interrogating the closures, the engine simply inspects the data chunks after aggregation. It boils down to a fundamental binary: 1:1 routes versus 1:N routes.
Does the chunk have an abspath property at the top level? It's a 1:1 route.
It maps one input file to exactly one output file. Rebuild it only if the
path matches the exact file that changed. Is the abspath missing or buried
deep inside a nested list of posts? It's a 1:N aggregate – an index, tag
page, RSS feed, you name it. Rebuild it, because its aggregated data just
changed.
(defun opd--tree-has-abspath-p (tree path)
  "Recursively search an arbitrary Lisp TREE for (abspath . PATH)."
  (cond
   ;; Base case: found the exact key-value pair.
   ((and (consp tree)
         (eq (car tree) 'abspath)
         (equal (cdr tree) path))
    t)
   ;; Recursive case: traverse both sides of the cons cell.
   ((consp tree)
    (or (opd--tree-has-abspath-p (car tree) path)
        (opd--tree-has-abspath-p (cdr tree) path)))
   ;; Regular atom.
   (t nil)))
Brace yourself for the gory details. One giant, disgusting-looking function coming right up:
(defun opd-watch-start ()
  "Start watching registered sites for changes for incremental hot-reloading."
  (interactive)
  ;; Cleanup old watchers.
  (mapc #'file-notify-rm-watch opd--watch-descriptors)
  (setq opd--watch-descriptors nil)
  ;; Run a full build and warm up the cache + registry.
  (opd-export)
  (letrec (;; Track pending rebuild timers for debounce.
           (timers (make-hash-table :test 'equal))
           (rebuild
            (lambda (file action)
              (when-let ((timer (gethash file timers)))
                (cancel-timer timer))
              (puthash file
                       (run-with-idle-timer
                        0.1 nil
                        (lambda (f)
                          (unwind-protect
                              (let ((start (current-time)))
                                (opd-export-incremental f)
                                (opd--log "file %s: %s, rebuilt in %s"
                                          action
                                          (file-name-nondirectory file)
                                          (float-time
                                           (time-subtract (current-time) start))))
                            (remhash f timers)))
                        file)
                       timers)))
           (callback
            (lambda (event)
              (when-let* ((action (nth 1 event))
                          (file (or (nth 3 event) (nth 2 event)))
                          ((stringp file)))
                (unless (or (string-prefix-p ".#" (file-name-nondirectory file))
                            (string-prefix-p "#" (file-name-nondirectory file))
                            (string-suffix-p "~" file))
                  (cond
                   ;; Dynamically attach watchers to a new directory.
                   ((and (memq action '(created renamed))
                         (file-directory-p file))
                    ;; XXX: double watch on renamed because of inode retention?
                    (funcall walk file))
                   ;; Remove the deleted file from cache.
                   ((eq action 'deleted)
                    (opd--log "file deleted: %s, purging cache"
                              (file-name-nondirectory file))
                    (remhash file opd--file-cache)
                    (opd--log (concat "dead links may linger in aggregate routes; "
                                      "you should probably run a full build")))
                   ;; Remove the renamed file from the cache and
                   ;; rebuild with the new name.
                   ((eq action 'renamed)
                    (let ((old (nth 2 event)))
                      (opd--log "file renamed: %s > %s"
                                (file-name-nondirectory old)
                                (file-name-nondirectory file))
                      (remhash old opd--file-cache)
                      (funcall rebuild file 'renamed)
                      (opd--log (concat "dead links may linger in aggregate routes; "
                                        "you should probably run a full build"))))
                   ;; Debounced build for created/edited files.
                   ((and (memq action '(changed created))
                         (not (file-directory-p file)))
                    (funcall rebuild file action)))))))
           (walk
            (lambda (d)
              (opd--log "attaching dynamic watcher to: %s" (file-relative-name d))
              (push (file-notify-add-watch d '(change) callback)
                    opd--watch-descriptors)
              ;; Catch nested directories.
              (dolist (f (directory-files d t directory-files-no-dot-files-regexp))
                (when (file-directory-p f)
                  (funcall walk f))))))
    (dolist (d (opd--collect-watch-dirs))
      (push (file-notify-add-watch d '(change) callback)
            opd--watch-descriptors)))
  (opd--log "incremental watcher live, %d watchers attached"
            (length opd--watch-descriptors))
  ;; Prevent premature exit in batch-mode.
  (when noninteractive
    (opd--log "running in batch mode - press ctrl+c to exit")
    (while opd--watch-descriptors
      (read-event nil nil 0.5))))
Take that, Vite.
The sweeper's demise
Performance breeds arrogance. In a fleeting bout of folly, I succumbed to feature creep and flirted briefly with the idea of a garbage collector, a "sweeper", to track every generated artifact, a tool to prune orphaned files and empty directories from the output without nuking the whole thing.
I wrote a complex state-tracking manifest and wired it into the routing loop. And what a blunder that was.
The code started shrieking again. Tracking stateful build manifests, manually walking directory trees, and sorting them by depth stank to high heaven. It violated the functional purity I had painstakingly established.
It was an uphill battle for near-zero gain. This problem had already been solved decades ago by the UNIX philosophy: Just… you know… delete the whole damn output directory and rebuild from scratch.
I got rid of the cruft. Less code is good code. Sometimes, the smartest
engineering decision is recognizing when a problem had already been solved
half a century ago by a shell command: rm -rf output/. It's brutal,
stateless, and mathematically guaranteed to be correct.
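In Elisp terms, the entire sweeper collapses into something like this (the function and its argument are illustrative, not opd's actual API):

```elisp
;; The whole garbage collector, replaced by the Elisp equivalent
;; of `rm -rf output/'.  OUTPUT-DIR is an illustrative parameter.
(defun opd--clean-output (output-dir)
  "Nuke OUTPUT-DIR so the next build starts from a blank slate."
  (when (file-directory-p output-dir)
    (delete-directory output-dir t))  ; t = recursive delete
  (make-directory output-dir t))
```

No manifest, no state, no orphan tracking. Just scorched earth and a fresh start.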
The burning crusade
The test suite glowed green, lulling me into a false sense of security. I leaned back, waiting for the final output. Instead, an exception ruptured the pipeline, bleeding an incomprehensible backtrace across my terminal like a severed artery. The build failed with the hostile ambiguity of a machine actively deceiving its creator.
I tore the routing logic apart, byte by byte. I spent hours plunged in a
bloodshot debugging stupor, sifting through opaque ox.el internals,
completely convinced opd had betrayed me and was fundamentally broken. I
questioned the architecture. I questioned my own competence. And then, at
the bottom of the call stack, after tearing my hair out by the roots, I
finally found the culprit: a commented-out route sitting innocently in my
own configuration file.
The call was coming from inside the house. My own sloppy config had poisoned the well, and the engine didn't think twice before gulping.
This called for a shift in doctrine. Defensive programming is for the faint of heart. It politely catches errors, attempts a graceful recovery, and sweeps the mess under the rug. But silent recovery breeds insidious state corruption. I wanted it to detonate. Loudly. I needed to embrace offensive negative-space programming.
I started sprinkling cl-assert throughout the codebase like holy water,
branding function boundaries in the spirit of a zealot carving protective
wards into the pillars of a demon-infested cathedral. I was handing out
fatal assertions and execution-stopping errors that would make that
legendary Australian BBQ slap proud. You passed a nil output path? Slap. The
build halts. You thought about clobbering the canonical link? Slap. The
engine dies immediately and barfs your exact mistake directly to your face.
Enforcing invariants and smiting these bugs the moment they stepped out of
line became a holy crusade.
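The flavor of it, as a sketch (opd--write-output is a hypothetical helper invented for illustration, not a function from the codebase):

```elisp
;; Offensive programming in practice: assert at the boundary, die loudly.
;; `opd--write-output' is a hypothetical illustration.
(require 'cl-lib)
(require 'subr-x)  ; for `string-empty-p'

(defun opd--write-output (path html)
  "Write HTML to PATH, detonating immediately on bad input."
  (cl-assert (stringp path) nil "output path must be a string, got: %S" path)
  (cl-assert (not (string-empty-p path)) nil "output path must not be empty")
  (cl-assert (stringp html) nil "html must be a string, got: %S" html)
  (make-directory (file-name-directory path) t)
  (write-region html nil path))
```

A nil path doesn't get politely coerced or silently skipped; the build halts with the exact violated invariant in the error message.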
It was a minor sacrifice of "unopinionated design" on the altar of strict, unforgiving validation. But it was a silver bullet. It kept the state corruption at bay, the routing table clean, and myself out of a straitjacket.
Reloading went cold
I had the incremental compiler: I could change a comma in a 10,000-word Org file and watch the HTML swap out in a few milliseconds.
Then I tweaked a CSS file nested in an assets/css directory. I waited for the reload. Nothing. I saved again. Silence. The watcher was stone-cold deaf.
I looked at the route definition:
(opd-assets :name "assets" :pattern "assets/*" :url "/%f" :output "public/%f")
I had forgotten a universally despised truth: Emacs' filenotify is a thin
veil over system libraries such as inotify and fsevents. And these system
libraries are rightfully, stubbornly, famously flat. They refuse to look
inside subdirectories unless explicitly forced to. My assets were completely
off the grid.
My first instinct was a brute-force directory sweep. I spent hours wrestling
with directory-files-recursively before I realized I was reinventing a
broken wheel. I already had a map of the territory. I fell back on the
elegance and simplicity of eshell-extended-glob, traced the footprints left
in the URL registry, and dynamically stapled a native OS watcher to every
single parent directory that actually mattered.
Up until this point, I had built a segregated society. Org files enjoyed a
highly sophisticated, cache-driven router, while static assets were
relegated to a dumb, imperative copy-file loop. I was treating non-Org files
as second-class citizens. Hypocrisy at its finest.
Why I was treating assets differently was beyond me. Maybe it was remnants of Weblorg's architecture influencing my design. Or maybe I had grown complacent, willing to tolerate a lazy, imperative hack so long as the files ended up in the right directory.
This called for a grand unification.
An image, a stylesheet, or any other file for that matter, is just a post without an AST. I tore down the wall, unifying static assets as a special case of a standard route, with a dummy parser – opd--parse-asset – that bypassed Org entirely and yielded the file's path properties.
(defun opd--parse-asset (path &optional base _)
  "Minimal parser for a static asset at PATH.

BASE is the route containing PATH."
  (let* ((paths (opd--resolve-paths path base))
         (slug (opd--slugify (file-name-nondirectory path))))
    (append paths `((slug . ,slug)))))
The assets were flowing natively through the exact same duck-typed, 1:1 incremental router as other posts. The engine didn't care what the file was, it traced the chunk and moved the bytes. Total systemic harmony.
Just as I was about to declare absolute victory, a bug crept out of the woodwork and shattered the illusion.
I opened my macro collection – setup.org, a utility file injected into the
top of some posts via the #+SETUPFILE keyword. I changed a macro definition
and hit save. The watcher fired, checked the routing table, realized
setup.org was an excluded utility file, and immediately went back to sleep.
It did absolutely nothing.
The true horror was the silence that followed. Every post that relied on that macro remained blissfully unaware, serving stale, outdated content.
The file-level AST cache I was so proud of was strictly bound to filesystem
modification times. If File A includes File B, and you save File B, File A's
mtime hasn't changed. To the cache, File A is pristine. Untouched. The
engine was blind to transitive relationships.
I needed a way to track the bloodlines between files. I couldn't rebuild the entire site every time a macro changed, but I couldn't ignore the updates either.
I went back to the Org parser. During the first pass, while it was already
skimming the surface for titles and dates, I instructed it to hunt for
#+SETUPFILE and #+INCLUDE directives. If post.org included setup.org, the
engine explicitly burned setup.org into post.org's metadata chunk. A
primitive paper trail.
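A sketch of that hunt, with hypothetical names (the real pass folds this into the existing keyword skim, and #+INCLUDE lines can carry extra parameters this regexp glosses over):

```elisp
;; Hedged sketch: collect #+SETUPFILE / #+INCLUDE targets from the
;; current buffer.  The function name and regexp are illustrative.
(defun opd--scan-file-deps (base-dir)
  "Return absolute paths referenced by SETUPFILE/INCLUDE keywords.
Relative paths are resolved against BASE-DIR."
  (save-excursion
    (goto-char (point-min))
    (let ((re "^#\\+\\(?:SETUPFILE\\|INCLUDE\\):[ \t]+\"?\\([^\"\n]+?\\)\"?[ \t]*$")
          deps)
      (while (re-search-forward re nil t)
        (push (expand-file-name (match-string 1) base-dir) deps))
      (nreverse deps))))
```

Each returned path gets burned into the including post's metadata chunk, and the paper trail is born.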
When the file watcher caught a change, it would iterate through the entire chunk cache, shaking every parsed post and asking: "Do you depend on this file?"
If a post's paper trail implicated the modified utility file, the engine invalidated the parent's cache, dragged it into the dirty queue by the neck, and forced it to the execution block alongside its child.
If, for some reason, you're unfamiliar with computer science terminology and you stuck it out all the way to here, I first want to say "Congrats" but also "Well damn, what kept you hooked this long?"
This is to tell you that killing zombie children and slaughtering orphaned parents together is a common occurrence in our dialect and isn't as gory as it sounds. We may seem peaceful, but we are lexically violent.
This was the flashlight that illuminated the blind spot. The cache became an active web of complicity. I had achieved true, hot-reloading nirvana. Or so I thought…
Redline
"Not too shabby" was a blatant lie.
I told myself twenty-two milliseconds per file was a victory. But deep down, it tasted like coping. The friction in the back of my brain wouldn't leave me alone, a constant, abrasive annoyance; my instincts were screaming that the engine had more to give, and the incessant screams only grew louder. Not because of some imaginary competition with the JavaScript ecosystem, but because leaving performance on the table when you know the Lisp machine has more gears to shift is a cardinal sin.
10,000 posts. We're back, baby.
To find the friction, I had to clean house; the codebase was littered with maphash and recursive lambdas traversing sites, routes, and duck-typed ASTs to figure out the 1:1 vs 1:N routing structures. In Elisp, deep recursion is a great way to beg for a stack overflow and shoot your own foot (if memory serves me right, Emacs Lisp has a hardcoded recursion-depth limit of 1600). Furthermore, the nested indentation and endless parentheses began to look like a spaghetti murder scene.
(maphash
 (lambda (_ site)
   (maphash
    (lambda (_ route)
      ;; Do stuff with route.
      )
    (gethash :routes site)))
 opd--sites)
I ripped out the external maphash + lambda combo. I built syntactic macros
to flatten the execution environment.
(defmacro opd--with-sites (site-var &rest body)
  "Evaluate BODY for each site, binding it to SITE-VAR."
  (declare (indent 1))
  `(maphash (lambda (_ ,site-var) ,@body) opd--sites))

(defmacro opd--with-routes (site-var route-var &rest body)
  "Evaluate BODY for each route in SITE-VAR, binding it to ROUTE-VAR."
  (declare (indent 2))
  `(maphash (lambda (_ ,route-var) ,@body)
            (gethash :routes ,site-var)))
Ah, much better.
(opd--with-sites site
(opd--with-routes site route
;; Do stuff with route.
))
No more pulling punches. I went all-out. I started eliminating the obvious
performance hotspots. I replaced every remaining lambda-wrapped utility with
a magic incantation called cl-loop, a swiss-army chainsaw borrowed from
Common Lisp, allowing you to iterate, accumulate, and bail out of complex
data structures at maximum velocity. The kicker? It eliminates the overhead
of environment-capturing closures. While we're at it, let's update those
macros too.
(defmacro opd--with-sites (site-var &rest body)
  "Evaluate BODY for each site, binding it to SITE-VAR."
  (declare (indent 1))
  `(cl-loop for ,site-var being the hash-values of opd--sites
            do (progn ,@body)))

(defmacro opd--with-routes (site-var route-var &rest body)
  "Evaluate BODY for each route in SITE-VAR, binding it to ROUTE-VAR."
  (declare (indent 2))
  `(cl-loop for ,route-var being the hash-values of (gethash :routes ,site-var)
            do (progn ,@body)))
format-spec was next in line. It was escaping template strings repeatedly.
On every single iteration. I pulled it out, forcing the engine to escape the
templates exactly once during route initialization. eshell-glob-regexp got
the same treatment and was eagerly memoized at route creation.
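Conceptually, both fixes are the same move: compute once at route creation, read forever after. Something in this spirit (the :pattern-re plist key and the init function are illustrative):

```elisp
;; Hedged sketch: hoist per-iteration work to route-creation time.
;; The function name and the :pattern-re key are illustrative.
(require 'em-glob)  ; provides `eshell-glob-regexp'

(defun opd--route-precompile (route)
  "Memoize the compiled glob regexp on ROUTE, returning ROUTE."
  (plist-put route :pattern-re
             (eshell-glob-regexp (plist-get route :pattern))))
```

Every subsequent match is a plist-get instead of a fresh regexp compilation.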
The code was cleaner, but the benchmark barely flinched. We're far from done.
Time to summon the heavy artillery; I fired up M-x profiler-start, Emacs'
native truth serum. No more guesswork, I wanted to see exactly which
functions were gnawing on the CPU in the dark.
113492 73% - command-execute
113438 73% - funcall-interactively
113437 73% - execute-extended-command
113433 73% - command-execute
113433 73% - funcall-interactively
113433 73% - eval-buffer
113013 73% - let
113013 73% - opd-export
113013 73% - #<native-comp-function F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_12>
67956 44% - #<native-comp-function F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_11>
67956 44% - eval
67956 44% - let
67956 44% - funcall
67956 44% - #<byte-code-function 2CD>
67942 44% - opd-export-templates
62782 40% - #<interpreted-function B09>
62779 40% + org-export-as
1523 0% - #<interpreted-function 8AA>
1523 0% + org-export-as
804 0% + opd--log
547 0% + #<native-comp-function F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_23>
310 0% + #<interpreted-function BC0>
306 0% + mkdir
199 0% + make-lock-file-name
65 0% + #<byte-code-function 669>
41 0% + #<native-comp-function F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_24>
26 0% generate-new-buffer
19 0% + #<interpreted-function 8E1>
11 0% alist-get
1 0% opd--route-posts
14 0% + opd-export-assets
45057 29% - #<native-comp-function F616e6f6e796d6f75732d6c616d626461_anonymous_lambda_9>
45057 29% - eval
45057 29% - let
45057 29% - funcall
45057 29% - #<byte-code-function 626>
45057 29% - opd--route-posts
45057 29% - opd--route-collect-and-aggregate
44228 28% - opd--parse-org-file-with-cache
43978 28% - opd--parse-org-file
20624 13% + find-file-noselect
19547 12% + org-persist-write-all-buffer
3109 2% + org-element-parse-buffer
264 0% + opd--resolve-paths
149 0% + org-element-map
69 0% + opd--slugify
30 0% + vc-kill-buffer-hook
15 0% + #<byte-code-function DFE>
9 0% browse-url-delete-temp-file
9 0% add-hook
7 0% + uniquify-kill-buffer-function
6 0% + replace-buffer-in-windows
5 0% alist-get
4 0% process-kill-buffer-query-function
3 0% + #<byte-code-function 7E2>
3 0% run-hooks
2 0% org-check-running-clock
1 0% alist-get
393 0% + opd--render-route-prop
115 0% + opd--register-exclusions
99 0% + opd--find-source-files
35 0% + #<byte-code-function BE7>
9 0% + opd--build-format-spec
7 0% + #<byte-code-function B81>
3 0% alist-get
1 0% opd--merge-contexts
243 0% + package-initialize
167 0% + require
7 0% + internal-macroexpand-for-load
3 0% + condition-case
2 0% cancel-timer
1 0% + handle-focus-out
54 0% + byte-code
38747 25% Automatic GC
1443 0% + timer-event-handler
20 0% + redisplay_internal (C function)
4 0% + ...
1 0% + eldoc-schedule-timer
1 0% + internal-timer-start-idle
1 0% + #<byte-code-function BDB>
The profiler dump was a mirror reflecting the extent of my own stupidity.
The second pass consumes around 60% of the CPU, dominated entirely by
org-export-as. That was expected; there's virtually nothing to do here, since
we can't optimize the core parser short of opening the door to another
rabbit hole. But the first pass? opd--parse-org-file was thrashing 40% of
the total build time just to extract keywords. The offenders couldn't hide
anymore:
- find-file-noselect: 13% CPU
- org-persist-write-all-buffer: 12% CPU
My earlier assumptions that org-with-file-buffer was an elegant, memory-safe
choice for sidestepping the overhead of temporary buffers were hilariously
inaccurate. The profiler showed find-file-noselect hemorrhaging CPU cycles
everywhere. I was forcing Emacs to treat 10,000 raw text files as
interactive buffers, and it was dutifully doing what it was designed to do:
triggering file locks, querying version control backends, and running a
dozen or so major-mode hooks. For every file. Worst of all, it was
triggering Org's org-persist, forcing the editor to aggressively write cache
data to the hard drive for every single file it parsed. This would've been
great for normal usage, but by now, we're no longer normal users, are we?
It was carrying the entire bureaucratic weight of an interactive operating system. Just to read a damn title string.
Out the window. We dropped back somewhere more comfortable, closer to the
metal, where I controlled the raw I/O:
with-temp-buffer paired with insert-file-contents. Lean, dumb, aggressively
fast.
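The entire first-pass read, in its final dumb form (a sketch; whether you need org-mode active in the temp buffer depends on your Org version, and delay-mode-hooks keeps the hook bureaucracy from sneaking back in):

```elisp
;; Raw I/O: no file locks, no VC queries, no hooks, no org-persist.
(defun opd--read-org-metadata (path)
  "Cheaply parse the top-level elements of the Org file at PATH."
  (with-temp-buffer
    (insert-file-contents path)
    (delay-mode-hooks (org-mode))
    (org-element-parse-buffer 'greater-element t)))
```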
I fired the benchmark again.
The terminal went dead silent. The CPU fan spun down. It was too quiet. I
stared at the prompt in disbelief. Usually, when the build script finishes
this fast, it means opd silently choked on something nasty and died in the
background. I ran the test suite. Green. I ran the benchmarks again. Same
exact output. This wasn't a crash.
One minute and fifteen seconds. 7.5 milliseconds per file.
I leaned back in my chair, staring at the numbers once more. I was beating those smelly JS frameworks to death with an ancient crowbar.
But the high didn't last. A cold boot of 7.5ms per file is blistering, but when I spun up the incremental watcher to actually write a post, it felt sluggish. The compilation time was fast, but the engine was wasting time thinking about what to compile.
So I did what any performance-obsessed maniac does: I slapped hash tables everywhere. chunkcache, filecache, depcache, registry. If a collection required a linear scan, I ripped it out. Linear scans are algorithmic peasantry; it's knocking on 10,000 doors sequentially to find one specific file. A hash table is the master ledger; it teleports you exactly to where you need to be. That's constant time.
Technically, a hash-table lookup is amortized constant time. But if you kept reading this far and you don't know the difference between average-case hashing and worst-case collisions, I can't help you. De-dust your Comp Sci books.
Individual files were fast-ish while aggregates were still dragging. I realized the engine was treating all changes equally. A logical fallacy; not all changes were comparable. It needed a bifurcated execution path:
- Content change: You change a word. The file metadata is identical. The engine intercepts it, updates the output, and bails immediately. This is the fast path.
- Structural change: You add a new tag to your #+FILETAGS, or change an included file. The engine has to reconstruct the aggregates.
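A crude way to draw that line, assuming hypothetical helpers: diff everything except the body between the cached chunk and the freshly parsed one.

```elisp
;; Hedged sketch: classify a change as content-only vs structural by
;; comparing the metadata alists minus the body.  Names are illustrative.
(require 'cl-lib)

(defun opd--structural-change-p (old-chunk new-chunk)
  "Non-nil when OLD-CHUNK and NEW-CHUNK differ outside their content."
  (cl-flet ((meta (chunk)
              ;; Drop the body so only metadata participates in the diff.
              ;; `copy-alist' protects the cached chunk from mutation.
              (assq-delete-all 'content (copy-alist chunk))))
    (not (equal (meta old-chunk) (meta new-chunk)))))
```

If the metadata is byte-for-byte identical, it's a content change: re-export the one file and bail. Otherwise, rebuild the aggregates.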
A massive improvement, but it still didn't solve the final boss: the transitive dependency nightmare from earlier. The one where changing a macro file left the parent files serving stale content. My attempts to shake the cache were buckling under the weight of the routing table.
I needed more speed. I needed nitrous oxide.
I built the depcache, a hash-table backed, reverse-dependency graph. When the
file watcher caught a change, it unleashed a DFS traversal algorithm.
If setup.org changed, the algorithm stalked through the graph, found
post-a.org and post-b.org that included it, invalidated their caches, and
dragged them onto the execution block. A cascading invalidation matrix.
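The graph itself is nothing exotic: a hash table mapping each included file to the list of files that include it. A sketch with illustrative names:

```elisp
;; Hedged sketch of the reverse-dependency graph.
;; Edges point child -> parents: setup.org -> (post-a.org post-b.org ...)
(require 'cl-lib)

(defvar opd--depcache (make-hash-table :test 'equal))

(defun opd--depcache-record (parent child)
  "Record that PARENT includes CHILD, as a reverse edge CHILD -> PARENT."
  (cl-pushnew parent (gethash child opd--depcache) :test #'equal))

;; On change, a DFS over the reverse edges yields everything to rebuild:
;; pop a file, collect it, push all of its parents, repeat until empty.
```

The DFS walking these edges is visible inside opd--incremental-export below.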
The engine finally roared to life, idling with a cold, heavy hum. It sounded hungry.
Hot-rebuilds plummeted to a range of 7 to 200 milliseconds, depending
entirely on the complexity of the source Org file. opd was practically out
of the equation. The routing, dependency tracking, the duck-typing. It all
executed in near-zero time. The only overhead left was the irreducible cost
of org-export parsing the text.
(defun opd--incremental-export (files)
  "Incrementally rebuild the given FILES.

This function uses granular caches to search all sites for the
FILES; it returns non-nil if the FILES were found and rebuilt."
  (cl-assert (listp files) nil "files must be list, got: %s" files)
  (cl-assert (cl-every #'stringp files) nil
             "incremental export requires string paths")
  (let* ((abspaths (mapcar #'expand-file-name files))
         (result nil))
    (opd--with-sites site
      (let* ((cache (gethash :cache site))
             (chunkcache (gethash :chunkcache site))
             (depcache (gethash :depcache site))
             (filecache (gethash :filecache site))
             ;; Resolve dependencies and invalidate impacted files
             ;; via DFS which also handles transient dependencies.
             (impacted (cl-loop with seen = (make-hash-table :test 'equal)
                                with stack = (copy-sequence abspaths)
                                while stack
                                for item = (pop stack)
                                unless (gethash item seen)
                                collect item
                                do (remhash item filecache)
                                   (puthash item t seen)
                                   (dolist (parent (gethash item depcache))
                                     (push parent stack)))))
        (opd--with-error
          (opd--with-routes site route
            (let* ((name (plist-get route :name))
                   (oldcache (gethash name chunkcache))
                   (impacts-route-p nil)
                   (structural-change-p
                    (cl-loop for f in impacted
                             for exists = (file-exists-p f)
                             for was-in-p = (and oldcache (gethash f oldcache))
                             do
                             ;; Dirty dirty mutation by side-effect; saves
                             ;; a secondary pass over `impacted'.
                             (when was-in-p (setq impacts-route-p t))
                             thereis
                             (if (opd--route-match-p route f)
                                 (not (eq (not was-in-p)
                                          (not (opd--route-accepts-p route f site))))
                               was-in-p))))
              ;; Bypass when no change detected.
              (when (or structural-change-p impacts-route-p)
                (remhash name cache) ;; Invalidate route cache.
                (opd--with-env route
                  (let* ((newcache (gethash name chunkcache))
                         (chunks (if structural-change-p
                                     (opd--route-posts route)
                                   (cl-loop
                                    ;; `eq' for pointer equality.
                                    with set = (make-hash-table :test 'eq)
                                    for f in impacted
                                    do (dolist (c (gethash f newcache))
                                         (puthash c t set))
                                    finally return (hash-table-keys set)))))
                    (when chunks
                      (funcall (plist-get route :export) route chunks)
                      (setq result t))))))))))
    result))
The disparate parts had finally locked together.
It's alive
We Frankensteined a thing of beauty. We started with a rotting corpse – a brittle, naïve string-templating script. What crawled off the operating table is a declarative, hot-rebuilding orchestration layer powered by a purely functional dependency graph.
Tsoding said it best in his (not-so-obvious) love letter, "The annoying
usefulness of Emacs". This project is the epitome of that sentiment. Emacs
is a platform that is infinitely greater than the sum of its parts. By
leaning entirely into ox.el and the native Org AST, the engine's surface
area vanished. No bespoke plugin architectures nor brittle abstractions.
Just the user, the site, and the route.
As the adage goes, "simple" is the farthest thing from "easy." Stripping a system down to its absolute bare metal is agonizing work, but the payoff is a piece of software, older than most of the JavaScript andies out there, waking up to beat modern web tooling at its own game. The modern web is plagued by developers hiding behind their bundlers, obsessed with shipping megabytes of client-side code just to render static text on a screen. Sometimes, building a compiler from scratch is exactly the kind of unhinged sanity check we need to remind ourselves what computers are actually capable of.
Looking at the codebase now, calling this creation opd feels like an insult to its architecture. It became a razor-sharp, reactive compiler forged in blood, sweat, stack traces, profiler dumps, and the ashes of my long-gone notes (RIP). The acronym is ill-fitting. It has earned a real name: ossg, the Org Static Site Generator.
Yes, yes. Forging Excalibur in the fires of a dying star and naming it "Sharp metal stick". As they say, naming is one of the two hardest problems in computer science.
Time to take that damn line out of my README.
Now if you'll excuse me, I have some posts to write.
Afterword
I have to address the elephant in the room: hot-rebuilding isn't true hot-reloading. To get the browser to magically inject CSS without a refresh requires WebSockets, background Node processes, and a lot of external baggage.
But frankly? I'm not writing web applications. I'm publishing notes, articles, and lore. Sacrificing the purity of a zero-dependency Lisp machine is a steep price to pay to save a browser-refresh keystroke.
There is, of course, a catch to the magic. To make the duck-typed router and the dependency graph work, the engine has to know the territory. You can't hot-reload an engine that's freezing cold. Before the watcher drops into incremental mode, it requires one full, synchronous master build to warm the cache, index the registry, and trace the bloodlines. Call it the ignition tax.
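The ignition tax can be sketched with nothing but the built-in filenotify.el. The function names below (`ossg-watch`, `ossg--build-all`, `ossg--build-incremental`) are hypothetical stand-ins, not the engine's actual API:

```elisp
;; A minimal sketch of the ignition tax, assuming hypothetical
;; `ossg--build-all' and `ossg--build-incremental' entry points.
(require 'filenotify)

(defun ossg-watch (root)
  "Pay the ignition tax on ROOT, then drop into incremental mode."
  ;; One full, synchronous master build: warm the cache, index the
  ;; registry, trace the bloodlines of the dependency graph.
  (ossg--build-all root)
  ;; From here on, only the touched file and its dependents rebuild.
  (file-notify-add-watch
   root '(change)
   (lambda (event)
     ;; The third element of a filenotify event is the file path.
     (ossg--build-incremental (nth 2 event)))))
```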
I can already hear the systems-level pedants. "Why no parallelism? Why no asynchronous workers?" Believe me, the temptation to spin up headless Emacs subprocesses and blast the AST parsing across all my CPU cores was strong.
But Emacs threading is a cooperative illusion; threads yield like polite Victorian gentlemen, which under heavy CPU load means they don't yield at all. True async means either serializing massive Lisp structures across IPC boundaries, with all the nightmarish overhead that implies, or rebuilding Emacs from the ground up around an event loop. Pick your poison.
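To see why `make-thread` buys nothing here, consider this sketch (the function name is hypothetical): only one Lisp thread holds the global lock at a time, and a CPU-bound body never reaches a yield point, so the "concurrent" parse below runs strictly sequentially:

```elisp
;; Concurrent on paper, sequential in practice: pure CPU work never
;; releases the global lock, so each thread runs to completion in turn.
(require 'org)

(defun ossg--parse-all-threaded (files)
  "Parse FILES in threads that cannot actually overlap."
  (mapc #'thread-join
        (mapcar (lambda (file)
                  (make-thread
                   (lambda ()
                     ;; No `thread-yield', no blocking wait: this body
                     ;; monopolizes the interpreter until it finishes.
                     (with-temp-buffer
                       (insert-file-contents file)
                       (org-mode)
                       (org-element-parse-buffer)))))
                files)))
```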
I am a relentless engineer, but even I know when to stop. Bolting asynchronous IPC onto a static site generator just to shave off a few milliseconds crosses the line from optimization into ego-driven insanity.
I also need to confess a sin of omission regarding the reactive router. To
kill that abominable :canonical flag, I implemented a duck-typed chunk
inspector. The engine decides if a chunk is a 1:1 post or a 1:N aggregate
entirely by looking for an abspath property at the top level of the data
tree.
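The whole inspector boils down to a one-liner, sketched here with a hypothetical function name and assuming the chunk's top level is a plist:

```elisp
;; The duck test: `abspath' at the top level means a 1:1 post;
;; its absence means a 1:N aggregate. Accessor name is hypothetical.
(defun ossg--chunk-post-p (chunk)
  "Return non-nil if CHUNK quacks like a 1:1 post."
  (plist-get chunk :abspath))
```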
This is a beautiful, frictionless lie. It assumes you aren't going to do
something stupid like manually burying the abspath in a custom
opd-aggregate-each closure, or magically injecting abspath into the chunk of
your RSS feed. If you violate this invisible, undocumented structural
contract, the hot-rebuilding watcher will silently miscategorize your files,
ignore your changes, and gaslight you into thinking you're losing your mind.
You have been warned. Treat the aggregation structure with respect, or the watcher will leave you for dead.
Is ossg perfect? Hell no. It's software. It has sharp edges, undocumented
assumptions, and probably a few dormant bugs waiting to vomit a stack trace
into your face. But it's fast, it's robust, it's mine, and it doesn't suck.
It lives entirely within my digital garden. You are free to take it. Fork it, rip its guts out, send patches, or use it to build your own unhinged Lisp compiler. Just don't open an issue complaining that it doesn't support your favorite JavaScript framework. I will close it out of spite.
Footnotes:
Sitemap in org-publish parlance refers to the index page listing all
posts.
🖖
If you know, you know.
Did I mention you can have multiple sites running in the same engine?
Case in point: this very link points to another route entirely. See what I did there?
If memory serves me right, Emacs Lisp caps recursion at a default max-lisp-eval-depth of 1600.