Literate Programming

Literate programming was introduced by Donald Knuth in a 1984 paper. The key idea is simple: instead of writing code in the order the compiler demands, you write it in the order a human would want to read it. Code and documentation live together in a single source file, and tools extract either the book or the compilable code from it.

Compiler Order vs Understanding Order

In a normal source file, you see everything at once in the order the compiler requires: includes, forward declarations, types, helpers, then finally main() at the bottom. There is no narrative.

In a literate program, the code is reorganized into sections that tell a story. You start with the overview, introduce the core data structures, explain the main loop, and defer advanced features and error handling to later chapters. Each piece of code is presented when the reader is ready for it.

Compiler order (flat, no narrative) vs Understanding order (piece by piece, important first)

The Toolchain

With Noweb, a literate program is stored in a .nw file. Two tools process it:

noweave extracts the documentation, producing a LaTeX file that is then typeset into a PDF book.
notangle extracts the code, producing the actual .c and .h files that are compiled normally.

The same source produces both the book and the working program.

Literate programming toolchain: .nw file produces both PDF book and source code
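
To make the extraction step concrete, here is a minimal sketch of what a tangle tool does, written in Python. It is not notangle's implementation: it only handles chunk definitions of the form <<name>>=, chunk references that sit alone on a line, and "@" lines that switch back to documentation. The names parse_chunks, tangle, and TOY_SOURCE are made up for this illustration, and the embedded source is a hypothetical fragment, not the real ToyKernel.nw.

```python
import re

CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")      # "<<name>>=" starts a code chunk
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")  # "<<name>>" alone on a line is a reference


def parse_chunks(source):
    """Collect code chunks by name; repeated definitions are concatenated."""
    chunks = {}
    current = None  # name of the chunk being read, or None while in documentation
    for line in source.splitlines():
        match = CHUNK_DEF.match(line)
        if match:
            current = match.group(1)
            chunks.setdefault(current, [])
        elif line == "@" or line.startswith("@ "):
            current = None  # an "@" line switches back to documentation
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, root, indent="", _stack=()):
    """Expand the chunk named `root`, following references recursively."""
    if root in _stack:
        raise ValueError(f"circular chunk reference: {root}")
    if root not in chunks:
        raise KeyError(f"undefined chunk: {root}")
    out = []
    for line in chunks[root]:
        ref = CHUNK_REF.match(line)
        if ref:
            # A referenced chunk inherits the indentation of the reference.
            out.extend(tangle(chunks, ref.group(2), indent + ref.group(1), _stack + (root,)))
        else:
            out.append(indent + line if line else line)
    return out


# Hypothetical fragment in understanding order: main first, details later.
TOY_SOURCE = """\
The kernel entry point comes first, because it is the easiest part to follow.

<<kernel.c>>=
<<includes>>
<<process table>>
int main(void) {
    <<main loop>>
    return 0;
}
@
The main loop clears the process table.

<<main loop>>=
for (int i = 0; i < NPROC; i++)
    table[i].in_use = 0;
puts("process table cleared");
@
Only now do we need the details the compiler wanted first.

<<process table>>=
#define NPROC 4
struct proc { int in_use; };
static struct proc table[NPROC];
@
<<includes>>=
#include <stdio.h>
@
"""

tangled = tangle(parse_chunks(TOY_SOURCE), "kernel.c")

if __name__ == "__main__":
    print("\n".join(tangled))
```

The narrative introduces main() first and the include last, but the tangled output starts with the include and the process table, which is the order the compiler needs. The root chunk <<kernel.c>> is what decides the final layout.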

Example

Here is a concrete example from a toy kernel. On the left is the Noweb source (ToyKernel.nw), mixing documentation sections with named code chunks (<<proc.h>>=, <<kernel.c>>=). On the right is the typeset LaTeX output: a readable document with numbered sections, cross-references, and properly formatted code.

ToyKernel.nw source on the left, rendered PDF on the right

Syncweb: Bidirectional Literate Programming

Classic Noweb is one-way: you edit the .nw file and extract code from it. This is a major practical complaint about literate programming: any edit you make to the extracted code in your normal IDE or debugger workflow is lost on the next extraction.

Syncweb solves this by making the process bidirectional. It places MD5 checksums in both the .nw file and the extracted source files, so it can detect which side changed and automatically merge edits back. You can edit either the book or the code, and syncweb keeps them in sync.

Syncweb: bidirectional sync between .nw and .c using MD5 marks
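
The idea of using a checksum to detect which side changed can be sketched in a few lines of Python. This is an illustration only: the format of syncweb's marks and its actual merge strategy are not described here, so the functions md5_of and sync_direction, and the assumption that one digest per chunk is recorded at the last sync, are hypothetical.

```python
import hashlib


def md5_of(text):
    """Hex MD5 digest of a chunk body."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sync_direction(recorded_md5, doc_body, code_body):
    """Decide which way to copy a chunk.

    `recorded_md5` is the digest stored when both sides were last in sync
    (an assumption of this sketch, not syncweb's documented format).
    """
    if doc_body == code_body:
        return "in sync"
    doc_changed = md5_of(doc_body) != recorded_md5
    code_changed = md5_of(code_body) != recorded_md5
    if doc_changed and code_changed:
        return "conflict"      # both sides edited differently: needs a human
    if doc_changed:
        return "doc -> code"   # the .nw side was edited
    return "code -> doc"       # the extracted source was edited


# Example: the chunk was edited in the extracted .c file only.
original = "int used = 0;\n"
mark = md5_of(original)
direction = sync_direction(mark, original, "int used = 1;\n")
```

Comparing each side against the recorded digest, rather than only against each other, is what lets a tool tell "the code changed" apart from "the book changed".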

Syncweb: Code Indexing and Hyperlink Navigation

Syncweb also provides code indexing. It analyzes the extracted source files and generates cross-references: every function call, every type name, every variable becomes a clickable hyperlink in the PDF. This brings IDE-style "go to definition" navigation into the book.
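
As a rough picture of what such an index contains, the Python sketch below maps each function name to the chunk that defines it, then lists the call sites that would become links. It is a deliberately crude stand-in: it uses a regular expression that only recognizes C function definitions starting at column 0, whereas a real indexer would analyze the code properly. The names FUNC_DEF, build_index, and find_links, and the two sample chunks, are hypothetical.

```python
import re

# Simplified: "type name(args) {" starting at column 0 counts as a definition.
FUNC_DEF = re.compile(r"^(?:\w+[ \t\*]+)+(\w+)\s*\([^;{}]*\)\s*\{", re.MULTILINE)


def build_index(chunks):
    """Map each function name to the chunk that defines it."""
    index = {}
    for chunk_name, body in chunks.items():
        for match in FUNC_DEF.finditer(body):
            index[match.group(1)] = chunk_name
    return index


def find_links(chunks, index):
    """List (using chunk, function, defining chunk) for calls across chunks."""
    links = []
    for chunk_name, body in chunks.items():
        for name, target in index.items():
            is_called = re.search(rf"\b{re.escape(name)}\s*\(", body)
            if target != chunk_name and is_called:
                links.append((chunk_name, name, target))
    return links


# Hypothetical chunks, keyed by chunk name.
sample_chunks = {
    "scheduler": "static void schedule(void) {\n    /* pick the next process */\n}\n",
    "main loop": "for (;;) {\n    schedule();\n}\n",
}

index = build_index(sample_chunks)
links = find_links(sample_chunks, index)
```

Each tuple in links is one hyperlink: the call to schedule() in the "main loop" chunk would point at the page where the "scheduler" chunk is typeset.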

The screenshot below shows the Linker book open in Evince. Notice the tooltip "Go to page 48": clicking on a function name jumps to the page where that function is defined. Hovering over a link also shows an inline preview of the target page, so you can see a function's definition without leaving your current position, much like "peek definition" in a modern IDE. The result is the narrative of a book combined with the navigation of an IDE.
