Code Generation & Execution

From islands to artifacts to results

Placed islands are compiled into backend-specific artifacts and dispatched at runtime. The PostgreSQL backend fuses all reachable islands into a single prepared statement. The interpreter tree-walks MIR directly. The orchestrator coordinates transitions between them.

Two-phase model

Execution follows the same separation as traditional compilers: compilation produces artifacts, then dispatch executes them. During code generation, each backend compiles its assigned islands. During execution, the runtime dispatches those artifacts according to the island dependency graph. Splitting these phases enables caching: the compiled artifacts, together with the island graph, contain enough information to dispatch a query without recompilation.

The execution infrastructure is backend-agnostic. Adding a new target requires a code-generation module and cost entries in the placement analysis, not changes to the orchestrator or the island model.

Compilation and dispatch are separate phases. Artifacts plus the island graph are enough to run a query without recompiling.

PostgreSQL backend

The PostgreSQL backend translates MIR fragments into declarative SQL. Three structural preconditions, all guaranteed by the placement pipeline, make this feasible: each PostgreSQL island is acyclic (loops are rejected), has a single entry block, and contains only side-effect-free instructions. Combined with SSA, no mutable state exists and every local is assigned exactly once on each control-flow path.

SSA to SQL: folding assignments back into expressions

The core operation is conceptually the reverse of ANF normalization. Where the compiler frontend decomposes expression trees into sequences of atomic assignments, the PostgreSQL backend folds them back into nested SQL sub-expressions. SQL lacks let-bindings, so the compiler converts the DAG into a tree by duplicating blocks below join points and compiles each control-flow path independently.

The overhead of duplication is purely textual. Conditional expressions short-circuit, so PostgreSQL evaluates only the taken arm per row. Prepared statements amortize plan time by reusing the plan across invocations. Because the result is a scalar expression rather than a subquery, the query planner retains full visibility: predicate pushdown and constant folding apply without barrier.

CTE-based alternatives (deduplication at join points via GSA, or compiling each block to a CTE directly) were considered and rejected: PostgreSQL’s planner visibility limitations negate sharing benefits for per-row evaluation, and the reported performance gains rely on subquery decorrelation that PostgreSQL does not implement.

The continuation row

Each compiled island returns a continuation encoded as a single composite row: (filter boolean, block int, locals int[], values jsonb[]).

The filter field is three-valued. TRUE and FALSE indicate early termination: the row is accepted or rejected entirely within PostgreSQL, and the WHERE clause filters it before it ever reaches the runtime. NULL indicates that execution must continue on another backend, in which case block identifies where to resume and the parallel locals/values arrays carry the live-out state. The arrays are parallel rather than fixed-width because the set of live-out locals is typically sparse.

Click a step to highlight the dispatch path. Three-valued filter: TRUE / FALSE terminate in PostgreSQL; NULL transfers live-out state to another backend.

Type-to-JSONB collapse

Values crossing the interpreter-PostgreSQL boundary are encoded as JSONB. Because HashQL contains types that cannot be represented in JSON, a semantic collapse is necessary. Multiple HashQL types map to the same JSON form: tuples and lists both become JSON arrays; structs and dictionaries both become JSON objects. Opaque wrappers are lost entirely, replaced by their inner value.

Deserialization relies on the MIR’s preserved type information at each consumption site to reconstruct the original value. Union types present the only complication: multiple deserialization paths are possible. The placement analysis guarantees that each union admitted to PostgreSQL is unambiguous: no two members collapse to the same JSON form.

Temporal metadata requires separate treatment. PostgreSQL stores temporal ranges as tstzrange, which has no JSON equivalent. The compiler decomposes each range into a struct with start and end fields, where start is an epoch-millisecond integer and end is either an integer or null for unbounded ranges. All intervals follow a canonical left-closed, right-open form.

Because the storage layer can produce NULL from missing JSONB keys, the compiler guards every decision point: control-flow expressions reject the row if the discriminant is NULL, and return continuations coalesce NULL to FALSE.

Query assembly

The full query assembles around the base table. The graph read head determines the starting table and constrains it through temporal overlap conditions on both time axes. Each pipeline body contributes its islands as CROSS JOIN LATERAL subqueries. LATERAL is necessary because the continuation expression references columns from the base table; CROSS JOIN because every candidate row must be evaluated. A LATERAL subquery avoids the problem of composite field access in the SELECT list, where PostgreSQL expands the expression once per accessed field rather than evaluating it once. Data required across islands is requested through additional SELECT columns, with table joins added lazily as referenced entity paths demand.

Interpreter

The interpreter follows a tree-walking design and executes MIR directly without a prior compilation step. This is a validation-first choice: a register-based bytecode VM is planned as future work. The interpreter is stack-based: each function call creates a frame storing locals, the current program counter, and a reference to the executing body. Storage is pre-allocated from the body’s local declarations.

Coroutine suspension

When the interpreter encounters an effectful terminator (graph reads), it yields a suspension to the orchestrator. The orchestrator fulfills the request and resumes execution by writing the returned value as the block argument and advancing the program counter to the successor block. Nested graph effects are handled by recursing into the orchestrator through the same suspension protocol.

Referential transparency ensures that pure computation performed before the suspension requires no cleanup. Catastrophic failure (network or database outage) prevents resumption but does not corrupt state.

Island-aware suspension

The placement pipeline may assign blocks within the same body to different targets. At each block boundary, the interpreter checks whether the transition crosses an island boundary. If so, execution yields to the orchestrator, which dispatches the next island to its assigned backend. This check applies only to graph-effect bodies (the root stack frame); callees reached through function application do not carry island assignments.

Traversal-aware liveness analysis, run during code generation, determines which locals and vertex paths must be transferred at each island boundary.

Orchestrator

The orchestrator walks the island dependency graph, dispatching each island to its assigned backend and mediating data transfer at island boundaries.

Graph read fulfillment

Fulfillment begins at PostgreSQL. The head operation determines the candidate set by constraining the temporal slice and vertex type. Because the transition model enforces that PostgreSQL is outgoing-only, all reachable PostgreSQL islands form an entry-side prefix of any execution path through a pipeline body. Their computation is fused into the initial query: head determination and all PostgreSQL island evaluation happen in a single prepared statement, executed as one database round-trip.

Each returned row passes through the pipeline’s chain of operations. The orchestrator proceeds block by block: for a PostgreSQL island, it checks the continuation. If block is NULL, the row already passed within PostgreSQL (early termination). If a continuation exists, the target block and live-out locals are written into the interpreter’s callstack. For an interpreter island, the orchestrator runs the interpreter until a value is returned or another island boundary is reached.

Demand-driven hydration

Instead of loading complete vertices, the orchestrator populates only the fields that downstream islands have requested. Each island’s requires set is resolved against provider islands during placement, and the provides sets determine which vertex paths are made available at each boundary. The type system prohibits access to fields outside the known set; traversal-aware path analysis determines which fields are actually reached; the island dependency contract guarantees every such field has been populated by a prior provider.