Target Placement

Where should each instruction run?

The graph is heterogeneous: different data lives on different backends. Target placement assigns each basic block to the backend that minimizes a separable cost surrogate, balancing computation cost against data transfer overhead under capability constraints.

The tension

Placement is governed by three competing forces. Correctness: each backend can only express a subset of the MIR instruction set. PostgreSQL cannot execute closures or function calls; the interpreter cannot issue SQL. Execution cost: backends differ in how efficiently they evaluate the same operation. PostgreSQL processes relational operations far faster than the interpreter, but the interpreter handles the full instruction set. Transfer cost: moving data between backends requires serialization, deserialization, and bandwidth.

A locally optimal assignment can be globally infeasible. Placing a block on PostgreSQL because it is cheapest for that block may leave no valid target for a downstream block that depends on closures. Placement is a combinatorial optimization problem, not a per-statement dispatch rule.

The placement framework is defined over an arbitrary set of execution targets, each occupying a different region of this space. Specialisation is a structural property: locality and efficiency cluster against capability, so no target can sit at the centroid. The named backends discussed below are one implementation; the surrounding ghosts indicate that the same discipline extends to anything else that fits inside the triangle.

Execution targets

Three targets are defined, each with its own capability profile and transition constraints.

Interpreter. The universal fallback. A tree-walking executor that can evaluate any MIR instruction. It is the slowest backend due to per-instruction overhead, but its presence guarantees that every block has at least one feasible assignment. The interpreter is the only backend that can execute effectful terminators (graph operations requiring runtime coordination).

PostgreSQL. The primary data source. MIR fragments are compiled into SQL and executed as prepared statements. PostgreSQL supports arithmetic, comparisons, and aggregate construction, but rejects closures, function calls, iteration, and values whose types collapse ambiguously under JSONB serialization. PostgreSQL is outgoing-only: queries can exit to other backends, but no backend can transition into PostgreSQL. Re-entry would duplicate the temporal seed query, and no case has been identified where this reduces total cost.

Embedding. Stores vector encodings of entities. Currently exists only in the analysis phase as a validation target for the three-target solver; no execution backend is implemented, as the underlying graph does not yet support a dedicated vector store.

Cost model

Three cost components govern the model: (1) per-block computation cost, the sum of statement and terminator costs on a given target; (2) per-edge local transition cost, the cost of serializing live-out locals plus a fixed overhead when a CFG edge crosses a backend boundary; and (3) per-block vertex path premium, the cost of retrieving vertex data from a remote origin backend. The third component is the one that drives the cost model’s shape: vertex data does not flow along the CFG, so it cannot be attributed to edges the way ordinary locals can.

Statement eligibility

Each statement and terminator receives a per-target cost. Supported operations get a finite cost reflecting the target’s relative efficiency; unsupported operations get infinity. Eligibility propagates through data dependencies: if a statement produces a value that is ineligible on a target, every downstream consumer of that value is also ineligible. This propagation is modeled as a forward must-analysis: a local is eligible only if it is computable on every path that reaches it.

The captured environment is checked per-field, not per-local. This allows individually representable captures to remain eligible even when other fields in the same environment are not. Vertex path access is restricted to projections into the target’s native backend by backend-specific predicates.

Why block-based path attribution

Vertex data does not flow along CFG edges. If an island on target Ω needs a path originating on target Δ, the runtime fetches directly from Δ, bypassing intermediate backends. This means edge-based attribution (charging transfer cost on CFG edges, as done for ordinary locals) does not work for vertex paths.

Three edge-based strategies were considered and rejected. The first, treating the vertex as a regular local, charges the full vertex size at every backend transition, even when a block never accesses vertex data. The second, suppressing the vertex from liveness entirely, charges zero, severely underestimating. The third, tracking individual paths through a dataflow lattice, cannot discharge a path when execution reaches the origin backend, because knowing whether a block is on the origin requires knowing the assignment the cost model is trying to determine: a circular dependency.

The solution is to charge the vertex path premium at the block, not the edge. For each path a block accesses whose origin differs from the block’s assigned target, the estimated transfer size is charged as a per-block premium. This is a uniform structural approximation: it usually overcounts along same-target chains where blocks share a fetch, but can undercount when a single block is reachable from multiple island entries. The overcount dominates in practice. The approximation is uniform across the entire CFG, which avoids the structural bias that edge-based or partially-deduplicated models introduce at join points and inside loops.

The separable objective

Under per-block path attribution, the placement objective decomposes cleanly into unary and pairwise terms:

E(x) = \sum_{bb_n \in B} c(bb_n, x_{bb_n}) + \sum_{e \in E} t_e\bigl(x_{\mathrm{src}(e)}, x_{\mathrm{dst}(e)}\bigr)

The per-block cost sums statement costs, the terminator cost, and the path transfer premium. The per-edge cost combines a fixed overhead (preventing spurious backend switches on zero-data transitions) with a data-proportional component based on the estimated sizes of live-out locals.

This formulation maps to a standard pairwise energy minimization problem, structurally equivalent to the Partitioned Boolean Quadratic Problem (PBQP) used in compiler register allocation. The ideal cost model would account for shared fetches across same-target island chains, but that introduces higher-order submodular structure that, combined with the non-metric asymmetric transition costs, multiple labels, and cycles in the CFG, makes the problem intractable. The separable approximation eliminates the higher-order term entirely.

Size estimation

Transfer costs require knowing how much data each value carries. The model uses information units (one unit per primitive value) with a cardinality axis for collections. Both dimensions are ranges with lower and upper bounds, because values flow through unions and join points where exact sizes are not statically determined. A static phase resolves sizes from type structure; a dynamic phase propagates sizes through assignments via forward dataflow, using affine equations over function argument sizes to handle inter-body dependencies.

Block splitting

The solver assigns one target per block, but statement-level eligibility can vary within a block. Block splitting subdivides each block into contiguous regions of uniform target support, connected by zero-cost Goto terminators. The inserted Goto itself is costed at zero, but the new edge still carries normal transition costs. This refines the placement granularity so that a single interpreter-only statement does not force the entire block onto the interpreter.

One example block, five program elements, three strategies for resolving intra-block eligibility variation. S₃ runs only on Γ; that restriction propagates to S₄ and the terminator via data dependency. Intersection collapses every row to the shared supported set, losing options. Per-instruction destroys the CFG’s notion that adjacent statements have local affinity. Block splitting cuts at the seam where the supported set actually changes, producing the fewest blocks that admit a uniform per-block assignment.

Solving

Exact global optimization of the objective is intractable. The solver first prunes infeasible assignments via AC-3 arc consistency over the CFG’s transition constraints, then runs a greedy forward pass over a topologically ordered DAG of placement regions, using local lookahead to estimate transition costs for unassigned neighbors. Small cyclic regions (SCCs) are solved exactly via branch-and-bound instead of greedy assignment, because loops compound transition costs with every iteration. Alternating adjustment passes re-evaluate each block with all neighbors assigned until no update decreases cost. A feasible assignment always exists: the interpreter is in every domain.

Fusion and islands

After solving, consecutive same-target blocks introduced by splitting are fused back together. The pass merges only blocks connected by argument-free Goto terminators with no block parameters and identical target assignments. Transitive chains are collapsed.

The fused layout produces execution islands: maximally connected components of same-target blocks. Each island is dispatched to its assigned backend as one unit by the orchestrator. Islands track a requires set (paths needed from external providers) and a provides set (paths made available to downstream consumers, populated on demand).

A condensed island dependency graph connects islands with three edge kinds: control-flow edges from cross-target CFG transitions, inheritance edges that let same-target islands share fetched data via dominance, and data-flow edges resolving each required path against a provider. When no existing island can provide a required path, a synthetic data island is inserted to fetch it from the origin backend.

The island dependency graph and per-block target assignments are the final output of placement and the input to code generation and execution.