Type System

Structural by default, nominal when it matters

HashQL’s type system uses structural subtyping with Hindley-Milner style inference. Nominal identity is opt-in through newtypes. Type information is preserved through every compiler stage, guiding placement eligibility, serialization, and cost estimation.

Structural subtyping

Two types are compatible if their shapes are compatible, regardless of what they are called. A struct with fields name: String and age: Number is assignable to any type that expects those fields, whether or not the types share a declared relationship.

This is a deliberate departure from nominal type systems, where compatibility requires explicit declarations like implements or extends. The HASH graph defines entity types through an external ontology that evolves independently of query code. Requiring nominal relationships between every pair of compatible types would be impractical. Structural subtyping lets the compiler check compatibility based on what the data actually looks like.

Subtyping follows width and depth rules. Width subtyping: a struct with more fields is a subtype of one with fewer. If a function expects (name: String), you can pass (name: String, age: Number). Depth subtyping: subtyping recurses into field types. If A is a subtype of B, then (x: A) is a subtype of (x: B).

For closed structs, domains must match and each shared field compares pointwise:

\mathrm{dom}(S_1) = \mathrm{dom}(S_2) \;\land\; \forall \ell \in \mathrm{dom}(S_1).\, \mathrm{proj}(S_1,\ell) <: \mathrm{proj}(S_2,\ell) \;\Rightarrow\; S_1 <: S_2

For open structs as supertypes, the subtype may carry extra fields; only fields required by the supertype are compared:

\mathrm{dom}(S_1) \supseteq \mathrm{dom}(S_2) \;\land\; \forall \ell \in \mathrm{dom}(S_2).\, \mathrm{proj}(S_1,\ell) <: \mathrm{proj}(S_2,\ell) \;\Rightarrow\; S_1 <: S_2

Tuples are width-invariant: arity must match and elements compare pointwise (depth subtyping):

\mathrm{dom}(S_1) = \mathrm{dom}(S_2) \;\land\; \forall i \in \mathrm{dom}(S_1).\, T_i <: U_i \;\Rightarrow\; S_1 <: S_2

Nominal opt-in through newtypes

Structural typing alone cannot distinguish semantically different values that share the same shape. A UUID and a text string are both representable as String, but they should not be interchangeable. Newtypes (opaque types) wrap a structural type with a unique nominal identity, creating a boundary that structural subtyping cannot cross. The wrapper is zero-cost: it exists in the type system but has no runtime representation.

This makes the type system structural by default, nominal opt-in. Structural compatibility is the default mode of reasoning. When semantic identity must be preserved (identifiers, temporal bound markers, domain-specific wrappers), a newtype introduces nominal isolation explicitly.

Type constructors create newtype values. Each constructor is a closure that accepts the inner value and produces a value of the nominal type. For unit newtypes, the constructor takes no arguments. Because constructors are closures, they participate in higher-order functions: you can map a constructor over a list, pass it as an argument, or store it in a variable. In the MIR, constructors are lowered as separate bodies with closure calling conventions: an empty environment, but always a fat pointer, because call sites cannot statically determine whether a callee is a thin or fat pointer.

Newtypes as sum type variants

HashQL does not have tagged unions or discriminated variants as a built-in language construct. Instead, each variant of a sum type is a newtype, and the sum is a union of those newtypes. The nominal identity of each newtype serves as the discriminant:

newtype Ok<T> = T in
newtype Err<E> = E in
type Result<T, E> = Ok<T> | Err<E> in
...

Ok and Err both wrap their inner value directly. Structurally, Ok<String> and Err<String> are identical; nominally they are distinct. The union Result composes the two variants without a tagged discriminant field. Pattern matching (planned) will dispatch on the nominal identity of the newtype, not on a runtime tag.

Nominal opt-in via newtype. Ok<T> and Err<E> are opaque wrappers; structurally identical values remain distinct. Their join is the explicit union. No pointwise collapse occurs.

Type hierarchy

The type system is organized as a lattice. At the top is the unknown type (⊤), the supertype of everything. At the bottom is the never type (⊥), a subtype of everything and uninhabitable: no value can have type ⊥. Between them, types are organized by structural compatibility.

Five primitives. Integer <: Number is the only primitive subtyping relation (S-INT). All others are unrelated.

\bot <: T,\quad T <: \top

T_1 \sqcap T_2 <: T_1,\quad T_2 <: T_1 \sqcup T_2

Primitives

The primitive types are Boolean, Number, String, and Null. Numbers are arbitrary precision. In the MIR, booleans are lowered to 1-bit integers, which unifies logical and bitwise operations: for a single-bit integer, bitwise AND/OR are equivalent to logical AND/OR, eliminating the need for separate logical operators in the instruction set.

Composite types

Structs are records with named fields, written as (a: Integer, b: String). They come in two flavors. Closed structs have an exactly known field set. Open structs, written with a leading # as #(a: Integer, b: String), permit additional unknown fields and arise naturally from projections into types whose full shape is not constrained by the query. The distinction matters for width subtyping: an open struct is a supertype of any closed struct that contains (at least) its known fields.

Width subtyping (S-STRC2): any struct <: an open struct when its domain is a superset and fields compare pointwise. The two closed structs are incomparable (S-STRC1 requires matching domains).

Tuples are fixed-width, positionally accessed heterogeneous containers. Unlike structs, fields are accessed by index rather than name. Tuples are width-invariant: during type simplification, tuples of the same width in a union are joined pointwise (their fields are unified element by element). After simplification, any remaining distinct tuple members differ in width.

Lists are dynamically sized, homogeneous containers. The element type is tracked statically. Lists are covariant over their element type.

Dictionaries are key-value collections where both the key type and value type are tracked. Unlike structs, the set of keys is not statically known.

Closure types

Closures are the function type: A → B where A is the parameter type and B is the return type. Closure subtyping is contravariant in the parameter and covariant in the return: a closure that accepts a wider input and returns a narrower output can safely substitute for one that is more restrictive. This is the standard function subtyping rule from System F<:.

Union and intersection types

Union types (A | B) represent values that could be either type. Intersection types (A & B) represent values that must satisfy both types simultaneously. Both arise during type inference: a conditional whose branches return different types produces a union; a value constrained by two independent requirements produces an intersection.

Type simplification normalizes these compositions. Redundant members are eliminated: if A is a subtype of B, then $A <: B \Rightarrow A \mid B = B$ . Intersections tighten to the overlap of their members, degenerating to ⊥ when no overlap exists. Struct and tuple unions are joined pointwise, as described above, which is critical for preserving distinguishability in downstream stages.

Type inference

HashQL uses Hindley-Milner style type inference extended with subtyping constraints. The programmer rarely writes type annotations; the compiler derives types from how values are used.

Inference proceeds in two phases. First, the compiler generates type constraints by walking the HIR. Each operation produces constraints: a property access constrains the target to have that field; a comparison constrains both operands to be compatible; a function call constrains arguments to match parameters. Second, constraint solving unifies types and propagates information through the constraint graph until a fixpoint is reached.

HM-style inference extended with subtyping. The HIR walk emits per-operation constraints into a monotone bag, and the solver alternates unification with subtype propagation until a fixpoint substitution remains. Classic HM uses equality; HashQL adds A <: B, so a more specific type may flow into a less specific one without widening by hand.

Where classic Hindley-Milner generates equality constraints $A = B$ , the extended system generates subtyping constraints $A <: B$ . This allows the compiler to accept programs that pass a more specific type where a less specific one is expected, without requiring the programmer to explicitly widen.

Ontology types referenced in the query are resolved by loading their definitions from the graph at compile time. When a query accesses a property path on an entity, the type checker verifies that the path exists on the entity’s declared type and that the resulting value type is consistent with how the query uses it. Ontology types are not separate from program types; they are program types. This bridges the external ontology and the language’s type system without a separate mapping layer.

Types through the pipeline

Type information is not discarded after type checking. The HIR carries resolved types. The MIR preserves them: each local carries the type determined during type checking or derived during transformation. This preservation is a compiler invariant, not an incidental convenience.

Downstream stages depend on it. The placement analysis uses types to determine which backends can represent a value: certain type combinations collapse ambiguously under serialization, and the analysis must reject them from specific backends. The size estimation pass computes transfer costs from type structure: primitives carry one information unit, closed structs and tuples sum their fields, dynamic containers are deferred to a forward dataflow analysis. The code generation phase uses types to guide serialization and deserialization at backend boundaries, reconstructing the original value using the expected type at each consumption site.