Conventional systems are built to answer a question: given this input, what is the output? They are not built to answer the prior question: is any output licensed here at all?
This omission is not cosmetic. It creates a structural failure mode. In regions where no stable relationship exists, a forced-output system will still produce an answer, and may still attach a confidence score to that answer. The central claim of this work is that in such regions, confidence is not merely low. It is undefined.
Begin with the smallest possible forecasting problem: a binary process. In one regime, the process is fair and no structure exists. In another, a stable bias exists and can be learned. In a third, the bias drifts: structure is locally real, but it does not survive globally as a fixed rule.
This third case is the minimal witness. A system that must always predict the next outcome will continue to assert across the entire space. If it attaches confidence to those assertions, that confidence can only be derived from interpolation across neighboring regions or from smoothness assumptions built into the model. Neither is a mathematically valid substitute for local empirical support.
This is why the drifting-bias binary process functions as a “new XOR” problem. The difficulty is not raw prediction. The difficulty is that the correct behavior depends on distinguishing among three states that conventional systems collapse: stable structure, no structure, and insufficient evidence. A forced-output system cannot represent that distinction architecturally. An evidence-ledger system can.
In the demonstrator, rewards are symmetric: +1 for a correct assertion, −1 for an incorrect assertion, and 0 for abstention. Any advantage therefore arises from conditional accuracy and selective exposure, not from payout asymmetry. The system wins by refusing to assert where structure has not been earned.
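The demonstrator's logic can be sketched in a few lines. The regime parameters, window size, and abstention margin below are illustrative assumptions, not the demonstrator's actual configuration:

```python
import random

def stream(regime, n, seed=0):
    """Yield n binary outcomes under one of the three regimes."""
    rng = random.Random(seed)
    for t in range(n):
        if regime == "fair":        # no structure
            p = 0.5
        elif regime == "stable":    # fixed, learnable bias
            p = 0.7
        else:                       # "drifting": locally real, globally unstable
            p = 0.5 + (0.2 if (t // 200) % 2 == 0 else -0.2)
        yield 1 if rng.random() < p else 0

def score(outcomes, predictor):
    """Symmetric payoff: +1 correct, -1 incorrect, 0 for abstention (None)."""
    total, history = 0, []
    for y in outcomes:
        guess = predictor(history)
        if guess is not None:
            total += 1 if guess == y else -1
        history.append(y)
    return total

def forced(history):
    """Forced-output baseline: always asserts the global majority so far."""
    return 1 if sum(history) * 2 >= len(history) else 0

def gated(history, window=50, margin=10):
    """Asserts only when a recent local bias clears an evidence margin."""
    recent = history[-window:]
    if abs(2 * sum(recent) - len(recent)) < margin:
        return None                 # structure not earned here: abstain
    return 1 if 2 * sum(recent) > len(recent) else 0

for regime in ("fair", "stable", "drifting"):
    data = list(stream(regime, 2000, seed=1))
    print(regime, "forced:", score(data, forced), "gated:", score(data, gated))
```

The numbers are not the point; the shape is. The gated predictor concedes little where structure is stable and stops paying the forced predictor's penalty where it is not.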
A confidence score is not a decorative label attached to a guess. It is, implicitly, a claim about frequency: if the system issues outputs of this type under these conditions, what fraction of them are correct? That claim requires a measurable reference class.
In a supported region, such a reference class exists: the system has accumulated enough local exposure both to estimate next-outcome behavior and to test whether that behavior survives perturbation. In an unsupported region, neither condition holds, and there is no local empirical distribution from which a valid confidence estimate can be drawn.
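In code the distinction is almost trivial, which is the point. A minimal sketch, with the exposure threshold an assumed parameter:

```python
def local_confidence(correct, total, min_exposure=30):
    """Empirical accuracy over a region's local reference class, or None
    when the region lacks the exposure needed to define one at all."""
    if total < min_exposure:
        return None         # unsupported region: confidence is undefined
    return correct / total  # supported region: a checkable frequency claim

print(local_confidence(41, 50))  # 0.82 -- a claim about frequency
print(local_confidence(3, 4))    # None -- no reference class exists yet
```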
A conventional system produces an answer everywhere. Confidence in sparse or unsupported regions is inherited from neighboring regions, global smoothness assumptions, or internal model geometry.
GIE produces an answer only where local evidence has accumulated, survived exposure, and crossed admissibility criteria. Elsewhere, no licensed output exists.
The mathematical problem is simple. If two nearby regions have different survival properties, interpolation between them does not preserve calibration. Local support is not a continuous quantity in the required sense. A model may smooth across a boundary, but the world need not. Once calibration is imported from another region, the resulting confidence value no longer estimates correctness in the region where it is being reported.
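A toy calculation, with every accuracy value assumed purely for illustration, makes the failure concrete:

```python
# Two supported regions with measured accuracies (assumed values).
acc_left, acc_right = 0.90, 0.50

# A smoothing model reports the interpolated value in the gap between them.
reported = (acc_left + acc_right) / 2   # 0.70

# But the gap has no local reference class. If the world there behaves like
# neither neighbor (say drift pushes true accuracy to 0.30), the reported
# figure estimates nothing observable in that region.
true_gap_accuracy = 0.30                # assumed, unobserved by the model
print(f"reported {reported:.2f}, actual {true_gap_accuracy:.2f}")
```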
This is the core epistemic distinction of the architecture. Conventional systems say, in effect, “I must answer, so here is an answer with uncertainty attached.” GIE says: “If no stable relationship has survived here, no answer is licensed, and therefore no valid confidence can be attached.”
The foundational question of machine inference has long been: given this input, what should the output be? Enormous effort has gone into answering that question well — larger models, better optimization, richer priors, more compute, finer calibration.
GIE begins with a prior question: is any output licensed here at all?
This is not a refinement of the standard question. It is a prerequisite. A system that cannot answer the prior question cannot know when its answer to the standard question means anything. It can only produce outputs everywhere and hope that enough of them survive contact with reality.
What GIE makes visible is the difference between regions where a stable, repeatable relationship exists in the world and regions where it does not. This distinction always existed. The architecture of forced output had no home for it.
Every domain of inference — finance, medicine, language, navigation, scientific discovery, autonomous control — shares the same underlying hazard. A system that produces outputs in regions where no stable relationship exists will eventually assert structure that is not there. Not because it is careless, but because it was never equipped to detect absence.
The General Inference Engine was built around the question that conventional machinery leaves unresolved: has this relationship survived sufficient exposure to reality to be trusted?
The conventional contract: given an input, produce the best available output everywhere. Uncertainty is expressed as a score attached to a committed answer.
The GIE contract: accumulate evidence locally. Establish where stable relationships exist. Assert where they are supported. Remain silent where they are not — without confusion and without smoothing across the gap.
The difference is categorical. A system that must answer everywhere, however sophisticated its uncertainty handling, is doing something fundamentally different from a system that first determines whether any answer is earned.
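Stated as interfaces, the categorical difference is visible in the return types alone. The names here are illustrative, not drawn from any existing codebase:

```python
from typing import Optional, Protocol

class ForcedOutput(Protocol):
    def predict(self, x: object) -> tuple[object, float]:
        """Always returns (answer, confidence), everywhere."""
        ...

class EvidenceLicensed(Protocol):
    def assert_or_abstain(self, x: object) -> Optional[object]:
        """Returns an answer only where one is licensed; otherwise None."""
        ...
```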
The 2016 S&P 500 system was an early domain-specific instantiation of the same underlying principle formalized here. It accumulated local evidence, selected structure under out-of-sample pressure, and relied on survival rather than fit as the arbiter of authority.
GIE is the domain-independent generalization of that principle. KEY is the minimum sufficient cross-domain proof that the architecture can be expressed in small, inspectable, reusable form outside the original financial setting.
GIE is an instrument of discovery as much as it is an inference engine. By enforcing the prior question — does a stable relationship exist here? — it creates visibility into the structure of a domain that forced-output systems cannot provide.
What emerges under GIE, and was not architecturally available before, is a map of the domain itself: where stable relationships exist, where none exist, and where evidence is still insufficient to say.
GIE operates on a small set of foundational commitments, each of which departs from conventional practice in a specific and consequential way.
Evolutionary search, genetic algorithms, and gradient-based optimization can serve as proposal mechanisms. They are not the core contribution. What defines the architecture is what happens after proposal: candidates are exposed to an independent world, tested under perturbation, and retained only if they survive. Surviving structures are then promoted into reusable building blocks for higher-level inference.
This is the difference between search with validation and search as the whole story. GIE does not optimize parameters inside a fixed hypothesis class and call that inference. It discovers structure, validates it under exposure, promotes it, and composes it upward.
Search methods explore what could survive. GIE operates on what has survived. It is not a search strategy; it is a credential system for participation in inference. Nothing enters the inference pipeline with authority unless it has earned that right through empirical exposure.
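A minimal sketch of that pipeline, with the candidate form, the perturbation scheme, and all thresholds assumed for illustration:

```python
import random

def propose(rng):
    """Proposal mechanism: here, a random bias hypothesis for a region.
    Any search method (evolutionary, gradient-based) could sit here."""
    return rng.uniform(0.0, 1.0)

def survives(candidate, world_p, rng, trials=200, rounds=5, tol=0.1):
    """Expose a candidate to fresh draws from a perturbed world across
    several rounds; retain it only if its error stays within tol."""
    for _ in range(rounds):
        p = world_p + rng.gauss(0.0, 0.02)   # perturb the test conditions
        hits = sum(1 for _ in range(trials) if rng.random() < p)
        if abs(hits / trials - candidate) > tol:
            return False
    return True

rng = random.Random(0)
hidden_bias = 0.7   # the world's stable structure, unknown to the proposer

promoted = [c for c in (propose(rng) for _ in range(100))
            if survives(c, hidden_bias, rng)]
print(sorted(round(c, 2) for c in promoted))  # survivors cluster near 0.7
```

The proposal step is deliberately unintelligent; everything that confers authority happens in the exposure loop.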
GIE sits above any particular modality or domain. Its role is the same whether the eventual output is a word, a trade, a diagnosis, or a movement command. The architecture governs whether an output is earned. What form the downstream output takes is a separate question.
Finance: Evidence-licensed assertion over market states — with principled abstention during drift or structural breakdown.
Medicine: Diagnostic assertion only where accumulated exposure supports it; silence where it does not.
Language: Claim validation and abstention at the sentence level — independent of generation.
Navigation and autonomous control: Action licensing based on survived structure in the environment — not pattern-matched extrapolation.
Scientific discovery: Hypothesis evaluation under evidence, with explicit separation of provisional and confirmed structure.
Security: Behavioral pattern licensing — with a principled third outcome alongside approve and block.
The KEY Emerges demo is the minimum sufficient realization of the architecture — enough to make its commitments visible in working code and observable behavior. It is not a stylized animation of a claim. It is a compact proof that the architecture can be instantiated cleanly, decoupled from narration, and reused across levels.
Evidence accumulates locally. Outcome tallies are tied to specific regions of representation space, not collapsed into global weights.
Assertion is earned, not assumed. A region asserts only when local exposure and outcome consistency meet admissibility criteria. Other regions remain non-assertive — not low-confidence, but unresolved.
Abstention is visible. Sparse and ambiguous regions are shown as unresolved rather than forced into a classification. The boundary of earned inference is directly observable.
Authority is revisable. As new observations arrive, previously provisional regions can strengthen, confirm, or collapse. Authority is never granted permanently.
Order does not determine outcome. The same body of evidence yields the same representation regardless of the sequence in which it arrived.
Voids are preserved. The demo does not smooth across unsupported gaps. The difference between supported and unsupported regions remains visible.
Higher-level admissibility emerges by promotion. Surviving local structures become legal inputs to higher composition, showing how abstraction can be built without granting authority to unsatisfied hypotheses.
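Taken together, these properties suggest a ledger rather than a model. A minimal sketch, with region keys, thresholds, and status names all assumed for illustration:

```python
from collections import defaultdict

class RegionLedger:
    """Per-region outcome tallies with earned, revisable assertion."""

    def __init__(self, min_exposure=20, min_consistency=0.8):
        self.tallies = defaultdict(lambda: [0, 0])  # region -> [hits, total]
        self.min_exposure = min_exposure
        self.min_consistency = min_consistency

    def observe(self, region, correct):
        """Tallies are commutative counts, so the same body of evidence
        yields the same state regardless of arrival order."""
        self.tallies[region][0] += 1 if correct else 0
        self.tallies[region][1] += 1

    def status(self, region):
        """ASSERT only where exposure and consistency are both earned;
        authority is revoked automatically if later outcomes erode it."""
        hits, total = self.tallies[region]
        if total < self.min_exposure:
            return "UNRESOLVED"   # a preserved void, not low confidence
        return "ASSERT" if hits / total >= self.min_consistency else "UNRESOLVED"

ledger = RegionLedger()
for _ in range(30):
    ledger.observe("A", correct=True)          # region A accumulates support
print(ledger.status("A"), ledger.status("B"))  # ASSERT UNRESOLVED
```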
GIE is not presented here as a theoretical proposal awaiting its first contact with data. The KEY demo shows the mechanism at minimum scale. Separate production systems built on the same underlying principles have operated continuously for decades on live S&P 500 closing-price data, with timestamped out-of-sample evaluation, in a domain where sustained failure is immediate and measurable.
These are distinct evidential claims, and both matter. The demo demonstrates mechanism. The production systems demonstrate survival in a high-consequence, adversarial environment where noise is abundant and structure, where it exists, must be discovered and held under pressure.
GIE advances specific, testable propositions. Each can be examined, challenged, and disproven.
Forecasting is possible only if a system allows noise to cancel faster than structure decays. GIE operationalizes this principle directly: experience accumulates geometrically, randomness washes out, and stable structure becomes visible — not as a statistical artifact, but as what remains after repeated exposure.
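One minimal formalization, with every symbol introduced here for illustration rather than taken from the text: write σ for the noise scale, μ for the magnitude of local structure, δ for its drift per observation, and n for the window length. Assertion is licensed only if some window exists in which residual noise undercuts the structure that survives it:

```latex
% Illustrative condition, not a formula from the text.
\exists\, n : \quad \underbrace{\frac{\sigma}{\sqrt{n}}}_{\text{residual noise}}
\;<\; \underbrace{\mu - n\,\delta}_{\text{structure surviving the drift}}
```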
In any domain where the cost of an unearned assertion is real — medicine, finance, autonomous systems, legal reasoning, scientific inference — the prior question is not optional. A system that cannot abstain will eventually assert a relationship that does not exist. In low-consequence domains this produces error. In high-consequence domains it produces harm.
Expanding the capability of systems that skip the prior question does not solve this. It enlarges the surface over which unearned assertions will eventually appear. More capable forced-output systems are more capable at the wrong level of the problem.