
Epistemology

Ghost is an epistemic architecture for multi-agent trading analysis — it defines how machines form knowledge about markets, not just how they process market data.

This document defines how Ghost forms knowledge, what it can and cannot know, and how it validates its own understanding.


Knowledge Sources

Ghost’s knowledge comes from three categories:

| Category | Sources | Refresh Rate | Decay |
| --- | --- | --- | --- |
| Market Data | OHLC, volume, indicators | Real-time to daily | Hours |
| Options Structure | OI, gamma, max pain | Daily | Days (weekly cycle) |
| News/Sentiment | Headlines, filings, macro | Real-time | Hours to days |

Freshness Hierarchy:

Real-time flow data > Same-session analysis > Same-day KB > Stale (>12h)

Default freshness window: 12 hours. Data older than this is hidden unless explicitly requested (--all).


Confidence as Uncertainty

Every Ghost output includes a confidence score (0.0-1.0). This is NOT a prediction of correctness — it’s a measure of uncertainty about the analysis itself.

| Confidence | Meaning | Typical Conditions |
| --- | --- | --- |
| 0.8-1.0 | High agreement, clear structure | Multi-TF confluence, aligned agents |
| 0.5-0.7 | Mixed signals, some uncertainty | Partial confluence, agent disagreement |
| 0.0-0.4 | Low visibility, unclear structure | No confluence, stale data, chaos |

Calibration Tracking: Ghost measures the accuracy of its own confidence scores over time. If predictions made at 80% confidence hit only 60% of the time, the system is OVERCONFIDENT. This feedback adjusts future analysis.
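The calibration check described above can be sketched as follows. This is a minimal illustration, not Ghost's actual implementation: the function name `calibration_report`, the `(confidence, hit)` input shape, the one-decimal bucketing, and the `tolerance` parameter are all assumptions made for the example.

```python
from collections import defaultdict


def calibration_report(predictions, tolerance=0.10):
    """Compare stated confidence with realized hit rate, per bucket.

    predictions: iterable of (confidence, hit) pairs, where hit is a bool.
    tolerance: hypothetical allowed gap before a bucket is flagged.
    """
    buckets = defaultdict(lambda: [0, 0])  # bucket -> [hits, total]
    for confidence, hit in predictions:
        # Round to one decimal so 0.78 and 0.82 land in the same 0.8 bucket.
        bucket = round(confidence, 1)
        buckets[bucket][0] += int(hit)
        buckets[bucket][1] += 1

    report = {}
    for bucket, (hits, total) in sorted(buckets.items()):
        hit_rate = hits / total
        gap = bucket - hit_rate  # positive gap: confidence exceeds accuracy
        if gap > tolerance:
            verdict = "OVERCONFIDENT"
        elif gap < -tolerance:
            verdict = "UNDERCONFIDENT"
        else:
            verdict = "CALIBRATED"
        report[bucket] = (hit_rate, verdict)
    return report
```

With ten predictions at 0.8 confidence and six hits, the 0.8 bucket reports a 60% hit rate and is flagged as overconfident, matching the example in the text.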


Epistemological Boundaries

What Ghost CAN Know

| Domain | What’s Knowable | Method |
| --- | --- | --- |
| Structure | Where levels exist, confluence strength | Fib, MA, VWAP intersection |
| Positioning | Who is trapped/defending | Anchored VWAP analysis |
| Flow | What transactions occurred | Volume, delta, absorption |
| Options Mechanics | Where dealers hedge | Gamma, max pain, walls |
| Timing | Momentum state, squeeze detection | RSI, MACD, Keltner |

What Ghost CANNOT Know

| Domain | Why Unknowable | Implication |
| --- | --- | --- |
| Future prices | Markets are probabilistic | Levels are zones, not guarantees |
| Trader intent | Only transactions visible | Infer from flow, don’t assert |
| Manipulation | Indistinguishable from flow | Present data, flag anomalies |
| Exact timing | Signals indicate condition, not clock | “Ready” not “now” |
| News impact duration | Sentiment shifts unpredictably | Flag, don’t weight |

Context-Dependent Knowledge

The same data means different things in different contexts.

Day-of-Week Interpretation

Monday        →  Fresh structure, Friday's 0DTE gone
Tuesday-Wed   →  Structure adjusting, not final
Thursday      →  Pre-pin activity, structure maturing
Friday        →  Expiration mechanics dominate

See OPTIONS_HERMENEUTICS.md for full day-of-week framework.

Freshness Affects Meaning

| Age | Treatment |
| --- | --- |
| <2h | Current: full weight |
| 2-12h | Recent: weight normally |
| 12-24h | Stale: flag as stale |
| >24h | Expired: hide by default |

Options data decays fastest (weekly cycle). Fib structure decays slowest (months/years).
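The age-to-treatment mapping above can be expressed as a small classifier. This is a sketch under stated assumptions: the function name `freshness_treatment` and the label strings are hypothetical, and the handling of exact boundaries (an observation aged exactly 12h counts as recent, exactly 24h as stale) is a choice made here because the table does not specify it.

```python
from datetime import datetime, timedelta, timezone


def freshness_treatment(observed_at, now=None):
    """Map an observation's age onto the four freshness tiers."""
    now = now or datetime.now(timezone.utc)
    age_hours = (now - observed_at).total_seconds() / 3600
    if age_hours < 2:
        return "current"   # full weight
    if age_hours <= 12:
        return "recent"    # weight normally
    if age_hours <= 24:
        return "stale"     # flag as stale
    return "expired"       # hide by default
```

Passing `now` explicitly keeps the function deterministic, which makes replaying historical sessions straightforward.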


Data Hierarchy

When sources conflict, Ghost applies this hierarchy:

Fresh quantitative flow > Stale qualitative narrative. Always.

| Priority | Source Type | Example |
| --- | --- | --- |
| 1 | Real-time flow | Absorption at $158, 72% organic |
| 2 | Same-day options structure | Gamma flip at $160 |
| 3 | Multi-TF confluence | Fib 61.8% + 50 SMA + VWAP cluster |
| 4 | News sentiment | “Bearish on tariff fears” |

Flow is measured from actual transactions. Narrative is interpretation that may lag or be priced in. When they conflict, structure the analysis around flow-indicated levels.
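One way to picture the hierarchy is as a sort key over conflicting observations. The sketch below is illustrative only: the `SOURCE_PRIORITY` keys, the `rank_sources` function, and the dict-based observation shape are hypothetical names, not Ghost's data model.

```python
# Hypothetical source-type keys, ordered as in the priority table above.
SOURCE_PRIORITY = {
    "realtime_flow": 1,
    "same_day_options": 2,
    "multi_tf_confluence": 3,
    "news_sentiment": 4,
}


def rank_sources(observations):
    """Order observations so the highest-priority source comes first.

    observations: list of dicts with a 'source_type' key.
    sorted() is stable, so ties keep their original order.
    """
    return sorted(observations, key=lambda o: SOURCE_PRIORITY[o["source_type"]])
```

When a bearish headline and an absorption reading disagree, the flow observation ranks first and anchors the analysis.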

Narrative Temporal Asymmetry: When price moves BEFORE a narrative catalyst, the observable flow data preceded the headline. This is not mysterious — information leaks through observable channels (military logistics, counterparty behavior, insurance repricing, speculative positioning) before it reaches headlines. The epistemic principle: if flow moved first, flow knew first. The narrative explains the flow retroactively but did not cause it. Attributing the move to the headline is a narrative attribution error — confusing the label with the cause. See docs/NARRATIVE_FORENSICS.md for the forensic methodology.


Anti-Knowledge (What Things Are NOT)

Levels Are Not:

Max Pain Is Not:

Signals Are Not:

See ONTOLOGY.md for full Anti-Concepts section.


Verification Principles

Ghost validates its own knowledge through a feedback loop:

Prediction Tracking

| Component | What’s Measured |
| --- | --- |
| Hit/Miss | Did price reach the predicted level? |
| Confidence Calibration | Accuracy by confidence bucket |
| Condition-Aware | Accuracy under specific conditions |
| Decay Weighting | Recent predictions weigh more (14-day half-life) |
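A 14-day half-life implies that a prediction's weight halves every 14 days. The sketch below assumes plain exponential decay, which is the usual reading of "half-life" but is not confirmed by this document; `decay_weight` and `weighted_hit_rate` are hypothetical names.

```python
def decay_weight(age_days, half_life_days=14.0):
    """Weight of a prediction that is age_days old (1.0 when brand new)."""
    return 0.5 ** (age_days / half_life_days)


def weighted_hit_rate(predictions):
    """Decay-weighted accuracy.

    predictions: iterable of (age_days, hit) pairs, where hit is a bool.
    Returns None when there is nothing to average.
    """
    total_weight = 0.0
    weighted_hits = 0.0
    for age_days, hit in predictions:
        w = decay_weight(age_days)
        total_weight += w
        weighted_hits += w * int(hit)
    return weighted_hits / total_weight if total_weight else None
```

For example, a fresh hit and a 14-day-old miss carry weights 1.0 and 0.5, so the weighted hit rate is 1.0 / 1.5, roughly 67%, rather than the unweighted 50%.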

Guardrail Accuracy

Guardrails (sweep completion, max pain proximity, catalyst override) are measured separately. If a guardrail consistently misfires, it loses weight.

Self-Correction

Prediction → Verification → Feedback → Adjusted Confidence

Directors receive their track record in prompts. If they’ve been overconfident on “multi-TF squeeze” conditions, they see that history.

Prompt Interrogation

Ghost validates its own signal density through prompt interrogation — sending each director its real inputs but replacing output instructions with meta-questions: “Classify each input as ESSENTIAL / USEFUL / PASS-THROUGH / NOISE.”

This is the input-side complement to prediction tracking (which validates outputs). Prediction tracking asks “were we right?” Interrogation asks “are we even reading the right data?”

Prompt Design → Interrogation → Compression → Regime Bump → Prediction Tracking

When interrogation reveals noise (e.g., MS Director classifying dealer dynamics as “too speculative”), that section is cut or gated. See INTERROGATION.md for the full process and scripts/interrogate/ for the eval scripts.

Prompt Regime Isolation

Feedback is scoped to the current prompt regime. When prompt strategy changes significantly (e.g., reordering signal priorities), predictions from the old regime are excluded from feedback queries. This prevents old accuracy patterns from contaminating the new regime’s self-correction loop.

Regime v1-v4 predictions → excluded from feedback
Regime v5 predictions → active in feedback loop

The regime constant (PROMPT_REGIME in ghost/db/predictions.py) is the single source of truth. See CONDITION_ACCURACY.md for details on how this affects scoring.
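Regime scoping amounts to a filter applied before any feedback query. In the sketch below, only the name `PROMPT_REGIME` comes from the text; its value of 5 is taken from the v5 example above, and the `feedback_predictions` function and the `regime` field on each record are assumptions for illustration.

```python
PROMPT_REGIME = 5  # assumed value, following the "Regime v5" example


def feedback_predictions(predictions, regime=PROMPT_REGIME):
    """Keep only predictions made under the given prompt regime.

    predictions: list of dicts with a 'regime' key (hypothetical shape).
    """
    return [p for p in predictions if p["regime"] == regime]
```

Keeping the constant in one place means a regime bump changes a single value and every feedback query follows.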


Bayesian Pattern Calibration

Playbooks (see ONTOLOGY.md section 8) are a second feedback loop — distinct from prediction tracking, which measures director accuracy, and from prompt interrogation, which measures signal density. Pattern calibration measures whether a recurring market pattern’s historical base rates and likelihood ratios remain valid as new data arrives. Where prediction tracking asks “was this call right?” and interrogation asks “are we reading the right data?”, pattern calibration asks “is this pattern still real?”

The playbook system imposes its own epistemic discipline, separate from the director feedback loop. These rules govern how knowledge about recurring patterns is formed, verified, and held.

Empirical Verification First

Before any playbook is built, the observed pattern must survive an empirical scan on historical data. Narrative framings are plausible; data is dispositive. A pattern the trader “sees in the tape” may not exist in measurable form — and if it doesn’t, building a playbook around it encodes a phantom.

Observed pattern → Empirical scan on YTD OHLCV → Verified shape or reshaped/deferred

Four of the first seven VST playbooks required reshaping during their framing surface because the empirical scan contradicted the initial narrative:

| Playbook | Initial framing | What the scan revealed |
| --- | --- | --- |
| mine_t1_shakeout | Pre-market wick retraces prior mine close | yfinance pre-market data unavailable; deferred to live capture |
| recovery_t_plus_2 | Most mines recover within 2 days | 5 of 8 completed mines recovered (62.5%), not 75%; Regime 2 calibration |
| accumulation_disguised | Mine + catalyst T+2/T+3 = accumulation bait | Only 2 of 9 mines had identifiable catalyst follow-through; captured as narrative signal, not standalone playbook |
| closing_mark_anchor | 5x volume into the bell | 93% of VST sessions satisfy the criterion; consolidated with sister playbook |

The epistemic rule: scan first, frame second. A framing that survives empirical verification is load-bearing. A framing that fails it is noise dressed as insight.

Anti-Predictive Signals and the Tautology Trap

A signal whose likelihood ratio is mathematically equivalent to its outcome is tautological — it “predicts” by being the thing it predicts. The recovery_t_plus_2 build caught this during review: one signal (“close above prior mine open”) was structurally equivalent to the outcome metric by construction. Every positive instance satisfied it with probability 1; every negative instance satisfied it with probability 0. The likelihood ratio was infinite, the posterior was deterministic, and the signal carried zero real information.

The epistemic rule: signals must be logically disjoint from the outcome they help predict. Before shipping a playbook, verify that each mechanical signal produces non-degenerate likelihood ratios on the actual calibration data. Degenerate signals (LR → ∞, LR → 0, LR = 1.0) should be removed, not kept “for completeness.”

Definition criteria belong in the definition_criteria field and are excluded from Bayesian math by construction. Only features that vary across positive instances can serve as discriminating signals.
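The degeneracy check can be sketched on raw counts. This is an illustration under assumptions: `likelihood_ratio` and `is_degenerate` are hypothetical names, the ratio is computed from unsmoothed frequencies, and the `eps` tolerance is arbitrary.

```python
import math


def likelihood_ratio(pos_with, pos_total, neg_with, neg_total):
    """LR = P(signal | positive outcome) / P(signal | negative outcome)."""
    p_pos = pos_with / pos_total
    p_neg = neg_with / neg_total
    if p_neg == 0:
        # Signal never appears in negatives: infinite LR, or undefined (0/0).
        return math.inf if p_pos > 0 else math.nan
    return p_pos / p_neg


def is_degenerate(lr, eps=1e-9):
    """True for LR at infinity, at zero, undefined, or equal to 1.0."""
    if math.isnan(lr) or math.isinf(lr):
        return True
    return lr < eps or abs(lr - 1.0) < eps
```

A signal present in all 5 positive instances and none of 3 negative instances yields an infinite ratio and is flagged, which is the tautology case described above.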

Small-Sample Shrinkage and the Calibration Threshold

Empirical frequencies on small samples are noisy. A pattern with 2/3 historical wins looks like 67%, but the confidence interval spans nearly the entire probability space. Playbooks with fewer than CALIBRATED_SAMPLE_THRESHOLD = 5 instances must not publish standalone posteriors — the math is not informative enough to override the prior.

Ghost applies Laplace +1/+2 smoothing and Beta-Binomial shrinkage to keep small-sample posteriors honest:

| Sample size | Treatment |
| --- | --- |
| 0-2 instances | Research Backlog entry. Documented but not callable. |
| 3-4 instances | Either Regime 2 with transparent weakness flag, or Research Backlog. |
| 5-9 instances | Shrinkage-corrected posteriors. Usable, but LLM consumer sees small-sample warning. |
| 10+ instances | Normal confidence. |

The epistemic rule: we do not pretend confidence we do not have. A thin sample is a fact about our knowledge, not a problem to be engineered away.
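The smoothing and the sample-size tiers can be sketched together. `CALIBRATED_SAMPLE_THRESHOLD = 5` comes from the text; everything else is an assumption for illustration, including the function names, the returned labels, and the prior parameters. The defaults are chosen so that the Beta-Binomial posterior mean reduces to Laplace +1/+2 smoothing.

```python
CALIBRATED_SAMPLE_THRESHOLD = 5  # value stated in the text


def shrunk_rate(wins, n, prior_mean=0.5, prior_strength=2.0):
    """Beta-Binomial posterior mean, pulling wins/n toward prior_mean.

    With the defaults this equals Laplace smoothing: (wins + 1) / (n + 2).
    """
    return (wins + prior_mean * prior_strength) / (n + prior_strength)


def sample_treatment(n):
    """Map an instance count onto the treatment tiers in the table."""
    if n <= 2:
        return "research_backlog"
    if n < CALIBRATED_SAMPLE_THRESHOLD:
        return "regime_2_or_backlog"
    if n <= 9:
        return "shrinkage_with_warning"
    return "normal"
```

The 2/3 pattern from the text shrinks from 67% to (2 + 1) / (3 + 2) = 60%, and with only 3 instances it falls below the threshold for a standalone posterior.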

Data Limitations as Epistemic Limits

When the data cannot answer a question, the honest response is “we don’t know” — not “we’ll assume.” The tri-state signal model (see ONTOLOGY.md section 8) encodes this directly: signals can be present, absent, or unknown, and unknown is skipped in Bayesian inference rather than collapsed to absent.

This matters most when historical data is uneven. The early VST stop mine instances (Jan-Mar 2026) have options/flow signals marked unknown because GCS reports only go back to April. The Apr 2 and Apr 9 instances have them observed. Marking the early ones absent would penalize the playbook for our blind spot — systematically biasing posteriors downward as if the signal had been measured and found missing. That would be an epistemic error disguised as a data entry convention.

The likelihood estimator counts known instances (present + absent) in its denominator, not total. This is the honest treatment of missing data: do not let an inability to observe become evidence of absence.
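The denominator rule can be shown in a few lines. This is a minimal sketch: the string encoding of the three states and the function name `signal_rate` are assumptions, not the estimator's actual interface.

```python
def signal_rate(states):
    """Fraction of known instances in which the signal was present.

    states: list of 'present', 'absent', or 'unknown'.
    Unknown entries are skipped: they affect neither numerator nor
    denominator. Returns None when nothing was observed at all.
    """
    present = states.count("present")
    absent = states.count("absent")
    known = present + absent
    return present / known if known else None
```

With two observed-present instances and three unknowns, the rate is 2/2 = 1.0. Collapsing the unknowns to absent would give 2/5 = 0.4, which is the downward bias the text warns about.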

The same principle blocks playbook construction for entire classes of patterns. mine_t1_shakeout is deferred indefinitely because yfinance does not publish pre-market wick data below the session open. No amount of cleverness with daily OHLCV can recover it. The epistemic answer is to wait for live-capture data, not to approximate the pattern from proxies.

Cross-Playbook Failure Mode Consistency

Playbooks are not independent instruments. When the same macro condition breaks multiple playbooks in the same way, the failure is a feature of the market state, not a bug in any single playbook. Consistent failure modes across the library are themselves evidence.

VST’s library documents two such cross-playbook failure modes:

  1. Q4 earnings overhang — all historical instances that landed during active Q4 earnings uncertainty underperformed their playbook’s prior. This is not a pre_mine_setup failure or a recovery_t_plus_2 failure. It is a market-state filter that should modify interpretation of any VST playbook firing during that window.
  2. Hormuz escalation regime — instances during active Hormuz closure (Feb 28 – Apr 7, 2026) produced systematically different outcomes than instances outside it. The 38-day crisis acted as a regime filter that the playbooks could not internalize from their own calibration data alone.

The epistemic rule: when a failure repeats across unrelated playbooks, it is evidence of a regime, not a noise tail. Surface it as a known failure condition in every affected playbook’s YAML, not just the one where it was first observed.

Calibration Regimes as Epistemic Status

The three calibration regimes (see ONTOLOGY.md section 8) are not administrative categories — they are epistemic status markers that shape how the LLM consumer should reason about a posterior.

| Regime | Epistemic status | Consumer reasoning |
| --- | --- | --- |
| 1. Broad-market discrimination | Strong signal. Calibrated against the YTD non-instance pool with enough data to produce real likelihood ratios. | Treat a firing posterior as a directional signal with sizing weight. |
| 2. Within-population discrimination | Weak context flag. Calibrated only within the conditioned population by outcome, with small samples and transparent uncertainty. | Treat a firing posterior as context, not conviction. Do not size on it alone. |
| 3. Live-capture only | Epistemic placeholder. Pattern exists in the world but not yet in the data we can see. | Do not call the playbook. It lives in the Research Backlog until live capture produces enough instances. |

A Regime 1 playbook firing at 70% and a Regime 2 playbook firing at 70% are not equivalent knowledge claims. The Regime 1 posterior is a confident statement about a well-calibrated pattern. The Regime 2 posterior is a low-confidence context flag that happened to exceed 50%. Conflating them would be an epistemic category error.

Online Calibration Tracking

Playbook calibration is not a one-time act. As new instances accumulate in live capture, the historical instance pool grows, and the likelihood ratios and base rates should be recomputed. A playbook that was well-calibrated at YTD=67 trading days may drift as the sample reaches 150 or 300.

Sub-task F of the VST module build (online feedback loop) exists for exactly this reason: to detect when live-observed outcomes diverge from the playbook’s published posteriors, and to flag the playbook for recalibration. The epistemic rule is the same as for prompt regime isolation — when the world changes, stop trusting the old math.
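The document does not say how divergence is detected, so the following is only one plausible shape for such a check: a normal-approximation z-test of the live hit rate against the published posterior. The function name and the `z_threshold` and `min_samples` parameters are hypothetical.

```python
import math


def needs_recalibration(published_posterior, live_hits, live_total,
                        z_threshold=2.0, min_samples=5):
    """Flag a playbook when live outcomes diverge from its posterior."""
    if live_total < min_samples:
        return False  # too few live instances to say anything
    observed = live_hits / live_total
    # Standard error of a proportion, assuming the published value is true.
    se = math.sqrt(published_posterior * (1 - published_posterior) / live_total)
    if se == 0:
        return observed != published_posterior
    return abs(observed - published_posterior) / se > z_threshold
```

A playbook published at 70% that hits 3 of 20 live instances would be flagged; one that hits 14 of 20 would not.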


Conflict Rules

When agents disagree, Ghost follows strict rules:

Valid Conflicts

A conflict is when Director A says bullish and Director B says bearish on the same aspect. Different aspects are complementary data, not conflicts.

| Valid Conflict | Not a Conflict |
| --- | --- |
| News bullish on earnings vs Technical bearish on extension | News bullish on catalyst vs Technical bearish on MA stack |
| Fib says support vs VWAP says distribution | Different agents, different domains |
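The same-aspect rule can be sketched as a pairwise check. The view shape (`agent`, `aspect`, `direction`), the aspect labels, and the function name `find_conflicts` are all hypothetical, and how Ghost actually assigns aspects is not described here.

```python
from itertools import combinations


def find_conflicts(views):
    """Return (agent_a, agent_b, aspect) for opposed views on one aspect.

    views: list of dicts with 'agent', 'aspect', and 'direction' keys,
    where direction is 'bullish' or 'bearish'.
    """
    conflicts = []
    for a, b in combinations(views, 2):
        same_aspect = a["aspect"] == b["aspect"]
        opposed = {a["direction"], b["direction"]} == {"bullish", "bearish"}
        if same_aspect and opposed:
            conflicts.append((a["agent"], b["agent"], a["aspect"]))
    return conflicts
```

An empty result is a valid outcome: opposed directions on different aspects are reported as complementary data, not as a conflict.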

Conflict Reporting Rules

  1. Only cite real agents — No invented sources
  2. Quote actual data — “Sector News: +2.03% vs XLU” not “some agents disagree”
  3. Note timeframe differences — If conclusions differ due to lookback period, say so
  4. Empty is valid — Don’t fabricate conflicts to fill a section

The 50% Principle

Not all numbers are equal. The 50% retracement is reference only, not structural.

| Level | Status | Usage |
| --- | --- | --- |
| 38.2% | Structural | List in key S/R |
| 50% | Reference | Describe location only (“between 50% and 61.8%”) |
| 61.8% | Structural (Golden) | Always list: the real floor/ceiling |

See FIB_HERMENEUTICS.md for full treatment.