Epistemology

Ghost is an epistemic architecture for multi-agent trading analysis — it defines how machines form knowledge about markets, not just how they process market data.

This document defines how Ghost forms knowledge, what it can and cannot know, and how it validates its own understanding.

Knowledge Sources

Ghost’s knowledge comes from three categories:

Category	Sources	Refresh Rate	Decay
Market Data	OHLC, volume, indicators	Real-time to daily	Hours
Options Structure	OI, gamma, max pain	Daily	Days (weekly cycle)
News/Sentiment	Headlines, filings, macro	Real-time	Hours to days

Freshness Hierarchy:

Real-time flow data > Same-session analysis > Same-day KB > Stale (>12h)

Default freshness window: 12 hours. Data older than this is hidden unless explicitly requested (--all).

Confidence as Uncertainty

Every Ghost output includes a confidence score (0.0-1.0). This is NOT a prediction of correctness — it’s a measure of uncertainty about the analysis itself.

Confidence	Meaning	Typical Conditions
0.8-1.0	High agreement, clear structure	Multi-TF confluence, aligned agents
0.5-0.7	Mixed signals, some uncertainty	Partial confluence, agent disagreement
0.0-0.4	Low visibility, unclear structure	No confluence, stale data, chaos

Calibration Tracking: Ghost measures its own confidence accuracy over time. If predictions made at 80% confidence hit only 60% of the time, the system is OVERCONFIDENT. That gap is fed back to adjust future analysis.
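
The overconfidence check above can be sketched as follows. This is a minimal illustration, assuming predictions are available as (confidence, hit) pairs; the function name, the rounding into one-decimal buckets, and the 0.05 tolerance are assumptions made for the example, not Ghost's actual implementation.

```python
from collections import defaultdict

def calibration_report(predictions, tolerance=0.05):
    """Group (confidence, hit) pairs by confidence rounded to one decimal
    and compare the stated confidence with the observed hit rate."""
    buckets = defaultdict(list)
    for confidence, hit in predictions:
        buckets[round(confidence, 1)].append(bool(hit))

    report = {}
    for level, hits in sorted(buckets.items()):
        hit_rate = sum(hits) / len(hits)
        gap = level - hit_rate  # positive gap: claimed more than delivered
        if gap > tolerance:
            verdict = "OVERCONFIDENT"
        elif gap < -tolerance:
            verdict = "UNDERCONFIDENT"
        else:
            verdict = "CALIBRATED"
        report[level] = {"n": len(hits), "hit_rate": hit_rate, "verdict": verdict}
    return report

# Ten predictions made at 0.8 confidence, six of which hit (the 80% vs 60% case).
sample = [(0.8, True)] * 6 + [(0.8, False)] * 4
report = calibration_report(sample)
```

With the sample data, the 0.8 bucket shows a hit rate of 0.6 and is labeled OVERCONFIDENT.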

Epistemological Boundaries

What Ghost CAN Know

Domain	What’s Knowable	Method
Structure	Where levels exist, confluence strength	Fib, MA, VWAP intersection
Positioning	Who is trapped/defending	Anchored VWAP analysis
Flow	What transactions occurred	Volume, delta, absorption
Options Mechanics	Where dealers hedge	Gamma, max pain, walls
Timing	Momentum state, squeeze detection	RSI, MACD, Keltner

What Ghost CANNOT Know

Domain	Why Unknowable	Implication
Future prices	Markets are probabilistic	Levels are zones, not guarantees
Trader intent	Only transactions visible	Infer from flow, don’t assert
Manipulation	Indistinguishable from flow	Present data, flag anomalies
Exact timing	Signals indicate condition, not clock	“Ready” not “now”
News impact duration	Sentiment shifts unpredictably	Flag, don’t weight

Context-Dependent Knowledge

The same data means different things in different contexts.

Day-of-Week Interpretation

Monday        →  Fresh structure, Friday's 0DTE gone
Tuesday-Wed   →  Structure adjusting, not final
Thursday      →  Pre-pin activity, structure maturing
Friday        →  Expiration mechanics dominate

See OPTIONS_HERMENEUTICS.md for full day-of-week framework.

Freshness Affects Meaning

Age	Treatment
<2h	Current — full weight
2-12h	Recent — weight normally
12-24h	Stale — flag as stale
>24h	Expired — hide by default

Options data decays fastest (weekly cycle). Fib structure decays slowest (months/years).
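
A minimal sketch of the age-to-treatment mapping in the table above. The function name and the handling of the exact 12h and 24h boundaries are assumptions; the table does not say which side a boundary value falls on.

```python
from datetime import datetime, timedelta, timezone

def freshness_treatment(observed_at, now=None):
    """Map the age of an observation to the treatment labels in the table."""
    now = now or datetime.now(timezone.utc)
    age = now - observed_at
    if age < timedelta(hours=2):
        return "current"   # full weight
    if age < timedelta(hours=12):
        return "recent"    # weight normally
    if age <= timedelta(hours=24):
        return "stale"     # flag as stale (boundary handling is an assumption)
    return "expired"       # hide by default

# Usage with a fixed reference time so the result is reproducible.
reference = datetime(2026, 4, 9, 16, 0, tzinfo=timezone.utc)
label = freshness_treatment(reference - timedelta(hours=5), now=reference)
```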

Data Hierarchy

When sources conflict, Ghost applies this hierarchy:

Fresh quantitative flow > Stale qualitative narrative. Always.

Priority	Source Type	Example
1	Real-time flow	Absorption at $158, 72% organic
2	Same-day options structure	Gamma flip at $160
3	Multi-TF confluence	Fib 61.8% + 50 SMA + VWAP cluster
4	News sentiment	“Bearish on tariff fears”

Flow is measured from actual transactions. Narrative is interpretation that may lag or be priced in. When they conflict, structure the analysis around flow-indicated levels.
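
The priority table above can be expressed as a simple ordering. This is a sketch only: the source-type keys and the dictionary shape of a claim are hypothetical names chosen for the example.

```python
# Priority values copied from the table; the keys are illustrative labels.
SOURCE_PRIORITY = {
    "real_time_flow": 1,
    "same_day_options": 2,
    "multi_tf_confluence": 3,
    "news_sentiment": 4,
}

def rank_claims(claims):
    """Return claims ordered so the highest-priority source comes first."""
    return sorted(claims, key=lambda claim: SOURCE_PRIORITY[claim["source"]])

claims = [
    {"source": "news_sentiment", "note": "Bearish on tariff fears"},
    {"source": "real_time_flow", "note": "Absorption at $158, 72% organic"},
    {"source": "same_day_options", "note": "Gamma flip at $160"},
]
ranked = rank_claims(claims)
```

When flow and narrative disagree, the analysis is structured around `ranked[0]`, which here is the flow observation.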

Narrative Temporal Asymmetry: When price moves BEFORE a narrative catalyst, the observable flow data preceded the headline. This is not mysterious — information leaks through observable channels (military logistics, counterparty behavior, insurance repricing, speculative positioning) before it reaches headlines. The epistemic principle: if flow moved first, flow knew first. The narrative explains the flow retroactively but did not cause it. Attributing the move to the headline is a narrative attribution error — confusing the label with the cause. See docs/NARRATIVE_FORENSICS.md for the forensic methodology.

Anti-Knowledge (What Things Are NOT)

Levels Are Not:

Exact prices: They’re zones (±0.5-1%)
Guaranteed: Probabilities, not certainties
Permanent: Decay with time and tests

Max Pain Is Not:

A prediction: “Price will move to max pain” is WRONG
A magnet all week: Pull strength depends on DTE
Reliable alone: Requires OI concentration >25%

Signals Are Not:

Commands: They inform, not decide
Timers: Condition, not exact timing
Isolated: Context determines meaning

See ONTOLOGY.md for full Anti-Concepts section.

Verification Principles

Ghost validates its own knowledge through a feedback loop:

Prediction Tracking

Component	What’s Measured
Hit/Miss	Did price reach the predicted level?
Confidence Calibration	Accuracy by confidence bucket
Condition-Aware	Accuracy under specific conditions
Decay Weighting	Recent predictions weigh more (14-day half-life)
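
A 14-day half-life means a prediction's weight halves every 14 days. The sketch below shows one way to apply that to a hit rate; the function names and the (age_days, hit) record shape are assumptions for illustration.

```python
def decay_weight(age_days, half_life_days=14.0):
    """Exponential decay: weight 1.0 today, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)

def weighted_hit_rate(records, half_life_days=14.0):
    """records: iterable of (age_days, hit). Returns None when empty."""
    total = 0.0
    hits = 0.0
    for age_days, hit in records:
        weight = decay_weight(age_days, half_life_days)
        total += weight
        if hit:
            hits += weight
    return hits / total if total else None

# A hit from today outweighs a miss from 14 days ago: 1.0 / (1.0 + 0.5).
rate = weighted_hit_rate([(0, True), (14, False)])
```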

Guardrail Accuracy

Guardrails (sweep completion, max pain proximity, catalyst override) are measured separately. If a guardrail consistently misfires, it loses weight.

Self-Correction

Prediction → Verification → Feedback → Adjusted Confidence

Directors receive their track record in prompts. If they’ve been overconfident on “multi-TF squeeze” conditions, they see that history.

Prompt Interrogation

Ghost validates its own signal density through prompt interrogation — sending each director its real inputs but replacing output instructions with meta-questions: “Classify each input as ESSENTIAL / USEFUL / PASS-THROUGH / NOISE.”

This is the input-side complement to prediction tracking (which validates outputs). Prediction tracking asks “were we right?” Interrogation asks “are we even reading the right data?”

Prompt Design → Interrogation → Compression → Regime Bump → Prediction Tracking

When interrogation reveals noise (e.g., MS Director classifying dealer dynamics as “too speculative”), that section is cut or gated. See INTERROGATION.md for the full process and scripts/interrogate/ for the eval scripts.

Prompt Regime Isolation

Feedback is scoped to the current prompt regime. When prompt strategy changes significantly (e.g., reordering signal priorities), predictions from the old regime are excluded from feedback queries. This prevents old accuracy patterns from contaminating the new regime’s self-correction loop.

Regime v1-v4 predictions → excluded from feedback
Regime v5 predictions → active in feedback loop

The regime constant (PROMPT_REGIME in ghost/db/predictions.py) is the single source of truth. See CONDITION_ACCURACY.md for details on how this affects scoring.
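
In outline, regime isolation is a filter applied before any feedback statistic is computed. The sketch below works on an in-memory list; the `regime` field name and the record shape are hypothetical, and the value 5 simply mirrors the v5 example above rather than the actual contents of ghost/db/predictions.py.

```python
# Assumed value, mirroring the "Regime v5" example; the real constant
# is defined in ghost/db/predictions.py.
PROMPT_REGIME = 5

def feedback_predictions(predictions, regime=PROMPT_REGIME):
    """Keep only predictions made under the current prompt regime."""
    return [p for p in predictions if p["regime"] == regime]

history = [
    {"id": 1, "regime": 3, "hit": True},
    {"id": 2, "regime": 4, "hit": False},
    {"id": 3, "regime": 5, "hit": True},
]
active = feedback_predictions(history)
```

Only prediction 3 reaches the feedback loop; the v3 and v4 records stay in storage but do not influence self-correction.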

Bayesian Pattern Calibration

Playbooks (see ONTOLOGY.md section 8) are a second feedback loop — distinct from prediction tracking, which measures director accuracy, and from prompt interrogation, which measures signal density. Pattern calibration measures whether a recurring market pattern’s historical base rates and likelihood ratios remain valid as new data arrives. Where prediction tracking asks “was this call right?” and interrogation asks “are we reading the right data?”, pattern calibration asks “is this pattern still real?”

The playbook system imposes its own epistemic discipline, separate from the director feedback loop. These rules govern how knowledge about recurring patterns is formed, verified, and held.

Empirical Verification First

Before any playbook is built, the observed pattern must survive an empirical scan on historical data. Narrative framings are plausible; data is dispositive. A pattern the trader “sees in the tape” may not exist in measurable form — and if it doesn’t, building a playbook around it encodes a phantom.

Observed pattern → Empirical scan on YTD OHLCV → Verified shape or reshaped/deferred

Four of the first seven VST playbooks required reshaping during their framing surface because the empirical scan contradicted the initial narrative:

Playbook	Initial framing	What the scan revealed
`mine_t1_shakeout`	Pre-market wick retraces prior mine close	yfinance pre-market data unavailable — deferred to live capture
`recovery_t_plus_2`	Most mines recover within 2 days	5 of 8 completed mines recovered (62.5%), not 75% — Regime 2 calibration
`accumulation_disguised`	Mine + catalyst T+2/T+3 = accumulation bait	Only 2 of 9 mines had identifiable catalyst follow-through — captured as narrative signal, not standalone playbook
`closing_mark_anchor`	5x volume into the bell	93% of VST sessions satisfy the criterion — consolidated with sister playbook

The epistemic rule: scan first, frame second. A framing that survives empirical verification is load-bearing. A framing that fails it is noise dressed as insight.

Anti-Predictive Signals and the Tautology Trap

A signal whose likelihood ratio is mathematically equivalent to its outcome is tautological — it “predicts” by being the thing it predicts. The recovery_t_plus_2 build caught this during review: one signal (“close above prior mine open”) was structurally equivalent to the outcome metric by construction. Every positive instance satisfied it with probability 1; every negative instance satisfied it with probability 0. The likelihood ratio was infinite, the posterior was deterministic, and the signal carried zero real information.

The epistemic rule: signals must be logically disjoint from the outcome they help predict. Before shipping a playbook, verify that each mechanical signal produces non-degenerate likelihood ratios on the actual calibration data. Degenerate signals (LR → ∞, LR → 0, LR = 1.0) should be removed, not kept “for completeness.”

Definition criteria belong in the definition_criteria field and are excluded from Bayesian math by construction. Only features that vary across positive instances can serve as discriminating signals.
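
A degenerate-signal check of the kind described above might look like the following. It is a sketch using raw (unsmoothed) frequencies so that degeneracy stays visible; the function names and the count-based interface are assumptions.

```python
import math

def likelihood_ratio(pos_with, pos_total, neg_with, neg_total):
    """LR = P(signal | positive) / P(signal | negative), from raw counts."""
    p_pos = pos_with / pos_total
    p_neg = neg_with / neg_total
    if p_neg == 0:
        # Signal never appears in negatives: LR is infinite (or undefined).
        return math.inf if p_pos > 0 else math.nan
    return p_pos / p_neg

def is_degenerate(lr, eps=1e-9):
    """True for LR -> infinity, LR -> 0, LR = 1.0, or an undefined ratio."""
    return math.isnan(lr) or math.isinf(lr) or lr < eps or abs(lr - 1.0) < eps

# Tautological signal: present in every positive, absent in every negative.
tautology = likelihood_ratio(5, 5, 0, 3)
# Informative signal: more common in positives, but not by construction.
informative = likelihood_ratio(4, 5, 1, 3)
```

Here `tautology` is infinite and would be removed, while `informative` is roughly 2.4 and can stay.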

Small-Sample Shrinkage and the Calibration Threshold

Empirical frequencies on small samples are noisy. A pattern with 2/3 historical wins looks like 67%, but the confidence interval spans nearly the entire probability space. Playbooks with fewer than CALIBRATED_SAMPLE_THRESHOLD = 5 instances must not publish standalone posteriors — the math is not informative enough to override the prior.

Ghost applies Laplace smoothing (+1 to the success count, +2 to the trial count) and Beta-Binomial shrinkage to keep small-sample posteriors honest:

Sample size	Treatment
0-2 instances	Research Backlog entry. Documented but not callable.
3-4 instances	Either Regime 2 with transparent weakness flag, or Research Backlog.
5-9 instances	Shrinkage-corrected posteriors. Usable, but LLM consumer sees small-sample warning.
10+ instances	Normal confidence.

The epistemic rule: we do not pretend confidence we do not have. A thin sample is a fact about our knowledge, not a problem to be engineered away.
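
The smoothing and the sample-size tiers above can be sketched as follows. The function names, tier labels, and the prior parameters are illustrative assumptions; how Ghost combines the two corrections in practice is not specified here.

```python
CALIBRATED_SAMPLE_THRESHOLD = 5  # value stated in the text above

def laplace_rate(wins, n):
    """Laplace-smoothed frequency: add 1 to wins and 2 to trials."""
    return (wins + 1) / (n + 2)

def beta_binomial_mean(wins, n, prior_mean, prior_strength):
    """Posterior mean under a Beta prior with the given mean and
    pseudo-count; a larger prior_strength shrinks harder toward the prior."""
    alpha = prior_mean * prior_strength
    beta = (1 - prior_mean) * prior_strength
    return (wins + alpha) / (n + alpha + beta)

def sample_tier(n):
    """Map an instance count to the treatment tiers in the table."""
    if n <= 2:
        return "research_backlog"
    if n < CALIBRATED_SAMPLE_THRESHOLD:
        return "regime_2_or_backlog"
    if n <= 9:
        return "shrinkage_with_warning"
    return "normal"

# 2 wins out of 3 looks like 67% raw, but smoothing pulls it to 60%.
smoothed = laplace_rate(2, 3)
```

Laplace smoothing is the special case of the Beta-Binomial mean with a uniform prior (`prior_mean=0.5`, `prior_strength=2`).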

Data Limitations as Epistemic Limits

When the data cannot answer a question, the honest response is “we don’t know” — not “we’ll assume.” The tri-state signal model (see ONTOLOGY.md section 8) encodes this directly: signals can be present, absent, or unknown, and unknown is skipped in Bayesian inference rather than collapsed to absent.

This matters most when historical data is uneven. The early VST stop mine instances (Jan-Mar 2026) have options/flow signals marked unknown because GCS reports only go back to April. The Apr 2 and Apr 9 instances have them observed. Marking the early ones absent would penalize the playbook for our blind spot — systematically biasing posteriors downward as if the signal had been measured and found missing. That would be an epistemic error disguised as a data entry convention.

The likelihood estimator counts known instances (present + absent) in its denominator, not total. This is the honest treatment of missing data: do not let an inability to observe become evidence of absence.

The same principle blocks playbook construction for entire classes of patterns. mine_t1_shakeout is deferred indefinitely because yfinance does not publish pre-market wick data below the session open. No amount of cleverness with daily OHLCV can recover it. The epistemic answer is to wait for live-capture data, not to approximate the pattern from proxies.
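
The known-only denominator described above can be illustrated with a small tri-state sketch. The enum and function names are assumptions for the example; the point is only that unknown observations are skipped rather than counted as absent.

```python
from enum import Enum

class Signal(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    UNKNOWN = "unknown"

def signal_rate(observations):
    """Fraction of instances with the signal present, counting only
    known observations (present + absent) in the denominator."""
    present = sum(1 for o in observations if o is Signal.PRESENT)
    absent = sum(1 for o in observations if o is Signal.ABSENT)
    known = present + absent
    if known == 0:
        return None  # nothing observed: the honest answer is "we don't know"
    return present / known

# Three early instances with no data, then two present and one absent.
observations = [
    Signal.UNKNOWN, Signal.UNKNOWN, Signal.UNKNOWN,
    Signal.PRESENT, Signal.PRESENT, Signal.ABSENT,
]
rate = signal_rate(observations)
```

The result is 2/3. Collapsing the three unknowns to absent would give 2/6 and bias the estimate downward.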

Cross-Playbook Failure Mode Consistency

Playbooks are not independent instruments. When the same macro condition breaks multiple playbooks in the same way, the failure is a feature of the market state, not a bug in any single playbook. Consistent failure modes across the library are themselves evidence.

VST’s library documents two such cross-playbook failure modes:

Q4 earnings overhang — all historical instances that landed during active Q4 earnings uncertainty underperformed their playbook’s prior. This is not a pre_mine_setup failure or a recovery_t_plus_2 failure. It is a market-state filter that should modify interpretation of any VST playbook firing during that window.
Hormuz escalation regime — instances during active Hormuz closure (Feb 28 – Apr 7, 2026) produced systematically different outcomes than instances outside it. The 38-day crisis acted as a regime filter that the playbooks could not internalize from their own calibration data alone.

The epistemic rule: when a failure repeats across unrelated playbooks, it is evidence of a regime, not a noise tail. Surface it as a known failure condition in every affected playbook’s YAML, not just the one where it was first observed.

Calibration Regimes as Epistemic Status

The three calibration regimes (see ONTOLOGY.md section 8) are not administrative categories — they are epistemic status markers that shape how the LLM consumer should reason about a posterior.

Regime	Epistemic status	Consumer reasoning
1. Broad-market discrimination	Strong signal. Calibrated against the YTD non-instance pool with enough data to produce real likelihood ratios.	Treat a firing posterior as a directional signal with sizing weight.
2. Within-population discrimination	Weak context flag. Calibrated only within the conditioned population by outcome, with small samples and transparent uncertainty.	Treat a firing posterior as context, not conviction. Do not size on it alone.
3. Live-capture only	Epistemic placeholder. Pattern exists in the world but not yet in the data we can see.	Do not call the playbook. It lives in the Research Backlog until live capture produces enough instances.

A Regime 1 playbook firing at 70% and a Regime 2 playbook firing at 70% are not equivalent knowledge claims. The Regime 1 posterior is a confident statement about a well-calibrated pattern. The Regime 2 posterior is a low-confidence context flag that happened to exceed 50%. Conflating them would be an epistemic category error.

Online Calibration Tracking

Playbook calibration is not a one-time act. As new instances accumulate in live capture, the historical instance pool grows, and the likelihood ratios and base rates should be recomputed. A playbook that was well-calibrated at YTD=67 trading days may drift as the sample reaches 150 or 300.

Sub-task F of the VST module build (online feedback loop) exists for exactly this reason: to detect when live-observed outcomes diverge from the playbook’s published posteriors, and to flag the playbook for recalibration. The epistemic rule is the same as for prompt regime isolation — when the world changes, stop trusting the old math.

Conflict Rules

When agents disagree, Ghost follows strict rules:

Valid Conflicts

A conflict exists only when Director A is bullish and Director B is bearish on the same aspect. Opposing views on different aspects are complementary data, not conflicts.

Valid Conflict	Not a Conflict
News bullish on earnings vs Technical bearish on extension	News bullish on catalyst vs Technical bearish on MA stack
Fib says support vs VWAP says distribution	Different agents, different domains
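
The same-aspect rule can be sketched as a pairwise check. The `View` structure, the aspect labels, and the director names used below are hypothetical and exist only to make the rule concrete.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class View:
    director: str
    aspect: str      # what the view is about (illustrative label)
    direction: str   # "bullish" or "bearish"

def find_conflicts(views):
    """Return pairs that take opposite directions on the same aspect."""
    conflicts = []
    for a, b in combinations(views, 2):
        opposed = {a.direction, b.direction} == {"bullish", "bearish"}
        if a.aspect == b.aspect and opposed:
            conflicts.append((a, b))
    return conflicts

views = [
    View("Fib", "level_158", "bullish"),
    View("VWAP", "level_158", "bearish"),
    View("News", "catalyst", "bullish"),
    View("Technical", "ma_stack", "bearish"),
]
conflicts = find_conflicts(views)
```

Only the Fib/VWAP pair is returned. News and Technical disagree in direction but address different aspects, so they are not reported, and an empty list is a valid result.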

Conflict Reporting Rules

Only cite real agents — No invented sources
Quote actual data — “Sector News: +2.03% vs XLU” not “some agents disagree”
Note timeframe differences — If conclusions differ due to lookback period, say so
Empty is valid — Don’t fabricate conflicts to fill a section

The 50% Principle

Not all numbers are equal. The 50% retracement is reference only, not structural.

Level	Status	Usage
38.2%	Structural	List in key S/R
50%	Reference	Describe location only (“between 50% and 61.8%”)
61.8%	Structural (Golden)	Always list — the real floor/ceiling
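
As a sketch of how the status column could drive output, the example below lists only structural retracements as key levels. The function name and the convention of measuring retracements down from the swing high of an up-move are assumptions.

```python
# Status values copied from the table above.
FIB_STATUS = {0.382: "structural", 0.5: "reference", 0.618: "structural"}

def key_levels(swing_low, swing_high):
    """Prices of structural retracements only; 50% is deliberately omitted
    because it is a reference level, not a structural one."""
    span = swing_high - swing_low
    return {
        ratio: swing_high - span * ratio
        for ratio, status in FIB_STATUS.items()
        if status == "structural"
    }

levels = key_levels(100.0, 200.0)
```

For a swing from 100 to 200 this yields levels near 161.8 and 138.2, with no entry for the 50% midpoint.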

See FIB_HERMENEUTICS.md for full treatment.

Related Documents

ONTOLOGY.md — What the concepts are
ETHICS.md — What Ghost must and must not do
PRAXIS.md — How knowledge becomes action
INTERROGATION.md — Prompt signal density evaluation process
OPTIONS_HERMENEUTICS.md — Day-of-week interpretation framework
PEDAGOGICAL.md — How to learn this system