Epistemology
Ghost is an epistemic architecture for multi-agent trading analysis — it defines how machines form knowledge about markets, not just how they process market data.
This document defines how Ghost forms knowledge, what it can and cannot know, and how it validates its own understanding.
Knowledge Sources
Ghost’s knowledge comes from three categories:
| Category | Sources | Refresh Rate | Decay |
|---|---|---|---|
| Market Data | OHLC, volume, indicators | Real-time to daily | Hours |
| Options Structure | OI, gamma, max pain | Daily | Days (weekly cycle) |
| News/Sentiment | Headlines, filings, macro | Real-time | Hours to days |
Freshness Hierarchy:
Real-time flow data > Same-session analysis > Same-day KB > Stale (>12h)
Default freshness window: 12 hours. Data older than this is hidden unless explicitly requested (--all).
Confidence as Uncertainty
Every Ghost output includes a confidence score (0.0-1.0). This is NOT a prediction of correctness — it’s a measure of uncertainty about the analysis itself.
| Confidence | Meaning | Typical Conditions |
|---|---|---|
| 0.8-1.0 | High agreement, clear structure | Multi-TF confluence, aligned agents |
| 0.5-0.7 | Mixed signals, some uncertainty | Partial confluence, agent disagreement |
| 0.0-0.4 | Low visibility, unclear structure | No confluence, stale data, chaos |
Calibration Tracking: Ghost measures its own confidence accuracy over time. If 80% confident predictions hit only 60% of the time, the system is OVERCONFIDENT. This feedback adjusts future analysis.
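The calibration check described above can be sketched as follows. This is a minimal illustration, assuming predictions are available as (confidence, hit) pairs; the function name `calibration_report`, the bucket count, and the 0.10 tolerance are hypothetical choices, not Ghost's actual implementation.

```python
from collections import defaultdict


def calibration_report(predictions, n_buckets=5, tolerance=0.10):
    """Compare stated confidence with the realized hit rate per bucket.

    predictions: iterable of (confidence, hit) pairs, confidence in [0.0, 1.0].
    n_buckets and tolerance are illustrative defaults (assumptions).
    """
    buckets = defaultdict(list)
    for confidence, hit in predictions:
        # Clamp so that confidence == 1.0 lands in the top bucket.
        index = min(int(confidence * n_buckets), n_buckets - 1)
        buckets[index].append((confidence, bool(hit)))

    report = []
    for index in sorted(buckets):
        rows = buckets[index]
        mean_confidence = sum(c for c, _ in rows) / len(rows)
        hit_rate = sum(1 for _, h in rows if h) / len(rows)
        gap = mean_confidence - hit_rate
        if gap > tolerance:
            verdict = "OVERCONFIDENT"
        elif gap < -tolerance:
            verdict = "UNDERCONFIDENT"
        else:
            verdict = "CALIBRATED"
        report.append({
            "bucket": index,
            "n": len(rows),
            "mean_confidence": mean_confidence,
            "hit_rate": hit_rate,
            "verdict": verdict,
        })
    return report


# The case from the text: 80% stated confidence, 60% realized hit rate.
sample = [(0.8, True)] * 6 + [(0.8, False)] * 4
for row in calibration_report(sample):
    print(row)
```

The gap between mean stated confidence and realized hit rate is what would feed back into future analysis; how Ghost applies that adjustment is not specified here.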
Epistemological Boundaries
What Ghost CAN Know
| Domain | What’s Knowable | Method |
|---|---|---|
| Structure | Where levels exist, confluence strength | Fib, MA, VWAP intersection |
| Positioning | Who is trapped/defending | Anchored VWAP analysis |
| Flow | What transactions occurred | Volume, delta, absorption |
| Options Mechanics | Where dealers hedge | Gamma, max pain, walls |
| Timing | Momentum state, squeeze detection | RSI, MACD, Keltner |
What Ghost CANNOT Know
| Domain | Why Unknowable | Implication |
|---|---|---|
| Future prices | Markets are probabilistic | Levels are zones, not guarantees |
| Trader intent | Only transactions visible | Infer from flow, don’t assert |
| Manipulation | Indistinguishable from flow | Present data, flag anomalies |
| Exact timing | Signals indicate condition, not clock | “Ready” not “now” |
| News impact duration | Sentiment shifts unpredictably | Flag, don’t weight |
Context-Dependent Knowledge
The same data means different things in different contexts.
Day-of-Week Interpretation
Monday → Fresh structure, Friday's 0DTE gone
Tuesday-Wed → Structure adjusting, not final
Thursday → Pre-pin activity, structure maturing
Friday → Expiration mechanics dominate
See OPTIONS_HERMENEUTICS.md for full day-of-week framework.
Freshness Affects Meaning
| Age | Treatment |
|---|---|
| <2h | Current — full weight |
| 2-12h | Recent — weight normally |
| 12-24h | Stale — flag as stale |
| >24h | Expired — hide by default |
Options data decays fastest (weekly cycle). Fib structure decays slowest (months/years).
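As a rough sketch, the age-to-treatment table above maps onto a small classifier. The function name `classify_freshness` and the returned labels are hypothetical, and the handling of the exact 12h and 24h boundaries is an assumption, since the table's ranges touch at those points.

```python
from datetime import timedelta


def classify_freshness(age: timedelta) -> str:
    """Map data age to the treatment tiers in the table above.

    Boundary handling (exactly 12h counts as recent, exactly 24h as stale)
    is an assumption; the table does not say which side a boundary falls on.
    """
    hours = age.total_seconds() / 3600
    if hours < 2:
        return "current"   # full weight
    if hours <= 12:
        return "recent"    # weight normally
    if hours <= 24:
        return "stale"     # flag as stale
    return "expired"       # hide by default


for h in (1, 6, 18, 30):
    print(h, classify_freshness(timedelta(hours=h)))
```

Per-category decay (options fastest, Fib structure slowest) would need different thresholds for each source; this sketch covers only the single default ladder.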
Data Hierarchy
When sources conflict, Ghost applies this hierarchy:
Fresh quantitative flow > Stale qualitative narrative. Always.
| Priority | Source Type | Example |
|---|---|---|
| 1 | Real-time flow | Absorption at $158, 72% organic |
| 2 | Same-day options structure | Gamma flip at $160 |
| 3 | Multi-TF confluence | Fib 61.8% + 50 SMA + VWAP cluster |
| 4 | News sentiment | “Bearish on tariff fears” |
Flow is measured from actual transactions. Narrative is interpretation that may lag or be priced in. When they conflict, structure the analysis around flow-indicated levels.
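One way to picture the hierarchy is as a simple priority sort over conflicting readings. The source-type keys, the `rank_sources` name, and the dictionary shape below are illustrative assumptions; only the priority order comes from the table.

```python
# Priority order from the table above (1 = highest). Key names are hypothetical.
SOURCE_PRIORITY = {
    "real_time_flow": 1,
    "same_day_options": 2,
    "multi_tf_confluence": 3,
    "news_sentiment": 4,
}


def rank_sources(readings):
    """Order readings so the highest-priority source type comes first.

    readings: list of dicts with at least a "source" key from SOURCE_PRIORITY.
    """
    return sorted(readings, key=lambda r: SOURCE_PRIORITY[r["source"]])


readings = [
    {"source": "news_sentiment", "note": "Bearish on tariff fears"},
    {"source": "real_time_flow", "note": "Absorption at $158, 72% organic"},
]
print(rank_sources(readings)[0]["note"])
```

The lower-ranked readings are kept in the result rather than discarded, so the analysis can still report them while structuring itself around the flow-indicated levels.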
Narrative Temporal Asymmetry: When price moves BEFORE a narrative catalyst, the observable flow data preceded the headline. This is not mysterious — information leaks through observable channels (military logistics, counterparty behavior, insurance repricing, speculative positioning) before it reaches headlines. The epistemic principle: if flow moved first, flow knew first. The narrative explains the flow retroactively but did not cause it. Attributing the move to the headline is a narrative attribution error — confusing the label with the cause. See docs/NARRATIVE_FORENSICS.md for the forensic methodology.
Anti-Knowledge (What Things Are NOT)
Levels Are Not:
- Exact prices: They’re zones (±0.5-1%)
- Guaranteed: Probabilities, not certainties
- Permanent: Decay with time and tests
Max Pain Is Not:
- A prediction: “Price will move to max pain” is WRONG
- A magnet all week: Pull strength depends on DTE
- Reliable alone: Requires OI concentration >25%
Signals Are Not:
- Commands: They inform, not decide
- Timers: Condition, not exact timing
- Isolated: Context determines meaning
See ONTOLOGY.md for full Anti-Concepts section.
Verification Principles
Ghost validates its own knowledge through a feedback loop:
Prediction Tracking
| Component | What’s Measured |
|---|---|
| Hit/Miss | Did price reach the predicted level? |
| Confidence Calibration | Accuracy by confidence bucket |
| Condition-Aware | Accuracy under specific conditions |
| Decay Weighting | Recent predictions weigh more (14-day half-life) |
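
The 14-day half-life in the last row implies exponential decay, where a prediction's weight halves every 14 days. A minimal sketch, assuming predictions are (age in days, hit) pairs; the function names are hypothetical.

```python
HALF_LIFE_DAYS = 14.0  # from the Decay Weighting row above


def decay_weight(age_days, half_life_days=HALF_LIFE_DAYS):
    """Exponential decay: weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def weighted_hit_rate(predictions):
    """Recency-weighted hit rate.

    predictions: iterable of (age_days, hit) pairs.
    Returns None when there is nothing to weigh.
    """
    total = 0.0
    hits = 0.0
    for age_days, hit in predictions:
        weight = decay_weight(age_days)
        total += weight
        if hit:
            hits += weight
    return hits / total if total else None


# A fresh hit outweighs a 14-day-old miss two to one.
print(weighted_hit_rate([(0, True), (14, False)]))
```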
Guardrail Accuracy
Guardrails (sweep completion, max pain proximity, catalyst override) are measured separately. If a guardrail consistently misfires, it loses weight.
Self-Correction
Prediction → Verification → Feedback → Adjusted Confidence
Directors receive their track record in prompts. If they’ve been overconfident on “multi-TF squeeze” conditions, they see that history.
Prompt Interrogation
Ghost validates its own signal density through prompt interrogation — sending each director its real inputs but replacing output instructions with meta-questions: “Classify each input as ESSENTIAL / USEFUL / PASS-THROUGH / NOISE.”
This is the input-side complement to prediction tracking (which validates outputs). Prediction tracking asks “were we right?” Interrogation asks “are we even reading the right data?”
Prompt Design → Interrogation → Compression → Regime Bump → Prediction Tracking
When interrogation reveals noise (e.g., MS Director classifying dealer dynamics as “too speculative”), that section is cut or gated. See INTERROGATION.md for the full process and scripts/interrogate/ for the eval scripts.
Prompt Regime Isolation
Feedback is scoped to the current prompt regime. When prompt strategy changes significantly (e.g., reordering signal priorities), predictions from the old regime are excluded from feedback queries. This prevents old accuracy patterns from contaminating the new regime’s self-correction loop.
Regime v1-v4 predictions → excluded from feedback
Regime v5 predictions → active in feedback loop
The regime constant (PROMPT_REGIME in ghost/db/predictions.py) is the single source of truth. See CONDITION_ACCURACY.md for details on how this affects scoring.
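A minimal sketch of regime scoping, assuming each stored prediction carries an integer regime tag. The real constant lives in ghost/db/predictions.py; its type, the value used here, the record shape, and the `active_feedback` helper are assumptions for illustration.

```python
# Assumption: the regime is an integer version. The value 5 mirrors the
# "Regime v5" example above; the real constant is in ghost/db/predictions.py.
PROMPT_REGIME = 5


def active_feedback(predictions, regime=PROMPT_REGIME):
    """Keep only predictions made under the current prompt regime.

    predictions: list of dicts with a "regime" key (hypothetical shape).
    """
    return [p for p in predictions if p.get("regime") == regime]


history = [
    {"id": 1, "regime": 3, "hit": True},
    {"id": 2, "regime": 4, "hit": False},
    {"id": 3, "regime": 5, "hit": True},
]
print(active_feedback(history))
```

Filtering at query time keeps old-regime predictions in storage for audit while excluding them from the self-correction loop.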
Bayesian Pattern Calibration
Playbooks (see ONTOLOGY.md section 8) are a second feedback loop — distinct from prediction tracking, which measures director accuracy, and from prompt interrogation, which measures signal density. Pattern calibration measures whether a recurring market pattern’s historical base rates and likelihood ratios remain valid as new data arrives. Where prediction tracking asks “was this call right?” and interrogation asks “are we reading the right data?”, pattern calibration asks “is this pattern still real?”
The playbook system imposes its own epistemic discipline, separate from the director feedback loop. These rules govern how knowledge about recurring patterns is formed, verified, and held.
Empirical Verification First
Before any playbook is built, the observed pattern must survive an empirical scan on historical data. Narrative framings are plausible; data is dispositive. A pattern the trader “sees in the tape” may not exist in measurable form — and if it doesn’t, building a playbook around it encodes a phantom.
Observed pattern → Empirical scan on YTD OHLCV → Verified shape or reshaped/deferred
Four of the first seven VST playbooks required reshaping during their framing surface because the empirical scan contradicted the initial narrative:
| Playbook | Initial framing | What the scan revealed |
|---|---|---|
| mine_t1_shakeout | Pre-market wick retraces prior mine close | yfinance pre-market data unavailable; deferred to live capture |
| recovery_t_plus_2 | Most mines recover within 2 days | 5 of 8 completed mines recovered (62.5%), not 75%; Regime 2 calibration |
| accumulation_disguised | Mine + catalyst T+2/T+3 = accumulation bait | Only 2 of 9 mines had identifiable catalyst follow-through; captured as narrative signal, not standalone playbook |
| closing_mark_anchor | 5x volume into the bell | 93% of VST sessions satisfy the criterion; consolidated with sister playbook |
The epistemic rule: scan first, frame second. A framing that survives empirical verification is load-bearing. A framing that fails it is noise dressed as insight.
Anti-Predictive Signals and the Tautology Trap
A signal whose likelihood ratio is mathematically equivalent to its outcome is tautological — it “predicts” by being the thing it predicts. The recovery_t_plus_2 build caught this during review: one signal (“close above prior mine open”) was structurally equivalent to the outcome metric by construction. Every positive instance satisfied it with probability 1; every negative instance satisfied it with probability 0. The likelihood ratio was infinite, the posterior was deterministic, and the signal carried zero real information.
The epistemic rule: signals must be logically disjoint from the outcome they help predict. Before shipping a playbook, verify that each mechanical signal produces non-degenerate likelihood ratios on the actual calibration data. Degenerate signals (LR → ∞, LR → 0, LR = 1.0) should be removed, not kept “for completeness.”
Definition criteria belong in the definition_criteria field and are excluded from Bayesian math by construction. Only features that vary across positive instances can serve as discriminating signals.
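The degeneracy check can be sketched as below, taking the likelihood ratio as P(signal | positive) / P(signal | negative) computed from raw counts. The function names and the tolerance are hypothetical, and the counts in the example are illustrative rather than taken from the calibration data.

```python
import math


def likelihood_ratio(pos_with, pos_total, neg_with, neg_total):
    """LR = P(signal | positive outcome) / P(signal | negative outcome)."""
    if pos_total <= 0 or neg_total <= 0:
        raise ValueError("need at least one positive and one negative instance")
    p_pos = pos_with / pos_total
    p_neg = neg_with / neg_total
    if p_neg == 0:
        # Signal never seen in negatives: infinite LR, or undefined if it
        # was never seen in positives either.
        return math.inf if p_pos > 0 else math.nan
    return p_pos / p_neg


def is_degenerate(lr, tol=1e-9):
    """True for LR -> infinity, LR -> 0, LR = 1.0, or an undefined LR."""
    if math.isnan(lr) or math.isinf(lr):
        return True
    return lr == 0 or abs(lr - 1.0) < tol


# A tautological signal: present in every positive, absent in every negative.
print(is_degenerate(likelihood_ratio(5, 5, 0, 3)))
# An informative signal: more common in positives, but not by construction.
print(is_degenerate(likelihood_ratio(4, 5, 1, 3)))
```

Running a check like this over every mechanical signal before a playbook ships is one way to catch a tautology early; it does not prove logical disjointness, it only flags the numeric symptom.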
Small-Sample Shrinkage and the Calibration Threshold
Empirical frequencies on small samples are noisy. A pattern with 2/3 historical wins looks like 67%, but the confidence interval spans nearly the entire probability space. Playbooks with fewer than CALIBRATED_SAMPLE_THRESHOLD = 5 instances must not publish standalone posteriors — the math is not informative enough to override the prior.
Ghost applies Laplace +1/+2 smoothing and Beta-Binomial shrinkage to keep small-sample posteriors honest:
| Sample size | Treatment |
|---|---|
| 0-2 instances | Research Backlog entry. Documented but not callable. |
| 3-4 instances | Either Regime 2 with transparent weakness flag, or Research Backlog. |
| 5-9 instances | Shrinkage-corrected posteriors. Usable, but LLM consumer sees small-sample warning. |
| 10+ instances | Normal confidence. |
The epistemic rule: we do not pretend confidence we do not have. A thin sample is a fact about our knowledge, not a problem to be engineered away.
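A minimal sketch of the smoothing and the sample-size ladder above. The +1/+2 rule is the posterior mean under a uniform Beta(1, 1) prior, so the Beta-Binomial helper reproduces it at its defaults. The function names, the treatment labels, and the default prior are assumptions; only the threshold value and the tier boundaries come from the text.

```python
CALIBRATED_SAMPLE_THRESHOLD = 5  # from the text above


def laplace_rate(wins, n):
    """Laplace +1/+2 smoothing: (wins + 1) / (n + 2)."""
    return (wins + 1) / (n + 2)


def beta_binomial_rate(wins, n, prior_mean=0.5, prior_strength=2.0):
    """Posterior mean of a Beta prior updated with binomial data.

    prior_mean and prior_strength are illustrative defaults; with 0.5 and
    2.0 the prior is Beta(1, 1) and the result equals laplace_rate.
    """
    alpha = prior_mean * prior_strength
    beta = (1.0 - prior_mean) * prior_strength
    return (wins + alpha) / (n + alpha + beta)


def sample_treatment(n):
    """Map instance count to the treatment tiers in the table above."""
    if n <= 2:
        return "research_backlog"
    if n < CALIBRATED_SAMPLE_THRESHOLD:
        return "regime_2_or_backlog"
    if n <= 9:
        return "shrinkage_with_warning"
    return "normal"


# The 2/3 example: the raw frequency is 0.67, the smoothed estimate is 0.60.
print(laplace_rate(2, 3), sample_treatment(3))
```

A stronger prior (larger `prior_strength`) pulls thin samples harder toward the prior mean, which is the knob that keeps a 2-of-3 record from reading as a 67% edge.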
Data Limitations as Epistemic Limits
When the data cannot answer a question, the honest response is “we don’t know” — not “we’ll assume.” The tri-state signal model (see ONTOLOGY.md section 8) encodes this directly: signals can be present, absent, or unknown, and unknown is skipped in Bayesian inference rather than collapsed to absent.
This matters most when historical data is uneven. The early VST stop mine instances (Jan-Mar 2026) have options/flow signals marked unknown because GCS reports only go back to April. The Apr 2 and Apr 9 instances have them observed. Marking the early ones absent would penalize the playbook for our blind spot — systematically biasing posteriors downward as if the signal had been measured and found missing. That would be an epistemic error disguised as a data entry convention.
The likelihood estimator counts known instances (present + absent) in its denominator, not total. This is the honest treatment of missing data: do not let an inability to observe become evidence of absence.
The same principle blocks playbook construction for entire classes of patterns. mine_t1_shakeout is deferred indefinitely because yfinance does not publish pre-market wick data below the session open. No amount of cleverness with daily OHLCV can recover it. The epistemic answer is to wait for live-capture data, not to approximate the pattern from proxies.
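The known-instances denominator described in this section can be sketched as follows. The `Signal` enum and `signal_rate` function are hypothetical names; the behavior shown (unknown is skipped, not counted as absent) follows the text.

```python
from enum import Enum


class Signal(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    UNKNOWN = "unknown"


def signal_rate(observations):
    """Fraction of known instances in which the signal was present.

    UNKNOWN observations are excluded from both numerator and denominator.
    Returns None when nothing was actually observed.
    """
    present = sum(1 for s in observations if s is Signal.PRESENT)
    known = sum(1 for s in observations if s is not Signal.UNKNOWN)
    if known == 0:
        return None
    return present / known


# Three early instances with no data, two later ones observed present.
obs = [Signal.UNKNOWN, Signal.UNKNOWN, Signal.UNKNOWN,
       Signal.PRESENT, Signal.PRESENT]
print(signal_rate(obs))
```

Collapsing the three unknowns to absent would report 2 of 5 instead of 2 of 2, which is the downward bias the text warns about.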
Cross-Playbook Failure Mode Consistency
Playbooks are not independent instruments. When the same macro condition breaks multiple playbooks in the same way, the failure is a feature of the market state, not a bug in any single playbook. Consistent failure modes across the library are themselves evidence.
VST’s library documents two such cross-playbook failure modes:
- Q4 earnings overhang: all historical instances that landed during active Q4 earnings uncertainty underperformed their playbook’s prior. This is not a pre_mine_setup failure or a recovery_t_plus_2 failure. It is a market-state filter that should modify interpretation of any VST playbook firing during that window.
- Hormuz escalation regime: instances during active Hormuz closure (Feb 28 – Apr 7, 2026) produced systematically different outcomes than instances outside it. The 38-day crisis acted as a regime filter that the playbooks could not internalize from their own calibration data alone.
The epistemic rule: when a failure repeats across unrelated playbooks, it is evidence of a regime, not a noise tail. Surface it as a known failure condition in every affected playbook’s YAML, not just the one where it was first observed.
Calibration Regimes as Epistemic Status
The three calibration regimes (see ONTOLOGY.md section 8) are not administrative categories — they are epistemic status markers that shape how the LLM consumer should reason about a posterior.
| Regime | Epistemic status | Consumer reasoning |
|---|---|---|
| 1. Broad-market discrimination | Strong signal. Calibrated against the YTD non-instance pool with enough data to produce real likelihood ratios. | Treat a firing posterior as a directional signal with sizing weight. |
| 2. Within-population discrimination | Weak context flag. Calibrated only within the conditioned population by outcome, with small samples and transparent uncertainty. | Treat a firing posterior as context, not conviction. Do not size on it alone. |
| 3. Live-capture only | Epistemic placeholder. Pattern exists in the world but not yet in the data we can see. | Do not call the playbook. It lives in the Research Backlog until live capture produces enough instances. |
A Regime 1 playbook firing at 70% and a Regime 2 playbook firing at 70% are not equivalent knowledge claims. The Regime 1 posterior is a confident statement about a well-calibrated pattern. The Regime 2 posterior is a low-confidence context flag that happened to exceed 50%. Conflating them would be an epistemic category error.
Online Calibration Tracking
Playbook calibration is not a one-time act. As new instances accumulate in live capture, the historical instance pool grows, and the likelihood ratios and base rates should be recomputed. A playbook that was well-calibrated at YTD=67 trading days may drift as the sample reaches 150 or 300.
Sub-task F of the VST module build (online feedback loop) exists for exactly this reason: to detect when live-observed outcomes diverge from the playbook’s published posteriors, and to flag the playbook for recalibration. The epistemic rule is the same as for prompt regime isolation — when the world changes, stop trusting the old math.
Conflict Rules
When agents disagree, Ghost follows strict rules:
Valid Conflicts
A conflict exists only when two directors take opposite sides on the same aspect, for example Director A bullish and Director B bearish. Opposing views on different aspects are complementary data, not conflicts.
| Valid Conflict | Not a Conflict |
|---|---|
| News bullish on earnings vs Technical bearish on extension | News bullish on catalyst vs Technical bearish on MA stack |
| Fib says support vs VWAP says distribution | Different agents, different domains |
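
A minimal sketch of the same-aspect rule, assuming each director view is tagged with an explicit aspect label and a bullish or bearish stance. The record shape, the aspect labels, and the `find_conflicts` name are hypothetical; how Ghost actually decides that two views share an aspect is not specified here.

```python
from itertools import combinations

OPPOSED = {"bullish", "bearish"}


def find_conflicts(views):
    """Return (director_a, director_b, aspect) for opposite stances on one aspect.

    views: list of dicts with "director", "aspect", and "stance" keys.
    An empty result is valid; nothing is fabricated to fill it.
    """
    conflicts = []
    for a, b in combinations(views, 2):
        same_aspect = a["aspect"] == b["aspect"]
        opposite = {a["stance"], b["stance"]} == OPPOSED
        if same_aspect and opposite:
            conflicts.append((a["director"], b["director"], a["aspect"]))
    return conflicts


views = [
    {"director": "News", "aspect": "catalyst", "stance": "bullish"},
    {"director": "Technical", "aspect": "ma_stack", "stance": "bearish"},
]
print(find_conflicts(views))  # different aspects, so no conflict is reported
```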
Conflict Reporting Rules
- Only cite real agents — No invented sources
- Quote actual data — “Sector News: +2.03% vs XLU” not “some agents disagree”
- Note timeframe differences — If conclusions differ due to lookback period, say so
- Empty is valid — Don’t fabricate conflicts to fill a section
The 50% Principle
Not all numbers are equal. The 50% retracement is reference only, not structural.
| Level | Status | Usage |
|---|---|---|
| 38.2% | Structural | List in key S/R |
| 50% | Reference | Describe location only (“between 50% and 61.8%”) |
| 61.8% | Structural (Golden) | Always list — the real floor/ceiling |
See FIB_HERMENEUTICS.md for full treatment.
Related Documentation
- ONTOLOGY.md: What the concepts are
- ETHICS.md: What Ghost must and must not do
- PRAXIS.md: How knowledge becomes action
- INTERROGATION.md: Prompt signal density evaluation process
- OPTIONS_HERMENEUTICS.md: Day-of-week interpretation framework
- PEDAGOGICAL.md: How to learn this system