Prompt Interrogation Process

Status: Active eval tool
Location: scripts/interrogate/
Cost: 1 Gemini call per director (~$0.01 each)

What It Is

Prompt interrogation is how Ghost evaluates its own signal density. Instead of running a full analysis and guessing what the LLM used, we ask the LLM directly: it receives all the normal inputs, but the output format is replaced with meta-questions about its own synthesis process.

The LLM receives exactly the same context it would in production — real KB data, real market data, real prompt instructions — but instead of producing a thesis, it classifies each input as ESSENTIAL, USEFUL, PASS-THROUGH, or NOISE.

Why It Matters

Prompt bloat is invisible. A section that was essential when written may be noise 3 months later because:

Other sections now cover the same signal
The output format changed and the field is no longer consumed
The data source became unreliable and the section is always caveated away

The interrogation process makes this visible. When a director says “I receive dealer dynamics but classify it as NOISE — too speculative for my role,” that’s a concrete, evidence-based signal to cut or gate that section.

Key principle: The LLM that runs the prompt is the best judge of what it actually uses from that prompt.

The Classification Framework

Every input section and output field is classified into one of four categories:

Category	Definition	Action
ESSENTIAL	Output changes materially without this	Keep as-is
USEFUL	Adds nuance, modifies conviction	Keep, consider compressing
PASS-THROUGH	Copied to output with minimal synthesis	Compress or restructure
NOISE	Received but rarely changes output	Cut or gate behind a threshold
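
The table above maps naturally onto a small lookup. The sketch below is one way to encode it and to pull classifications out of a reply. It assumes the LLM answers with lines of the form `section name: CATEGORY`; that reply format and the names `Category`, `ACTIONS`, and `parse_classifications` are illustrative and not taken from the Ghost scripts.

```python
import re
from enum import Enum


class Category(Enum):
    ESSENTIAL = "ESSENTIAL"
    USEFUL = "USEFUL"
    PASS_THROUGH = "PASS-THROUGH"
    NOISE = "NOISE"


# Action column of the table above, keyed by category.
ACTIONS = {
    Category.ESSENTIAL: "Keep as-is",
    Category.USEFUL: "Keep, consider compressing",
    Category.PASS_THROUGH: "Compress or restructure",
    Category.NOISE: "Cut or gate behind a threshold",
}

# ASSUMPTION: one "<section name>: <CATEGORY>" per line, optionally bulleted.
_LINE = re.compile(
    r"^\s*[-*]?\s*(?P<name>[^:]+?)\s*:\s*"
    r"(?P<cat>ESSENTIAL|USEFUL|PASS-THROUGH|NOISE)\b"
)


def parse_classifications(reply: str) -> dict[str, Category]:
    """Extract {section name: Category} from lines matching the assumed format."""
    found = {}
    for line in reply.splitlines():
        match = _LINE.match(line)
        if match:
            found[match.group("name")] = Category(match.group("cat"))
    return found
```

Lines that do not match the assumed format are skipped silently, so a real reply should be spot-checked by eye before acting on the parsed result.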

How To Run

Prerequisites

A recent `ghost run TICKER`, so the KB has fresh data
Gemini API credentials configured (same as normal Ghost runs)

Commands

# Run individual director interrogation
# (the news and market-structure scripts are marked STALE under Files)
uv run python3 scripts/interrogate/interrogate_tech_director.py
uv run python3 scripts/interrogate/interrogate_news_director.py
uv run python3 scripts/interrogate/interrogate_mkt_structure.py
uv run python3 scripts/interrogate/interrogate_strat.py

# Run all four in parallel (recommended)
uv run python3 scripts/interrogate/interrogate_tech_director.py &
uv run python3 scripts/interrogate/interrogate_news_director.py &
uv run python3 scripts/interrogate/interrogate_mkt_structure.py &
uv run python3 scripts/interrogate/interrogate_strat.py &
wait
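
In bash, a bare `wait` returns zero regardless of how the background jobs exited, so the shell form above does not tell you which interrogation failed. If that matters, a small Python launcher is one option. This is a sketch: `run_parallel` is a hypothetical helper, and `SCRIPTS` simply mirrors the four commands above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

SCRIPTS = [
    "scripts/interrogate/interrogate_tech_director.py",
    "scripts/interrogate/interrogate_news_director.py",
    "scripts/interrogate/interrogate_mkt_structure.py",
    "scripts/interrogate/interrogate_strat.py",
]


def run_parallel(commands: list[list[str]]) -> list[int]:
    """Run each command concurrently and return exit codes in input order."""

    def run_one(cmd: list[str]) -> int:
        # Child output goes straight to the terminal, so it may interleave.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        return list(pool.map(run_one, commands))


# Usage (not executed here):
#   codes = run_parallel([["uv", "run", "python3", s] for s in SCRIPTS])
#   for script, code in zip(SCRIPTS, codes):
#       print(f"{script}: exit {code}")
```

Threads are sufficient here because each worker only blocks on a child process; the interrogation itself runs in the subprocess.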

Each script:

Loads the director’s real Jinja2 template
Populates it with real KB/market data (same as a ghost run)
Replaces the output format section with interrogation questions
Sends to Gemini (1 call, ~$0.01)
Prints the LLM’s self-assessment

Configuring the Ticker

Each script has TICKER = "NVDA" at the top. Change this to interrogate with a different ticker’s data.

Interrogation Structure

Each script asks 4-5 rounds of questions. The rounds follow a consistent pattern across all four directors:

Round 1: Input Triage

“For each input section, classify as ESSENTIAL / USEFUL / PASS-THROUGH / NOISE.”

This identifies what the LLM actually references when synthesizing its output.

Round 2: Deep Dive

Specific questions about the highest-priority inputs. Example (Technical Director): “How many Fib confluence zones did you USE vs IGNORE? When Fib overlaps with MA, does Fib add anything?”

Round 3: Output Attribution

“Trace each output field back to its source inputs.” This separates pass-through fields from genuine synthesis.

Round 4: Signal Compression

“What is the MINIMUM set of inputs for 90% of the same output? What’s redundant? What’s missing?”

Round 5: Minimum Viable Output / Redesign

“If you could redesign your input, what would it look like? What output fields could be cut?”
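
One way to keep the rounds consistent across scripts is to hold them as data and render the interrogation block from that data. The sketch below reuses the round titles and question wording from this section. `ROUNDS`, `build_interrogation`, and the rendered heading layout are assumptions, since the scripts' actual `INTERROGATION` constants are not shown here.

```python
ROUNDS = [
    ("Input Triage",
     "For each input section, classify as ESSENTIAL / USEFUL / PASS-THROUGH / NOISE."),
    # Round 2 is director-specific; this is the Technical Director example.
    ("Deep Dive",
     "How many Fib confluence zones did you USE vs IGNORE? "
     "When Fib overlaps with MA, does Fib add anything?"),
    ("Output Attribution",
     "Trace each output field back to its source inputs."),
    ("Signal Compression",
     "What is the MINIMUM set of inputs for 90% of the same output? "
     "What's redundant? What's missing?"),
    ("Minimum Viable Output / Redesign",
     "If you could redesign your input, what would it look like? "
     "What output fields could be cut?"),
]


def build_interrogation(rounds: list[tuple[str, str]]) -> str:
    """Render numbered rounds as a text block (layout is an assumption)."""
    parts = ["## Interrogation", ""]
    for number, (title, question) in enumerate(rounds, start=1):
        parts.append(f"### Round {number}: {title}")
        parts.append(question)
        parts.append("")
    return "\n".join(parts)
```

A 4-round script would pass `ROUNDS[:4]`; only the Round 2 entry needs to change per director.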

How to Interpret Results

High-Signal Findings

Input classified NOISE by the director that receives it — strong signal to cut or gate
Input classified PASS-THROUGH — candidate for pre-synthesis (compute upstream, send just the result)
Duplicate content across sections — merge into one location
“90% of output from N inputs” — the inputs NOT in that list are compression candidates
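
The last bullet is a set difference: whatever the prompt sends that is absent from the LLM's minimum set is a compression candidate. A minimal sketch follows; `compression_candidates` is a hypothetical helper and the section names are made up for illustration (the real names come from each director's template).

```python
def compression_candidates(all_inputs: list[str], minimum_set: list[str]) -> list[str]:
    """Return inputs not named in the '90% of output' list, in prompt order."""
    keep = {name.strip().lower() for name in minimum_set}
    return [name for name in all_inputs if name.strip().lower() not in keep]


# Hypothetical section names, for illustration only.
sent = ["Fib Analysis", "Moving Averages", "VWAP", "Momentum", "Trendlines"]
minimum = ["moving averages", "VWAP", "Momentum"]
print(compression_candidates(sent, minimum))
# ['Fib Analysis', 'Trendlines']
```

Matching is case-insensitive because the LLM may not echo section names exactly; anything fuzzier than that is better resolved by hand.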

Low-Signal Findings

“USEFUL” — vague, doesn’t tell you what to do. Ask follow-up in Round 2.
“I would add X” — feature requests from the LLM. Evaluate skeptically.

Cross-Director Patterns

When multiple directors classify the same data:

All say ESSENTIAL — critical path, don’t touch
Mixed (ESSENTIAL + NOISE) — routing problem. Maybe only one director should get it.
All say NOISE — cut it from the pipeline entirely
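
The three patterns above can be written as a small decision function. This is a sketch under the assumption that each director's classification of a data section has already been reduced to one of the four category strings; `cross_director_verdict` and the director keys are hypothetical names.

```python
def cross_director_verdict(by_director: dict[str, str]) -> str:
    """Map one data section's per-director categories to the patterns above."""
    categories = set(by_director.values())
    if categories == {"ESSENTIAL"}:
        return "critical path: don't touch"
    if categories == {"NOISE"}:
        return "cut from the pipeline entirely"
    if "ESSENTIAL" in categories and "NOISE" in categories:
        noisy = sorted(d for d, c in by_director.items() if c == "NOISE")
        return "routing problem: consider dropping it for " + ", ".join(noisy)
    # Combinations the list above does not cover (e.g. USEFUL + NOISE).
    return "no cross-director pattern: review per director"


print(cross_director_verdict({"tech": "ESSENTIAL", "strat": "NOISE"}))
# routing problem: consider dropping it for strat
```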

When to Run Interrogation

Run interrogation after any of these changes:

Change	Why Re-Interrogate
Added/removed data sections from a prompt	New section might be noise; old sections might now be redundant
Changed output format	Fields that were essential might now be unused
Merged or restructured prompt sections	Verify the merge didn’t bury signal
Prompt regime bump	Baseline the new regime’s signal density
Quarterly review	Prompts accumulate cruft over time

History

v5_signal_density_2026-03-05

4-round interrogation across all 4 directors. Key findings acted on:

Finding	Source	Action
Max pain language rules duplicated in 2 places	MS Director	Deduplicated (-16 lines)
Dealer dynamics interpretation matrix = NOISE	MS Director	Removed (-7 lines)
Gamma flip rules duplicated	MS Director	Removed from Critical Rules
Max pain prompt space disproportionate	MS Director	Compressed evidence warning + examples
`agent_contributions`, `search_transparency` = NOISE	News Director	Cut in Phase 1
Fib: 70% referenced, individual level names = noise	Tech Director	Future: compact Fib output format
Strategist uses only 30-40% of inputs	Strategist	Phase 3: cut 10 NOISE sections (-178 lines)

Phase 3 re-interrogation (post v5 KB refresh):

Finding	Source	Action
Extension Analysis = NOISE	Strategist	Removed input section + output schema
Unfilled Gaps = NOISE (“rarely use gap analysis”)	Strategist	Removed input section + gap flags
Trendlines = NOISE (“don’t use trendline data”)	Strategist	Removed input section + trendline flags
Prediction Feedback = NOISE	Strategist	Removed input section + zone_validation_quality
Prior Session VWAP = NOISE	Strategist	Removed input section
Air Pockets = NOISE	Strategist	Removed from intraday input + output schema
Trapdoors = NOISE	Strategist	Removed from output schema
Volume analysis = NOISE	Strategist	Removed volume_signature from output schema
Gap analysis = NOISE	Strategist	Removed gap_status from output schema
Opening Range = NOISE	Strategist	Already absent from input rendering

Phase 4 compactness interrogation (output schema compression):

Finding	Source	Action
entry_paths duplicates entry_zones + scalp_target	Strategist	Entire section removed from LLM output, computed in `full_stack.py`
market_environment has 4 unused fields	Strategist	Cut vix_signal, ticker_vs_sector, environment_quality, environment_notes
directors_summary.market_structure.buying_type redundant with intraday	Strategist	Cut from schema
entry_zones.primary.sources redundant with why	Strategist	Merged into why field
flow_activation.flow_signals over-structured	Strategist	Flattened to {activated, signals_met}
absorption_level + absorption_ratio = 2 fields for 1 datum	Strategist	Merged to single absorption string
options_structure has 3 unused fields	Strategist	Cut tactical_gamma_flip, tactical_gamma_sign, sweep_classification
sweep_vwap_signal redundant with pattern	Strategist	Cut
key_signals unlimited, often 3+ per director	Strategist	Capped at 2 per director
flags unlimited, often 10+	Strategist	Capped at 8
RETEST/SWEEP flags = LLM arithmetic	Strategist	Generated from code-computed entry paths

Result: 37 output fields → 24 fields (about 35% fewer), with ~40% fewer output tokens per report.

Script Architecture

Each script follows the same pattern:

# 1. Load real data (KB + market tools)
fib = get_kb_entry(TICKER, "fib_analysis")
# ...

# 2. Render the real Jinja2 prompt
prompt = load_prompt("director/technical.jinja2", ticker=TICKER, ...)

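# 3. Find the output format section and replace it
marker = "## Output Format"
marker_pos = prompt.find(marker)
if marker_pos == -1:
    # find() returns -1 when the marker is missing; slicing with -1 would silently drop the last character
    raise ValueError(f"{marker!r} not found in rendered prompt")
context_only = prompt[:marker_pos]
full_prompt = context_only + INTERROGATION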

# 4. Send to Gemini (1 call)
response = client.models.generate_content(model=..., contents=full_prompt)

The key insight: the LLM sees exactly the same context it would in production. The only difference is the final instruction — “classify your inputs” instead of “produce a thesis.”
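
The replacement in step 3 is the only part that differs from a production run, so it is worth isolating and testing on its own. The sketch below is a self-contained version of that step. `swap_output_format` is a hypothetical helper, and the sample prompt and interrogation text are placeholders, not Ghost's real template.

```python
def swap_output_format(prompt: str, interrogation: str,
                       marker: str = "## Output Format") -> str:
    """Keep everything before `marker` and append the interrogation block.

    Raises ValueError when the marker is missing, because str.find() would
    return -1 and the slice would silently drop the prompt's last character.
    """
    position = prompt.find(marker)
    if position == -1:
        raise ValueError(f"marker {marker!r} not found in rendered prompt")
    return prompt[:position] + interrogation


# Placeholder prompt standing in for a rendered director template.
rendered = "## Fib Analysis\n(data)\n\n## Output Format\nReturn a JSON thesis.\n"
print(swap_output_format(rendered, "## Interrogation\nClassify your inputs.\n"))
```

Slicing at the marker also discards anything that follows the output format section, so this approach assumes that section comes last in the template.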

Files

scripts/interrogate/
  interrogate_tech_director.py    # Technical Director (Fib/MA/VWAP/Momentum inputs)
  interrogate_news_director.py    # STALE — News Director replaced by News Analyst (single agent)
  interrogate_mkt_structure.py    # STALE — Market Structure Director replaced by mechanical rules
  interrogate_strat.py            # Market Aggregator/Strategist (signal density)
  interrogate_strat_compact.py    # Market Aggregator/Strategist (output compactness)

Related Docs

ONTOLOGY.md: Signal density classification is part of the conceptual framework
EPISTEMOLOGY.md: Interrogation is a verification mechanism (how Ghost validates its own knowledge formation)
CONDITION_ACCURACY.md: Feedback loop measures prediction outcomes; interrogation measures prompt signal quality
PRAXIS.md: Interrogation bridges the gap between prompt design (theory) and LLM behavior (practice)