Prompt Interrogation Process
Status: Active eval tool
Location: scripts/interrogate/
Cost: 1 Gemini call per director (~$0.01 each)
What It Is
Prompt interrogation is how Ghost evaluates its own signal density. Instead of running a full analysis and guessing what the LLM used, we ask the LLM directly by sending it all normal inputs but replacing the output format with meta-questions about its own synthesis process.
The LLM receives exactly the same context it would in production — real KB data, real market data, real prompt instructions — but instead of producing a thesis, it classifies each input as ESSENTIAL, USEFUL, PASS-THROUGH, or NOISE.
Why It Matters
Prompt bloat is invisible. A section that was essential when written may be noise three months later because:
- Other sections now cover the same signal
- The output format changed and the field is no longer consumed
- The data source became unreliable and the section is always caveated away
The interrogation process makes this visible. When a director says “I receive dealer dynamics but classify it as NOISE — too speculative for my role,” that’s a concrete, evidence-based signal to cut or gate that section.
Key principle: The LLM that runs the prompt is the best judge of what it actually uses from that prompt.
The Classification Framework
Every input section and output field is classified into one of four categories:
| Category | Definition | Action |
|---|---|---|
| ESSENTIAL | Output changes materially without this | Keep as-is |
| USEFUL | Adds nuance, modifies conviction | Keep, consider compressing |
| PASS-THROUGH | Copied to output with minimal synthesis | Compress or restructure |
| NOISE | Received but rarely changes output | Cut or gate behind a threshold |
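The table above can be sketched as a small lookup, which helps when tallying an interrogation response by hand or in a notebook. This is a minimal illustration, not code from the Ghost repository: `Category`, `ACTIONS`, `parse_category`, and `action_for` are hypothetical names, and the label normalization is an assumption about how loosely the LLM might spell a category.

```python
from enum import Enum


class Category(Enum):
    ESSENTIAL = "ESSENTIAL"
    USEFUL = "USEFUL"
    PASS_THROUGH = "PASS-THROUGH"
    NOISE = "NOISE"


# Follow-up action for each category, copied from the table above.
ACTIONS = {
    Category.ESSENTIAL: "Keep as-is",
    Category.USEFUL: "Keep, consider compressing",
    Category.PASS_THROUGH: "Compress or restructure",
    Category.NOISE: "Cut or gate behind a threshold",
}


def parse_category(label: str) -> Category:
    """Map a label from the LLM's answer to a Category.

    Tolerates case differences and "pass through" / "pass_through" spellings
    (an assumption); raises ValueError for anything else.
    """
    normalized = label.strip().upper().replace("_", "-").replace(" ", "-")
    return Category(normalized)


def action_for(label: str) -> str:
    """Return the recommended action for a classification label."""
    return ACTIONS[parse_category(label)]
```

Raising on an unknown label, rather than defaulting to a category, keeps a malformed response from being silently counted as signal or noise.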
How To Run
Prerequisites
- A recent ghost run TICKER, so the KB has fresh data
- Gemini API credentials configured (same as normal Ghost runs)
Commands
# Run individual director interrogation
uv run python3 scripts/interrogate/interrogate_tech_director.py
uv run python3 scripts/interrogate/interrogate_news_director.py
uv run python3 scripts/interrogate/interrogate_mkt_structure.py
uv run python3 scripts/interrogate/interrogate_strat.py
# Run all four in parallel (recommended)
uv run python3 scripts/interrogate/interrogate_tech_director.py &
uv run python3 scripts/interrogate/interrogate_news_director.py &
uv run python3 scripts/interrogate/interrogate_mkt_structure.py &
uv run python3 scripts/interrogate/interrogate_strat.py &
wait
Each script:
- Loads the director’s real Jinja2 template
- Populates it with real KB/market data (same as a ghost run)
- Replaces the output format section with interrogation questions
- Sends to Gemini (1 call, ~$0.01)
- Prints the LLM’s self-assessment
Configuring the Ticker
Each script has TICKER = "NVDA" at the top. Change this to interrogate with a different ticker’s data.
Interrogation Structure
Each script asks 4-5 rounds of questions. The rounds follow a consistent pattern across all four directors:
Round 1: Input Triage
“For each input section, classify as ESSENTIAL / USEFUL / PASS-THROUGH / NOISE.”
This identifies what the LLM actually references when synthesizing its output.
Round 2: Deep Dive
Specific questions about the highest-priority inputs. Example (Technical Director): “How many Fib confluence zones did you USE vs IGNORE? When Fib overlaps with MA, does Fib add anything?”
Round 3: Output Attribution
“Trace each output field back to its source inputs.” This reveals pass-through fields and genuine synthesis.
Round 4: Signal Compression
“What is the MINIMUM set of inputs for 90% of the same output? What’s redundant? What’s missing?”
Round 5: Minimum Viable Output / Redesign
“If you could redesign your input, what would it look like? What output fields could be cut?”
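The five rounds above could be assembled into a single interrogation block along these lines. This is a sketch under assumptions: the question wording is taken from this section (Round 2 uses the Technical Director example), while `ROUNDS`, `build_interrogation`, and the heading text are hypothetical and may not match what the real scripts emit.

```python
# (title, question) pairs in the order described above.
ROUNDS = [
    ("Input Triage",
     "For each input section, classify as ESSENTIAL / USEFUL / PASS-THROUGH / NOISE."),
    ("Deep Dive",
     "How many Fib confluence zones did you USE vs IGNORE? "
     "When Fib overlaps with MA, does Fib add anything?"),
    ("Output Attribution",
     "Trace each output field back to its source inputs."),
    ("Signal Compression",
     "What is the MINIMUM set of inputs for 90% of the same output? "
     "What's redundant? What's missing?"),
    ("Minimum Viable Output / Redesign",
     "If you could redesign your input, what would it look like? "
     "What output fields could be cut?"),
]


def build_interrogation(rounds, heading="## Interrogation"):
    """Render the rounds as one text block to append to the prompt context."""
    lines = [heading, ""]
    for number, (title, question) in enumerate(rounds, start=1):
        lines.append(f"### Round {number}: {title}")
        lines.append(question)
        lines.append("")  # blank line between rounds
    return "\n".join(lines)


INTERROGATION = build_interrogation(ROUNDS)
```

Keeping the rounds as data makes it easy to swap in a director-specific Round 2 while leaving the other rounds identical across scripts.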
How to Interpret Results
High-Signal Findings
- Input classified NOISE by the director that receives it — strong signal to cut or gate
- Input classified PASS-THROUGH — candidate for pre-synthesis (compute upstream, send just the result)
- Duplicate content across sections — merge into one location
- “90% of output from N inputs” — the inputs NOT in that list are compression candidates
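The last finding above is a set difference: whatever the prompt sends minus whatever the director lists in Round 4. A minimal sketch follows; `compression_candidates` is a hypothetical helper, and the section names in the example are illustrative, not the real template's section list.

```python
def compression_candidates(all_inputs, minimum_set):
    """Return the inputs missing from the Round 4 minimum set, in prompt order."""
    keep = set(minimum_set)
    unknown = keep - set(all_inputs)
    if unknown:
        # The LLM named a section the prompt never sent: check spelling first.
        raise ValueError(f"minimum set names unknown inputs: {sorted(unknown)}")
    return [name for name in all_inputs if name not in keep]


# Illustrative section names only.
sections = ["fib_analysis", "moving_averages", "vwap", "momentum"]
candidates = compression_candidates(sections, ["moving_averages", "vwap"])
```

Here `candidates` holds `fib_analysis` and `momentum`, the sections to review for compression or gating.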
Low-Signal Findings
- “USEFUL” — vague, doesn’t tell you what to do. Ask follow-up in Round 2.
- “I would add X” — feature requests from the LLM. Evaluate skeptically.
Cross-Director Patterns
When multiple directors classify the same data:
- All say ESSENTIAL — critical path, don’t touch
- Mixed (ESSENTIAL + NOISE) — routing problem. Maybe only one director should get it.
- All say NOISE — cut it from the pipeline entirely
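These three patterns can be expressed as a small decision function over one data section's labels. This is a sketch only: `routing_recommendation` is a hypothetical name, and the fallback for combinations the list above does not cover (for example USEFUL plus PASS-THROUGH) is an assumption.

```python
def routing_recommendation(labels_by_director):
    """Turn per-director labels for one data section into a recommendation.

    `labels_by_director` maps a director name to its classification label.
    """
    labels = set(labels_by_director.values())
    if not labels:
        raise ValueError("no classifications supplied")
    if labels == {"ESSENTIAL"}:
        return "critical path: do not touch"
    if labels == {"NOISE"}:
        return "cut from the pipeline entirely"
    if {"ESSENTIAL", "NOISE"} <= labels:
        return "routing problem: send only to the directors that use it"
    # Not covered by the patterns above; handled per director (assumption).
    return "no cross-director pattern: review per director"
```

For example, `routing_recommendation({"tech": "ESSENTIAL", "strategist": "NOISE"})` reports a routing problem, which matches the mixed case above.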
When to Run Interrogation
Run interrogation after any of these changes:
| Change | Why Re-Interrogate |
|---|---|
| Added/removed data sections from a prompt | New section might be noise; old sections might now be redundant |
| Changed output format | Fields that were essential might now be unused |
| Merged or restructured prompt sections | Verify the merge didn’t bury signal |
| Prompt regime bump | Baseline the new regime’s signal density |
| Quarterly review | Prompts accumulate cruft over time |
History
v5_signal_density_2026-03-05
4-round interrogation across all 4 directors. Key findings acted on:
| Finding | Source | Action |
|---|---|---|
| Max pain language rules duplicated in 2 places | MS Director | Deduplicated (-16 lines) |
| Dealer dynamics interpretation matrix = NOISE | MS Director | Removed (-7 lines) |
| Gamma flip rules duplicated | MS Director | Removed from Critical Rules |
| Max pain prompt space disproportionate | MS Director | Compressed evidence warning + examples |
| agent_contributions, search_transparency = NOISE | News Director | Cut in Phase 1 |
| Fib: 70% referenced, individual level names = noise | Tech Director | Future: compact Fib output format |
| Strategist uses only 30-40% of inputs | Strategist | Phase 3: cut 10 NOISE sections (-178 lines) |
Phase 3 re-interrogation (post v5 KB refresh):
| Finding | Source | Action |
|---|---|---|
| Extension Analysis = NOISE | Strategist | Removed input section + output schema |
| Unfilled Gaps = NOISE (“rarely use gap analysis”) | Strategist | Removed input section + gap flags |
| Trendlines = NOISE (“don’t use trendline data”) | Strategist | Removed input section + trendline flags |
| Prediction Feedback = NOISE | Strategist | Removed input section + zone_validation_quality |
| Prior Session VWAP = NOISE | Strategist | Removed input section |
| Air Pockets = NOISE | Strategist | Removed from intraday input + output schema |
| Trapdoors = NOISE | Strategist | Removed from output schema |
| Volume analysis = NOISE | Strategist | Removed volume_signature from output schema |
| Gap analysis = NOISE | Strategist | Removed gap_status from output schema |
| Opening Range = NOISE | Strategist | Already absent from input rendering |
Phase 4 compactness interrogation (output schema compression):
| Finding | Source | Action |
|---|---|---|
| entry_paths duplicates entry_zones + scalp_target | Strategist | Entire section removed from LLM output, computed in full_stack.py |
| market_environment has 4 unused fields | Strategist | Cut vix_signal, ticker_vs_sector, environment_quality, environment_notes |
| directors_summary.market_structure.buying_type redundant with intraday | Strategist | Cut from schema |
| entry_zones.primary.sources redundant with why | Strategist | Merged into why field |
| flow_activation.flow_signals over-structured | Strategist | Flattened to {activated, signals_met} |
| absorption_level + absorption_ratio = 2 fields for 1 datum | Strategist | Merged to single absorption string |
| options_structure has 3 unused fields | Strategist | Cut tactical_gamma_flip, tactical_gamma_sign, sweep_classification |
| sweep_vwap_signal redundant with pattern | Strategist | Cut |
| key_signals unlimited, often 3+ per director | Strategist | Capped at 2 per director |
| flags unlimited, often 10+ | Strategist | Capped at 8 |
| RETEST/SWEEP flags = LLM arithmetic | Strategist | Generated from code-computed entry paths |
Result: 37 output fields → 24 fields. ~40% fewer output tokens per report.
Script Architecture
Each script follows the same pattern:
# 1. Load real data (KB + market tools)
fib = get_kb_entry(TICKER, "fib_analysis")
# ...
# 2. Render the real Jinja2 prompt
prompt = load_prompt("director/technical.jinja2", ticker=TICKER, ...)
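# 3. Find the output format section and replace it
marker = "## Output Format"
marker_pos = prompt.find(marker)
if marker_pos == -1:
    # find() returns -1 when the marker is absent; slicing with -1 would silently drop the last character
    raise ValueError(f"{marker!r} not found in rendered prompt")
context_only = prompt[:marker_pos]
full_prompt = context_only + INTERROGATION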
# 4. Send to Gemini (1 call)
response = client.models.generate_content(model=..., contents=full_prompt)
The key insight: the LLM sees exactly the same context it would in production. The only difference is the final instruction — “classify your inputs” instead of “produce a thesis.”
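The swap step can be tried in isolation, without KB data or an API call. The sketch below uses a stand-in prompt string. `swap_output_section` and `demo_prompt` are hypothetical, and only the "## Output Format" marker comes from the snippet above.

```python
OUTPUT_MARKER = "## Output Format"


def swap_output_section(rendered_prompt, interrogation, marker=OUTPUT_MARKER):
    """Keep everything before the output-format marker and append the interrogation."""
    head, found, _tail = rendered_prompt.partition(marker)
    if not found:
        # Fail loudly: without the marker the production output instructions
        # would stay in the prompt and the LLM would produce a thesis.
        raise ValueError(f"{marker!r} not found in rendered prompt")
    return head + interrogation


# Stand-in for a rendered director prompt (illustrative content only).
demo_prompt = (
    "## Fib Analysis\n"
    "(real KB data would appear here)\n"
    "\n"
    "## Output Format\n"
    "Return a thesis as JSON.\n"
)
demo = swap_output_section(
    demo_prompt, "## Interrogation\nClassify each input section.\n"
)
```

After the swap, `demo` still begins with the Fib Analysis context but ends with the interrogation text instead of the thesis instructions.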
Files
scripts/interrogate/
interrogate_tech_director.py # Technical Director (Fib/MA/VWAP/Momentum inputs)
interrogate_news_director.py # STALE — News Director replaced by News Analyst (single agent)
interrogate_mkt_structure.py # STALE — Market Structure Director replaced by mechanical rules
interrogate_strat.py # Market Aggregator/Strategist (signal density)
interrogate_strat_compact.py # Market Aggregator/Strategist (output compactness)
Related Documentation
- ONTOLOGY.md: Signal density classification is part of the conceptual framework
- EPISTEMOLOGY.md: Interrogation is a verification mechanism (how Ghost validates its own knowledge formation)
- CONDITION_ACCURACY.md: Feedback loop measures prediction outcomes; interrogation measures prompt signal quality
- PRAXIS.md: Interrogation bridges the gap between prompt design (theory) and LLM behavior (practice)