
Prompt Interrogation Process

Status: Active eval tool
Location: scripts/interrogate/
Cost: 1 Gemini call per director (~$0.01 each)


What It Is

Prompt interrogation is how Ghost evaluates its own signal density. Instead of running a full analysis and guessing what the LLM used, we ask the LLM directly: we send it all the normal inputs but replace the output format with meta-questions about its own synthesis process.

The LLM receives exactly the same context it would in production — real KB data, real market data, real prompt instructions — but instead of producing a thesis, it classifies each input as ESSENTIAL, USEFUL, PASS-THROUGH, or NOISE.


Why It Matters

Prompt bloat is invisible. A section that was essential when written may be noise 3 months later because:

The interrogation process makes this visible. When a director says “I receive dealer dynamics but classify it as NOISE — too speculative for my role,” that’s a concrete, evidence-based signal to cut or gate that section.

Key principle: The LLM that runs the prompt is the best judge of what it actually uses from that prompt.


The Classification Framework

Every input section and output field is classified into one of four categories:

| Category | Definition | Action |
|---|---|---|
| ESSENTIAL | Output changes materially without this | Keep as-is |
| USEFUL | Adds nuance, modifies conviction | Keep, consider compressing |
| PASS-THROUGH | Copied to output with minimal synthesis | Compress or restructure |
| NOISE | Received but rarely changes output | Cut or gate behind a threshold |
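For post-processing interrogation results, the four categories and their actions can be held in a small lookup. This is a minimal sketch, not code from the repository: the names `Classification`, `ACTIONS`, and `recommended_action` are hypothetical, and only the labels and action text come from the table above.

```python
from enum import Enum


class Classification(Enum):
    """The four categories from the classification framework."""
    ESSENTIAL = "ESSENTIAL"
    USEFUL = "USEFUL"
    PASS_THROUGH = "PASS-THROUGH"
    NOISE = "NOISE"


# Action text copied from the framework table.
ACTIONS = {
    Classification.ESSENTIAL: "Keep as-is",
    Classification.USEFUL: "Keep, consider compressing",
    Classification.PASS_THROUGH: "Compress or restructure",
    Classification.NOISE: "Cut or gate behind a threshold",
}


def recommended_action(label: str) -> str:
    """Map a label as written by the LLM to the table's action.

    Raises ValueError if the label is not one of the four categories.
    """
    return ACTIONS[Classification(label.strip().upper())]


print(recommended_action("noise"))
```

Keeping the labels in one place means an unexpected label from the LLM fails loudly instead of being silently ignored.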

How To Run

Prerequisites

Commands

# Run individual director interrogation
uv run python3 scripts/interrogate/interrogate_tech_director.py
uv run python3 scripts/interrogate/interrogate_news_director.py
uv run python3 scripts/interrogate/interrogate_mkt_structure.py
uv run python3 scripts/interrogate/interrogate_strat.py

# Run all four in parallel (recommended)
uv run python3 scripts/interrogate/interrogate_tech_director.py &
uv run python3 scripts/interrogate/interrogate_news_director.py &
uv run python3 scripts/interrogate/interrogate_mkt_structure.py &
uv run python3 scripts/interrogate/interrogate_strat.py &
wait

Each script:

  1. Loads the director’s real Jinja2 template
  2. Populates it with real KB/market data (same as a ghost run)
  3. Replaces the output format section with interrogation questions
  4. Sends to Gemini (1 call, ~$0.01)
  5. Prints the LLM’s self-assessment
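The parallel invocation above relies on a bare shell `wait`, which returns a zero status even if one of the background jobs failed. If per-script exit codes matter, the same fan-out can be driven from Python. The sketch below is an assumption-laden illustration rather than an existing project script: `build_command` and `run_parallel` are hypothetical names, and the script paths are the ones listed in the commands above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Script paths as listed in the commands above.
SCRIPTS = [
    "scripts/interrogate/interrogate_tech_director.py",
    "scripts/interrogate/interrogate_news_director.py",
    "scripts/interrogate/interrogate_mkt_structure.py",
    "scripts/interrogate/interrogate_strat.py",
]


def build_command(script: str) -> list[str]:
    """Build the same `uv run python3 <script>` command used in the shell."""
    return ["uv", "run", "python3", script]


def run_parallel(commands: dict[str, list[str]]) -> dict[str, int]:
    """Run every command concurrently and return the exit code per name."""
    def _run(cmd: list[str]) -> int:
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max(len(commands), 1)) as pool:
        futures = {name: pool.submit(_run, cmd) for name, cmd in commands.items()}
        return {name: future.result() for name, future in futures.items()}


# Usage (not executed here, since it needs uv and the real scripts):
#   codes = run_parallel({s: build_command(s) for s in SCRIPTS})
#   failed = [name for name, code in codes.items() if code != 0]
```

Threads are sufficient here because each worker only blocks on a child process; output from the scripts will interleave on the terminal, just as it does with the shell version.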

Configuring the Ticker

Each script has TICKER = "NVDA" at the top. Change this to interrogate with a different ticker’s data.
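To avoid editing the file for every ticker, the constant could be resolved at startup instead. This is only a sketch of one option: the scripts as described hard-code `TICKER`, and both the `resolve_ticker` helper and the `INTERROGATE_TICKER` environment variable are hypothetical names introduced for illustration.

```python
import os
import sys

DEFAULT_TICKER = "NVDA"  # the default named in the scripts


def resolve_ticker(argv: list[str], env: dict[str, str]) -> str:
    """Pick the ticker: first CLI argument, then env var, then the default."""
    if len(argv) > 1 and argv[1].strip():
        return argv[1].strip().upper()
    from_env = env.get("INTERROGATE_TICKER", "").strip()  # hypothetical variable
    return from_env.upper() if from_env else DEFAULT_TICKER


# Would replace the hard-coded assignment at the top of a script.
TICKER = resolve_ticker(sys.argv, dict(os.environ))
```

With this in place, a hypothetical invocation such as `uv run python3 scripts/interrogate/interrogate_strat.py AAPL` would interrogate with AAPL data while leaving the NVDA default untouched.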


Interrogation Structure

Each script asks 4-5 rounds of questions. The rounds follow a consistent pattern across all four directors:

Round 1: Input Triage

“For each input section, classify as ESSENTIAL / USEFUL / PASS-THROUGH / NOISE.”

This identifies what the LLM actually references when synthesizing its output.
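The scripts only print the self-assessment, so collecting the Round 1 labels is left to the reader. A rough parsing sketch follows. It assumes the LLM answers with one `Section name: CATEGORY` line per input, which the source does not guarantee; `parse_triage` and the sample text are hypothetical.

```python
import re

CATEGORIES = ("ESSENTIAL", "USEFUL", "PASS-THROUGH", "NOISE")

# Matches lines such as "- Fib Analysis: ESSENTIAL (reason...)".
LINE_RE = re.compile(
    r"^\s*[-*]?\s*(?P<section>.+?)\s*:\s*(?P<category>"
    + "|".join(CATEGORIES)
    + r")\b"
)


def parse_triage(response_text: str) -> dict[str, str]:
    """Return {section name: category} for every line that carries a label."""
    result = {}
    for line in response_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            result[match.group("section")] = match.group("category")
    return result


# Invented sample answer, for illustration only.
sample = """\
- Fib Analysis: ESSENTIAL
- Moving Averages: USEFUL (adds nuance)
Dealer Dynamics: NOISE - too speculative for my role
Here is my overall assessment.
"""

triage = parse_triage(sample)
to_cut = [section for section, category in triage.items() if category == "NOISE"]
print(to_cut)
```

Lines without a recognizable label are skipped, so free-form commentary in the response does not break the parse; a real response format may need a looser pattern.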

Round 2: Deep Dive

Specific questions about the highest-priority inputs. Example (Technical Director): “How many Fib confluence zones did you USE vs IGNORE? When Fib overlaps with MA, does Fib add anything?”

Round 3: Output Attribution

“Trace each output field back to its source inputs.” This separates pass-through fields from genuine synthesis.
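Once the attribution answer is transcribed into a mapping, a crude first filter for pass-through candidates is to look for output fields that trace back to exactly one input. The sketch below is an illustrative heuristic, not the project's method: the single-source rule is an assumption, `pass_through_candidates` is a hypothetical helper, and the field and input names are invented sample data.

```python
def pass_through_candidates(attribution: dict[str, list[str]]) -> list[str]:
    """Return output fields attributed to exactly one source input.

    A single source suggests the field may be copied rather than synthesized;
    it still needs a manual look before compressing or restructuring.
    """
    return sorted(
        field for field, sources in attribution.items() if len(sources) == 1
    )


# Invented example of a transcribed Round 3 answer.
attribution = {
    "thesis": ["Fib Analysis", "Moving Averages", "Momentum"],
    "vwap_level": ["VWAP"],
    "key_signals": ["Momentum", "VWAP"],
    "fib_zone": ["Fib Analysis"],
}

print(pass_through_candidates(attribution))
```

Fields fed by several inputs are more likely to reflect real synthesis, which is why they are left out of the candidate list.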

Round 4: Signal Compression

“What is the MINIMUM set of inputs for 90% of the same output? What’s redundant? What’s missing?”

Round 5: Minimum Viable Output / Redesign

“If you could redesign your input, what would it look like? What output fields could be cut?”


How to Interpret Results

High-Signal Findings

Low-Signal Findings

Cross-Director Patterns

When multiple directors classify the same data:


When to Run Interrogation

Run interrogation after any of these changes:

| Change | Why Re-Interrogate |
|---|---|
| Added/removed data sections from a prompt | New section might be noise; old sections might now be redundant |
| Changed output format | Fields that were essential might now be unused |
| Merged or restructured prompt sections | Verify the merge didn’t bury signal |
| Prompt regime bump | Baseline the new regime’s signal density |
| Quarterly review | Prompts accumulate cruft over time |

History

v5_signal_density_2026-03-05

4-round interrogation across all 4 directors. Key findings acted on:

| Finding | Source | Action |
|---|---|---|
| Max pain language rules duplicated in 2 places | MS Director | Deduplicated (-16 lines) |
| Dealer dynamics interpretation matrix = NOISE | MS Director | Removed (-7 lines) |
| Gamma flip rules duplicated | MS Director | Removed from Critical Rules |
| Max pain prompt space disproportionate | MS Director | Compressed evidence warning + examples |
| agent_contributions, search_transparency = NOISE | News Director | Cut in Phase 1 |
| Fib: 70% referenced, individual level names = noise | Tech Director | Future: compact Fib output format |
| Strategist uses only 30-40% of inputs | Strategist | Phase 3: cut 10 NOISE sections (-178 lines) |

Phase 3 re-interrogation (post v5 KB refresh):

| Finding | Source | Action |
|---|---|---|
| Extension Analysis = NOISE | Strategist | Removed input section + output schema |
| Unfilled Gaps = NOISE (“rarely use gap analysis”) | Strategist | Removed input section + gap flags |
| Trendlines = NOISE (“don’t use trendline data”) | Strategist | Removed input section + trendline flags |
| Prediction Feedback = NOISE | Strategist | Removed input section + zone_validation_quality |
| Prior Session VWAP = NOISE | Strategist | Removed input section |
| Air Pockets = NOISE | Strategist | Removed from intraday input + output schema |
| Trapdoors = NOISE | Strategist | Removed from output schema |
| Volume analysis = NOISE | Strategist | Removed volume_signature from output schema |
| Gap analysis = NOISE | Strategist | Removed gap_status from output schema |
| Opening Range = NOISE | Strategist | Already absent from input rendering |

Phase 4 compactness interrogation (output schema compression):

| Finding | Source | Action |
|---|---|---|
| entry_paths duplicates entry_zones + scalp_target | Strategist | Entire section removed from LLM output, computed in full_stack.py |
| market_environment has 4 unused fields | Strategist | Cut vix_signal, ticker_vs_sector, environment_quality, environment_notes |
| directors_summary.market_structure.buying_type redundant with intraday | Strategist | Cut from schema |
| entry_zones.primary.sources redundant with why | Strategist | Merged into why field |
| flow_activation.flow_signals over-structured | Strategist | Flattened to {activated, signals_met} |
| absorption_level + absorption_ratio = 2 fields for 1 datum | Strategist | Merged to single absorption string |
| options_structure has 3 unused fields | Strategist | Cut tactical_gamma_flip, tactical_gamma_sign, sweep_classification |
| sweep_vwap_signal redundant with pattern | Strategist | Cut |
| key_signals unlimited, often 3+ per director | Strategist | Capped at 2 per director |
| flags unlimited, often 10+ | Strategist | Capped at 8 |
| RETEST/SWEEP flags = LLM arithmetic | Strategist | Generated from code-computed entry paths |

Result: 37 output fields → 24 fields. ~40% fewer output tokens per report.


Script Architecture

Each script follows the same pattern:

# 1. Load real data (KB + market tools)
fib = get_kb_entry(TICKER, "fib_analysis")
# ...

# 2. Render the real Jinja2 prompt
prompt = load_prompt("director/technical.jinja2", ticker=TICKER, ...)

# 3. Find the output format section and replace it
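marker = "## Output Format"
marker_index = prompt.find(marker)
if marker_index == -1:
    # find() returns -1 when the marker is missing; slicing with -1 would
    # silently drop only the last character and keep the output format.
    raise ValueError(f"{marker!r} not found in rendered prompt")
context_only = prompt[:marker_index]
full_prompt = context_only + INTERROGATION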

# 4. Send to Gemini (1 call)
response = client.models.generate_content(model=..., contents=full_prompt)

The key insight: the LLM sees exactly the same context it would in production. The only difference is the final instruction — “classify your inputs” instead of “produce a thesis.”


Files

scripts/interrogate/
  interrogate_tech_director.py    # Technical Director (Fib/MA/VWAP/Momentum inputs)
  interrogate_news_director.py    # STALE — News Director replaced by News Analyst (single agent)
  interrogate_mkt_structure.py    # STALE — Market Structure Director replaced by mechanical rules
  interrogate_strat.py            # Market Aggregator/Strategist (signal density)
  interrogate_strat_compact.py    # Market Aggregator/Strategist (output compactness)