Evident
An evidence-grounded AI decision system that ranks outreach targets, cites why, and explicitly refuses to recommend when confidence is too low, with bounded cost and a full audit trail.
The problem
LLMs tend to produce confident-sounding answers even when the evidence behind them is thin. For a system that decides who is worth contacting, that failure mode isn't just a bad output; it's a bad decision made at scale. Evident is built around the opposite principle: every decision is grounded in retrieved public evidence, the reasoning is exposed, and the system explicitly returns "insufficient evidence" rather than guessing. It also caps API spend so runs stay predictable.
What it does
- Takes a faculty/people directory URL plus a research interest as input.
- Returns a ranked shortlist of contacts, each with reasoning, cited evidence, and a personalized outreach draft.
- Produces three explicit outcomes per contact: recommended, not_recommended, or insufficient_evidence (an explicit refusal).
- Runs a deterministic pre-filter to remove weak candidates before spending model budget.
- Streams live progress to the UI via Server-Sent Events.
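As a rough illustration of the live-progress stream, here is a minimal sketch of how progress updates could be framed as Server-Sent Events. The event names (progress, done), the payload fields, and both function names are assumptions made for this example; they are not taken from Evident's code.

```python
import json
from typing import Iterable, Iterator


def sse_format(event: str, payload: dict) -> str:
    """Serialize one SSE frame: an event line, a JSON data line, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def progress_stream(stages: Iterable[str]) -> Iterator[str]:
    """Yield one 'progress' frame per pipeline stage, then a final 'done' frame.

    The stage names are supplied by the caller; they are hypothetical here.
    """
    stage_list = list(stages)
    for step, stage in enumerate(stage_list, start=1):
        yield sse_format(
            "progress", {"stage": stage, "step": step, "total": len(stage_list)}
        )
    yield sse_format("done", {"ok": True})


if __name__ == "__main__":
    for frame in progress_stream(["ingest", "prefilter", "evaluate"]):
        print(frame, end="")
```

A web framework would typically return such a generator with the text/event-stream content type so the UI can render each stage as it completes.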
Approach & architecture
Ingestion is pluggable behind a ContactSource interface. The built-in deterministic faculty parser is one source; a vendored, profile-driven scraper engine (evidence_scraper) is another, which lets Evident pull contacts from many more site layouts than a single hardcoded parser could. Contacts are cleaned and enriched with evidence chunks and identity scoring, then passed through a deterministic pre-filter. Only the survivors reach the LLM evaluation step. Uncertain contacts enter a bounded agentic loop: at most one adaptive retrieval pass and one re-evaluation, after which the system finalizes. The loop stays explainable rather than open-ended.
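The pluggable-source idea can be sketched with a structural interface. Only the names ContactSource, RawContact, and evidence_scraper come from the text above; the fetch method, the field layout, StaticSource, and the pre-filter rule are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class RawContact:
    name: str
    title: str = ""
    email: str = ""
    research_text: str = ""
    evidence: list[str] = field(default_factory=list)


class ContactSource(Protocol):
    """Anything with a matching fetch() can act as a source (hypothetical signature)."""

    def fetch(self, directory_url: str) -> list[RawContact]: ...


class StaticSource:
    """Stand-in source for the example; a real one would parse or scrape the URL."""

    def __init__(self, contacts: list[RawContact]) -> None:
        self._contacts = list(contacts)

    def fetch(self, directory_url: str) -> list[RawContact]:
        return list(self._contacts)


def prefilter(contacts: list[RawContact], min_evidence: int = 1) -> list[RawContact]:
    """Deterministic filter (assumed rule): keep contacts with research text and evidence."""
    return [
        c for c in contacts
        if c.research_text.strip() and len(c.evidence) >= min_evidence
    ]


def ingest(source: ContactSource, directory_url: str) -> list[RawContact]:
    """Fetch, deduplicate on normalized (name, email), then pre-filter."""
    seen: set[tuple[str, str]] = set()
    unique: list[RawContact] = []
    for contact in source.fetch(directory_url):
        key = (contact.name.strip().lower(), contact.email.strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(contact)
    return prefilter(unique)
```

Because ingest depends only on the interface, swapping the directory parser for a scraper-backed source would not change the downstream cleaning or filtering code.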
How it fits together
Evidence-grounded decision pipeline
Retrieval-grounded evaluation that ranks outreach targets, cites its evidence, and refuses when support is too thin.
- Contact source (pluggable): directory parser or evidence_scraper → RawContact[] (name, title, email, research text, evidence, identity signals).
- Clean & enrich: dedup, evidence chunks, identity scoring.
- Deterministic pre-filter: drops weak candidates before any model spend.
- LLM evaluation (triage model): cheap first pass over the shortlist (Claude, retrieval-backed).
- Refuse-when-weak gate: structural floor enforced across every path, so an over-confident model cannot upgrade a thin contact.
- Bounded loop (escalate uncertain): at most one adaptive retrieval plus one re-evaluation on the primary model, then finalize.
- Hybrid rank → drafts (top only): AI fit + evidence strength + seniority; results are persisted with a full audit trail.
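The final ranking step combines AI fit, evidence strength, and seniority. One plausible way to do that is a weighted sum, sketched below. The weights, the 0 to 1 score range, and the names Scored, hybrid_score, and draft_targets are all assumptions for illustration; the source does not say how the three signals are actually combined.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    name: str
    ai_fit: float             # model-judged fit, assumed to be in [0, 1]
    evidence_strength: float  # how well the evidence supports the fit, in [0, 1]
    seniority: float          # normalized seniority signal, in [0, 1]


# Hypothetical weights; the real system may weight or combine signals differently.
WEIGHTS = {"ai_fit": 0.5, "evidence_strength": 0.3, "seniority": 0.2}


def hybrid_score(c: Scored) -> float:
    """Weighted sum of the three ranking signals."""
    return (
        WEIGHTS["ai_fit"] * c.ai_fit
        + WEIGHTS["evidence_strength"] * c.evidence_strength
        + WEIGHTS["seniority"] * c.seniority
    )


def rank(contacts: list[Scored]) -> list[Scored]:
    """Highest hybrid score first."""
    return sorted(contacts, key=hybrid_score, reverse=True)


def draft_targets(contacts: list[Scored], top_k: int = 2) -> list[str]:
    """Only the top-ranked contacts get an outreach draft, which bounds draft spend."""
    return [c.name for c in rank(contacts)[:top_k]]
```

Keeping the per-signal values alongside the total is what would let a score breakdown appear in the audit trail.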
Key engineering decisions
A single refuse-when-weak gate across every path
The uncertainty gate is applied uniformly across the LLM path, the heuristic fallback, and the second pass, so an over-confident model can't upgrade a thin contact to "recommended."
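A minimal sketch of such a gate follows. The three outcome strings come from the list of outcomes above; the Verdict fields, the two thresholds, and the rule that only "recommended" verdicts are downgraded are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(str, Enum):
    RECOMMENDED = "recommended"
    NOT_RECOMMENDED = "not_recommended"
    INSUFFICIENT_EVIDENCE = "insufficient_evidence"


@dataclass(frozen=True)
class Verdict:
    outcome: Outcome
    confidence: float    # evaluator's stated confidence, assumed in [0, 1]
    evidence_count: int  # number of retrieved evidence chunks cited


# Hypothetical floors; the actual thresholds are not given in the source.
MIN_EVIDENCE = 2
MIN_CONFIDENCE = 0.6


def apply_gate(v: Verdict) -> Verdict:
    """Downgrade a 'recommended' verdict that lacks evidence or confidence.

    The evidence floor is checked independently of the stated confidence,
    so a highly confident evaluator cannot push a thin contact through.
    """
    too_thin = v.evidence_count < MIN_EVIDENCE or v.confidence < MIN_CONFIDENCE
    if v.outcome is Outcome.RECOMMENDED and too_thin:
        return Verdict(Outcome.INSUFFICIENT_EVIDENCE, v.confidence, v.evidence_count)
    return v
```

Routing the LLM path, the heuristic fallback, and the second pass through the same function is what would make the floor uniform: no path has its own, weaker copy of the check.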
Cost-safe LLM usage
Explicit timeouts, automatic retries with backoff, a deterministic pre-filter to avoid wasted calls, and crash-safe response parsing. Per-run caps bound evaluations, drafts, retries, and outbound fetches.
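These safeguards can be sketched together in a few lines. Everything named here (RunBudget, BudgetExceeded, call_with_retries, parse_response) is hypothetical, and the built-in TimeoutError stands in for whatever timeout exception a real model client raises.

```python
import json
import time
from typing import Callable, Optional


class BudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its per-run call cap."""


class RunBudget:
    """Counts every attempt, including retries, against a fixed per-run cap."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.used = 0

    def spend(self) -> None:
        if self.used >= self.max_calls:
            raise BudgetExceeded(f"cap of {self.max_calls} calls reached")
        self.used += 1


def call_with_retries(
    fn: Callable[[], str],
    budget: RunBudget,
    max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fn, retrying timeouts with exponential backoff (0.5s, 1s, ...)."""
    for attempt in range(max_retries + 1):
        budget.spend()  # a retry costs budget too
        try:
            return fn()
        except TimeoutError:
            if attempt == max_retries:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


def parse_response(raw: str) -> Optional[dict]:
    """Crash-safe parse: malformed or non-object JSON yields None instead of raising."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None
```

A caller that receives None from parse_response can fall back to a heuristic path rather than failing the whole run.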
Pluggable ingestion engine
A vendored, profile-driven Playwright scraper (evidence_scraper), validated against 12+ real directory layouts across universities and law firms. Discovery is hardened with timeouts, scope limits, a lean tool schema, anchor-text recovery, and SPA hydration waits.
Bounded agentic loop
At most one adaptive retrieval plus one re-evaluation before finalizing. This keeps the reasoning explainable and the cost predictable instead of allowing an open-ended agent to spiral.
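The bound can be made concrete with a short sketch. The callables evaluate and retrieve_more, the "uncertain" marker, and the choice to finalize a still-uncertain contact as insufficient_evidence are assumptions made for this example.

```python
from typing import Callable

UNCERTAIN = "uncertain"  # hypothetical intermediate verdict


def evaluate_bounded(
    contact: str,
    evidence: list[str],
    evaluate: Callable[[str, list[str]], str],
    retrieve_more: Callable[[str, list[str]], list[str]],
    max_extra_passes: int = 1,
) -> tuple[str, list[str]]:
    """Evaluate once, then allow at most max_extra_passes retrieval + re-eval rounds.

    Returns the final verdict and the list of intermediate verdicts,
    which can serve as a decision-revision history.
    """
    verdict = evaluate(contact, evidence)
    trail = [verdict]
    passes = 0
    while verdict == UNCERTAIN and passes < max_extra_passes:
        evidence = evidence + retrieve_more(contact, evidence)  # adaptive retrieval
        verdict = evaluate(contact, evidence)                   # single re-evaluation
        trail.append(verdict)
        passes += 1
    if verdict == UNCERTAIN:
        # Budget for this contact is spent: refuse rather than keep looping.
        verdict = "insufficient_evidence"
        trail.append(verdict)
    return verdict, trail
```

With the default of one extra pass, the evaluator is called at most twice per contact, so the worst-case cost of a run is known before it starts.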
Results & outcomes
- Ranked targets with cited reasoning, explicit refusals on weak evidence, and roughly 60% fewer unnecessary model calls.
- Full auditability per contact: score breakdown, cited evidence, confidence justification, and decision-revision history.
- Deployed as a Docker image on AWS ECS/Fargate, run on-demand to control cost, with the API key injected via AWS Secrets Manager.
A short writeup on evidence grounding, the single refusal gate, and the bounded agentic loop.
