I Was Asking AI to Do the Wrong Job

A 5-month journey from manual competitive intelligence to a working augmentation system

The Problem

Every week I spent hours on competitive intelligence: running Google searches one at a time, synthesizing what I found, writing it up, sharing it with the team, and still wondering whether I'd caught everything. The loop was slow. The coverage was spotty. If a story broke on a day I wasn't searching, I missed it.

What I Tried First

First, I built a web scraper with AI relevancy scoring; the sources were unreliable and quality was impossible to control. Then I built an orchestration layer: semantic search, automated analysis, brief generation, all in one run. When I shared it with my VP, the sources didn't hold up under scrutiny, we couldn't ask follow-up questions, and there was no mechanism for the system to improve over time. Two different architectures, same fundamental problem.

"I was asking AI to do the wrong job."

The Insight

The breakthrough wasn't better automation. It was a different question: what should AI do, and what should humans do? Map the division of labor. Let trusted publications curate through their editorial process. Let users signal what matters through simple likes and dislikes. Move AI from the judgment seat to the production seat: given what the user cares about, write the brief in the format leadership already reads.

What Makes It Interesting

ML scoring without API costs

Embedding-based cosine similarity against user preference centroids in pgvector. Free, unlimited, no rate limits. The first scoring layer that filters thousands of stories before any LLM call.
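A minimal sketch of that first scoring layer, assuming illustrative shapes (the real system runs the same cosine math inside pgvector, which exposes it as the `<=>` cosine-distance operator in SQL):

```typescript
// Each story embedding is compared to the user's preference centroid;
// stories below a similarity threshold never reach an LLM call.

type Story = { id: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The preference centroid is just the mean of liked-story embeddings.
function centroid(embeddings: number[][]): number[] {
  const dim = embeddings[0].length;
  const mean = new Array<number>(dim).fill(0);
  for (const e of embeddings) {
    for (let i = 0; i < dim; i++) mean[i] += e[i] / embeddings.length;
  }
  return mean;
}

// Cheap, unlimited prefilter: no API call, no rate limit.
function prefilter(stories: Story[], prefCentroid: number[], threshold = 0.3): Story[] {
  return stories.filter(s => cosineSimilarity(s.embedding, prefCentroid) >= threshold);
}
```

Because likes and dislikes only move the centroid, the filter improves from user signals without any model retraining.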

Curation agents with forced rejection

Version 1 of the scoring prompt: every story came back as a 4 or 5. The LLM defaulted to justifying everything. The fix wasn't a better prompt. It was a structural redesign: the model must produce a rejection list with per-story reasoning. Forced to argue against inclusion, not justify it.
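The structural redesign can be sketched as an output contract plus a guard, with field names assumed for illustration: every candidate must land in exactly one of two lists, and every rejection must carry its own reasoning.

```typescript
// Hypothetical shape of the curation agent's structured output.
type CurationResult = {
  curated: { storyId: string; score: number }[];
  rejected: { storyId: string; reason: string }[];
};

// Guard: every candidate appears exactly once, and every rejection
// carries a non-empty per-story reason. An output that silently
// drops a story, or rejects without arguing, fails validation.
function validateCuration(candidateIds: string[], result: CurationResult): boolean {
  const seen = new Set([
    ...result.curated.map(s => s.storyId),
    ...result.rejected.map(s => s.storyId),
  ]);
  const allAccounted = candidateIds.every(id => seen.has(id));
  const noOverlap = seen.size === result.curated.length + result.rejected.length;
  const reasonsPresent = result.rejected.every(r => r.reason.trim().length > 0);
  return allAccounted && noOverlap && reasonsPresent;
}
```

The contract, not the prompt wording, is what forces the model to argue against inclusion.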

Research depth on demand

Go deeper on any story, ask follow-up questions, get context. The thing the orchestration layer couldn't do. The human decides what's worth investigating. AI does the investigation.

"So what?" built into the output

Most AI summarization stops at "here's what happened." Briefs connect stories to strategic implications: what it means for competitive positioning, what questions leadership should be asking, what to watch next. That translation is encoded in vertical-specific guide files that capture how the organization thinks about each domain.
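A guide file might look something like this sketch (field names and content are illustrative, not taken from the actual system): a small, per-vertical structure that tells the brief-writer what "so what?" means in that domain.

```typescript
// Illustrative shape of a vertical-specific guide file.
type VerticalGuide = {
  vertical: string;
  positioningLens: string[];     // how to frame competitive implications
  leadershipQuestions: string[]; // questions a brief should surface
  watchlist: string[];           // signals to track next
};

const healthTechGuide: VerticalGuide = {
  vertical: "health-tech",
  positioningLens: ["payer relationships", "regulatory posture"],
  leadershipQuestions: ["Does this change our partner calculus?"],
  watchlist: ["FDA clearances", "CMS rule changes"],
};

// The guide is rendered into the brief-generation prompt so the
// "so what?" framing is the organization's, not the model's.
function soWhatSection(guide: VerticalGuide): string {
  return [
    `For ${guide.vertical}, frame implications through: ${guide.positioningLens.join(", ")}.`,
    `Surface questions such as: ${guide.leadershipQuestions.join(" ")}`,
    `Flag what to watch next: ${guide.watchlist.join(", ")}.`,
  ].join("\n");
}
```

Keeping this knowledge in plain data files means a new vertical is a new file, not a new prompt.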

The Sycophancy Fix

The structural redesign was the forced rejection mechanism. The model must produce a full rejection list with per-story reasoning. "For every story you reject, explain why it doesn't make the cut." This flips the default: instead of justifying inclusion, the model has to argue against it. Same principle as "argue against your own position" in debate.

Production data revealed the fix wasn't complete. The model curated a story it recognized as irrelevant in its own reasoning: it labeled "Lamar Jackson" as a sports story, then included it anyway. Fix: a structural constraint ensuring reasoning classification matches output placement. The pattern: treat prompt engineering like debugging code. Isolate the failure mode, write a test case, apply a targeted constraint.
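That test-case discipline can be sketched as a consistency check (shapes assumed for illustration): if the model's own reasoning classifies a story as off-topic, the story must not appear in the curated output.

```typescript
// Each scored story carries both the model's reasoning label and
// where the model actually placed it.
type ScoredStory = {
  storyId: string;
  reasoningLabel: "relevant" | "off-topic";
  placement: "curated" | "rejected";
};

// Returns ids where reasoning and placement contradict each other,
// i.e. the failure mode seen in production.
function findContradictions(stories: ScoredStory[]): string[] {
  return stories
    .filter(s => s.reasoningLabel === "off-topic" && s.placement === "curated")
    .map(s => s.storyId);
}

// Regression case distilled from the production failure: a sports
// story labeled off-topic in the reasoning but curated anyway.
const regression: ScoredStory[] = [
  { storyId: "lamar-jackson", reasoningLabel: "off-topic", placement: "curated" },
  { storyId: "acq-announcement", reasoningLabel: "relevant", placement: "curated" },
];
```

Once the failure is a data structure, it becomes a permanent regression test (the stack's Vitest suite is the natural home for it) rather than a one-off prompt tweak.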

5 months of iteration · 3 architectures · 10 hrs/month → 30 min/month · 7,874 stories processed

What I'd Do Differently

Start with the augmentation question: what should AI do, and what should humans do? I built two full systems before asking it.

Try It

Health tech version. Same architecture, different vertical.

Open live demo
Built with React + TypeScript · Supabase + pgvector · Deno Edge Functions · Gemini · Vitest