The Content Engine: Multi-Agent Pipeline That Produced 4,884 Articles
TLDR
A 5-agent content pipeline: research, draft, score, revise, and publish. The system produced 4,884 articles across one production run and 34 comparison blog posts in a single day — with ~90% passing quality review on the first draft.
But the agents aren't the impressive part. The context layer underneath them is. The difference between those posts and AI slop came down to a handful of documents that every agent reads before writing a single word: messaging framework, brand voice guide, competitive positioning, product knowledge. Without those docs, you get content that sounds like a chatbot trying to impress a marketing director. With them, you get content that sounds like your company actually wrote it.
What you get:
- Research agent that expands outlines with real data, stats, and examples
- Draft agent that writes in your brand voice (not generic AI voice)
- Scoring agent with a rubric that catches AI-sounding patterns
- Human-in-the-loop revision workflow (not fully automated — and that's on purpose)
- Publish queue with human approval gate before anything goes live
Not a builder? I build production versions of this system for a flat fee — context layer calibrated to your brand, agents tuned to your voice, quality gates included. Book a 30-minute scoping call to get a build spec. Or keep reading for the full breakdown.
Step 0: Build Your Context Layer
This is 2-4 hours of work. It's the difference between AI slop and content that sounds like your company.
The context layer is a collection of documents that every agent reads before doing any work. Think of it as the difference between telling someone "write a blog post about X" and handing them your brand guide, messaging framework, competitive positioning, and product docs, then telling them "write a blog post about X."
When I built this for the 4,884-article run, the context layer took a Saturday afternoon. Every article after that inherited the same voice, positioning, and accuracy guardrails. The per-post cost of that context investment approaches zero at scale.
What goes in the context layer:
context_layer:
messaging.md: "Core messaging framework — one-liner, value props, positioning, proof points"
product-knowledge.md: "What your product does — features, use cases, limitations (prevents unsupported claims)"
competitive.md: "How you compare to alternatives — honest, not hype"
brand-voice.md: "How you sound — with do's, don'ts, and before/after examples"
customer-pain.md: "Real pain points from real conversations"
terminology.md: "Approved terms, forbidden terms, jargon glossary"
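To make the injection concrete, here's a minimal sketch in TypeScript (the production run used Node.js) of loading these docs once and prepending them to agent prompts. The file names match the list above; the directory location and helper names are illustrative, not the production code.

// Minimal sketch: load every context doc once, inject into each agent's prompt.
// Assumes the docs live in a local context/ directory (assumption, not required).
import { readFileSync } from "node:fs";
import { join } from "node:path";

const CONTEXT_DIR = "context";
const CONTEXT_FILES = [
  "messaging.md",
  "product-knowledge.md",
  "competitive.md",
  "brand-voice.md",
  "customer-pain.md",
  "terminology.md",
];

// Read all docs at startup; every agent call reuses the same strings.
export const contextLayer: Record<string, string> = Object.fromEntries(
  CONTEXT_FILES.map((f) => [f, readFileSync(join(CONTEXT_DIR, f), "utf8")])
);

// Prepend the selected docs to an agent prompt.
export function withContext(docs: string[], prompt: string): string {
  const header = docs
    .map((d) => `=== ${d} ===\n${contextLayer[d]}`)
    .join("\n\n");
  return `${header}\n\n${prompt}`;
}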
Messaging Framework Template
# Messaging Framework
## One-liner
[Your product] helps [target audience] [achieve outcome] by [mechanism].
## Core value propositions
1. [Value prop 1] — [supporting proof point]
2. [Value prop 2] — [supporting proof point]
3. [Value prop 3] — [supporting proof point]
## Positioning statement
For [target buyer] who [pain point], [product] is the [category]
that [key differentiator]. Unlike [alternative], we [unique advantage].
## Proof points
- [Specific metric or customer result]
- [Specific metric or customer result]
## What we are NOT
- We are not [common misconception]
- We do not [thing people assume we do]
Brand Voice Template
This is the single most impactful file in the context layer. When the brand voice doc says "no transition stuffing," the agent actually follows it.
# Brand Voice
## Identity
We sound like: a smart colleague explaining something over coffee.
We do NOT sound like: a marketing brochure trying to impress.
## Sentence rules
- First sentence of every paragraph: under 7 words
- Mix lengths: 20% short (1-4 words), 50% medium (5-7), 30% long (8-15)
- Active voice. "We built X" not "X was built by us"
- 80-90% one- and two-syllable words
## Vocabulary
DO: helps, works with, is, built, runs, handles
DON'T: empowers, seamlessly integrates, serves as, leverages, drives
## Paragraph rules
- Max 4 sentences per paragraph
- Lead with the point
- One idea per paragraph
## Forbidden patterns
- "Dive into," "delve into," "game-changing," "leverage"
- "Furthermore," "Moreover," "Additionally"
- "Innovative," "cutting-edge," "groundbreaking"
- Exclamation points (almost never)
- Rhetorical questions as transitions
## Before/After examples
BAD: "Our innovative pipeline leverages cutting-edge AI to process prospects at scale."
GOOD: "The pipeline processes 500 prospects in 20 minutes."
BAD: "For optimal performance with larger datasets, we recommend leveraging our batch processing capabilities."
GOOD: "This breaks at 10,000 rows. Use batch processing instead."
These examples do more than all the rules combined. The LLM learns by example — three good before/after pairs are worth a page of instructions.
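The forbidden-pattern list is also cheap to enforce mechanically, before the scoring agent ever runs. A minimal sketch using the patterns listed above; in practice you'd load your own list from brand-voice.md:

// Sketch of a pre-scoring lint pass for forbidden patterns. The pattern
// list here is illustrative; pull the real one from your brand voice doc.
const FORBIDDEN = [
  /\bdive into\b/i, /\bdelve\b/i, /\bgame-changing\b/i, /\bleverage/i,
  /\bFurthermore\b/, /\bMoreover\b/, /\bAdditionally\b/,
  /\binnovative\b/i, /\bcutting-edge\b/i, /\bgroundbreaking\b/i,
];

export function lintDraft(draft: string): string[] {
  const hits: string[] = [];
  draft.split("\n").forEach((line, i) => {
    for (const pattern of FORBIDDEN) {
      if (pattern.test(line)) hits.push(`line ${i + 1}: matches ${pattern}`);
    }
  });
  return hits; // empty array = clean draft
}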
<!-- INSERT IMAGE: CONTEXT LAYER STRUCTURE — Visual showing docs (messaging, brand voice, competitive, product knowledge) feeding into all agents (see prompts/content-engine-visuals.md #2) -->
<!-- INSERT IMAGE: BEFORE/AFTER — "AI content without context layer" (generic, bland) vs. "with context layer" (branded, specific) (see prompts/content-engine-visuals.md #5) -->
Competitive Positioning Template
# Competitive Landscape
## [Competitor A]
- What they do well: [honest assessment]
- Where they fall short: [specific, not generic]
- Our angle: [why someone would choose us instead]
- When to recommend them: [yes, sometimes they're the better fit]
## [Competitor B]
[Same structure]
Step 1: Research Agent
Takes a topic outline and expands it with real data — stats, examples, competitor angles, and source citations. The research agent reads your product knowledge and competitive positioning docs to focus its research on what matters for your messaging.
Agent prompt:
You are a research agent for a content team.
CONTEXT DOCS (read before doing anything):
- [messaging.md injected here]
- [competitive.md injected here]
- [product-knowledge.md injected here]
TOPIC: {topic}
TARGET KEYWORDS: {keywords}
TASKS:
1. Find 5-8 data points (stats, studies, benchmarks) relevant to the topic.
Prefer recent data (last 2 years). Include source URLs.
2. Identify 2-3 competitor angles — how competitors position on this topic.
Reference our competitive positioning doc for context.
3. Find 2-3 real customer examples or case studies.
4. Identify common objections or misconceptions.
5. List 3-5 secondary keywords for SEO.
RULES:
- Do NOT fabricate stats. If you can't find a real data point, say so.
- Focus research on angles that support our positioning (see messaging doc).
- Every stat needs a source URL.
OUTPUT: Structured research brief with data points, competitor angles,
stats, objections, secondary keywords, and source list.
When the research agent has the competitive positioning doc, it focuses on angles that differentiate you. Without it, you get generic industry research that could belong to any company.
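Here's a rough sketch of wiring that prompt to the Anthropic Messages API, reusing the withContext helper from the Step 0 sketch. The model name is a placeholder; use whatever you run in production.

// Sketch: run the research agent. Uses withContext() from Step 0.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

export async function runResearchAgent(topic: string, keywords: string[]) {
  const prompt = withContext(
    ["messaging.md", "competitive.md", "product-knowledge.md"],
    `TOPIC: ${topic}\nTARGET KEYWORDS: ${keywords.join(", ")}\n` +
      `Produce a structured research brief per the tasks and rules above.`
  );
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5", // placeholder; swap in your production model
    max_tokens: 4096,
    messages: [{ role: "user", content: prompt }],
  });
  // Concatenate the text blocks into a single research brief string.
  return msg.content.map((b) => ("text" in b ? b.text : "")).join("");
}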
Step 2: Draft Agent
Takes the research brief + brand voice system prompt + messaging framework and produces a full draft. Structure is enforced in the prompt.
Agent prompt:
You are a content writer.
=== BRAND VOICE (follow exactly) ===
[brand-voice.md injected here]
=== MESSAGING FRAMEWORK ===
[messaging.md injected here]
=== RESEARCH BRIEF ===
{research_brief}
STRUCTURE:
1. H1 title — specific, keyword-rich, under 60 characters
2. Opening hook — 2-3 sentences that state the problem. No throat-clearing.
3. 4-6 H2 sections, each with 2-3 H3 subsections
4. Integrate data points naturally (not dumped in a list)
5. Conclusion with ONE clear CTA
6. Target: 3,500 words (+/- 300)
SENTENCE STRUCTURE (critical for sounding human):
- 20% short sentences (under 10 words)
- 50% medium (10-25 words)
- 30% long (25+ words)
- Never more than 2 short sentences in a row
- Never two long sentences in a row
FORBIDDEN:
- Bullet-heavy sections (use flowing prose)
- "In conclusion" or "To summarize"
- Generic benefit statements without proof
- Starting paragraphs with "When it comes to..."
The brand voice doc is the backbone here. Without it, the draft agent defaults to that vaguely enthusiastic, slightly robotic tone that screams "AI wrote this." With it, the first draft already sounds right 90% of the time.
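One wiring detail worth making explicit: put the brand voice guide and messaging framework in the system prompt, not the user message, so they govern every token of the draft. A sketch, reusing the client and contextLayer from the earlier sketches; the function name and word target framing are illustrative:

// Sketch: the draft agent. Brand voice + messaging go in the system prompt.
export async function runDraftAgent(researchBrief: string) {
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5", // placeholder
    max_tokens: 8192,
    system: `${contextLayer["brand-voice.md"]}\n\n${contextLayer["messaging.md"]}`,
    messages: [
      {
        role: "user",
        content:
          `=== RESEARCH BRIEF ===\n${researchBrief}\n` +
          `Write the full draft following the structure and sentence rules.`,
      },
    ],
  });
  return msg.content.map((b) => ("text" in b ? b.text : "")).join("");
}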
Step 3: Scoring Agent
Automated quality gate. Scores every draft against a rubric and flags specific issues. This is where you catch AI-sounding patterns before a human has to.
Quality rubric:
| Criterion | Weight | Pass | Fail |
|---|---|---|---|
| Readability | 20 | Mixed sentence lengths, natural flow | Monotone, choppy, jargon walls |
| SEO | 15 | Keyword in H1 + first 100 words + 2 H2s | Keyword stuffing or missing from key positions |
| Accuracy | 15 | All stats sourced, fair comparisons | Unsourced claims, strawman arguments |
| Brand Voice | 20 | Matches guide, sounds human, active voice | AI-sounding phrases, forbidden words |
| Structure | 15 | Clear hierarchy, strong hook, CTA conclusion | Flat structure, weak opening, no CTA |
| Messaging | 15 | Value props correct, positioning aligned | Off-brand claims, wrong positioning |
Passing threshold: 75/100
Agent prompt:
Score this blog post on a 100-point rubric. Be harsh.
=== BRAND VOICE GUIDE ===
[brand-voice.md injected here]
=== MESSAGING FRAMEWORK ===
[messaging.md injected here]
=== BLOG POST ===
{draft}
DEDUCTIONS:
- -5 per "Additionally/Furthermore/Moreover"
- -3 per "dive into/delve/leverage"
- -5 per unsourced stat
- -5 per strawman comparison
- -10 per factual error
- -5 per paragraph that "sounds like AI"
OUTPUT: Scores per category, total, pass/fail, line-level feedback,
and low-confidence sections flagged for human review.
The "sounds human" test: Read a paragraph aloud. If you cringe, or if any sentence makes you think "an AI wrote this," flag it for human review regardless of score.
<!-- INSERT IMAGE: QUALITY RUBRIC DASHBOARD — 6 scoring categories with weights visualized, threshold line at 75 (see prompts/content-engine-visuals.md #4) -->
Step 4: Human Review
The scorer catches 80%. The human catches the remaining 20% — tone-deafness, factual nuance, brand judgment calls that no rubric can encode.
Revision flow:
Draft → Scoring Agent evaluates
IF score ≥ 75 AND no low-confidence flags → human approval queue
IF score ≥ 75 BUT low-confidence sections → human reviews flagged sections
IF score < 75 → specific feedback sent back to Draft Agent
→ Draft Agent revises (fixing only what's called out)
→ Scoring Agent re-evaluates
→ Max 2 automated revision cycles
→ After 2 failed revisions → human takes over regardless of score
Why cap at 2 revisions? Drafts that don't pass after 2 revisions have structural problems that more LLM passes won't fix. The feedback loop oscillates — the agent fixes one thing and breaks another. A human needs to step in. I learned this the hard way during the 34-post run: three posts got stuck in revision loops, and manual intervention resolved them in minutes.
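In code, the cap is just a bounded loop. A sketch, assuming scoreDraft and reviseDraft wrap the scoring and draft agents (both names are illustrative, as is the routing):

// Sketch of the revision loop: at most 2 automated revision cycles,
// then a human takes over regardless of score.
const MAX_REVISIONS = 2;

export async function reviewLoop(draft: string) {
  for (let attempt = 0; attempt <= MAX_REVISIONS; attempt++) {
    const report = await scoreDraft(draft);   // scoring agent (Step 3)
    const result = evaluate(report);          // gate logic (Step 3)
    if (result.pass) {
      return {
        draft,
        route: result.needsHumanReview ? "flagged-review" : "approval-queue",
      };
    }
    if (attempt === MAX_REVISIONS) break;     // two failed revisions: stop
    draft = await reviseDraft(draft, report); // draft agent, feedback-only pass
  }
  return { draft, route: "human-takeover" };  // structural problem; escalate
}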
<!-- INSERT IMAGE: REVISION FEEDBACK LOOP — Scorer → below threshold → back to Drafter → re-score circular flow (see prompts/content-engine-visuals.md #3) -->
Step 5: Publish Queue
Posts land in a staging queue. Not directly to live. A human reviews the batch, approves or rejects each post, and scheduling handles the rest.
The flow:
Agent stages post with metadata (title, slug, meta description, tags)
→ Human reviews in staging queue
→ Human approves / rejects / edits
→ Approved posts pushed to CMS as drafts
→ Human sets publish schedule
→ Posts go live
Nothing goes live without a human pressing the button. The agents do the heavy lifting — research, drafting, scoring, revision. The human makes the judgment calls — this reads right, this needs another pass, this one's dead.
During the 34-post run, 31 posts passed through the queue with minor edits. 3 needed substantial rework. That's a 91% first-pass rate — and the 3 that needed work would've been caught immediately by any editor who knows the brand.
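Staging to WordPress is one REST call with the status set to draft. A sketch using the standard WP REST API and an application password; the URL, credentials, and field set are placeholders:

// Sketch: stage an approved post in WordPress as a draft (never live directly).
export async function stagePost(post: {
  title: string;
  slug: string;
  content: string;
  excerpt: string;
}) {
  const res = await fetch("https://example.com/wp-json/wp/v2/posts", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // WordPress application password: base64 of "user:app-password"
      Authorization:
        "Basic " + Buffer.from("user:app-password").toString("base64"),
    },
    body: JSON.stringify({ ...post, status: "draft" }), // draft, not publish
  });
  if (!res.ok) throw new Error(`WP staging failed: ${res.status}`);
  return res.json(); // a human schedules or publishes from the staging queue
}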
<!-- INSERT IMAGE: HUMAN-IN-THE-LOOP FLOW — Agent flags sections → Human reviews → Approve/Reject/Edit → Publish or revise (see prompts/content-engine-visuals.md #6) -->
What the Playbook Doesn't Cover
This playbook gives you the architecture. You could build it yourself.
Here's what the playbook doesn't give you:
- Calibration. The scoring thresholds took dozens of test runs across 4,884 articles. The playbook gives you the final numbers, not the 40+ hours of testing that produced them.
- Edge cases. The Pro Tips below cover the highlights. Production means handling hundreds of edge cases — malformed outputs, API rate limits, model regressions, CMS quirks.
- Integration. Wiring this into YOUR stack — your CMS, your brand docs, your approval workflows. Your version will be different from every other version I've built.
- Maintenance. Model updates break prompts. Brand guidelines evolve. Scoring rubrics need recalibration. The pipeline needs ongoing tuning.
Most teams that DIY this spend 60-100 hours building 70% of what the production system does. Then they hire someone to fix the last 30%.
I build production content engines for $5,000-$12,000, flat fee. You own everything. No subscriptions. No vendor lock-in. The scoping call is free — you'll walk away with a build spec and a quote, even if you never hire me.
Currently booking for March 2026. I take on 2-3 builds per month.
Book a 30-minute scoping call →
Pro Tips & Pitfalls
The context layer is a one-time investment. 2-4 hours to write. After that, every post inherits those docs. The per-post cost of that context approaches zero at scale.
The model matters less than the context. A mediocre model with great context beats a great model with no context every time. I've tested this — the context layer is the variable that moves the needle.
Human review is non-negotiable at scale. The scoring agent catches mechanical issues. It can't catch tone-deafness, factual nuance, or brand judgment. A human can.
Train your reviewer to spot AI slop. The tells: unnecessarily complex sentences, hedge phrases ("it's worth noting that"), transition stuffing ("Furthermore," "Additionally"), and technically-correct-but-soulless writing.
Know when to kill a draft. If it fails scoring twice and the reviewer isn't excited, kill it. Publishing mediocre content damages your brand more than not publishing at all.
Don't skip the staging queue. One off-brand post that slips through erodes the trust you've built with every good post before it.
Tools & Links
| Tool | Purpose | Pricing |
|---|---|---|
| Anthropic API | LLM agents (Claude) | Usage-based |
| Exa | Web search for research agent | Usage-based |
| WordPress REST API | Publishing | Free (self-hosted) |
| Google Docs API | Alternative publishing | Free |
| Firecrawl | Web scraping for competitive research | Free tier / paid |
The Math
Production Results
| Run | Output | Details |
|---|---|---|
| Content Publisher | 4,884 articles (integration guides, reporting playbooks, technical walkthroughs) | Node.js, 4 concurrent workers |
| Content Engine | 34 comparison blog posts (3,500-4,500 words each) | Single day, 5-agent pipeline |
Cost Per Post
| Item | Cost Per Post | At 34 Posts | At 500+ Posts |
|---|---|---|---|
| LLM API (research + draft + scoring + revision) | ~$0.50-1.00 | ~$25 | ~$375 |
| Web search API | ~$0.10 | ~$3.50 | ~$50 |
| Human review (15-30 min/post) | Varies | ~17 hours | Batch review |
| Total marginal cost | ~$0.60-1.10 | ~$30 | ~$425 (~$0.85/post) |
vs. Traditional Content Team
A traditional content team (2-3 writers + editor + SEO specialist) costs $400-600K/year for 200-400 posts. This pipeline runs on a few hundred dollars a year in API costs for 500+ posts (see the cost table above), plus reviewer time.
The 34-post run would have taken a writer 2-3 months. The pipeline did it in a day. The real cost is building the context layer: 2-4 hours, one time. After that, marginal cost per post is under $1.
The math isn't about replacing writers. It's about turning one person with good judgment into a content team.
Want This Built for Your Business?
4,884 articles. 34 blog posts in a single day. $0.60 per post. Those aren't projections — they're production results.
I build content engines like this one for businesses that need consistent, on-brand content at scale. Here's what the scoping call covers:
- Your content goals — volume, voice, channels, and cadence
- A build spec — exactly what agents, context docs, and integrations you'd need
- A flat-rate quote — $5,000-$12,000 depending on complexity. You own everything.
The call is 30 minutes. No pitch. If I can help, I'll tell you what it costs. If I can't, I'll tell you that too.
"Julian takes clear ideas and turns them into fully executed AI go-to-market builds — with extreme ownership and zero ego." — Hannah Recker
Currently booking for March 2026. I take on 2-3 builds per month.