The Content Engine: Multi-Agent Pipeline That Produced 4,884 Articles
TLDR
A 5-agent content pipeline: research, draft, score, revise, and publish. The system produced 4,884 articles across one production run and 34 comparison blog posts in a single day — with ~90% passing quality review on the first draft.
But the agents aren't the impressive part. The context layer underneath them is. The difference between those posts and AI slop came down to a handful of documents that every agent reads before writing a single word: messaging framework, brand voice guide, competitive positioning, product knowledge. Without those docs, you get content that sounds like a chatbot trying to impress a marketing director. With them, you get content that sounds like your company actually wrote it.
What you get:
- Research agent that expands outlines with real data, stats, and examples
- Draft agent that writes in your brand voice (not generic AI voice)
- Scoring agent with a rubric that catches AI-sounding patterns
- Human-in-the-loop revision workflow (not fully automated — and that's on purpose)
- Publish queue with human approval gate before anything goes live
Not a builder? I build production versions of this system for a flat fee — context layer calibrated to your brand, agents tuned to your voice, quality gates included. Book a 30-minute scoping call to get a build spec. Or keep reading for the full breakdown.
Step 0: Build Your Context Layer
This is 2-4 hours of work. It's the difference between AI slop and content that sounds like your company.
The context layer is a collection of documents that every agent reads before doing any work. Think of it as the difference between telling someone "write a blog post about X" and handing them your brand guide, messaging framework, competitive positioning, and product docs, then telling them "write a blog post about X."
When I built this for the 4,884-article run, the context layer took a Saturday afternoon. Every article after that inherited the same voice, positioning, and accuracy guardrails. The per-post cost of that context investment approaches zero at scale.
What goes in the context layer:
context_layer:
messaging.md: "Core messaging framework — one-liner, value props, positioning, proof points"
product-knowledge.md: "What your product does — features, use cases, limitations (prevents unsupported claims)"
competitive.md: "How you compare to alternatives — honest, not hype"
brand-voice.md: "How you sound — with do's, don'ts, and before/after examples"
customer-pain.md: "Real pain points from real conversations"
terminology.md: "Approved terms, forbidden terms, jargon glossary"
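To make the injection concrete, here's a minimal sketch in TypeScript (the production run used Node.js) of loading these docs once and prepending them to agent prompts. The file names match the list above; the directory location and helper names are illustrative, not the production code.

// Minimal sketch: load every context doc once, inject into each agent's prompt.
// Assumes the docs live in a local context/ directory (assumption, not required).
import { readFileSync } from "node:fs";
import { join } from "node:path";

const CONTEXT_DIR = "context";
const CONTEXT_FILES = [
  "messaging.md",
  "product-knowledge.md",
  "competitive.md",
  "brand-voice.md",
  "customer-pain.md",
  "terminology.md",
];

// Read all docs at startup; every agent call reuses the same strings.
export const contextLayer: Record<string, string> = Object.fromEntries(
  CONTEXT_FILES.map((f) => [f, readFileSync(join(CONTEXT_DIR, f), "utf8")])
);

// Prepend the selected docs to an agent prompt.
export function withContext(docs: string[], prompt: string): string {
  const header = docs
    .map((d) => `=== ${d} ===\n${contextLayer[d]}`)
    .join("\n\n");
  return `${header}\n\n${prompt}`;
}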
Messaging Framework Template
# Messaging Framework
## One-liner
[Your product] helps [target audience] [achieve outcome] by [mechanism].
## Core value propositions
1. [Value prop 1] — [supporting proof point]
2. [Value prop 2] — [supporting proof point]
3. [Value prop 3] — [supporting proof point]
## Positioning statement
For [target buyer] who [pain point], [product] is the [category]
that [key differentiator]. Unlike [alternative], we [unique advantage].
## Proof points
- [Specific metric or customer result]
- [Specific metric or customer result]
## What we are NOT
- We are not [common misconception]
- We do not [thing people assume we do]
Brand Voice Template
This is the single most impactful file in the context layer. When the brand voice doc says "no transition stuffing," the agent actually follows it.
# Brand Voice
## Identity
We sound like: a smart colleague explaining something over coffee.
We do NOT sound like: a marketing brochure trying to impress.
## Sentence rules
- First sentence of every paragraph: under 7 words
- Mix lengths: 20% short (1-4 words), 50% medium (5-7), 30% long (8-15)
- Active voice. "We built X" not "X was built by us"
- 80-90% one- and two-syllable words
## Vocabulary
DO: helps, works with, is, built, runs, handles
DON'T: empowers, seamlessly integrates, serves as, leverages, drives
## Paragraph rules
- Max 4 sentences per paragraph
- Lead with the point
- One idea per paragraph
## Forbidden patterns
- "Dive into," "delve into," "game-changing," "leverage"
- "Furthermore," "Moreover," "Additionally"
- "Innovative," "cutting-edge," "groundbreaking"
- Exclamation points (almost never)
- Rhetorical questions as transitions
## Before/After examples
BAD: "Our innovative pipeline leverages cutting-edge AI to process prospects at scale."
GOOD: "The pipeline processes 500 prospects in 20 minutes."
BAD: "For optimal performance with larger datasets, we recommend leveraging our batch processing capabilities."
GOOD: "This breaks at 10,000 rows. Use batch processing instead."
These examples do more than all the rules combined. The LLM learns by example — three good before/after pairs are worth a page of instructions.
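The forbidden-pattern list is also cheap to enforce mechanically, before the scoring agent ever runs. A minimal sketch using the patterns listed above; in practice you'd load your own list from brand-voice.md:

// Sketch of a pre-scoring lint pass for forbidden patterns. The pattern
// list here is illustrative; pull the real one from your brand voice doc.
const FORBIDDEN = [
  /\bdive into\b/i, /\bdelve\b/i, /\bgame-changing\b/i, /\bleverage/i,
  /\bFurthermore\b/, /\bMoreover\b/, /\bAdditionally\b/,
  /\binnovative\b/i, /\bcutting-edge\b/i, /\bgroundbreaking\b/i,
];

export function lintDraft(draft: string): string[] {
  const hits: string[] = [];
  draft.split("\n").forEach((line, i) => {
    for (const pattern of FORBIDDEN) {
      if (pattern.test(line)) hits.push(`line ${i + 1}: matches ${pattern}`);
    }
  });
  return hits; // empty array = clean draft
}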
<!-- INSERT IMAGE: CONTEXT LAYER STRUCTURE — Visual showing docs (messaging, brand voice, competitive, product knowledge) feeding into all agents (see prompts/content-engine-visuals.md #2) -->
<!-- INSERT IMAGE: BEFORE/AFTER — "AI content without context layer" (generic, bland) vs. "with context layer" (branded, specific) (see prompts/content-engine-visuals.md #5) -->
Competitive Positioning Template
# Competitive Landscape
## [Competitor A]
- What they do well: [honest assessment]
- Where they fall short: [specific, not generic]
- Our angle: [why someone would choose us instead]
- When to recommend them: [yes, sometimes they're the better fit]
## [Competitor B]
[Same structure]
Step 1: Research Agent
Takes a topic outline and expands it with real data — stats, examples, competitor angles, and source citations. The research agent reads your product knowledge and competitive positioning docs to focus its research on what matters for your messaging.
Agent prompt:
You are a research agent for a content team.
CONTEXT DOCS (read before doing anything):
- [messaging.md injected here]
- [competitive.md injected here]
- [product-knowledge.md injected here]
TOPIC: {topic}
TARGET KEYWORDS: {keywords}
TASKS:
1. Find 5-8 data points (stats, studies, benchmarks) relevant to the topic.
Prefer recent data (last 2 years). Include source URLs.
2. Identify 2-3 competitor angles — how competitors position on this topic.
Reference our competitive positioning doc for context.
3. Find 2-3 real customer examples or case studies.
4. Identify common objections or misconceptions.
5. List 3-5 secondary keywords for SEO.
RULES:
- Do NOT fabricate stats. If you can't find a real data point, say so.
- Focus research on angles that support our positioning (see messaging doc).
- Every stat needs a source URL.
OUTPUT: Structured research brief with data points, competitor angles,
stats, objections, secondary keywords, and source list.
When the research agent has the competitive positioning doc, it focuses on angles that differentiate you. Without it, you get generic industry research that could belong to any company.
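Here's a rough sketch of wiring that prompt to the Anthropic Messages API, reusing the withContext helper from the Step 0 sketch. The model name is a placeholder; use whatever you run in production.

// Sketch: run the research agent. Uses withContext() from Step 0.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

export async function runResearchAgent(topic: string, keywords: string[]) {
  const prompt = withContext(
    ["messaging.md", "competitive.md", "product-knowledge.md"],
    `TOPIC: ${topic}\nTARGET KEYWORDS: ${keywords.join(", ")}\n` +
      `Produce a structured research brief per the tasks and rules above.`
  );
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5", // placeholder; swap in your production model
    max_tokens: 4096,
    messages: [{ role: "user", content: prompt }],
  });
  // Concatenate the text blocks into a single research brief string.
  return msg.content.map((b) => ("text" in b ? b.text : "")).join("");
}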
Step 2: Draft Agent
Takes the research brief + brand voice system prompt + messaging framework and produces a full draft. Structure is enforced in the prompt.
Agent prompt:
You are a content writer.
=== BRAND VOICE (follow exactly) ===
[brand-voice.md injected here]
=== MESSAGING FRAMEWORK ===
[messaging.md injected here]
=== RESEARCH BRIEF ===
{research_brief}
STRUCTURE:
1. H1 title — specific, keyword-rich, under 60 characters
2. Opening hook — 2-3 sentences that state the problem. No throat-clearing.
3. 4-6 H2 sections, each with 2-3 H3 subsections
4. Integrate data points naturally (not dumped in a list)
5. Conclusion with ONE clear CTA
6. Target: 3,500 words (+/- 300)
SENTENCE STRUCTURE (critical for sounding human):
- 20% short sentences (under 10 words)
- 50% medium (10-25 words)
- 30% long (25+ words)
- Never more than 2 short sentences in a row
- Never two long sentences in a row
FORBIDDEN:
- Bullet-heavy sections (use flowing prose)
- "In conclusion" or "To summarize"
- Generic benefit statements without proof
- Starting paragraphs with "When it comes to..."
The brand voice doc is the backbone here. Without it, the draft agent defaults to that vaguely enthusiastic, slightly robotic tone that screams "AI wrote this." With it, the first draft already sounds right 90% of the time.
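One wiring detail worth making explicit: put the brand voice guide and messaging framework in the system prompt, not the user message, so they govern every token of the draft. A sketch, reusing the client and contextLayer from the earlier sketches; the function name and word target framing are illustrative:

// Sketch: the draft agent. Brand voice + messaging go in the system prompt.
export async function runDraftAgent(researchBrief: string) {
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5", // placeholder
    max_tokens: 8192,
    system: `${contextLayer["brand-voice.md"]}\n\n${contextLayer["messaging.md"]}`,
    messages: [
      {
        role: "user",
        content:
          `=== RESEARCH BRIEF ===\n${researchBrief}\n` +
          `Write the full draft following the structure and sentence rules.`,
      },
    ],
  });
  return msg.content.map((b) => ("text" in b ? b.text : "")).join("");
}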
Step 3: Scoring Agent
Automated quality gate. Scores every draft against a rubric and flags specific issues. This is where you catch AI-sounding patterns before a human has to.
Quality rubric:
| Criterion | Weight | Pass | Fail |
|---|---|---|---|
| Readability | 20 | Mixed sentence lengths, natural flow | Monotone, choppy, jargon walls |
| SEO | 15 | Keyword in H1 + first 100 words + 2 H2s | Keyword stuffing or missing from key positions |
| Accuracy | 15 | All stats sourced, fair comparisons | Unsourced claims, strawman arguments |
| Brand Voice | 20 | Matches guide, sounds human, active voice | AI-sounding phrases, forbidden words |
| Structure | 15 | Clear hierarchy, strong hook, CTA conclusion | Flat structure, weak opening, no CTA |
| Messaging | 15 | Value props correct, positioning aligned | Off-brand claims, wrong positioning |
Passing threshold: 75/100
Agent prompt:
Score this blog post on a 100-point rubric. Be harsh.
=== BRAND VOICE GUIDE ===
[brand-voice.md injected here]
=== MESSAGING FRAMEWORK ===
[messaging.md injected here]
=== BLOG POST ===
{draft}
DEDUCTIONS:
- -5 per "Additionally/Furthermore/Moreover"
- -3 per "dive into/delve/leverage"
- -5 per unsourced stat
- -5 per strawman comparison
- -10 per factual error
- -5 per paragraph that "sounds like AI"
OUTPUT: Scores per category, total, pass/fail, line-level feedback,
and low-confidence sections flagged for human review.
The "sounds human" test: Read a paragraph aloud. If you cringe, or if any sentence makes you think "an AI wrote this," flag it for human review regardless of score.
<!-- INSERT IMAGE: QUALITY RUBRIC DASHBOARD — 6 scoring categories with weights visualized, threshold line at 75 (see prompts/content-engine-visuals.md #4) -->
Step 4: Human Review
The scorer catches 80%. The human catches the remaining 20% — tone-deafness, factual nuance, brand judgment calls that no rubric can encode.
Revision flow:
Draft → Scoring Agent evaluates
IF score ≥ 75 AND no low-confidence flags → human approval queue
IF score ≥ 75 BUT low-confidence sections → human reviews flagged sections
IF score < 75 → specific feedback sent back to Draft Agent
→ Draft Agent revises (fixing only what's called out)
→ Scoring Agent re-evaluates
→ Max 2 automated revision cycles
→ After 2 failed revisions → human takes over regardless of score
Why cap at 2 revisions? Drafts that don't pass after 2 revisions have structural problems that more LLM passes won't fix. The feedback loop oscillates — the agent fixes one thing and breaks another. A human needs to step in. I learned this the hard way during the 34-post run: three posts got stuck in revision loops, and manual intervention resolved them in minutes.
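In code, the cap is just a bounded loop. A sketch, assuming scoreDraft and reviseDraft wrap the scoring and draft agents (both names are illustrative, as is the routing):

// Sketch of the revision loop: at most 2 automated revision cycles,
// then a human takes over regardless of score.
const MAX_REVISIONS = 2;

export async function reviewLoop(draft: string) {
  for (let attempt = 0; attempt <= MAX_REVISIONS; attempt++) {
    const report = await scoreDraft(draft);   // scoring agent (Step 3)
    const result = evaluate(report);          // gate logic (Step 3)
    if (result.pass) {
      return {
        draft,
        route: result.needsHumanReview ? "flagged-review" : "approval-queue",
      };
    }
    if (attempt === MAX_REVISIONS) break;     // two failed revisions: stop
    draft = await reviseDraft(draft, report); // draft agent, feedback-only pass
  }
  return { draft, route: "human-takeover" };  // structural problem; escalate
}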
<!-- INSERT IMAGE: REVISION FEEDBACK LOOP — Scorer → below threshold → back to Drafter → re-score circular flow (see prompts/content-engine-visuals.md #3) -->
Step 5: Publish Queue
Posts land in a staging queue. Not directly to live. A human reviews the batch, approves or rejects each post, and scheduling handles the rest.
The flow:
Agent stages post with metadata (title, slug, meta description, tags)
→ Human reviews in staging queue
→ Human approves / rejects / edits
→ Approved posts pushed to CMS as drafts
→ Human sets publish schedule
→ Posts go live
Nothing goes live without a human pressing the button. The agents do the heavy lifting — research, drafting, scoring, revision. The human makes the judgment calls — this reads right, this needs another pass, this one's dead.
During the 34-post run, 31 posts passed through the queue with minor edits. 3 needed substantial rework. That's a 91% first-pass rate — and the 3 that needed work would've been caught immediately by any editor who knows the brand.
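Staging to WordPress is one REST call with the status set to draft. A sketch using the standard WP REST API and an application password; the URL, credentials, and field set are placeholders:

// Sketch: stage an approved post in WordPress as a draft (never live directly).
export async function stagePost(post: {
  title: string;
  slug: string;
  content: string;
  excerpt: string;
}) {
  const res = await fetch("https://example.com/wp-json/wp/v2/posts", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // WordPress application password: base64 of "user:app-password"
      Authorization:
        "Basic " + Buffer.from("user:app-password").toString("base64"),
    },
    body: JSON.stringify({ ...post, status: "draft" }), // draft, not publish
  });
  if (!res.ok) throw new Error(`WP staging failed: ${res.status}`);
  return res.json(); // a human schedules or publishes from the staging queue
}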
<!-- INSERT IMAGE: HUMAN-IN-THE-LOOP FLOW — Agent flags sections → Human reviews → Approve/Reject/Edit → Publish or revise (see prompts/content-engine-visuals.md #6) -->
What the Playbook Doesn't Cover
This playbook gives you the architecture. You could build it yourself.
Here's what the playbook doesn't give you:
- Calibration. The scoring thresholds took dozens of test runs across 4,884 articles. The playbook gives you the final numbers, not the 40+ hours of testing that produced them.
- Edge cases. The Pro Tips below cover the highlights. Production means handling hundreds of edge cases — malformed outputs, API rate limits, model regressions, CMS quirks.
- Integration. Wiring this into YOUR stack — your CMS, your brand docs, your approval workflows. Your version will be different from every other version I've built.
- Maintenance. Model updates break prompts. Brand guidelines evolve. Scoring rubrics need recalibration. The pipeline needs ongoing tuning.
Most teams that DIY this spend 60-100 hours building 70% of what the production system does. Then they hire someone to fix the last 30%.
I build production content engines for $5,000-$12,000, flat fee. You own everything. No subscriptions. No vendor lock-in. The scoping call is free — you'll walk away with a build spec and a quote, even if you never hire me.
Currently booking for March 2026. I take on 2-3 builds per month.
Book a 30-minute scoping call →
Pro Tips & Pitfalls
The context layer is a one-time investment. 2-4 hours to write. After that, every post inherits those docs. The per-post cost of that context approaches zero at scale.
The model matters less than the context. A mediocre model with great context beats a great model with no context every time. I've tested this — the context layer is the variable that moves the needle.
Human review is non-negotiable at scale. The scoring agent catches mechanical issues. It can't catch tone-deafness, factual nuance, or brand judgment. A human can.
Train your reviewer to spot AI slop. The tells: unnecessarily complex sentences, hedge phrases ("it's worth noting that"), transition stuffing ("Furthermore," "Additionally"), and technically-correct-but-soulless writing.
Know when to kill a draft. If it fails scoring twice and the reviewer isn't excited, kill it. Publishing mediocre content damages your brand more than not publishing at all.
Don't skip the staging queue. One off-brand post that slips through erodes the trust you've built with every good post before it.
Tools & Links
| Tool | Purpose | Pricing |
|---|---|---|
| Anthropic API | LLM agents (Claude) | Usage-based |
| Exa | Web search for research agent | Usage-based |
| WordPress REST API | Publishing | Free (self-hosted) |
| Google Docs API | Alternative publishing | Free |
| Firecrawl | Web scraping for competitive research | Free tier / paid |
The Math
Production Results
| Run | Output | Details |
|---|---|---|
| Content Publisher | 4,884 articles (integration guides, reporting playbooks, technical walkthroughs) | Node.js, 4 concurrent workers |
| Content Engine | 34 comparison blog posts (3,500-4,500 words each) | Single day, 5-agent pipeline |
Cost Per Post
| Item | Cost Per Post | At 34 Posts | At 500+ Posts |
|---|---|---|---|
| LLM API (research + draft + scoring + revision) | ~$0.50-1.00 | ~$25 | ~$375 |
| Web search API | ~$0.10 | ~$3.50 | ~$50 |
| Human review (15-30 min/post) | Varies | ~17 hours | Batch review |
| Total marginal cost | ~$0.60-1.10 | ~$30 | ~$425 (~$0.85/post) |
vs. Traditional Content Team
A traditional content team (2-3 writers + editor + SEO specialist) costs $400-600K/year for 200-400 posts. This pipeline runs on a few hundred dollars a year in API costs for 500+ posts (see the cost table above), plus reviewer time.
The 34-post run would have taken a writer 2-3 months. The pipeline did it in a day. The real cost is building the context layer: 2-4 hours, one time. After that, marginal cost per post is under $1.
The math isn't about replacing writers. It's about turning one person with good judgment into a content team.
Want This Built for Your Business?
4,884 articles. 34 blog posts in a single day. $0.60 per post. Those aren't projections — they're production results.
I build content engines like this one for businesses that need consistent, on-brand content at scale. Here's what the scoping call covers:
- Your content goals — volume, voice, channels, and cadence
- A build spec — exactly what agents, context docs, and integrations you'd need
- A flat-rate quote — $5,000-$12,000 depending on complexity. You own everything.
The call is 30 minutes. No pitch. If I can help, I'll tell you what it costs. If I can't, I'll tell you that too.
"Julian takes clear ideas and turns them into fully executed AI go-to-market builds — with extreme ownership and zero ego." — Hannah Recker
Currently booking for March 2026. I take on 2-3 builds per month.