AI & Automation
Production AI systems — agents, pipelines, and automations that replace repetitive manual work and process data at a scale humans can't.
Why AI, done properly, matters
The distance between "we use AI" and "we've built a production AI system" is operational maturity. Every team has someone prompting ChatGPT now; very few have systems that can classify 500,000 records reliably inside a budget, run every night, alert when quality drops, and not set fire to the context window on the first retry. The work that sits between a demo and a dependable production pipeline is where most of my time lives.
I've been building AI-leveraged systems since well before it was fashionable — content automation, metadata generation, structured-data pipelines, agentic analysts, bulk classification, enrichment, and SEO automation at catalogue scale. The detail below covers the patterns I reach for repeatedly.
Agentic AI in production
Agents with read-only tools — an LLM that can call list_tables, describe_schema, sample, query, and finish_report-style tools — are the pattern I keep reaching for. They turn a free-text question into a self-driving investigation, and if you scope the tool surface correctly they're far safer than a "write this SQL for me" one-shot prompt.
The hard problems are usually context management and reliability, not prompting: data-ref caching so that ten-thousand-row results don't poison the prompt; deterministic iteration caps so an agent can't wander; read-only enforcement at parse time so destructive operations never reach the data; and persisted report threading so follow-up questions inherit the previous agent state.
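A compressed sketch of that loop, in Python. `call_llm` stands in for whatever chat client returns a parsed tool call; the SQLite wiring, database path, and tool set are illustrative assumptions, not a fixed implementation:

```python
import json
import sqlite3

MAX_ITERATIONS = 12                # deterministic cap: the agent cannot wander
DATA_REFS: dict[str, list] = {}    # full result sets live here, never in the prompt

# mode=ro enforces read-only at the driver level, on top of the parse-time check
conn = sqlite3.connect("file:analytics.db?mode=ro", uri=True)

def query(sql: str) -> str:
    # Parse-time guard: only a single SELECT ever reaches the data.
    # (A production system would use a real SQL parser here.)
    if not sql.lstrip().lower().startswith("select") or ";" in sql.rstrip(";"):
        raise PermissionError("read-only tool surface: single SELECT statements only")
    rows = conn.execute(sql).fetchall()
    ref = f"ref_{len(DATA_REFS)}"
    DATA_REFS[ref] = rows          # data-ref caching: the model sees a handle and a preview
    return json.dumps({"data_ref": ref, "row_count": len(rows), "preview": rows[:5]})

TOOLS = {"query": query}           # list_tables, describe_schema, sample elided for brevity

def run_agent(question: str, call_llm) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(MAX_ITERATIONS):
        # call_llm is assumed to return {"tool": name, "args": {...}}
        action = call_llm(messages)
        if action["tool"] == "finish_report":
            return action["args"]["report"]
        result = TOOLS[action["tool"]](**action["args"])
        messages.append({"role": "tool", "content": result})
    return "Iteration cap reached; returning partial findings."
```

Persisted report threading then amounts to storing `messages` and `DATA_REFS` keyed by report ID, so a follow-up question resumes from that state rather than starting cold.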
Built this exact pattern for a full internal analytics platform — see the AI Analyst case study.
AI pipelines at scale
Running an LLM over hundreds of thousands of rows is a data-engineering problem dressed up as an AI problem. The LLM call is the easy bit. What actually matters:
- Batching and concurrency sized to stay just under provider rate limits; exponential backoff and retry on transient failures; dead-letter queues for inputs that fail repeatedly.
- Idempotency and checkpointing so a job can fail halfway through 200k records and resume without double-billing or double-writing (sketched after this list).
- Schema-constrained outputs — strict JSON schemas enforced at parse time with one automatic retry on invalid structure. Downstream code never has to parse free text.
- Cost control — per-run budget caps, cheap models for first-pass filtering, expensive models only where accuracy demands it. Token accounting surfaced in dashboards, not buried in a provider bill.
- Evaluation loops — golden datasets, sampled human review, and regression tests so a model upgrade doesn't quietly make outputs worse.
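A minimal sketch of the checkpointing and schema-validation pieces of that list, assuming pydantic for the strict schema; `classify` stands in for the actual model call, and the table and field names are hypothetical:

```python
import sqlite3
from pydantic import BaseModel, ValidationError

class TicketLabel(BaseModel):      # strict schema: downstream code never parses free text
    category: str
    confidence: float

db = sqlite3.connect("pipeline_state.db")
db.execute("CREATE TABLE IF NOT EXISTS done (id TEXT PRIMARY KEY, output TEXT)")
db.execute("CREATE TABLE IF NOT EXISTS dead_letter (id TEXT PRIMARY KEY, error TEXT)")

def process(record_id: str, text: str, classify) -> None:
    # Idempotency: a resumed job skips anything already checkpointed,
    # so a crash at record 97,000 of 200,000 never double-bills.
    if db.execute("SELECT 1 FROM done WHERE id = ?", (record_id,)).fetchone():
        return
    for attempt in range(2):       # one automatic retry on invalid structure
        raw = classify(text)       # assumed to return a JSON string
        try:
            label = TicketLabel.model_validate_json(raw)
            db.execute("INSERT INTO done VALUES (?, ?)",
                       (record_id, label.model_dump_json()))
            db.commit()
            return
        except ValidationError as err:
            last_error = str(err)
    # Repeated failures land in the dead-letter queue for later inspection.
    db.execute("INSERT OR REPLACE INTO dead_letter VALUES (?, ?)", (record_id, last_error))
    db.commit()
```

Backoff, concurrency, and budget caps wrap around this core; the point is that a rerun after any crash is safe by construction.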
Automating the repetitive
Most of what "ops", "content ops" and "first-line analyst" roles spend the week on is automatable with the right shape of pipeline. Categorising support tickets, extracting entities from unstructured text, tagging content, scoring leads, normalising product data, triaging inbound feedback — the same LLM-plus-schema pattern applies across all of them.
Done well, the shape of the work changes from "human does task, AI assists" to "AI does task, human audits exceptions". In practice that lands as an 80–95% reduction in time spent on repetitive work, with accuracy holding flat or improving and team capacity shifting to the work that actually benefits from judgement.
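In code, that split is often nothing more exotic than a confidence gate. A sketch, where the threshold and queue are illustrative and would be set from sampled accuracy on the task:

```python
from collections import deque

REVIEW_THRESHOLD = 0.9      # illustrative; tuned per task from sampled accuracy
review_queue: deque = deque()

def route(record: dict, category: str, confidence: float) -> str:
    # High-confidence outputs are applied automatically; everything else is
    # queued for a human, so exceptions are audited rather than silently accepted.
    if confidence >= REVIEW_THRESHOLD:
        record["category"] = category
        return "auto"
    review_queue.append((record, category, confidence))
    return "human_review"
```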
Reviewing data at scale
Frontier LLMs are remarkable at reading. Give one a document, a page of content, a log snippet, a review, a crawl result, a support thread — and ask it to extract, classify, or score. In a well-instrumented pipeline you can process hundreds of thousands of items per night reliably, on a fraction of the budget a human team would cost, with better consistency than a team working at speed.
The trick is not to let the model "think" about things it doesn't need to. Narrow the task. Give it a schema. Give it exactly the context it needs and nothing more. Cache aggressively. Sample to validate. Iterate the prompt from golden examples, not hunches.
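What that narrowing looks like in practice, as a sketch with hypothetical field names: one tightly scoped task, an explicit schema in the prompt, and only the text being judged:

```python
# One narrow task, an explicit schema, and nothing else in context.
# The field names and the 4,000-character cap are illustrative.
EXTRACTION_PROMPT = """You are extracting structured facts from one product review.
Return ONLY JSON matching this schema, with no commentary:
{"sentiment": "positive" | "negative" | "mixed",
 "mentions_shipping": true | false,
 "defect_reported": true | false}

Review:
<review_text>
"""

def build_prompt(review_text: str) -> str:
    # .replace rather than .format: the braces in the schema above would
    # otherwise be interpreted as format placeholders.
    return EXTRACTION_PROMPT.replace("<review_text>", review_text[:4000])
```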
Generative Engine Optimization (GEO)
AI Overviews, Perplexity, ChatGPT browsing, Gemini — the new layer sitting above the traditional SERP — cite content rather than just ranking it. That turns SEO into a different problem: instead of optimising for a blue link, you're optimising for being the source an LLM pulls a sentence from.
In practice: answer-first intros, consistent heading hierarchy, entity-dense supporting content, structured data that spells out every fact cleanly, FAQ / how-to structure where intent warrants it, and disciplined topical clustering so the model has a clear "authoritative source on X" signal to attach to. Training content teams to write this way by default is where it stops being an SEO project and starts being an editorial discipline.
Engineering discipline
Production AI systems fail interestingly — slowly, through drift and degradation, rather than through hard crashes. The disciplines that keep them honest:
- Evaluation datasets — small, curated, version-controlled; every prompt or model change runs against them before it ships (see the sketch after this list).
- Prompt versioning treated like code — diffs, reviews, rollbacks.
- Cost and latency dashboards — tokens in / out per call, $ per run, p50 / p95 latencies; alerts when any of them drifts.
- Human sampling in the loop — a statistically sized sample of production outputs reviewed weekly; regressions caught before they compound.
- Deterministic fallbacks — when the model fails a schema check, times out, or violates a constraint, what happens next is known behaviour, not "whatever comes back".
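A minimal version of that regression gate, assuming a line-delimited golden set of {"input": ..., "expected": ...} cases; `classify`, the file name, and the accuracy floor are all illustrative:

```python
import json

ACCURACY_FLOOR = 0.95       # illustrative; below this, the change does not ship

def regression_check(classify, path: str = "golden.jsonl") -> bool:
    with open(path) as f:
        cases = [json.loads(line) for line in f]
    correct = sum(1 for case in cases if classify(case["input"]) == case["expected"])
    accuracy = correct / len(cases)
    print(f"golden-set accuracy: {accuracy:.1%} over {len(cases)} cases")
    return accuracy >= ACCURACY_FLOOR
```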
Core capabilities
- Agentic AI Systems (tool-use · read-only enforcement)
- Context Management · Data-Ref Caching
- Batch AI Pipelines (100k+ records per run)
- Schema-Constrained Outputs (strict JSON validation)
- Idempotency · Checkpointing · Dead-Letter Queues
- Cheap-then-Expensive Cascades (cost-aware routing)
- Classification · Enrichment · Extraction at Scale
- Retrieval-Augmented Generation (RAG)
- Automated Content & Metadata Pipelines
- Structured-Data Generation at Scale
- Scalable Internal Linking Systems
- Generative Engine Optimization (GEO)
- AI Overview / LLM Citation Optimization
- Prompt Engineering · Golden Datasets · Evaluation
- Prompt Versioning · Regression Testing
- Token Accounting · Budget Caps · Cost Dashboards
- Production Guardrails · Human-in-the-Loop Sampling
- Frontier LLM Experience (major providers)
- AI-Assisted SEO Audits & Crawl Analysis