Analytics & Data
End-to-end analytics and data platforms — from pipeline and warehouse through BI dashboards and AI-driven self-serve.
Data is only as useful as the decisions it drives
Most "data problems" aren't data problems — they're decision-enablement problems. The data exists; it's just not reaching the person who needs it, in the shape they need it, at the speed they need it. My analytics work lives in that gap: designing pipelines, warehouses, dashboards and increasingly AI surfaces that collapse the distance between a question and a trustworthy answer.
In practice that's full-stack: source systems and event capture, ETL and modelling, a warehouse and a semantic layer, BI dashboards, executive reporting, and — now — agentic AI sitting on top of all of it so non-analysts can ask real questions and get real answers without waiting in a queue.
Building analytics platforms, end-to-end
The most substantial piece I've shipped is a full internal analytics platform — 25+ dashboards spanning every department (Commercial, Marketing, Product, Finance, Pro Success, Ops), a disciplined data model underneath, and an agentic AI Analyst that writes its own queries to answer anything the dashboards don't. It cut BI request volume by over 95% while lifting accuracy, and runs as a public demo on synthetic data.
See the AI Analyst case study for the full walkthrough — problem, architecture, data model, the agent surface, and the live demo link.
Data engineering & pipelines
Clean BI is downstream of clean data engineering. I design source-to-warehouse pipelines that stay honest under real load:
- Source modelling — stable source-of-truth entities rather than "whatever the CRM exports today", with documented contracts and versioned schemas.
- ETL / ELT discipline — idempotent loads, watermark-based incremental ingestion, deduplication, late-arriving-data handling, and explicit failure behaviour on upstream schema drift.
- Warehouse design — partitioning by crawl, by month, by tenant; clustering keys chosen for the hot queries; materialised aggregates where they pay back; and PgBouncer-style connection pooling in front of anything chatty.
- Semantic layer — one place where a metric is defined. "Active user" means the same thing in the Commercial dashboard and the Product dashboard, or it doesn't appear.
- Cost discipline — queries budgeted, scans watched, partition-pruning enforced. Warehouse bills rarely surprise.
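The idempotency and watermark bullets above can be sketched concretely. This is a minimal illustration using SQLite and made-up table names (`orders`, `etl_watermark`), not a production pipeline: rows newer than the stored watermark are upserted, so replaying the same batch changes nothing.

```python
import sqlite3

def incremental_load(conn, rows):
    """Idempotent incremental load: only rows past the stored watermark
    are applied, and the upsert makes replays and duplicate deliveries
    safe (they overwrite rather than double-count)."""
    watermark = conn.execute(
        "SELECT value FROM etl_watermark WHERE source = 'orders'"
    ).fetchone()[0]

    # Strict > means rows sharing the watermark timestamp are skipped on
    # replay; a real pipeline must pick a side of that tradeoff explicitly.
    fresh = [r for r in rows if r["updated_at"] > watermark]
    for r in fresh:
        conn.execute(
            "INSERT INTO orders (id, amount, updated_at) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET amount = excluded.amount, "
            "updated_at = excluded.updated_at",
            (r["id"], r["amount"], r["updated_at"]),
        )
    if fresh:
        new_mark = max(r["updated_at"] for r in fresh)
        conn.execute(
            "UPDATE etl_watermark SET value = ? WHERE source = 'orders'",
            (new_mark,),
        )
    conn.commit()
    return len(fresh)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
conn.execute("CREATE TABLE etl_watermark (source TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO etl_watermark VALUES ('orders', '1970-01-01')")

batch = [
    {"id": 1, "amount": 10.0, "updated_at": "2024-05-01"},
    {"id": 2, "amount": 20.0, "updated_at": "2024-05-02"},
]
first = incremental_load(conn, batch)   # initial load: 2 rows applied
replay = incremental_load(conn, batch)  # replay of same batch: 0 rows, same state
```

The point of the pattern is that "run it again" is always a safe operation, which is what makes failure recovery boring instead of terrifying.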
Data at scale
CrawlZilla alone generates results at a scale most analytics stacks never have to face — 100M+ URLs per crawl, tens of millions of rows written per run, 40+ SEO data points per row. The architecture under that — partitioned PostgreSQL, PgBouncer pooling, virtualised UI — is documented in the engineering notes (internal view). The general lesson: at this scale, the shape of your write path and the shape of your query path are two different problems, and both have to be designed intentionally.
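Range partitioning by month is one of the patterns referred to above. As a sketch only (the table and column names are illustrative, not CrawlZilla's actual schema), a small helper can generate the PostgreSQL DDL for each monthly partition ahead of the write path needing it:

```python
from datetime import date

def monthly_partition_ddl(parent: str, month: date) -> str:
    """Generate DDL for one monthly range partition of a (hypothetical)
    crawl-results table, covering [first of month, first of next month)."""
    start = month.replace(day=1)
    nxt = (start.replace(year=start.year + 1, month=1)
           if start.month == 12
           else start.replace(month=start.month + 1))
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} "
        f"PARTITION OF {parent} "
        f"FOR VALUES FROM ('{start}') TO ('{nxt}');"
    )

ddl = monthly_partition_ddl("crawl_results", date(2024, 12, 15))
```

Pre-creating partitions keeps inserts cheap on the write path, while the planner's partition pruning keeps month-scoped queries cheap on the read path — the "two different problems" the paragraph above describes.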
SEO analytics
On the SEO side specifically, I build end-to-end pipelines joining GA4, GSC, crawl data, log files, and paid-channel data into a single warehouse surface — because the interesting questions only live in the joins. What landing pages earn impressions but lose clicks? Which crawled URLs is Google not indexing, segmented by template? Where does paid spend duplicate organic demand the site is already ranking for?
Outputs typically land in Looker, Metabase, or custom dashboards depending on the team's existing stack. The pattern stays the same: normalise at ingest, model for intent, expose clean semantic entities to BI.
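The first of those questions — pages earning impressions but losing clicks — reduces to a simple filter once the data is modelled. A minimal sketch over GSC-shaped rows (field names and thresholds are illustrative):

```python
def impressions_without_clicks(gsc_rows, min_impressions=1000, max_ctr=0.01):
    """Flag landing pages with meaningful search impressions but a CTR at
    or below max_ctr. Rows are dicts with 'page', 'impressions', 'clicks',
    the shape a GSC export typically lands in after normalisation."""
    flagged = []
    for row in gsc_rows:
        ctr = row["clicks"] / row["impressions"] if row["impressions"] else 0.0
        if row["impressions"] >= min_impressions and ctr <= max_ctr:
            flagged.append({**row, "ctr": round(ctr, 4)})
    # Biggest opportunities first: high impressions, low clicks.
    return sorted(flagged, key=lambda r: r["impressions"], reverse=True)

rows = [
    {"page": "/pricing", "impressions": 50_000, "clicks": 300},
    {"page": "/blog/guide", "impressions": 12_000, "clicks": 900},
    {"page": "/tiny", "impressions": 40, "clicks": 0},
]
losers = impressions_without_clicks(rows)  # only /pricing qualifies
```

In the warehouse this is one query over the semantic layer rather than a script, but the logic — normalise first, then ask the question across sources — is the same.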
Executive reporting & commercial alignment
A dashboard that exec teams don't read is worse than no dashboard. I build reporting that actually gets used: a small number of commercially meaningful KPIs, tied explicitly to revenue / CAC / margin; clear period-over-period and against-target views; and enough drill-down from the headline number for the reader to answer their own follow-up question without having to ping anyone.
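The period-over-period framing above is deliberately simple. As a sketch (the KPI and figures are invented for illustration), every headline number carries its prior-period delta alongside it:

```python
def period_over_period(current: float, previous: float) -> dict:
    """Frame a headline KPI as value, absolute change, and % change vs the
    prior period, guarding the zero-baseline case."""
    delta = current - previous
    pct = (delta / previous * 100) if previous else float("nan")
    return {"value": current, "delta": delta, "pct_change": round(pct, 1)}

mrr = period_over_period(current=128_400, previous=121_000)
```

The against-target view is the same calculation with `previous` swapped for the target — one definition, reused, so the numbers on the Commercial and Finance dashboards cannot drift apart.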
The analytics work only matters if the organisation ends up making different decisions because of it. Every dashboard and every pipeline I ship gets judged against that, not against how many widgets it has.
Core capabilities
- Analytics Platform Design (Source → Warehouse → BI → AI)
- Data Modelling & Semantic Layers
- ETL / ELT Pipelines (idempotent, incremental)
- Warehouse Design (partitioning, clustering, materialised views)
- PostgreSQL at Scale (partitioned, PgBouncer-pooled)
- BigQuery · SQL
- Connection Pooling · Query Cost Control
- Google Analytics 4 (GA4) · Search Console (GSC)
- Log-File Analytics · Crawl Data Joins
- Looker · Metabase · Custom Dashboards
- CrawlZilla (custom enterprise crawler)
- AI Analyst Surfaces (agentic, read-only SQL)
- Opportunity Modelling · Attribution
- Executive Reporting & KPI Frameworks
- Commercial Alignment (CAC / LTV / Margin)
- Data Quality · Schema Drift Detection
- Cost Dashboards · Query Budgets