How our SEO audit actually works — under the hood.
A complete walk-through of the v3.14.0 ortho-seo plugin: 17 parallel sub-audits, a 1000-point composite rubric, what data each sub-audit pulls, how median-of-N scoring works, and real findings from Adirondack Orthodontics + Green Orthodontics audits.
How the audit actually runs
17 sub-audits do not run in sequence — they run in three coordinated waves. Wave 1 is massively parallel; Waves 2 and 3 are sequential because they depend on Wave 1's output.
Wave 1 — Parallel batch
In a single message, the orchestrator dispatches 16 sub-audits concurrently as independent agents. Each writes its findings to disk as audit_<name>.json.
Includes: backlinks, GBP, real GSC rankings, citations, reviews, technical, schema, page deep-dive (SCHOLAR), content quality, LLM visibility, competitor, content gap, map-pack heatmaps, link-graph, tracking, plus OTTO Phase A (raw pull only).
Wave 2 — OTTO Phase B
OTTO triage runs after Wave 1 because it scores each OTTO recommendation against the other 14 sub-audits' findings. It literally cannot score corroboration until those envelopes exist on disk.
The triage produces three CSVs: otto_keep.csv (worth deploying), otto_review.csv (human judgment needed), otto_noise.csv (suppressed with reasons logged). See §5.
Wave 3 — Content roadmap
The content-roadmap synthesizer reads every Wave 1 + Wave 2 output and produces the 90/180/365-day publication plan. New pages, upgrades, consolidations — each with target keyword, parent pillar, estimated traffic, and dependencies.
Pure synthesis. Zero new measurements. Just intelligent reading of what the audits already found.
1–2 sub-audits fail: continue. Affected sub-criteria get marked "Estimated — confirm via [tool]". Strategy presentation hides those sections via template guards.
4+ sub-audits fail: abort. That's an MCP/auth problem, not a content problem — the operator gets routed to run the test harness rather than waste time troubleshooting individual sub-audits.
Brand routing happens once, upfront (Step 2.4) — a single prompt to confirm HIP or NEON Canvas, written to brand.yaml. Prevents 8 sub-audits from each independently blocking on the same question.
The V5 rubric — 1000 points, unpacked
Eight categories, weighted by what actually moves the needle for orthodontic local SEO in 2025–2026. Each category breaks into sub-criteria; each sub-criterion has a max-point value and a deterministic scoring rule.
Sub-criteria — how each category's points get distributed
Every category subdivides into 3–7 sub-criteria. Each sub-criterion has a max-point value and a precise scoring rule (e.g., "50 pts if 5+ pillar pages with 2,000+ words and 10+ internal links each"). The rubric document is 502 lines and lives at skills/seo-audit-and-plan/references/v5-rubric.md.
| Category | Sub-criteria | Max pts | Primary sub-audit |
|---|---|---|---|
| Cat 1 Content · 300 | 1.1 Topical Map Completeness (75) · 1.2 Pillar Pages (50) · 1.3 Cluster Content (50) · 1.4 Service Pages (40) · 1.5 Location Pages (40) · 1.6 Content Velocity (25) · 1.7 E-E-A-T (20) | 300 | content-gap · content-quality · page-deep-dive · schema-audit |
| Cat 2 GBP · 200 | 2.1 GBP Completeness (50) · 2.2 Heatmap Performance (75) · 2.3 GBP Activity (40) · 2.4 GBP-Website Alignment (35) | 200 | map-pack · gbp-reviews · schema-audit |
| Cat 3 On-Page · 150 | 3.1 Title Tags (25) · 3.2 Meta Descriptions (20) · 3.3 Heading Structure (25) · 3.4 Internal Linking (50) · 3.5 Image Optimization (20) · 3.6 URL Structure (10) | 150 | technical-audit · link-graph-audit · page-deep-dive |
| Cat 4 Authority · 150 | 4.1 Domain Metrics (40) · 4.2 Backlink Quality (40) · 4.3 Backlink Gap (30) · 4.4 Off-Page Building (40) | 150 | backlink-audit · backlink-gap |
| Cat 5 Technical · 75 | 5.1 Schema Markup (25) · 5.2 Core Web Vitals (25) · 5.3 Crawlability (15) · 5.4 Site Security (10) | 75 | technical-audit · schema-audit |
| Cat 6 GEO · 50 | 6.1 LLM Visibility (25) · 6.2 Content Extractability (15) · 6.3 Cross-Domain Consensus (10) | 50 | llm-visibility |
| Cat 7 Reviews · 50 | 7.1 Review Quantity (20) · 7.2 Review Quality (20) · 7.3 Review Management (10) | 50 | gbp-reviews |
| Cat 8 Citations · 25 | 8.1 Citation Presence (15) · 8.2 NAP Consistency (10) | 25 | citation-audit |
How the final score maps to a strategy phase
Score band → phase mapping is deterministic. The grade drives the entire roadmap:
| Score | Grade | Status | Phase & Priority |
|---|---|---|---|
| 900–1000 | A+ | Market Dominator | Phase 6 — Maintain & Defend |
| 800–899 | A | Strong Performer | Phase 5 — Market Domination (GEO, trophy content) |
| 700–799 | B | Competitive | Phase 4 — Authority Building |
| 600–699 | C | Average | Phase 3 — Competitive Positioning |
| 500–599 | D | Below Average | Phase 2 — Core Optimization |
| 0–499 | F | Needs Major Work | Phase 1 — Foundation |
What the rubric actually looks like in practice — Adirondack Orthodontics
The headline
Strong on-page mechanics (Cat 3: 131/150) and a flagship review reputation (Cat 7: 40/50), but content is undeveloped (Cat 1: 106/300), authority is thin (Cat 4: 26/150), and the GEO trajectory is negative (LLM mentions down 41 in 30 days). Despite excellent map-pack rankings in established cities, Albany "braces" has slipped from an average rank of 1.2 to 2.8, and rival Albany Braces now outranks the practice in its own headline market.
| Category | Score | Max | % | Why |
|---|---|---|---|---|
| Cat 1 — Content | 106 | 300 | 35% | No topical map, no pillar pages, vendor templates, cannibalization. SCHOLAR avg 46.8 across priority pages. |
| Cat 2 — GBP | 66 | 200 | 33% | Strong existing heatmap rankings dragged down by missing GBP Galactic data + declining Albany "braces". |
| Cat 3 — On-Page | 131 | 150 | 87% | Highest % earned. SA Holistic Technical pillar 81/100 + real GSC CTR 3.52%. |
| Cat 4 — Authority | 26 | 150 | 17% | DR 15, spam score 21, 275 broken backlinks, no off-page work. |
| Cat 5 — Technical | 51 | 75 | 68% | CWV failing on audited pages, but HTTPS + crawlability solid. |
| Cat 6 — GEO | 28 | 50 | 56% | LLM visibility 50/100 but trajectory negative (−11.7 pts, −41 mentions in 30d). |
| Cat 7 — Reviews | 40 | 50 | 80% | 1,268 reviews, 4.94 avg, 89% reply rate. Best category by % earned. |
| Cat 8 — Citations | 13 | 25 | 52% | No SA citation submissions on record; multiple duplicate CIDs flagged. |
| Total | 461 | 1000 | 46% | Grade F · Phase 1 Foundation |
Composite scoring — how 17 envelopes become 1 score
Multiple sub-audits often measure the same category. The composite scorer's job is to merge them honestly without averaging away signal.
The composite scorer runs in two layers. A baseline scorer (score_rubric.py) computes a single-source score from the crawl alone — that's the v3.4 fallback if you run with zero sub-audits. The composite scorer (run_composite_score.py) is a non-breaking overlay on top that pulls in all 17 sub-audit JSON envelopes and merges them.
The merging rule: median-of-N
When two or more sub-audits score the same V5 category at the full point-range (e.g., both report a 0–150 score for Cat 3), the composite takes the median — not the average, not the sum. This is the system's defense against any single source skewing the result. One extreme reading gets absorbed.
Worked example — Cat 3 (On-Page) on the Adirondack audit:
- real-rankings contributed 131/150 (driven by SA Holistic Technical pillar 81 + real CTR 3.52%).
- technical-audit contributed 53/150 (driven by detailed Lighthouse + meta + alt-text measurement).
- page-deep-dive contributed 14/150 (SCHOLAR + internal linking on priority pages).
- link-graph contributed 22/150 (PageRank + click-depth analysis).
- Composite reported: 131 — the median of the full-range contributions, dominated here by real-rankings.
Three aggregation modes — selected automatically per category
median_of_full_scores
When multiple sub-audits each score the full category range. Used for Cat 3, 5, 6, 7, 8 on Adirondack. Median is robust to outliers.
sum_of_partial_ranges
When sub-audits cover complementary sub-ranges. E.g., backlink-audit owns sub 4.1–4.5 (90 pts), backlink-gap owns sub 4.6 (60 pts) — these sum cleanly. Used for Cat 1, 2, 4.
baseline_only
When no sub-audit covered the category, fall back to the crawl baseline. The audit envelope notes "Estimated — confirm via [tool]" so the gap is transparent.
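A minimal sketch of how mode selection could work, assuming each sub-audit envelope exposes its contributions as point values tagged with whether they cover the category's full range. The field names and the selection heuristic are illustrative, not the plugin's actual schema.

```python
from statistics import median

def aggregate_category(baseline_score: float, contributions: list[dict]) -> tuple[float, str]:
    """Pick an aggregation mode for one V5 category.

    contributions: e.g. [{"source": "technical-audit", "points": 53, "full_range": True}].
    Field names and the selection logic are illustrative assumptions.
    """
    if not contributions:
        # No sub-audit covered this category: fall back to the crawl baseline.
        return baseline_score, "baseline_only"

    full = [c["points"] for c in contributions if c["full_range"]]
    if len(full) >= 2:
        # Several full-range readings: the median absorbs a single outlier.
        return median(full), "median_of_full_scores"

    # Complementary sub-ranges: they sum cleanly.
    return sum(c["points"] for c in contributions), "sum_of_partial_ranges"

print(aggregate_category(38, []))  # falls back to baseline_only
print(aggregate_category(38, [{"source": "a", "points": 60, "full_range": True},
                              {"source": "b", "points": 48, "full_range": True},
                              {"source": "c", "points": 71, "full_range": True}]))
```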
Transparency fields — every score shows its work
- `score_by_source` — how many points each data source contributed per category (site crawl, Search Atlas, Ahrefs, DataForSEO, etc.).
- `category_contributions` — for each category: the baseline crawl score, every sub-audit's named contribution, the aggregation method used, plus cross-category evidence rows from supporting sub-audits.
- `sub_audits_succeeded` / `sub_audits_failed` — exact list so the operator can see which measurements are missing if any sub-audit errored.
- `evidence` arrays — each sub-audit's evidence strings flow through to the audit report and the strategy presentation. Example: "avg LCP 3550ms, CLS 0.165, 0% pages pass CWV, avg Lighthouse Performance 62/100" — that's the actual evidence string from Adirondack's Cat 5 score.
The 17 sub-audits — what each one actually does
Click any audit to expand. Each one shows its purpose, the data sources it pulls, the actual algorithm, what it contributes to the V5 rubric, and a real finding from the Adirondack or Green Orthodontics audit.
Site crawl + ground truth + pre-flight QA
Finds every content, NAP, schema, and structural error on a live site before any other audit runs — produces the canonical page list every downstream sub-audit reads.
Data sources
- Firecrawl — full markdown + rawHtml for every page (headers/footers included so NAP and JSON-LD survive).
- DataForSEO OnPage — bulk task crawl: pages, duplicate tags, internal links, redirect chains, non-indexable.
- Parallel content-analyzer subagents — fanned out for content QA in batches of ~30 pages.
Methodology
- Bulk-only on the happy path — two async jobs do the heavy lifting; no per-page calls until exceptions surface.
- A Python "ground-truth" step picks the homepage, contact, and location pages, extracts canonical practice name/phone/address/doctors, and asks the operator to confirm before flagging anything. Wrong ground truth = false positives everywhere.
- Pre-flight pattern checks against rawHtml: heading hierarchy, Google Maps URL resolution (must land on a real GBP CID, not an address pin), iframe form privacy/terms link inspection, deep JSON-LD validation, hostname canonicalization (catches Kinsta/WP Engine leaks).
- Cross-page Python pass detects cannibalization (exact title/H1 duplicates, neighborhood-doorway templating) and internal-linking issues (orphans, missing footer links, broken links).
Rubric contribution
Not directly scored. Produces crawl_pages.json + ground_truth.json that every other sub-audit consumes. Findings feed Cat 3 (On-Page) and Cat 1 (Content) indirectly.
Crawl seeded ground truth and surfaced 1 orphan page and 2 broken internal links, which flowed into Cat 3 scoring. Confirmed 7 GBP locations + 4 doctors (Berenshteyn, Boudreaux, Pacheco, Pellettieri).
Keyword gap vs competitors
Which keywords are our rivals ranking for that we aren't, ranked by how much traffic we'd capture if we won them?
Data sources
- Ahrefs — `site-explorer-organic-keywords`, top 1,000 by volume, per domain.
- DataForSEO Labs — `google_domain_intersection` with `intersections: false` flag = "rival ranks, we don't".
- Search Atlas — `se_analyze_keyword_gap` + `se_get_keyword_gap_results` (async submit-and-poll, cross-validation).
Methodology
- Pulls our top 1,000 organic keywords and each rival's top 1,000 from all three providers; subtracts our set from each rival's set — what remains is the "gap".
- Merges results across providers and dedupes. A keyword present in 2–3 sources is high-confidence; single-source gets a caveat.
- Scores each gap keyword as `highest_volume × rival_position_inverse` (position 1 → ×100, position 50 → ×51, position 100+ → ×0) — rewarding both volume AND rival proximity to page 1 (see the sketch after this list).
- Buckets the top 100 by topical category (braces, invisalign, ortho_general, info) via deterministic keyword-token rules — no LLM guessing.
- Cross-references against the existing internal link graph to tag each gap as `coverage: partial` (rewrite existing page) vs `coverage: none` (build new).
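A minimal sketch of the gap-scoring and bucketing rules above. The handling of the position-100+ cutoff and the specific bucket token rules are assumptions; the real skill's token list is larger.

```python
def gap_score(volume: int, best_rival_position: int) -> int:
    """Score one gap keyword: volume x rival proximity to page 1.

    Mirrors the rule above (position 1 -> x100, position 50 -> x51,
    position 100+ -> x0); the exact cutoff handling is an assumption.
    """
    inverse = 0 if best_rival_position >= 100 else 101 - best_rival_position
    return volume * inverse

def bucket(keyword: str) -> str:
    """Deterministic token-based topical bucketing, no LLM involved (rules illustrative)."""
    kw = keyword.lower()
    if "invisalign" in kw or "aligner" in kw:
        return "invisalign"
    if "braces" in kw:
        return "braces"
    if "orthodont" in kw:
        return "ortho_general"
    return "info"

# Hypothetical gap keyword: 1,900/mo volume, best rival sitting at position 4.
print(gap_score(volume=1900, best_rival_position=4))
print(bucket("clear aligners cost albany"))   # invisalign
```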
Rubric contribution
Cat 1.3 Cluster Content (max 50 pts). Computes a coverage_pct and maps to a delta: ≥90% coverage = 0; <30% = −20 pts off baseline.
Scored 25/50 on Cat 1.3, applied a −10 delta. Single rival used: albanybraces.com. Evidence: "DFS domain_intersection adirondackorthodontics.com vs albanybraces.com". Ahrefs and Search Atlas legs didn't surface keywords on this run, so the audit ran in DFS-only mode with the lower confidence noted.
Cannibalization, vendor content, freshness, pillar architecture
Where is content actively hurting us — duplicate pages competing with each other, boilerplate from old vendors, stale content, or a missing pillar structure?
Data sources
- `crawl_pages.json` from prior site-review (required, ≥5 pages).
- DataForSEO Labs — `google_ranked_keywords` for cannibalization confirmation (Phase 2).
- Search Atlas GSC — `gsc_get_keyword_performance` for canonical Google truth (Phase 3).
Methodology
- Cannibalization detection: normalize each page's title + H1, strip brand and geo tokens, cluster pages with identical phrases. Confirm via DFS (≥2 URLs ranking top 50 for same keyword) AND GSC (≥2 URLs accumulating impressions for the same query). GSC is the strongest signal — it's Google literally telling us two pages compete.
- Vendor boilerplate: hash every sentence ≥50 chars across the site. Sentences appearing on ≥2 pages indicate template reuse. Flag pages with <40% unique sentences AND >50% boilerplate score.
- Freshness: extract dates from JSON-LD first, then OpenGraph, then visible text. Bucket: fresh (≤6mo), current, stale (>12mo), untraceable.
- Pillar architecture: build URL tree, identify Mother pages matching `ground_truth.services`, count Sons and Grandsons. Compare to ideal (each Mother → 2–5 Sons → 1–3 Grandsons each).
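The sentence-hash boilerplate check above lends itself to a short sketch. This assumes naive sentence splitting on punctuation; the real skill may segment and threshold differently.

```python
import hashlib
import re
from collections import defaultdict

def boilerplate_report(pages: dict[str, str], min_len: int = 50) -> dict[str, float]:
    """Share of each page's sentences (>= min_len chars) that also appear on another page.

    pages: {url: plain_text}. Sentence splitting here is deliberately naive.
    """
    sentences_by_page = {}
    seen_on = defaultdict(set)   # sentence hash -> set of urls it appears on
    for url, text in pages.items():
        sents = [s.strip() for s in re.split(r"[.!?]\s+", text) if len(s.strip()) >= min_len]
        hashes = [hashlib.sha1(s.lower().encode()).hexdigest() for s in sents]
        sentences_by_page[url] = hashes
        for h in hashes:
            seen_on[h].add(url)

    report = {}
    for url, hashes in sentences_by_page.items():
        if not hashes:
            report[url] = 0.0
            continue
        shared = sum(1 for h in hashes if len(seen_on[h]) >= 2)
        report[url] = shared / len(hashes)   # boilerplate ratio, 0.0-1.0
    return report
```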
Rubric contribution
Cat 1.4 Service Pages (40) + 1.5 Location Pages (40) + 1.6 Content Velocity (25) + 1.7 E-E-A-T (20) = 125 pts. Also emits cross-category cannibalization evidence to Cat 3.
Scored 61/125. 1.6 Velocity = 17/25 with evidence "17% pages fresh, 17% stale, avg page age 611 days, 8 pages have no detectable date". 1.5 Location Pages flagged "8 location pages for 2 cities (400% coverage); 4 location pages involved in cannibalization" — the practice over-built city pages that now compete with each other.
90/180/365-day content plan synthesizer
Given everything every other audit found — what specific pages should we build, upgrade, or kill, and in what sequence?
Data sources
- Pure synthesizer — zero new measurements. Reads all the other audit envelopes plus `ground_truth.json` and `crawl_pages.json`.
Methodology
- Generates three candidate types: NEW_PAGE (from content-gap keywords, missing pillar Mothers/Sons, competitor top pages), UPGRADE (page-deep-dive SCHOLAR <60, cannibalization winners, buried pages with topical value), CONSOLIDATE (cannibalization losers, stale vestigial orphans).
- Each candidate gets a type-specific impact score. NEW_PAGE blends keyword volume (log-scaled), winnability (inverse of rival position), pillar gap severity, and parent Mother's PageRank percentile.
- Sequences into phases with dependencies: Phase 1 (90d) = all consolidations + top 3 upgrades + top 3 new pages. Phase 2 (180d) = remaining upgrades + Son/Grandson buildout. Phase 3 (365d) = depth content. A Son can't ship before its Mother.
- Generates a deployable brief for each item: title, slug, primary/secondary keywords, target word count, required schema types, H1/H2 skeleton, FAQ prompts, internal-link plan, E-E-A-T requirements.
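A hedged sketch of the NEW_PAGE impact blend described above. The 0–1 normalizations and the specific weights are illustrative assumptions; the shipped synthesizer defines its own.

```python
import math

def new_page_impact(volume: int, best_rival_position: int,
                    pillar_gap_severity: float, parent_pr_percentile: float) -> float:
    """Blend the four NEW_PAGE signals above into a 0-100 impact score.

    The normalizations and the weights (0.35/0.30/0.20/0.15) are assumptions.
    """
    vol = math.log10(volume + 1) / 5                       # ~0-1 for volumes up to 100k/mo
    winnability = max(0.0, (101 - best_rival_position) / 100)
    return 100 * (0.35 * vol + 0.30 * winnability
                  + 0.20 * pillar_gap_severity + 0.15 * parent_pr_percentile)

# Hypothetical candidate: 1,900/mo keyword, rival at position 6,
# severe pillar gap, parent Mother in the 80th PageRank percentile.
print(round(new_page_impact(1900, 6, 0.9, 0.8), 1))
```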
Rubric contribution
Evidence only — informational under Cat 1. The output is the deliverable; it doesn't change the score.
Roadmap proposes 2 new pages + 3 upgrades + 2 consolidations. Phase 1 (90d): 7 actions. Phases 2 + 3 empty because content-gap surfaced few keywords on this run (DFS-only mode) — synthesizer compressed everything into Phase 1 rather than padding without evidence. Estimated additional keyword capture: 2 keywords.
Crawlability, Core Web Vitals, indexability, meta validation
Is the site technically crawlable, indexable, fast, and clean — with one cloud crawl instead of running Screaming Frog?
Data sources
- DataForSEO OnPage task crawl — task post, summary, pages, duplicate tags, redirect chains, non-indexable, links.
- DataForSEO Lighthouse on top 10 pages — Performance, SEO, Accessibility, Best Practices + Core Web Vitals.
- Ahrefs Site Audit — independent error/warning verification.
- Firecrawl JS-render diff on 3 pages — detects hydration issues where content only appears post-JS.
Methodology
- One task crawl, then ~5 GET calls retrieve the whole site (versus 150 sequential page hits).
- Per-page `checks.*` boolean flags are the authoritative issue source — no inference, no LLM guessing.
- Severity is rule-based: Critical = 404/5xx, missing title, missing H1, HTTP on HTTPS site. Warning = duplicate tags, thin content (<300 words), missing alt, canonical mismatch, orphan. Info = LCP/INP, render-blocking, no schema.
- v3.13.0+ meta-conventions pass validates every title (50–60 char band, no YMYL superlatives, no "near me", no year), every meta description (120–160 chars, CTA before mobile cutoff, no phone numbers), and Title↔H1 token similarity ≥80%. State-board awareness layer flags FTC/state-dental-regulator risk for YMYL violations.
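A minimal sketch of the rule-based severity mapping, using simplified stand-ins for the real `checks.*` flag names.

```python
CRITICAL = {"is_4xx", "is_5xx", "no_title", "no_h1", "http_on_https_site"}
WARNING = {"duplicate_title", "duplicate_description", "thin_content",
           "missing_alt", "canonical_mismatch", "is_orphan"}

def severity(check: str) -> str:
    """Map one failing check flag to a bucket per the rules above (flag names simplified)."""
    if check in CRITICAL:
        return "critical"
    if check in WARNING:
        return "warning"
    return "info"   # LCP/INP, render-blocking resources, missing schema, etc.

page_checks = {"no_h1": True, "missing_alt": True, "render_blocking": True}
print({c: severity(c) for c, failed in page_checks.items() if failed})
# {'no_h1': 'critical', 'missing_alt': 'warning', 'render_blocking': 'info'}
```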
Rubric contribution
Full Cat 5 Technical (75 pts): 5.1 Schema (25), 5.2 CWV (25), 5.3 Crawlability (15), 5.4 Security/HTTPS (10). Also feeds Cat 3 sub-scores 3.1 (25) + 3.2 (20) + 3.4 (25) = 70 pts via title/meta/alt-text data.
Cat 5 earned 41/75. Evidence: "avg LCP 3550ms, CLS 0.165, INP 270ms, 0% of audited pages pass all 3 CWV, avg Lighthouse Performance 62/100, 3 redirect chains, 2 non-indexable pages, 8 Ahrefs errors."
Per-page SCHOLAR scoring on priority pages
Score the 5–10 most important pages page-by-page so the audit can say exactly which URLs need a rewrite vs. a minor tweak.
Data sources
- `crawl_pages.json` from site-review (no re-crawl).
- DataForSEO — `on_page_instant_pages` in parallel batches of 5; `on_page_lighthouse` on top 5.
- Firecrawl as JSON-LD fallback when rawHtml is missing.
SCHOLAR framework — 7 letters × 0–10 each, scaled to 0–100
- S — Specific: title / H1 / URL contain the primary service from ground truth + a tracked city.
- C — Conversational: ≥3 question patterns, FAQ schema present, 2nd-person voice ≥5 occurrences, dl/details blocks.
- H — Helpful: ≥2 CTAs ("book", "schedule", "consult", "call"), price or before/after content, descriptive alt text, clear next-step section.
- O — Original: practice-specific signals (doctor names, local landmarks), low vendor-boilerplate ratio (<0.6 of phrases like "state-of-the-art technology"), internal link to a related Mother page.
- L — Long: word-count tiers (≥400 / ≥800 / ≥1500) plus well-formed H2/H3 hierarchy.
- A — Authoritative: doctor credentials (DDS, DMD, AAO), Author/Person schema, outbound link to .gov/.edu, Reviewed/Updated date <12 months.
- R — Relevant: title keyword matches a tracked rank-tracker keyword, no sibling cannibalization, semantically-aligned inbound anchors, schema type matches content.
Interpretation: 80–100 pillar-ready · 60–79 needs work · 40–59 rewrite candidate · <40 failing.
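A minimal sketch of the SCHOLAR scaling and band labels above. The per-letter checks themselves (question patterns, CTAs, credentials, and so on) live in the skill; only the arithmetic is shown here.

```python
def scholar_score(letters: dict[str, int]) -> tuple[int, str]:
    """Scale seven 0-10 letter scores to 0-100 and attach the interpretation band."""
    raw = sum(letters[k] for k in "SCHOLAR")   # each of S, C, H, O, L, A, R is 0-10
    scaled = round(raw * 100 / 70)
    if scaled >= 80:
        band = "pillar-ready"
    elif scaled >= 60:
        band = "needs work"
    elif scaled >= 40:
        band = "rewrite candidate"
    else:
        band = "failing"
    return scaled, band

# Hypothetical priority page:
print(scholar_score({"S": 6, "C": 3, "H": 5, "O": 4, "L": 2, "A": 5, "R": 8}))
```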
Rubric contribution
Cat 1.3 Content Quality (30) + 1.4 Content Depth (30) · Cat 3.5 Internal Linking (25) + 3.6 Schema Coverage (25) = ~110 pts.
4 priority pages audited, avg SCHOLAR 46.8/100. Drove 1.3 Content Quality to 8/30 and 1.4 Content Depth to 2/30 ("avg word count 63; 50% of pages have ≥3 H2s"). Internal linking on priority pages: avg inbound 1.2, anchor diversity 1.0, 1 orphan → 4/25 on 3.5.
Structured data validation + NAP cross-reference
Does every page have the right JSON-LD, does it match visible content + GBP, and is it valid?
Data sources
- `crawl_pages.json` rawHtml — regex-extract every `<script type="application/ld+json">` block.
- DataForSEO — `onpage_pages.checks.has_json_ld` / `has_microdata` for the "which pages have schema" baseline.
- Firecrawl fallback when rawHtml is missing.
- Ground truth NAP + GBP data from `audit_map-pack.json` for cross-validation.
Methodology
- Parse every JSON-LD block; validate JSON syntax, required fields per `@type`, and correct subtype (`Dentist`/`Orthodontist` array form, not bare `LocalBusiness`).
- Determine expected schema by page_type: homepage → LocalBusiness + Organization; service → MedicalProcedure; provider → Person/Physician; location → LocalBusiness per office; blog → Article + Person author; FAQ section detected → expect FAQPage.
- Cross-reference NAP three ways: schema-vs-visible content, schema-vs-GBP, visible-vs-GBP. Mismatches go to the action list.
- Stale/vendor schema detection: out-of-date phone, wrong address, generic vendor strings injected by old plugins.
- `@id` cross-reference validation — every `{"@id": "..."}` must resolve to an entity in the `@graph`; broken refs are removed, never emitted.
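The `@id` cross-reference check is easy to sketch. This version only handles a top-level `@graph` and bare `{"@id": ...}` reference objects, which is narrower than full JSON-LD resolution.

```python
import json

def broken_id_refs(jsonld: str) -> list[str]:
    """Return @id references that don't resolve to any entity in the @graph."""
    doc = json.loads(jsonld)
    graph = doc.get("@graph", [doc])
    defined = {node.get("@id") for node in graph if isinstance(node, dict) and "@id" in node}

    broken = []

    def walk(value):
        if isinstance(value, dict):
            # A dict containing only "@id" is a reference, not a definition.
            if set(value.keys()) == {"@id"} and value["@id"] not in defined:
                broken.append(value["@id"])
            for v in value.values():
                walk(v)
        elif isinstance(value, list):
            for v in value:
                walk(v)

    for node in graph:
        walk(node)
    return broken

doc = '{"@graph": [{"@id": "#practice", "@type": "Orthodontist"}, {"@type": "WebPage", "about": {"@id": "#missing"}}]}'
print(broken_id_refs(doc))   # ['#missing']
```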
Rubric contribution
Cat 1.5 Schema Completeness (20) + Cat 2.4 GBP-Website Alignment / Schema NAP (25) + Cat 5.1 Schema Markup (25) = ~70 pts.
50% of pages have type-appropriate schema; 50% have any schema; 75% have no validation errors; NAP in schema matches visible content on 0/1 pages (0%). Scored 1.5 = 10/20 and 3.6 = 10/25. Recommended: deploy MedicalProcedure schema on all service pages, fix NAP-in-schema on the homepage.
Map-pack heatmap grid rankings
Where in the city does the practice actually show up in the Google Map Pack, and where does it disappear?
Data sources
- Search Atlas Local SEO Heatmaps (primary) — `list_businesses`, `list_businesses_heatmaps`, `get_heatmap_details`, `single_competitor_versus_report`.
- DataForSEO — `serp_organic_live_advanced` with `location_coordinate` for cross-validation at the address center.
Methodology
- Match each practice location to a SA heatmap business by address/lat-lng (within ~100m).
- For each (location × tracked keyword), pull the existing grid — every cell has a GPS coordinate and the practice's rank at that point. Typically 39 cells per grid.
- Compute four KPIs per grid: average rank, top-3 share %, the largest concentric radius where median rank ≤3 (proximity radius), and dead-zone clusters (≥3 contiguous cells where rank >20).
- Cross-validate with DataForSEO at the address center — if the practice doesn't appear at its own front door, SA data is suspect.
- Optional rival comparison runs `single_competitor_versus_report` against the primary rival domain.
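A minimal sketch of two of the four per-grid KPIs, assuming each cell carries a rank field. The proximity radius and dead-zone clustering need the grid geometry and are omitted; field names are illustrative.

```python
def grid_kpis(cells: list[dict]) -> dict:
    """Average rank and top-3 share for one (location x keyword) heatmap grid.

    cells: [{"lat": ..., "lng": ..., "rank": 2}, ...], 39 cells on a typical grid.
    Unranked cells are assumed to come back as rank 21+.
    """
    ranks = [c["rank"] for c in cells]
    return {
        "avg_rank": round(sum(ranks) / len(ranks), 1),
        "top3_share_pct": round(100 * sum(1 for r in ranks if r <= 3) / len(ranks), 1),
        "worst_cells": sum(1 for r in ranks if r > 20),   # raw dead-zone candidates
    }

# Hypothetical 5-cell excerpt of a grid:
print(grid_kpis([{"rank": 1}, {"rank": 2}, {"rank": 3}, {"rank": 8}, {"rank": 22}]))
```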
Rubric contribution
Cat 2.2 GBP Heatmap Performance (75 pts): 2.2.1 avg rank (25), 2.2.2 top-3 share (25), 2.2.3 proximity radius (15), 2.2.4 dead-zone coverage (10). Returns null if no heatmap project exists — "data unavailable" is not "failed."
6 of 7 GBP locations average 1.2–2.8 across braces / orthodontist / invisalign. Latham "orthodontics" = 1.2 across all 39 grid pins. The killer find: Albany "braces" declining from avg 1.2 (Jan 2026) to 2.8 — and rival Albany Braces now ranks 2.6, outranking the practice in its own headline market. Score: 64/75 with a −10 negative-trend penalty.
NAP consistency across directories
Is this practice's Name/Address/Phone consistent across the directory ecosystem, and is it listed where it matters?
Data sources
- Search Atlas — `gbp_list_citation_submissions` + `gbp_get_aggregator_details` (covers ~30–50 directories via 5 aggregators: Data Axle, Localeze, Foursquare, Factual, Acxiom).
- DataForSEO — `business_data_business_listings_search`, `google_my_business_info`, `google_reviews`.
Methodology
- Path A (preferred): read prior SA citation submissions if they exist.
- Path B (fallback): initialize a citation draft populated from `ground_truth.json` and submit a fresh scan; poll up to 5 minutes.
- Normalize every found listing: phone to E.164, address with abbreviation collapse, name with legal-suffix stripping, website with host-only.
- Compute Levenshtein distance per field vs canonical NAP — matches must be exact or ≤2 character edits (single-typo tolerance for name/address; exact match for phone/website).
- Bucket each mismatching listing into name / phone / address / website / hours discrepancies.
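A short sketch of the normalization plus edit-distance matching described above, with a plain Levenshtein implementation so it runs without dependencies. The phone handling assumes US numbers.

```python
import re

def normalize_phone(raw: str) -> str:
    """Collapse a US-format phone string toward E.164 (sketch; assumes +1 numbers)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        digits = "1" + digits
    return "+" + digits

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def field_matches(found: str, canonical: str, field: str) -> bool:
    """Exact for phone/website, <= 2 character edits for name/address (per the rule above)."""
    if field in ("phone", "website"):
        return found == canonical
    return edit_distance(found.lower(), canonical.lower()) <= 2

print(normalize_phone("(518) 555-0142"))                                           # +15185550142
print(field_matches("Adirondak Orthodontics", "Adirondack Orthodontics", "name"))  # True
```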
Rubric contribution
Cat 8 (25 pts total): 8.1 aggregator presence (10), 8.2 NAP consistency % (10), 8.3 niche directories — AAO / ADA / Healthgrades / Zocdoc (5).
Score 13/25 — partial because no SA citation scan history existed for any of the 6 locations. Baseline ran from GBP location data alone: 5/6 locations verified, 85.7% profile completeness, missing Healthgrades / Zocdoc / WebMD / RealSelf / AAO / Yelp. Map-pack data revealed duplicate CIDs — strong evidence a real scan would surface NAP fragmentation. Next action: gbp_init_citation_draft per location.
Real GSC rankings — clicks, impressions, CTR, position
What does Google itself say about this practice's actual organic performance — replacing third-party rank estimates with first-party GSC truth?
Data sources
- Search Atlas GSC integration — 9 tools including `gsc_get_site_property_performance`, `gsc_get_keyword_performance`, `gsc_get_page_performance`, `gsc_compare_performance`.
- Search Atlas KRT (Keyword Rank Tracker) — daily SERP-position history beyond GSC's 16-month window.
- DataForSEO Labs — `google_ranked_keywords` for cross-validation of estimate accuracy vs GSC truth.
Methodology
- Hard gate: `gsc_get_sites` must return the practice domain. If GSC isn't connected, the audit returns `status: error` — it deliberately does NOT fall back to estimates.
- Pulls two 30-day windows in parallel (current and prior) for period-over-period delta.
- Computes branded vs non-branded split by matching keywords against practice name + doctor surnames from ground truth.
- Identifies "winnable" keywords: position 4–10 with ≥50 impressions — small CTR/meta tuning lifts them to page 1.
- Computes mean absolute position delta between DFS estimates and GSC reality for top 50 keywords — feeds confidence intervals on the other audits.
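The "winnable keyword" filter is simple enough to show directly; the GSC-style field names are illustrative.

```python
def winnable(rows: list[dict], min_impressions: int = 50) -> list[dict]:
    """Keywords at positions 4-10 with enough impressions to matter, per the rule above.

    rows: GSC-style dicts like {"query": ..., "impressions": ..., "position": ...}.
    """
    hits = [r for r in rows
            if 4 <= r["position"] <= 10 and r["impressions"] >= min_impressions]
    return sorted(hits, key=lambda r: r["impressions"], reverse=True)

rows = [
    {"query": "braces cost albany", "impressions": 420, "position": 6.3},
    {"query": "invisalign latham", "impressions": 35, "position": 5.1},        # too few impressions
    {"query": "orthodontist albany ny", "impressions": 900, "position": 2.4},  # already top 3
]
print([r["query"] for r in winnable(rows)])   # ['braces cost albany']
```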
Rubric contribution
Does not own a category. Emits a cross_category block that the composite scorer overlays onto Cat 1.7, Cat 3.6, and Cat 6.3 — plus the SA Holistic Technical pillar feeds Cat 3 and Cat 5.
8,289 ranking keywords, 1,330 clicks / 86,043 impressions / 3.52% CTR / avg pos 10.3 over 40 days. KRT shows 35/42 tracked keywords in top 3 (83%) across 5 location projects. The killer find: /how-much-do-braces-cost/ has 9,671 impressions at 0.07% CTR (position 16) — the single largest CTR opportunity site-wide. Homepage ranks for 189 keywords at avg position 26 — page-3 cliff, massive upside.
Google review reputation across all locations
Is this practice's Google review reputation a competitive asset or a liability, and are they actively managing it?
Data sources
- Search Atlas — `gbp_list_locations` (enumerate verified GBP profiles) + `gbp_get_review_stats` per location (total reviews, weighted rating, reply count, rating distribution).
- Optional: `gbp_list_reviews` for individual review pulls when negative-handling drill-down is needed.
Methodology
- List all GBP locations under the practice's SA account; flag any unverified locations as warnings.
- Per location, pull review stats and compute reply rate (replied / total).
- Weights aggregate rating by review volume across locations — a 5.00 location with 118 reviews shouldn't dominate a 4.90 location with 405.
- Flag any location with reply rate <90% as a "reply gap"; flag any location with ≥5 one-star reviews as needing sentiment investigation.
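A minimal sketch of the volume-weighted rating and the two flag rules. Field names are illustrative; the review and one-star counts below echo the Adirondack findings, while the reply counts are hypothetical.

```python
def review_rollup(locations: list[dict]) -> dict:
    """Volume-weighted rating, overall reply rate, and per-location flags."""
    total = sum(l["reviews"] for l in locations)
    weighted = sum(l["rating"] * l["reviews"] for l in locations) / total
    flags = []
    for l in locations:
        if l["replied"] / l["reviews"] < 0.90:
            flags.append((l["name"], "reply gap"))
        if l.get("one_star", 0) >= 5:
            flags.append((l["name"], "sentiment investigation"))
    return {
        "weighted_rating": round(weighted, 2),
        "reply_rate": round(sum(l["replied"] for l in locations) / total, 3),
        "flags": flags,
    }

print(review_rollup([
    {"name": "Albany", "reviews": 405, "rating": 4.90, "replied": 360, "one_star": 6},
    {"name": "Glens Falls", "reviews": 118, "rating": 5.00, "replied": 118, "one_star": 0},
]))
```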
Rubric contribution
Full Cat 7 Reviews (50 pts): 7.1 review volume (20), 7.2 avg rating (20), 7.3 review management/response rate (10). Cross-category contribution to Cat 2.5 GBP Review Velocity.
Score 40/50. 1,268 total reviews, 4.94 weighted avg, 89.4% reply rate across 6 GBP locations. Clifton Park has the biggest gap — 70 unreplied of 429 reviews. Albany has 6 one-star reviews (1.5% of 405) needing sentiment investigation. Glens Falls is the flagship: perfect 5.00 across 118 reviews. One stale unverified location (Moe Rd Clifton Park, ID 54025) needs claim/redirect.
Practice's own backlink profile + velocity
How many real websites link to the practice, how trustworthy are those links, and is the link program growing, flat, or in decline?
Data sources
- Ahrefs Site Explorer (primary) — referring-domains history, all-backlinks top-500, anchors, DR history, URL Rating history, broken backlinks.
- DataForSEO Backlinks API (cross-validator) — `backlinks_summary`, `backlinks_anchors`, `backlinks_timeseries_new_lost_summary`, `backlinks_bulk_spam_score`.
Methodology
- Pulls 12 months of referring-domain history from both sources; computes new / lost / net velocity at 30 / 90 / 365 day windows.
- Classifies every anchor (weighted by referring domain, not raw link count) into five buckets: brand, exact-match, partial-match, naked URL, generic.
- Flags a referring domain "toxic" if DFS spam score ≥30, OR if DR ≤5 with exact-match anchor (anchor-bait pattern), OR if it's on a low-trust TLD with DR ≤10.
- Reconciles Ahrefs vs DFS — if they disagree by >50% on net velocity or >15pp on any anchor bucket, logs a warning and surfaces both numbers.
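The three toxicity triggers reduce to a short predicate. The low-trust TLD list here is an assumption; the skill defines its own.

```python
LOW_TRUST_TLDS = {".xyz", ".top", ".click", ".info"}   # illustrative list, not the skill's

def is_toxic(domain: str, dr: int, spam_score: int, anchor_type: str) -> bool:
    """Apply the three toxicity triggers above to one referring domain."""
    if spam_score >= 30:
        return True
    if dr <= 5 and anchor_type == "exact_match":        # anchor-bait pattern
        return True
    tld = "." + domain.rsplit(".", 1)[-1]
    return tld in LOW_TRUST_TLDS and dr <= 10

print(is_toxic("cheap-seo-links.xyz", dr=4, spam_score=12, anchor_type="exact_match"))  # True
```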
Rubric contribution
Owns sub-criteria 4.1–4.5 in Cat 4 (90 of 150 pts): Domain Power proxy (20), ref-domain count (20), velocity (20), anchor diversity (15), link quality/toxicity (15).
Adirondack: Ahrefs DR 15, 115 referring domains, 341 total backlinks but 275 broken (80% link rot), DFS spam_score 21 (high — toxic), Wildfire 2:1 ratio violated (822 external vs 9,130 internal links). Cat 4 contribution: 26/90.
Green Ortho: DR 16, 179 ref-domains, only 1 broken backlink, spam_score 5 → 47/90. Same emit logic, much healthier profile.
Link prospects — domains that link to rivals but not us
Which referring domains link to the practice's rivals but not the practice — the acquisition pipeline of plausible link prospects already proven willing to link to a similar business?
Data sources
- Ahrefs `batch-analysis` (primary) — one call returns ref-domain lists for practice + up to 3 rivals, aligned for diffing.
- DataForSEO — `backlinks_bulk_referring_domains` for cross-validation.
Methodology
- Resolves rivals from `ground_truth.json` or, failing that, from DFS `google_competitors_domain`. Zero rivals → fail fast.
- Normalizes every ref-domain, builds sets per target, computes `gap = union(rivals) – practice`.
- Scores each gap domain: `acquisition_value = DR × rival_count × topical_relevance`. Topical relevance: 2.0 for medical/dental, 1.7 for local-news (`*-times.com`, `*.patch.com`), 1.5 for chamber/rotary/community/.org, 1.0 otherwise (see the sketch below).
- Sorts, keeps top 30 for the deliverable. Buckets targets into medical / local-news / community / general.
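A hedged sketch of the acquisition-value scoring. The domain pattern matching for topical relevance is an approximation of the rule above, not the skill's exact matcher.

```python
import re

def topical_relevance(domain: str) -> float:
    """Relevance multipliers from the rule above; the pattern matching is a sketch."""
    if re.search(r"(dental|ortho|medical|smile)", domain):
        return 2.0
    if domain.endswith(".patch.com") or re.search(r"-times\.com$", domain):
        return 1.7
    if re.search(r"(chamber|rotary|community)", domain) or domain.endswith(".org"):
        return 1.5
    return 1.0

def acquisition_value(domain: str, dr: int, rival_count: int) -> float:
    """acquisition_value = DR x rival_count x topical_relevance."""
    return dr * rival_count * topical_relevance(domain)

print(acquisition_value("albanychamber.org", dr=55, rival_count=2))   # 55 * 2 * 1.5 = 165.0
```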
Rubric contribution
Cat 4.6 Authority gap closure pipeline (60 pts): pipeline depth (30), topical relevance ratio (15), tier mix of DR50+ targets (15).
Sub-audit fully implemented but the e2e smoke harness hasn't driven it yet for either Adirondack or Green Ortho — envelope is wired and ingestion-verified; live findings pending the next full audit pass. The methodology and emit script are complete.
Full-site profile of the primary rival
Build a full-site profile of one named rival — topical authority, brand-signal score, top traffic pages, blog cadence, schema — so the strategy can ground its competitor narrative in evidence.
Data sources
- Search Atlas Site Explorer v2 (primary) — holistic SEO scores, brand signals, organic keywords, organic pages, position distribution, indexed pages.
- Ahrefs — top-pages, DR history, traffic history.
- DataForSEO Labs — `google_competitors_domain` for rival selection cross-validation.
- FireCrawl — light crawl of rival homepage + top-5 pages for architecture / schema / blog signals.
Methodology
- Auto-detects the rival if not specified: top DFS competitor with >50 keyword intersections and avg position <30 (skipping aggregators like Healthgrades, WebMD).
- Pulls rival's SA holistic scores; pulls the same for the practice; computes deltas across topical authority, brand signal, top-10 page traffic, indexed-page count, blog cadence.
- Extracts cities-served from each domain's top-200 organic keywords — surfaces "cities the rival ranks for that we don't" for market-expansion strategy.
Rubric contribution
Diagnostic only — does not own a sub-criterion. Emits rubric_contributions.evidence_only.rival_diagnostics which the composite scorer overlays onto Cat 1 and Cat 4 evidence. The skill explicitly will not invent a category score; it informs, doesn't grade.
Adirondack: Top rival auto-detected as diamondbraces.com (61 keyword intersections, $261 ETV — strongest geographic match). Secondary rivals smileworksnyc.com (NYC) and coastlineorthodontics.com (FL) flagged as not true local rivals.
Green Ortho: Top rivals all surfaced as Atlanta-area practices — the audit raised a geography_note warning that the intake address may be wrong (the practice claims NJ/Westwood but ranks like an Atlanta practice).
Graph-theoretic internal linking (PageRank, click-depth, cohesion)
Treat the website as a directed graph of pages and links — then run the same algorithms Google uses to decide which pages are important.
Data sources
- None external. All math runs locally in Python on `crawl_pages.json`. Sub-30 seconds on small sites, 1–3 minutes on a 300-page site.
The 5 algorithms — in plain English
- Internal PageRank. Every page starts equal. On each iteration, a page hands a share of its authority to every page it links to. After ~100 passes the numbers stabilize — you have a ranking of which internal pages are most "trusted" by the site's own link structure. Top of the list should be homepage + Mother pillars. If not, the link graph is misallocating equity.
- Click-depth. Walk outward from the homepage in breadth-first order: depth 1 = one click away, depth 2 = two clicks away. Anything at depth ≥4 is buried — Google re-crawls those less often and weights them lower.
- Anchor distribution. For every target page, classify each incoming anchor (brand / exact-match / partial / generic / naked URL). Targets where exact-match >50% are over-optimization risks (Penguin territory). Where generic ("click here") >60% the anchor carries no semantic signal to Google.
- Cluster cohesion. For each Mother service, find the Mother page and its Son pages. Ask: of all possible Son-to-Son pairs, what fraction actually cross-link? Score 0.0 (isolated silos) to 1.0 (every sister links to every sister). Healthy ortho clusters: 0.4–0.7.
- PR concentration. What % of total PageRank do the top 5 pages hold? 40–60% is healthy. >70% means the site is so homepage-heavy nothing else can rank. <30% means equity is so diffuse no page is strong enough to win competitive keywords.
The audit then proposes the top 15 high-leverage link moves — specific source page → target page → suggested anchor — ranked by source_PR × (1/target_PR) × topical_relevance × priority_bonus. "This page has lots of authority to spare, this other page is starved despite being important, and they're topically related — add the link with this anchor."
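A minimal sketch tying the two pieces together: a plain internal-PageRank power iteration over a toy link graph, then the link-move formula above applied to one candidate edge. The graph, relevance value, and bonus are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 100) -> dict[str, float]:
    """Plain internal PageRank over the site's link graph (power iteration)."""
    pages = list(links)
    pr = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iters):
        nxt = {p: (1 - damping) / len(pages) for p in pages}
        for src, outs in links.items():
            if not outs:                                  # dangling page: spread evenly
                for p in pages:
                    nxt[p] += damping * pr[src] / len(pages)
            else:
                for dst in outs:
                    nxt[dst] += damping * pr[src] / len(outs)
        pr = nxt
    return pr

def move_priority(src_pr: float, dst_pr: float, relevance: float, bonus: float = 1.0) -> float:
    """source_PR x (1/target_PR) x topical_relevance x priority_bonus."""
    return src_pr / dst_pr * relevance * bonus

# Hypothetical 4-page site: homepage hoards equity, /invisalign-cost/ is starved.
links = {
    "/": ["/braces/", "/invisalign/"],
    "/braces/": ["/"],
    "/invisalign/": ["/", "/invisalign-cost/"],
    "/invisalign-cost/": ["/"],
}
pr = pagerank(links)
print(round(move_priority(pr["/"], pr["/invisalign-cost/"], relevance=0.9, bonus=1.5), 1))
```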
Rubric contribution
Cat 3.5 Internal Linking (50 pts): orphan rate (12), click-depth (12), anchor distribution (8), cluster cohesion (10), PR concentration (8). Replaces the simpler "inbound link count" metric the older page-deep-dive skill used.
Sub-audit fully specified; emit script complete. Hasn't been driven end-to-end on either example practice yet. Expected envelope shape on a typical 124-page site: "542 edges, 2 orphans, avg depth 2.3, top 5 hold 52% of PR (healthy)."
Brand mentions + share-of-voice across LLM responses
When real people (and AI engines on their behalf) ask LLMs questions in this practice's space, how often does the brand show up — and is it gaining or losing ground?
Data sources (triangulated)
- Search Atlas Brand Radar (primary, 8 endpoints) — mentions overview + history, share-of-voice overview + history, cited-pages, cited-domains, impressions overview + history.
- Ahrefs Brand Radar (cross-validation, 7 endpoints) — AI-response samples, mentions, SoV, cited-pages/domains.
- DataForSEO LLM Mentions (third triangulation) — aggregate metrics, searchable mention list, top cited pages/domains.
- DataForSEO SERP Live — AI Overview parsing on 5–10 focal queries (e.g., "best orthodontist in Albany", "Invisalign Albany").
Methodology
- Pulls a 30-day window from all three providers in parallel (90 days if a recent rebrand).
- Triangulates mention counts; if sources disagree by >50%, takes the median (no single-source scores).
- Detects share-of-voice by comparing the practice to a resolved rival domain (from ground_truth.json or auto-detected via SA's organic competitors, filtering out aggregators).
- Parses each focal SERP's `ai_overview.references[]` to count how many AI Overviews actually cite the practice's domain.
- Classifies every cited URL by path (homepage vs service vs location vs about) so the rubric can reward depth, not just homepage hits.
- Computes 30-day mention slope (last-7-day avg vs first-7-day avg) for trajectory.
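A minimal sketch of the triangulation rule and the 30-day slope. How the providers are merged when they agree closely is not specified above, so the mean there is an assumption.

```python
from statistics import median

def triangulate(counts: dict[str, int]) -> int:
    """Merge mention counts from the three providers.

    If max and min disagree by more than 50% of the max, fall back to the
    median rather than trusting any single source (per the rule above).
    """
    values = sorted(counts.values())
    if values[-1] and (values[-1] - values[0]) / values[-1] > 0.5:
        return round(median(values))
    return round(sum(values) / len(values))   # close agreement: simple mean (assumption)

def mention_slope(daily: list[int]) -> float:
    """Last-7-day average minus first-7-day average over a 30-day window."""
    return sum(daily[-7:]) / 7 - sum(daily[:7]) / 7

# Hypothetical provider counts for one 30-day window:
print(triangulate({"search_atlas": 208, "ahrefs": 230, "dataforseo": 95}))   # disagreement -> median
```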
Key concepts in plain English
- Mention share = what % of LLM responses for queries in your space include your brand name at all. 50% means an LLM mentions you in half the answers it generates for in-category prompts.
- Share of voice = of all brand mentions in those responses, what % are yours vs competitors'.
- AI Overview citations = Google's AI Overview shows ~3–8 source links per answer; this counts how many of your focal queries cite your domain as a source. This is the AI-era equivalent of position 1.
Rubric contribution
Drives Cat 6 GEO in full — 50 pts: 6.1 Mention Presence (15), 6.2 SoV vs Rival (10), 6.3 AI Overview Citations (10), 6.4 Cited-Pages Quality (10), 6.5 Sentiment & Trajectory (5).
Scored 28/50. 208 total mentions across ChatGPT / Gemini / Perplexity / Copilot / Google AI Mode / Grok over 30 days, sentiment 71.6 (positive). But visibility score down 11.7 points from 61.7 and mention volume down 41 (249 → 208) — trajectory negative. Per-platform visibility ranges from 40% (ChatGPT) to 80% (Copilot, Google AI Mode, Grok).
STANCE compliance for SA tracking setup
Is the agency's Search Atlas tracking setup for this practice actually disciplined, or is it quietly burning credits on garbage data?
Data sources
- Search Atlas only (brand-routed) — KRT projects + keyword details, Local SEO heatmap businesses + heatmaps, LLM Visibility projects + queries/topics, GBP citation submission history, OTTO engagement, GBP location detail.
Methodology
- Detects the practice's vertical (orthodontics, dermatology, periodontics, chiropractic, etc.) and generates a candidate keyword set from the per-vertical YAML template — what should be tracked.
- Pulls actual tracked state from Search Atlas across KRT, heatmaps, LLMV, citations, OTTO, GBP.
- Diffs candidate vs actual: flags gaps (recommend add), excess (consider remove), matches (good).
- Runs 10 anti-pattern checks against the STANCE rules (word-order over-tracking, near-me in KRT, mobile-desktop doubling, cross-city tracking, missing brand, high NR rate, radius-market mismatch, multi-specialty in one project, stale heatmaps, no citation history).
- Builds a credit-costed action list (rough $/month per add or remove) so operators can weigh tradeoffs.
What "STANCE" is
Not an acronym — shorthand for the agency's source-of-truth doc "Agency Tracking-Setup Standards — Canonical Stance" (setup/research/STANCE.md), which codifies 10 numbered rules compiled from Whitespark, Sterling Sky, BrightLocal, Local Falcon, SearchAtlas docs, and a 41-project NEON/HIP audit. Headline rules: canonical word-order {service} in {city} {state}, near-me → heatmaps not KRT, one KRT project per location, mobile is source of truth, heatmap radius bounded by ~20-min drive-time.
Rubric contribution
Evidence-only — does NOT move the V5 score. STANCE compliance is methodology hygiene. Writes evidence rows into Cat 2 / Cat 6 / Cat 8 for internal review only. Never appears in client-facing strategy presentations.
Vertical = orthodontics, 16 tracked kws vs a 36-kw candidate set (35 gap, 12 extras). 7 anti-patterns detected including 2 critical: word-order over-tracking ("orthodontist albany" × 3 variants — wastes $2.50/mo) and cross-city tracking (5 of 16 kws / 31% target Rochester or other non-served cities). Brand name "Adirondack Orthodontics" is completely untracked — no anchor for NR debugging. 2 of 4 expected heatmap businesses set up. Zero citation scans run.
OTTO will make suggestions. We validate every single one.
Search Atlas OTTO is a recommendation engine — useful but noisy. The plugin scores every OTTO rec 0–100 against the other 14 sub-audits' findings and bucketizes it before anything enters the action plan.
OTTO surfaces hundreds of "fix this" items per site. Without triage, we'd drown in 184 recommendations with no way to distinguish a true critical fix from "add a meta description to your privacy page." Worse: we'd deploy noise to clients. OTTO-overlay validates against everything else we know about the site before recommending action.
The 5 scoring dimensions (100 pts before penalties)
| Dimension | Max | What it measures (and why this weight) |
|---|---|---|
| Corroboration | 40 | Most important signal. Does another independent sub-audit recommend the same fix on the same URL? +40 = technical-audit AND page-deep-dive both flagged this exact page for this exact issue. +20 = another audit flagged the same category of issue on a different URL. 0 = OTTO is alone. Three independent measurement systems agreeing → the fix is real. |
| Page priority | 20 | Homepage = +20. Pillar Mother page = +16. Top-10 inbound-link page = +12. Generic valid page = +4. A fix on a money page is worth more than the same fix on a buried page. |
| Specificity | 15 | Three +5 increments: has a real URL, has an actionable verb ("add" / "fix" / "remove"), has a measurable target ("compress to <100KB" beats "improve speed"). Vague recs get filtered. |
| V5 weakness | 15 | Pulls from v5_score.json — if the rec addresses a category currently scoring <50% of max, +15. 50–75%, +8. >75%, +2. Prioritize fixes in the categories where the practice is actually weak. |
| OTTO confidence | 10 | OTTO's own severity flag — critical +10, warning +6, info +2. Weighted lightest because it's vendor self-rating. |
The 4 noise penalties (subtractive)
| Penalty | Pts | Trigger |
|---|---|---|
| Low-value page target | −30 | URL matches /privacy/, /terms/, /sitemap/, /thank-you/, /404/. OTTO will happily tell you to optimize the meta description of your privacy policy. No. |
| Conflict with another audit | −20 | Another sub-audit explicitly disagrees. Example: OTTO says "expand content" but page-deep-dive SCHOLAR-L scored 9/10 — page is fine; OTTO is wrong. |
| Already fixed | −30 | SCHOLAR criterion this rec addresses already scores ≥8/10 in the crawl. The fix was done; OTTO hasn't recrawled. |
| Stale OTTO data | −10 | OTTO's last_recrawl_at is older than 30 days. Confidence decays with age. |
Three output buckets
Worth deploying
Strong corroboration + good page + specific. Flows into the action list and Gold Modules.
Human judgment
Maybe valid, maybe not. Strategist decides case-by-case. Goes into the review CSV.
Drop, with reasons logged
Explicit reasons listed (e.g., "conflicts with content-quality finding X"). Suppressed but auditable.
What this looks like in practice — same issue, different bucket
KEEP · score 84
"Add meta description to /braces/"
- Corroboration: +40 (technical-audit AND page-deep-dive both flagged this URL)
- Page priority: +18 (pillar page)
- Specificity: +11
- V5 weakness: +8
- OTTO confidence: +6
- Penalties: 0
NOISE · score 28
"Add meta description to /privacy/"
- Corroboration: 0 (no other audit flagged)
- Page priority: +4
- Specificity: +11
- V5 weakness: +8
- OTTO confidence: +6
- Penalty: −30 low-value page target
Suppressed. Reason: low_value_page_target.
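A minimal sketch of the scoring arithmetic and bucketing. The keep/review/noise cutoffs shown are assumptions; the actual thresholds live in the OTTO-overlay skill.

```python
def score_otto_rec(corroboration: int, page_priority: int, specificity: int,
                   v5_weakness: int, otto_confidence: int, penalties: int = 0) -> tuple[int, str]:
    """Sum the five dimensions, subtract penalties, and bucket the result.

    Bucket thresholds (keep >= 70, noise < 40) are illustrative assumptions.
    """
    score = (corroboration + page_priority + specificity
             + v5_weakness + otto_confidence - penalties)
    if score >= 70:
        return score, "keep"
    if score >= 40:
        return score, "review"
    return score, "noise"

# Hypothetical recs mirroring the two cards above (component values approximate):
print(score_otto_rec(40, 18, 11, 8, 6))              # strong corroboration -> keep
print(score_otto_rec(0, 4, 11, 8, 6, penalties=30))  # low-value page target -> noise
```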
OTTO is a recommendation source, not a measurement. Scoring V5 with OTTO data would be circular — you'd be scoring the practice on the volume of OTTO's suggestions rather than the underlying signal. The Keep bucket flows into the Gold Modules priority list and the internal audit report's Section 10, but never into the score itself, and never into the client-facing strategy presentation.
Page titles and meta descriptions
The receipts on why the audit's recommendations land where they do.
This section exists because you sent a proposed OTTO prompt asking for 40–50 character titles with "Best [Service]" required and "Fast Braces" allowed. The audit's recommendations land differently. Here's why.
The TL;DR rules the audit enforces
- Title: 50–58 characters, primary keyword forward, no superlatives on YMYL pages, no year, no "near me", state as 2-letter abbreviation, ≥80% token similarity to H1.
- Description: 150–160 characters, primary keyword in first 90 chars, CTA in first 120 chars (mobile-safe), no phone numbers, no hype.
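A hedged sketch of a title check against the rules above, using Jaccard token overlap as a stand-in for the real title-to-H1 similarity metric and an illustrative superlative list.

```python
import re

def check_title(title: str, h1: str, ymyl: bool = True) -> list[str]:
    """Flag violations of the title rules above. Superlative list is illustrative."""
    issues = []
    if not 50 <= len(title) <= 58:
        issues.append(f"length {len(title)} outside 50-58")
    if ymyl and re.search(r"\b(best|#1|top.rated)\b", title, re.I):
        issues.append("superlative on a YMYL page")
    if re.search(r"\bnear me\b", title, re.I):
        issues.append("contains 'near me'")
    if re.search(r"\b20\d{2}\b", title):
        issues.append("contains a year")
    # Token similarity vs H1 (Jaccard overlap as a stand-in for the real metric).
    t, h = set(re.findall(r"\w+", title.lower())), set(re.findall(r"\w+", h1.lower()))
    if h and len(t & h) / len(t | h) < 0.8:
        issues.append("title/H1 token similarity below 80%")
    return issues

print(check_title("Best Braces Near Me 2025 | Albany Orthodontist",
                  "Braces in Albany, NY"))
```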
Why 50–58 chars — Tier 1 empirical
Zyppy ran a 2025 study of 80,959 titles across 2,370 sites measuring how often Google rewrites a title by length:
| Title length | Google rewrite rate | Implication |
|---|---|---|
| < 50 chars | 50–96% | Google replaces our title — usually with H1 + brand |
| 51–55 chars | ~40% | Lowest rewrite rate of any bucket |
| 56–60 chars | ~45% | Still in the low-rewrite zone |
| 61–70 chars | ~70% | Often truncated on desktop |
| > 70 chars | ~100% | Always rewritten |
Mobile SERP truncation cuts at ~500 px ≈ 50 chars. So 50–58 is the sweet spot — in Zyppy's lowest-rewrite bucket AND inside mobile display. The 40–50 cap in the original prompt would push most pages into the 50–96% rewrite bucket. Empirical loss of brand control, not a judgment call.
Why no superlatives — what Google actually says
Orthodontics is YMYL (Your Money or Your Life) by definition. YMYL pages get extra Trust weighting in raters' assessments.
What we infer (our professional judgment, named honestly):
- Unsubstantiated superlatives ("Best", "#1", "Top-Rated") create Trust-debit signals — even though the QRG doesn't specifically call out superlatives. That's our read.
- FTC §5 truth-in-advertising applies broadly. "Best orthodontist" is a comparative claim needing substantiation — no industry-standard ranking methodology supplies it.
- State dental boards typically prohibit "false, misleading, or deceptive" advertising and require substantiation of comparative claims (see §7).
Honest caveats — what we cannot cite
- The 2025 FTC max civil penalty under §5 is $53,088 per violation — but only after specific procedural triggers. NOT an automatic per-superlative fine.
- There is no FTC enforcement action against an orthodontic practice for "Best Orthodontist" titles we can cite. OrthoAccel v. Propel (2016) was a private civil suit between manufacturers.
- The Google QRG does NOT specifically call out superlatives. That linkage is our inference.
The risk is real but indirect. We present it as such — which is why the audit's recommendations are tiered, not absolute.
State dental board awareness
When a YMYL violation fires, the audit names the relevant state regulator and the governing statute citation.
The audit knows the practice's state from ground_truth.address.state. For superlative or accelerated-treatment violations, the envelope now surfaces the relevant state dental board, the statute citation pattern, and a verification URL.
We name the regulator + the statute + the verification URL. We don't interpret whether specific phrasing violates the current rule. That's between the client + their attorney + the dental board.
What's seeded
Full citations on file
California (BPC §651), Texas (22 TAC §108), Florida (FAC 64B5-13.0046 + FS 466), New York (NY Educ. Law §6509 + 8 NYCRR §29).
Regulator + URL seeded
PA, IL, OH, GA, NC, MI, VA, AZ, MA, WA, NJ — flagged needs_verification until we confirm rule text.
Default fallback
Names "the state dental board for [state]" and points to a directory. No specific claims made about rule text.
SERP context — when competitors normalize "best"
If the live SERP for the practice's primary keyword rewards superlatives, our default may be counter-productive. The audit pulls competitor titles and surfaces a recommendation bucket.
Hold the line
SERP isn't rewarding the pattern. Keep no-superlatives default. Differentiate with a substantiable claim.
Differentiate
Mixed signal. Differentiate with a substantiable claim (years in practice, specific tech, board cert, patient volume).
Match or differentiate
SERP normalizes superlatives. Surface both paths: (a) match with signed client ack, or (b) differentiate with a substantiable claim competitors aren't making.
The bucket gets surfaced; the strategist makes the call with the client. If we pick "match with client ack," client_overrides.yaml records the exception so the audit history tracks it cleanly. The trail is what protects the agency if a regulator ever asks.
Coming next — Apify NAP, SEMrush, Moz Pro
Adding three more data sources so the audit accounts for whatever tool a client points at our work.
Clients run their own audits with Claude using Ahrefs, SEMrush, or Moz Pro. When they come back with issues we didn't surface, that's a credibility problem. The fix is to use all of them — so our plan already accounts for whatever any tool they have could possibly raise.
Apify NAP citation audit
Today's citation-audit uses Search Atlas's citation scan. Apify gets us coverage on directories SA doesn't index, plus niche orthodontic-specific directories. Ground truth NAP becomes the truth set; Apify-scraped citations get diffed against it.
SEMrush API integration
Unique strengths in keyword-difficulty modeling and competitor traffic estimation. SEMrush feeds the median-of-N composite scorer for content-gap, real-rankings, and backlink-gap.
Moz Pro API integration
Domain Authority + Page Authority + Spam Score — metrics clients reference. Lets us run their report against ours and resolve discrepancies before they surface them.
The strategic point
By v3.15 the audit uses 7 data sources in parallel: Search Atlas (×2 brands), Ahrefs, DataForSEO, Firecrawl, Apify, SEMrush, Moz Pro. Whatever tool a client points at our work, our plan already accounts for what that tool would surface. No more reactive audits where a client's SEMrush report blindsides us.
How to run an audit
Inputs, command, what to look at first when it finishes.
Inputs needed
• Practice domain (adirondackorthodontics.com)
• GBP location ID (or address)
• Primary keyword (orthodontist albany ny)
• Brand: HIP or NEON Canvas
Command
```
# From a Claude Code session in /seoaudit/:
/seo-audit-and-plan adirondackorthodontics.com
```
Wall time
Typical: 30–45 min end-to-end. ~80% is parallel sub-audit dispatch (limited by API rate limits). Larger sites (200+ pages): 60–75 min.
What to read first
1. V5_AUDIT_REPORT.md — start at §11 (KPIs) and §4 (Gold Modules)
2. strategy-presentation.html — what the client sees
3. v5_score.json — composite scoring per-source
4. action_items.csv — what we'd queue
Push back on anything in here.
Every rule, threshold, citation, and weight in this doc is editable. If something looks wrong, name it — we'll dig into the research or change the rule. The whole point of the methodology being this visible is so you can challenge it.
Email Justin