Skip to main content

Link audit

Every published page and navigation menu can have internal /path references — hero CTAs, nav links, body links inside rich-text blocks, footer columns. When slugs change, those references break silently and the deploy ships them.

The SpiderPublish extension surfaces broken internal links before the push fans out, with inline diagnostics at the exact JSON line and a one-click "Accept redirect" Code Action that appends the audit's proposed redirect to redirects.json.


Two ways the audit fires

1. Pre-push hook (automatic)

Phase 0 of SpiderPublish: Push Content runs the audit before any dry_run fan-out. If broken links are found, a 3-outcome modal blocks the push:

OutcomeEffect
ReviewFocus the Problems pane (showing all findings) and abort the push. You fix the source slugs or accept the proposed redirects, then re-run Push.
Push anywayLog the broken-link count loudly to the output channel, clear stale audit diagnostics, continue with the push. The escape hatch — for cases you've already decided are intentional (e.g. a redirect about to be wired up).
Cancel (X or Esc)Full teardown. No push. Audit findings remain visible in the Problems pane.

The modal only fires when broken > 0. A clean audit is silent — push continues to the dry-run fan-out without any prompt.

Failure mode: if the audit endpoint itself errors (network blip, 500), Phase 0 logs the error to the output channel and continues — the audit is non-fatal. The dry-run / confirm flow stays as a safety net even when the audit can't run.

Standalone run from the command palette. Same audit, same outputs — just no modal, no push. Useful for:

  • Periodic checks during long edit sessions
  • Pre-commit hygiene before opening a PR
  • After bulk slug changes, to see what broke

The manual command also clears any prior audit diagnostics first, so you always see a fresh state.


What gets audited

Server-side SQL walk over:

  • Every published page's blocks tree (pages.blocks[*], recursively inside columns, sections, etc.)
  • Every navigation menu (header, footer, mobile, custom locations)
  • Every post's body and meta (where applicable)

Skipped:

  • Draft pages (drafts can have any links — only published and staged count)
  • External URLs (http://, https://, mailto:, tel:)
  • Hash-only fragments (#section-id)

Walk time: ~50–200 ms for a 50-page site. Idempotent — read-only on the server side.


Diagnostics format

For each broken link, the extension emits a vscode.Diagnostic into the spiderpublish.audit DiagnosticCollection (separate from spiderpublish which holds Push/Deploy validation errors — so an audit re-run clears its own findings without wiping still-valid push errors).

Severity:   Warning
File: <workspace>/pages/home.json
Range: line 47, col 14 → line 47, col 38
Message: "Broken internal link: /en/old-pricing → 404"
Source: spiderpublish.audit
Code: broken-internal-link

The line/col come from the source string the audit endpoint returns — something like page:home/blocks[2].cta_primary.url. The extension decodes that JSONPath through the same jsonc-parser AST walk that powers Push diagnostics, so the squiggle lands on the actual "url": "/en/old-pricing" line.

Navigation findings log to the output channel instead of inline diagnostics, because navigation isn't registry-tracked in 0.1.x — there's no navigation/*.json file on disk to attach a diagnostic to. The output channel entry includes the audit's source string so you can find the menu entry in the dashboard manually. Fix queued for 0.2.0 (navigation registry type).


"Accept redirect" Code Action

For each broken-link diagnostic, the audit also returns a proposed redirect:

{
"from_path": "/en/old-pricing",
"to_path": "/pricing",
"status_code": 301
}

The proposal is heuristic — based on slug similarity, navigation neighborhood, and historical 301s. It's right most of the time but worth a glance.

The Code Action fires as a yellow lightbulb on the diagnostic line:

SpiderPublish: Accept redirect /en/old-pricing → /pricing

Clicking it appends the redirect to your workspace's top-level redirects.json file. The append is idempotent — if a redirect with the same from_path already exists, the Code Action is a no-op.

redirects.json shape:

[
{
"from_path": "/en/old-pricing",
"to_path": "/pricing",
"status_code": 301
},
{
"from_path": "/blog/old-post",
"to_path": "/blog/new-post",
"status_code": 301
}
]

After accepting, the file shows up as a changed entry in the Changes view — you stage and push it like any other record.


Wave 0.1.x limitations

These are tracked for 0.2.0:

  • No bulk Accept Redirect — Quick Fix is per-diagnostic. Right now, accepting 20 proposed redirects is 20 lightbulb clicks. (Workaround: bulk-edit redirects.json directly.)
  • No "Mark as intentional" — sometimes a /old-link is a known-broken reference you want to silence without adding a redirect. 0.1 has no way to acknowledge a finding without acting on it. (Workaround: add the Code Action's proposed redirect manually as to_path: <same as from_path> to suppress.)
  • Navigation findings are output-channel only (see Diagnostics section).

Why this lives in the CMS, not in a crawler

Most CMSes leave link auditing to a third-party crawler that polls the live site post-deploy:

  • 5+ min latency between push and finding broken links
  • No source-position info — crawler sees /old-pricing 404 but not pages/home.json:47 references /old-pricing
  • Can't run pre-deploy — by the time the crawler sees the issue, your users already saw the 404

SpiderPublish surfaces the data the CMS already has — server-side, pre-deploy, exact JSONPath. ~200 ms server-side SQL walk vs ~5 min third-party crawl.

The same audit runs from the CLI (spideriq content audit-links) and from the MCP tool surface (content_audit_links) — so AI agents in chat get the same guarantees the extension's pre-push hook gives you.