Caelan's Domain

/debug - Systematic Debugging

Created: March 27, 2026 | Modified: March 27, 2026

This is Part 7 of a 10-part series on cAgents. Previous: /optimize - Performance and Efficiency | Next: /review - Quality Review

How /debug works under the hood
Unlike /run or /team, /debug doesn't have its own standalone pipeline. It routes through the cagents:debug agent type - a specialized subagent that Claude Code invokes via /run or /team when debugging is the task at hand. You can also trigger it directly with /debug as a skill shortcut. The 4-phase framework described below is how the debug agent approaches problems - it isn't separate infrastructure from the rest of cAgents.

Some bugs are obvious. You read the error, find the line, fix it. /run handles those fine.

Then there are the bugs that aren't obvious. You've tried two things. Neither worked. The error message is useless or there isn't one. You're not sure if the problem is in your project, your configuration, a dependency, or something in the environment. You're starting to guess.

That's when you reach for /debug. It takes a systematic 4-phase approach - root cause investigation, pattern analysis, hypothesis testing, and implementation - instead of trying fixes at random. It won't move to solutions until it understands the problem. I've had it catch issues I'd have spent hours on - a stale DNS record, a race condition in a queue worker - because it checks the things I'd skip when I'm convinced I already know the answer.


When to Use This

Reach for /debug when:

  • You've tried at least one fix and it didn't work
  • There's no error message, or the error message isn't pointing to anything useful
  • The bug is intermittent - it happens sometimes but not always
  • You've changed something and can't figure out what broke
  • Multiple symptoms suggest the same underlying issue, but you can't identify it

Use /run instead for straightforward bugs where you already know roughly what's wrong. /debug is slower and more thorough - that's intentional. Don't reach for it first.

A good rule of thumb: if you can describe the fix you'd make, use /run. If you can only describe the symptom, use /debug.

The 4 Phases

/debug works through four stages in sequence. It won't skip ahead to a fix until it's built a clear picture of the problem.

Phase 1 - Root Cause Investigation: Examines the system holistically. Reads logs, traces execution paths, maps what's connected to what. The goal is to understand the full failure surface before touching anything.

Phase 2 - Pattern Analysis: Looks for patterns in the failure. When does it happen? What conditions are present? What changed recently? Intermittent bugs often have identifiable triggers that aren't obvious at first.

Phase 3 - Hypothesis Testing: Forms a ranked list of likely causes and tests them systematically. This is where most manual debugging goes wrong - people test their first hypothesis and stop. /debug works through candidates until one is confirmed.

Phase 4 - Implementation: Once the root cause is confirmed, fixes it cleanly and verifies the fix actually resolves the original symptom.


Walkthrough -- Debugging Email Delivery
The site launched, the optimization ran, and everything was looking good - until someone mentioned they'd filled out the contact form three times and never heard back.

You check the form. It submits without errors. The success message appears. But no emails are arriving. No server errors. No logs showing failures. The emails are just gone.

You try the obvious things: check the email configuration, verify the SMTP credentials, send a test email from the server. Everything looks fine. But the form still doesn't deliver.

Time for /debug.

/debug "Contact form submissions succeed in UI but no emails are delivered. No errors in logs. SMTP credentials verified. Started happening after DNS migration last week."

Phase 1 - Root cause investigation:

[debug] Analyzing failure surface...
  → Form handler: no exceptions thrown, submissions logged as "sent"
  → SMTP connection: establishing successfully, auth passing
  → Email service: no bounces, no delivery failures recorded
  → DNS records: checking SPF, DKIM, DMARC configuration...

[debug] Anomaly detected: SPF record references old mail server IP
  Old IP: 192.0.2.10 (pre-migration)
  Current IP: 203.0.113.45
  SPF record: "v=spf1 ip4:192.0.2.10 include:_spf.example.com ~all"

Phase 2 - Pattern analysis:

[debug] Failure pattern: 100% of outbound mail from new server
  SPF check result: SOFTFAIL (emails are sent but receiving servers may reject)
  DKIM: passing (not affected by migration)
  Timeline: DNS migration completed 2026-03-21, complaints started 2026-03-22
  Correlation: 100% match

Phase 3 - Hypothesis testing:

[debug] Primary hypothesis: SPF record not updated after IP migration
  → Test: send from old IP - would pass SPF check
  → Test: send from new IP - produces SOFTFAIL
  → Hypothesis confirmed. Emails are being silently discarded by receiving mail servers
    acting on SOFTFAIL SPF results.

Phase 4 - Implementation:

[debug] Updating SPF record...
  Old: "v=spf1 ip4:192.0.2.10 include:_spf.example.com ~all"
  New: "v=spf1 ip4:203.0.113.45 include:_spf.example.com ~all"

[debug] Verification: test email from contact form → delivered successfully
[debug] Fix confirmed. DNS propagation may take up to 24h for all receivers.

A silent deliverability failure caused by a DNS configuration detail that wasn't obvious anywhere in the stack. /run would have fixed the symptoms it could see; /debug traced the actual cause.
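The SPF mismatch the debug agent spotted can also be checked mechanically. Here's a minimal Python sketch of the idea - simplified, since it only looks at ip4: mechanisms and ignores include:, a, mx, and redirect=, all of which a real SPF evaluation must follow:

```python
import ipaddress
import re

def ip_allowed_by_spf(spf_record: str, server_ip: str) -> bool:
    """Check whether an IP matches any ip4: mechanism in an SPF record.
    Simplified sketch: ignores include:, a, mx, and redirect= mechanisms."""
    addr = ipaddress.ip_address(server_ip)
    for mech in re.findall(r"ip4:(\S+)", spf_record):
        # A bare IP with no prefix is treated as a /32 network
        if addr in ipaddress.ip_network(mech, strict=False):
            return True
    return False

spf = "v=spf1 ip4:192.0.2.10 include:_spf.example.com ~all"
print(ip_allowed_by_spf(spf, "192.0.2.10"))    # True  - old server passes
print(ip_allowed_by_spf(spf, "203.0.113.45"))  # False - new server softfails
```

Running this against the record from phase 1 immediately shows the new server's IP isn't authorized - the same anomaly the agent flagged, minus the hours of guessing.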


Walkthrough -- Debugging Email Open Rates
The email newsletter had been performing consistently - 28% average open rate across the last six weeks. Then one campaign went out and the open rate came back at 17%. The next one: 16%. Something changed overnight, and it wasn't immediately clear what.

You check the obvious things: the subject lines seem fine, the send time was the same, the list size is normal. Nothing jumps out.

/debug "Email newsletter open rates dropped from ~28% to ~16-17% over the last two sends. No obvious changes to subject lines or send schedule. List size unchanged. Need to identify root cause."

Phase 1 - Root cause investigation:

[debug] Analyzing deliverability and engagement factors...
  → Subject line sentiment: neutral, consistent with previous sends
  → Send time: Tuesday 9am, same as prior high-performing sends
  → List hygiene: 2.1% bounce rate (up from 0.8% - flagged)
  → Spam complaint rate: 0.4% (above 0.3% threshold - flagged)
  → Sender reputation score: dropped from 91 to 74 in last 7 days

Phase 2 - Pattern analysis:

[debug] Investigating reputation drop...
  → Bounce spike: 47 hard bounces on campaign of 2,200 (2.1%)
  → Source: imported list segment from trade show (added 2026-03-18)
  → These addresses have not been validated or warmed
  → Spam complaints: correlate with same imported segment
  → Pattern: reputation damage from unvetted list import, not content changes

Phase 3 - Hypothesis testing:

[debug] Primary hypothesis: cold list import degraded sender reputation,
  triggering increased inbox filtering at major providers

  → Test: check deliverability to Gmail vs. other providers
    Gmail: 61% inbox placement (down from 94%)
    Outlook: 78% inbox placement (down from 89%)
  → Hypothesis confirmed. Inbox placement degraded across providers,
    consistent with reputation-based filtering.

Phase 4 - Implementation:

[debug] Remediation plan:
  1. Suppress the imported trade show segment immediately
  2. Run a re-engagement campaign to warm active subscribers and rebuild engagement signal
  3. Set up list validation for any future imports before sending
  4. Monitor sender score weekly for next 30 days

[debug] Immediate action: segment suppressed (412 addresses)
[debug] Next send should target core engaged list only until reputation recovers.

The open rate drop wasn't a content problem - it was a deliverability problem caused by a list import that looked routine. /debug traced it back to the source instead of optimizing subject lines that weren't the issue.
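The phase 1 flags are simple ratio checks you can run yourself after every send. A sketch - the threshold defaults here are illustrative, since each mailbox provider publishes its own limits:

```python
def reputation_flags(sends: int, hard_bounces: int, complaints: int,
                     bounce_limit: float = 0.02,
                     complaint_limit: float = 0.003) -> list:
    """Flag the two list-health metrics that most commonly damage sender
    reputation. Limits are illustrative defaults, not provider-specific."""
    flags = []
    if hard_bounces / sends > bounce_limit:
        flags.append(f"hard bounce rate {hard_bounces / sends:.1%}")
    if complaints / sends > complaint_limit:
        flags.append(f"spam complaint rate {complaints / sends:.1%}")
    return flags

# Numbers from the walkthrough: 47 hard bounces, ~9 complaints on 2,200 sends
print(reputation_flags(2200, 47, 9))
```

Both metrics trip their thresholds here, which is exactly the signal that should halt sending to an unvetted segment before reputation damage compounds.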


Walkthrough -- Debugging Canvas Quiz Scores Not Syncing to Gradebook
Maya built a formative quiz in Canvas for Unit 2. Students take it, see their scores on the quiz page, but the scores aren't appearing in the Canvas gradebook. She's checked the assignment settings, verified the quiz is published, and confirmed the gradebook column exists. The scores are just not syncing. Some students show scores; most show blanks. Unit 1 quizzes synced fine using the same process.

She's tried republishing the quiz, manually refreshing the gradebook, and having a student resubmit. Nothing.

/debug "Canvas LMS quiz scores not syncing to gradebook. Quiz is published,
assignment column exists in gradebook. Some students show scores, most show
blanks. Started with the Unit 2 formative quiz. Unit 1 quizzes synced fine.
Already tried republishing and manual gradebook refresh."

Phase 1 - Root cause investigation:

[debug] Comparing Unit 1 and Unit 2 quiz configurations...
  → Unit 1 quiz: created from scratch, assignment group = "Formative Assessments"
  → Unit 2 quiz: duplicated from department template quiz
  → Duplication log: assignment group association not carried over
  → Unit 2 quiz currently assigned to group: "Unassigned"
  → Gradebook calculates grades per assignment group — "Unassigned" is excluded

Phase 2 - Pattern analysis:

[debug] Investigating partial sync pattern...
  → 7 students show scores — all submitted before Maya edited quiz settings
  → 18 students show blanks — all submitted after the settings edit
  → The post-duplication edit broke the gradebook association for new submissions
  → Pre-edit submissions had already synced before the association was lost
  → Pattern: template duplication + post-duplication edit = orphaned assignment group

Phase 3 - Hypothesis testing:

[debug] Primary hypothesis: quiz is in "Unassigned" group, which doesn't
  calculate grades
  → Test: checked another quiz duplicated from the same template — same issue,
    also sitting in "Unassigned"
  → Test: manually moved a test quiz from "Unassigned" to "Formative Assessments"
    — scores appeared in gradebook immediately
  → Hypothesis confirmed. Template duplication drops assignment group association.

Phase 4 - Implementation:

[debug] Reassigning Unit 2 quiz to "Formative Assessments" group...
[debug] Triggering gradebook resync for all 25 student submissions...
[debug] All scores now visible in gradebook.

[debug] Recommendation: after duplicating any quiz from a template, verify
  assignment group before publishing. Checklist:
  1. Open quiz settings → look for Assignment Group field
  2. Confirm it's not "Unassigned"
  3. Save and check gradebook column is active

An LMS configuration quirk that only surfaces when you duplicate from templates — the kind of thing that eats an evening of a teacher's time and makes them distrust their gradebook. The partial sync (some students showing scores) made it look intermittent, but the cause was completely deterministic once /debug compared the two quizzes.
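The checklist in phase 4 is also automatable if you can export quiz data (for example via the Canvas API). The data shape below is an assumption for illustration, not Canvas's actual response format:

```python
def quizzes_missing_group(quizzes: list) -> list:
    """Flag quizzes whose assignment group is absent or 'Unassigned' -
    the state that excludes their scores from gradebook calculation."""
    return [q["name"] for q in quizzes
            if q.get("assignment_group") in (None, "Unassigned")]

quizzes = [
    {"name": "Unit 1 Quiz", "assignment_group": "Formative Assessments"},
    {"name": "Unit 2 Quiz", "assignment_group": "Unassigned"},  # template duplicate
]
print(quizzes_missing_group(quizzes))  # ['Unit 2 Quiz']
```

Run once per unit before publishing and the orphaned-group state never reaches students.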


Walkthrough -- Debugging Friday Dinner Service Falling Behind
Every Friday, David's kitchen falls 15-20 minutes behind by 7:30pm. Tickets stack up, food comes out late, servers are apologizing, customers are waiting. He's watched the line, timed the stations, and can't figure out why — the same team handles Saturday volume just fine. He's tried adding an extra prep cook on Fridays. No improvement.

Five weeks of this pattern. Same staff, same menu, similar volume.

/debug "Friday dinner service consistently falls behind by 15-20 minutes
starting around 7:30pm. Saturday handles equal or higher volume without
issues. Same staff, same menu. Tried adjusting prep quantities and adding
an extra prep cook — no improvement. 5 weeks of this pattern."

Phase 1 - Root cause investigation:

[debug] Comparing Friday and Saturday service data...
  → Total covers: Friday avg 78, Saturday avg 82 — volume is not the issue
  → Ticket times: Friday avg 18 min after 7pm, Saturday avg 11 min — confirmed
  → Order composition: Friday has 40% more takeout orders than Saturday
  → Takeout spike: 6:30-7:15pm, coinciding with commuters picking up on way home
  → Saturday takeout: steady trickle, no spike

Phase 2 - Pattern analysis:

[debug] Analyzing takeout impact on kitchen throughput...
  → Takeout packaging time: 2-3 minutes per order (boxing, labeling, bagging)
  → Friday 6:30-7:15pm: ~18 takeout orders in 45 minutes
  → Packaging backlog consumes 36-54 minutes of cumulative kitchen bandwidth
  → Same line cooks handling packaging AND plating dine-in orders
  → By 7:30pm, the packaging debt has compounded into a 15-20 min delay
  → Saturday: lower takeout volume spreads packaging evenly, no bottleneck

Phase 3 - Hypothesis testing:

[debug] Hypothesis: takeout packaging is consuming kitchen bandwidth during
  the Friday commuter rush
  → Test: cross-referenced POS timestamps with ticket delay onset
  → Delay begins exactly when takeout orders exceed 4/hour — the tipping point
    where packaging time starts stealing from dine-in plating
  → Saturdays never exceed 3 takeout orders/hour in any window
  → Hypothesis confirmed. The bottleneck is packaging, not cooking.

Phase 4 - Implementation:

[debug] Recommended fix: dedicated takeout staging during Friday dinner
  → Reassign one server (already on shift) to a packaging station from 6-8pm
  → Server handles boxing, labeling, and bagging — kitchen cooks never touch it
  → Cost: zero additional labor (server reassigned from floor during lower
    dine-in period)
  → Revised Friday evening staffing allocation provided

[debug] Secondary: consider a 5-minute takeout time buffer on Friday orders
  placed through the app to smooth the 6:30pm spike.

A systems thinking problem disguised as a kitchen problem. The root cause — takeout packaging consuming kitchen bandwidth — is invisible when you're standing on the line watching cooks. David was looking at cook speed. The issue was packaging time. The Friday vs. Saturday comparison was the diagnostic move that cracked it, and it's exactly what systematic investigation does better than watching the kitchen and guessing.
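The phase 2 arithmetic generalizes to a quick back-of-envelope check for any shared-resource bottleneck: how much of the window does the side task consume?

```python
def packaging_utilization(orders: int, minutes_per_order: float,
                          window_minutes: float) -> float:
    """Fraction of a service window consumed by packaging alone.
    Near or above 1.0, the backlog compounds instead of clearing."""
    return (orders * minutes_per_order) / window_minutes

# Friday spike from the walkthrough: 18 orders, 2-3 min each, 45-minute window
print(packaging_utilization(18, 2.0, 45))  # 0.8 - already near saturation
print(packaging_utilization(18, 3.0, 45))  # 1.2 - debt compounds every minute
```

At the 3-minute end of the range, packaging alone demands more time than the window contains - which is exactly why the delay compounds into 15-20 minutes by 7:30pm.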


Walkthrough -- Debugging a Refund Spike After a Game Patch
Sadie released patch 1.1 for her narrative puzzle game. Three bug fixes and a quality-of-life feature: adjustable text speed. The bugs are fixed. The new feature works. Player reviews are positive. No crash reports.

But refund requests tripled — from ~2/day to ~7/day in the 48 hours after the patch. Something in the patch is causing players to quit, and nothing in the error logs explains it.

This isn't a technical bug. It's a product experience problem, and /debug handles those too.

/debug "Refund rate tripled after patch 1.1. Patch fixed 3 bugs and added
adjustable text speed. New feature works correctly. Reviews are positive.
No crash reports. Refund requests jumped from ~2/day to ~7/day in the 48
hours after the patch. Need to find what's causing players to refund."

Phase 1 - Root cause investigation:

[debug] Analyzing patch 1.1 diff for unintended side effects...
  → Bug fix 1 (save corruption): clean, isolated change
  → Bug fix 2 (audio stutter): clean, isolated change
  → Bug fix 3 (dialogue overlap): clean, isolated change
  → Text speed feature: new settings enum added (1=slow, 2=normal, 3=fast, 4=fastest)
  → Anomaly: default text speed value set to 4 (fastest), not 2 (normal)
  → Pre-patch default: hardcoded to "normal" — no settings enum existed

Phase 2 - Pattern analysis:

[debug] Correlating refunds with player behavior...
  → Refund requests: 92% from players who started the game AFTER the patch
  → Returning players (started pre-patch): near-zero refund rate
  → New players are experiencing the entire game at "fastest" text speed
  → Narrative beats, emotional pauses, dialogue pacing — all rushing past
  → The game's core experience is its storytelling. At the fastest setting, it feels broken.
  → Players aren't refunding because of a bug. They're refunding because the
    game doesn't feel like it's supposed to.

Phase 3 - Hypothesis testing:

[debug] Checking the settings initialization in the patch commit...
  → File: config/defaults.json
  → Pre-patch: textSpeed not present (hardcoded to "normal" in engine)
  → Post-patch: textSpeed: 4  ← set to developer's personal preference
  → Sadie plays at fast speed. When she added the feature, the default
    inherited her own setting, not the baseline.
  → Hypothesis confirmed. Default value change silently degraded the new
    player experience.

Phase 4 - Implementation:

[debug] Fix: set default textSpeed to 2 (normal) in config/defaults.json
[debug] Hotfix patch 1.1.1 prepared
[debug] Added: one-time notification for existing players —
  "Text speed has been reset to Normal. You can adjust this in Settings."
[debug] Added: settings validation — default values now checked against
  a baseline config during build to prevent future default drift.

Everything "worked." The code was correct, the feature did what it was supposed to, and the test suite passed. But a default-value change silently degraded the core experience for every new player. This is the kind of bug that never shows up in error logs because it lives in the user experience, not the code. /debug found it by analyzing who was refunding (new players, not returning ones) rather than what was crashing (nothing).

The /optimize pass from earlier had already tuned the Steam page conversion rate — now the players arriving through that optimized funnel were hitting a broken first impression. Fixing the default restored the experience those optimizations were built around.
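The build-time guard added in the hotfix can be as small as a dict comparison. A sketch, assuming JSON config files (the defaults filename comes from the walkthrough; the check itself is illustrative):

```python
import json

def default_drift(shipped: str, baseline: str) -> dict:
    """Return {key: (expected, shipped)} for every default that drifted
    from the baseline. Run at build time; fail the build on any mismatch."""
    shipped_cfg = json.loads(shipped)
    baseline_cfg = json.loads(baseline)
    return {k: (v, shipped_cfg.get(k))
            for k, v in baseline_cfg.items() if shipped_cfg.get(k) != v}

baseline = '{"textSpeed": 2, "musicVolume": 0.8}'
shipped  = '{"textSpeed": 4, "musicVolume": 0.8}'
print(default_drift(shipped, baseline))  # {'textSpeed': (2, 4)}
```

A check like this would have caught the developer-preference default before patch 1.1 ever shipped.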


Walkthrough -- Debugging a Seating Chart Spreadsheet That Keeps Breaking
Rosa has a spreadsheet-based seating chart she's used for 6 weddings without incident. It auto-generates table assignments based on guest groups, dietary restrictions, and the couple's preferences — including "keep apart" rules for feuding family members. For this wedding, every time she updates the guest list, the assignments shuffle randomly and the conflict-avoidance rules stop working. She's rebuilt the formulas twice. It keeps breaking.

Everyone who's had a spreadsheet betray them at the worst possible moment will recognize this feeling.

/debug "Seating chart spreadsheet breaks every time I update the guest list.
Table assignments shuffle randomly and the 'keep apart' rules stop working.
Spreadsheet uses VLOOKUP and conditional formatting. Has worked fine for my
last 6 weddings. This wedding has 83 guests across 10 tables."

Phase 1 - Root cause investigation:

[debug] Examining spreadsheet structure...
  → Guest list: column A (names), column B (group), column C (dietary), column D (table)
  → VLOOKUP references in the seating logic: based on row numbers, not named ranges
  → "Keep apart" rules: reference specific cells (e.g., "if A14 and A37 are at
    same table, flag") — hardcoded to row positions
  → When a guest is added or removed mid-list, all row numbers shift
  → Every downstream VLOOKUP and every conflict rule breaks on insertion

Phase 2 - Pattern analysis:

[debug] Why did this work for 6 previous weddings?
  → Previous weddings: 40-55 guests, guest lists finalized early
  → Guests were only added at the bottom of the list — row numbers never shifted
  → This wedding: 83 guests, list still changing weekly, insertions mid-list
  → The spreadsheet was never robust — it just never encountered mid-list edits
  → This is a latent bug that finally met its trigger condition

Phase 3 - Hypothesis testing:

[debug] Reproducing the failure...
  → Inserted "Test Guest" at row 20 (between two existing guests)
  → Result: every VLOOKUP reference below row 20 shifted by one row
  → "Keep apart" rule for rows 14 and 37: A37 now points to a different guest entirely
  → Conditional formatting shows no conflict — because it's checking the wrong cells
  → Hypothesis confirmed. Row-number-based references cannot survive insertion.

Phase 4 - Implementation:

[debug] Rebuilding seating logic with insertion-safe references...
  → Replaced row-number VLOOKUPs with named ranges tied to guest identifiers
  → "Keep apart" rules now use INDIRECT() with name-based lookups, not cell addresses
  → Added a validation formula: flags immediately if any "keep apart" pair
    ends up at the same table, regardless of row position
  → Tested: inserted 5 guests mid-list, removed 3 others — all assignments
    and conflict rules held

[debug] Recommendation: for future weddings, start with this structure
  from the beginning. The old spreadsheet will keep working for small,
  finalized guest lists — but it was always one mid-list edit away from this.

A spreadsheet bug. Not a server, not an API, not a codebase. The most universally relatable debug scenario in the series, because everyone has had a spreadsheet break for reasons they couldn't diagnose. The root cause — row-number references that silently break on insertion — is something most spreadsheet users never learn to avoid until it costs them. The fix is structural: named ranges and INDIRECT() references that survive the edits a real guest list goes through.
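The failure mode isn't spreadsheet-specific. Here's the same row-shift bug reduced to a few lines of Python - positional references versus name-keyed ones:

```python
# Positional references (the spreadsheet's row-number VLOOKUPs):
guests = ["Ana", "Ben", "Cara", "Dev"]
keep_apart = (0, 2)            # by index: Ana and Cara must not share a table
guests.insert(1, "Walk-in")    # one mid-list insertion shifts every index below it
pair = (guests[keep_apart[0]], guests[keep_apart[1]])
print(pair)                    # ('Ana', 'Ben') - the rule now checks the wrong guest

# Name-keyed references (the named-range fix) survive the same edit:
keep_apart_by_name = ("Ana", "Cara")
assert all(name in guests for name in keep_apart_by_name)
```

The positional rule silently degrades after one edit; the name-keyed rule still refers to the right people no matter where they sit in the list.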


Walkthrough -- Debugging Why Episode 3 Has 40% Fewer Downloads
Jordan's podcast launched strong. Episode 1: 2,400 downloads. Episode 2: 1,900. Episode 3: 1,600. New podcasts typically lose 20-25% between episodes 1 and 3. Jordan is losing 33%. The release schedule is consistent, the marketing effort hasn't changed, and there are no negative reviews. Something beyond normal attrition is happening.

The natural instinct is to blame the content — maybe Episode 3's topic didn't land. But the numbers tell a different story.

/debug "Download numbers dropping faster than expected. Ep 1: 2,400, Ep 2:
1,900, Ep 3: 1,600. That's 33% drop in 3 episodes, expected ~20-25% for
new shows. Release schedule consistent (biweekly). Marketing effort
consistent. No negative reviews. What's causing the accelerated drop-off?"

Phase 1 - Root cause investigation:

[debug] Analyzing downloads by distribution platform...
  → Apple Podcasts: 62% of total downloads (primary platform)
  → Spotify: 24% of total downloads
  → Other platforms: 14%
  → Episode 3 Apple downloads: dropped 44% vs. Episode 2
  → Episode 3 Spotify downloads: dropped only 18% vs. Episode 2
  → Platform-specific discrepancy — this points to distribution, not content

Phase 2 - Pattern analysis:

[debug] Investigating Apple Podcasts delivery...
  → RSS feed inspection: Episode 3 enclosure URL contains a 302 redirect
  → Podcast host migrated CDN providers between Episode 2 and Episode 3
  → Old CDN URLs redirect to new CDN — works in browsers, works in Spotify
  → Apple Podcasts crawler does not follow 302 redirects for enclosure URLs
  → Episodes 1-2: still served from old CDN cache — downloads work normally
  → Episode 3: never cached on old CDN, redirect is the only path — fails silently

Phase 3 - Hypothesis testing:

[debug] Confirming the redirect theory...
  → Direct request to Episode 3 URL: returns 302 → new CDN URL
  → Apple crawler behavior: treats 302 enclosure URLs as broken links
  → Episode appears in Apple search results but download fails for many users
  → Users see the episode, tap download, nothing happens — they move on
  → Spotify handles 302 redirects natively — explains the 18% vs. 44% gap
  → Hypothesis confirmed. CDN migration broke Apple Podcasts delivery.

Phase 4 - Implementation:

[debug] Updating RSS feed with direct URLs on new CDN (no redirects)...
  → Episode 3 enclosure URL: updated to direct new-CDN path
  → Episodes 1-2: also updated preemptively (old CDN cache will expire)
  → RSS feed resubmitted to Apple Podcasts for re-crawl

[debug] Verification: Apple Podcasts re-crawl requested. Downloads should
  normalize within 24-48 hours as the updated feed propagates.
[debug] Recommendation: after any CDN or hosting migration, verify enclosure
  URLs resolve without redirects. Test with Apple's feed validator.

The instinct was to question the content. The data said otherwise: a 44% Apple drop vs. 18% Spotify drop can't be a content problem — listeners on different platforms don't have different taste. The CDN migration created an invisible redirect that Apple's crawler won't follow, silently breaking delivery on the platform that accounts for 62% of downloads. Without the platform-by-platform breakdown in Phase 1, this would have looked like a content quality issue and Jordan would have spent weeks second-guessing Episode 3's topic instead of fixing a URL.

The earlier /optimize work on episode completion rate — restructuring exposition and adding cold opens — only matters if listeners can actually download the episodes. Infrastructure problems masquerading as audience problems are exactly what /debug is built to unmask.
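The post-migration check in the recommendation is scriptable: pull every enclosure URL from the feed, then HEAD-request each one and fail on any 3xx status. Here's the extraction half as a sketch - the feed content is made up for illustration:

```python
import xml.etree.ElementTree as ET

FEED = """<rss><channel>
  <item><title>Episode 3</title>
    <enclosure url="https://new-cdn.example.com/ep3.mp3" type="audio/mpeg"/>
  </item>
</channel></rss>"""

def enclosure_urls(feed_xml: str) -> list:
    """Collect every <enclosure> URL so each can be checked for redirects
    (e.g. with urllib.request or `curl -I`)."""
    return [enc.get("url") for enc in ET.fromstring(feed_xml).iter("enclosure")]

print(enclosure_urls(FEED))  # ['https://new-cdn.example.com/ep3.mp3']
```

Pairing this with a redirect check in CI after any hosting change turns a silent 44% download drop into a failed build.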


Working with /debug

Unlike the other commands in this series, /debug doesn't have a separate flags table - it shares the invocation model with /run. When you type /debug "description of the problem", cAgents routes the task to the debug agent, which applies the 4-phase approach described above.

The most useful thing you can do is front-load context in your prompt: what you've tried, when the issue started, what changed, and any logs or error messages. The debug agent uses all of that to narrow phase 1 faster.


Tips & Gotchas

Give /debug everything you know upfront. Include what you've already tried, when the issue started, and any changes that might be related. The more context in the initial prompt, the faster the root cause investigation moves. A vague "it's broken" makes phase 1 much slower than "it broke after the DNS migration on Tuesday."

If you have a strong suspicion about the cause, state it in the prompt: /debug "I think the SPF record wasn't updated after the DNS migration - emails are silently failing." The debug agent will test your hypothesis first and either confirm it or rule it out systematically - either way you get a definitive answer faster than an open-ended investigation.

Don't use /debug as your first move on every bug. It's slower and more thorough than /run - that thoroughness has a cost. Save it for when quick fixes have failed or the root cause is genuinely unclear. Most bugs don't need a 4-phase investigation.

/debug investigates the system it has access to. If the root cause is in an external service, a third-party API, or infrastructure you haven't given it visibility into, phase 1 will flag the boundary and tell you what it can't see. You'll need to supply that information manually or investigate that layer yourself.

Part 7 of 10 - cAgents series Previous: /optimize - Performance and Efficiency | Next: /review - Quality Review