
Step 4: Root Cause Analysis - What Good Looks Like

By Art Smalley

Setting the Context

Root cause analysis quality depends less on the tool and more on the thinking standard behind it. After years of reviewing hundreds of RCAs across manufacturing, healthcare, and service operations, I've come to rely on one simple guide: AQD — Analytical, Quantitative, Detailed.

These three traits define disciplined reasoning. When one is missing, logic weakens, data lose meaning, and the analysis collapses.


A – Analytical

Meaning: Structure cause → effect so each link is necessary, testable, and reversible (“why / therefore”).

  • 5 Why (qualitative, single variable):
    A proper 5 Why converges down one verified causal chain. Each “why” must produce a defensible “therefore.”
    When both directions work, the reasoning is sound. A 5 Why that branches sideways into vague organizational issues (“lack of training,” “poor management”) isn’t convergent thinking — it’s what I call the “5 Who’s.” (A minimal chain-reading sketch follows this list.)

  • Fishbone Diagram (qualitative, multiple variables):
    The categories are only scaffolding. A real Fishbone organizes layered cause-and-effect logic that can later be confirmed.
    When teams simply brainstorm and stick Post-its without verifying relationships, they’re not building a Fishbone — they’re building a Wishbone.

  • Quantitative Tools (Control Charts / DOE / Regression):
    Analytical strength appears in how the data structure mirrors process structure — rational subgrouping, clear factor definitions, valid interaction logic.
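
To make the “why / therefore” test concrete, here is a minimal Python sketch of a convergent 5 Why chain. The failure events and helper name are illustrative (a classic machine-stoppage example), not taken from a specific case in this article; the point is that reading each link in both directions is a mechanical check any team can run.

```python
# Minimal sketch of a convergent 5 Why chain with the "why / therefore" test.
# The events and helper name are illustrative, not from the article.

five_why_chain = [
    "Machine stopped",
    "Fuse blew from overload",
    "Bearing lacked lubrication",
    "Lubrication pump was not drawing oil",
    "Pump intake was clogged with metal shavings",
    "No strainer was fitted on the pump intake",
]

def read_both_directions(chain):
    """Print each link forward ("why?") and backward ("therefore"),
    so the team can check every link is defensible in both directions."""
    for effect, cause in zip(chain, chain[1:]):
        print(f'WHY:       "{effect}" because "{cause}"')
        print(f'THEREFORE: "{cause}" therefore "{effect}"')
        print()

read_both_directions(five_why_chain)
```

If any “therefore” reading sounds forced, that link needs evidence or repair before the chain can be called convergent.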

Common failures

  • 5 Why trees that drift sideways and end in system or people blame.
  • Fishbones created from free-association sessions with no causal reasoning.

Q – Quantitative

Meaning: Use measurement only where it helps establish or disconfirm cause and effect.
Quantification serves the logic; it doesn’t replace it.

  • 5 Why / Fishbone:
    These are logic tools. Measurement enters only when a link in the chain requires confirmation — for example, verifying a physical tolerance or a timing dependency.

  • Control Chart:
    Indicates when instability exists, never why. Quantitative quality here lies in proper control limits, rational subgrouping, and stratified interpretation — not in claiming causal proof. (See the subgrouping sketch after this list.)

  • DOE / Regression:
    Quantification becomes the method itself. The key mistakes are measuring too many things or waiting too long to run the DOE. A designed experiment takes more effort up front but pays dividends later in clarity and confidence. (A small factorial sketch follows the Common failures list below.)
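
As a concrete illustration of the control chart point above, here is a minimal X-bar / R sketch in Python. The subgrouping scheme and the n = 5 chart constants are standard textbook material; the measurements themselves are made up for illustration.

```python
# Minimal X-bar / R control-limit sketch. Subgroups are rational:
# each row is five consecutive parts from one machine and shift,
# so within-subgroup spread reflects only common-cause variation.
# Data are illustrative; A2/D3/D4 are the standard constants for n = 5.

subgroups = [
    [10.02, 10.01,  9.99, 10.00, 10.03],
    [10.00, 10.02, 10.01,  9.98, 10.01],
    [ 9.97,  9.99, 10.00, 10.02,  9.98],
    [10.01, 10.03, 10.00, 10.01,  9.99],
]

xbars = [sum(s) / len(s) for s in subgroups]
ranges = [max(s) - min(s) for s in subgroups]
xbarbar = sum(xbars) / len(xbars)
rbar = sum(ranges) / len(ranges)

A2, D3, D4 = 0.577, 0.0, 2.114   # control-chart constants, subgroup size 5

print(f"X-bar chart: LCL={xbarbar - A2 * rbar:.4f}  "
      f"CL={xbarbar:.4f}  UCL={xbarbar + A2 * rbar:.4f}")
print(f"R chart:     LCL={D3 * rbar:.4f}  "
      f"CL={rbar:.4f}  UCL={D4 * rbar:.4f}")
# The chart flags *when* instability appears; finding *why* still
# requires the analytical chain described above.
```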

Common failures

  • Collecting every possible metric “just in case.”
  • Delaying a designed experiment until frustration forces shortcuts.
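
And to show what running the DOE early can look like in miniature, here is a sketch of a 2² full factorial: four deliberate runs that estimate two main effects and their interaction. The factor names, levels, and responses are hypothetical.

```python
# Minimal 2^2 full-factorial sketch: two factors in coded -1/+1 units,
# main effects and interaction estimated from four planned runs.
# Factors, levels, and responses are hypothetical.

# Runs: (temperature, pressure) in coded units, with measured defect rate.
runs = [(-1, -1, 8.2), (+1, -1, 5.1), (-1, +1, 7.9), (+1, +1, 2.4)]

def effect(contrast):
    """Average response at the +1 setting of a contrast minus the -1 setting."""
    hi = [y for a, b, y in runs if contrast(a, b) > 0]
    lo = [y for a, b, y in runs if contrast(a, b) < 0]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

print(f"Temperature effect: {effect(lambda a, b: a):+.2f}")
print(f"Pressure effect:    {effect(lambda a, b: b):+.2f}")
print(f"Interaction:        {effect(lambda a, b: a * b):+.2f}")
```

Four structured runs already separate the main effects from their interaction — a clarity that dozens of unplanned measurements rarely deliver.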

D – Detailed

Meaning: Work at the right level of resolution for the process you’re analyzing.
You can’t solve what you haven’t seen precisely — and you can’t think more finely than you can measure.

Manufacturing Examples

  • Human work often requires observation at the work-step level (seconds), and sometimes deeper — the sub-motions or Therbligs that reveal waste and ergonomic strain.
  • In machining or precision assembly, if the specification is ± 0.020 mm, true understanding may demand analysis at 1–2 µm using special gauges, R&R studies, and environmental control.
    Human senses aren’t accurate enough at that scale; instruments extend perception. (See the gauge R&R sketch below.)
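
The measurement-resolution point lends itself to a quick check. The sketch below applies the standard number-of-distinct-categories (ndc) rule of thumb from a gauge R&R study to ask whether the gauge can “see” at the level the analysis requires; the variance figures are hypothetical.

```python
# Minimal sketch: can the gauge "see" at the resolution the analysis
# needs? Uses the number-of-distinct-categories (ndc) rule of thumb
# from a gauge R&R study. The variance figures are hypothetical.

import math

part_to_part_sd = 0.0040   # mm, true part-to-part variation
gauge_rr_sd     = 0.0012   # mm, repeatability + reproducibility

# ndc = 1.41 * (sigma_part / sigma_gauge); ndc >= 5 is the common
# threshold for a measurement system that can distinguish parts.
ndc = math.floor(1.41 * part_to_part_sd / gauge_rr_sd)

print(f"Number of distinct categories: {ndc}")
print("Adequate resolution" if ndc >= 5 else
      "Gauge too coarse: improve measurement before analyzing causes")
```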

Healthcare Analogues

  • Medication Errors: “Nurse gave wrong dose” is too coarse. The real mechanism might be label layout differences under low light at 5 a.m. The detail level shifts from the person to the interface.
  • Surgical Delays: “Missing instrument” is surface noise. At detail, the mismatch lies between count-sheet terminology and autoclave identifiers. The equivalent of a micron is a field name.
  • Emergency Department Flow: Instead of saying “understaffed,” measure the seconds per handoff between triage and first exam; that’s the operational resolution where causes live.

Service / Office Analogues

  • Billing Error: “Data entry mistake” is vague. Detailed means tracing the interface behavior — a dropdown auto-fills the wrong ID when tabbing quickly.
  • Claim Delay: The “microns” are system latency and rule-mismatch seconds per transaction.
  • Call Center Escalations: The issue isn’t “training”; it’s a 4-second CRM lag that drives customer frustration. Measure that, not motivation.

Common failures

  • Analyzing too high to see the mechanism.
  • Accepting approximation as fact.

AQD as a Checklist for Quality

Trait        | What Good Looks Like                                                   | Typical Mistake
------------ | ---------------------------------------------------------------------- | ------------------------------------------------------
Analytical   | Single, convergent 5 Why chain; multi-level Fishbone with logic links  | “5 Who’s,” “Wishbones,” or broad blame statements
Quantitative | Data collected only where it tests cause; DOE used proactively         | Measuring everything or delaying the DOE
Detailed     | Resolution matched to mechanism — seconds, microns, or clicks          | Vague, high-level conclusions or unaudited measurement

Why AQD Matters

The Flying Shaft story from my earlier article remains the cautionary example. That 5 Why looked clean on paper but failed all three elements of AQD:

  • Not Analytical: Skipped necessary conditions.
  • Not Quantitative: No verification or data.
  • Not Detailed: Stopped at the assembly level; the real cause was metallurgical.

AQD prevents that kind of elegant-looking error.


Closing Reflection

Good root cause analysis rarely looks dramatic. It looks sound.
A correct 5 Why ends in a verifiable physical or procedural mechanism — not “training” or “management.”
A correct Fishbone evolves into tested hypotheses — not sticky-note mosaics.
A correct DOE starts early and ends with proof.

Across manufacturing, healthcare, and services, the pattern of excellence is the same:
Analytical, Quantitative, Detailed.
That’s what good looks like. Everything else is just paperwork.

© 2025 Art Smalley | a3thinking.com