Reviewing the Review: A Self-Evaluation Framework for Hardware Design Reviews

Over thirty-five years in hardware development, across companies ranging from medical devices to aerospace to consumer electronics, I have sat in or led somewhere north of a thousand design reviews. The teams have varied. The standards have varied. The technology has shifted dramatically. One observation has held: organizations spend an enormous amount of energy reviewing their designs and almost none reviewing the reviews themselves.

That has always struck me as a strange asymmetry. We design controlled experiments to verify component performance. We write detailed validation plans for firmware. We invest in modeling and simulation infrastructure for our circuits. And then, when it comes time to inspect the schematic, we reach for a process that is often informal, inconsistent, and rarely measured.

The design review is itself a system. It has inputs, processes, outputs, failure modes, and a measurable yield. We would never ship a power supply without specifying its operating envelope. The review process deserves the same engineering attention as the rest of the design.

What follows is a framework you can use to assess your team's current review process honestly. It is industry-agnostic. The principles apply equally to a medical infusion pump, an automotive ECU, a satellite payload, an industrial sensor, or a consumer wearable. The differences are in the specific failure modes you care about. Reviewing the review is the same job everywhere.

Some of this framework will probably make uncomfortable reading. Almost every review process I have ever observed (including more than a few I designed myself) has dimensions where the honest answer is "we are not doing this well." That is not a knock on the engineers doing the work. It is the natural result of running a complex cognitive process without ever stopping to engineer it.

Why review is hard, even when we are good at it

Before assessing your process, it helps to be honest about what kind of cognitive task review actually is. Decades of human factors research, across domains as varied as quality control, radar monitoring, surgery, and air traffic control, all converge on the same uncomfortable conclusion. Design review is a near-perfect adversarial environment for the human mind.

Four findings out of that literature are worth keeping in the back of your head. They are stubborn enough that I have given them shorthand names. They show up often enough in this article that the names will save us all a lot of paragraph real estate.

The Inspection Floor

A skilled inspector, given a population of mostly correct items with a small number of defects to find, will reliably miss between 20 and 30 percent of the defects on the first pass. This number is roughly stable across decades of replication, across industries, and across experience levels [1]. It is, in essence, the floor below which manual visual inspection cannot reliably perform. You do not push past the Inspection Floor by trying harder. You push past it by changing the task.

The Thirty-Minute Wall

Sustained attention to a monitoring task degrades by 10 to 15 percent within the first thirty minutes, and continues to decline thereafter [2]. The deterioration is invisible from the inside. The reviewer feels exactly as focused at minute 60 as they did at minute 5. (This is the "this is fine" stage, in the now-canonical sense, where the engineer politely tells themselves the building is not on fire.) Modern neuroimaging confirms what the original radar-watch studies suspected: vigilance is real cognitive work, expensive in a way the reviewer does not consciously perceive [3].

The Four-Item Cap

Adult working memory, the set of distinct items a person can hold in active attention at once, has been measured at roughly four [4]. Not seven. Four. A schematic page typically presents tens to hundreds of distinct relationships at the same time. The reviewer cannot literally hold all of them in mind. They are either externalizing them onto paper, or they are pruning silently and hoping they pruned the right ones.

The Comprehension Gap

The most cited model of situation awareness in human factors research breaks awareness into three levels: perceiving the elements in front of you, comprehending what they mean, and projecting how they will behave [5]. The first level is the easiest. The third is where engineering judgment lives. A reviewer operating at Level 1 ("yes, that is a capacitor") is technically reviewing. They are not yet doing the review you actually wanted them to do.

These four constraints are non-negotiable. A review process can be excellently designed and still produce a 20 to 30 percent baseline miss rate per pass. A reviewer can be brilliant and well-rested and still see attention degrade after the first half hour. Every reviewer, no matter how senior, can hold only about four items at a time. The question is not whether to fight these constraints. The question is whether to design the review process around them or pretend they do not apply to your team.

Figure: Line graph of detection performance declining over time during sustained inspection sessions, with reviewer perception remaining high after the thirty-minute mark.

Not all review tasks are the same kind of work

Map the work onto two axes: the volume of instances of each task in a typical design, and the semantic difficulty of any one instance.

In the high-volume, low-difficulty corner sit the chores. Pin-by-pin verification of every IC against its datasheet pinout. Confirmation that every passive's part number matches the value annotated in the schematic. Application of derating tables to every component dissipating measurable power. DC-bias-corrected effective capacitance for every Class II MLCC on every rail. Bus-address-conflict scans across every I²C and CAN segment. UART, SPI, and bus-to-bus crossover verification at every connector and through every cable harness. Reset, strap, and boot-pin disposition for every IC on the board. Each instance is individually trivial. The population is large. The cost of missing any one of them is potentially severe. They are also the tasks where the four findings above hit hardest. They are exactly the conditions Mackworth's clock test [2] was designed to study, dressed up in 0402 packages.
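As one illustration of how mechanical these chores are, the bus-address-conflict scan reduces to a few lines of Python. This is a minimal sketch, not a tool: it assumes a hypothetical netlist export already reduced to a mapping from bus segment to (reference designator, 7-bit address) pairs, and the designators and addresses below are invented. It also flags the 7-bit ranges 0x00-0x07 and 0x78-0x7F, which the I²C specification reserves.

```python
from collections import defaultdict

# 7-bit addresses the I2C specification reserves (0000xxx and 1111xxx).
RESERVED = set(range(0x00, 0x08)) | set(range(0x78, 0x80))


def scan_i2c_segments(segments):
    """Flag duplicate and reserved 7-bit addresses on each bus segment.

    `segments` maps a segment name to (refdes, address) pairs, where each
    address is the device's 7-bit address after its strap pins are resolved.
    Segments are checked independently, so the same address on two different
    segments (for example, behind a mux) is not reported.
    """
    findings = []
    for segment, devices in segments.items():
        by_address = defaultdict(list)
        for refdes, address in devices:
            if not 0x00 <= address <= 0x7F:
                findings.append(f"{segment}: {refdes} address 0x{address:02X} is not 7-bit")
                continue
            if address in RESERVED:
                findings.append(f"{segment}: {refdes} uses reserved address 0x{address:02X}")
            by_address[address].append(refdes)
        for address, refs in sorted(by_address.items()):
            if len(refs) > 1:
                findings.append(f"{segment}: address 0x{address:02X} shared by {', '.join(refs)}")
    return findings


# Hypothetical board: two parts strapped to the same address on I2C1.
segments = {
    "I2C1": [("U3", 0x48), ("U7", 0x48), ("U9", 0x50)],
    "I2C2": [("U12", 0x48), ("U14", 0x7C)],
}
for finding in scan_i2c_segments(segments):
    print(finding)
```

The hard part in practice is not this loop but producing its input: resolving each part's address from its strap pins and working out which devices actually share a segment once muxes and bus switches are accounted for.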

The low-volume, high-difficulty quadrant is the opposite environment. Architecture decisions across power, signal integrity, thermal, and EMC tradeoffs. Topology selection for a novel sensor front end. Multi-rail power sequencing strategy for a heterogeneous SoC. The choice of how to defend a long cable run against transients in a vehicle environment. Each of these is rare in a given design, requires deep judgment, and benefits enormously from human experience.

A third quadrant, low volume and low difficulty, covers the housekeeping items: sheet titles, revision blocks, silkscreen conventions. Important, easily standardized, and almost always handled fine.

The fourth quadrant, high volume and high difficulty, is the genuine review work. Most teams (rightly) concentrate their best people here, on the worst-case corner analyses, the interface compatibility verifications, the multi-board pinout mirroring exercises where judgment is irreducible.

Figure: Four-quadrant diagram mapping hardware design review tasks by difficulty and volume, highlighting the "Drudgery" quadrant as the highest-leverage area for process improvement.

The four quadrants have very different yield curves under the constraints described in the previous section. Architecture review benefits from a fresh, focused engineer applying decades of experience to a small number of decisions. Pin-by-pin verification gets worse the longer it goes on, regardless of who is doing it.

The most expensive escapes I have ever debugged trace back, almost without exception, to the high-volume, low-difficulty quadrant. A single missed strap pin. An MLCC whose effective capacitance was a third of the datasheet number. A divider whose worst-case ratio violated a comparator threshold. An I²C pull-up sized for two devices on a bus that ended up with seven. None of these failures were beyond the reviewing engineer's competence. All of them were below the floor of what sustained human attention can reliably catch.
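The MLCC escape above is a check that is trivial for one part and tedious across a whole BOM. A minimal sketch, assuming each capacitor's DC-bias behavior is available as a table of (bias voltage, fraction of nominal capacitance) points from the manufacturer's characterization data. The curve values and reference designators below are invented placeholders, linear interpolation between points is a simplifying assumption, and a real check would look up a separate curve for each part number.

```python
from bisect import bisect_left


def effective_capacitance(nominal_uf, curve, bias_v):
    """Linearly interpolate effective capacitance at a DC bias voltage.

    `curve` is a list of (volts, fraction_of_nominal) points sorted by volts.
    Outside the table the nearest endpoint is used; there is no extrapolation.
    """
    volts = [v for v, _ in curve]
    if bias_v <= volts[0]:
        return nominal_uf * curve[0][1]
    if bias_v >= volts[-1]:
        return nominal_uf * curve[-1][1]
    i = bisect_left(volts, bias_v)
    (v0, f0), (v1, f1) = curve[i - 1], curve[i]
    return nominal_uf * (f0 + (f1 - f0) * (bias_v - v0) / (v1 - v0))


def check_mlccs(caps, curve):
    """Return (refdes, effective_uF, passes) for (refdes, nominal_uF, rail_V, required_uF)."""
    results = []
    for refdes, nominal_uf, rail_v, required_uf in caps:
        c_eff = effective_capacitance(nominal_uf, curve, rail_v)
        results.append((refdes, c_eff, c_eff >= required_uf))
    return results


# Placeholder curve for a hypothetical 10 uF Class II part; not real data.
CURVE = [(0.0, 1.00), (1.8, 0.70), (3.3, 0.45), (5.0, 0.30)]
caps = [("C12", 10.0, 3.3, 4.7), ("C15", 10.0, 1.8, 4.7)]

for refdes, c_eff, ok in check_mlccs(caps, CURVE):
    print(f"{refdes}: {c_eff:.2f} uF effective, {'OK' if ok else 'FAIL'}")
```

With these placeholder numbers, C12 nominally has more than twice the required capacitance and still fails on the 3.3 V rail, which is the shape of the escape described above.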

This is the central observation that should shape the framework that follows. The job of a review process is not to catch every error through heroic effort. It is to be designed so that the errors humans are bad at catching are not the ones left to be caught manually.

A self-evaluation framework: the six dimensions

If you sit down with the four findings above and an honest list of the kinds of escapes that actually keep happening on your projects, six dimensions tend to emerge. Not because they are exotic. Because they are the dimensions on which a review process rises or falls.

Each contains a small number of questions a team can sit down and answer honestly in twenty minutes. The output is not a score. It is a map of where the review process is structurally strong, and where it is implicitly relying on heroic effort to fill structural gaps.

Dimension 1: Structure

The review either is or is not a defined gate in the development flow. Most teams I have worked with believe they have a defined gate. Few of them actually do.

Is the review a named milestone with an owner, an entry checklist, and an exit criterion, or is it an event that happens "when there is time"?
Are the roles defined: author, moderator, reviewer, recorder? Software has codified comparable roles (inspection leader, recorder, reader, author, inspector) in IEEE Standard 1028 for more than thirty years; the principles transfer cleanly to hardware [6].
Is review time explicitly budgeted in the project plan, or is it whatever time is left over after layout and BOM finalization?
Are findings logged in a way that survives the meeting? Where do they live a month later?
Is closure of every finding verified against the actual change in the schematic, or just trusted?
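Those last two questions get easier to answer when findings live in a structure rather than in meeting notes. A minimal sketch in Python: the `Finding` record, its field names, and the rule that closure requires a second person to inspect the same schematic revision the fix claims are all illustrative assumptions, not the behavior of any particular tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Finding:
    """One review finding, kept somewhere that outlives the meeting."""
    fid: str
    description: str
    owner: str
    status: str = "open"  # open -> fixed -> verified
    fixed_in_rev: Optional[str] = None
    verified_by: Optional[str] = None

    def mark_fixed(self, rev: str) -> None:
        """Owner claims the fix landed in schematic revision `rev`."""
        self.status = "fixed"
        self.fixed_in_rev = rev

    def verify(self, reviewer: str, inspected_rev: str) -> None:
        """Close the finding only after someone else inspects the claimed revision."""
        if self.status != "fixed":
            raise ValueError(f"{self.fid}: nothing to verify yet")
        if reviewer == self.owner:
            raise ValueError(f"{self.fid}: owner cannot verify their own fix")
        if inspected_rev != self.fixed_in_rev:
            raise ValueError(
                f"{self.fid}: fix claimed in rev {self.fixed_in_rev}, inspected rev {inspected_rev}"
            )
        self.status = "verified"
        self.verified_by = reviewer


def unverified(log: List[Finding]) -> List[str]:
    """IDs of findings that are still open or only claimed fixed."""
    return [f.fid for f in log if f.status != "verified"]


log = [
    Finding("F-001", "RESET_N has no pull-up", owner="alice"),
    Finding("F-002", "U7 strap pin floating", owner="alice"),
]
log[0].mark_fixed("B")
log[0].verify("bob", "B")
log[1].mark_fixed("B")  # claimed, but nobody has checked it yet
print(unverified(log))
```

The design choice worth copying is the distinction between "fixed" and "verified": a finding the author says is fixed still blocks release until someone else has looked at the schematic.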

Dimension 2: Cognitive load

The dimension teams almost never examine, and the one where the largest yield gains tend to live.

How long are review sessions? The Thirty-Minute Wall is real. Rigor that requires holding focus for 90 straight minutes is rigor that will quietly degrade in the back half.
How many items does a reviewer hold simultaneously? The Four-Item Cap means everything beyond the active set has been externalized into a checklist, a notebook, or, more often, lost.
Are there breaks? Even brief breaks can restore much of the lost vigilance. Modern follow-on work confirms what the original sustained-attention studies first showed [3].
Is feedback present during the review? Performance feedback, even informal, was one of the few interventions in the original vigilance studies that consistently reduced the decrement.
Is review done at the team's high-attention time of day, or whenever the calendar happens to allow?
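One way to act on these questions is to plan review sessions instead of letting them sprawl. A minimal sketch, assuming each checklist item carries a rough time estimate in minutes (the items and estimates below are hypothetical). The 30-minute block follows the Thirty-Minute Wall; the 10-minute break is an arbitrary placeholder, and greedy in-order packing is chosen for simplicity rather than optimality.

```python
def plan_blocks(items, block_min=30):
    """Pack (name, minutes) checklist items, in order, into blocks of <= block_min.

    Returns (blocks, oversized). Each block is (names, total_minutes).
    Items longer than a whole block are reported so they can be split;
    they still get a block of their own.
    """
    oversized = [name for name, minutes in items if minutes > block_min]
    blocks, names, total = [], [], 0
    for name, minutes in items:
        if names and total + minutes > block_min:
            blocks.append((names, total))
            names, total = [], 0
        names.append(name)
        total += minutes
    if names:
        blocks.append((names, total))
    return blocks, oversized


def schedule(blocks, break_min=10):
    """Lay blocks end to end with a break after each; returns (start, end, names)."""
    t, plan = 0, []
    for names, total in blocks:
        plan.append((t, t + total, names))
        t += total + break_min
    return plan


items = [
    ("Power sequencing", 20),
    ("Reset and strap pins", 15),
    ("I2C addresses", 10),
    ("MLCC DC bias", 25),
    ("Connector crossovers", 40),
]
blocks, oversized = plan_blocks(items)
for start, end, names in schedule(blocks):
    print(f"{start:>3}-{end:<3} min: {', '.join(names)}")
print("Split these:", oversized)
```

Keeping items in checklist order means related checks stay together; a proper bin-packing pass would use fewer blocks at the cost of scattering them.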

Dimension 3: Coverage

Coverage is what most teams think they are tracking when they say they have a review process.

Is every interface (every bus, every cable, every board-to-board connector) reviewed for crossover, voltage compatibility, and protocol conformity?
Is every passive value cross-checked against the part number actually selected, not the symbol value entered three months ago?
Are tolerance, derating, DC-bias, and worst-case-corner checks applied uniformly, or only to circuits the engineer remembers to flag?
Is what was not checked written down? In my experience, the discipline of explicitly recording "we did not verify X" is the single highest-leverage practice a team can adopt. It converts an unknown unknown into a known unknown, which is the only kind of risk that can be managed.
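A worst-case-corner check applied uniformly is easier to guarantee when it is a function rather than a habit. A minimal sketch for one common case, a resistor divider feeding an undervoltage comparator, of the kind behind the divider escape described earlier. It assumes only resistor tolerance and threshold tolerance matter; the resistor values, the 1.2 V threshold, and the 2.95 V and 3.135 V limits are hypothetical.

```python
def ratio_bounds(r1, r2, tol):
    """Min and max of r2 / (r1 + r2) with each resistor off by up to +/- tol."""
    lo = r2 * (1 - tol) / (r1 * (1 + tol) + r2 * (1 - tol))
    hi = r2 * (1 + tol) / (r1 * (1 - tol) + r2 * (1 + tol))
    return lo, hi


def uv_trip_window(r1, r2, tol, vth, vth_tol):
    """Range of rail voltages at which a falling-rail UV comparator can trip.

    The comparator trips when r2/(r1+r2) * V_rail drops below its threshold,
    so the trip point is V_th / ratio. The lowest trip point pairs the
    smallest threshold with the largest ratio, and vice versa. Hysteresis,
    input bias current, and temperature drift are ignored.
    """
    r_lo, r_hi = ratio_bounds(r1, r2, tol)
    return vth * (1 - vth_tol) / r_hi, vth * (1 + vth_tol) / r_lo


def uv_divider_ok(r1, r2, tol, vth, vth_tol, v_fault, v_ok_min):
    """True only if every corner trips at or above v_fault and at or below v_ok_min."""
    trip_low, trip_high = uv_trip_window(r1, r2, tol, vth, vth_tol)
    return trip_low >= v_fault and trip_high <= v_ok_min


# Hypothetical 3.3 V monitor: 15k over 10k (1%), 1.2 V +/-2% threshold.
# The nominal trip point is 1.2 / 0.4 = 3.0 V, comfortably inside 2.95-3.135 V.
low, high = uv_trip_window(15e3, 10e3, 0.01, 1.2, 0.02)
print(f"trip anywhere from {low:.3f} V to {high:.3f} V")
print(uv_divider_ok(15e3, 10e3, 0.01, 1.2, 0.02, v_fault=2.95, v_ok_min=3.135))
```

With these numbers the trip window runs from roughly 2.905 V to 3.097 V, so the check fails: the nominal design looks fine, but at the unlucky corner the monitor does not trip until the rail is already below the 2.95 V fault level.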

Dimension 4: Independence

A reviewer who is too close to the design is reviewing a different artifact than the one in front of them.

Is the schematic author also the primary reviewer? Software inspection literature, going back to the foundational structured-inspection work of the mid-1970s, has solid evidence that author-led review captures meaningfully fewer defects than independent inspection [7].
How many independent eyes touch the design before fab?
If reviewers are internal, how recently have they been steeped in adjacent designs that share assumptions? Confirmation bias travels in cohorts.
Is there an explicit "reviewer of last resort" with authority to block release, or does pressure flow against thorough review?

Dimension 5: Repeatability

Rigor that holds together under deadline pressure is the only kind that compounds across projects.

Does the same checklist run on every project, every revision?
Does rigor change under schedule pressure? Is anyone watching for that?
Is there a written record of what was reviewed, by whom, against what standard?
Could a new engineer on your team reproduce the review you ran on your last design without asking anyone what to do?
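Two of these questions, whether rigor changes under pressure and whether a written record exists, can be answered by the same artifact. A minimal sketch, assuming each review is saved as a record of the checklist item IDs actually executed plus the items explicitly marked as not verified; the `ReviewRecord` schema, item IDs, and checklist version string are hypothetical. Comparing consecutive records shows items that quietly disappeared rather than being waived on the record.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ReviewRecord:
    """What was reviewed, by whom, and against which checklist version."""
    design_rev: str
    checklist_version: str
    reviewers: List[str]
    completed: List[str]  # checklist item IDs actually executed
    not_verified: Dict[str, str] = field(default_factory=dict)  # item ID -> stated reason


def rigor_drift(previous: ReviewRecord, current: ReviewRecord) -> List[str]:
    """Items run last time that this time were neither run nor explicitly waived."""
    accounted = set(current.completed) | set(current.not_verified)
    return sorted(set(previous.completed) - accounted)


rev_a = ReviewRecord("A", "CL-14", ["bob", "carol"],
                     ["PWR-01", "PWR-02", "I2C-01", "STRAP-01"])
rev_b = ReviewRecord("B", "CL-14", ["bob"], ["PWR-01", "I2C-01"],
                     {"PWR-02": "regulator section unchanged since rev A"})

print(rigor_drift(rev_a, rev_b))            # STRAP-01 was silently dropped
print(json.dumps(asdict(rev_b), indent=2))  # the record a new engineer could rerun
```

An explicit waiver with a reason is acceptable; an item that simply vanished between revisions is exactly the schedule-pressure signal the second question asks about.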

Dimension 6: Improvement

The dimension that compounds.

After every respin or field issue, is root cause traced back to a checklist item that was missing or unenforced?
Does the checklist evolve? The teams that consistently outperform their respin baseline share one habit: every escape becomes a new line on the list. Larry Hurst's now-classic Electronic Design Checklist, which has been circulating among working engineers since the late 1990s, captures the aspiration directly: "for each design error that occurs, add the appropriate item to the list" [8]. It is one of the highest-leverage practices in the discipline, and one of the least often institutionalized.
Are escaped errors shared across projects, or do they live in the head of the team that experienced them?
When a new error category emerges (a new IC family, a new bus protocol, a new packaging technology), is there a path for it to enter the review process other than someone happening to remember?
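Hurst's rule is simple enough to encode directly. A minimal sketch, assuming a checklist kept as structured data rather than a document: each escape either adds a new line (recording which escape it came from) or, if an equivalent line already exists, is reported as an enforcement problem instead. The exact-text duplicate test, the ID scheme, and the escape IDs are deliberately naive, hypothetical choices.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChecklistItem:
    item_id: str
    category: str
    text: str
    origin: str  # escape, respin, or field issue that motivated the line


@dataclass
class Checklist:
    version: int = 1
    items: List[ChecklistItem] = field(default_factory=list)

    def add_from_escape(self, escape_id: str, category: str, text: str) -> Optional[ChecklistItem]:
        """Add one line per escape and bump the version.

        Returns None when an equivalent line already exists: the escape then
        points at an unenforced item rather than a missing one.
        """
        key = text.strip().lower()
        if any(item.text.strip().lower() == key for item in self.items):
            return None
        n = sum(1 for item in self.items if item.category == category) + 1
        item = ChecklistItem(f"{category}-{n:02d}", category, text, escape_id)
        self.items.append(item)
        self.version += 1
        return item


checklist = Checklist()
checklist.add_from_escape("RESPIN-7", "I2C",
                          "Recompute pull-ups from final device count and bus capacitance")
checklist.add_from_escape("FIELD-12", "STRAP",
                          "Every boot and strap pin has a defined, documented state")
repeat = checklist.add_from_escape("RESPIN-9", "I2C",
                                   "recompute pull-ups from final device count and bus capacitance")
print(checklist.version, [i.item_id for i in checklist.items], repeat)
```

Returning None for a repeat is the useful part: it separates the two root causes the first question distinguishes, a missing item versus an unenforced one.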

Figure: Radar chart comparing typical and high-functioning hardware review processes across the six dimensions: structure, cognitive load, coverage, independence, repeatability, and improvement.

Walking through these six dimensions is uncomfortable in proportion to how seriously a team takes it. The discomfort is itself information. The dimensions where the answer comes back "we don't do this consistently" are the dimensions where the review process is silently relying on heroics to substitute for engineering.

What "good" actually looks like

The most encouraging thing about all of this is that other industries have already worked out how to do better, in domains that were once vastly more dangerous than hardware engineering.

In medicine, a study published in 2006 described a five-item checklist for central-line insertion deployed across 103 ICUs in Michigan. Within three months, the median rate of catheter-related bloodstream infections dropped from 2.7 per 1,000 catheter-days to zero [9]. In its first eighteen months, the program was estimated to have saved more than 1,500 lives and some $175 million in costs in Michigan alone [10]. The intervention was not a new drug or a new device. It was a process redesign that respected human cognitive limits.

In aviation, the modern pre-flight checklist was introduced by the U.S. Army Air Corps after the prototype Boeing Model 299 crashed during a 1935 demonstration flight with Major Ployer P. Hill, the Air Corps' chief of flight testing, at the controls. The cause was pilot error: a control lock had been left engaged. A newspaper famously called the airplane "too much airplane for one man to fly." The fix was not a more talented pilot. It was a checklist. The procedural infrastructure that grew from that incident has been the foundation of safety improvements that turned commercial flight from a frequently fatal activity into the safest mode of mass transportation in history [10].

In software, the foundational 1976 work on structured code inspections at IBM, and the industrial practice that followed it, reported defect detection rates of 80 to 90 percent, results that shaped IEEE Standard 1028 and helped define the modern QA discipline [7] [6]. The "Swiss cheese" model of safety, formalized in 1990, gives the abstract argument for why this kind of layered approach works. No single layer of defense is reliable, but multiple imperfect layers, with the holes in different places, drastically reduce the probability of an error reaching the customer [11].

Figure: Layered "Swiss cheese" review model illustrating how gaps across ERC/DRC, peer review, simulation, signoff, and fabrication review can allow defects to reach manufacturing.
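The arithmetic behind the layered argument is short. A minimal sketch, assuming each layer misses a given defect independently of the others; the layer names follow the figure above, and the miss rates are illustrative placeholders apart from peer review, which sits inside the Inspection Floor range. Independence is the load-bearing assumption: layers that share blind spots (the same assumptions, the same tired reviewer twice) have holes in the same places, and the product then overstates the protection.

```python
from math import prod


def escape_probability(miss_rates):
    """Chance a defect slips through every layer, if layers miss independently."""
    return prod(miss_rates)


def expected_escapes(defects, miss_rates):
    """Expected number of defects that reach fabrication."""
    return defects * escape_probability(miss_rates)


# Illustrative per-layer miss rates; only peer review is anchored to the
# 20 to 30 percent Inspection Floor, the rest are placeholders.
layers = {
    "ERC/DRC": 0.60,
    "peer review": 0.25,
    "simulation": 0.50,
    "signoff": 0.70,
    "fabrication review": 0.80,
}
p = escape_probability(layers.values())
n = expected_escapes(40, layers.values())
print(f"escape probability {p:.3f}; of 40 defects, about {n:.1f} escape")

# A second pass by an independent reviewer multiplies by another 0.25.
# The same reviewer re-reading with the same blind spots is closer to 1.0,
# which is how correlated holes defeat the model.
```

No individual layer here is good, yet together they pass only a few percent of defects, provided their holes really are in different places.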

The pattern across these domains is consistent. High-functioning review systems share six characteristics:

The work is decomposed into items small enough that working memory is not the binding constraint.
The work is externalized into checklists and forms so that none of it depends on remembering.
The work is structured with named roles, defined timing, and explicit closure.
The work is measured (yield, escapes, time per item), so improvement is grounded in evidence rather than vibes.
The work is adaptive. Every escape becomes a new item the next time around.
And critically, the work that does not require human judgment is moved off the human as far as is feasible.
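As a sketch of what "measured" can mean in practice, the quantities named above reduce to simple ratios once findings and escapes are logged. The definition of review yield below (defects caught at review divided by defects caught at review plus those found later) is an assumption modeled on software QA's defect-removal-efficiency metric, and the project numbers are invented.

```python
def review_yield(found_in_review, escaped_later):
    """Share of known defects that review caught, between 0 and 1.

    Escapes keep arriving after release, so this number is provisional
    and tends to fall as field data comes in.
    """
    total = found_in_review + escaped_later
    if total == 0:
        raise ValueError("no defects recorded; yield is undefined")
    return found_in_review / total


# Hypothetical project history: (name, found in review, escaped later,
# review minutes, checklist items executed).
history = [
    ("Rev A", 34, 6, 480, 160),
    ("Rev B", 41, 3, 540, 210),
]
for name, found, escaped, minutes, items in history:
    print(f"{name}: yield {review_yield(found, escaped):.0%}, "
          f"{minutes / items:.1f} min per item, {escaped} escapes")
```

Tracked across projects, a falling minutes-per-item figure alongside a falling yield is a hint that rigor is eroding, not that the team got faster.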

That last point is the one most relevant to hardware engineers right now. Aviation moved its routine pre-flight checks into formal written procedures, then progressively automated some of the most error-prone parts of flying itself (autoland, TCAS, GPWS). Medicine moved infection control out of memory and into structured five-item checklists with stop-the-line authority. Software moved syntax checking, style enforcement, and increasingly large categories of behavior verification out of human hands entirely, and reserved its senior engineers for the work where judgment is irreducible.

Hardware design review has not yet gone through this transition. The design review of a modern complex board still asks a single engineer (often the design's own author) to do, in their head, a job that has the structural profile of a thousand-row spreadsheet of cross-references performed under deteriorating attention. The teams that get the best results from that arrangement have, often unconsciously, externalized as much of it as possible into discipline and process. Even those teams have a ceiling. The industry's stubborn respin rate, around 2.9 respins per project at an industry-average $44,000 and 8.5 days of schedule slip per spin [12], is what that ceiling looks like in aggregate.
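Multiplied out, those survey averages describe the per-project exposure. A back-of-envelope sketch, assuming the average cost and slip per spin apply to every spin and simply multiplying the averages cited from [12]; it ignores the knock-on costs of a late launch.

```python
def respin_exposure(respins, cost_per_respin, slip_days_per_respin):
    """Expected per-project cost and slip, multiplying the survey averages."""
    return respins * cost_per_respin, respins * slip_days_per_respin


cost, slip = respin_exposure(2.9, 44_000, 8.5)
print(f"${cost:,.0f} and {slip:.2f} days of slip per project")
```

That works out to roughly $127,600 and a little under 25 days of schedule per project, before counting anything the late board costs downstream.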

That ceiling is not a failure of engineers. It is a failure of the EDA toolchain to keep up with the complexity engineers are being asked to handle. We have spent thirty years adding features to our schematic-capture tools. We have not, in any serious way, asked those tools to take responsibility for the work the human mind cannot reliably do. That work has remained at the engineer's desk, with a checklist and a deadline, since approximately the Reagan administration.

Closing reflection

After enough reviews, you start to notice something uncomfortable. The work that costs your team the most when it escapes is, almost always, the work nobody wanted to do in the first place.

The architecture decisions, the topology innovations, the elegant simplifications, the moment you realize two seemingly unrelated subsystems can share a resource. Those are why most of us chose this work. They are also the parts of review where human judgment is irreplaceable.

The pin-by-pin checks, the bias-curve corrections, the spec compatibility matrices, the interface crossover verifications. None of us became engineers to do those. They are the chores. And precisely because they are the chores, they are also the work least likely to receive our best efforts and our highest attention.

That inverse relationship is the structural fact behind everything above. The work that wears down the human mind fastest is the work the human mind has to be most reliable at: the reliability we can sustain and the reliability the task demands run in opposite directions. They have for as long as anyone has been measuring.

If reading this framework has felt like a long list of ways your team is falling short, that is not the message. The constraints described here are not a knock on your engineers. They are the inescapable physics of human attention applied to a problem the EDA industry has been content to leave at the engineer's desk for thirty years. The schematic-capture tooling we use today is barely distinguishable from the tooling I was using in 1995. The work has gotten exponentially more complex. The toolchain, in this specific layer, has not.

That is changing now, faster than the software-QA transition did. The teams that rebuild their review process around what humans actually do well, and that let the toolchain take responsibility for what humans do poorly, will not just keep up with their competitors. They will pull ahead of them. That is not a five-year forecast. That is what is happening this year, on your competitors' boards, while you are reading this.

If you would prefer to start with something more concrete than a framework, I maintain my own working pre-fab design-review checklist, grown the way Hurst’s was: one line at a time, every escape adding a new item. It is the artifact the framework above points at, refined across the projects, respins, and escapes behind this article. If your team would find it useful before your next fab release, you can request a copy here:

Send me the checklist

If any of this resonates and you would like to compare notes, send me a message at scott@cadstrom.io. The conversations I find most useful are the ones that happen between two engineers, in the same kind of war room they grew up in, comparing honest answers to questions they had not yet thought to ask out loud.

References

[1] Drury, C. G. and Fox, J. G. (eds.) (1975). Human Reliability in Quality Control. Taylor & Francis, London. The 20 to 30 percent miss-rate finding has been replicated across decades of inspection research; see also See, J. E. (2015), "Visual Inspection Reliability for Precision Manufactured Parts," Human Factors 57(8), and the U.S. Department of Energy review at osti.gov/biblio/1476816.
[2] Mackworth, N. H. (1948). "The Breakdown of Vigilance During Prolonged Visual Search." Quarterly Journal of Experimental Psychology 1, 6-21. The original "clock test" paradigm remains the foundational experimental design for sustained-attention research.
[3] Warm, J. S., Parasuraman, R., and Matthews, G. (2008). "Vigilance Requires Hard Mental Work and Is Stressful." Human Factors 50(3), 433-441. Established that vigilance tasks are cognitively expensive, contradicting earlier under-stimulation theories.
[4] Cowan, N. (2001). "The Magical Number 4 in Short-Term Memory: A Reconsideration of Mental Storage Capacity." Behavioral and Brain Sciences 24(1), 87-114. The four-item estimate has been refined in subsequent work but remains the modern consensus, well below G. A. Miller's earlier seven-plus-or-minus-two figure.
[5] Endsley, M. R. (1995). "Toward a Theory of Situation Awareness in Dynamic Systems." Human Factors 37(1), 32-64. Among the most-cited papers ever published in the journal; defines the perception, comprehension, and projection levels of awareness used throughout human factors research.
[6] IEEE Standard 1028-2008. IEEE Standard for Software Reviews and Audits. IEEE Computer Society. Defines five review types (management, technical, inspection, walk-through, audit) and the procedural roles that have proven necessary for each.
[7] Fagan, M. E. (1976). "Design and Code Inspections to Reduce Errors in Program Development." IBM Systems Journal 15(3), 182-211. The foundational paper on structured inspection. Defect detection rates of 80 to 90 percent have been reported in subsequent industrial use.
[8] Hurst, L. (2002). Electronic Design Checklist Rev. 2. Originally compiled at Analog Devices and circulated widely; approximately 200 schematic, layout, assembly, and process items, with the explicit instruction to grow the list with each escaped error.
[9] Pronovost, P. et al. (2006). "An Intervention to Decrease Catheter-Related Bloodstream Infections in the ICU." New England Journal of Medicine 355(26), 2725-2732. The Michigan Keystone study reduced median CLABSI rates from 2.7 per 1,000 catheter-days to zero within three months across 103 participating ICUs.
[10] Gawande, A. (2009). The Checklist Manifesto: How to Get Things Right. Metropolitan Books / Henry Holt, New York. The synthesis of how aviation, surgery, and construction have used structured checklists to manage cognitive load in high-stakes environments.
[11] Reason, J. (1990). Human Error. Cambridge University Press. The "Swiss cheese" model of layered defenses, distinguishing active failures from latent organizational conditions, has become the standard framework for analyzing how complex-system errors actually propagate.
[12] Lifecycle Insights (2018). PCB design project survey, widely cited across the EDA industry. Reports an industry average of 2.9 respins per project, $44,000 per respin, and 8.5 days of schedule slip per iteration; see for example I-Connect007, "Cutting Respins: Journey to the Single-spin PCB," and Cadence Community, "Avoiding PCB Respins with Better Computational Software."
