We measure everything in healthcare… except what matters

In a recent article in BMJ Digital Health & AI, Graham Walker, MD makes an appeal that feels obvious only in hindsight: before we can seriously talk about “what good looks like” in healthcare, we need more context. Not better slogans. Not more dashboards. Context. The kind of context that captures nuance, environment, and the messy reality of clinical care rather than a sanitized after-action report. Dr. Walker posits that this context must chiefly come from clinicians.

Healthcare’s quality problem isn’t just that we measure the wrong things. It’s that we’ve designed an entire system around context-free abstractions (think codes, notes, and timestamps) and then act surprised when those abstractions fail to describe reality. Digital health didn’t create this problem; it exposed it.

When “good” becomes a proxy instead of a reality

Walker’s argument lands because it quietly challenges a long-standing assumption: that we can define quality by counting artifacts rather than understanding care. In digital health, this is reflected in adoption metrics, clicks, and workflow completion. In quality measurement, it shows up as claims data, readmissions, and process compliance.

All of these are proxies. And proxies are dangerous when they become the thing itself. We say we want “better outcomes,” but rarely agree on whose outcomes, over what time horizon, or under what circumstances. Survival? Function? Experience? Equity? Cost? The answer changes depending on whether you’re a clinician, CFO, patient, or regulator. This ambiguity isn’t theoretical. It has real consequences.

Choosing Wisely: A case study in context collapse

The Choosing Wisely campaign was one of the most earnest attempts to define “good” by exclusion. If we couldn’t agree on what excellent care looked like, perhaps we could at least agree on what care didn’t help patients.

The logic was sound. Specialty societies identified low-value tests and procedures. Clinicians were encouraged to discuss necessity rather than reflexively ordering more. The campaign leaned on professionalism, evidence, and trust. And yet, the impact was modest. At a macro level, utilization patterns barely moved. Many clinicians disagreed with the recommendations in real-world scenarios. Others worried about liability, patient expectations, or local norms. Over time, the campaign lost momentum.

Here’s the uncomfortable lesson: even when presented with evidence, clinicians struggled to act without context. The lists could not account for nuance: the patient in front of them, the setting, the gray zones where guidelines blur into judgment. We tried to define “low value” without acknowledging how deeply medicine depends on situational awareness.

Context is not optional: It’s the substrate of clinical judgment

Walker’s argument becomes even more powerful when paired with a parallel conversation happening in AI: models fail when context is stripped away.

In my last blog post, I agreed with the contention of Mitesh Patel, MD that healthcare AI is fundamentally data-starved, not because we lack volume, but because we lack richness. We ask AI to behave like a clinician while feeding it data that clinicians themselves don’t fully trust. A progress note doesn’t show how a patient breathed while talking. A diagnosis code doesn’t show hesitation, gait, or affect.

That’s why video matters. Video adds context, color commentary, temporal flow: the “how” and “why” behind the “what.” It’s the same reason self-driving cars didn’t advance through better algorithms alone, but through massive volumes of real-world video.

Healthcare has made the opposite bet: we strip context out, then wonder why our definitions of quality feel hollow. The same mistake shows up in quality measurement. We collapse complex care into static fields and expect meaning to survive the compression. It doesn’t.

If AI can’t learn without context, neither can health systems

The parallel is striking. AI struggles when trained on decontextualized data. So do health systems. When we define “good” care using only what’s billable, countable, or auditable, we eliminate precisely the information that clinicians rely on to make decisions. Tone. Timing. Environment. Sequence. Human response.

Walker is right: without context, “what good looks like” becomes an academic exercise. Choosing Wisely struggled because it tried to impose context-free rules on context-dependent practice. This isn’t a failure of evidence. It’s a failure of design.

What healthcare leaders should take from this (no, it’s not “buy more AI”)

First, stop pretending quality is objective without context. Outcomes divorced from circumstances are meaningless. Measurement strategies must explicitly acknowledge variability rather than pretending it can be engineered away.

Second, treat context as data, not noise. That applies both to AI and to quality improvement. Video, longitudinal observation, patient-reported experience, and clinician narrative all matter, even when they resist clean scoring.

Third, design systems that preserve nuance rather than erase it. Decision support, quality programs, and AI tools should augment judgment, not replace it with brittle rules.

Fourth, align incentives with wisdom, not volume. As long as restraint feels risky and action (e.g., ordering that lab or performing that surgery) feels safe, no campaign—Choosing Wisely included—will meaningfully change behavior.

Finally, be honest about what you’re optimizing for. Cost reduction, safety, equity, experience: these are not interchangeable goals. Saying “quality” without defining the context is managerial theater.

The real takeaway

Dr. Walker’s call for context isn’t just a digital health critique. It’s a broader indictment of how U.S. healthcare defines success. We have spent decades trying to engineer “good” out of fragments while ignoring the lived reality of care. Choosing Wisely fell short of expectations not because physicians are stubborn or evidence doesn’t matter, but because medicine cannot be reduced to checklists without losing its soul.

If we want better outcomes from clinicians, from AI, and from health systems, we need to stop stripping away the very context that makes judgment possible. Good care is contextual, and it always has been. The sooner our measurement systems catch up, the sooner we can stop arguing about what “good” looks like and start designing for it.
