If you want AI to think like a clinician, show it what clinicians see

When Mitesh Patel, MD, argued that self-driving cars didn’t truly accelerate (pun intended) until Tesla began feeding its algorithms massive volumes of real-world video, he wasn’t just offering a clever analogy; he was diagnosing healthcare’s AI problem. Dr. Patel’s central point is that transformative AI doesn’t emerge from better slogans or shinier interfaces; it emerges from richer data. And nowhere is that gap more glaring than in healthcare, where we keep asking AI to perform miracles using data streams that resemble a sterile textbook more than a living patient. If we genuinely want AI to behave like a seasoned clinician, we need to give it something far more dynamic than structured fields and dictated notes. We need the same substance Dr. Patel alludes to throughout his article: video, which is human, contextual, messy, and clinically “alive.”

If AI is going to transform healthcare in any meaningful way, it will need something richer than lab values and ICD-10 codes. It will need the same thing that helped autonomous vehicles evolve from an academic curiosity to a commercial product: video. Not because video is trendy or futuristic, but because it’s the closest thing we have to the raw sensory experience of clinical care done by humans.

The irony is that healthcare is already drowning in video. What we lack is the recognition that video isn’t a compliance liability or a storage headache. (Ok, fine, it really is both of those, but stay with me.) Video is the raw material for the next generation of clinical AI: the missing fuel for the revolution we keep predicting but never quite delivering.

AI is only as good as the data we feed it

Healthcare leaders often talk about AI as if it were alchemy. You pour in clinical documentation and patient demographics, and poof, out comes intelligence. But AI isn’t magic; it’s math. And math is picky.

AI systems learn patterns by ingesting enormous amounts of data that reflect the real world in all its messiness. When self-driving cars became viable, it wasn’t because someone finally wrote a better algorithm. It was because companies started feeding those algorithms millions of hours of video showing what actually happens on real roads at real speeds in real weather. The machines didn’t become smarter; they simply had more to learn from.

Meanwhile, in healthcare, we’ve been asking AI to mimic the clinical eye based on data sources that clinicians themselves barely trust. A coded diagnosis doesn’t tell you what the patient looked like. A progress note doesn’t capture how a patient breathed while talking (h/t Graham Walker, MD). A structured field doesn’t reveal the hesitation in a resident’s voice when they’re unsure of a diagnosis.

If you want AI to behave like a seasoned physician, you have to show it what seasoned physicians see. And that means embracing video as the central, not peripheral, data source for the age of AI-enabled care.

Healthcare already has video everywhere; we just don’t treat it like data

Healthcare’s relationship with video borders on paradox. We use it constantly but treat it as if it doesn’t exist.

Operating rooms (ORs) generate terabytes of procedural footage. Endoscopy towers record every square inch of mucosa. Radiologists live in a world of dynamic imaging: echocardiograms, ultrasounds, fluoroscopy. Telehealth visits create high-resolution video streams full of subtle cues about respiratory effort, affect, mobility, and neurologic status. Virtual nursing programs use continuous video monitoring to oversee patient safety 24/7. ICUs have cameras on ventilators, infusion pumps, and bedside monitors capturing endless hours of physiology, behavior, and clinical interactions.

And yet, unlike self-driving car companies, we discard the vast majority of this material. Not because it lacks value, but because no one thinks of it as data. It exists at the edge of the clinical workflow: the byproduct of care, not the substrate for intelligence.

The missed opportunity is staggering. Every day, health systems generate more dynamic, clinically relevant visual information than Tesla’s entire training system did in its early years. But because it’s not documented, coded, or billable, the data are left to evaporate into the hospital’s digital ether.

Where video could actually change the game

If healthcare treated video the way autonomous vehicles treat video, dozens of high-value use cases would become viable almost overnight.

Subtle diagnostics
A camera in a triage bay can capture the earliest signs of a stroke or opioid intoxication. A hallway camera can reveal the gait instability that precedes falls. Video can detect tremors, breathing patterns, and changes in affect long before a clinician documents anything.

Ambient physical-exam documentation
We tend to think of “ambient documentation” as capturing conversation. That’s useful, but limited. Add video and you get something much more powerful: AI can observe the physical exam itself, understand the sequence of clinical actions, and translate that into structured documentation without anyone staring at a keyboard.

Safety and quality analytics
Wrong-site risks. Breaks in sterile technique. Missed hand hygiene. PPE lapses. Near misses in medication administration. All of these may be visible on video, yet invisible in most datasets. AI can spot the moments humans miss, even the ones we’d prefer not to admit happened.

Continuous patient monitoring without more devices
Instead of adding yet another sensor for fall risk, delirium, respiratory effort, or agitation, video can infer these states using models that observe minute-to-minute changes in behavior, posture, and movement.
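The core idea here can be sketched in a few lines. The following is a minimal, illustrative toy, not a clinical system: it assumes frames arrive as 2D grayscale pixel grids and uses raw frame differencing with made-up thresholds, where a real deployment would use pose estimation and validated models.

```python
# Sketch: inferring patient state from frame-to-frame change.
# Frames are hypothetical 2D grayscale pixel grids from a bedside camera;
# thresholds and labels are illustrative only.

def motion_score(prev_frame, curr_frame):
    """Mean absolute pixel difference between two consecutive frames."""
    diffs = [
        abs(c - p)
        for prow, crow in zip(prev_frame, curr_frame)
        for p, c in zip(prow, crow)
    ]
    return sum(diffs) / len(diffs)

def classify_activity(scores, agitation_threshold=20.0):
    """Label a window of motion scores (thresholds are placeholders)."""
    avg = sum(scores) / len(scores)
    if avg >= agitation_threshold:
        return "possible agitation"
    if avg <= 1.0:
        return "low movement -- check on patient"
    return "normal movement"

# Two tiny 2x2 "frames": the patient shifts position between them.
still = [[10, 10], [10, 10]]
moved = [[40, 40], [40, 40]]
score = motion_score(still, moved)  # 30.0
print(classify_activity([score]))   # prints "possible agitation"
```

The point of the sketch is that the signal already exists in the pixel stream; no additional sensor is attached to the patient.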

Making telehealth clinically richer
Right now, telehealth is an upgraded phone call. With video analytics, it becomes something far closer to an in-person visit, catching the subtle clinical cues that are lost on a 2D screen.

Once healthcare accepts that video is a clinical asset, not solely a legal risk, the list of use cases becomes almost embarrassingly long.

Why video and AI work so well together

Part of the power of video is that it embeds context over time, what AI researchers call temporal coherence. A single frame of an ultrasound might reveal little, but a sequence of frames over one cardiac cycle reveals ejection fraction, valve motion, wall thickness, and even novice-versus-expert probe handling.

Video doesn’t just show what is happening; it shows how it’s happening and how fast. It gives AI the sense of rhythm and progression that mirrors real clinical reasoning.
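The ejection fraction example makes temporal coherence concrete. In the sketch below, the per-frame volume estimates are hypothetical stand-ins for what a segmentation model would extract from each echo frame; the point is that the metric requires the whole cycle, since no single frame contains both the end-diastolic and end-systolic volumes.

```python
# Hypothetical left-ventricular volume estimates (mL), one per frame,
# across a single cardiac cycle. In practice these would come from a
# segmentation model run on each echocardiogram frame.
frame_volumes = [120, 112, 95, 72, 58, 50, 55, 70, 90, 108, 118]

edv = max(frame_volumes)  # end-diastolic volume (heart fullest)
esv = min(frame_volumes)  # end-systolic volume (heart most contracted)

# Ejection fraction = fraction of blood pumped out per beat.
# Only computable from the sequence, never from one frame.
ef = (edv - esv) / edv
print(f"EF ≈ {ef:.0%}")  # prints EF ≈ 58%
```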

Now that multimodal models can combine video, audio, and text, the value increases exponentially. A model can understand what’s said in a room, what’s done in a room, and how patients respond to what’s done and said. That’s far richer than the static notes we expect AI to magically interpret.

Healthcare has more insight embedded in its video streams than autonomous vehicles will ever have in their traffic footage. Our visual world contains physiology, emotion, communication, and dexterity, all of which are central to clinical decision-making.

Why healthcare has ignored video

It’s not because the value is unclear. It’s because video belongs to no one.

OR video isn’t owned by perioperative leadership. Telehealth video isn’t owned by IT. NICU cameras aren’t owned by nursing. Everyone assumes someone else is in charge of managing, storing, or governing it. And because it’s not billable, it never enters strategic planning discussions.

Then there are the cultural factors. Clinicians worry that video means surveillance. Executives worry that storing it means liability. Compliance worries that someone will inevitably mishandle permissions. And everyone worries that video storage will be expensive, despite cloud storage now being cheaper than contract labor.

But the biggest reason we’ve ignored video is that we’ve been too focused on squeezing more juice out of the EHR. We’ve been so busy restructuring documentation workflows that we haven’t noticed the richest data source in the building.

Unlocking the future

Health systems that want to lead the next decade of AI-enabled care should start by reframing video as a core clinical asset. That means setting governance rules, storage standards, retention policies, and access controls. It means choosing a few clinical domains (e.g., ultrasound, endoscopy, telehealth triage, OR workflow) and piloting multimodal models that can learn from those streams.
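To make “governance rules, retention policies, and access controls” less abstract, here is one hypothetical shape such a policy could take. Every stream name, retention window, and role below is invented for illustration; real values would come from compliance, legal, and clinical leadership.

```python
# Hypothetical video governance policy sketch -- all stream names,
# retention windows, and roles are illustrative, not a standard.
RETENTION_POLICY = {
    "or_workflow":      {"retain_days": 90,  "access": ["periop_quality"]},
    "telehealth_visit": {"retain_days": 30,  "access": ["treating_clinician"]},
    "icu_monitoring":   {"retain_days": 14,  "access": ["virtual_nursing"]},
    "endoscopy":        {"retain_days": 365, "access": ["gi_quality"]},
}

def can_access(stream, role):
    """Check whether a role may view a given video stream."""
    policy = RETENTION_POLICY.get(stream)
    return policy is not None and role in policy["access"]

print(can_access("or_workflow", "periop_quality"))  # prints True
print(can_access("telehealth_visit", "research"))   # prints False
```

The design choice worth noting is that governance is explicit and per-stream from day one, rather than bolted on after a pilot succeeds.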

It also means telling clinicians, explicitly, that video is not a tool for punitive surveillance. It’s a tool for safety, quality, documentation, and decision support. Protecting clinicians isn’t tangential; it’s essential to adoption.

The organizations that start now will quickly realize what Dr. Patel is telling us: the future of AI in medicine depends less on models and more on the richness of the inputs we give them. His article serves as a warning shot that healthcare can no longer ignore the data source that every other high-performance AI industry is already exploiting. The real competitive advantage in the next decade won’t be who has the best vendor demo; it will be who controls the deepest, most clinically relevant training data. And the richest dataset we have, as Dr. Patel implicitly reminds us, is hiding in plain sight.
