As a former Navy Corpsman, it’s hard not to be at least a bit excited about Google’s AMIE, which the Google Research Blog announced on Friday, January 12th. To my senior software engineer self, though, posts about it flared up across social media like a diagnosed case of hemorrhoids.
So I dug in and did some research.
The reality is that it’s not as much of an advance as some posts and headlines may have people believing. Doctors aren’t going to be replaced anytime soon, particularly since the paper’s own conclusion was very realistic:
The utility of medical AI systems could be greatly improved if they are better able to interact conversationally, anchoring on large-scale medical knowledge while communicating with appropriate levels of empathy and trust. This research demonstrates the significant potential capabilities of LLM based AI systems for settings involving clinical history-taking and diagnostic dialogue. The performance of AMIE in simulated consultations represents a milestone for the field, as it was assessed along an evaluation framework that considered multiple clinically-relevant axes for conversational diagnostic medical AI. However, the results should be interpreted with appropriate caution. Translating from this limited scope of experimental simulated history-taking and diagnostic dialogue, towards real-world tools for people and those who provide care for them, requires significant additional research and development to ensure the safety, reliability, fairness, efficacy, and privacy of the technology. If successful, we believe AI systems such as AMIE can be at the core of next generation learning health systems that help scale world class healthcare to everyone.
“Towards Conversational Diagnostic AI” (PDF), Conclusion, many authors (see paper), Google Research and Google DeepMind, 11 Jan 2024.
In essence, this is a start, and a pretty promising one given it was done entirely through a text chat application. The clinicians – real doctors – who took part in the study were at a disadvantage, because they normally have an actual conversation with the patient.
As I quipped on social media with a friend who is a doctor, if the patient is unresponsive, the best AMIE can do is repeat itself in all caps:
“HEY! ARE YOU UNCONSCIOUS? DID YOU JUST LEAVE? COME BACK! YOU CAN’T DIE UNLESS I DIAGNOSE YOU!”
Given that, the accuracy comparison – 91.3% for AMIE versus 82.5% for the physicians – should be taken with Dead Sea levels of salt. Yes, the AI beat human doctors by 8.8 percentage points, but only after we tied a doctor’s human experience behind their back.
Interestingly, doctors aren’t always the ones who take patient histories, either. Sometimes it’s nurses; in the Navy it was often Corpsmen. Often, when a doctor walked into the room to see a patient, they already had SOAP notes to work from, verify, and add to.
The take from Psychology Today, though, is interesting, pointing out that AI and LLMs are charting a new course in goal-oriented patient dialogues. However, even that article seemed to gloss over the fact that this was all done in text chat when it noted that, in terms of conversation quality, AMIE scored 4.7 out of 5 while physicians averaged 3.9.
There is a very human element to medicine that involves evaluating a patient by looking at and listening to them. In my experience as a Navy Corpsman taking medical histories for the doctors, patients can be tricky and unfocused, particularly when in pain. Evaluation often leans more on what one observes than on what the patient says, particularly in an emergency setting. I’ve seen good doctors work magic with patient histories, ordering tests based not on what the patient told them but on what they observed, ruling things out diagnostically.
Factor in what I consider the commodification of medicine over my lifetime – doctors can be time-constrained, pushed to see more patients per unit of time – and that certainly doesn’t help things; when error creeps in there, it’s a human-induced error. Given the way the study was done, I don’t think it was as much of a factor here, but it’s worth considering.
When we go to the doctor as patients – sitting in the uncomfortable uniform of the patient, on an examination table seemingly designed to draw all the heat from our bodies through our buttocks – we tend to think we’re the only person the doctor is dealing with. That’s rarely the case.
I do think we’re missing the boat on this one, though. Checking patient charts would be a great exercise of what a large language model (LLM) is actually good at: evaluating text and information and coming up with a diagnosis. Imagine an artificial intelligence evaluating charts and lab tests as they come back, then alerting doctors when necessary while the patient is being treated. Of course, the doctor gets the final say, but the AI’s ‘thoughts’ are entered into the chart as well.
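To make that idea concrete, here’s a minimal sketch of what such a chart-review loop might look like. Everything in it is hypothetical – the reference ranges, the review_labs function, and the stubbed ask_llm call stand in for whatever LLM service and chart integration a real system would need; none of it comes from the AMIE paper.

```python
from dataclasses import dataclass, field

# Hypothetical reference ranges; a real system would use the lab's
# own reported ranges rather than hard-coded values.
REFERENCE_RANGES = {
    "potassium_mmol_l": (3.5, 5.0),
    "wbc_10e9_l": (4.0, 11.0),
}

@dataclass
class Chart:
    patient_id: str
    notes: list = field(default_factory=list)

def ask_llm(prompt: str) -> str:
    """Stand-in for a call to an LLM service (entirely hypothetical)."""
    return f"[AI draft] Physician review suggested: {prompt}"

def review_labs(chart: Chart, labs: dict) -> list:
    """Flag out-of-range results, draft an advisory note, and append
    it to the chart for the physician to confirm or reject."""
    flags = []
    for test, value in labs.items():
        low, high = REFERENCE_RANGES.get(test, (None, None))
        if low is not None and not (low <= value <= high):
            flags.append(f"{test}={value} outside [{low}, {high}]")
    if flags:
        draft = ask_llm("; ".join(flags))
        # The AI's 'thoughts' go into the chart; the doctor has final say.
        chart.notes.append({"source": "ai-draft", "text": draft})
    return flags

chart = Chart(patient_id="demo-001")
print(review_labs(chart, {"potassium_mmol_l": 6.1, "wbc_10e9_l": 7.2}))
print(chart.notes)
```

The key design choice is that the AI’s note is appended as a clearly labeled draft rather than a physician entry, so the doctor still gets the final say – exactly the division of labor suggested above.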
I’m not sure engaging a patient for a history was the best first step for a large language model in medicine, but of course that’s not all Google Research and DeepMind are working on, so it may be part of an overall strategy. Or it might just be the thing that got funding because it was sexy.
Regardless, this is probably one of the more exciting uses of artificial intelligence because it’s not focused on making money. It’s focused on treating humans better. What’s not to like?
Interestingly, this popped up in my Facebook memories – a reminder of how we differ in verbal and written communication:
https://www.inc.com/minda-zetlin/you-should-never-ever-argue-with-anyone-on-facebook-according-to-science.html