Could AI-generated summaries present a solution for time-pressured GP appointments?
I am part of a team of clinicians and academics that has been conducting a research study comparing AI- versus clinician-authored summaries of simulated primary care electronic health records (EHRs).
While the paper has not yet been through peer review, I am able to share the preprint here.
Why did we do this work?
General practitioners (GPs) are increasingly pushed for time. Primary care EHRs are typically lengthy, with much of the information stored as unstructured free text, making it difficult to search and to extract key information efficiently. Reviewing a record in enough depth can therefore consume a significant proportion of the standard 10-minute consultation. The result? Potential delays in identifying important clinical information and, in some cases, avoidable medical errors that compromise patient safety.
What if a large language model, like ChatGPT, could summarise the record for GPs — allowing them to spend more of the consultation time with patients rather than scrolling through notes on a computer screen? How would AI-generated summaries compare to those produced by clinicians? Would the quality of AI-generated summaries be high enough to use in practice?
That’s what we set out to investigate.
How did we do it?
A team of 12 colleagues collaborated on the study, including seven clinicians who authored 70 simulated patient EHRs. We deliberately used simulated patient records — not real ones — to protect patient confidentiality. The simulated records reflected different documentation styles seen in general practice.
Each record was independently summarised by both clinicians and ChatGPT-4. The summaries were then rated by clinicians, who were blinded to the origin of each summary.
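For readers curious what this step looks like in practice, here is a minimal sketch of how a record might be sent to a large language model for summarisation. To be clear, this is not our study protocol: the SDK, model name, prompt wording, and file name below are all illustrative assumptions.

```python
# Illustrative sketch only: NOT the study's actual pipeline or prompt.
# Assumes the OpenAI Python SDK (pip install openai) and an OPENAI_API_KEY
# set in the environment; model name and prompt wording are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def summarise_record(record_text: str) -> str:
    """Ask the model for a concise clinical summary of one simulated EHR."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumed; the study refers to ChatGPT-4
        messages=[
            {"role": "system",
             "content": "You are a clinical assistant. Summarise the patient "
                        "record for a GP ahead of a 10-minute consultation."},
            {"role": "user", "content": record_text},
        ],
        temperature=0,  # assumed: favour consistent, repeatable summaries
    )
    return response.choices[0].message.content


# Example: summarise one simulated (never real) patient record.
if __name__ == "__main__":
    with open("simulated_record_01.txt") as f:  # hypothetical file name
        print(summarise_record(f.read()))
```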
What did we find?
Clinicians took a median of 7 minutes to read through just two years' worth of notes for each simulated patient, consuming around 70% of a standard GP consultation.
AI-authored summaries were rated slightly lower overall than clinician-authored summaries — but they demonstrated similar accuracy and greater consistency.
What does this mean?
Our findings suggest that AI-generated summaries could have important applications in primary care. Given the time pressures GPs face, this is a key example of how AI summarisation tools could improve healthcare productivity, by enabling clinicians to spend more time on direct patient care.
AI is faster than clinicians at summarising EHRs — and delivers results that, in many cases, are not far off what clinicians produce. Could this be another task that we hand over to AI? If so, the benefits could be twofold: better working conditions for GPs and more time for patient care.
In the context of the significant demand-resource mismatch in general practice, which is a key driver of poor retention, AI summarisation may be less of a luxury and more of a necessity.
This research is just one example of the kind of work needed to explore the human-AI interface across different workplaces. Discovering what AI can do is only the tip of the iceberg. The deeper challenge lies in how best to implement AI products alongside a human workforce and human end users. This includes determining how humans retain accountability for AI-generated outputs — and understanding the longer-term impact of working alongside AI on human health, skills, and decision-making.