But humans continue to outperform when issues require deeper nuance
Lawyers struggle to compete with artificial intelligence (AI) when it comes to legal research, a new study has suggested.
A new benchmarking report by Vals AI has compared the performance of four AI tools (Alexi, Counsel Stack, Midpage, and ChatGPT) against a group of practising lawyers. Both groups were tasked with answering the same 200 legal research questions, developed with input from leading US firms including Paul Weiss, Reed Smith, and Paul Hastings. The responses were then assessed against three criteria: accuracy, authoritativeness, and appropriateness.
Accuracy measured whether the answer was substantively correct, authoritativeness assessed whether it was supported by valid legal sources, and appropriateness gauged whether it was clear and client-ready.
The results were striking. The AI tools outperformed the lawyers across all three measures. Overall, the AI systems scored between 74% and 78%, compared with an average of 69% for the human participants. Interestingly, the specialist legal AI tools were only marginally stronger than the generalist product, ChatGPT, in most areas.
A breakdown of the numbers paints an even clearer picture. On accuracy, the AI tools achieved an average score of 80%, while the lawyers managed 71%. On authoritativeness, the legal AI tools led with 76%, ChatGPT followed on 70%, and the lawyers trailed at 68%. The gap widened further on appropriateness, where the legal AI tools scored 70%, ChatGPT 67%, and the lawyers just 60%.
It’s not all doom and gloom for the profession. The study found that in four of the ten question types, lawyers still had the upper hand, particularly on questions requiring a more nuanced understanding of context, judgement, and multi-jurisdictional reasoning. On these more complex questions, the human participants outperformed the AI tools by an average of nine percentage points.
The study arrives at a time of crackdowns and disciplinary action against lawyers who have relied on fake case citations hallucinated by ChatGPT and other AI tools in court.
The report also notes that several leading legal tech companies declined to take part in the study. This may have been due to its ‘zero-shot’ methodology, under which the AI products were given each question cold, with no prior examples, follow-up prompts, or use of workflow features that might have improved their results.
Vals also hints that the gap between legal AI and generalist tools like ChatGPT could narrow once Deep Research, OpenAI’s web-search-enabled research capability, becomes more widely integrated into legal workflows.