The State of AI in Insurance: A Comparison of Large Language Models (LLMs)

Artificial intelligence (AI) continues to reshape industries worldwide, and insurance is no exception. The insurance sector, with its heavy load of documentation, claims handling, and compliance work, increasingly relies on AI tools such as Large Language Models (LLMs) to streamline operations. In this blog post, we explore key findings from the latest report on the state of AI in insurance, focusing on LLM performance, price comparisons, and advancements in the field.

Understanding the Performance of LLMs in Insurance Use Cases

LLM capabilities are improving significantly as the technology evolves. The report compares the performance of 16 publicly available LLMs, tested across four insurance-related scenarios:

Complex English-language airline invoices: Models were tasked with extracting detailed data from flight invoices, a scenario that requires high accuracy.
Simple Japanese-language property repair quotes: A less complex scenario that tests how well models extract straightforward data from non-English documents.
Simple French-language dental invoices: Similar to the Japanese scenario, but testing the models' ability to handle French documents.
Complex English-language travel insurance document classification: This scenario involves a higher level of difficulty, requiring classification of diverse travel insurance documents.

LLMs were assessed for their coverage (how well they extract the necessary data) and accuracy (how often they deliver correct results). GPT-4o and GPT-4o Mini led the way, with consistently strong performance across both simple and complex tasks. Claude 3.5 Sonnet and Mistral Large 2407 also performed well, making them viable alternatives.
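
To make the two metrics concrete, here is a minimal Python sketch. It rests on assumptions, because the post does not give the report's exact formulas: coverage is taken as the share of expected fields for which the model returned any value, and accuracy as the share of those returned values that match the expected ones. The function name, field names, and sample values are hypothetical.

```python
from typing import Dict, Optional, Tuple


def coverage_and_accuracy(
    expected: Dict[str, str],
    extracted: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Return (coverage, accuracy) for one document.

    ASSUMED definitions (not taken from the report):
      coverage = expected fields with a non-empty extracted value / expected fields
      accuracy = extracted values that match the expected value / extracted values
    """
    if not expected:
        raise ValueError("expected must contain at least one field")

    # Keep only the expected fields for which the model returned something.
    attempted = {
        field: value
        for field, value in extracted.items()
        if field in expected and value not in (None, "")
    }
    coverage = len(attempted) / len(expected)

    # Compare case-insensitively and ignore surrounding whitespace.
    correct = sum(
        1
        for field, value in attempted.items()
        if value.strip().lower() == expected[field].strip().lower()
    )
    accuracy = correct / len(attempted) if attempted else 0.0
    return coverage, accuracy


# Hypothetical ground truth and model output for a single invoice.
expected = {
    "invoice_number": "INV-001",
    "total": "250.00",
    "currency": "EUR",
    "passenger": "A. Martin",
}
extracted = {
    "invoice_number": "INV-001",
    "total": "250.00",
    "currency": "USD",   # returned, but wrong
    "passenger": None,   # not returned at all
}

coverage, accuracy = coverage_and_accuracy(expected, extracted)
print(f"coverage={coverage:.2f}, accuracy={accuracy:.2f}")
```

In this made-up example the model returns three of the four fields and gets two of those three right. Separating the two numbers shows whether a model tends to miss fields or to fill them in incorrectly.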

Key Metrics and Findings

The report provides several valuable insights for insurers considering LLMs for automating data extraction and document classification:

Top Performers: GPT-4o achieved the highest aggregate performance score, 87.2%. Claude 3.5 Sonnet followed closely at 86.7%, making it a strong competitor on performance, though at a higher price point.
Price-to-Performance Balance: One of the report's most striking conclusions is the importance of balancing performance against cost. Although Claude 3.5 Sonnet scored near the top, its price is significantly higher. Meanwhile, GPT-4o Mini delivered solid performance at a fraction of the cost of earlier models such as GPT-3.5.
Context Size: A large context size (the maximum number of tokens a model can process at once) has become standard among leading LLMs, with most models supporting 128k tokens. That is enough to handle most insurance-related use cases.
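
One simple way to reason about the price-to-performance balance is to divide each model's score by its price. The Python sketch below uses the two aggregate scores quoted above; the prices are placeholders chosen only to illustrate the calculation (they are not real pricing and do not come from the report), and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    score: float  # aggregate performance score in percent (from the post)
    price: float  # PLACEHOLDER relative price, NOT real vendor pricing


def score_per_price(model: ModelResult) -> float:
    """Performance points obtained per unit of price (higher is better)."""
    if model.price <= 0:
        raise ValueError("price must be positive")
    return model.score / model.price


candidates = [
    ModelResult("GPT-4o", score=87.2, price=1.0),             # placeholder price
    ModelResult("Claude 3.5 Sonnet", score=86.7, price=2.0),  # placeholder price
]

# Rank from best to worst value for money.
ranked = sorted(candidates, key=score_per_price, reverse=True)
for model in ranked:
    print(f"{model.name}: {score_per_price(model):.2f} points per price unit")
```

With scores this close, the ranking is driven almost entirely by price. To use the sketch in practice, replace the placeholder prices with the current per-token rates of each provider.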

Implications for the Insurance Industry

With AI playing an increasingly crucial role in insurance, selecting the right LLM is paramount for maximizing efficiency. Here are three takeaways for companies looking to integrate LLMs into their operations:

Price Matters: With multiple models offering similar performance, cost becomes a deciding factor. GPT-4o and GPT-4o Mini offer some of the best price-to-performance ratios, making them top choices for insurers looking to optimize budgets.
Context Size is Now a Baseline: Most models now support a 128k-token context, enough to handle long and complex insurance documents without breaking a sweat.
Advancing Technologies: With constant updates and new models hitting the market, it’s becoming harder to track improvements. Companies should continue to evaluate emerging technologies, but for now, models like GPT-4o and Claude 3.5 Sonnet are clear leaders.
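
For the context-size takeaway, a quick pre-check can tell whether a document is likely to fit before it is sent to a model. The Python sketch below is only a rough estimate under stated assumptions: it reads "128k" as 128,000 tokens, uses a rule of thumb of about four characters per token (real tokenizers differ, and Japanese or French text will tokenize differently from English), and reserves a hypothetical budget for the model's answer. All names are illustrative.

```python
CONTEXT_LIMIT_TOKENS = 128_000  # "128k" from the post, read as 128,000 (assumption)
CHARS_PER_TOKEN = 4             # ASSUMPTION: rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Rough token estimate: character count divided by 4, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(
    text: str,
    reserved_for_output: int = 4_000,  # hypothetical budget for the model's answer
    limit: int = CONTEXT_LIMIT_TOKENS,
) -> bool:
    """True if the estimated prompt size plus the output budget stays within the limit."""
    return estimate_tokens(text) + reserved_for_output <= limit


# Stand-in for a long policy document of roughly 400,000 characters.
policy_text = "a" * 400_000
print(estimate_tokens(policy_text), fits_in_context(policy_text))
```

For a precise count, use the tokenizer that belongs to the specific model instead of the character heuristic.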

Conclusion: The Future of LLMs in Insurance

As the field of AI continues to evolve, insurers are presented with a wide array of tools to enhance their workflows. The tested LLMs have achieved impressive results, and as costs decrease, they become even more accessible. The future looks bright for AI in insurance, with rapid advancements and emerging models poised to push the industry forward.