Evaluating Domain Specific LLMs for Real World Finance — Waseem Alshikh, Writer
AI Summary
Presented by: Waseem Alshikh, CTO of Writer
Event: AI Engineer Summit 2025
Date: April 22, 2025
Key Points:
Introduction to Writer: Founded in 2020, Writer develops language models. Its family now includes around 16 models, with more in the pipeline.
General vs. Domain-Specific Models: Compares general models, which reach 80-90% accuracy, with domain-specific models tailored to industries such as finance and medicine.
Need for Evaluation: Writer created an evaluation benchmark to assess model performance on realistic financial data scenarios, covering both query failures and context failures.
Categories of Evaluation:
- Query Failure: Misspellings, incomplete queries, and out-of-domain questions.
- Context Failure: Irrelevant or incorrect context, and character errors in OCR output.
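To make the two categories concrete, here is a minimal Python sketch of how perturbed test cases of each kind might be generated from one clean query/context pair. It is an illustration only, not the benchmark described in the talk: the function names, the OCR confusion table, and the case layout are all hypothetical.

```python
import random

# Hypothetical look-alike substitutions; the talk does not list specific ones.
OCR_CONFUSIONS = {"0": "O", "1": "l", "5": "S", "8": "B"}


def misspell(query: str, rng: random.Random) -> str:
    """Swap two adjacent characters to simulate a typo."""
    if len(query) < 2:
        return query
    i = rng.randrange(len(query) - 1)
    chars = list(query)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def truncate(query: str, keep_fraction: float = 0.5) -> str:
    """Keep only the leading words to simulate an incomplete query."""
    words = query.split()
    keep = max(1, int(len(words) * keep_fraction))
    return " ".join(words[:keep])


def add_ocr_noise(context: str) -> str:
    """Replace digits with look-alike letters, as a noisy OCR pass might."""
    return "".join(OCR_CONFUSIONS.get(ch, ch) for ch in context)


def build_cases(query, context, off_topic_query, unrelated_context, seed=0):
    """Return one test case per failure type (assumed layout, not the talk's)."""
    rng = random.Random(seed)
    return [
        {"category": "query", "kind": "misspelled",
         "query": misspell(query, rng), "context": context},
        {"category": "query", "kind": "incomplete",
         "query": truncate(query), "context": context},
        {"category": "query", "kind": "out_of_domain",
         "query": off_topic_query, "context": context},
        {"category": "context", "kind": "irrelevant",
         "query": query, "context": unrelated_context},
        {"category": "context", "kind": "ocr_errors",
         "query": query, "context": add_ocr_noise(context)},
    ]
```

Each case would then be sent to the model under test, and the response checked for whether it stays grounded in the supplied context or declines when the context cannot support an answer. How responses are scored is left open here, since the summary does not describe it.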
Results: Smaller, domain-specific models ground their answers and follow context better than larger, general models, which tend to hallucinate more often. This raises concerns about the robustness of AI applications in high-stakes financial environments.
Conclusion: Despite advances in accuracy, gaps in grounding and context handling persist, so the talk emphasizes the need to keep developing domain-specific models.