Leveraging Large Language Models (LLMs) for Electronic Health Records (EHRs): Information Extraction for Medical Records

Background and Purpose

The increasing digitization of healthcare has resulted in widespread adoption of Electronic Health Records (EHRs), creating an abundance of unstructured clinical text containing valuable information on diagnoses, treatments, and patient outcomes. While this data holds great potential for improving health analytics, actuarial modeling, and clinical decision support, extracting structured, meaningful information from free-text medical records remains a longstanding challenge. Variability in documentation styles, medical terminology, and data quality often limits the ability to use EHR data efficiently and accurately.

Recent advances in Large Language Models (LLMs), including general-purpose systems such as GPT-based models and domain-specific variants such as BioGPT, Med-PaLM, and ClinicalBERT, have demonstrated strong capabilities in understanding and transforming unstructured clinical text. These models can automate critical tasks such as diagnosis extraction, medication identification, summarization of clinical notes, and detection of temporal or causal relationships within medical histories. However, their use in actuarial, insurance, and population health applications remains in its early stages. Practical implementation raises questions of accuracy, interpretability, governance, and regulatory compliance.

The Actuarial Innovation and Technology Steering Committee (AITSC) recognizes the transformative potential of LLMs in unlocking value from health data and seeks to support research that evaluates their role in medical information extraction. The goal is to produce insights that can guide actuaries, insurers, and healthcare data professionals in responsibly and effectively deploying LLM-based systems. The research should highlight both opportunities and limitations, ensuring the resulting guidance is evidence-based, actionable, and aligned with ethical and regulatory principles.

Research Objective

The proposal is expected to provide insights into the following considerations, but is not limited to them:

a. Model Evaluation and Benchmarking:
Review and compare various LLM architectures (e.g., GPT, BERT, T5, BioClinical models) for EHR-related tasks such as entity recognition, relation extraction, and document summarization. Evaluate comparative performance across different model scales and domain adaptation methods.

b. Data Preparation and Governance:
Examine preprocessing, de-identification, and annotation procedures required for responsible use of clinical text data. Discuss methods for ensuring compliance with data privacy and security standards and for maintaining data quality during model training or fine-tuning.

c. Performance and Reliability:
Define and apply quantitative and qualitative metrics for assessing model accuracy, reproducibility, and interpretability in clinical information extraction. Consider robustness under different data sources, domains, and prompt designs.

d. Integration with Actuarial and Health Analytics Processes:
Explore how extracted information can enhance existing actuarial workflows—such as morbidity and mortality modeling, claims analysis, or risk stratification—and how these models may complement existing data pipelines.

e. Ethical and Regulatory Considerations:
Assess ethical issues including fairness, bias, explainability, and patient consent. Provide recommendations to align model development and deployment with healthcare regulations and ethical best practices.

f. Implementation Challenges and Future Outlook:
Identify technical and operational challenges in adopting LLM-based extraction systems, such as computational cost, fine-tuning requirements, and interpretability constraints. Project emerging trends in generative and domain-specific LLMs for healthcare applications.

g. Demonstrative Implementation:
Develop and document a working use case using publicly available or synthetic EHR data. The use case should include:

An end-to-end description of the information extraction pipeline using an LLM;
Examples of extracted data and validation steps;
Open-source release of the source code, model prompts, and supporting documentation under a permissive license to encourage transparency and reuse.

Note that the list above is not meant to be exhaustive; the items are merely examples of topics that may be explored.

Proposal Requirements

To facilitate the evaluation of proposals, the following information should be submitted:

Resumes of the researcher(s), including any graduate student(s) expected to participate, indicating how their background, education and experience bear on their qualifications to undertake the research. If more than one researcher is involved, a single individual should be designated as the lead researcher and primary contact. The person submitting the proposal must be authorized to speak on behalf of all the researchers as well as for the firm or institution on whose behalf the proposal is submitted.
An outline of the approach to be used (e.g., literature search, modeling), emphasizing issues that require special consideration. Details should be given regarding the techniques to be used, collateral material to be consulted, and possible limitations of the analysis.
A description of the expected deliverables and any supporting data, tools or other resources to be used.
Cost estimates for the research, including computer time, salaries, report preparation, material costs, etc. Such estimates can be in the form of hourly rates, but in such cases, time estimates should also be included. Any guarantees as to total cost should be given and will be considered in the evaluation of the proposal. While cost will be a factor in the evaluation of the proposal, it will not necessarily be the decisive factor.

As a guide for developing project budgets, please review the Historical Project Cost Guide (see Appendix).

Please note that as a policy, the SOA Research Institute generally does not provide funding to cover academic institution overhead expenses.

A schedule for completion of the research, identifying key dates or time frames for research completion and report submissions. The SOA is interested in completing this project in a timely manner. Suggestions in the proposal for ensuring timely delivery, such as fee adjustments, are encouraged.
Other related factors that give evidence of a proposer's capabilities to perform in a superior fashion should be detailed.

Selection Process

The SOA will appoint a Project Oversight Group (POG) to oversee the project. The POG is responsible for recommending to the Section Research Committee the proposal to be funded, if any. Input from other knowledgeable individuals also may be sought, but the Section Research Committee will make the final recommendation, subject to Society of Actuaries Research Institute (SOA) leadership approval. An SOA staff research actuary will provide staff actuarial support.

Questions

Any questions regarding this RFP should be directed to research-ait@soa.org with the subject line LLMs for Electronic Health Records.

Notification of Intent to Submit Proposal

If you intend to submit a proposal, please email written notification by December 19, 2025, to research-ait@soa.org with the subject line LLMs for Electronic Health Records.

Submission of Proposal

Please email your proposal to research-ait@soa.org with the subject line LLMs for Electronic Health Records; proposals must be received no later than January 16, 2026. It is anticipated that all proposers will be informed of the status of their proposal by the end of February 2026.

Conditions

The selection of a proposal is conditioned upon and not considered final until a Letter of Agreement is executed by both the Society of Actuaries Research Institute and the researcher.

The Society of Actuaries Research Institute reserves the right to not award a contract for this research. Reasons for not awarding a contract could include, but are not limited to, a lack of acceptable proposals or a finding that insufficient funds are available. The Society of Actuaries Research Institute also reserves the right to redirect the project as is deemed advisable.

The Society of Actuaries Research Institute plans to hold the copyright to the research and to publish the results with appropriate credit given to the researcher(s).

The Society of Actuaries Research Institute may choose to seek public exposure or media attention for the research. By submitting a proposal, you agree to cooperate with the Society of Actuaries Research Institute in publicizing or promoting the research and responding to media requests.

The Society of Actuaries may also choose to market and promote the research to members, candidates and other interested parties. You agree to participate in promotional activities requested by the Society of Actuaries, which may include, but are not limited to, leading a webcast on the research, presenting the research at an SOA meeting, and/or writing an article on the research for an SOA newsletter.

Conflict of Interest

You agree to disclose any of your material business, financial and organizational interests and affiliations that are, or may reasonably be construed to be, related to the interests, activities and programs of the Society of Actuaries or the Society of Actuaries Research Institute.

Appendix

The cost ranges below are intended as a guide for budgeting project costs for proposals in response to SOA Research Institute Requests for Proposals (RFPs). Please note these figures span the 33rd to 66th percentiles for all projects as well as for projects that involve a specific approach (literature review, survey, etc.). They are based on historical costs over several recent years. Expected costs for some RFPs may fall outside these ranges depending on the nature of the work and resources required for completion.

All Contracted Projects

This category includes all contracted projects that the Institute has undertaken within the last several years. The 33rd to 66th percentile project cost range is $25,000 to $50,000.

Literature Reviews

This category includes projects that involved only a literature review, or the cost of the literature review portion of a larger project. The 33rd to 66th percentile project cost range is $15,000 to $20,000.

Surveys

This category includes all projects that had a survey as their primary component. The 33rd to 66th percentile project cost range is $28,000 to $55,000.