People | Locations | Statistics
---|---|---
Naji, M. | |
Motta, Antonella | |
Aletan, Dirar | |
Mohamed, Tarek | |
Ertürk, Emre | |
Taccardi, Nicola | |
Kononenko, Denys | |
Petrov, R. H. | Madrid |
Alshaaer, Mazen | Brussels |
Bih, L. | |
Casati, R. | |
Muller, Hermance | |
Kočí, Jan | Prague |
Šuljagić, Marija | |
Kalteremidou, Kalliopi-Artemi | Brussels |
Azam, Siraj | |
Ospanova, Alyiya | |
Blanpain, Bart | |
Ali, M. A. | |
Popa, V. | |
Rančić, M. | |
Ollier, Nadège | |
Azevedo, Nuno Monteiro | |
Landes, Michael | |
Rignanese, Gian-Marco | |
Bastarache, Lisa
in cooperation, with a cooperation score of 37%
Topics
Publications (1/1 displayed)
Places of action
Organizations | Location | People
---|---|---
article
Identifying and Extracting Rare Diseases and Their Phenotypes with Large Language Models
Abstract
Purpose
Phenotyping is critical for informing rare disease diagnosis and treatment, but disease phenotypes are often embedded in unstructured text. While natural language processing (NLP) can automate extraction, a major bottleneck is developing annotated corpora. Recently, prompt learning with large language models (LLMs) has been shown to produce generalizable results with no annotated samples (zero-shot) or only a few (few-shot), but no prior work has explored this for rare diseases. Ours is the first study of prompt learning for identifying and extracting rare disease phenotypes in the zero- and few-shot settings.

Methods
We compared the performance of prompt learning with ChatGPT against fine-tuning BioClinicalBERT. We engineered novel prompts for ChatGPT to identify and extract rare diseases and their phenotypes (e.g., diseases, symptoms, and signs), established a benchmark for evaluating its performance, and conducted an in-depth error analysis.

Results
Overall, fine-tuning BioClinicalBERT yielded higher performance (F1 of 0.689) than ChatGPT (F1 of 0.472 and 0.610 in the zero- and few-shot settings, respectively). However, ChatGPT achieved higher accuracy for rare diseases and signs in the one-shot setting (F1 of 0.778 and 0.725, respectively). Conversational, sentence-based prompts generally achieved higher accuracy than structured lists.

Conclusion
Prompt learning with ChatGPT has the potential to match or outperform fine-tuning BioClinicalBERT at extracting rare diseases and signs with just one annotated sample. Given its accessibility, ChatGPT could be leveraged to extract these entities without relying on a large annotated corpus. While LLMs can support rare disease phenotyping, researchers should critically evaluate model outputs to ensure phenotyping accuracy.
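To make the approach concrete, below is a minimal sketch of one-shot, prompt-based extraction in the conversational, sentence-based style the abstract reports working better than structured lists. It is illustrative only: the model name, prompt wording, and annotated example are assumptions, not the paper's actual prompts or benchmark, and it assumes the OpenAI Python SDK (v1+) with an API key in the environment.

```python
# Illustrative sketch only: the paper's actual prompts, model version, and
# evaluation pipeline are not reproduced here. The prompt wording and the
# one-shot example below are assumptions for demonstration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One annotated sample (one-shot), written in a conversational,
# sentence-based style rather than a structured list.
ONE_SHOT_EXAMPLE = (
    "Note: The patient was diagnosed with cystic fibrosis and presents "
    "with chronic cough.\n"
    "Entities: rare disease: cystic fibrosis; sign: chronic cough"
)

def extract_entities(clinical_note: str) -> str:
    """Ask the model to identify rare diseases and their phenotypes."""
    prompt = (
        "You are reading a clinical note. In plain sentences, list every "
        "rare disease, symptom, and sign you find, labelling each one.\n\n"
        f"Example:\n{ONE_SHOT_EXAMPLE}\n\n"
        f"Note: {clinical_note}\nEntities:"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed; the abstract only says "ChatGPT"
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output suits extraction tasks
    )
    return response.choices[0].message.content

print(extract_entities(
    "45-year-old male with Fabry disease reporting acroparesthesia."
))
```

In the few-shot setting, additional annotated note/entity pairs would simply be appended to the example section of the prompt; model outputs would then be compared against gold annotations (e.g., entity-level F1) as in the benchmark the paper describes.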