Liu J, Capurro D, Nguyen A, Verspoor K. “Note Bloat” impacts deep learning-based NLP models for clinical prediction tasks. Journal of Biomedical Informatics. 2022 Sep 1;133:104149.

One unintended consequence of Electronic Health Record (EHR) implementation is the overuse of content-importing technology, such as copy-and-paste, which creates “bloated” notes containing large amounts of textual redundancy. Despite the rising interest in applying machine learning models to learn from real-patient data, it

Akhtyamova L, Martínez P, Verspoor K, Cardiff J. Testing Contextualized Word Embeddings to Improve NER in Spanish Clinical Case Narratives. IEEE Access. 2020.

In the Big Data era, there is an increasing need to fully exploit and analyze the huge quantity of information available about health. Natural Language Processing (NLP) technologies can contribute by extracting relevant information from unstructured data contained in Electronic Health Records (EHR) such as clinical notes, patients’ discharge summaries and radiology reports. The extracted

Quiroz JC, Laranjo L, Kocaballi AB, Briatore A, Berkovsky S, Rezazadegan D, Coiera E. Identifying relevant information in medical conversations to summarize a clinician-patient encounter. Health Informatics Journal. 0(0):1460458220951719.

To inform the development of automated summarization of clinical conversations, this study sought to estimate the proportion of doctor-patient communication in general practice (GP) consultations used for generating a consultation summary. Two researchers with a medical degree read the transcripts of 44 GP consultations and highlighted the phrases to be used for generating a summary

Wang Y, Coiera E, Magrabi F. Can Unified Medical Language System–based semantic representation improve automated identification of patient safety incident reports by type and severity? Journal of the American Medical Informatics Association. 2020.

Objective: The study sought to evaluate the feasibility of using Unified Medical Language System (UMLS) semantic features for automated identification of reports about patient safety incidents by type and severity. Materials and Methods: Binary support vector machine (SVM) classifier ensembles were trained and validated using balanced datasets of critical incident report texts (n_type = 2860, n_severity = 1160) collected