NLP | Digital Health

Wang Y, Verspoor K, Baldwin T. (2020) Learning from Unlabelled Data for Clinical Semantic Textual Similarity. Proceedings of the 3rd Clinical Natural Language Processing Workshop at EMNLP 2020.

Domain pretraining followed by task fine-tuning has become the standard paradigm for NLP tasks, but it requires in-domain labelled data for task fine-tuning. To overcome this, we propose to utilise unlabelled in-domain data by assigning pseudo labels from a general model. We evaluate the approach on two clinical STS datasets, and achieve r = 0.80 on N2C2-STS.

Wang Y, Liu F, Verspoor K, Baldwin T. (2020) Evaluating the Utility of Model Configurations and Data Augmentation on Clinical Semantic Textual Similarity. Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP) at ACL 2020.

In this paper, we apply pre-trained language models to the Semantic Textual Similarity (STS) task, with a specific focus on the clinical domain. In the low-resource setting of clinical STS, these large models tend to be impractical and prone to overfitting. Building on BERT, we study the impact of a number of model design choices, namely

Akhtyamova L, Martínez P, Verspoor K, Cardiff J. (2020) Testing Contextualized Word Embeddings to Improve NER in Spanish Clinical Case Narratives. IEEE Access.

In the Big Data era, there is an increasing need to fully exploit and analyze the huge quantity of available health information. Natural Language Processing (NLP) technologies can contribute by extracting relevant information from unstructured data contained in Electronic Health Records (EHRs), such as clinical notes, patients’ discharge summaries and radiology reports. The extracted