Wang Y, Verspoor K, Baldwin T. (2020) Learning from Unlabelled Data for Clinical Semantic Textual Similarity. Proceedings of the 3rd Clinical Natural Language Processing Workshop at EMNLP 2020.
Domain pretraining followed by task fine-tuning has become the standard paradigm for NLP tasks, but it requires in-domain labelled data for task fine-tuning. To overcome this, we propose to utilise unlabelled in-domain data by assigning pseudo labels with a general-domain model. We evaluate the approach on two clinical STS datasets, and achieve r = 0.80 on N2C2-STS.
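The pseudo-labelling idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper `general_model_score` is a hypothetical stand-in (simple Jaccard token overlap scaled to the usual 0-5 STS range) for a real general-domain STS model, and the pair data is invented for demonstration.

```python
def general_model_score(s1: str, s2: str) -> float:
    """Stand-in for a general-domain STS model (assumption, not the
    paper's model): Jaccard token overlap scaled to the 0-5 STS range."""
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    return 5.0 * len(t1 & t2) / len(t1 | t2)

def pseudo_label(unlabelled_pairs):
    """Assign pseudo similarity labels to unlabelled in-domain sentence
    pairs; the output could then be used as silver training data for
    task fine-tuning."""
    return [(s1, s2, general_model_score(s1, s2)) for s1, s2 in unlabelled_pairs]

# Invented clinical-style sentence pairs for illustration only.
pairs = [
    ("patient denies chest pain", "patient reports no chest pain"),
    ("blood pressure remains stable", "patient denies chest pain"),
]
silver_data = pseudo_label(pairs)
```

In practice the scoring function would be a general-domain STS model (e.g. a sentence encoder fine-tuned on general STS data), and the pseudo-labelled pairs would feed a second fine-tuning stage on the clinical domain.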