Quiroz JC, Laranjo L, Tufanaru C, Kocaballi AB, Rezazadegan D, Berkovsky S, Coiera E. Empirical Analysis of Zipf’s Law, Power Law, and Lognormal Distributions in Medical Discharge Reports. 2020. arXiv:2003.13352 [cs.CL].
Bayesian modelling and statistical text analysis rely on informed probability priors to encourage good solutions. This paper empirically analyses whether text in medical discharge reports follows Zipf’s law, a commonly assumed statistical property of language in which word frequency follows a discrete power law distribution. We examined 20,000 medical discharge reports from the MIMIC-III dataset. Methods