
Evaluation of Contextual and Non-contextual Word Embedding Models Using Radiology Reports

dc.contributor.advisor: Matheny, Michael E
dc.contributor.advisor: Deppen, Stephen A
dc.contributor.advisor: Landman, Bennett A
dc.creator: Khan, Mirza
dc.date.accessioned: 2022-01-10T16:47:15Z
dc.date.created: 2021-12
dc.date.issued: 2021-12-02
dc.date.submitted: December 2021
dc.identifier.uri: http://hdl.handle.net/1803/16995
dc.description.abstract: Many clinical natural language processing (NLP) methods rely on non-contextual or contextual word embedding models. Yet, few intrinsic evaluation benchmarks exist that compare embedding model representations against human judgment. Moreover, it is unclear whether the previously described discordance between intrinsic and extrinsic evaluation performance of NLP models persists among novel embedding models. We present a framework used to develop two new intrinsic evaluation tasks: term pair similarity to assess non-contextual word embeddings, and cloze task accuracy for contextual word embedding models. Using surveys, we quantified the agreement between model representations and clinician judgment. We also created two clinical phenotyping tasks using binary and multi-label classification with varying class balance for extrinsic evaluation. Findings from each of our intrinsic evaluation tasks reveal that models pre-trained on general biomedical and clinical corpora performed as well as, if not better than, models pre-trained on an in-domain clinical corpus; models trained on a general English Wikipedia corpus performed worst. Joint spherical embeddings gave overly optimistic representations of term pair similarity, and the gains seen on general NLP intrinsic evaluation tasks did not carry over in this study. The results of our extrinsic evaluation tasks demonstrated that neural network-based models (Bidirectional Long Short-Term Memory, Transformer, and Efficient Transformer models) vastly outperformed regression models. Efficient Transformer models as a class tended to provide the best classification performance. Contextual word embedding model performance was improved by pre-training on an in-domain corpus. By contrast, models featuring word2vec and fastText non-contextual embeddings yielded equivalent phenotype classification performance regardless of the corpus source. Overall, intrinsic evaluation performance failed to correlate with extrinsic evaluation findings for both non-contextual and contextual embedding models.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Natural language processing
dc.subject: Model evaluation
dc.subject: Deep learning
dc.title: Evaluation of Contextual and Non-contextual Word Embedding Models Using Radiology Reports
dc.type: Thesis
dc.date.updated: 2022-01-10T16:47:16Z
dc.type.material: text
thesis.degree.name: MS
thesis.degree.level: Masters
thesis.degree.discipline: Biomedical Informatics
thesis.degree.grantor: Vanderbilt University Graduate School
local.embargo.terms: 2022-06-01
local.embargo.lift: 2022-06-01
dc.creator.orcid: 0000-0001-7007-9437
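
As a concrete illustration of the term-pair similarity intrinsic evaluation described in the abstract, the following minimal Python sketch scores clinical term pairs by cosine similarity of their non-contextual embeddings and measures agreement with clinician ratings using Spearman's rho. The terms, ratings, and randomly initialized vectors are hypothetical placeholders, not the study's survey data or pre-trained models.

    # Minimal sketch of a term-pair similarity intrinsic evaluation:
    # score clinical term pairs by cosine similarity of their embeddings,
    # then measure agreement with clinician ratings via Spearman's rho.
    # All terms, ratings, and vectors below are illustrative placeholders.
    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)

    # Hypothetical embedding lookup (term -> 300-d vector); in practice these
    # would come from pre-trained word2vec or fastText models.
    vocab = ["pneumothorax", "collapsed lung", "pleural effusion",
             "atelectasis", "cardiomegaly", "rib fracture"]
    embeddings = {term: rng.normal(size=300) for term in vocab}

    # Hypothetical clinician similarity judgments on a 1-5 scale.
    term_pairs = [("pneumothorax", "collapsed lung"),
                  ("atelectasis", "collapsed lung"),
                  ("pleural effusion", "atelectasis"),
                  ("cardiomegaly", "rib fracture")]
    clinician_scores = [4.8, 4.0, 2.5, 1.2]

    def cosine(u, v):
        # Cosine similarity between two vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    model_scores = [cosine(embeddings[a], embeddings[b]) for a, b in term_pairs]

    # Agreement between model representations and clinician judgment.
    rho, p_value = spearmanr(model_scores, clinician_scores)
    print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")

In this framing, a higher correlation with clinician judgment indicates a closer intrinsic fit of the embedding model; the extrinsic phenotyping results summarized in the abstract suggest such agreement did not predict downstream classification performance.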

