
Evaluation of Contextual and Non-contextual Word Embedding Models Using Radiology Reports

dc.contributor.advisor: Matheny, Michael E
dc.contributor.advisor: Deppen, Stephen A
dc.contributor.advisor: Landman, Bennett A
dc.creator: Khan, Mirza
dc.date.accessioned: 2022-01-10T16:47:15Z
dc.date.created: 2021-12
dc.date.issued: 2021-12-02
dc.date.submitted: December 2021
dc.identifier.uri: http://hdl.handle.net/1803/16995
dc.description.abstract: Many clinical natural language processing (NLP) methods rely on non-contextual or contextual word embedding models. Yet, few intrinsic evaluation benchmarks exist that compare embedding model representations against human judgment. Moreover, it is unclear whether the previously described discordance between intrinsic and extrinsic evaluation performance of NLP models persists among novel embedding models. We present a framework used to develop two new intrinsic evaluation tasks: term pair similarity to assess non-contextual word embeddings, and cloze task accuracy for contextual word embedding models. Using surveys, we quantified the agreement between model representations and clinician judgment. We also created two clinical phenotyping tasks using binary and multi-label classification with varying class balance for extrinsic evaluation. Findings from each of our intrinsic evaluation tasks reveal that models pre-trained on general biomedical and clinical corpora performed as well as, if not better than, models pre-trained on an in-domain clinical corpus; models trained on a general English Wikipedia corpus performed worst. Joint spherical embeddings gave overly optimistic representations of term pair similarity, and the gains seen on general NLP intrinsic evaluation tasks did not carry over in this study. The results of our extrinsic evaluation tasks demonstrated that neural network-based models (Bidirectional Long Short-Term Memory, Transformer, and Efficient Transformer models) vastly outperformed regression models. Efficient Transformer models as a class tended to provide the best classification performance. Contextual word embedding model performance was improved by pre-training on an in-domain corpus. By contrast, models featuring word2vec and fastText non-contextual embeddings yielded equivalent phenotype classification performance regardless of the corpus source. Overall, intrinsic evaluation performance failed to correlate with extrinsic evaluation findings for both non-contextual and contextual embedding models.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Natural language processing
dc.subject: Model evaluation
dc.subject: Deep learning
dc.title: Evaluation of Contextual and Non-contextual Word Embedding Models Using Radiology Reports
dc.type: Thesis
dc.date.updated: 2022-01-10T16:47:16Z
dc.type.material: text
thesis.degree.name: MS
thesis.degree.level: Masters
thesis.degree.discipline: Biomedical Informatics
thesis.degree.grantor: Vanderbilt University Graduate School
local.embargo.terms: 2022-06-01
local.embargo.lift: 2022-06-01
dc.creator.orcid: 0000-0001-7007-9437
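
As a concrete illustration of the term-pair similarity intrinsic evaluation described in the abstract, the following minimal Python sketch scores clinical term pairs by cosine similarity of their non-contextual embeddings and measures agreement with clinician ratings using Spearman's rho. The terms, ratings, and randomly initialized vectors are hypothetical placeholders, not the study's survey data or pre-trained models.

    # Minimal sketch of a term-pair similarity intrinsic evaluation:
    # score clinical term pairs by cosine similarity of their embeddings,
    # then measure agreement with clinician ratings via Spearman's rho.
    # All terms, ratings, and vectors below are illustrative placeholders.
    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(0)

    # Hypothetical embedding lookup (term -> 300-d vector); in practice these
    # would come from pre-trained word2vec or fastText models.
    vocab = ["pneumothorax", "collapsed lung", "pleural effusion",
             "atelectasis", "cardiomegaly", "rib fracture"]
    embeddings = {term: rng.normal(size=300) for term in vocab}

    # Hypothetical clinician similarity judgments on a 1-5 scale.
    term_pairs = [("pneumothorax", "collapsed lung"),
                  ("atelectasis", "collapsed lung"),
                  ("pleural effusion", "atelectasis"),
                  ("cardiomegaly", "rib fracture")]
    clinician_scores = [4.8, 4.0, 2.5, 1.2]

    def cosine(u, v):
        # Cosine similarity between two vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    model_scores = [cosine(embeddings[a], embeddings[b]) for a, b in term_pairs]

    # Agreement between model representations and clinician judgment.
    rho, p_value = spearmanr(model_scores, clinician_scores)
    print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")

In this framing, a higher correlation with clinician judgment indicates a closer intrinsic fit of the embedding model; the extrinsic phenotyping results summarized in the abstract suggest such agreement did not predict downstream classification performance.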

