Computational Phenotyping and Phenome-wide Association Studies: Leveraging Machine Learning and Natural Language Processing to Understand Electronic Health Record Data

dc.creator: Teixeira, Pedro Luis, Jr.
dc.date.accessioned: 2020-08-22T20:53:17Z
dc.date.available: 2017-08-27
dc.date.issued: 2015-08-27
dc.identifier.uri: https://etd.library.vanderbilt.edu/etd-08262015-232710
dc.identifier.uri: http://hdl.handle.net/1803/14022
dc.description.abstract: The aims of this project are 1) to evaluate various data sources and algorithms for identifying hypertensive individuals within the electronic health record, and 2) to develop and evaluate a novel method for identifying associations between genotypes and natural language processing-based phenotypes extracted from the electronic health record. The author evaluated data sources and hypertension phenotyping algorithms using a set of 631 individuals manually reviewed for hypertension status based on their electronic health record data. Combinations of data sources outperformed methods that leveraged any category individually. Random forest models trained with billing codes, medications, vital signs, and hypertension concept counts achieved a median AUC of 0.976. The best algorithms performed similarly at a second site. The author also developed a novel method for phenome-wide association studies using natural language processing-based phenotypes (NLP-PheWAS). Using 29,722 individuals with Exome data, the author extracted 11,553 unique concepts from narrative text after negation, note section, and semantic type filtering. The method replicated 43.7% of known, statistically powered associations from the National Human Genome Research Institute’s genome-wide association catalog. NLP-PheWAS also identified two potentially novel associations among the SNPs studied. They included an association between optic disc neovascularization and rs1497546 and between Langerhans-Cell Histiocytosis and rs7193343. NLP-PheWAS is a promising method for enabling rapid discovery, interpretation of novel associations, and increased understanding of genetic influences within the rapidly expanding narrative text of electronic health records.
dc.format.mimetype: application/pdf
dc.subject: biomedical informatics
dc.subject: phenome-wide association studies
dc.subject: hypertension
dc.subject: random forests
dc.subject: machine learning
dc.subject: natural language processing
dc.title: Computational Phenotyping and Phenome-wide Association Studies: Leveraging Machine Learning and Natural Language Processing to Understand Electronic Health Record Data
dc.type: dissertation
dc.contributor.committeeMember: Thomas A. Lasko, M.D., Ph.D.
dc.contributor.committeeMember: Todd L. Edwards, M.S., Ph.D.
dc.contributor.committeeMember: S. Trent Rosenbloom, M.D., MPH
dc.contributor.committeeMember: Dan M. Roden, M.D.
dc.type.material: text
thesis.degree.name: PHD
thesis.degree.level: dissertation
thesis.degree.discipline: Biomedical Informatics
thesis.degree.grantor: Vanderbilt University
local.embargo.terms: 2017-08-27
local.embargo.lift: 2017-08-27
dc.contributor.committeeChair: Joshua C. Denny, M.D., M.S.

