Efficient Development of Electronic Health Record Based Algorithms to Identify Rheumatoid Arthritis

Carroll, Robert James

dc.creator	Carroll, Robert James
dc.date.accessioned	2020-08-22T21:14:29Z
dc.date.available	2013-10-21
dc.date.issued	2011-10-21
dc.identifier.uri	https://etd.library.vanderbilt.edu/etd-10192011-143352
dc.identifier.uri	http://hdl.handle.net/1803/14341
dc.description.abstract	Electronic Health Records (EHRs) are valuable tools in clinical and genomic research, providing the ability to generate large patient cohorts for study. Traditionally, EHR-based research is carried out through manual review of patient charts, which is expensive and time-consuming and limits the scalability of EHR-derived genetic or clinical research. The recent introduction of automated phenotype identification algorithms has sped cohort identification, but these algorithms also require significant investment to develop. In these studies, we evaluated three aspects of the process of phenotype algorithm implementation and application in the context of Rheumatoid Arthritis (RA), a chronic inflammatory arthritis with known genetic risk factors. The first aspect was whether a support vector machine (SVM) trained on a naïve set of features would perform similarly to models trained using an expert-defined feature set. The second aspect was the effect of training set size on the predictive power of the algorithm for both the naïve and expert-defined sets. The third aspect was the portability across institutions of a trained model using expert-derived features. We show that training an SVM with all available attributes maintains strong performance compared to an SVM trained using an expert-defined set of features. Using an expert-defined feature set allowed for a much smaller training set compared to the naïve feature set, although in both cases the required training set size was much smaller than is often used for phenotype algorithm training. We also show the portability of a previously published logistic regression model trained at Partners HealthCare to Vanderbilt and Northwestern Universities. While the original model was portable, retraining the model on local data can further improve performance.
This research shows the potential for rapid development of new phenotype identification algorithms that may be portable to different EHR systems and institutions. With the application of clinical knowledge in the design, very few training records are required to create strongly predictive models, which could ease the development of models for new conditions. Fast, accurate development of portable phenotype algorithms offers the potential to engender a new era of EHR-based research.
dc.format.mimetype	application/pdf
dc.subject	genetic associations
dc.subject	automated patient cohorts
dc.subject	phenotype algorithm
dc.title	Efficient Development of Electronic Health Record Based Algorithms to Identify Rheumatoid Arthritis
dc.type	thesis
dc.contributor.committeeMember	Tom Lasko
dc.contributor.committeeMember	Hua Xu
dc.type.material	text
thesis.degree.name	MS
thesis.degree.level	thesis
thesis.degree.discipline	Biomedical Informatics
thesis.degree.grantor	Vanderbilt University
local.embargo.terms	2013-10-21
local.embargo.lift	2013-10-21
dc.contributor.committeeChair	Josh Denny

Files in this item

Name: CarrollThesis.pdf
Size: 1.220Mb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses and Dissertations
Electronic theses and dissertations of masters and doctoral students submitted to the Graduate School.