
Coping With Complexities in High Dimensional Data: PheWAS in EMR and Statistical Inference in fMRI Data

dc.contributor.advisor: Johnson, Robert
dc.contributor.advisor: Kang, Hakmook
dc.creator: Lin, Ya-Chen
dc.date.accessioned: 2020-09-14T15:15:35Z
dc.date.available: 2020-09-14T15:15:35Z
dc.date.created: 2020-08
dc.date.issued: 2020-07-19
dc.date.submitted: August 2020
dc.identifier.uri: http://hdl.handle.net/1803/15924
dc.description.abstract: When conducting analyses on high-dimensional data, one can face statistical difficulties due to the large dimensionality and the noisy nature of the data. In this dissertation, we look specifically into complexities one might encounter when analyzing electronic medical record (EMR) and functional magnetic resonance imaging (fMRI) data. A phenome-wide association study (PheWAS) is a recently proposed method that scans phenotypes (Phecodes) for association with a specific genotype of interest using logistic regression. Because clinical diagnoses in EMR are often inaccurate, which can bias the odds ratio estimates, much effort has been put into accurately defining cases and controls. Specifically, to correctly classify controls in the population, an exclusion criteria list was manually compiled for each Phecode to obtain unbiased estimates. However, this method can be inefficient, and the accuracy of the lists cannot be guaranteed. We propose to estimate relative risk (RR) instead. With simulation and a real data application, we show that RR is unbiased without compiling exclusion criteria lists. With RR as the estimate, we are able to extend PheWAS to larger-scale phenotypes that preserve more disease-related clinical information than Phecodes. The main purpose of task-induced fMRI is to measure neuronal activity related to a specific task. fMRI data usually require several preprocessing steps before analysis. Among these, spatial smoothing is a necessary step known to increase signal-to-noise ratios, but the choice of the degree of smoothing is often arbitrary. One critical statistical issue in fMRI analysis is the balance between Type I and Type II error rates. We first demonstrate the influence of the degree of smoothing and of experimental factors on the trade-off between Type I and Type II error rates. Next, we propose to use second-generation p-values (SGPV) as an inference tool instead of traditional p-values for hypothesis testing. By allowing an interval null hypothesis, we show that SGPV alleviates this issue by controlling the Type I error rate more steadily while retaining sufficient power.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: EMR
dc.subject: PheWAS
dc.subject: fMRI
dc.subject: Statistical analysis
dc.subject: study design
dc.subject: Type I error rate
dc.subject: Type II error rate
dc.subject: p-value
dc.subject: Multiple comparison
dc.subject: Second-generation p-values
dc.subject: SGPV
dc.subject: Interval null
dc.subject: Hypothesis testing
dc.title: Coping With Complexities in High Dimensional Data: PheWAS in EMR and Statistical Inference in fMRI Data
dc.type: Thesis
dc.date.updated: 2020-09-14T15:15:35Z
dc.type.material: text
thesis.degree.name: PhD
thesis.degree.level: Doctoral
thesis.degree.discipline: Biostatistics
thesis.degree.grantor: Vanderbilt University
dc.creator.orcid: 0000-0003-3660-4570

