Using observational data in healthcare research: New methods to design, conduct, and analyze efficient two-phase designs

dc.contributor.advisor: Shepherd, Bryan
dc.contributor.advisor: Tao, Ran
dc.creator: Lotspeich, Sarah Camilla
dc.date.accessioned: 2021-06-22T17:02:17Z
dc.date.created: 2021-05
dc.date.issued: 2021-05-06
dc.date.submitted: May 2021
dc.identifier.uri: http://hdl.handle.net/1803/16679
dc.description.abstract: Clinically meaningful variables are increasingly becoming available in observational databases. However, these data can be error-prone, giving misleading results in statistical inference. Data validation can help maintain data quality, but validating entire databases is often unrealistic. A cost-effective solution is the two-phase design: error-prone variables are observed for all patients during Phase I and that information is used to select patients for validation (i.e., data auditing) during Phase II. In this dissertation, we propose methods to promote the practical and statistical efficiency of two-phase designs to ensure the integrity of observational cohort data. First, given the resource constraints imposed upon data audits, targeting the most informative patients is paramount for efficient statistical inference. Using the asymptotic variance of the maximum likelihood estimator, we compute the most efficient design under complex outcome and exposure misclassification. Since the optimal design depends on unknown parameters, we propose a multi-wave design to approximate it in practice. We demonstrate the superior efficiency of the optimal designs through extensive simulations and illustrate their implementation in observational HIV studies. Second, sending trained auditors to sites (“travel-audits”) can be costly, particularly in a multi-national cohort, so we investigate the efficacy of training sites to conduct “self-audits.” In 2017, eight research groups audited a subset of their patient records, comparing abstracted research data to the original clinical source documents. Additionally, three sites were randomly selected for travel-audits. We found similar error rates between self- and travel-audits, suggesting self-audits could be a lower-cost alternative for continued data quality.
Finally, to obtain efficient odds ratios with partially-audited, error-prone data, we propose a semiparametric analysis approach that uses all information and accommodates many error mechanisms. The outcome and covariates can be error-prone, with correlated errors, and the selection of Phase II records can depend on Phase I data in an arbitrary manner. We devise an EM algorithm to obtain estimators that are consistent, asymptotically normal, and asymptotically efficient. We demonstrate the advantages of the proposed methods through extensive simulations and provide applications to a multi-national HIV cohort.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: data audits
dc.subject: electronic health records
dc.subject: EHR
dc.subject: HIV/AIDS
dc.subject: logistic regression
dc.subject: measurement error
dc.subject: missing data
dc.title: Using observational data in healthcare research: New methods to design, conduct, and analyze efficient two-phase designs
dc.type: Thesis
dc.date.updated: 2021-06-22T17:02:17Z
dc.type.material: text
thesis.degree.name: PhD
thesis.degree.level: Doctoral
thesis.degree.discipline: Biostatistics
thesis.degree.grantor: Vanderbilt University Graduate School
local.embargo.terms: 2022-05-01
local.embargo.lift: 2022-05-01
dc.creator.orcid: 0000-0001-5380-2427
dc.contributor.committeeChair: Schildcrout, Jonathan

