Privacy-Preserving Sharing of High-Dimensional Data based on Computational Game Theory

dc.contributor.advisor: Malin, Bradley A.
dc.creator: Wan, Zhiyu
dc.date.accessioned: 2020-12-29T15:30:29Z
dc.date.available: 2020-12-29T15:30:29Z
dc.date.created: 2020-12
dc.date.issued: 2020-11-18
dc.date.submitted: December 2020
dc.identifier.uri: http://hdl.handle.net/1803/16396
dc.description.abstract: In the big data era, person-specific data are being collected in an unprecedented manner. Given the potential wealth of insights in personal data, many organizations aim to share data while protecting privacy by sharing de-identified data, but are concerned because various demonstrations show such data can be re-identified. A wide array of deterrents has been designed to mitigate these concerns, some of which are technical (e.g., obfuscating data), while others are more social (e.g., legal contracts). However, these investigations have focused on worst-case scenarios and spurred the adoption of data sharing practices that unnecessarily impede research. A formal re-identification risk assessment is required to help data sharers make better decisions about how to share data. Game-theoretic approaches, which model rational interactions among the parties involved, can optimally balance utility and risks in data sharing scenarios. I utilize a game-theoretic lens to develop more effective, quantifiable protections for data sharing. This is a fundamentally different approach because it accounts for adversarial behavior and capabilities and tailors protections to anticipated recipients with reasonable resources. I demonstrate this approach with large-scale real-world genomic datasets and show that risks can be balanced against utility more effectively than with traditional approaches. Confronting high dimensionality in practical scenarios, I develop AI algorithms to accelerate the solution search. I find it is possible to achieve zero risk, in that the recipient never gains from re-identification, while sharing almost as much data as the optimal solution that allows for a small amount of risk. Recognizing that such models are dependent on a variety of parameters, I perform extensive sensitivity analyses to show that my findings are robust to their fluctuations.
My dissertation focuses on answering theoretical questions about privacy-preserving data sharing problems in multi-stage adversarial scenarios and designing practical algorithms for game-solving in high-dimensional environments. I tailor my approaches for building scalable systems demanded by modern big data applications. The game-theoretic methodology that I examine using demographic, genomic, and phenotypic data has the potential to be applied to other data types and to be regarded as a general data protection methodology.
dc.format.mimetype: application/pdf
dc.language.iso: en
dc.subject: Data sharing
dc.subject: Privacy
dc.subject: Re-identification
dc.subject: Risk assessment
dc.subject: Game theory
dc.subject: Genomic data
dc.subject: Summary statistics
dc.subject: Adversarial modeling
dc.subject: Genetic algorithm
dc.subject: Sensitivity analysis
dc.title: Privacy-Preserving Sharing of High-Dimensional Data based on Computational Game Theory
dc.type: Thesis
dc.date.updated: 2020-12-29T15:30:30Z
dc.type.material: text
thesis.degree.name: PhD
thesis.degree.level: Doctoral
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Vanderbilt University Graduate School
dc.creator.orcid: 0000-0003-3752-5778

