Novel Methods for Variable Selection in Non-faithful Domains, Understanding Support Vector Machines, Learning Regions of Bayesian Networks, and Prediction Under Manipulation

Brown, Laura Elizabeth


dc.creator	Brown, Laura Elizabeth
dc.date.accessioned	2020-08-23T16:10:59Z
dc.date.available	2011-12-11
dc.date.issued	2009-12-11
dc.identifier.uri	https://etd.library.vanderbilt.edu/etd-12042009-124713
dc.identifier.uri	http://hdl.handle.net/1803/15107
dc.description.abstract	The focus of my research was to develop several novel computational techniques for discovering informative patterns and complex relationships in biomedical data. First, an efficient heuristic method was developed to search for the features with the largest absolute weights in a polynomial Support Vector Machine (SVM) model. This algorithm provides a new ability to understand, conceptualize, visualize, and communicate polynomial SVM models. Second, a new variable selection algorithm, called Feature Space Markov Blanket (FSMB), was designed. FSMB combines the advantages of kernel methods and Markov Blanket-based techniques for variable selection. FSMB was evaluated on several simulated, "difficult" distributions, where it identified the Markov Blankets with high sensitivity and specificity. It was also run on several real-world data sets; the resulting classification models are parsimonious (for two data sets, the models consisted of only 2-3 features). On another data set, the Markov Blanket-based method performed poorly; FSMB's improved performance there suggests the existence of a complex, multivariate relationship in the underlying domain. Third, a well-cited algorithm for learning Bayesian networks, Max-Min Hill-Climbing (MMHC), was extended to learn a region of a Bayesian network locally. This local method was compared to MMHC in an empirical evaluation. As expected, the local method learned regions in a fraction of the time MMHC required; of particular interest, the regions it learned were of equal or better quality. Finally, an approach using the formalism of causal Bayesian networks was designed to make predictions under manipulations; this approach was used in a submission to the Causality Challenge. The approach required combining the three methods from this research with many state-of-the-art techniques to build and evaluate models.
The results of the competition (the submission performed best on one of the four tasks presented) illustrate some of the strengths and weaknesses of causal discovery methods and point to new directions in the field. The methods explored are initial steps along research paths toward understanding SVM models, selecting variables in non-faithful problems, identifying causal relations in large domains, and learning with manipulations.
dc.format.mimetype	application/pdf
dc.subject	support vector machines
dc.subject	causal discovery
dc.subject	Bayesian network
dc.subject	variable selection
dc.title	Novel Methods for Variable Selection in Non-faithful Domains, Understanding Support Vector Machines, Learning Regions of Bayesian Networks, and Prediction Under Manipulation
dc.type	dissertation
dc.contributor.committeeMember	Constantin Aliferis
dc.contributor.committeeMember	Daniel Masys
dc.type.material	text
thesis.degree.name	PHD
thesis.degree.level	dissertation
thesis.degree.discipline	Biomedical Informatics
thesis.degree.grantor	Vanderbilt University
local.embargo.terms	2011-12-11
local.embargo.lift	2011-12-11
dc.contributor.committeeChair	Ioannis Tsamardinos
dc.contributor.committeeChair	Douglas Hardin

Files in this item

Name: leb_phd_SUBMITTED.pdf
Size: 1.309 MB
Format: PDF


This item appears in the following Collection(s)

Electronic Theses and Dissertations
Electronic theses and dissertations of masters and doctoral students submitted to the Graduate School.
