Identifying high quality MEDLINE articles and web sites using machine learning

dc.creator: Aphinyanaphongs, Yindalon
dc.date.accessioned: 2020-08-23T16:15:53Z
dc.date.available: 2010-04-07
dc.date.issued: 2007-12-28
dc.identifier.uri: https://etd.library.vanderbilt.edu/etd-12072007-141136
dc.identifier.uri: http://hdl.handle.net/1803/15196
dc.description.abstract: In this dissertation, I explore the applicability of text categorization machine learning methods to identifying clinically pertinent and evidence-based articles in the literature and web pages on the Internet. In the first series of experiments, I found that text categorization techniques identify high quality articles in internal medicine in the content categories of prognosis, diagnosis, etiology, and treatment better than the Clinical Query Filters of PubMed. In a second set of experiments, I established that the text categorization models generalized both to time periods outside the training set and to areas outside of internal medicine, including pediatrics, oncology, and surgery. My third set of experiments revealed that text categorization models built for a specific purpose identified articles better than both bibliometric measures (number of citations and impact factor) and web-based measures (Google PageRank, Yahoo WebRanks, and total web page hit count). In the fourth set of experiments, I built models with high discriminatory power for purpose, format, and additional content categories from a labeled gold standard. Furthermore, I built a system called EBMSearch that applies these models to all of MEDLINE. Finally, I extended these methods to the web and built the first validated models that identify websites making false cancer treatment claims, outperforming previous unvalidated models and PageRank by 30% area under the receiver operating characteristic curve. In conclusion, machine learning-based text categorization methods provide a powerful framework for identifying clinically applicable articles in the medical literature and on the Internet.
dc.format.mimetype: application/pdf
dc.subject: information retrieval
dc.title: Identifying high quality MEDLINE articles and web sites using machine learning
dc.type: dissertation
dc.contributor.committeeMember: Douglas Hardin
dc.contributor.committeeMember: Ioannis Tsamardinos
dc.contributor.committeeMember: Steven Brown
dc.contributor.committeeMember: Dan Masys
dc.type.material: text
thesis.degree.name: PHD
thesis.degree.level: dissertation
thesis.degree.discipline: Biomedical Informatics
thesis.degree.grantor: Vanderbilt University
local.embargo.terms: 2010-04-07
local.embargo.lift: 2010-04-07
dc.contributor.committeeChair: Constantin Aliferis

