
Anomaly Detection from Complex Temporal Sequences in Large Data

dc.creator: Mack, Daniel Leif Campana
dc.date.accessioned: 2020-08-22T00:29:52Z
dc.date.available: 2015-04-18
dc.date.issued: 2013-04-18
dc.identifier.uri: https://etd.library.vanderbilt.edu/etd-04092013-182409
dc.identifier.uri: http://hdl.handle.net/1803/12087
dc.description.abstract: As systems become more complex and the amount of operational data collected from them increases proportionally, new challenges arise in how this data can be used to better understand system operations and detect unsafe behavior. For large systems made up of a number of interacting subsystems, detecting anomalous behavior while avoiding false alarms becomes an important problem. Anomaly detection in such systems must navigate large amounts of data that include a large number of operational runs under a variety of operating conditions, sensors, and long sequences of time series data that cover different aspects of system operation. From a safety viewpoint, we wish to use this data to improve the effectiveness of existing fault detection schemes. Of equal importance is the development of methods that can detect previously unknown and undetected anomalies from the vast amounts of available operational data. In this thesis, we have developed two approaches for anomaly detection in complex systems. The first approach uses supervised learning methods to improve the efficiency and accuracy with which available diagnostic reasoners detect known anomalies. The second approach applies unsupervised learning methods to the large amounts of data to identify previously undiscovered anomalies in system operations. Once anomalous instances are identified, we find the most discriminatory features, which then provide targeted information to help characterize the nature of the newly found anomalies for further study. The methodologies developed in this thesis have been successfully applied to two big data domains. In the first domain, aircraft flight operations data is used to improve the detection of known anomalies in a targeted way, and thereby the diagnostic accuracy of a vehicle reasoner. This data is also used to identify previously undetected or unknown anomalies during the takeoff phase of aircraft flight, which are then evaluated in terms of their potential impact on aviation safety. In the second domain, data recorded from pitches thrown in Major League Baseball games is used with our exploratory approach to identify anomalous games for individual pitchers, and then to characterize these games in terms of the specific pitch types that differed from the nominal set thrown by these pitchers.
dc.format.mimetype: application/pdf
dc.subject: baseball
dc.subject: aviation safety
dc.subject: complexity measures
dc.subject: anomaly detection
dc.title: Anomaly Detection from Complex Temporal Sequences in Large Data
dc.type: dissertation
dc.contributor.committeeMember: Gabor Karsai
dc.contributor.committeeMember: Xenofon Koutsoukos
dc.contributor.committeeMember: Julie A. Adams
dc.contributor.committeeMember: Doug Fisher
dc.type.material: text
thesis.degree.name: PHD
thesis.degree.level: dissertation
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Vanderbilt University
local.embargo.terms: 2015-04-18
local.embargo.lift: 2015-04-18
dc.contributor.committeeChair: Gautam Biswas

