Algorithms and Techniques for Automated Deployment and Efficient Management of Large-Scale Distributed Data Analytics Services

Bhattacharjee, Anirban

dc.contributor.advisor	Gokhale, Aniruddha S
dc.creator	Bhattacharjee, Anirban
dc.date.accessioned	2020-03-02T17:04:38Z
dc.date.available	2020-03-02T17:04:38Z
dc.date.created	2020-02
dc.date.issued	2020-02-20
dc.date.submitted	February 2020
dc.identifier.uri	https://ir.vanderbilt.edu/xmlui/handle/1803/9849
dc.description.abstract	The advent of the Internet of Things (IoT) has enabled smart applications that provide near real-time and robust predictive data analytics. The backbone of these predictive analytics services is an underlying machine learning (ML) model. Multiple challenges arise in the automated deployment and efficient management of predictive analytics services across the cloud-edge spectrum. First, developing and evaluating ML models requires substantial expertise. Second, provisioning these ML-based data analytics applications often involves complex deployment and configuration challenges, because it must handle diverse ML libraries and frameworks and accommodate a range of hardware and cloud platforms. Third, to handle the dynamic workload of prediction requests, resources need to scale up or down to minimize operational cost while guaranteeing the Service Level Objective (SLO) of prediction tasks. Fourth, the accuracy of an ML model may degrade over time as new data arrives, which requires continuous model re-training. Model update tasks can run alongside latency-critical background tasks on edge devices; however, they must not compromise the SLO of those latency-critical jobs. To address these challenges, this doctoral research makes the following contributions. First, it defines a model-driven, template-based design for the rapid development of machine learning models. Second, it presents an automated service provisioning technique that enables rapid and agile deployment of application components across the cloud-fog-edge spectrum with minimal domain expertise. Third, it describes novel algorithms that proactively and dynamically scale predictive analytics application components to minimize execution time according to their SLOs while optimizing resource usage under variable workloads.
Finally, it details a scalable and efficient framework for continual ML, particularly Deep Learning (DL)-based model re-training on heterogeneous edge devices. The framework minimizes DL model update time while guaranteeing the SLO of latency-sensitive background jobs.
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.subject	Distributed Systems
dc.subject	Resource Management
dc.subject	Big Data Analytics
dc.subject	Deep Learning
dc.subject	Cloud and Edge devices
dc.subject	Model-Driven Engineering
dc.title	Algorithms and Techniques for Automated Deployment and Efficient Management of Large-Scale Distributed Data Analytics Services
dc.type	Thesis
dc.date.updated	2020-03-02T17:04:38Z
dc.type.material	text
thesis.degree.name	PhD
thesis.degree.level	Doctoral
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Vanderbilt University
dc.creator.orcid	0000-0003-0184-7759

Files in this item

Name:: BHATTACHARJEE-DISSERTATION-2020.pdf
Size:: 6.843Mb
Format:: PDF

Name:: AnirbanThesis.zip
Size:: 14.47Mb
Format:: application/

This item appears in the following Collection(s)

Electronic Theses and Dissertations
Electronic theses and dissertations of masters and doctoral students submitted to the Graduate School.