Morpheus: Towards Automated SLOs for Enterprise Clusters
Posted by 银灯玉箫
Title
2016, Sangeeth Abdu Jyothi, OSDI
Summary
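Fill this in last, after finishing the notes: an overview of the paper's content, to be read first when consulting these notes later. Note: the summary must come from your own thinking and be written in your own words. Do not copy and paste from the original.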
Research Objective(s)
Modern resource management frameworks for large-scale analytics leave unresolved the problematic tension between high cluster utilization and job performance predictability, which are coveted by operators and users, respectively.
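Vocabulary: coveted [ˈkʌvɪtid], from covet (v.): to crave or long for, especially something that belongs to someone else; as an adjective, much sought after or enviable.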
We address this in Morpheus, a new system that: 1) codifies implicit user expectations as explicit Service Level Objectives (SLOs), inferred from historical data, 2) enforces SLOs using novel scheduling techniques that isolate jobs from sharing-induced performance variability, and 3) mitigates inherent performance variance (e.g. due to failures) by means of dynamic reprovisioning of jobs.
Background / Problem Statement
Unpredictability comes from several sources, which can roughly be grouped as:
- Sharing-induced: performance variability caused by inconsistent allocations of resources across job runs.
- Inherent: due to changes in the job input (size, skew, availability), source code tweaks, and failures; this is endemic even in dedicated and lightly used clusters.
Method(s)
- (a) Data dependencies are recorded in the Provenance Graph (PG). The PG is built from gathered logs (application logs, filesystem logs…).
  (b) Resource utilization of each run is recorded in a Telemetry-History (TH) infrastructure database.
- (a) From the PG, Morpheus derives a deadline d, the SLO. The deadline of a periodic job is derived from the time at which downstream consumers read the job's output.
  (b) From the TH, it derives a model of the job's resource demand over time, R*. This is a time series of the resources used by the job, sampled every minute. We refer to R* as the job resource model.
- Morpheus enforces SLOs via recurring reservations:
  (a) It adds a recurring reservation for JobX to the cluster agenda. This sets aside resources over time based on the job resource model R*.
    - Formally, the skyline of the $i$-th instance is defined by the sequence $s_{i,k}$, the average number of containers used at each time-step $k$. Taking a collection of such sequences as input, the optimization problem outputs the vector $s = (s_1, \ldots, s_K)$, the number of containers reserved at each time-step.
      The optimization objective is a cost function that is a linear combination of two terms: one penalizes over-allocation and the other penalizes under-allocation:
      minimize $a \cdot A_o(s) + (1 - a) \cdot A_u(s)$
    - The over-allocation penalty $A_o(s)$ is defined as the average over-allocation of containers.
    - The problem is solved with linear programming.
  (b) New instances of JobX run within the recurring reservation (dedicated resources).
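To make the shape of the objective concrete, here is a minimal Python sketch. The over-allocation penalty follows the definition in these notes (the average over-allocation of containers). The notes do not define the under-allocation penalty, so the per-step shortfall used below is purely an assumption for illustration and may differ from the paper's definition. Because this simplified cost is separable per time-step, the sketch minimizes it with a brute-force search over observed values instead of a linear program. All function names are hypothetical.

```python
from typing import List, Sequence


def over_allocation(s: Sequence[float], runs: Sequence[Sequence[float]]) -> float:
    """Average number of containers reserved but not used, over all runs and time-steps."""
    n, k = len(runs), len(s)
    total = sum(max(s[j] - run[j], 0.0) for run in runs for j in range(k))
    return total / (n * k)


def under_allocation(s: Sequence[float], runs: Sequence[Sequence[float]]) -> float:
    """ASSUMPTION (not from the notes): average per-step shortfall of containers."""
    n, k = len(runs), len(s)
    total = sum(max(run[j] - s[j], 0.0) for run in runs for j in range(k))
    return total / (n * k)


def cost(s: Sequence[float], runs: Sequence[Sequence[float]], a: float) -> float:
    """Linear combination of the two penalties, weighted by a in [0, 1]."""
    return a * over_allocation(s, runs) + (1 - a) * under_allocation(s, runs)


def fit_skyline(runs: Sequence[Sequence[float]], a: float) -> List[float]:
    """Brute-force minimizer of the simplified cost above.

    Each time-step can be solved on its own because both penalties are sums
    over time-steps. The per-step cost is piecewise linear and convex, so its
    minimum is reached at one of the observed values.
    """
    skyline = []
    for j in range(len(runs[0])):
        observed = [run[j] for run in runs]

        def step_cost(c: float) -> float:
            over = sum(max(c - x, 0.0) for x in observed)
            under = sum(max(x - c, 0.0) for x in observed)
            return a * over + (1 - a) * under

        skyline.append(min(sorted(set(observed)), key=step_cost))
    return skyline
```

Calling `fit_skyline(runs, a)` with a small `a` makes over-allocation cheap, so the reservation moves toward the largest observed usage at each time-step; a large `a` pushes it toward the smallest.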
- The Dynamic Reprovisioning component monitors job progress online and increases or decreases the reservation to mitigate inherent execution variability.
  Reprovisioning is triggered when a job's resource demand (used containers plus pending ask) exceeds the resources allocated in the predicted skyline.
- Morpheus constantly feeds the PG and TH information of new runs back into Step 2, for continuous learning and refinement of the SLO and the job resource model.
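The reprovisioning trigger described above can be sketched as a simple predicate. This is an illustration only: the function name, its parameters, and the choice to treat time-steps past the end of the skyline as having zero reserved containers are assumptions, not details taken from the paper.

```python
from typing import Sequence


def needs_reprovisioning(used: int, pending: int,
                         skyline: Sequence[int], step: int) -> bool:
    """Return True when demand exceeds the reservation at the given time-step.

    Demand is the number of containers in use plus the pending ask.
    ASSUMPTION: a job that runs past the end of its predicted skyline has
    zero containers reserved for the extra time-steps.
    """
    reserved = skyline[step] if 0 <= step < len(skyline) else 0
    return used + pending > reserved
```

For example, with a skyline of `[4, 4, 2]`, a job using 3 containers with 2 more pending at step 0 has a demand of 5, which exceeds the 4 reserved, so the predicate returns `True`.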
Evaluation
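How do the authors evaluate their method? What is the experimental setup? Which experimental data and results are of interest? Are there any problems, or anything worth borrowing?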
Conclusion
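What conclusions do the authors draw? Which are strong conclusions, and which are weak ones (i.e., the authors give no experimental evidence and only mention them in the discussion, or the experimental data do not provide sufficient evidence)?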
Notes
(optional) Notes that do not fit any of the sections above but are worth recording.
References
(optional) List highly relevant literature so it can be tracked later.