Morpheus: Towards Automated SLOs for Enterprise Clusters

Posted 银灯玉箫

Title
2016, Sangeeth Abdu Jyothi, OSDI

Summary
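Fill this in last, after finishing the notes: an overview of the paper's content, to be read first when revisiting these notes later. Note: a summary must come from your own thinking and be written in your own words. Do not copy and paste the original text.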

Research Objective(s)
Modern resource management frameworks for large-scale analytics leave unresolved the problematic tension between high cluster utilization and job performance predictability, respectively coveted by operators and users.
covet (coveted)
US [ˈkʌvɪtid]
UK [ˈkʌvɪtid]
v. to crave; to long for; to desire (something that belongs to someone else)
Web usage: long dreamed of; highly desirable; enviable

We address this in Morpheus, a new system that: 1) codifies implicit user expectations as explicit Service Level Objectives (SLOs), inferred from historical data, 2) enforces SLOs using novel scheduling techniques that isolate jobs from sharing-induced performance variability, and 3) mitigates inherent performance variance (e.g. due to failures) by means of dynamic reprovisioning of jobs.

Background / Problem Statement
Unpredictability comes from several sources, which can be roughly grouped as:

  • Sharing-induced: performance variability caused by inconsistent allocations of resources across job runs.
  • Inherent: due to changes in the job input (size, skew, availability), source code tweaks, and failures. This is endemic even in dedicated and lightly used clusters.

Method(s)

  • (a) Data dependencies are recorded in the Provenance Graph (PG).
    The PG gathers logs (application logs, filesystem logs, …).
    (b) The resource utilization of each run is recorded in a Telemetry-History (TH) infrastructure database.

  • (a) From the PG, it derives a deadline d, which is the SLO.
    The deadline for a periodic job is derived as the time at which downstream consumers read the job's output.
    (b) From the TH, it derives a model of the job's resource demand over time, R*.
    This is a time series of the resources used by the job, sampled every minute.

    We refer to R* as the job resource model.

  • Morpheus enforces SLOs via recurring reservations:
    (a) It adds a recurring reservation for JobX to the cluster agenda. This sets aside resources over time based on the job resource model R*.

  • Formally, the skyline for the i-th instance can be defined by the sequence $s_{i,k}$, the average number of containers used at each time-step $k$. Using a collection of such sequences as input, the optimization problem outputs the vector $s = (s_1, \ldots, s_K)$, the number of containers reserved at each time-step.
    Our optimization objective is a cost function that is a linear combination of two terms: one that penalizes over-allocation and another that penalizes under-allocation:
    minimize $a \cdot A_o(s) + (1 - a) \cdot A_u(s)$

  • Over-allocation penalty is defined as the average over-allocation of containers.

  • This problem is solved using linear programming.

    (b) New instances of JobX run within the recurring reservation (dedicated resources).
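To make the objective above concrete, here is a minimal Python sketch that only evaluates the cost of a candidate skyline against historical runs. It does not solve the linear program. The over-allocation term follows the definition in these notes (average over-allocation of containers). The under-allocation term is an assumption: the notes do not define it, so a plain per-time-step shortfall is used as a stand-in. All function names and the sample numbers are hypothetical.

```python
from typing import Sequence


def over_allocation(s: Sequence[float], runs: Sequence[Sequence[float]]) -> float:
    """Average number of reserved-but-unused containers per (run, time-step)."""
    total = sum(max(s[k] - run[k], 0.0) for run in runs for k in range(len(s)))
    return total / (len(runs) * len(s))


def under_allocation(s: Sequence[float], runs: Sequence[Sequence[float]]) -> float:
    """ASSUMPTION: simplified stand-in, the average per-step shortfall.

    The notes do not define this term, so this is illustrative only.
    """
    total = sum(max(run[k] - s[k], 0.0) for run in runs for k in range(len(s)))
    return total / (len(runs) * len(s))


def cost(s: Sequence[float], runs: Sequence[Sequence[float]], a: float) -> float:
    """Linear combination a * over + (1 - a) * under, with 0 <= a <= 1."""
    return a * over_allocation(s, runs) + (1 - a) * under_allocation(s, runs)


# Two hypothetical historical runs of the same periodic job, three time-steps each.
runs = [[2, 4, 6], [4, 4, 2]]
peak_skyline = [4, 4, 6]  # reserve the per-step maximum
mean_skyline = [3, 4, 4]  # reserve the per-step average

# A small weight a puts most of the penalty on under-allocation.
print(cost(peak_skyline, runs, a=0.25))
print(cost(mean_skyline, runs, a=0.25))
```

With a small `a`, the peak skyline scores lower than the mean skyline in this toy data, because the peak skyline never under-allocates. A larger `a` shifts the balance toward tighter reservations.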

  1. The Dynamic Reprovisioning component monitors job progress online and increases or decreases the reservation to mitigate inherent execution variability.
    Reprovisioning is triggered when a job's resource demand (used containers plus pending ask) exceeds the resources allocated in the predicted skyline.

  2. Morpheus constantly feeds the PG and TH information of new runs back into Step 2, for continuous learning and refinement of the SLO and the job resource model.
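The reprovisioning trigger described above can be sketched in a few lines of Python. This is only an illustration of the stated condition (used containers plus pending ask exceeding the reserved skyline). The function name and sample values are hypothetical, and how the reservation is then resized is not covered by these notes.

```python
from typing import List, Sequence


def reprovisioning_steps(
    used: Sequence[int],
    pending: Sequence[int],
    skyline: Sequence[int],
) -> List[int]:
    """Return the time-steps at which demand exceeds the reserved skyline.

    Demand at a step is used containers plus the pending ask, as in the notes.
    """
    return [
        k
        for k, (u, p, reserved) in enumerate(zip(used, pending, skyline))
        if u + p > reserved
    ]


# Hypothetical observations for one run of a job, three time-steps.
used = [3, 4, 4]
pending = [0, 2, 0]
skyline = [4, 4, 6]

# Demand is 3, 6, 4 against reservations 4, 4, 6, so only step 1 triggers.
print(reprovisioning_steps(used, pending, skyline))
```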

Evaluation
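How do the authors evaluate their method? What is the experimental setup? Which experimental data and results are of interest? Are there any problems, or anything worth borrowing?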

Conclusion
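What conclusions do the authors draw? Which are strong conclusions, and which are weak ones (that is, the authors provide no experimental evidence and only mention them in the discussion, or the experimental data does not give sufficient evidence)?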

Notes
(optional) Notes that do not fit in the sections above but need to be recorded.

References
(optional) List highly relevant literature, so that it can be tracked further later.
