Sinan: ML-Based and QoS-Aware Resource Management for Cloud Microservices

Posted 银灯玉箫

2021, Yanqi Zhang, ASPLOS

Citation format

Summary

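Fill this in last, after the notes are finished: summarize the content of the paper, and read this paragraph first when consulting the notes later. Note: the summary must come from your own thinking and be written in your own words. Do not copy and paste from the original text.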

Research Objective(s)

Microservices complicate resource management, as dependencies between them introduce backpressure effects and cascading QoS violations.

We present Sinan, a data-driven cluster manager for interactive cloud microservices that is online and QoS-aware. Sinan leverages a set of scalable and validated machine learning models to determine the performance impact of dependencies between microservices, and allocate appropriate resources per tier in a way that preserves the end-to-end tail latency target.
backpressure: n. back pressure; reverse pressure (in networking, also half-duplex backpressure)
cascading: chained in series, each stage triggering the next

Background / Problem Statement

Research background and problem statement: what problem are the authors trying to solve?
Microservices introduce new system challenges, especially in resource management, since the complex topologies of microservice dependencies exacerbate queueing effects and introduce cascading Quality of Service (QoS) violations that are difficult to identify and correct in a timely manner.

Microservices are by design mostly stateless, hence their performance is defined by their CPU allocation. Given this, Sinan primarily focuses on allocating CPU resources to each tier [26].

  1. Dependencies among tiers
  2. System complexity
    Second, the application may include third-party software whose source code cannot be instrumented. Alternatively, expecting users to express each tier's resource sensitivity is problematic, as users already face difficulties correctly reserving resources for simple, monolithic workloads, leading to well-documented underutilization, and the impact of microservice dependencies is especially hard to assess, even for expert developers.
  3. Delayed queueing effect
  4. Boundaries of the resource allocation space

Method(s)

What method or algorithm do the authors use to solve the problem? Is it based on prior methods? Which ones?

Sinan first uses an efficient space exploration algorithm to examine the space of possible resource allocations, especially focusing on corner cases that introduce QoS violations. This yields a training dataset used to train two models: a Convolutional Neural Network (CNN) model for detailed short-term performance prediction, and a Boosted Trees model that evaluates the long-term performance evolution. The combination of the two models allows Sinan both to examine the near-future outcome of a resource allocation and to account for the system's inertia in building up queues, with higher accuracy than a single model examining both time windows.
inertia: n. resistance to change; inertness; lack of vigor (also: moment of inertia, inactivity)
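
The notes above do not say how the exploration algorithm works internally, so the following is only a minimal sketch of the general idea of collecting samples that concentrate near the QoS boundary. Everything in it is an assumption made for illustration: the toy `measure_latency` function, the `QOS_MS` target, the step size, and the random-walk rule itself (shrink a random tier while QoS is met, grow a random tier after a violation). It is not the algorithm from the paper.

```python
import random

QOS_MS = 200.0  # hypothetical end-to-end tail-latency target


def measure_latency(cpus, load=120.0):
    """Toy stand-in for running the application (not a real measurement).

    Each tier's latency rises sharply as its capacity approaches the load.
    """
    latency = 0.0
    for c in cpus:
        capacity = 50.0 * c                  # hypothetical requests/s per core
        headroom = max(capacity - load, 1.0)
        latency += 1000.0 / headroom         # milliseconds
    return latency


def explore(n_steps, n_tiers=3, seed=0):
    """Random walk over per-tier CPU allocations that hovers near the
    QoS boundary, so both safe and violating samples are recorded."""
    rng = random.Random(seed)
    cpus = [8.0] * n_tiers
    samples = []
    for _ in range(n_steps):
        latency = measure_latency(cpus)
        violated = latency > QOS_MS
        samples.append((tuple(cpus), latency, violated))
        tier = rng.randrange(n_tiers)
        if violated:
            cpus[tier] += 0.5                         # back off from the violation
        else:
            cpus[tier] = max(0.5, cpus[tier] - 0.5)   # probe a tighter allocation
    return samples


samples = explore(200)
n_violations = sum(1 for _, _, violated in samples if violated)
print(f"{n_violations} of {len(samples)} samples violate QoS")
```

Each sample is an (allocation, latency, violated) triple, which is the kind of labeled data the two models described above would be trained on.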

2.3 Management Challenges & the Need for ML

The resource scheduler should have a global view of the microservice graph and be able to anticipate the impact of dependencies on end-to-end performance.

Delayed queueing effect

This delayed queueing effect highlights the need for ML to evaluate the long-term impact of resource allocations.
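
A toy simulation makes the delay concrete. This is a sketch under simplifying assumptions (a single queue, fixed arrivals per step, and a hypothetical violation threshold on the backlog), not a model of the paper's workloads. Capacity is cut for five steps and then restored.

```python
def simulate(capacity_per_step, arrivals_per_step=100):
    """Discrete-time single queue: return the backlog after each step."""
    backlog = 0
    history = []
    for capacity in capacity_per_step:
        backlog = max(0, backlog + arrivals_per_step - capacity)
        history.append(backlog)
    return history


# Steps 0-4: ample capacity. Steps 5-9: under-provisioned. Steps 10-19: restored.
capacity = [120] * 5 + [80] * 5 + [120] * 10
backlog = simulate(capacity)

# Hypothetical rule: QoS is violated whenever more than 50 requests are queued.
violations = [step for step, queued in enumerate(backlog) if queued > 50]
print(backlog)
print(violations)
```

In this run the capacity cut covers steps 5 to 9, but the violation only becomes visible at step 7 and persists until step 11, two steps after capacity was restored. A controller that reacts only to the currently observed latency therefore acts too late in both directions.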

3. Machine learning models

To address this, we designed a two-stage model. First, a CNN that predicts the end-to-end latency of the next timestep with high accuracy, and, second, a Boosted Trees (BT) model that estimates the probability of QoS violations further in the future, using the latent variable extracted by the CNN. BT is generally less prone to overfitting than CNNs, since it has far fewer tunable hyperparameters than NNs, mainly the number of trees and the tree depth.
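
The data flow of the two stages can be sketched as follows. Both models here are hand-written stand-ins with made-up coefficients, not a CNN or Boosted Trees, and the selection rule in `choose_allocation` (take the cheapest candidate that passes both checks, with a hypothetical probability threshold) is an assumption for illustration. Only the structure comes from the notes: the first stage predicts next-step latency and emits a latent vector, and the second stage turns that latent vector into a probability of a later QoS violation.

```python
import math

QOS_MS = 200.0  # hypothetical end-to-end tail-latency target


def short_term_model(recent_latency_ms, cpus):
    """Stand-in for the CNN: returns (next-step latency, latent vector)."""
    pressure = sum(1.0 / c for c in cpus)   # less CPU means more pressure
    trend = recent_latency_ms[-1] - recent_latency_ms[0]
    predicted = recent_latency_ms[-1] + 0.5 * trend + 10.0 * pressure
    latent = [pressure, trend / QOS_MS, predicted / QOS_MS]
    return predicted, latent


def long_term_model(latent):
    """Stand-in for Boosted Trees: probability of a later QoS violation."""
    score = 4.0 * (latent[0] - 1.0) + 3.0 * latent[1] + 4.0 * (latent[2] - 1.0)
    return 1.0 / (1.0 + math.exp(-score))


def choose_allocation(recent_latency_ms, candidates, max_violation_prob=0.2):
    """Pick the cheapest candidate that passes both the short-term and
    the long-term check; fall back to the largest one if none passes."""
    feasible = []
    for cpus in candidates:
        predicted, latent = short_term_model(recent_latency_ms, cpus)
        p_violation = long_term_model(latent)
        if predicted <= QOS_MS and p_violation <= max_violation_prob:
            feasible.append((sum(cpus), cpus))
    if not feasible:
        return max(candidates, key=sum)
    return min(feasible)[1]


history = [100.0, 110.0, 120.0]
candidates = [(1, 1, 1), (2, 2, 2), (4, 4, 4), (8, 8, 8)]
print(choose_allocation(history, candidates))
```

With these toy numbers, the `(2, 2, 2)` candidate passes the short-term latency check but is rejected by the long-term stage, which is the case the second model exists to catch.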

Evaluation

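How do the authors evaluate their method? What is the experimental setup? Which experimental data and results are of interest? Are there any problems, or anything worth borrowing?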

Conclusion

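What conclusions do the authors draw? Which are strong conclusions, and which are weak ones (that is, conclusions the authors do not support with experimental evidence and only mention in the discussion, or for which the experimental data does not provide sufficient evidence)?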

Notes

(optional) Notes that do not fit the sections above but need to be recorded.

References

(optional) List highly relevant papers so they can be tracked later.
