Shenango (NSDI 2019): Achieving High CPU Efficiency for Latency-Sensitive Datacenter Workloads

Posted by 银灯玉箫


Appendix 1. My paper-notes template (Markdown format)

Title

Year, Authors, Journal Name

Citation format

Summary

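Fill this in last, after the rest of the notes are written. Summarize the paper's content; when revisiting the notes later, read this paragraph first. Note: write the summary in your own words after thinking it through. Do not copy and paste from the original.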

Research Objective(s)

What is the authors' research objective?

Background / Problem Statement

Core allocations impose overhead.
The speed at which cores can be reallocated is ultimately limited by reallocation overheads: determining that a core should be reallocated, and instructing an application to yield a core.
Estimating required cores is difficult.
Previous systems have used application-level metrics such as latency, throughput, or core utilization to estimate core requirements over long time scales. However, these metrics cannot be applied over microsecond-scale intervals. Instead, Shenango aims to estimate instantaneous load, but this is non-trivial.

Method(s)

Shenango’s goal is to optimize CPU efficiency by granting each application as few cores as possible while avoiding a condition we call compute congestion, in which failing to grant an additional core to an application would cause work to be delayed by more than a few microseconds. This objective frees up underused cores for use by other applications, while still keeping tail latency in check.

The congestion detection algorithm determines whether a runtime is overloaded based on two sources of load: queued threads and queued ingress packets.

If any item is found to be present in a queue for two consecutive runs of the detection algorithm, it indicates that a packet or thread has been queued for at least 5 µs. Because queued packets or threads represent work that could be handled in parallel on another core, the runtime is deemed to be “congested”, and the IOKernel grants it one additional core. We found that the duration of queueing is a more robust signal than the length of a queue, because using queue length requires carefully tuning a threshold parameter for different durations of requests.
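The two-consecutive-runs check can be sketched as follows. This is a minimal illustration, not Shenango's actual code: the `RingQueue` and `CongestionDetector` names are hypothetical, each queue is modeled as a FIFO with monotonically increasing head and tail counters, and the detector is assumed to be invoked once per detection interval (about 5 µs, as implied by the text above). An item was present at both runs exactly when the current head is still behind the tail recorded at the previous run.

```python
from dataclasses import dataclass


@dataclass
class RingQueue:
    """Hypothetical FIFO modeled by two monotonically increasing counters."""
    head: int = 0  # number of items dequeued so far
    tail: int = 0  # number of items enqueued so far

    def push(self, n: int = 1) -> None:
        self.tail += n

    def pop(self, n: int = 1) -> None:
        # Never dequeue more items than were enqueued.
        self.head = min(self.head + n, self.tail)


class CongestionDetector:
    """Flags a runtime as congested when any item survives two runs."""

    def __init__(self, queues):
        # Assumption: one queue per load source (threads, ingress packets).
        self.queues = list(queues)
        self.prev_tails = [q.tail for q in self.queues]

    def run(self) -> bool:
        # An item enqueued before the previous run has index < prev_tail;
        # it is still queued now if its index >= head. Such an item exists
        # exactly when head < prev_tail.
        congested = any(
            q.head < prev for q, prev in zip(self.queues, self.prev_tails)
        )
        # Remember the current tails for the next run.
        self.prev_tails = [q.tail for q in self.queues]
        return congested
```

Because the check compares positions rather than counting items, it needs no queue-length threshold, which matches the argument above for using queueing duration as the signal.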

Which cores for each application

When deciding which core to grant to an application, the IOKernel considers three factors:

  1. Hyper-threading efficiency. Intel’s HyperThreads enable two hardware threads to run on the same physical core. These threads share processor resources such as L1 and L2 caches and execution units, but are exposed as two separate logical cores. If hyper-threads from the same application run on the same physical core, they benefit from cache locality; if hyper-threads from different applications share the same physical core, they can contend for cache space and degrade each other’s performance.
  2. Cache locality. If an application’s state is already present in the L1/L2 cache of a core it is newly granted, it can avoid many time-consuming cache misses.
  3. Latency. Preempting a core and waiting for it to become available takes time, and wastes cycles that could be spent doing useful work. Thus, the IOKernel always grants an idle core instead of preempting a busy core, if an idle core exists.
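One way to combine the three factors is sketched below. The `Core` type, the `pick_core` function, and the exact ranking (hyper-thread sibling match first, then cache warmth) are assumptions made for illustration; the notes above only state that idle cores are always preferred over preemption, not how the other two factors are ordered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Core:
    """Hypothetical view of one logical core as seen by the allocator."""
    core_id: int
    sibling_id: int                 # hyper-thread sibling on the same physical core
    running: Optional[str] = None   # application currently running, None if idle
    last_app: Optional[str] = None  # application that most recently ran here


def pick_core(cores, app: str) -> Optional[Core]:
    """Return the preferred idle core for `app`, or None if all are busy."""
    by_id = {c.core_id: c for c in cores}
    idle = [c for c in cores if c.running is None]
    if not idle:
        # Factor 3: preemption is considered only when no idle core exists.
        return None

    def score(c: Core):
        # Factor 1: the sibling hyper-thread already runs the same application.
        sibling_same_app = by_id[c.sibling_id].running == app
        # Factor 2: the application's state may still be in this core's cache.
        cache_warm = c.last_app == app
        # Tuples compare element by element, so factor 1 outranks factor 2
        # (an assumed ordering).
        return (sibling_same_app, cache_warm)

    return max(idle, key=score)
```

A caller that receives `None` would fall back to choosing a busy core to preempt, which this sketch deliberately leaves out.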

Evaluation

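How do the authors evaluate their method? What is the experimental setup? Which experimental data and results are of interest? Are there any problems, or anything worth borrowing?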

Conclusion

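What conclusions do the authors draw? Which are strong conclusions and which are weak ones (i.e., the authors provide no experimental evidence and only mention them in the discussion, or the experimental data do not give sufficient evidence)?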

Notes

(optional) Notes that do not fit in the list above but need to be specially recorded.

References

(optional) List highly relevant literature so that it can be tracked later.
