Hot Topics on Data Center (HotDC) 2018

Posted 2021-01-17 tinoryj

Keynote Session

Accelerate Machine Intelligence: An Edge to Cloud Continuum

Hadi Esmaeilzadeh - UCSD

Background

open source: http://act-lab.org/artifacts

Data grows at an unprecedented rate
new landscape of computing: personalized and targeted experiences for users
growing gap between data and compute
power/energy efficiency is a primary concern
approximate computing
AxGames: https://www.researchgate.net/publication/303905276_AxGames_Towards_Crowdsourcing_Quality_Target_Determination_in_Approximate_Computing
machines learn to extract insights from data - two disjoint solutions for ML
distributed computing + FPGA / ASIC chips
ordinary users should not have to write VHDL / Verilog anywhere in the full stack

CoSMIC stack

how to distribute

understanding machine learning as solving an optimization problem
abstraction between algorithm and acceleration system - parallelized stochastic gradient descent solver (targets: FPGA, GPU, ASIC, CGRA, Xeon Phi)
leverage linearity of differentiation for distributed learning
programming and compilation
- build a new language for math
- dataflow graph generation
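
The point about linearity of differentiation can be illustrated with a small sketch. Because the gradient of a sum of per-example losses equals the sum of the per-example gradients, each worker can compute a partial gradient on its own shard and the aggregator only has to add the partial results. The code below is a minimal plain-Python illustration for a one-parameter least-squares model; the function names (`partial_gradient`, `distributed_gradient`, `sgd_step`) and the data are hypothetical and are not taken from CoSMIC.

```python
def partial_gradient(w, shard):
    """Sum of per-example gradients of 0.5 * (w*x - y)**2 over one shard."""
    return sum((w * x - y) * x for x, y in shard)


def distributed_gradient(w, shards):
    """Aggregate the partial gradients of all workers (here: a plain sum)."""
    return sum(partial_gradient(w, shard) for shard in shards)


def sgd_step(w, shards, lr):
    """One gradient step using the aggregated gradient, averaged over examples."""
    n = sum(len(shard) for shard in shards)
    return w - lr * distributed_gradient(w, shards) / n


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # samples of y = 2x
shards = [data[:2], data[2:]]  # two hypothetical workers

w = 0.0
for _ in range(200):
    w = sgd_step(w, shards, lr=0.05)
```

Summing the shard gradients gives the same value as computing the gradient over the whole data set, which is why the aggregation step can stay this simple.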

how to design customizable accelerator

multi-threading acceleration
connectivity and bussing
PE architecture - make hardware simple

how to reduce overhead of distributed coordination

specialized system software in CoSMIC

benchmarks

16-node CoSMIC with UltraScale+ FPGAs offers an 18.8x speedup over 16-node Spark with E3 Skylake CPUs
speedup breakdown: FPGA (66%) and software (34%)

RoboX Accelerator Architecture

DNNs tolerate low-bitwidth operations - bit-level

Making Cloud Systems Reliable and Dependable: Challenges and Opportunities

Lidong Zhou - MSRA

Background

system reliability:

Fault Tolerance
Redundancies
State Machine Replication
Paxos
Erasure Coding

Real-World Gray Failures in Cloud

redundancies in data center networking
active device and link failure localization in data center
NetBouncer: large-scale path probing and diagnosis
NetBouncer: leverage the power of scale
root cause of a gray failure - requests are stuck due to a network issue while the heartbeat still looks normal
Insight: detect the errors that requesters see
- critical gray failures are observable
- from error handling to error reporting

Solution - Panorama

Analysis - automatically convert a software component into an in-situ observer
Runtime - observers send observations to the local observation store (LOS)
- locate ob-boundary
- observations not always direct
- observations split to ob-origin & ob-sink
- match ob-origin & ob-sink
Detect what "requesters" see
- failures that matter are observable to requesters
- turn error handlers into error reporters
- enables construction of in-situ observers
- https://github.com/ryanphuang/panorma
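
The idea of turning error handlers into error reporters can be sketched as follows. This is a minimal single-process illustration and not Panorama's actual API: the `ObservationStore` class, its `report` and `unhealthy_subjects` methods, the `call_with_observation` wrapper, and the node names are all hypothetical. The requester already catches the error; the only change is that the handler also records what it saw.

```python
import time


class ObservationStore:
    """Hypothetical stand-in for a local observation store (LOS)."""

    def __init__(self):
        self.observations = []

    def report(self, observer, subject, status, detail=""):
        self.observations.append(
            {"observer": observer, "subject": subject,
             "status": status, "detail": detail, "time": time.time()}
        )

    def unhealthy_subjects(self, min_reports=2):
        """Subjects with at least `min_reports` negative observations."""
        counts = {}
        for obs in self.observations:
            if obs["status"] == "unhealthy":
                counts[obs["subject"]] = counts.get(obs["subject"], 0) + 1
        return {s for s, c in counts.items() if c >= min_reports}


def call_with_observation(los, observer, subject, request):
    """Run a request against `subject` and report what the requester saw."""
    try:
        result = request()
    except (TimeoutError, ConnectionError) as exc:
        # The existing error handler doubles as an error reporter.
        los.report(observer, subject, "unhealthy", repr(exc))
        return None
    los.report(observer, subject, "healthy")
    return result


def stuck_request():
    # Simulates a request that hangs although heartbeats would still pass.
    raise TimeoutError("request to storage-node-3 timed out")


los = ObservationStore()
for client in ("frontend-1", "frontend-2"):
    call_with_observation(los, client, "storage-node-3", stuck_request)
call_with_observation(los, "frontend-1", "storage-node-4", lambda: "ok")
```

Requiring reports from more than one observer before flagging a component is an assumption of this sketch, chosen to show how several requester-side views could be combined.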

Reliability of Large-Scale Distributed Systems

foundation reliability
rethink cloud reliability: new theory & new method
understand gray failure
systematic and comprehensive observations

paper: Gray Failure: The Achilles' Heel of Cloud-Scale Systems

Deconstructing RDMA-enabled Distributed Transactions: Hybrid is Better!

Haibo Chen - SJTU

Background

(Distributed) transactions were slow
High cost for distributed TX - usually 10s~100s of thousands of TPS (SIGMOD'12)
only 4% of wall-clock time is spent on useful data processing

new features:

RDMA: remote direct memory access
- ultra-low latency (5 us)
- ultra-high throughput
NVM: Non-volatile memory

An Active Line of Research of RDMA-enabled TX

DrTM - DrTM (SOSP 2015), DrTM-R (EuroSys 2016), DrTM-B (USENIX ATC 2017)
FaRM - FaRM-KV (NSDI 2014), FaRM-TX (SOSP 2015)
FaSST(OSDI 2016)
LITE(SOSP 2017)

Transaction(TX)s

protocols - OCC, 2PL, SI, ...
implementations on hardware - CX3, CX4, CX5, RoCE, one-sided, two-sided, ...
OLTP workloads - TPC-C, TPC-E, TATP, Smallbank

Main: Use RDMA in TXs

outline:

RDMA primitive-level analysis
Phase-by-phase analysis for TX
DrTM+H: Putting it all together

content:

phase: Exe/Val/Log/Commit
offloading with one-sided primitives improves performance
one-sided primitives scale well on modern RNICs
Execution framework & DrTM+H: https://github.com/SJTU-IPADS/drtmh
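
The four phases listed above (Exe/Val/Log/Commit) can be sketched with a toy optimistic concurrency control (OCC) transaction. This is a single-process illustration with hypothetical `Store` and `Transaction` classes, not DrTM+H's implementation: it uses no RDMA, and it omits the locking of the write set that a real protocol needs before validation.

```python
class Store:
    """In-memory stand-in for a remote key-value store with versions."""

    def __init__(self, data):
        self.records = {k: [v, 0] for k, v in data.items()}  # key -> [value, version]
        self.log = []


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version observed during execution
        self.write_set = {}  # key -> buffered new value

    # Execution phase: read records and buffer writes locally.
    def read(self, key):
        if key in self.write_set:
            return self.write_set[key]
        value, version = self.store.records[key]
        self.read_set[key] = version
        return value

    def write(self, key, value):
        self.write_set[key] = value

    def commit(self):
        # Validation phase: every record read must still have the same version.
        for key, version in self.read_set.items():
            if self.store.records[key][1] != version:
                return False  # abort; the caller may retry
        # Logging phase: record the write set before applying it.
        self.store.log.append(dict(self.write_set))
        # Commit phase: install the new values and bump the versions.
        for key, value in self.write_set.items():
            record = self.store.records[key]
            record[0] = value
            record[1] += 1
        return True


store = Store({"a": 100, "b": 50})

t1 = Transaction(store)
t2 = Transaction(store)
t1.write("a", t1.read("a") - 10)
t2.write("a", t2.read("a") - 20)

first = t1.commit()   # validates and commits
second = t2.commit()  # aborts: "a" changed after t2 read it
```

Splitting the transaction into phases like this is what makes a phase-by-phase choice of communication primitive possible, since each phase has a different access pattern.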

RDMA in Data Centers: from Cloud Computing to Machine Learning

Chuanxiong Guo - ByteDance

Background

Data center networks (DCNs) offer many services
- single ownership
- large scale
- bisection bandwidth
TCP/IP does not work well
- latency
- bandwidth
- processing overhead (40G) - 12% CPU at receiver & 6% CPU at sender

RDMA over Commodity Ethernet (RoCEv2)

no CPU overhead
single QP: 88 Gb/s at 1.7% CPU usage (TCP with 8 connections: 30-50 Gb/s, 2.6% CPU on the client & 4.3% CPU on the server)
RoCEv2 needs a lossless Ethernet network
- PFC (priority-based flow control) hop-by-hop flow control
- DCQCN - sender-switch-receiver (RP-CP-NP)
the slow-receiver symptom - ToR to NIC is 40 Gb/s & NIC to server is 64 Gb/s. The NIC may generate a large number of PFC pause frames

RDMA for DNN Training Acceleration

understanding using DNN
DNN training: backpropagation (BP)
Distributed ML training on GPUs with mini-batches
RDMA acceleration: ResNet, RNNs, DNNs (RDMA performs better than TCP)

Highlighted Research Session

Congestion Control Mechanisms in Data Center Networks

Wei Bai - MSRA

Achieving low latency in DCNs

queueing delay - PIAS (NSDI 2015)
packet-loss retransmission delay - TLT

PIAS

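Flow completion time (FCT) is the key problem
design goals: flow information cannot be assumed to be known in advance, and the scheme should be quickly deployable on existing devices
PIAS uses a multi-level feedback queue (MLFQ) to emulate shortest job first (SJF)
three functions in PIAS:
- packet tagging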
- switch
- rate control
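
A rough sketch of the packet-tagging idea: since flow sizes are not known in advance, a sender can tag each packet based only on how many bytes the flow has already sent, demoting the flow to a lower priority each time it crosses a threshold. Short flows then finish in the high-priority queues, which approximates shortest job first. The function names and the threshold values below are hypothetical; choosing good thresholds is a separate problem that this sketch does not address.

```python
def tag_priority(bytes_sent, thresholds):
    """Return the priority (0 = highest) for a flow that has sent `bytes_sent`.

    `thresholds` is an ascending list of demotion thresholds in bytes; a flow
    starts at priority 0 and is demoted each time it crosses a threshold.
    """
    priority = 0
    for threshold in thresholds:
        if bytes_sent >= threshold:
            priority += 1
    return priority


def tag_flow(flow_size, packet_size, thresholds):
    """Priorities carried by each packet of a flow, in sending order."""
    tags = []
    sent = 0
    while sent < flow_size:
        # Only bytes already sent are used, never the total flow size.
        tags.append(tag_priority(sent, thresholds))
        sent += packet_size
    return tags


THRESHOLDS = [3000, 10000]  # hypothetical demotion thresholds, in bytes

short_flow = tag_flow(2000, 1000, THRESHOLDS)
long_flow = tag_flow(12000, 1000, THRESHOLDS)
```

Here every packet of the 2000-byte flow keeps the highest priority, while the 12000-byte flow is demoted twice as it keeps sending.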

TLT

get the benefits of both lossy and lossless networks
using PFC to eliminate congestion packet losses
packet loss:
- middle - fast retransmissions
- tail - timeout retransmissions
- identify important packets; when the switch queue exceeds a threshold, drop the unimportant packets
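
The selective-dropping rule in the last point can be sketched as a small queue model. This is an illustration under assumptions, not TLT's implementation: the `SelectiveDropQueue` class, its parameters, and the packet names are hypothetical, and which packets count as important is supplied by the caller here.

```python
from collections import deque


class SelectiveDropQueue:
    """Switch queue that drops unimportant packets above a threshold."""

    def __init__(self, capacity, drop_threshold):
        self.capacity = capacity
        self.drop_threshold = drop_threshold
        self.queue = deque()
        self.dropped = []

    def enqueue(self, packet_id, important):
        occupancy = len(self.queue)
        if occupancy >= self.capacity:
            # Full buffer: this sketch simply drops the packet. The notes
            # mention PFC as the mechanism for avoiding such losses.
            self.dropped.append(packet_id)
            return False
        if occupancy >= self.drop_threshold and not important:
            # Above the threshold, the remaining room is kept for important packets.
            self.dropped.append(packet_id)
            return False
        self.queue.append(packet_id)
        return True


q = SelectiveDropQueue(capacity=4, drop_threshold=2)
results = [
    q.enqueue("d1", important=False),
    q.enqueue("d2", important=False),
    q.enqueue("d3", important=False),   # queue is at the threshold: dropped
    q.enqueue("tail", important=True),  # important packet is still admitted
]
```

Dropping an unimportant packet in the middle of a window can be repaired by fast retransmission, whereas losing the tail packet would force a timeout, which is the motivation for protecting the latter.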

Understanding the challenges of Scaling Distributed DNN Training

Cheng Li - USTC

Deep learning is growing fast
DNN - Deep Neural Networks
benefit: more data / bigger models / more computation
Jeff Dean - Google

Distributed DNN

Model or data parallelism
- data parallelism is a primary choice
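BSP / ASP - BSP is the usual choice (ASP may not converge)
- Bulk Synchronous Parallel - synchronizes at fixed points in time
- Asynchronous Parallel
network, servers, and other bottlenecks for parallelism
use measurements to identify the constraints that limit compute capacity
- compression overhead introduced by compressing data for transfer
system design
- elastic system design
- weakest-link effect - the slowest component limits the final compute speed
- how to quickly adjust the system's scale, etc. - message-bus stream processing - use a producer-consumer model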
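
The weakest-link effect under BSP can be made concrete with a toy timing model: because all workers synchronize at the end of each iteration, the iteration takes as long as the slowest worker plus the synchronization cost. The functions and numbers below are hypothetical and only illustrate the arithmetic; they are not measurements from the talk.

```python
def bsp_iteration_time(compute_times, sync_time):
    """Time of one BSP iteration: slowest worker plus synchronization."""
    return max(compute_times) + sync_time


def bsp_throughput(batch_per_worker, compute_times, sync_time):
    """Samples processed per second across all workers under BSP."""
    total_samples = batch_per_worker * len(compute_times)
    return total_samples / bsp_iteration_time(compute_times, sync_time)


# Four workers, 32 samples each per iteration, times in seconds.
balanced = bsp_throughput(32, [1.0, 1.0, 1.0, 1.0], sync_time=0.25)
straggler = bsp_throughput(32, [1.0, 1.0, 1.0, 3.0], sync_time=0.25)
```

In this model a single worker that is three times slower cuts the throughput of the whole job by more than half, even though the other three workers are unchanged.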

Octopus: an RDMA-enabled Distributed Persistent Memory File System

Youyou Lu - Tsinghua

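distributed file system design
non-volatile memory - in-memory storage
DRAM Limitations
- Cell Density
- Refresh - performance / power consumption
NVDIMM - retains data after power loss
Intel 3D XPoint - near-DRAM latency, high capacity, non-volatile across power loss
RDMA - used in high-performance environments
DiskGluster - latency comes from the HDD | MemGluster - latency comes from software
RDMA-enabled Distributed File System
- shared data management
- New data flow strategies
- Efficient RPC design
- Concurrency control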

Design

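I/O handling
- organize all NVMM into a single shared space
- reduce data copies in the DFS (from 7 to 4)
- the server looks up the data's storage address; the client gets the address and then fetches the data itself (shifting the work to the client)
Metadata RPC
Collect-Dispatch Distributed Transaction
performance evaluation
- tested between servers on a LAN - bandwidth can reach 88% of the network bandwidth
- tested on the Hadoop platform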
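
The client-driven read path described in the I/O notes can be sketched as follows. This is a single-process illustration, not Octopus code: `SharedMemoryPool`, `MetadataServer`, `client_read`, and the file names are hypothetical, and the `rdma_read` method only models a one-sided read by copying bytes from a buffer.

```python
class SharedMemoryPool:
    """Stand-in for persistent memory exposed to clients via RDMA."""

    def __init__(self, size):
        self.memory = bytearray(size)

    def rdma_read(self, offset, length):
        # Models a one-sided read: no server-side code runs here.
        return bytes(self.memory[offset:offset + length])


class MetadataServer:
    """Server that only resolves file names to (offset, length)."""

    def __init__(self, pool):
        self.pool = pool
        self.index = {}
        self.next_free = 0

    def store(self, name, data):
        offset = self.next_free
        self.pool.memory[offset:offset + len(data)] = data
        self.index[name] = (offset, len(data))
        self.next_free += len(data)

    def lookup(self, name):
        return self.index[name]


def client_read(server, pool, name):
    """Client asks the server for the address, then fetches the data itself."""
    offset, length = server.lookup(name)
    return pool.rdma_read(offset, length)


pool = SharedMemoryPool(64)
server = MetadataServer(pool)
server.store("a.txt", b"hello")
server.store("b.txt", b"octopus")

data = client_read(server, pool, "b.txt")
```

The server's work per read shrinks to one address lookup, and the data transfer itself no longer passes through server-side buffers, which is the intent behind shifting the task to the client.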

Short Talk

Computer Organization and Design Course with FPGA Cloud, Ke Zhang (ICT, CAS)

new technologies: AI, IoT
build up hardware-software co-design skills - CPU, GPU, FPGA, ASIC
ZyForce platform - virtual FPGA labs

ActionFlow: A Framework for Fast Multi-Robots Application Development, Jimin Han (UCAS)

UCAS senior-year undergraduate - started in August 2018
rapid development of robot applications

Labeled Network Stack, Yifan Shen (ICT, CAS)

Caching or Not: Rethinking Virtual File System for Non-Volatile Main Memory, Ying Wang (ICT, CAS)

Data Motif-based Proxy Benchmarks for Big Data and AI Workloads, Chen Zheng (ICT, CAS)
