CUDA Intro to Parallel Programming Notes -- Lesson 1: The GPU Programming Model

1. Three traditional ways computers run faster

  • Faster clocks
  • More work/clock cycle  
  • More processors

2. Parallelism

  • A high-end GPU contains over 3,000 arithmetic units (ALUs) that can simultaneously run 3,000 arithmetic operations. A GPU can have tens of thousands of parallel pieces of work all active at the same time.
  • A modern GPU may be running up to 65,000 concurrent threads.
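As an illustration of launching tens of thousands of threads at once, here is a minimal CUDA sketch (the kernel name, array size, and launch configuration are invented for this note, not taken from the lesson). It needs an NVIDIA GPU and `nvcc` to run:

```cuda
#include <cstdio>
#include <vector>

// Each of the 65,536 threads squares exactly one array element.
__global__ void square(float *d_out, const float *d_in) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    d_out[i] = d_in[i] * d_in[i];
}

int main() {
    const int N = 65536;                       // tens of thousands of work items
    std::vector<float> h_in(N), h_out(N);
    for (int i = 0; i < N; ++i) h_in[i] = float(i);

    float *d_in, *d_out;
    cudaMalloc(&d_in,  N * sizeof(float));
    cudaMalloc(&d_out, N * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    // 256 blocks x 256 threads = 65,536 concurrent threads.
    square<<<N / 256, 256>>>(d_out, d_in);

    cudaMemcpy(h_out.data(), d_out, N * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h_out[100] = %f\n", h_out[100]);   // 100 squared = 10000
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Note there is no loop over the data on the device side: the parallelism is expressed entirely through the launch configuration `<<<blocks, threads>>>`.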

3. GPGPU -- General-Purpose Programmability on the Graphics Processing Unit.

4.  How Are CPUs Getting Faster?

    More transistors available for computation.

5. Why don't we keep increasing clock speed?

  Running a billion transistors generates an awful lot of heat, and we can't keep all those processors cool.

6. What kind of processors are we building

  Q: Why are traditional CPU-like processors not the most energy-efficient processors?

  A: Traditional CPU-like processors offer flexibility and performance, but they are expensive in terms of power.

  We might instead choose to build simpler control structures and devote those transistors to supporting more computation in the data path. The way we build that data path in the GPU is with a large number of parallel compute units. Individually, these compute units are small, simple, and power-efficient.

7.  Build a power efficient processor

  Two possible optimization goals:

    • Minimize latency (execution time of a single task)
    • Maximize throughput (tasks completed per unit time: stuff/time, jobs/hour)

    Note: these two goals are not necessarily aligned.

8. Latency vs Bandwidth

  Improved latency often leads to improved throughput, and vice versa. But GPU designers really prioritize throughput.

9. Core GPU design tenets

  • Lots of simple compute units; trade simple control for more compute
  • Explicitly parallel programming model
  • Optimize for throughput, not latency

10. GPU from the point of view of the developer

  An 8-core Intel Ivy Bridge processor has 8 cores; each core supports 8-wide AVX vector operations and two simultaneously running threads. Multiplying those together gives 128-way parallelism, so even a modern CPU already demands parallel thinking from the developer.
