CS4022 High Performance: Principles Explained

Posted by 实诚人病

UNIVERSITY OF WARWICK
LEVEL 7 Open Book Assessment [2 hours]
Department of Computer Science
CS4022: High Performance Computing
Instructions

  1. Read all instructions carefully – and read through the entire paper at least
    once before you start writing.
  2. There are four questions. You should attempt two questions from Section
    A and the one question in Section B.
    You should not submit answers to more than the required number of
    questions.
  3. All questions will carry the same number of marks unless otherwise stated.
  4. You should handwrite your answers either with paper and pen or using an
    electronic device with a stylus (unless you have special arrangements for
    exams which allow the use of a computer). Start each question on a new
    page and clearly mark each page with the page number, your student id and
    the question number.
    Handwritten notes must be scanned or photographed and all individual
    solutions should (if you possibly can) be collated into a single PDF with pages
    in the correct order.
    You must upload two files to the AEP: your PDF of solutions and a completed
    cover sheet.
    You must click FINISH ASSESSMENT to complete the submission process.
    After you have done so you will not be able to upload anything further.
  5. Please ensure that all your handwritten answers are written legibly, preferably
    in dark blue or black ink. If you use a pencil ensure that it is not too faint to be
    captured by a scan or photograph.
  6. Please check the legibility of your final submission before uploading. It is your
    responsibility to ensure that your work can be read.
  7. You are allowed to access module materials, notes, resources, references
    and the internet during the assessment.
  8. You should not try to communicate with any other candidate during the
    assessment period or seek assistance from anyone else in completing your
    answers. The Computer Science Department expects the conduct of all
    students taking this assessment to conform to the stated requirements.
    Measures will be in operation to check for possible misconduct. These will
    include the use of similarity detection tools and the right to require live
    interviews with selected students following the assessment.
  9. By starting this assessment you are declaring yourself fit to undertake it. You
    are expected to make a reasonable attempt at the assessment by answering
    the questions in the paper.
    Please note that:
  10. You must have completed and uploaded your assessment before the 24
    hour assessment window closes.
  11. You have an additional 45 minutes beyond the stated length of the paper to
    allow for downloading and uploading the assessment, your files and
    technical delays.
  12. For further details you should refer to the AEP documentation.
    Use the AEP to seek advice immediately if during the assessment period:
    • you cannot access the online assessment;
    • you believe you have been given access to the wrong online assessment.
    Please note that technical support is only available between 9AM and 5PM (BST).
    Invigilator support will also be available (via the AEP) between 9AM and 5PM
    (BST).
    Notify Dcs.exams@warwick.ac.uk as soon as possible if you cannot complete
    your assessment because:
    • you lose your internet connection;
    • your device fails;
    • you become unwell and are unable to continue;
    • you are affected by circumstances beyond your control (e.g. fire alarm).
    Please note that this is for notification purposes, it is not a help line.
    Your assessment starts below.
    Section A
  1. This question is about fundamental knowledge.
    (a) What do we mean by the Granularity of Parallelism? Give four types of parallelism in
    order of granularity and provide an application example for each. [7]
    (b) Discuss how superthreading and hyperthreading reduce the waste of pipeline slots in the
    pipeline mechanism. [8]
    (c) Discuss the differences between scientific applications such as matrix multiplication and
    graph-based applications such as online-shopping recommendation. Focus your
    discussions on data structure, performance metric and key factors that affect the
    performance. [12]
    (d) Analyse the following two “for” loops in Listing 1. Describe whether the iterations of
    these two loops can be parallelised automatically by compilers and explain how you
    reached your conclusions. [8]
    Loop 1:
    for (i = 1; i <= n; i++)
    {
        a[i] = b[i] + c[i];
        d[i] = a[i];
    }
    Loop 2:
    for (i = 2; i <= n; i++)
        a[i] = b[i] + a[i-1];
    Listing 1: Two loops for Question 1(d)
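
One informal way to approach Question 1(d) is to ask whether the order of the iterations matters. The sketch below is only an illustration under stated assumptions: the helper names (`run_loop1`, `run_loop2`, the `reversed` flag) and the use of `int` arrays are hypothetical and not part of the paper. It runs each loop of Listing 1 forwards and backwards so the results can be compared. A compiler decides this by static dependence analysis rather than by running the loop, and agreement between two orders is a hint, not a proof.

```c
#include <assert.h>
#include <stddef.h>

/* One iteration of Loop 1: touches only element i of each array. */
static void loop1_iter(int *a, const int *b, const int *c, int *d, size_t i)
{
    a[i] = b[i] + c[i];
    d[i] = a[i];
}

/* One iteration of Loop 2: reads a[i-1], which iteration i-1 writes
   (a loop-carried flow dependence). */
static void loop2_iter(int *a, const int *b, size_t i)
{
    a[i] = b[i] + a[i - 1];
}

/* Run Loop 1 for i = 1..n, in increasing or (if reversed) decreasing order. */
void run_loop1(int *a, const int *b, const int *c, int *d, size_t n, int reversed)
{
    assert(a != NULL && b != NULL && c != NULL && d != NULL);
    for (size_t k = 1; k <= n; k++)
        loop1_iter(a, b, c, d, reversed ? n + 1 - k : k);
}

/* Run Loop 2 for i = 2..n, in increasing or (if reversed) decreasing order. */
void run_loop2(int *a, const int *b, size_t n, int reversed)
{
    assert(a != NULL && b != NULL);
    for (size_t k = 2; k <= n; k++)
        loop2_iter(a, b, reversed ? n + 2 - k : k);
}
```

Calling each function once with `reversed` set to 0 and once with 1 on identical inputs shows that Loop 1 produces the same arrays either way, while Loop 2 generally does not, because each `a[i]` needs the value of `a[i-1]` computed by the previous iteration.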
  2. This question is about parallel programming models.
    (a) The Synchronous mode is a communication mode in MPI. Explain why the Synchronous
    mode may incur higher communication overhead than the Standard mode. [7]
    (b) Assume there are two MPI processes running on different machines: p0 and p1. In p0,
    MPI_Send is first called to send message A to p1 and then MPI_Recv is called to receive
    message B from p1. In p1, MPI_Send is first called to send message B to p0 and then
    MPI_Recv is called to receive message A from p0. What will happen if the sizes of both
    message A and B exceed the system buffers managed by MPI? Explain why. [8]
    (c) A collective communication operation is performed by all relevant processes at the same
    time with the same set of parameters. However the parameters may have different
    meanings to different processes. Describe, using illustrative examples if necessary, the
    operations of the following two MPI collective communication calls. Further, discuss
    what the parameters in these functions mean to different processes.
    i) MPI_Bcast(void *buf, int count, MPI_Datatype type, int root, MPI_Comm
    Comm) [6]
    ii) MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype,
    void *recvbuf, int recvcount, MPI_Datatype recvtype, int root,
    MPI_Comm comm) [6]
    (d) MPI_Type_create_indexed_block can be used to construct the users’ own data types. The
    format of the function is as follows:
    MPI_Type_create_indexed_block ( int count,
                                    int blocklength,
                                    int *array_of_displacements,
                                    MPI_Datatype oldtype,
                                    MPI_Datatype *newtype)
    Let oldtype ={(MPI_INT, 0), (MPI_CHAR, 2)} with the extent of 3 bytes.
    Let D=(2, 5, 10).
    Give the memory layout of newtype after calling
    MPI_Type_create_indexed_block (3, 2, D, oldtype, newtype) [8]
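
The layout asked for in Question 2(d) can be explored without an MPI installation by expanding the type map by hand. The sketch below is a plain C illustration, not MPI code: `TypemapEntry` and `indexed_block_typemap` are hypothetical names introduced here. It assumes the MPI convention that the displacements passed to MPI_Type_create_indexed_block are counted in multiples of the oldtype extent, so block k starts at byte `displs[k] * old_extent` and holds `blocklength` consecutive copies of oldtype.

```c
#include <assert.h>
#include <stddef.h>

/* One (basic type, byte displacement) pair of a type map. */
typedef struct {
    const char *type; /* name of the basic type, e.g. "MPI_INT" */
    long disp;        /* displacement in bytes */
} TypemapEntry;

/* Expand an indexed-block type into its type map.
   Block k starts at displs[k] * old_extent bytes and contains
   blocklength consecutive copies of oldtype.
   Writes the entries to out and returns how many were written. */
size_t indexed_block_typemap(int count, int blocklength, const int *displs,
                             const TypemapEntry *oldtype, size_t old_n,
                             long old_extent, TypemapEntry *out, size_t out_cap)
{
    size_t n = 0;
    assert((size_t)count * (size_t)blocklength * old_n <= out_cap);
    for (int k = 0; k < count; k++) {
        for (int j = 0; j < blocklength; j++) {
            for (size_t e = 0; e < old_n; e++) {
                out[n].type = oldtype[e].type;
                out[n].disp = ((long)displs[k] + j) * old_extent + oldtype[e].disp;
                n++;
            }
        }
    }
    return n;
}
```

With the values from the question (oldtype of two entries at offsets 0 and 2, extent 3, count 3, block length 2, D = (2, 5, 10)) the function produces 12 entries, and the first block begins at byte 2 × 3 = 6.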
  3. This question is about high performance computing systems.
    (a) Discuss the differences between multicore CPU and GPU in terms of architecture design
    and performance objective. [7]
    (b) The topology of node interconnection plays an important role in the performance of a
    Cluster system. Draw the topology of a 4-D hypercube. What are the values of node
    degree and bisection width of the topology? Discuss which aspect of network
    performance node degree and bisection width represent. [8]
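
The hypercube in Question 3(b) can be described compactly by labelling its nodes with d-bit integers, two nodes being linked when their labels differ in exactly one bit. The sketch below assumes that labelling; the function names are hypothetical. It counts a node's degree by brute force and counts the links cut when the cube is split into two equal halves on the top label bit. The code only counts the links for that particular cut; that no other equal split cuts fewer links is the standard result for hypercubes and is not something the code proves.

```c
#include <assert.h>

/* Number of nodes in a d-dimensional hypercube: 2^d. */
unsigned hypercube_nodes(unsigned d)
{
    assert(d < 31);
    return 1u << d;
}

/* Neighbour of `node` along dimension k: flip bit k of the label. */
unsigned hypercube_neighbour(unsigned node, unsigned k)
{
    return node ^ (1u << k);
}

/* Degree of `node`: number of labels that differ from it in exactly one bit. */
unsigned hypercube_degree(unsigned d, unsigned node)
{
    unsigned degree = 0;
    for (unsigned other = 0; other < hypercube_nodes(d); other++) {
        unsigned diff = node ^ other;
        if (diff != 0 && (diff & (diff - 1)) == 0) /* exactly one bit set */
            degree++;
    }
    return degree;
}

/* Links cut when the nodes are split into two halves by the top label bit. */
unsigned hypercube_cut_links(unsigned d)
{
    assert(d >= 1);
    unsigned top = 1u << (d - 1);
    unsigned cut = 0;
    for (unsigned u = 0; u < hypercube_nodes(d); u++) {
        for (unsigned k = 0; k < d; k++) {
            unsigned v = hypercube_neighbour(u, k);
            if (u < v && ((u ^ v) & top)) /* count each crossing link once */
                cut++;
        }
    }
    return cut;
}
```

For d = 4 this gives 16 nodes, 4 links per node, and 8 links crossing the split, which matches the general pattern of degree d and 2^(d-1) crossing links.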
    (c) Discuss the difference between Cluster systems and Grid systems. [8]
    (d) There are three potential methods to implement parallel I/O: 1) One process performs I/O
    operations for all other processes; 2) Each process reads or writes the data from or to a
    separate file; 3) Different processes access different parts of a common file. Discuss the
    advantages and disadvantages of each method. Which method of parallel I/O is most
    widely used nowadays? [12]
    Section B
  4. This question is about performance modelling.
    (a) Consider a 3-D grid of equal-sized cells. Assume that the volume of the grid is V and the
    grid is a cube (i.e., the length of the grid in each dimension is V^(1/3)). Assume V=c×n,
    where c is the number of cells allocated to each processor and n is the number of
    processors. Derive the surface-to-volume ratios under 1-D, 2-D and 3-D decomposition.
    Further, analyse under what circumstances 2-D decomposition is better than 1-D
    decomposition. [12]
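
A possible outline of the derivation in Question 4(a) is sketched below. It rests on assumptions that are not stated in the paper: the subdomain is an interior one (it has neighbours on every cut face), n is a perfect square for the 2-D case and a perfect cube for the 3-D case, "surface" S means the area of the subdomain faces shared with neighbouring processors, and "volume" is the V/n = c cells owned by one processor.

```latex
\begin{align*}
\text{1-D (slabs):}\quad
  & S_1 = 2V^{2/3},
  & \frac{S_1}{V/n} &= \frac{2n}{V^{1/3}} \\
\text{2-D (columns):}\quad
  & S_2 = 4\,\frac{V^{1/3}}{n^{1/2}}\,V^{1/3} = \frac{4V^{2/3}}{n^{1/2}},
  & \frac{S_2}{V/n} &= \frac{4n^{1/2}}{V^{1/3}} \\
\text{3-D (cubes):}\quad
  & S_3 = 6\left(\frac{V}{n}\right)^{2/3},
  & \frac{S_3}{V/n} &= \frac{6n^{1/3}}{V^{1/3}}
\end{align*}
% 2-D has the smaller ratio when 4 n^{1/2} < 2 n, i.e. n^{1/2} > 2, i.e. n > 4.
```

Under these assumptions the two ratios coincide at n = 4, and 2-D decomposition has the lower surface-to-volume ratio once n exceeds 4.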
    (b) Discuss the drawbacks of using asymptotic analysis to evaluate the performance of an
    algorithm. Give an example for each drawback you list. [8]
    (c) Modelling the execution time of an application is a good way of evaluating the
    performance of the application. Discuss how to model the execution time of an
    application. The discussion should cover the modelling of both computation time and
    communication time, and the discussion should revolve around the various parameters
    used to model the execution time. [10]