CS4022 High Performance: Principles Explained
Posted by 实诚人病
Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers material related to CS4022 High Performance, and we hope it is of some reference value to you.
UNIVERSITY OF WARWICK
LEVEL 7 Open Book Assessment [2 hours]
Department of Computer Science
CS4022: High Performance Computing
Instructions
- Read all instructions carefully, and read through the entire paper at least once before you start writing.
- There are four questions. You should attempt two questions from Section A and the one question in Section B. You should not submit answers to more than the required number of questions.
- All questions will carry the same number of marks unless otherwise stated.
- You should handwrite your answers either with paper and pen or using an
electronic device with a stylus (unless you have special arrangements for
exams which allow the use of a computer). Start each question on a new
page and clearly mark each page with the page number, your student id and
the question number.
Handwritten notes must be scanned or photographed and all individual
solutions should (if you possibly can) be collated into a single PDF with pages
in the correct order.
You must upload two files to the AEP: your PDF of solutions and a completed
cover sheet.
You must click FINISH ASSESSMENT to complete the submission process.
After you have done so you will not be able to upload anything further.
- Please ensure that all your handwritten answers are written legibly, preferably in dark blue or black ink. If you use a pencil, ensure that it is not too faint to be captured by a scan or photograph.
- Please check the legibility of your final submission before uploading. It is your responsibility to ensure that your work can be read.
- You are allowed to access module materials, notes, resources, references and the internet during the assessment.
- You should not try to communicate with any other candidate during the assessment period or seek assistance from anyone else in completing your answers. The Computer Science Department expects the conduct of all students taking this assessment to conform to the stated requirements. Measures will be in operation to check for possible misconduct. These will include the use of similarity detection tools and the right to require live interviews with selected students following the assessment.
- By starting this assessment you are declaring yourself fit to undertake it. You are expected to make a reasonable attempt at the assessment by answering the questions in the paper.
Please note that:
- You must have completed and uploaded your assessment before the 24-hour assessment window closes.
- You have an additional 45 minutes beyond the stated length of the paper to allow for downloading and uploading the assessment, your files and technical delays.
- For further details you should refer to the AEP documentation.
Use the AEP to seek advice immediately if during the assessment period:
• you cannot access the online assessment;
• you believe you have been given access to the wrong online assessment.
Please note that technical support is only available between 9AM and 5PM (BST).
Invigilator support will also be available (via the AEP) between 9AM and 5PM (BST).
Notify Dcs.exams@warwick.ac.uk as soon as possible if you cannot complete
your assessment because:
• you lose your internet connection;
• your device fails;
• you become unwell and are unable to continue;
• you are affected by circumstances beyond your control (e.g. fire alarm).
Please note that this is for notification purposes; it is not a help line.
Your assessment starts below.
Section A
1. This question is about fundamental knowledge.
(a) What do we mean by the Granularity of Parallelism? Give four types of parallelism in
order of granularity and provide an application example for each. [7]
(b) Discuss how superthreading and hyperthreading reduce the number of wasted pipeline
slots in a processor pipeline. [8]
(c) Discuss the differences between scientific applications such as matrix multiplication and
graph-based applications such as online-shopping recommendation. Focus your
discussions on data structure, performance metric and key factors that affect the
performance. [12]
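To make the data-structure contrast in part (c) concrete, here is a small sketch. It compares a dense matrix multiplication with a neighbour sum over a graph stored in compressed sparse row (CSR) form. CSR, the function names and the tiny demo inputs are illustrative assumptions, not part of the question. The point is the access pattern: the matrix loop's addresses depend only on the loop indices, whereas the graph loop loads `score[adj[e]]`, an address that depends on the data itself.

```c
#include <assert.h>

/* Dense C = A * B for n x n row-major matrices. The access pattern is
 * fixed by the loop indices alone, so it is regular and predictable. */
void matmul(int n, const double *A, const double *B, double *C)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Graph in CSR form: the neighbours of vertex v are
 * adj[row_ptr[v]] .. adj[row_ptr[v + 1] - 1]. The load score[adj[e]]
 * is indirect, so which memory is touched depends on the graph. */
void neighbour_sum(int nv, const int *row_ptr, const int *adj,
                   const double *score, double *out)
{
    for (int v = 0; v < nv; v++) {
        double s = 0.0;
        for (int e = row_ptr[v]; e < row_ptr[v + 1]; e++)
            s += score[adj[e]];
        out[v] = s;
    }
}

/* Demo: [[1,2],[3,4]] * [[5,6],[7,8]]; returns element idx of C. */
double matmul_demo(int idx)
{
    assert(idx >= 0 && idx < 4);
    const double A[4] = {1, 2, 3, 4}, B[4] = {5, 6, 7, 8};
    double C[4];
    matmul(2, A, B, C);
    return C[idx];
}

/* Demo: undirected path 0-1-2 with scores {10, 20, 30};
 * returns the sum of the neighbours' scores for vertex v. */
double graph_demo(int v)
{
    assert(v >= 0 && v < 3);
    const int row_ptr[4] = {0, 1, 3, 4};
    const int adj[4] = {1, 0, 2, 1};
    const double score[3] = {10, 20, 30};
    double out[3];
    neighbour_sum(3, row_ptr, adj, score, out);
    return out[v];
}
```

The metrics commonly quoted follow the same split: FLOP/s for dense kernels such as matrix multiplication, and traversed edges per second (TEPS) for graph kernels such as those in the Graph500 benchmark.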
(d) Analyse the following two “for” loops in Listing 1. Describe whether the iterations of
these two loops can be parallelised automatically by compilers and explain how you
reached your conclusions. [8]
Loop 1:
for (i = 1; i <= n; i++)
{
    a[i] = b[i] + c[i];
    d[i] = a[i];
}
Loop 2:
for (i = 2; i <= n; i++)
    a[i] = b[i] + a[i-1];
Listing 1: Two loops for Question 1(d)
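Here is a minimal sketch of the dependence argument for Listing 1. It assumes `double` arrays indexed from 1 to n, so each array has n + 1 elements. The function names `loop1`, `loop2`, `loop2_reversed`, `loop1_d3` and `loop2_last` are hypothetical.

- **Loop 1** only touches element i in iteration i, so an OpenMP `parallel for` is safe.
- **Loop 2** reads `a[i-1]`, which the previous iteration wrote. Running its iterations in a different order changes the result, which is the practical symptom of a loop-carried dependence.

```c
#include <assert.h>

/* Loop 1 from Listing 1. Iteration i reads b[i] and c[i] and writes a[i]
 * and d[i]; the read of a[i] sees the value written in the same
 * iteration. No element is shared between iterations, so there is no
 * loop-carried dependence. The pragma is ignored without OpenMP. */
void loop1(int n, double *a, const double *b, const double *c, double *d)
{
    #pragma omp parallel for
    for (int i = 1; i <= n; i++) {
        a[i] = b[i] + c[i];
        d[i] = a[i];
    }
}

/* Loop 2 from Listing 1. Iteration i reads a[i-1], written by iteration
 * i-1: a read-after-write (flow) dependence across iterations, so the
 * loop is a recurrence and cannot simply be split across threads. */
void loop2(int n, double *a, const double *b)
{
    for (int i = 2; i <= n; i++)
        a[i] = b[i] + a[i - 1];
}

/* The same statement as loop2, with the iterations run in reverse. */
void loop2_reversed(int n, double *a, const double *b)
{
    for (int i = n; i >= 2; i--)
        a[i] = b[i] + a[i - 1];
}

/* Runs Loop 1 on hypothetical inputs and returns d[3]. */
double loop1_d3(void)
{
    double a[4], d[4];
    const double b[4] = {0, 1, 2, 3}, c[4] = {0, 10, 20, 30};
    loop1(3, a, b, c, d);
    assert(d[3] == a[3]); /* d[i] copies a[i] from the same iteration */
    return d[3];
}

/* Runs Loop 2 forwards (reversed == 0) or backwards (reversed == 1) on
 * identical hypothetical inputs and returns the final element a[4]. */
double loop2_last(int reversed)
{
    double a[5] = {0, 1, 0, 0, 0};
    const double b[5] = {0, 0, 1, 1, 1};
    if (reversed)
        loop2_reversed(4, a, b);
    else
        loop2(4, a, b);
    return a[4];
}
```

Forward order yields a[4] = 4 (a running sum), while reverse order yields a[4] = 1, because a[3] still holds its initial 0 when it is read.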
2. This question is about parallel programming models.
(a) The Synchronous mode is a communication mode in MPI. Explain why the Synchronous
mode may incur higher communication overhead than the Standard mode. [7]
(b) Assume there are two MPI processes running on different machines: P0 and P1. In P0,
MPI_Send is first called to send message A to P1 and then MPI_Recv is called to receive
message B from P1. In P1, MPI_Send is first called to send message B to P0 and then
MPI_Recv is called to receive message A from P0. What will happen if the sizes of both
messages A and B exceed the system buffers managed by MPI? Explain why. [8]
(c) A collective communication operation is performed by all relevant processes at the same
time with the same set of parameters. However, the parameters may have different
meanings to different processes. Describe, using illustrative examples if necessary, the
operations of the following two MPI collective communication calls. Further, discuss
what the parameters in these functions mean to different processes.
i) MPI_Bcast(void *buf, int count, MPI_Datatype type, int root, MPI_Comm
comm) [6]
ii) MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype,
void *recvbuf, int recvcount, MPI_Datatype recvtype, int root,
MPI_Comm comm) [6]
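The root-dependent meaning of the parameters can be illustrated with a serial model in which `buf[r]` stands for rank r's local memory. This is a sketch under stated assumptions:

- 4 ranks, 2 ints per rank;
- `sendcnt == recvcount`, with equal send and receive types;
- `model_bcast`, `model_gather` and the demo helpers are hypothetical names, and no MPI library is called.

In `MPI_Bcast`, `buf` is the input on the root and the output on every other rank. In `MPI_Gather`, `recvbuf`, `recvcount` and `recvtype` are significant only at the root, and `recvcount` is the number of items received from each rank, not the total.

```c
#include <assert.h>
#include <string.h>

#define NPROCS 4 /* hypothetical communicator size                 */
#define CHUNK  2 /* sendcnt == recvcount: ints contributed per rank */

/* Serial model of MPI_Bcast(buf, CHUNK, MPI_INT, root, comm):
 * buf[root] is read; buf[r] for every other rank r is overwritten. */
void model_bcast(int buf[NPROCS][CHUNK], int root)
{
    assert(root >= 0 && root < NPROCS);
    for (int r = 0; r < NPROCS; r++)
        if (r != root)
            memcpy(buf[r], buf[root], CHUNK * sizeof(int));
}

/* Serial model of MPI_Gather(sendbuf, CHUNK, MPI_INT,
 *                            recvbuf, CHUNK, MPI_INT, root, comm):
 * every rank, the root included, contributes sendbuf[r]; only the
 * root's recvbuf is written, with rank r's block at offset r * CHUNK. */
void model_gather(int sendbuf[NPROCS][CHUNK], int root_recvbuf[NPROCS * CHUNK])
{
    for (int r = 0; r < NPROCS; r++)
        memcpy(&root_recvbuf[r * CHUNK], sendbuf[r], CHUNK * sizeof(int));
}

/* Demo: root 2 holds {7, 8}; returns element k of `rank`'s buffer
 * after the broadcast. */
int bcast_demo(int rank, int k)
{
    int buf[NPROCS][CHUNK] = {{0}};
    buf[2][0] = 7;
    buf[2][1] = 8;
    model_bcast(buf, 2);
    return buf[rank][k];
}

/* Demo: rank r sends {10r, 10r + 1}; returns element k of the root's
 * recvbuf after the gather. */
int gather_demo(int k)
{
    int send[NPROCS][CHUNK];
    int recv[NPROCS * CHUNK];
    for (int r = 0; r < NPROCS; r++) {
        send[r][0] = 10 * r;
        send[r][1] = 10 * r + 1;
    }
    model_gather(send, recv);
    return recv[k];
}
```

After the gather, the root's buffer holds {0, 1, 10, 11, 20, 21, 30, 31}: blocks appear in rank order, whichever rank is the root.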
(d) MPI_Type_create_indexed_block can be used to construct user-defined data types. The
format of the function is as follows:
MPI_Type_create_indexed_block ( int count,
int blocklength,
int *array_of_displacements,
MPI_Datatype oldtype,
MPI_Datatype *newtype)
Let oldtype = {(MPI_INT, 0), (MPI_CHAR, 2)} with an extent of 3 bytes.
Let D = (2, 5, 10).
Give the memory layout of newtype after calling
MPI_Type_create_indexed_block (3, 2, D, oldtype, &newtype) [8]
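The layout can be checked mechanically by expanding the type map. The sketch below takes the question's `oldtype` and its 3-byte extent at face value. It relies on the rule that `MPI_Type_create_indexed_block` counts displacements in multiples of the old type's extent, unlike `MPI_Type_create_hindexed_block`, which uses bytes. `TypeEntry`, `expand_indexed_block` and the helper functions are hypothetical names, and no MPI library is called.

```c
#include <assert.h>
#include <stdio.h>

/* One (basic type, byte displacement) pair of a type map. */
typedef struct {
    const char *type;
    int disp;
} TypeEntry;

/* oldtype from the question: {(MPI_INT, 0), (MPI_CHAR, 2)}, extent 3. */
static const TypeEntry OLDTYPE[] = {{"MPI_INT", 0}, {"MPI_CHAR", 2}};
enum { OLD_LEN = 2, OLD_EXTENT = 3 };

/* Expands an indexed-block type: block i starts at disp[i] * OLD_EXTENT
 * bytes and holds blocklength consecutive copies of oldtype. Writes
 * count * blocklength * OLD_LEN entries to out and returns that number. */
int expand_indexed_block(int count, int blocklength, const int *disp,
                         TypeEntry *out)
{
    int n = 0;
    for (int i = 0; i < count; i++)
        for (int j = 0; j < blocklength; j++) {
            int base = (disp[i] + j) * OLD_EXTENT; /* start of copy j */
            for (int k = 0; k < OLD_LEN; k++) {
                out[n].type = OLDTYPE[k].type;
                out[n].disp = base + OLDTYPE[k].disp;
                n++;
            }
        }
    return n;
}

/* The call from the question: count = 3, blocklength = 2, D = (2, 5, 10). */
static int question_map(TypeEntry out[12])
{
    static const int D[3] = {2, 5, 10};
    return expand_indexed_block(3, 2, D, out);
}

/* Byte displacement of entry idx of newtype's type map. */
int question_disp(int idx)
{
    TypeEntry map[12];
    int n = question_map(map);
    assert(idx >= 0 && idx < n);
    return map[idx].disp;
}

/* Prints newtype's type map, one (type, displacement) pair per line. */
void print_question_layout(void)
{
    TypeEntry map[12];
    int n = question_map(map);
    for (int i = 0; i < n; i++)
        printf("(%s, %d)\n", map[i].type, map[i].disp);
}
```

Under these assumptions the type map is:

{(MPI_INT, 6), (MPI_CHAR, 8), (MPI_INT, 9), (MPI_CHAR, 11), (MPI_INT, 15), (MPI_CHAR, 17), (MPI_INT, 18), (MPI_CHAR, 20), (MPI_INT, 30), (MPI_CHAR, 32), (MPI_INT, 33), (MPI_CHAR, 35)}

That is, three blocks starting at bytes 6, 15 and 30, each spanning two extents (6 bytes).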
3. This question is about high performance computing systems.
(a) Discuss the differences between multicore CPUs and GPUs in terms of architecture design
and performance objective. [7]
(b) The topology of node interconnection plays an important role in the performance of a
Cluster system. Draw the topology of a 4-D hypercube. What are the values of node
degree and bisection width of the topology? Discuss which aspect of network
performance node degree and bisection width represent. [8]
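In a d-dimensional hypercube the 2^d nodes carry d-bit labels, and two nodes are linked exactly when their labels differ in one bit. This gives degree d and bisection width 2^(d-1); for d = 4 that is 16 nodes, degree 4 and bisection width 8.

The sketch below checks these numbers by brute force; the function names are hypothetical. The cut it counts separates nodes by their highest label bit, which is one minimum bisection of the hypercube.

```c
#include <assert.h>

/* Nodes a and b are neighbours iff a XOR b has exactly one set bit. */
static int adjacent(int a, int b)
{
    int diff = a ^ b;
    return diff != 0 && (diff & (diff - 1)) == 0;
}

/* Node degree: number of nodes adjacent to `node` in a d-cube. */
int hypercube_degree(int d, int node)
{
    assert(d >= 1 && node >= 0 && node < (1 << d));
    int deg = 0;
    for (int other = 0; other < (1 << d); other++)
        if (adjacent(node, other))
            deg++;
    return deg;
}

/* Links crossing the cut between nodes whose highest bit is 0 and nodes
 * whose highest bit is 1 (two (d-1)-cubes of equal size). */
int hypercube_cut_links(int d)
{
    assert(d >= 1);
    int n = 1 << d, top = 1 << (d - 1), cut = 0;
    for (int u = 0; u < n; u++)
        for (int v = u + 1; v < n; v++)
            if (adjacent(u, v) && (u & top) != (v & top))
                cut++;
    return cut;
}
```

The same labelling gives a way to draw the 4-D cube: two 3-D cubes (leading bit 0 and leading bit 1) with each pair of corresponding corners joined by a link.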
(c) Discuss the difference between Cluster systems and Grid systems. [8]
(d) There are three potential methods to implement parallel I/O: 1) One process performs I/O
operations for all other processes; 2) Each process reads or writes the data from or to a
separate file; 3) Different processes access different parts of a common file. Discuss the
advantages and disadvantages of each method. Which method of parallel I/O is most
widely used nowadays? [12]
Section B
4. This question is about performance modelling.
(a) Consider a 3-D grid of equal-sized cells. Assume that the volume of the grid is V and the
grid is a cube (i.e., the length of the grid in each dimension is $V^{1/3}$). Assume $V = c \times n$,
where c is the number of cells allocated to each processor and n is the number of
processors. Derive the surface-to-volume ratios under 1-D, 2-D and 3-D decomposition.
Further, analyse under what circumstances 2-D decomposition is better than 1-D
decomposition. [12]
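A worked version of the ratios in part (a) follows, as a sketch under the usual simplifying assumptions:

- every processor exchanges data across all of its internal faces, with boundary processors treated like interior ones;
- n is a perfect square for the 2-D case and a perfect cube for the 3-D case;
- surface is measured in cell faces.

With $L = V^{1/3} = (cn)^{1/3}$:

$$
\begin{aligned}
\text{1-D (slabs of } L \times L \times L/n\text{):} \quad \frac{S}{V} &= \frac{2L^2}{c} = \frac{2\,n^{2/3}}{c^{1/3}} \\
\text{2-D (columns of } \tfrac{L}{\sqrt{n}} \times \tfrac{L}{\sqrt{n}} \times L\text{):} \quad \frac{S}{V} &= \frac{4L^2/\sqrt{n}}{c} = \frac{4\,n^{1/6}}{c^{1/3}} \\
\text{3-D (cubes of side } c^{1/3}\text{):} \quad \frac{S}{V} &= \frac{6\,c^{2/3}}{c} = \frac{6}{c^{1/3}}
\end{aligned}
$$

Under these assumptions 2-D has the smaller ratio when $4n^{1/6} < 2n^{2/3}$, i.e. $n^{1/2} > 2$. That holds for $n > 4$ processors; the two are equal at $n = 4$.

This compares communication volume only. In 2-D each processor exchanges messages with four neighbours instead of two, so when message startup cost dominates, the advantage can shrink.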
(b) Discuss the drawbacks of using asymptotic analysis to evaluate the performance of an
algorithm. Give an example for each drawback you list. [8]
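One frequently cited drawback is that asymptotic notation discards constant factors. Two hypothetical cost functions show this: an O(n) algorithm costing 1000n operations and an O(n^2) algorithm costing 10n^2. The sketch finds the problem size at which the asymptotically better algorithm actually becomes cheaper. Both cost functions are invented for illustration.

```c
#include <assert.h>

/* Hypothetical operation counts for two algorithms. */
long long cost_linear(long long n)    { return 1000 * n; }   /* O(n)   */
long long cost_quadratic(long long n) { return 10 * n * n; } /* O(n^2) */

/* Smallest n at which the O(n) algorithm is strictly cheaper. */
long long crossover(void)
{
    long long n = 1;
    while (cost_linear(n) >= cost_quadratic(n)) {
        n++;
        assert(n < 1000000); /* guard: these costs must cross eventually */
    }
    return n;
}
```

Up to n = 100 the O(n^2) algorithm needs no more operations than the O(n) one. For small problems the asymptotic ranking alone would therefore pick the slower algorithm.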
(c) Modelling the execution time of an application is a good way of evaluating the
performance of the application. Discuss how to model the execution time of an
application. The discussion should cover the modelling of both computation time and
communication time, and the discussion should revolve around the various parameters
used to model the execution time. [10]
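A common way to organise such a model is to treat per-processor time as computation plus communication. Each message costs a fixed startup latency t_s plus a per-word transfer time t_w. The sketch below encodes that model. The parameter names, the assumption that computation and communication do not overlap, and the example values are all illustrative rather than taken from the question.

```c
#include <assert.h>

/* Hypothetical machine parameters, all in the same time unit. */
typedef struct {
    double t_c; /* computation time per unit of work (e.g. per cell) */
    double t_s; /* message startup latency                          */
    double t_w; /* transfer time per word                           */
} MachineParams;

/* Cost of one message of m words: t_s + m * t_w. */
double message_time(const MachineParams *p, double m)
{
    assert(m >= 0);
    return p->t_s + m * p->t_w;
}

/* Per-processor time for one step, assuming no overlap of computation
 * and communication:  T = work * t_c + msgs * (t_s + words * t_w). */
double step_time(const MachineParams *p, double work, int msgs, double words)
{
    assert(work >= 0 && msgs >= 0);
    return work * p->t_c + msgs * message_time(p, words);
}

/* Example: a 3-D decomposed step with c = 1000 cells and 6 face messages
 * of c^(2/3) = 100 words each, under hypothetical parameters. */
double example_step_time(void)
{
    const MachineParams p = {2.0, 10.0, 0.5};
    return step_time(&p, 1000.0, 6, 100.0);
}
```

With t_c = 2, t_s = 10 and t_w = 0.5, the example gives T = 1000 × 2 + 6 × (10 + 100 × 0.5) = 2360 time units. Changing one parameter at a time shows which term dominates.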