Become a Readings Member to make your shopping experience even easier. Sign in or sign up for free!

Become a Readings Member. Sign in or sign up for free!

Hello Readings Member! Go to the member centre to view your orders, change your details, or view your lists, or sign out.

Hello Readings Member! Go to the member centre or sign out.

Compiling Parallel Loops for High Performance Computers: Partitioning, Data Assignment and Remapping
Hardback

Compiling Parallel Loops for High Performance Computers: Partitioning, Data Assignment and Remapping

$276.99
Sign in or become a Readings Member to add this title to your wishlist.

This title is printed to order. This book may have been self-published. If so, we cannot guarantee the quality of the content. In the main most books will have gone through the editing process however some may not. We therefore suggest that you be aware of this before ordering this book. If in doubt check either the author or publisher’s details as we are unable to accept any returns unless they are faulty. Please contact us if you have any questions.

The exploitation of parallel processing to improve computing speeds is being examined at virtually all levels of computer science, from the study of parallel algorithms to the development of microarchitectures which employ multiple functional units. The most visible aspect of this interest in parallel processing is the commercially available multiprocessor systems which have appeared in the past decade. Unfortunately, the lack of adequate software support for the development of scientific applications that will run efficiently on multiple processors has stunted the acceptance of such systems. One of the major impediments to achieving high parallel efficiency on many data-parallel scientific applications is communications overhead, which is exemplified by cache coherency traffic and global memory accesses. This book presents compile-time techniques to reduce the overhead of interprocessors with a logically shared address space and physically distributed memory. Such techniques can be used by scientific application designers seeking to optimize code for a particular high-performance computer. In addition, these techniques can be seen as a necessary step toward developing software to support efficient parallel programmes. In multiprocessor systems with physically distributed memory, reducing communication overhead involves both data partitioning and data placement. Adaptive Data Partitioning (ADP) reduces the execution time of parallel programmes by minimizing interprocessor communication for iterative data-parallel loops with near-neighbour communication. Data placement schemes are presented that reduce communication overhead. Under the loop partition specified by ADP, global data is partitioned into classes for each processor, allowing each processor to cache certain regions of the global data set. In addition, for many scientific applications, peak parallel efficiency is achieved only when machine-specific tradeoffs between load imbalance and communication are evaluated and utilized in choosing the data partition. The techniques in this book evaluate these tradeoffs to generate optimum cyclic partitions for data-parallel loops with either a linearly varying or uniform computational structure and either neighbourhood or dimensional multicast communication patterns. This tradeoff is also treated within the CPR (Collective Partitioning and Remapping) algorithm, which partitions a collection of loops with various computational structures and communication patterns. Experiments that demonstrate the advantage of ADP, data placement, cyclic partitioning and CPR were conducted on the Encore Multimax and BBN TC2000 multiprocessors using the ADAPT system, a programme partitioner which automatically restructures iterative data-aparallel loops.

Read More
In Shop
Out of stock
Shipping & Delivery

$9.00 standard shipping within Australia
FREE standard shipping within Australia for orders over $100.00
Express & International shipping calculated at checkout

MORE INFO
Format
Hardback
Publisher
Springer
Country
NL
Date
31 October 1992
Pages
159
ISBN
9780792392835

This title is printed to order. This book may have been self-published. If so, we cannot guarantee the quality of the content. In the main most books will have gone through the editing process however some may not. We therefore suggest that you be aware of this before ordering this book. If in doubt check either the author or publisher’s details as we are unable to accept any returns unless they are faulty. Please contact us if you have any questions.

The exploitation of parallel processing to improve computing speeds is being examined at virtually all levels of computer science, from the study of parallel algorithms to the development of microarchitectures which employ multiple functional units. The most visible aspect of this interest in parallel processing is the commercially available multiprocessor systems which have appeared in the past decade. Unfortunately, the lack of adequate software support for the development of scientific applications that will run efficiently on multiple processors has stunted the acceptance of such systems. One of the major impediments to achieving high parallel efficiency on many data-parallel scientific applications is communications overhead, which is exemplified by cache coherency traffic and global memory accesses. This book presents compile-time techniques to reduce the overhead of interprocessors with a logically shared address space and physically distributed memory. Such techniques can be used by scientific application designers seeking to optimize code for a particular high-performance computer. In addition, these techniques can be seen as a necessary step toward developing software to support efficient parallel programmes. In multiprocessor systems with physically distributed memory, reducing communication overhead involves both data partitioning and data placement. Adaptive Data Partitioning (ADP) reduces the execution time of parallel programmes by minimizing interprocessor communication for iterative data-parallel loops with near-neighbour communication. Data placement schemes are presented that reduce communication overhead. Under the loop partition specified by ADP, global data is partitioned into classes for each processor, allowing each processor to cache certain regions of the global data set. In addition, for many scientific applications, peak parallel efficiency is achieved only when machine-specific tradeoffs between load imbalance and communication are evaluated and utilized in choosing the data partition. The techniques in this book evaluate these tradeoffs to generate optimum cyclic partitions for data-parallel loops with either a linearly varying or uniform computational structure and either neighbourhood or dimensional multicast communication patterns. This tradeoff is also treated within the CPR (Collective Partitioning and Remapping) algorithm, which partitions a collection of loops with various computational structures and communication patterns. Experiments that demonstrate the advantage of ADP, data placement, cyclic partitioning and CPR were conducted on the Encore Multimax and BBN TC2000 multiprocessors using the ADAPT system, a programme partitioner which automatically restructures iterative data-aparallel loops.

Read More
Format
Hardback
Publisher
Springer
Country
NL
Date
31 October 1992
Pages
159
ISBN
9780792392835