Programming Massively Parallel Processors

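By Kirk, D. (USA)

Publisher: Tsinghua University Press

Year: 2010

Price: 29.0

Book description:

This book introduces the basic concepts of parallel programming and GPU architecture, explores in detail the techniques used to build parallel programs, and uses case studies to demonstrate the full development process of a parallel program.

Table of contents: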

Preface

Acknowledgments

Dedication

CHAPTER 1 INTRODUCTION

1.1 GPUs as Parallel Computers

1.2 Architecture of a Modern GPU

1.3 Why More Speed or Parallelism?

1.4 Parallel Programming Languages and Models

1.5 Overarching Goals

1.6 Organization of the Book

CHAPTER 2 HISTORY OF GPU COMPUTING

2.1 Evolution of Graphics Pipelines

2.1.1 The Era of Fixed-Function Graphics Pipelines

2.1.2 Evolution of Programmable Real-Time Graphics

2.1.3 Unified Graphics and Computing Processors

2.1.4 GPGPU: An Intermediate Step

2.2 GPU Computing

2.2.1 Scalable GPUs

2.2.2 Recent Developments

2.3 Future Trends

CHAPTER 3 INTRODUCTION TO CUDA

3.1 Data Parallelism

3.2 CUDA Program Structure

3.3 A Matrix-Matrix Multiplication Example

3.4 Device Memories and Data Transfer

3.5 Kernel Functions and Threading

3.6 Summary

3.6.1 Function declarations

3.6.2 Kernel launch

3.6.3 Predefined variables

3.6.4 Runtime API

CHAPTER 4 CUDA THREADS

4.1 CUDA Thread Organization

4.2 blockIdx and threadIdx

4.3 Synchronization and Transparent Scalability

4.4 Thread Assignment

4.5 Thread Scheduling and Latency Tolerance

4.6 Summary

4.7 Exercises

CHAPTER 5 CUDA™ MEMORIES

5.1 Importance of Memory Access Efficiency

5.2 CUDA Device Memory Types

5.3 A Strategy for Reducing Global Memory Traffic

5.4 Memory as a Limiting Factor to Parallelism

5.5 Summary

5.6 Exercises

CHAPTER 6 PERFORMANCE CONSIDERATIONS

6.1 More on Thread Execution

6.2 Global Memory Bandwidth

6.3 Dynamic Partitioning of SM Resources

6.4 Data Prefetching

6.5 Instruction Mix

6.6 Thread Granularity

6.7 Measured Performance and Summary

6.8 Exercises

CHAPTER 7 FLOATING POINT CONSIDERATIONS

7.1 Floating-Point Format

7.1.1 Normalized Representation of M

7.1.2 Excess Encoding of E

7.2 Representable Numbers

7.3 Special Bit Patterns and Precision

7.4 Arithmetic Accuracy and Rounding

7.5 Algorithm Considerations

7.6 Summary

7.7 Exercises

CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION

8.1 Application Background

8.2 Iterative Reconstruction

8.3 Computing F^H d

Step 1. Determine the Kernel Parallelism Structure

Step 2. Getting Around the Memory Bandwidth Limitation

Step 3. Using Hardware Trigonometry Functions

Step 4. Experimental Performance Tuning

8.4 Final Evaluation

8.5 Exercises

CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS

9.1 Application Background

9.2 A Simple Kernel Implementation

9.3 Instruction Execution Efficiency

9.4 Memory Coalescing

9.5 Additional Performance Comparisons

9.6 Using Multiple GPUs

9.7 Exercises

CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING

10.1 Goals of Parallel Programming

10.2 Problem Decomposition

10.3 Algorithm Selection

10.4 Computational Thinking

10.5 Exercises

CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL™

11.1 Background

11.2 Data Parallelism Model

11.3 Device Architecture

11.4 Kernel Functions

11.5 Device Management and Kernel Launch

11.6 Electrostatic Potential Map in OpenCL

11.7 Summary

11.8 Exercises

CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK

12.1 Goals Revisited

12.2 Memory Architecture Evolution

12.2.1 Large Virtual and Physical Address Spaces

12.2.2 Unified Device Memory Space

12.2.3 Configurable Caching and Scratch Pad

12.2.4 Enhanced Atomic Operations

12.2.5 Enhanced Global Memory Access

12.3 Kernel Execution Control Evolution

12.3.1 Function Calls within Kernel Functions

12.3.2 Exception Handling in Kernel Functions

12.3.3 Simultaneous Execution of Multiple Kernels

12.3.4 Interruptible Kernels

12.4 Core Performance

12.4.1 Double-Precision Speed

12.4.2 Better Control Flow Efficiency

12.5 Programming Environment

12.6 A Bright Outlook

APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE

A.1 matrixmul.cu

A.2 matrixmul_gold.cpp

A.3 matrixmul.h

A.4 assist.h

A.5 Expected Output

APPENDIX B GPU COMPUTE CAPABILITIES

B.1 GPU Compute Capability Tables

B.2 Memory Coalescing Variations

Index

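Summary:

  This book introduces the basic concepts of parallel programming and GPU architecture, explores in detail the techniques used to build parallel programs, and uses case studies to demonstrate the full development process of a parallel program, from the initial idea of parallel computation through to a practical, efficient parallel implementation.
  Features of this book
  Introduces the ideas of parallel computing, so that readers can carry this way of thinking about problems into high-performance parallel computing.
  Introduces the use of CUDA, a software development tool created by NVIDIA specifically for massively parallel environments.
  Shows how to use the CUDA programming model and OpenCL to achieve high performance and high reliability.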

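Editor's recommendation:

  Features of Programming Massively Parallel Processors (photo-reprint edition): Introduces the ideas of parallel computing, so that readers can carry this way of thinking about problems into high-performance parallel computing. Introduces the use of CUDA, a software development tool created by NVIDIA specifically for massively parallel environments. Shows how to use the CUDA programming model and OpenCL to achieve high performance and high reliability.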

Book specifications:

Book details
Title: Programming Massively Parallel Processors
Series: University Computer Education: Renowned Foreign Textbook Series
ISBN: 9787302229735
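Place of publication: Beijing
Publisher: Tsinghua University Press
Edition: photo-reprint edition
Printing: 1
Price (yuan): 29.0
Language: English
Size: 23 × 19
Binding: paperback
Pages:
Print run: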

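Catalog classification:

Programming Massively Parallel Processors was published by Tsinghua University Press in July 2010. Its Chinese Library Classification number is TP311.11, and its subject headings are: parallel programs, programming, higher education, textbooks, English.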