Graphic Processing Units (GPU)

Description

Superior Computer Architecture mind map on Graphic Processing Units (GPU), created by Artur Assis on 21/06/2022.

Resource summary

Graphic Processing Units (GPU)
  1. Overview
    1. A graphics processing unit (GPU) is similar to a set of vector processors sharing hardware. The multiple SIMD processors in a GPU act as independent MIMD cores, just as vector computers have multiple vector processors. The main difference is multithreading, which is fundamental to GPUs but missing on most vector processors.
      1. Set of vector processors
        1. Multiple SIMD processors
          1. Act like independent MIMD cores
          2. Multithreading
          3. Programming for the GPU
            1. Compute Unified Device Architecture (CUDA)
              1. CUDA is a C-like programming language developed by NVIDIA for programming its GPUs. It generates C/C++ code for the system processor, known as the “host”, and a C/C++ dialect for the GPU, known as the “device”.
                1. Characteristics
                  1. Developed by NVIDIA
                    1. C-like programming language
                      1. Setup System
                        1. Host
                          1. System processor
                            1. C/C++ code
                            2. Device
                              1. GPU
                                1. C/C++ dialect
                            3. CUDA thread
                              1. Lowest level of parallelism
                                1. Single Instruction, Multiple Threads (SIMT)
                                2. Thread block
                                  1. Threads are executed together in blocks
                                  2. Multithreaded SIMD processor
                                    1. It is the hardware that executes a whole block of threads.
                                    2. Modifiers
                                      1. Function modifiers
                                        1. CUDA functions can carry different modifiers: __device__, __global__, or __host__.
                                          1. __device__
                                            1. Executed on the device, launched from the device.
                                            2. __global__
                                              1. Executed on the device, launched from the host.
                                              2. __host__
                                                1. Executed on the host, launched from the host.
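
The three function modifiers can be seen side by side in a minimal sketch. The function names (square, square_all, square_on_gpu) are hypothetical and chosen only for illustration, and error checking of the CUDA runtime calls is left out to keep the example short.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// __device__: executed on the GPU, called from GPU code.
__device__ float square(float x) { return x * x; }

// __global__: a kernel, executed on the GPU and launched from the host.
__global__ void square_all(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = square(data[i]);
}

// __host__: executed on the CPU, called from CPU code.
// (A function without any modifier is a host function by default.)
__host__ void square_on_gpu(float *host_data, int n) {
    assert(n > 0);
    size_t bytes = n * sizeof(float);
    float *dev_data = nullptr;
    // Error checking omitted for brevity (assumption: a working CUDA device).
    cudaMalloc(&dev_data, bytes);
    cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                          // threads per thread block
    int blocks = (n + threads - 1) / threads;   // enough blocks to cover n
    square_all<<<blocks, threads>>>(dev_data, n);

    // This copy waits for the kernel to finish before reading the result.
    cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_data);
}
```

Calling square_on_gpu on a host array replaces every element with its square; the host never calls square directly, only the kernel does.
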
                                              3. Variable Modifiers
                                                1. __device__
                                                  1. A variable declared with this modifier is allocated in GPU memory and is accessible by all multithreaded SIMD processors.
                                                  2. CUDA variables also have modifiers, such as __device__.
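
A __device__ variable can be sketched as a counter that every thread updates. The names hit_count, count_positive and count_positive_on_gpu are hypothetical; the use of atomicAdd and of cudaMemcpyToSymbol / cudaMemcpyFromSymbol to reach the variable from the host is one common way to do this, not something the outline prescribes. Error checking is omitted.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Lives in GPU global memory and is visible to all threads in all blocks.
__device__ int hit_count;

__global__ void count_positive(const float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // atomicAdd avoids lost updates when many threads increment at once.
    if (i < n && x[i] > 0.0f) atomicAdd(&hit_count, 1);
}

int count_positive_on_gpu(const float *host_x, int n) {
    assert(n > 0);
    size_t bytes = n * sizeof(float);
    float *dev_x = nullptr;
    cudaMalloc(&dev_x, bytes);
    cudaMemcpy(dev_x, host_x, bytes, cudaMemcpyHostToDevice);

    // The host cannot assign to a __device__ variable directly,
    // so it is reset through the runtime API.
    int zero = 0;
    cudaMemcpyToSymbol(hit_count, &zero, sizeof(int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    count_positive<<<blocks, threads>>>(dev_x, n);

    int result = 0;
    cudaMemcpyFromSymbol(&result, hit_count, sizeof(int));
    cudaFree(dev_x);
    return result;
}
```
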
                                                2. CUDA specific terms
                                                  1. Code examples
                                                    1. Ex. Y = a*X + Y
                                                        1. Conventional C code
                                                          1. Corresponding CUDA version
                                                            1. This code launches n threads, one per vector element, with 256 threads per thread block in a multithreaded SIMD processor. The GPU function begins by computing the corresponding element index i from the block ID, the number of threads per block, and the thread ID. The multiply-add is performed only if the index i is within the array.
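
The code this example refers to is not reproduced in the outline. As a hedged reconstruction, the following sketch shows what a conventional C loop and a matching CUDA kernel for Y = a*X + Y typically look like, following the description above (256 threads per block, index from block ID, block size and thread ID, bounds check). All function names are assumptions, and the host wrapper daxpy_cuda is a hypothetical helper added so the kernel can be called on ordinary host arrays; error checking is omitted.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Conventional C version: one loop iteration per vector element.
void daxpy_c(int n, double a, const double *x, double *y) {
    for (int i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// CUDA version: one thread per vector element.
__global__ void daxpy_kernel(int n, double a, const double *x, double *y) {
    // Element index from block ID, threads per block, and thread ID.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The last block may have threads past the end of the array.
    if (i < n) y[i] = a * x[i] + y[i];
}

// Hypothetical host wrapper: copies data in, launches, copies the result out.
void daxpy_cuda(int n, double a, const double *x, double *y) {
    assert(n > 0);
    size_t bytes = n * sizeof(double);
    double *dev_x = nullptr, *dev_y = nullptr;
    cudaMalloc(&dev_x, bytes);
    cudaMalloc(&dev_y, bytes);
    cudaMemcpy(dev_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_y, y, bytes, cudaMemcpyHostToDevice);

    int nblocks = (n + 255) / 256;              // round up to whole blocks
    daxpy_kernel<<<nblocks, 256>>>(n, a, dev_x, dev_y);

    cudaMemcpy(y, dev_y, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_x);
    cudaFree(dev_y);
}
```

The loop counter of the C version disappears in the kernel: each thread computes its own i, and the bounds check takes over the role of the loop condition.
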
                                                          2. Ex. A = B * C
                                                            1. Multiply 2 vectors with 8192 elements each
                                                              1. Grid (Vectorized loop)
                                                                1. The GPU code that works on the whole 8192-element multiply is called a grid.
                                                                  1. A grid is composed of thread blocks (the bodies of the vectorized loop).
                                                                    1. In this case, each thread block covers up to 512 elements (16 SIMD threads per block x 32 elements per SIMD thread), so the grid needs 8192 / 512 = 16 thread blocks.
                                                                    2. Each SIMD instruction operates on 32 elements at a time.
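
The arithmetic of this example can be written down as a launch configuration. This is a sketch under one assumption that the outline does not state: each 32-element SIMD thread is taken to correspond to a group of 32 CUDA threads, so a block of 16 SIMD threads is launched as 512 CUDA threads. The constant and function names are hypothetical, and error checking is omitted.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kElements = 8192;
constexpr int kElementsPerSimdThread = 32;   // one SIMD instruction covers 32
constexpr int kSimdThreadsPerBlock = 16;
constexpr int kThreadsPerBlock =
    kElementsPerSimdThread * kSimdThreadsPerBlock;            // 512
constexpr int kBlocksPerGrid = kElements / kThreadsPerBlock;  // 16
static_assert(kBlocksPerGrid * kThreadsPerBlock == kElements,
              "the grid covers the vectors exactly");

// A = B * C, element by element; one CUDA thread per element.
__global__ void multiply(const double *b, const double *c, double *a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] = b[i] * c[i];   // no bounds check: 16 blocks x 512 threads == 8192
}

void multiply_on_gpu(const double *b, const double *c, double *a) {
    size_t bytes = kElements * sizeof(double);
    double *dev_a = nullptr, *dev_b = nullptr, *dev_c = nullptr;
    cudaMalloc(&dev_a, bytes);
    cudaMalloc(&dev_b, bytes);
    cudaMalloc(&dev_c, bytes);
    cudaMemcpy(dev_b, b, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dev_c, c, bytes, cudaMemcpyHostToDevice);

    // The whole launch is the grid; each of its 16 blocks handles 512 elements.
    multiply<<<kBlocksPerGrid, kThreadsPerBlock>>>(dev_b, dev_c, dev_a);

    cudaMemcpy(a, dev_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_a);
    cudaFree(dev_b);
    cudaFree(dev_c);
}
```
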
                                                                2. Open Computing Language (OpenCL)
                                                                  1. OpenCL is, roughly speaking, a programming language similar to CUDA. Several companies are developing it to offer a vendor-independent language for multiple platforms, in contrast to CUDA, which targets NVIDIA GPUs.
                                                                    1. Vendor independent
                                                                      1. Multiple Platforms
                                                                      2. Extended function call
                                                                          1. Components
                                                                            1. dimGrid
                                                                              1. Specifies the dimensions of the grid, in terms of thread blocks.
                                                                              2. dimBlock
                                                                                1. Specifies the dimensions of a block, in terms of threads.
                                                                                2. Parameter list
                                                                                  1. blockIdx
                                                                                    1. It is the identifier/index of the current block within the grid.
                                                                                    2. threadIdx
                                                                                      1. It is the identifier/index of the current thread within its block
                                                                                      2. blockDim
                                                                                        1. It stands for the number of threads in a block, which comes from the dimBlock parameter.
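
The pieces of the extended function call, name<<<dimGrid, dimBlock>>>(parameter list), and the built-in index variables can be tried out with a small sketch in which every thread writes down its own blockIdx and threadIdx. The kernel and helper names are hypothetical, the block size of 4 is an arbitrary choice to keep the numbers readable, and error checking is omitted.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Each thread records which block and which thread handled element i.
__global__ void record_ids(int *block_of, int *thread_of, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        block_of[i] = blockIdx.x;    // index of the block within the grid
        thread_of[i] = threadIdx.x;  // index of the thread within its block
    }
}

void record_ids_on_gpu(int *block_of, int *thread_of, int n) {
    assert(n > 0);
    size_t bytes = n * sizeof(int);
    int *dev_block = nullptr, *dev_thread = nullptr;
    cudaMalloc(&dev_block, bytes);
    cudaMalloc(&dev_thread, bytes);

    dim3 dimBlock(4);            // dimensions of a block, in threads
    dim3 dimGrid((n + 3) / 4);   // dimensions of the grid, in thread blocks
    // Extended function call: <<<dimGrid, dimBlock>>> then the parameter list.
    record_ids<<<dimGrid, dimBlock>>>(dev_block, dev_thread, n);

    cudaMemcpy(block_of, dev_block, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(thread_of, dev_thread, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev_block);
    cudaFree(dev_thread);
}
```

With n = 10 this launches 3 blocks of 4 threads; element 9 is handled by thread 1 of block 2, since 2 * 4 + 1 = 9, and the two surplus threads of the last block fail the bounds check and do nothing.
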