Parallel Architectures

Description

Computer Architecture mindmap on Parallel Architectures, created by Artur Assis on 21/06/2022.

Resource summary

Parallel Architectures
  1. Multiple instruction streams, multiple data streams (MIMD)
    1. Multiprocessors
      1. Computers consisting of tightly coupled processors whose coordination and usage are generally controlled by a single operating system. The processors share memory through a shared address space.
        1. Multiple processors, tightly coupled
        2. Single operating system
        3. Memory shared through a shared address space
        4. OBS: a multiprocessor is not the same thing as a multicore chip.
      2. Memory Organization
        1. Multiprocessors may share:
          1. cache memory, main memory, and the I/O system
          2. main memory and the I/O system
          3. only the I/O system
          4. nothing, usually communicating through networks
        2. Are all these options interesting?
          1. Note that it is important to avoid bottlenecks in the architecture; the answer also depends on the project requirements that have to be fulfilled.
      3. Classification
        1. Symmetric (shared-memory) multiprocessors (SMP), also called centralized shared-memory multiprocessors
          1. Characteristics
            1. ≈ 32 cores or fewer
            2. The processors share a single centralized memory to which they all have equal access (hence "symmetric")
            3. The memory/bus may become a bottleneck
              1. Large caches and buses are used to avoid the bottleneck
            4. Uniform memory access (UMA): access time to memory is the same from all the processors
        2. Distributed shared-memory (DSM) multiprocessors
          1. Characteristics
            1. 16-64 processor cores or more, supporting large numbers of processors
            2. Memory is physically distributed among the nodes
              1. Increases memory bandwidth
              2. Reduces the latency of local accesses
              3. Makes communication between processors more complex
              4. Requires more software effort to take advantage of the increased memory bandwidth
            3. Distributed I/O
              1. Each node can itself be a small system with centralized memory and I/O
            4. Nonuniform memory access (NUMA): access time depends on the location of a data word in memory
    2. Memory architecture for
      1. SMP
        1. Processors share a single physical memory
        2. Uniform memory access times (UMA)
      2. DSM
        1. Processors share the same address space, but not necessarily the same physical memory
        2. Nonuniform memory access (NUMA)
      3. Multicomputers
        1. Processors with independent memories and independent address spaces
        2. They communicate through interconnection networks
        3. The nodes may even be complete computers connected by a network, i.e., clusters
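The UMA/NUMA distinction can be made concrete from software. Below is a minimal sketch, assuming a Linux machine with libnuma installed (file name and sizes are illustrative), that allocates memory pinned to a specific NUMA node; accesses from CPUs on that node are local and fast, while accesses from other nodes go over the interconnect.

/* Minimal NUMA illustration using libnuma (Linux).
 * Build: gcc numa_demo.c -lnuma -o numa_demo
 * Assumption: libnuma is installed and the machine exposes NUMA nodes. */
#include <stdio.h>
#include <stdlib.h>
#include <numa.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return EXIT_FAILURE;
    }

    printf("Highest NUMA node: %d\n", numa_max_node());

    /* Allocate 1 MiB pinned to node 0: local to node 0's CPUs,
     * remote (slower) for CPUs on any other node. */
    size_t size = 1 << 20;
    void *buf = numa_alloc_onnode(size, 0);
    if (buf == NULL) {
        fprintf(stderr, "allocation failed\n");
        return EXIT_FAILURE;
    }

    /* Touch the memory so its pages are actually placed on node 0. */
    for (size_t i = 0; i < size; i++)
        ((char *)buf)[i] = 0;

    numa_free(buf, size);
    return EXIT_SUCCESS;
}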
    3. Communication models for
      1. SMP
        1. Communication is implicit, through ordinary memory accesses to the shared central memory. With a central memory it is natural to use threads and the fork-join model; one popular alternative is the application programming interface named open multi-processing (OpenMP).
          1. Implicit communication (memory access)
          2. Central memory
          3. Threads and fork-join
            1. Ex.: OpenMP
        2. Thread (lightweight process)
          1. Threads are lines of execution within a process, and they share the address space of the process to which they belong; a minimal sketch follows this item.
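A minimal sketch of that sharing, using POSIX threads (the mindmap does not name a threading library; the counter and function names here are illustrative): two threads update one variable in the same address space, so they must synchronize their accesses.

/* Two POSIX threads incrementing a shared counter: both see the same
 * address space, so a mutex keeps the updates consistent.
 * Build: gcc threads_demo.c -pthread -o threads_demo */
#include <stdio.h>
#include <pthread.h>

static long counter = 0;   /* shared: lives in the process's address space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);
        counter++;                     /* both threads update the same variable */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);   /* "fork" two threads ... */
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);                    /* ... and "join" them */
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);        /* prints 2000000 */
    return 0;
}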
        3. Fork-join model
          1. A parent forks child threads, which run in parallel and share the parent's address space; the parent then waits for the children to finish their computation by calling join. Creating a full process is expensive for the operating system, which is one reason to perform the computation with lightweight threads instead. An OpenMP sketch of this model follows.
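A minimal fork-join sketch using OpenMP, the interface the mindmap names (array size and variable names are illustrative): the runtime forks a team of threads at the parallel region and joins them at the implicit barrier at its end.

/* Fork-join parallelism with OpenMP: the runtime forks a team of
 * threads at the start of the parallel region and joins them at the
 * implicit barrier when the region ends.
 * Build: gcc -fopenmp openmp_demo.c -o openmp_demo */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    /* Fork: loop iterations are divided among the threads.
     * Join: all threads synchronize at the end of the region. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        a[i] = (double)i;
        sum += a[i];
    }

    printf("threads available: %d, sum = %f\n",
           omp_get_max_threads(), sum);
    return 0;
}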
      2. DSM
        1. With a distributed memory, the natural fit is the message passing model: communication is explicit (messages are sent and received), which also brings synchronization problems. A widely used standard implemented by several libraries is the message passing interface (MPI); a sketch follows this list.
          1. Distributed memory
          2. Message passing model
            1. Explicit communication (message passing)
            2. Synchronization problems
            3. Ex.: Message Passing Interface (MPI)
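A minimal message-passing sketch using MPI, assuming an MPI implementation such as MPICH or Open MPI is available (the ranks and payload are illustrative): rank 0 explicitly sends a value that rank 1 explicitly receives.

/* Explicit message passing with MPI: rank 0 sends a value to rank 1.
 * Build: mpicc mpi_demo.c -o mpi_demo
 * Run:   mpirun -np 2 ./mpi_demo */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int value = 42;
        /* Explicit communication: the sender names the destination. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        printf("rank 0 sent %d\n", value);
    } else if (rank == 1) {
        int value;
        /* The matching receive blocks until the message arrives:
         * this is where the synchronization issues come from. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}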
    4. Market share for
      1. SMP
        1. Biggest market share, both in revenue and in units
          1. Ex.: multiprocessors on a chip
      2. Multicomputers
        1. Ex.: clusters for systems on the Internet
        2. Ex.: massively parallel processors (MPP), with more than 100 processors
    5. Observations about SMP
      1. Large and efficient cache systems can greatly reduce the need for memory bandwidth
      2. Not much extra hardware is needed
      3. They can be built from general-purpose processors (GPP)
      4. Base question for the cache coherence problem: caches provide not only locality but also replication, since the same memory word may be cached by several processors at once. Is that a problem? (A sketch of one visible consequence follows.)
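On real hardware the coherence protocol answers this transparently, but its cost is visible from software. A minimal sketch, assuming a multicore machine with 64-byte cache lines (names, padding size, and iteration counts are illustrative): two threads write to different counters, and when those counters share a cache line, the coherence protocol keeps invalidating each core's replica of the line (false sharing), making that run measurably slower than the padded version.

/* False sharing: a visible side effect of cache coherence.
 * The two threads write to *different* variables, but if the variables
 * share a cache line, every write invalidates the other core's replica.
 * Build: gcc false_sharing.c -pthread -O2 -o false_sharing */
#include <stdio.h>
#include <pthread.h>
#include <time.h>

#define ITERS 100000000L

/* volatile keeps the increments as real memory traffic under -O2 */
struct { volatile long a; volatile long b; } same_line;            /* likely one 64-byte line */
struct { volatile long a; char pad[64]; volatile long b; } padded; /* forced onto separate lines */

static void *same_a(void *x) { for (long i = 0; i < ITERS; i++) same_line.a++; return x; }
static void *same_b(void *x) { for (long i = 0; i < ITERS; i++) same_line.b++; return x; }
static void *pad_a(void *x)  { for (long i = 0; i < ITERS; i++) padded.a++;    return x; }
static void *pad_b(void *x)  { for (long i = 0; i < ITERS; i++) padded.b++;    return x; }

/* Run a pair of threads and return the elapsed wall-clock time. */
static double run(void *(*f)(void *), void *(*g)(void *)) {
    struct timespec t0, t1;
    pthread_t x, y;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&x, NULL, f, NULL);
    pthread_create(&y, NULL, g, NULL);
    pthread_join(x, NULL);
    pthread_join(y, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    printf("same cache line: %.2f s\n", run(same_a, same_b));
    printf("separate lines:  %.2f s\n", run(pad_a, pad_b));
    return 0;
}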
