Papaefstathiou Ioannis (Παπαευσταθιου Ιωαννης)
Tampouratzis Nikolaos (Ταμπουρατζης Νικολαος)
Added to repository: 2024-10-31
Date: 2014-07-02
Citation: Nikolaos Tampouratzis, "Hardware accelerated basic blocks for power-aware intercommunication in HPC and embedded systems", Master Thesis, School of Electronic and Computer Engineering, Technical University of Crete, Chania, Greece, 2014
https://dspace.library.tuc.gr/handle/123456789/743

In the past, moving to the next fabrication process typically yielded more transistors, higher frequency, and lower power. Higher frequencies, paired with innovations in computer architecture, defined the semiconductor industry and research until the mid-1990s. At that point, architecture research saturated and industry relied on technology scaling for performance gains. In the mid-2000s, frequency scaling saturated as well. Transistor count, the only resource that kept scaling reliably, together with intra-chip parallelism, which could leverage and extend the knowledge gained from earlier supercomputers, emerged as the only way to keep Moore's law alive. In parallel systems, computing nodes cooperate to solve processing-intensive problems, and the communication between nodes is achieved through a variety of protocols. Traditionally, research has focused on optimizing these protocols and identifying the most suitable ones for each system and application. Recently, the Portals system has attempted to unify the primitive operations of these intercommunication protocols: Portals offers a set of low-level communication routines that can be composed to model complex protocols.
However, Portals' modularity comes at a performance cost, because the dedicated communication protocols have been carefully tuned: many of their timing-critical parts have been decoupled from the main execution thread and, in many cases, implemented in dedicated hardware. This work aims to close the performance gap between a generic, reusable intercommunication layer (Portals) and the various monolithic but highly tuned protocols. It proposes a software-driven, hardware-accelerated approach that relies on executing actual software to identify the critical parts of the communication routines. Accelerating the bottlenecks starts by modeling the hardware as untimed virtual prototypes and running the software on a range of candidate embedded processors. A novel path from hardware prototypes to actual silicon allows rapid characterization of the accelerator in terms of power, performance, and area. The proposed approach achieves a speedup of one order of magnitude in the bottleneck components of Portals, and is up to two orders of magnitude faster in both the MPI and GA baseline implementations on a recent embedded processor.

Size: 3 megabytes
Language: English
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Subjects: Accelerate Intercommunication cost in HPC and Embedded Systems; Embedded systems (Computer systems); embedded computer systems; embedded systems computer systems; HPC (Computer science); high performance computing; hpc computer science
Title: Hardware accelerated basic blocks for power-aware intercommunication in HPC and embedded systems
Type: Master Thesis