Περίληψη: | This dissertation implements a shared memory abstraction on top of
contemporary multi-core and many-core clusters. The results of the research effort are embodied mainly in the cluster middleware platform Pleiad. Pleiad is a Java-based prototype that incorporates best practices
from the field of distributed shared memory systems and also includes some prototype
characteristics. The main results and contributions of the dissertation are briefly
reviewed below:
• The presented middleware, Pleiad, is characterized by a highly modular design.
Moreover, in contrast to most related efforts, which are usually bound to a
specific implementation of consistency, Pleiad has the infrastructure to incorporate multiple implementations of a given mechanism and can even interchange
such implementations at runtime.
• Reference implementations are offered for the relaxed consistency models of Lazy
Release Consistency (LRC) and Scope Consistency (ScC). Pleiad is the first Java-based middleware to incorporate implementations for both protocols.
• This dissertation presents one of the few evaluations on a cluster
equipped with low-power processors (Intel Atom), which can therefore be regarded
as a characteristic case of embedded-oriented multi-core clusters.
• This dissertation presents one of the first implementations of shared memory abstraction on top of GPU clusters. Shared memory abstraction is evaluated under two schemes. In the first scheme, shared memory programming with
GPU clusters is achieved through a hybrid combination of the first commercial implementation of OpenMP for clusters, Intel Cluster OpenMP, and the CUDA
platform. This scheme constitutes the first evaluation of OpenMP and CUDA
in the context of GPU clusters. The second scheme involves the enhancement of
Pleiad to support the utilization of GPU clusters. This implementation is one
of the few unified implementations of a shared memory abstraction programming
environment that
• At present there are no established, widely used benchmarks
or application codes that utilize multiple GPUs, either on a cluster or on a single node.
Thus, the evaluation of shared memory abstraction with real application codes is counted among the contributions of the thesis, since the few related systems either
have used simple kernels or have been evaluated on a single node.
• Specifically, applications from two characteristic domains,
computational fluid dynamics (CFD) and data clustering, have been implemented and evaluated using GPU clusters and single GPUs. In the first case, a computationally intensive CFD code that operates on structured grids has been accelerated on a GPU cluster, while a simulation that manipulates unstructured grids has
been accelerated on a single GPU and demonstrates potential for
GPU cluster acceleration. Similarly, a partitional data clustering algorithm is
accelerated using shared memory abstraction on GPU clusters, and a preliminary
implementation of a hierarchical data clustering algorithm on GPUs is described.
|