Hardware acceleration of stencil computations

Bibliographic record details
Main author: Λευθεριώτης, Αιμίλιος
Other authors: Leftheriotis, Aimilios
Language: English
Published: 2022
Subjects:
Available online: http://hdl.handle.net/10889/15806
Description
Abstract: Stencil kernels appear in many mathematical problems, such as linear and partial differential equations. They are also widely used in application fields including image processing, computer vision and computer simulations. There is therefore a need to accelerate stencil computations and iterative stencil loops. One option is hardware acceleration: building a Field Programmable Gate Array (FPGA) design with High Level Synthesis (HLS). This thesis focuses on the 5-point Jacobi kernel. Nine different architectures are proposed, the most potent being the STSM Cascade architectures, which exploit both temporal and spatial parallelism. In the implementation part of the thesis, 4 of the 9 architectures are implemented on a Zynq7000 board, each in both an AXI4-Lite and an AXI4-Stream protocol version. The designs are clocked at 200MHz and use 32-bit fixed-point arithmetic with 16 fractional bits. The AXI4-Stream designs use two clock domains, because the Zynq7000 board cannot implement the AXI4-Stream elements at frequencies above 50MHz. In addition, more than 150 experiments were conducted on a Virtex-7 series FPGA; this design space exploration covered tuning all the hyperparameters of each architecture. Parallel software implementations using polyhedral model transformations were also explored. Finally, both theoretical and experimental error analyses of the fixed-point configuration used in the experiments were carried out.