Accelerating VGG neural network in GPU and FPGA technologies

Bibliographic record details
Main author: Χριστόπουλος, Στέφανος
Other authors: Christopoulos, Stefanos
Language: English
Published: 2022
Subjects:
Available online: https://hdl.handle.net/10889/23751
Description
Abstract: In recent years, Neural Networks have come to the forefront by elegantly solving long-standing problems. Deep Learning has made it possible for computational models composed of multiple processing layers to learn representations of data with multiple levels of abstraction. This, in turn, has led to advances in Computer Vision, Speech Recognition, Machine Translation and many other fields, such as Cancer Research. Due to their ability to solve real-world problems, Neural Networks have countless applications and are becoming part of everyday life. Web searches, music and video recommendations, mobile phone camera processing and even semi-autonomous driving vehicles run with the help of Neural Networks. By nature, Neural Networks require little engineering effort; instead, they rely mostly on computational power and vast amounts of data, which appears to be a driving factor behind their success in our modern digital world. Unfortunately, as Moore's Law seems to be slowing down, it becomes increasingly important to write software that utilizes the underlying hardware to its maximum, and also to design hardware with software in mind. To this end, in the current thesis, I set out to maximally use the available resources of the GPU for training Deep Neural Networks. Furthermore, I seek to demonstrate the feasibility of running large Neural Networks on FPGAs that can be used as edge devices; with this goal in mind, I design a Convolution Accelerator for FPGA devices. I apply GPU software optimization strategies to develop custom CUDA kernels for various Neural Network layers and use them to accelerate the VGG16 Neural Network. By using the Winograd Convolution algorithm, optimizing Matrix Multiplication on GPUs and reducing memory bandwidth requirements by fusing layers, I achieve an 8% reduction in execution time over Nvidia's cuDNN library during Inference and 30% during training of VGG16. Moreover, my implementation surpasses Google's Tensorflow performance by more than 40% during training. In addition, using Xilinx's Vitis HLS tool, I implement two designs with different quantizations of the Winograd Convolution algorithm for the convolutional layers of VGG16 during Inference. Performance results show the FPGA design being 33% more energy efficient than the GPU while achieving sufficiently low execution times to be used as an edge device for Neural Network Inference.
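
The Winograd Convolution algorithm mentioned in the abstract trades multiplications for additions by transforming small tiles before an element-wise product. The CUDA C++ sketch below applies the standard F(2x2, 3x3) transform matrices to a single tile; the function name, the argument layout, and the choice to transform the filter on the fly (in practice the filter transform is precomputed once per filter) are illustrative assumptions, not code from the thesis.

```cuda
// Minimal sketch of the Winograd F(2x2, 3x3) transform for one tile, assuming
// unit stride, a 3x3 filter g and a 4x4 input tile d. Multiplications per 2x2
// output block drop from 36 (direct convolution) to the 16 of the element-wise
// product M = U .* V. Names are illustrative, not taken from the thesis code.
__host__ __device__
void winograd_f2x2_3x3(const float g[3][3], const float d[4][4], float y[2][2])
{
    // Filter transform: U = G * g * G^T  (4x4), G is the Lavin-Gray matrix.
    const float G[4][3] = { {1.0f, 0.0f, 0.0f},
                            {0.5f, 0.5f, 0.5f},
                            {0.5f, -0.5f, 0.5f},
                            {0.0f, 0.0f, 1.0f} };
    float Gg[4][3], U[4][4];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 3; ++j)
            Gg[i][j] = G[i][0] * g[0][j] + G[i][1] * g[1][j] + G[i][2] * g[2][j];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            U[i][j] = Gg[i][0] * G[j][0] + Gg[i][1] * G[j][1] + Gg[i][2] * G[j][2];

    // Input transform: V = B^T * d * B  (4x4); B^T only has entries in {-1, 0, 1},
    // so it costs additions and subtractions only.
    float Bd[4][4], V[4][4];
    for (int j = 0; j < 4; ++j) {
        Bd[0][j] = d[0][j] - d[2][j];
        Bd[1][j] = d[1][j] + d[2][j];
        Bd[2][j] = d[2][j] - d[1][j];
        Bd[3][j] = d[1][j] - d[3][j];
    }
    for (int i = 0; i < 4; ++i) {
        V[i][0] = Bd[i][0] - Bd[i][2];
        V[i][1] = Bd[i][1] + Bd[i][2];
        V[i][2] = Bd[i][2] - Bd[i][1];
        V[i][3] = Bd[i][1] - Bd[i][3];
    }

    // Element-wise product: the 16 multiplications of F(2x2, 3x3).
    float M[4][4];
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            M[i][j] = U[i][j] * V[i][j];

    // Output transform: y = A^T * M * A  (2x2).
    float AtM[2][4];
    for (int j = 0; j < 4; ++j) {
        AtM[0][j] = M[0][j] + M[1][j] + M[2][j];
        AtM[1][j] = M[1][j] - M[2][j] - M[3][j];
    }
    for (int i = 0; i < 2; ++i) {
        y[i][0] = AtM[i][0] + AtM[i][1] + AtM[i][2];
        y[i][1] = AtM[i][1] - AtM[i][2] - AtM[i][3];
    }
}
```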
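
For the other two GPU strategies named in the abstract, optimized matrix multiplication and layer fusion, a minimal sketch is a shared-memory tiled SGEMM with the following layer's ReLU folded into its epilogue, so the activation never makes a separate round trip through global memory. The kernel name, the fixed 32x32 tile and the row-major layout are assumptions for illustration; this is not the thesis kernel.

```cuda
#include <cuda_runtime.h>

#define TILE 32  // assumed tile size; a tuning parameter in practice

// Illustrative sketch: C = max(0, A * B), with A of size M x K and B of size
// K x N, both row-major. Each thread block computes one TILE x TILE tile of C
// from shared-memory tiles of A and B, then applies ReLU before the store.
__global__ void sgemm_relu_fused(const float* A, const float* B, float* C,
                                 int M, int N, int K)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;   // output row owned by this thread
    int col = blockIdx.x * TILE + threadIdx.x;   // output column
    float acc = 0.0f;

    // March over the K dimension one tile at a time.
    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < M && aCol < K) ? A[row * K + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < K && col < N) ? B[bRow * N + col] : 0.0f;
        __syncthreads();

        #pragma unroll
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }

    // Fused ReLU epilogue: write the activated value directly.
    if (row < M && col < N)
        C[row * N + col] = fmaxf(acc, 0.0f);
}
```

A launch of the form dim3 block(TILE, TILE); dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE); covers the whole output. Folding the activation into the epilogue removes one full write-then-read pass of the output tensor through global memory, which is the kind of bandwidth saving that layer fusion targets.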