Hardware acceleration of AI/deep learning applications for the RISC-V architecture

Bibliographic record details
Main author: Λιάσος, Αλέξανδρος
Other authors: Liasos, Alexandros
Language: English
Published: 2021
Subjects:
Available online: http://hdl.handle.net/10889/15019
Description
Abstract: AI and Deep Learning applications are becoming increasingly popular in our everyday lives. The massive improvements in computing performance that began in the late 20th century made it possible to implement AI and Deep Learning features in a wide range of settings. Cars now feature virtual assistants capable of responding to the driver's natural-language voice commands. In the healthcare sector, computer-aided disease detection and computer-aided diagnosis have become possible using Deep Learning algorithms. The aim of this thesis is to bring AI/Deep Learning capabilities to the RISC-V architecture. The open-source nature of RISC-V makes it an ideal base on which to implement the extra hardware required to handle these specific workloads.

Chapter 1 serves as an introduction to the thesis. We begin with the basic structure of a processor and note the differences between the RISC and CISC architectures. Next, the RISC-V architecture is introduced, and its origins and goals are listed. Then, the considerations involved in selecting a base processor design for the project are laid out. The chapter ends with an in-depth look at the individual components of the chosen processor. Chapter 2 briefly demonstrates the different routes we can follow to produce code that can be executed on the processor. After the code is generated, its execution is simulated. Chapter 3 follows the process of implementing the processor on programmable logic. After a brief introduction to programmable logic devices, some considerations in choosing a suitable device are noted. Then, the steps for implementing the processor, as well as the peripherals it needs to function, are listed, along with some easy-to-make mistakes. With the stock processor design complete, Chapter 4 covers the software development process for the processor, as well as the use of the extra peripherals.
Chapter 5 serves as a short introduction to neural networks, with special emphasis on convolutional neural networks; their structure, use cases, and operation are the topics covered. Chapter 6 dives deep into the topic of hardware acceleration. It begins with considerations in selecting which portion of the convolutional neural network to accelerate. Observations are then made about the target task, in order to tailor the hardware acceleration approach to its characteristics. Two different approaches are used: the first employs a co-processor alongside the main processor, which handles the convolution step of a convolutional neural network and is inspired by the Cell processor found in the PlayStation 3; the second uses specialized instructions, inspired by ARM and DSP architectures. For both approaches, the rationale behind them, their implementation in the design, and their use are covered. The approaches are compared with each other in terms of efficiency gains, power consumption, and resource utilization, and then against other hardware acceleration attempts on RISC-V-based designs. The last chapter, Chapter 7, covers instances in which the methodology featured in this thesis would be applicable in the real world, resulting in feasible solutions to real-world problems.