Optimized SIMD architecture exploration and implementation for ultra-low energy processors

On-line monitoring is an important challenge in future biotechnology applications, for instance in the domain of precision livestock farming where a strong need is present for low-cost intelligent sensors to monitor animal welfare. On-line poultry monitoring can significantly improve living condi...

Πλήρης περιγραφή

Λεπτομέρειες βιβλιογραφικής εγγραφής
Κύριος συγγραφέας: Δακουρού, Στεφανία
Άλλοι συγγραφείς: Γκούτης, Κωνσταντίνος
Μορφή: Thesis
Γλώσσα:English
Έκδοση: 2012
Θέματα:
Διαθέσιμο Online:http://hdl.handle.net/10889/5372
Περιγραφή
Περίληψη:On-line monitoring is an important challenge in future biotechnology applications, for instance in the domain of precision livestock farming where a strong need is present for low-cost intelligent sensors to monitor animal welfare. On-line poultry monitoring can significantly improve living conditions of hens in industrial farms. A very low-cost low-energy solution needs to be provided though due to the stringent battery limitations. Domain-specific ASIPs can be an ideal solution when they cover enough submarkets to increase the production volume (reducing the price) and ultra-low energy concepts are used for their realization. This work is a part of a larger project and aiming to high energy-efficiency. The current study implements data parallelization, using a recently introduced software-controlled SIMD realization in an innovative way. The approaches that have been employed for the determination of the final instruction set of the architecture that has been created for the mapping of the critical Gauss loop of the detection application, are thoroughly explored. The re-design of the data-parallel data path, also referred to as Soft-SIMD architecture, has been necessary in order to achieve instruction encoding optimization. Furthermore, we have explored the capabilities that a commercial compiler retargetable Tool, like Target, can offer for our target design and we have suggested some potential modifications that would help the tool to become more efficient and useful for a designer’s needs in such architecture. Thereby, this study also demonstrates the promising results obtained by experimenting with detours around the current Target tool design limitations. Finding the right balance between efficiency and flexibility requires the ability to quickly evaluate alternative architectures through simulations and testing techniques. The methods developed for exactly this purpose, with the help of Target’s IP Designer retargetable tool-suite, are discussed in detail. By exploiting the profiling information produced by the ISS, and by reading the assembly code produced by the C compiler, it is possible to identify the instructions in the critical loop, and optimize them by using a number of techniques discussed. The main purpose of this optimization is to reduce the cycle count of the application, in order to reduce the overall power consumption. VHDL files of the optimized and un-optimized processor are automatically generated using the HDL generation tool. However, examining a bio-imaging application, instantiated from the ULP-ASIP architectural template [FEENECS book], many other issues are present too. In particular, the way that these kinds of implementations have to be tested should be taken into consideration. Preferably, the testability has not only to be sufficient and efficient but also reusable, in the sense that test patterns should be able to be generated not only for a specific application or for a group of applications but for the entire architectural template. Therefore, this study also illustrates a Systematic Test Vector generation process for the ULP-ASIP template. Our goal is to make generalized principles, because such principles are reusable and can be applied to any instances, such as our present processor for the Gauss Filter. Finally, this study is completed by presenting some realistic power numbers based on layout back-annotation, which concern the data path components of the processor. Based on all the advanced optimizations and broad search space explorations that are presented in this thesis, a heavily optimized ASIP architecture has been fully implemented which results in a low-cost ultra low-energy consumption while still meeting all the performance requirements.