Optimized SIMD architecture exploration and implementation for ultra-low energy processors
On-line monitoring is an important challenge in future biotechnology applications, for instance in the domain of precision livestock farming where a strong need is present for low-cost intelligent sensors to monitor animal welfare. On-line poultry monitoring can significantly improve living condi...
Main author: | |
---|---|
Other authors: | |
Format: | Thesis |
Language: | English |
Published: | 2012 |
Subjects: | |
Available Online: | http://hdl.handle.net/10889/5372 |
id |
nemertes-10889-5372 |
---|---|
record_format |
dspace |
institution |
UPatras |
collection |
Nemertes |
language |
English |
topic |
SIMD Ultra-low energy processors ASIP Instruction set processors Low-power processors Reprogrammable processors Processor instruction set 621.395 |
spellingShingle |
SIMD Ultra-low energy processors ASIP Instruction set processors Low-power processors Reprogrammable processors Processor instruction set 621.395 Δακουρού, Στεφανία Optimized SIMD architecture exploration and implementation for ultra-low energy processors |
description |
On-line monitoring is an important challenge in future biotechnology applications,
for instance in the domain of precision livestock farming where a strong need is
present for low-cost intelligent sensors to monitor animal welfare. On-line poultry
monitoring can significantly improve living conditions of hens in industrial farms.
However, a very low-cost, low-energy solution is needed because of the stringent
battery limitations. Domain-specific ASIPs can be an ideal solution when
they cover enough submarkets to increase the production volume (reducing the
price) and ultra-low energy concepts are used for their realization.
This work is part of a larger project aiming at high energy efficiency.
The current study implements data parallelization, using a recently introduced
software-controlled SIMD realization in an innovative way. The approaches
used to determine the final instruction set of the architecture
created for mapping the critical Gauss loop of the
detection application are thoroughly explored. Redesigning the data-parallel
data path, also referred to as the Soft-SIMD architecture, was necessary
to achieve instruction encoding optimization.
Furthermore, we have explored the capabilities that a commercial retargetable compiler
tool, such as Target’s, can offer for our target design, and we have suggested
some potential modifications that would make the tool more efficient and
useful for a designer’s needs in such an architecture. This study thereby also demonstrates
the promising results obtained by experimenting with workarounds for the
current Target tool design limitations.
Finding the right balance between efficiency and flexibility requires the ability to
quickly evaluate alternative architectures through simulations and testing techniques.
The methods developed for exactly this purpose, with the help of Target’s
IP Designer retargetable tool-suite, are discussed in detail. By exploiting the profiling
information produced by the ISS, and by reading the assembly code produced
by the C compiler, it is possible to identify the instructions in the critical loop and
optimize them using several of the techniques discussed. The main purpose of
this optimization is to reduce the cycle count of the application, in order to reduce
the overall power consumption. VHDL files of the optimized and un-optimized
processor are automatically generated using the HDL generation tool.
However, when examining a bio-imaging application instantiated from the ULP-ASIP
architectural template [FEENECS book], many other issues arise. In
particular, the way that these kinds of implementations have to be tested should
be taken into consideration. Preferably, the test approach should be not only sufficient
and efficient but also reusable, in the sense that test patterns can
be generated not only for a specific application or for a group of applications
but for the entire architectural template. Therefore, this study also illustrates a
systematic test vector generation process for the ULP-ASIP template. Our goal
is to derive generalized principles, because such principles are reusable and can be
applied to any instance, such as our present processor for the Gauss filter.
Finally, this study concludes by presenting realistic power numbers, based
on layout back-annotation, for the data path components of the processor.
Based on all the advanced optimizations and broad search space explorations
presented in this thesis, a heavily optimized ASIP architecture has been
fully implemented, resulting in a low-cost, ultra-low-energy design that
still meets all the performance requirements. |
author2 |
Γκούτης, Κωνσταντίνος |
author_facet |
Γκούτης, Κωνσταντίνος Δακουρού, Στεφανία |
format |
Thesis |
author |
Δακουρού, Στεφανία |
author_sort |
Δακουρού, Στεφανία |
title |
Optimized SIMD architecture exploration and implementation for ultra-low energy processors |
title_short |
Optimized SIMD architecture exploration and implementation for ultra-low energy processors |
title_full |
Optimized SIMD architecture exploration and implementation for ultra-low energy processors |
title_fullStr |
Optimized SIMD architecture exploration and implementation for ultra-low energy processors |
title_full_unstemmed |
Optimized SIMD architecture exploration and implementation for ultra-low energy processors |
title_sort |
optimized simd architecture exploration and implementation for ultra-low energy processors |
publishDate |
2012 |
url |
http://hdl.handle.net/10889/5372 |
work_keys_str_mv |
AT dakouroustephania optimizedsimdarchitectureexplorationandimplementationforultralowenergyprocessors AT dakouroustephania exereunēsēkaiylopoiēsēbeltistopoiēmenēssimdarchitektonikēsgiaepexergastespolychamēlēskatanalōsēs |
_version_ |
1771297355575853056 |
spelling |
nemertes-10889-5372 2022-09-05T20:30:43Z Optimized SIMD architecture exploration and implementation for ultra-low energy processors Exploration and implementation of an optimized SIMD architecture for very low power processors Δακουρού, Στεφανία Γκούτης, Κωνσταντίνος Γκούτης, Κωνσταντίνος Αλεξίου, Γεώργιος Θεωδορίδης, Γεώργιος Dakourou, Stefania SIMD Ultra-low energy processors ASIP Instruction set processors Low-power processors Reprogrammable processors Processor instruction set 621.395 On-line monitoring is an important challenge in future biotechnology applications, for instance in the domain of precision livestock farming where a strong need is present for low-cost intelligent sensors to monitor animal welfare. On-line poultry monitoring can significantly improve living conditions of hens in industrial farms. However, a very low-cost, low-energy solution is needed because of the stringent battery limitations. Domain-specific ASIPs can be an ideal solution when they cover enough submarkets to increase the production volume (reducing the price) and ultra-low energy concepts are used for their realization. This work is part of a larger project aiming at high energy efficiency. The current study implements data parallelization, using a recently introduced software-controlled SIMD realization in an innovative way. The approaches used to determine the final instruction set of the architecture created for mapping the critical Gauss loop of the detection application are thoroughly explored. Redesigning the data-parallel data path, also referred to as the Soft-SIMD architecture, was necessary to achieve instruction encoding optimization. 
Furthermore, we have explored the capabilities that a commercial retargetable compiler tool, such as Target’s, can offer for our target design, and we have suggested some potential modifications that would make the tool more efficient and useful for a designer’s needs in such an architecture. This study thereby also demonstrates the promising results obtained by experimenting with workarounds for the current Target tool design limitations. Finding the right balance between efficiency and flexibility requires the ability to quickly evaluate alternative architectures through simulations and testing techniques. The methods developed for exactly this purpose, with the help of Target’s IP Designer retargetable tool-suite, are discussed in detail. By exploiting the profiling information produced by the ISS, and by reading the assembly code produced by the C compiler, it is possible to identify the instructions in the critical loop and optimize them using several of the techniques discussed. The main purpose of this optimization is to reduce the cycle count of the application, in order to reduce the overall power consumption. VHDL files of the optimized and un-optimized processor are automatically generated using the HDL generation tool. However, when examining a bio-imaging application instantiated from the ULP-ASIP architectural template [FEENECS book], many other issues arise. In particular, the way that these kinds of implementations have to be tested should be taken into consideration. Preferably, the test approach should be not only sufficient and efficient but also reusable, in the sense that test patterns can be generated not only for a specific application or for a group of applications but for the entire architectural template. Therefore, this study also illustrates a systematic test vector generation process for the ULP-ASIP template. 
Our goal is to derive generalized principles, because such principles are reusable and can be applied to any instance, such as our present processor for the Gauss filter. Finally, this study concludes by presenting realistic power numbers, based on layout back-annotation, for the data path components of the processor. Based on all the advanced optimizations and broad search space explorations presented in this thesis, a heavily optimized ASIP architecture has been fully implemented, resulting in a low-cost, ultra-low-energy design that still meets all the performance requirements. The automatic method for monitoring living organisms, as researched and published by the Biosystems Department (BIOSYST) of K.U. Leuven [1], consists of a "computer vision" application that classifies the organisms' behaviour based on their responses. This biotechnology application develops a fully automated "computer vision" system for individual, confined hens. The application is divided into two algorithms: the first detects the monitored object (detection algorithm) and the second tracks it (tracking algorithm). The present study is part of a larger project and continues previous work in this field. The aim of this study is to explore the architecture created for mapping the critical Gauss loop of the detection algorithm, in order to determine the final instruction set of the ULP-ASIP SIMD processor. The techniques and approaches used to support the optimization of the instruction set encoding are presented extensively in Chapter 2. In addition, during the architecture exploration, the defined instruction set and the mapping techniques are re-examined in order to reduce the overall execution cost. 
Finding the right balance between efficiency and flexibility requires the ability to quickly evaluate alternative architectures through simulations and testing techniques. Chapter 3 explains the methods developed for exactly this purpose, with the help of the IP design environment of Target Compiler Technologies, which offers a complete retargetable tool. However, a more systematic process for generating test vectors for the entire ULP-ASIP platform turned out to be a very important asset for validating the operation of the ULP-ASIP processor. Such a method is therefore analysed and presented extensively in Chapter 4. Finally, Chapter 5 presents the energy estimation of the processor's data path. Based on all the advanced optimizations and broad search space explorations presented in the previous chapters, a heavily optimized, synthesizable ASIP architecture is fully implemented, leading to a low-cost, very low energy platform that also meets all the performance requirements. 2012-07-19T10:42:16Z 2012-07-19T10:42:16Z 2011-05-25 2012-07-19 Thesis http://hdl.handle.net/10889/5372 en The library (ΒΚΠ) holds a print copy of the thesis in the doctoral dissertations stacks on the ground floor of its building. 12 application/pdf |