Effective algorithms and improved high-volume data analysis techniques
Behind the buzzword "big data" lies a challenge that is very real and very important for both academia and industry: efficiently handling an overwhelming volume of data, possibly coming from a large number of sources that abide by vastly different functionality protocols, coding conventions, sampling rates, and quality standards.
Main Author: | Λιαπάκης, Ξενοφών |
---|---|
Other Authors: | Παυλίδης, Γεώργιος |
Format: | Thesis |
Language: | English |
Published: | 2019 |
Subjects: | Parallel computing; GPU computing; Special purpose computing; Linear algebra; System solution; Sparse matrices; Eigenvalues; Spectral factorization; Graph mining; Graph resilience; Triangles; Blockchain management; Distributed consensus; Digital transparency; Digital health; Mobile health applications; Insurance market; Computational kernel |
Online Access: | http://hdl.handle.net/10889/12795 |
id | nemertes-10889-12795
record_format | dspace
institution | UPatras
collection | Nemertes
language | English
topic | Parallel computing; GPU computing; Special purpose computing; Linear algebra; System solution; Sparse matrices; Eigenvalues; Spectral factorization; Graph mining; Graph resilience; Triangles; Blockchain management; Distributed consensus; Digital transparency; Digital health; Mobile health applications; Insurance market; Computational kernel; Παράλληλη επεξεργασία (Parallel processing); Γραμμική άλγεβρα (Linear algebra); 005.7
description |
Behind the buzzword "big data" lies a challenge that is very real and very important for both academia and industry. Efficiently handling an overwhelming volume of data, possibly coming from a large number of sources that abide by vastly different functionality protocols, coding conventions, sampling rates, and quality standards, matters from an engineering, algorithmic, economic, and even social perspective. However, no matter how challenging the technical part is, it is only a fraction of the entire challenge. This is because harvesting new, non-trivial knowledge from what is literally an ocean of data is even more challenging, given the additional constraint that the value of this newly obtained knowledge must at least equal the total cost of extracting it, including factors such as power, storage, equipment procurement, and data collection. And this is the marginal case, which cannot be sustained indefinitely; for any big data pipeline to be viable from a business perspective, the value of the knowledge must be a multiple of the total extraction cost.
The twofold objective of this PhD dissertation is:
1. To explore applications of parallelism for accelerating critical computations in challenging problems from various fields. One very concrete example comes from the emerging field of computational combinatorics: a novel graph structural resilience metric based on triangles and paths is proposed. Since this metric is purely structural, that is, function-oblivious, it can be applied to virtually any graph as long as the patterns it relies on have a physical meaning.
2. To show how parallelism can be part of very efficient and widely applicable computational kernels, such as those found in the BLAS library for basic linear algebra operations, which can be applied to various engineering and financial problems. The proposed algorithms are examined from a computational kernel perspective, and it is shown that they can be applied to other problems as well, thus increasing their usefulness. |
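The computational-kernel perspective described in the abstract can be illustrated with a minimal sketch (hypothetical code, not taken from the dissertation; the `matmul` and `triangle_count` names are illustrative): a naive GEMM-style matrix multiply, the operation that an optimized BLAS routine such as `dgemm` accelerates, is reused unchanged for a graph-mining task related to the proposed metric, counting a graph's triangles via trace(A^3)/6.

```python
def matmul(a, b):
    """Naive GEMM-style kernel: C = A * B for square matrices given as
    lists of lists. In production, this is exactly the operation that an
    optimized BLAS routine (dgemm) or a GPU kernel would accelerate."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def triangle_count(adj):
    """Reuse the same kernel for graph mining: for an undirected graph
    with 0/1 adjacency matrix A, trace(A^3) / 6 is the number of
    triangles, since each triangle yields 6 closed walks of length 3
    (3 starting vertices x 2 directions)."""
    a3 = matmul(matmul(adj, adj), adj)
    return sum(a3[i][i] for i in range(len(adj))) // 6

# A 4-node graph: one triangle {0, 1, 2} plus a pendant edge 2-3.
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
print(triangle_count(A))  # -> 1
```

Because the heavy lifting is isolated in a single kernel, replacing the naive multiply with a parallel BLAS or GPU implementation would speed up every problem built on top of it, which is the reuse argument the kernel perspective makes.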
author2 | Παυλίδης, Γεώργιος
format | Thesis
author | Λιαπάκης, Ξενοφών
title | Effective algorithms and improved high-volume data analysis techniques
publishDate | 2019
url | http://hdl.handle.net/10889/12795
spelling | nemertes-10889-12795 2022-09-05T20:36:41Z Effective algorithms and improved high-volume data analysis techniques; Αποτελεσματικοί αλγόριθμοι και βελτιωμένες τεχνικές ανάλυσης δεδομένων μεγάλου όγκου; Λιαπάκης, Ξενοφών; Παυλίδης, Γεώργιος; Μεγαλοοικονόμου, Βασίλης; Γαροφαλάκης, Ιωάννης; Τζήμας, Ιωάννης; Σιούτας, Σπύρος; Τσόλης, Δημήτρης; Στυλιάρας, Γεώργιος; Liapakis, Xenofon; Parallel computing; GPU computing; Special purpose computing; Linear algebra; System solution; Sparse matrices; Eigenvalues; Spectral factorization; Graph mining; Graph resilience; Triangles; Blockchain management; Distributed consensus; Digital transparency; Digital health; Mobile health applications; Insurance market; Computational kernel; Παράλληλη επεξεργασία; Γραμμική άλγεβρα; 005.7
The above saying, besides drawing an apt analogy between the digital world of the 21st century and its immediate predecessor, which essentially began in the 19th century, also conceals a very important truth. Like oil, data, especially once its digital mass exceeds certain significant technological thresholds, must be easy to acquire and of high quality if new, non-trivial knowledge is to be mined from it and then exploited in a timely manner. Every organization that produces or manages data today has, or should have, substantive internal data-quality criteria if it truly wants to extract from its data a timely and valid informational advantage over the competition.
Although these criteria obviously depend not only on the organizational structure, nature, resources, and mission of each organization but sometimes, mainly for ad hoc initiatives, also on entirely circumstantial conditions, the scientific community has developed a system of six main axes for evaluating data, the so-called 6V system, which can readily be combined with an organization's own internal criteria. 2019-11-03T11:46:21Z 2019-11-03T11:46:21Z 2019-07-24 Thesis http://hdl.handle.net/10889/12795 en 0 application/pdf |