Effective algorithms and improved high-volume data analysis techniques

Bibliographic record details
Main author: Λιαπάκης, Ξενοφών
Other authors: Παυλίδης, Γεώργιος
Format: Thesis
Language: English
Published: 2019
Subjects:
Available online: http://hdl.handle.net/10889/12795
Description
Abstract: Behind the buzzword "big data" lies a challenge that is both real and important for academia and industry alike. Efficiently handling an overwhelming volume of data, possibly coming from a large number of sources abiding by vastly different functionality protocols, coding conventions, sampling rates, and quality standards, is very important from an engineering, algorithmic, economic, and even social perspective. However, no matter how challenging the technical part is, it is only a fraction of the entire challenge. Harnessing new, non-trivial knowledge from literally an entire data ocean is even more challenging, given the additional constraint that the value of this newly obtained knowledge must at least equal the total cost of extracting it, including factors such as power, storage cost, equipment procurement, and data collection cost. And this is only the marginal case, which cannot be maintained indefinitely. On the contrary, the knowledge value must be a multiple of the total effort cost for any big data pipeline to be viable from a business perspective. The objective of this PhD dissertation is twofold. The first aim is to explore the applications of parallelism to accelerating critical computations in challenging problems from various fields. One very concrete example comes from the emerging field of computational combinatorics. Specifically, a novel graph structural resilience metric based on triangles and paths is proposed. Since this metric is purely structural, namely function oblivious, it can be applied to virtually any graph as long as the patterns it relies on have a physical meaning. The second aim is to show how parallelism can be part of very efficient and widely applicable computational kernels, such as those found in the BLAS library for basic linear algebra operations, which can be applied to various engineering and financial problems. To this end, the proposed algorithms are examined from a computational kernel perspective. It is shown that they can be applied to other problems as well, thus increasing their usefulness.