Innovative algorithmic techniques in cloud computing

In recent years, due to the universal spread of the internet, the amount of digitally available information continuously grows. Indicatively, in 2021 there are 7.83 billion people, 4.66 billions of whom are considered active Internet users 1 . In addition, over 100 million Gigabytes of data have yet...

Πλήρης περιγραφή

Λεπτομέρειες βιβλιογραφικής εγγραφής
Κύριος συγγραφέας: Πισπιρίγκος, Γεώργιος
Άλλοι συγγραφείς: Pispirigkos, Georgios
Γλώσσα:English
Έκδοση: 2021
Θέματα:
Διαθέσιμο Online:http://hdl.handle.net/10889/15160
id nemertes-10889-15160
record_format dspace
institution UPatras
collection Nemertes
language English
topic Cloud computing
Distributed processing
Social network analysis
Community detection
Community prediction
Distributed machine learning
Supervised learning
Classification
Bagging ensemble learning
Stacking ensemble learning
Word-sense disambiguation
Υπολογιστικό νέφος
Κατανεμημένη επεξεργασία
Ανάλυση κοινωνικών δικτύων
Ανίχνευση κοινοτήτων
Πρόβλεψη κοινοτήτων
Κατανεμημένη μηχανική μάθηση
Επιβλεπόμενη μάθηση
Ομαδοποίηση
Συνδυαστικό μοντέλο μηχανικής μάθησης
Συνδυαστικό μοντέλο στοίβαξης μηχανικής μάθησης
Αποσαφήνιση λέξης
spellingShingle Cloud computing
Distributed processing
Social network analysis
Community detection
Community prediction
Distributed machine learning
Supervised learning
Classification
Bagging ensemble learning
Stacking ensemble learning
Word-sense disambiguation
Υπολογιστικό νέφος
Κατανεμημένη επεξεργασία
Ανάλυση κοινωνικών δικτύων
Ανίχνευση κοινοτήτων
Πρόβλεψη κοινοτήτων
Κατανεμημένη μηχανική μάθηση
Επιβλεπόμενη μάθηση
Ομαδοποίηση
Συνδυαστικό μοντέλο μηχανικής μάθησης
Συνδυαστικό μοντέλο στοίβαξης μηχανικής μάθησης
Αποσαφήνιση λέξης
Πισπιρίγκος, Γεώργιος
Innovative algorithmic techniques in cloud computing
description In recent years, due to the universal spread of the internet, the amount of digitally available information continuously grows. Indicatively, in 2021 there are 7.83 billion people, 4.66 billions of whom are considered active Internet users 1 . In addition, over 100 million Gigabytes of data have yet been indexed, while approximately 2.5 billion Gigabytes of raw data are generated on daily basis 2 . Thus, as digital information massively grows, the need for compact data representations has never been more immediate. Among the rest data representations, information networks can undoubtedly be considered one of the most prominent, since they manage to harmonically combine different aspects of information in the same data entity. Specifically, this representation model subtly combines different data modalities, so as to conveniently incorporate the intrinsic inference and semantics. Therefore, by presenting each individual entity as a node and by denoting any kind of association with an interconnection edge, information graphs may adeptly formulate any functional system as one of interacting entities. The usage of this general-purpose abstraction has practically been proven beneficial and has widely been adopted in various scientific sectors - such as chemistry, biology, neuroscience, finance, linguistics, social sciences, software engineering, digital marketing, etc., - mostly because of its obvious interpretation. As a result, with the extended application of this hierarchical data representation in various important issues, - such as customer segmentation, epidemiology, political influence evolution, criminal identification, tissue/organ detection, etc.; - graph analysis has attracted significant scientific interest, triggering the conduction of an impressive amount of research. Focusing on social media, where in 2021 more than 4.14 billion people are considered active social media users 1 , it is obvious that the respective information can effectively be structured with information networks. Specifically, each user can compactly be presented as a vertex, while any kind of user interaction - such as “friendship”, “liking”, “following”, “sharing”, “expressing interest”, “re-tweeting”, etc.; - might be expressed with an interconnection edge between the corresponding vertices. From sentiment analysis, expert identification and mood analysis to recommendation systems, digital footprint analysis and viral marketing campaigns, graph theory lies under numerous substantial issues, aiming to introduce efficient information managing and data mining techniques. Undeniably, one of the most important network analysis topic is community detection. This problem principally aims to identify groups of highly similar entities, aka communities, by primarily leveraging the network topology [1-9] of the data graph under study. The necessity of this topic, as plainly explained in [10], concerns the deeper understanding of the subjacent network structure that could lead to the extraction of advantageous insights regarding the underlying dynamic processes. Therefore, with its appropriate generalization and its widespread adoption from numerous fields - such as recommendation systems, targeted market analysis, influence propagation, link prediction, opinion mining etc.; - this NP-hard graph analysis problem [1] has unquestionably become one of the most essential and challenging. In respect of sociology, community detection can alternatively be construed as the expression of the homophily effect [11,12], which reflect the natural human tendency to mostly associate and interact with groups of similar. Concentrating on social media, community detection can be differently interpreted as the identification of social media users’ groups that are, either directly or indirectly, connected with each other and tend to interact more often, comparing to users of different communities [13]. Consequently, by identifying the strongly related groups of individuals, any social network might be naturally decomposed to groups of highly interacting entities, the communities, which plainly disclose the given social graph's inner mechanics. Despite the lack of a broadly accepted definition [1-9], a community can be intuitively perceived considering its graph representation. Specifically, a set of vertices is expected to form a meaningful community, if only its intra-cluster and inter-cluster densities are bigger, and smaller respectively, to the average link density of the original graph [6]. However, as it is strongly underlined in [14], the optimal community hierarchy is generated when the inter-cluster and intra-cluster densities are comparatively better than expected, and not by the optimization of the individuals. In other words, a good graph division would not be the one in which the number of connections between communities is minimized, but the one in which there are fewer inter-connection edges than originally found. Because of this abstraction, community detection has practically become one of the most conceptually challenging and computationally demanding network analysis topics. Due to its profitable application in plenteous scientific areas, a numerous set of community detection algorithms [1-9] has already been published. From methods that are exclusively based on the repetitive calculation of a global network topology criterion, to alternatives inspired by discrete mathematics and physics, the pluralism of classic community detection approaches is indeed remarkable. The great majority of those methods and algorithms are originally designed to be generally applied to any information network. The classic community detection techniques - e.g., the Louvain [15] algorithm, the Girvan–Newman [16] algorithm, the Clauset–Newman–Moore [6] algorithm, the edge centrality optimization method [1], the geodesic edge betweenness [1] approach, etc.; - are basically recursive methods of high polynomial computational complexity aiming to optimize an iteratively modified and repetitively calculated set of global network topology metrics, in order to extract the underlying community hierarchy from any possible network. Nevertheless, those methods’ capacity has practically confirmed [1,4,8] to be profoundly limited in terms of scalability, outcome consistency, and overall reliability. As meticulously described in [1-9,17], the classic community detection solutions are principally static methods that neglect not only the information networks’ topological heterogeneities, but also their significantly variant subjacent community structure. Additionally, apart from the undisputed computational difficulties of global optimization processes, the actual size of today’s real-world social networks also acts as another major inhibitory factor. As plainly explained in [1], with their corresponding computational complexity being at least quadratic, the application of classic community detection algorithms in large-scale information networks, such as contemporary social media graphs, is prohibitive. In this regard, the primary intention of this thesis is the introduction of highly scalable and accurate community detection methodologies that would efficiently extract the underlying community hierarchy of modern, large-scale social information networks regardless of their size and density. Initially, by alternatively interpreting the node-oriented definition of community to its equivalent edge-oriented representation, the introduction of distributed community prediction takes place. The true purpose of these methodologies is to efficiently identify the subjacent community hierarchy of any large-scale social graph through prediction. This is achieved by classifying the imminent graph's edges to either the ones associating nodes of different communities, aka inter-connection edges, or of the same, aka intra-connection edges, solely based on plain network topology characteristics. The promising perspectives of the distributed community prediction have meticulously been analysed in the [18-20] research publications. All different proposed methodologies have thoroughly been examined on numerous real-life social networks and proven superior to various classic community detection methods in terms of performance, stability and accuracy. Furthermore, aiming to enhance the community identification process to further consider different aspects of social networks’ information, such as the intrinsic user profile information, the article [21] has been published. In this article, a distributed, hybrid, community detection methodology has been introduced that ably combined the local topological characteristics of each social media graph under study along with the existing user profile information, in order to unveil the subjacent community structure. The proposed hybrid community detection approach [21] has been extensively tested on various real-world social graphs, roundly compared to other classic divisive community detection algorithms and practically proven highly scalable and adequately accurate. Finally, the research publications [22,23] have plainly presented the beneficial application of community detection in other sectors such as the viral marketing and computational linguistics. Particularly, in case of [22], the Twitter Personality based Communicative Communities Extraction (T-PCCE) system has been introduced. This system [22], given a real-world Twitter social subgraph, aims to identify the communities of high internal information flows, by also considering the users’ personality traits. Regarding the case of [23], a word-sense disambiguation methodology has been introduced. Specifically, by employing classic community detection techniques and establishing the unprecedented concept of community coherence on the Wikipedia Entities’ semantic ontologies graph, this distributed methodology demonstrated impressive precision and general computational performance as regards the Wikification / text annotation problem. [1] Santo Fortunato: Community detection in graphs. CoRR abs/0906.0612 (2009) [2] Schaeffer, Satu. (2007). Graph Clustering. Computer Science Review. 1. 27-64. 10.1016/j.cosrev.2007.05.001. [3] Devi, J. & Eswaran, Poovammal. (2016). An Analysis of Overlapping Community Detection Algorithms in Social Networks. Procedia Computer Science. 89. 349-358. 10.1016/j.procs.2016.06.082. [4] Stavros Souravlas, Angelo Sifaleras, M. Tsintogianni, Stefanos Katsavounis: A classification of community detection methods in social networks: a survey. Int.J. Gen. Syst. 50(1): 63-91 (2021), doi: 10.1080/03081079.2020.1863394 [5] Cai, Q., Ma, L., Gong, M. and Tian, D. (2016) A survey on network community detection based on evolutionary computation, Int. J. Bio-Inspired Computation, Vol. 8, No. 2, pp.84–98, DOI: 10.1504/IJBIC.2016.076329. [6] Bisma S. Khan, Muaz A. Niazi: Network Community Detection: A Review and Visual Survey. CoRR abs/1708.00977 (2017) [7] Michele Coscia, Fosca Giannotti, Dino Pedreschi: A Classification for Community Discovery Methods in Complex Networks. CoRR abs/1206.3552 (2012) [8] Papadopoulos, S., Kompatsiaris, Y., Vakali, A. et al. Community detection in Social Media. Data Min Knowl Disc 24, 515–554 (2012). https://doi.org/10.1007/s10618-011-0224-z [9] Javed, M.A., Younis, M.S., Latif, S., Qadir, J., Baig, A., Community detection in networks: A multidisciplinary review, Journal of Network and Computer Applications (2018), doi: 10.1016/j.jnca.2018.02.011. [10] Andrea Lancichinetti, Mikko Kivelä, Jari Saramäki, Santo Fortunato: Characterizing the community structure of complex networks. CoRR abs/1005.4376 (2010) [11] Khediri, Nourhene & Karoui, Wafa. (2017). Community Detection in Social Network with Node Attributes Based on Formal Concept Analysis, 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications, 1346-1353. 10.1109/AICCSA.2017.200. [12] Yuan Li: Community Detection with Node Attributes and its Generalization. CoRR abs/1604.03601 (2016) [13] P. Held, B. Krause and R. Kruse, "Dynamic Clustering in Social Networks Using Louvain and Infomap Method," 2016 Third European Network Intelligence Conference (ENIC), 2016, pp. 61-68, doi: 10.1109/ENIC.2016.017. [14] Newman MEJ. Modularity and community structure in networks. PNAS June 6, 2006 103 (23) 8577-8582; https://doi.org/10.1073/pnas.0601602103 [15] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre: Fast unfolding of community hierarchies in large networks. CoRR abs/0803.0476 (2008) [16] E.J. Newman, Mark & Girvan, Michelle. (2004). Finding and Evaluating Community Structure in Networks. Physical review. E, Statistical, nonlinear, and soft matter physics. 69. 026113. 10.1103/PhysRevE.69.026113. [17] Peel, L.; Larremore, D.B.; Clauset, A. The ground truth about metadata and community detection in networks. Sci. Adv. 2017, 3, e1602548, doi:10.1126/sciadv.1602548. [18] Makris, C.; Pettas, D.; Pispirigos, G. Distributed Community Prediction for Social Graphs Based on Louvain Algorithm. In IFIP International Conference on Artificial Intelligence Applications and Innovations; Springer: Cham, Switzerland, 2019; pp. 500–511. DOI: 10.1007/978-3-030-19823-7_42 [19] Makris, C.; Pispirigos, G.; Rizos, I.O. A Distributed Bagging Ensemble Methodology for Community Prediction in Social Networks. Information 2020, 11, 199. https://doi.org/10.3390/info11040199 [20] Makris, C.; Pispirigos, G. Stacked Community Prediction: A Distributed Stacking-Based Community Extraction Methodology for Large Scale Social Networks. Big Data Cogn. Comput. 2021, 5, 14. https://doi.org/10.3390/bdcc5010014 [21] Konstantinos Georgiou, Christos Makris, Georgios Pispirigos: A Distributed Hybrid Community Detection Methodology for Social Networks. Algorithms 12(8): 175 (2019). https://doi.org/10.3390/a12080175 [22] Eleanna Kafeza, Andreas Kanavos, Christos Makris, Georgios Pispirigos, Pantelis Vikatos: T-PCCE: Twitter Personality based Communicative Communities Extraction System for Big Data. IEEE Trans. Knowl. Data Eng. 32(8): 1625-1638 (2020). DOI: 10.1109/TKDE.2019.2906197 [23] Makris, C.; Pispirigos, G.; Simos, M.A. Text Semantic Annotation: A Distributed Methodology Based on Community Coherence. Algorithms 2020, 13, 160. https://doi.org/10.3390/a13070160
author2 Pispirigkos, Georgios
author_facet Pispirigkos, Georgios
Πισπιρίγκος, Γεώργιος
author Πισπιρίγκος, Γεώργιος
author_sort Πισπιρίγκος, Γεώργιος
title Innovative algorithmic techniques in cloud computing
title_short Innovative algorithmic techniques in cloud computing
title_full Innovative algorithmic techniques in cloud computing
title_fullStr Innovative algorithmic techniques in cloud computing
title_full_unstemmed Innovative algorithmic techniques in cloud computing
title_sort innovative algorithmic techniques in cloud computing
publishDate 2021
url http://hdl.handle.net/10889/15160
work_keys_str_mv AT pispirinkosgeōrgios innovativealgorithmictechniquesincloudcomputing
AT pispirinkosgeōrgios kainotomesalgorithmikestechnikessesystēmataypologistikounephous
_version_ 1771297271199039488
spelling nemertes-10889-151602022-09-05T20:23:35Z Innovative algorithmic techniques in cloud computing Καινοτόμες αλγοριθμικές τεχνικές σε συστήματα υπολογιστικού νέφους Πισπιρίγκος, Γεώργιος Pispirigkos, Georgios Cloud computing Distributed processing Social network analysis Community detection Community prediction Distributed machine learning Supervised learning Classification Bagging ensemble learning Stacking ensemble learning Word-sense disambiguation Υπολογιστικό νέφος Κατανεμημένη επεξεργασία Ανάλυση κοινωνικών δικτύων Ανίχνευση κοινοτήτων Πρόβλεψη κοινοτήτων Κατανεμημένη μηχανική μάθηση Επιβλεπόμενη μάθηση Ομαδοποίηση Συνδυαστικό μοντέλο μηχανικής μάθησης Συνδυαστικό μοντέλο στοίβαξης μηχανικής μάθησης Αποσαφήνιση λέξης In recent years, due to the universal spread of the internet, the amount of digitally available information continuously grows. Indicatively, in 2021 there are 7.83 billion people, 4.66 billions of whom are considered active Internet users 1 . In addition, over 100 million Gigabytes of data have yet been indexed, while approximately 2.5 billion Gigabytes of raw data are generated on daily basis 2 . Thus, as digital information massively grows, the need for compact data representations has never been more immediate. Among the rest data representations, information networks can undoubtedly be considered one of the most prominent, since they manage to harmonically combine different aspects of information in the same data entity. Specifically, this representation model subtly combines different data modalities, so as to conveniently incorporate the intrinsic inference and semantics. Therefore, by presenting each individual entity as a node and by denoting any kind of association with an interconnection edge, information graphs may adeptly formulate any functional system as one of interacting entities. The usage of this general-purpose abstraction has practically been proven beneficial and has widely been adopted in various scientific sectors - such as chemistry, biology, neuroscience, finance, linguistics, social sciences, software engineering, digital marketing, etc., - mostly because of its obvious interpretation. As a result, with the extended application of this hierarchical data representation in various important issues, - such as customer segmentation, epidemiology, political influence evolution, criminal identification, tissue/organ detection, etc.; - graph analysis has attracted significant scientific interest, triggering the conduction of an impressive amount of research. Focusing on social media, where in 2021 more than 4.14 billion people are considered active social media users 1 , it is obvious that the respective information can effectively be structured with information networks. Specifically, each user can compactly be presented as a vertex, while any kind of user interaction - such as “friendship”, “liking”, “following”, “sharing”, “expressing interest”, “re-tweeting”, etc.; - might be expressed with an interconnection edge between the corresponding vertices. From sentiment analysis, expert identification and mood analysis to recommendation systems, digital footprint analysis and viral marketing campaigns, graph theory lies under numerous substantial issues, aiming to introduce efficient information managing and data mining techniques. Undeniably, one of the most important network analysis topic is community detection. This problem principally aims to identify groups of highly similar entities, aka communities, by primarily leveraging the network topology [1-9] of the data graph under study. The necessity of this topic, as plainly explained in [10], concerns the deeper understanding of the subjacent network structure that could lead to the extraction of advantageous insights regarding the underlying dynamic processes. Therefore, with its appropriate generalization and its widespread adoption from numerous fields - such as recommendation systems, targeted market analysis, influence propagation, link prediction, opinion mining etc.; - this NP-hard graph analysis problem [1] has unquestionably become one of the most essential and challenging. In respect of sociology, community detection can alternatively be construed as the expression of the homophily effect [11,12], which reflect the natural human tendency to mostly associate and interact with groups of similar. Concentrating on social media, community detection can be differently interpreted as the identification of social media users’ groups that are, either directly or indirectly, connected with each other and tend to interact more often, comparing to users of different communities [13]. Consequently, by identifying the strongly related groups of individuals, any social network might be naturally decomposed to groups of highly interacting entities, the communities, which plainly disclose the given social graph's inner mechanics. Despite the lack of a broadly accepted definition [1-9], a community can be intuitively perceived considering its graph representation. Specifically, a set of vertices is expected to form a meaningful community, if only its intra-cluster and inter-cluster densities are bigger, and smaller respectively, to the average link density of the original graph [6]. However, as it is strongly underlined in [14], the optimal community hierarchy is generated when the inter-cluster and intra-cluster densities are comparatively better than expected, and not by the optimization of the individuals. In other words, a good graph division would not be the one in which the number of connections between communities is minimized, but the one in which there are fewer inter-connection edges than originally found. Because of this abstraction, community detection has practically become one of the most conceptually challenging and computationally demanding network analysis topics. Due to its profitable application in plenteous scientific areas, a numerous set of community detection algorithms [1-9] has already been published. From methods that are exclusively based on the repetitive calculation of a global network topology criterion, to alternatives inspired by discrete mathematics and physics, the pluralism of classic community detection approaches is indeed remarkable. The great majority of those methods and algorithms are originally designed to be generally applied to any information network. The classic community detection techniques - e.g., the Louvain [15] algorithm, the Girvan–Newman [16] algorithm, the Clauset–Newman–Moore [6] algorithm, the edge centrality optimization method [1], the geodesic edge betweenness [1] approach, etc.; - are basically recursive methods of high polynomial computational complexity aiming to optimize an iteratively modified and repetitively calculated set of global network topology metrics, in order to extract the underlying community hierarchy from any possible network. Nevertheless, those methods’ capacity has practically confirmed [1,4,8] to be profoundly limited in terms of scalability, outcome consistency, and overall reliability. As meticulously described in [1-9,17], the classic community detection solutions are principally static methods that neglect not only the information networks’ topological heterogeneities, but also their significantly variant subjacent community structure. Additionally, apart from the undisputed computational difficulties of global optimization processes, the actual size of today’s real-world social networks also acts as another major inhibitory factor. As plainly explained in [1], with their corresponding computational complexity being at least quadratic, the application of classic community detection algorithms in large-scale information networks, such as contemporary social media graphs, is prohibitive. In this regard, the primary intention of this thesis is the introduction of highly scalable and accurate community detection methodologies that would efficiently extract the underlying community hierarchy of modern, large-scale social information networks regardless of their size and density. Initially, by alternatively interpreting the node-oriented definition of community to its equivalent edge-oriented representation, the introduction of distributed community prediction takes place. The true purpose of these methodologies is to efficiently identify the subjacent community hierarchy of any large-scale social graph through prediction. This is achieved by classifying the imminent graph's edges to either the ones associating nodes of different communities, aka inter-connection edges, or of the same, aka intra-connection edges, solely based on plain network topology characteristics. The promising perspectives of the distributed community prediction have meticulously been analysed in the [18-20] research publications. All different proposed methodologies have thoroughly been examined on numerous real-life social networks and proven superior to various classic community detection methods in terms of performance, stability and accuracy. Furthermore, aiming to enhance the community identification process to further consider different aspects of social networks’ information, such as the intrinsic user profile information, the article [21] has been published. In this article, a distributed, hybrid, community detection methodology has been introduced that ably combined the local topological characteristics of each social media graph under study along with the existing user profile information, in order to unveil the subjacent community structure. The proposed hybrid community detection approach [21] has been extensively tested on various real-world social graphs, roundly compared to other classic divisive community detection algorithms and practically proven highly scalable and adequately accurate. Finally, the research publications [22,23] have plainly presented the beneficial application of community detection in other sectors such as the viral marketing and computational linguistics. Particularly, in case of [22], the Twitter Personality based Communicative Communities Extraction (T-PCCE) system has been introduced. This system [22], given a real-world Twitter social subgraph, aims to identify the communities of high internal information flows, by also considering the users’ personality traits. Regarding the case of [23], a word-sense disambiguation methodology has been introduced. Specifically, by employing classic community detection techniques and establishing the unprecedented concept of community coherence on the Wikipedia Entities’ semantic ontologies graph, this distributed methodology demonstrated impressive precision and general computational performance as regards the Wikification / text annotation problem. [1] Santo Fortunato: Community detection in graphs. CoRR abs/0906.0612 (2009) [2] Schaeffer, Satu. (2007). Graph Clustering. Computer Science Review. 1. 27-64. 10.1016/j.cosrev.2007.05.001. [3] Devi, J. & Eswaran, Poovammal. (2016). An Analysis of Overlapping Community Detection Algorithms in Social Networks. Procedia Computer Science. 89. 349-358. 10.1016/j.procs.2016.06.082. [4] Stavros Souravlas, Angelo Sifaleras, M. Tsintogianni, Stefanos Katsavounis: A classification of community detection methods in social networks: a survey. Int.J. Gen. Syst. 50(1): 63-91 (2021), doi: 10.1080/03081079.2020.1863394 [5] Cai, Q., Ma, L., Gong, M. and Tian, D. (2016) A survey on network community detection based on evolutionary computation, Int. J. Bio-Inspired Computation, Vol. 8, No. 2, pp.84–98, DOI: 10.1504/IJBIC.2016.076329. [6] Bisma S. Khan, Muaz A. Niazi: Network Community Detection: A Review and Visual Survey. CoRR abs/1708.00977 (2017) [7] Michele Coscia, Fosca Giannotti, Dino Pedreschi: A Classification for Community Discovery Methods in Complex Networks. CoRR abs/1206.3552 (2012) [8] Papadopoulos, S., Kompatsiaris, Y., Vakali, A. et al. Community detection in Social Media. Data Min Knowl Disc 24, 515–554 (2012). https://doi.org/10.1007/s10618-011-0224-z [9] Javed, M.A., Younis, M.S., Latif, S., Qadir, J., Baig, A., Community detection in networks: A multidisciplinary review, Journal of Network and Computer Applications (2018), doi: 10.1016/j.jnca.2018.02.011. [10] Andrea Lancichinetti, Mikko Kivelä, Jari Saramäki, Santo Fortunato: Characterizing the community structure of complex networks. CoRR abs/1005.4376 (2010) [11] Khediri, Nourhene & Karoui, Wafa. (2017). Community Detection in Social Network with Node Attributes Based on Formal Concept Analysis, 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications, 1346-1353. 10.1109/AICCSA.2017.200. [12] Yuan Li: Community Detection with Node Attributes and its Generalization. CoRR abs/1604.03601 (2016) [13] P. Held, B. Krause and R. Kruse, "Dynamic Clustering in Social Networks Using Louvain and Infomap Method," 2016 Third European Network Intelligence Conference (ENIC), 2016, pp. 61-68, doi: 10.1109/ENIC.2016.017. [14] Newman MEJ. Modularity and community structure in networks. PNAS June 6, 2006 103 (23) 8577-8582; https://doi.org/10.1073/pnas.0601602103 [15] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre: Fast unfolding of community hierarchies in large networks. CoRR abs/0803.0476 (2008) [16] E.J. Newman, Mark & Girvan, Michelle. (2004). Finding and Evaluating Community Structure in Networks. Physical review. E, Statistical, nonlinear, and soft matter physics. 69. 026113. 10.1103/PhysRevE.69.026113. [17] Peel, L.; Larremore, D.B.; Clauset, A. The ground truth about metadata and community detection in networks. Sci. Adv. 2017, 3, e1602548, doi:10.1126/sciadv.1602548. [18] Makris, C.; Pettas, D.; Pispirigos, G. Distributed Community Prediction for Social Graphs Based on Louvain Algorithm. In IFIP International Conference on Artificial Intelligence Applications and Innovations; Springer: Cham, Switzerland, 2019; pp. 500–511. DOI: 10.1007/978-3-030-19823-7_42 [19] Makris, C.; Pispirigos, G.; Rizos, I.O. A Distributed Bagging Ensemble Methodology for Community Prediction in Social Networks. Information 2020, 11, 199. https://doi.org/10.3390/info11040199 [20] Makris, C.; Pispirigos, G. Stacked Community Prediction: A Distributed Stacking-Based Community Extraction Methodology for Large Scale Social Networks. Big Data Cogn. Comput. 2021, 5, 14. https://doi.org/10.3390/bdcc5010014 [21] Konstantinos Georgiou, Christos Makris, Georgios Pispirigos: A Distributed Hybrid Community Detection Methodology for Social Networks. Algorithms 12(8): 175 (2019). https://doi.org/10.3390/a12080175 [22] Eleanna Kafeza, Andreas Kanavos, Christos Makris, Georgios Pispirigos, Pantelis Vikatos: T-PCCE: Twitter Personality based Communicative Communities Extraction System for Big Data. IEEE Trans. Knowl. Data Eng. 32(8): 1625-1638 (2020). DOI: 10.1109/TKDE.2019.2906197 [23] Makris, C.; Pispirigos, G.; Simos, M.A. Text Semantic Annotation: A Distributed Methodology Based on Community Coherence. Algorithms 2020, 13, 160. https://doi.org/10.3390/a13070160 Τα τελευταία χρόνια, λόγω της παγκόσμιας εξάπλωσης του διαδικτύου, το μέγεθος της ψηφιακά διαθέσιμης πληροφορίας ολοένα και μεγαλώνει. Ενδεικτικά, το 2021 υπάρχουν περισσότεροι από 7,83 δισεκατομμύρια άνθρωποι στον πλανήτη, με τα 4,66 δισεκατομμύρια εξ ’αυτών να δηλώνουν ενεργοί χρήστες του διαδικτύου. Επιπλέον, το ίδιο έτος εκτιμάται πως έχουν επεξεργαστεί και δεικτοδοτηθεί περισσότερα από 100 δισεκατομμύρια Gigabytes πληροφορίας, ενώ ο μέσος ημερήσιος ρυθμός αναπαραγωγής πρωτογενούς πληροφορίας ξεπερνά τα 2,5 δισεκατομμύρια Gigabytes. Είναι συνεπώς προφανές, πως όσο μεγαλώνει η διαθεσιμότητας της ψηφιοποιημένης πληροφορίας, τόσο μεγαλύτερη καθίσταται και η ανάγκη για συμπαγείς και αποδοτικές δομές αναπαράστασης δεδομένων. Ανάμεσα στις υπόλοιπες δομές αναπαράστασης, τα δίκτυα δεδομένων θεωρούνται η κυρίαρχη, μιας και συνδυάζουν αρμονικά διαφορετικές πτυχές της πληροφορίας στην ίδια εννοιολογική οντότητα. Συγκεκριμένα, το μοντέλο αναπαράστασης πληροφορίας αυτό, συνδυάζει ευφυώς τις διαφορετικές εκφάνσεις των πληροφοριακών οντοτήτων με τις υπαρκτές μεταξύ τους συσχετίσεις, ενσωματώνοντας έτσι την εγγενή τους σημειολογία στην αρχική πληροφορία. Συνεπώς, αναπαριστώντας σαν κόμβο ενός δικτύου κάθε διακριτή εννοιολογική οντότητα και υποδηλώνοντας κάθε πιθανή συσχέτιση με μια ακμή διασύνδεσης, τα δίκτυα δεδομένων μπορούν να αναπαραστήσουν ιδανικά οποιοδήποτε σύστημα σαν σύστημα αλληλοεπιδρώμενων οντοτήτων. Η χρήση αυτής της αφαίρεσης γενικού σκοπού έχει στην πράξη αποδειχτεί ευεργετική και έχει ευρέως υιοθετηθεί σε διαφορετικούς επιστημονικούς τομείς - όπως η χημεία, η βιολογία, η οικονομία, η γλωσσολογία, οι κοινωνικές επιστήμες, η ανάπτυξη λογισμικού, το ψηφιακό μάρκετινγκ, κλπ. - λόγω της προφανούς της ερμηνείας. Ως εκ τούτου, η εκτεταμένη εφαρμογή αυτής της ιεραρχικής δομής αναπαράστασης σε ένα ευρύ φάσμα μοντέρνων πρακτικών ζητημάτων - όπως το customer segmentation, η επιδημιολογία, η εξέλιξη πολιτικής επιρροής, η αναγνώριση εγκληματιών, η ανίχνευση ιστών / οργάνων, κλπ. - προσέλκυσε το επιστημονικό ενδιαφέρον γύρω από τη θεωρία ανάλυσης γράφων, πυροδοτώντας τη διεξαγωγή εκτενούς επιστημονικής έρευνας. Εστιάζοντας στα μέσα κοινωνικής δικτύωσης όπου το 2021 εκτιμώνται πως έχουν περισσότερους από 4,14 δισεκατομμύρια ενεργούς χρήστες, γίνεται εύκολα αντιληπτό ότι οι περιεχόμενες πληροφορίες τους μπορούν να αναπαρασταθούν αποδοτικά χρήσει των δικτύων πληροφορίας. Συγκεκριμένα, κάθε χρήστης αντιστοιχίζεται σε μια κορυφή του γραφήματος αναπαράστασης, ενώ κάθε είδους αλληλεπίδραση μεταξύ χρηστών - όπως friendship, like, following, sharing, interest, react, κ.λπ. - μπορεί να εκφραστεί ως ακμή διασύνδεσης μεταξύ των αντίστοιχων κορυφών. Η θεωρία γράφων εμπλέκεται σε πολλά σύγχρονα θέματα ανάλυσης - όπως η ανάλυση συναισθημάτων, το expert identification, η ανάλυση διάθεσης, τα recommendation systems, η ψηφιακή αναγνώριση αποτυπώματος, οι καμπάνιες viral marketing, κ.λπ. - στοχεύοντας όχι μόνο στην εισαγωγή αποτελεσματικών τεχνικών χειρισμού της πληροφορίας αλλά κυρίως στην εφαρμογή αποτελεσματικών μεθόδων εξόρυξης δεδομένων. Αναμφισβήτητα, ένα από τα πιο σημαντικά ζητήματα ανάλυσης γραφημάτων είναι αυτό της ανίχνευσης κοινοτήτων, το οποίο στοχεύει στον εντοπισμό συνόλων παρόμοιων οντοτήτων, επίσης γνωστά και ως κοινότητες, αξιοποιώντας κυρίως τις υπάρχουσες τοπολογικές ιδιότητες των δικτύων [1-9]. Η κρισιμότητα του ζητήματος αυτού, όπως ξεκάθαρα παρουσιάζεται στο [10], αφορά τη βαθύτερη κατανόηση της υποκείμενης κοινοτικής δομής που ευνοεί την εξαγωγή γνώσης αναφορικά με τις υποβόσκουσες δυναμικές διαδικασίες εξέλιξης του δικτύου πληροφορίας. Έτσι, λόγω της εγγενούς γενίκευσης και της ευρείας υιοθέτησής του από διάφορους τομείς, το NP-hard κλάσης πρόβλημα [1] αυτό θεωρείται ένα από τα πιο σημαντικά και ταυτόχρονα δυσεπίλυτα. Από κοινωνιολογικής άποψης, το ζήτημα της ανίχνευσης κοινοτήτων ερμηνεύεται και ως η έκφραση του φαινομένου της ομοφυλίας [11,12], γνωστό και ως homophily effect, που αντικατοπτρίζει τη φυσική ανθρώπινη τάση του καθενός να συνδέεται και να αλληλοεπιδρά κυρίως με παρόμοια άτομα. Στα πλαίσια των μέσων κοινωνικής δικτύωσης, ως ανίχνευση κοινοτήτων ορίζεται η αναγνώριση των συνόλων χρηστών που συνδέονται, είτε άμεσα είτε έμμεσα, και τείνουν να αλληλοεπιδρούν συχνότερα μεταξύ τους, συγκριτικά με τους υπόλοιπους χρήστες του κοινωνικού δικτύου [11]. Κατά συνέπεια, εντοπίζοντας τις ομάδες έντονα αλληλοεπιδρώμενων μεταξύ τους ατόμων, οποιοδήποτε κοινωνικό δίκτυο μπορεί φυσικά να χαρακτηριστεί ως το υπερσύνολο των επιμέρους ομάδων οντοτήτων υψηλής αλληλεπίδρασης. Παρά την έλλειψη ενός ευρέως αποδεκτού ορισμού [1-9], μια κοινότητα μπορεί να γίνει αντιληπτή λαμβάνοντας υπόψη την διαισθητική ερμηνεία των παραπάνω περιγραφών. Συγκεκριμένα, ένα σύνολο κορυφών αναμένεται πως σχηματίζει μια έγκυρη κοινότητα όταν η πυκνότητα εσωτερικών διασυνδέσεων του συνόλου είναι μεγαλύτερη από τη σχετική πυκνότητα εξωτερικών διασυνδέσεων. Παράλληλα, η πυκνότητα εσωτερικών διασυνδέσεων θα πρέπει να είναι σημαντικά μεγαλύτερη από το μέσο βαθμό διασύνδεσης των κόμβων του αρχικού γραφήματος [6]. Ωστόσο, όπως υπογραμμίζεται στο [14], η βέλτιστη κοινοτική δομή επιτυγχάνεται όταν οι επιμέρους εσωτερικές και εξωτερικές πυκνότητες διασυνδέσεων των επιμέρους συνόλων είναι συγκριτικά μεγαλύτερες, και αντίστοιχα μικρότερες, από το αναμενόμενο, και όχι από την επιμέρους βελτιστοποίηση τους. Με άλλα λόγια, ένας καλός επιμερισμός του αρχικού γραφήματος σε κοινότητες δεν προκύπτει όταν το πλήθος των ακμών διασύνδεσης μεταξύ των διακριτών κοινοτήτων ελαχιστοποιείται, αλλά όταν προκύπτουν λιγότερες από το αναμενόμενο. Αυτή η εγγενής αφαίρεση είναι πρακτικά ο λόγος της αυξημένης εννοιολογικής δυσκολίας του προβλήματος ανίχνευσης κοινοτήτων ώστε να καθίσταται ως ένα από τα πλέον απαιτητικά. Λόγω της ωφέλιμης εφαρμογής της ανίχνευσης κοινοτήτων σε διάφορους επιστημονικούς τομείς, έχει ήδη δημοσιευτεί ένα ευρύ σύνολο αλγορίθμων επίλυσης [1-9]. Από μεθόδους που βασίζονται αποκλειστικά στον επαναληπτικό υπολογισμό και στη βελτιστοποίησης μιας γενικής τοπολογικής μετρικής, μέχρι σε εναλλακτικές μεθόδους, εμπνευσμένες από τα διακριτά μαθηματικά και τη φυσική, ο πλουραλισμός των κλασικών μεθόδων ανίχνευσης κοινοτήτων είναι πράγματι αξιοσημείωτος. Υπάρχει πληθώρα δημοσιεύσεων που προσανατολίζονται στη γενική εφαρμογή τους σε οποιοδήποτε δίκτυο πληροφοριών και δεν εξειδικεύονται στην περίπτωση των γραφημάτων κοινωνικής δικτύωσης. Αυτές οι κλασικές τεχνικές - όπως είναι ο αλγόριθμος Louvain [15], ο αλγόριθμος Girvan – Newman [14], ο αλγόριθμος Clauset – Newman – Moore [8], κ.λπ. - είναι κατά βάση αναδρομικές μέθοδοι υψηλής πολυωνυμικής υπολογιστικής πολυπλοκότητας που ορίζουν την βελτιστοποίηση μιας διαρκώς τροποποιημένης τοπολογικής μετρικής που αφορά το σύνολο του γράφου, με απώτερο στόχο την εξαγωγής της υποκείμενης ιεραρχικής κοινοτικής δομής. Ωστόσο, οι δυνατότητες των κλασσικών αλγορίθμων ανίχνευσης είναι αρκετά περιορισμένες όσον αφορά την επεκτασιμότητα, τη συνέπεια των αποτελεσμάτων και τη συνολική αξιοπιστία τους. Όπως περιγράφεται σχολαστικά στα επιστημονικά άρθρα [1-9,17], αυτές οι λύσεις είναι κυρίως στατικές μέθοδοι που αμελούν τις εγγενείς τοπολογικές ετερογένειες των γραφημάτων αλλά και τη σημαντικά διακυμαινόμενη υποκείμενη κοινοτική δομή των κοινωνικών δικτύων. Επιπλέον, πέρα από τις αδιαμφισβήτητες, εγγενείς υπολογιστικές δυσκολίες που προκύπτουν από την προσπάθεια βελτιστοποίησης μιας γενικής τοπολογικής μετρικής, το μέγεθος των σημερινών κοινωνικών δικτύων αποτελεί έναν εξίσου σημαντικό ανασταλτικό παράγοντα. Ως εκ τούτου, όπως εξηγείται στο [1], με την υπολογιστική τους πολυπλοκότητά να είναι τουλάχιστον δευτέρου βαθμού, η εφαρμογή των κλασικών αλγορίθμων ανίχνευσης κοινοτήτων κρίνεται πρακτικά ανεφάρμοστη σε γραφήματα υψηλής κλιμακωσιμότητας, για τα οποία οι πολυωνυμικές λύσεις είναι απαγορευτικές. Έτσι, η εισαγωγή νέων, καινοτόμων και υψηλά κλιμακώσιμων εναλλακτικών λύσεων κρίνεται απαραίτητη. Σε αντίθεση με τις υπάρχουσες κλασικές τεχνικές εφαρμογής αυστηρών, αναδρομικών υπολογισμών, η βασική πρόθεση αυτής της διδακτορικής διατριβής είναι η εισαγωγή υψηλά κλιμακώσιμων και επαρκώς αποτελεσματικών μεθοδολογιών ανίχνευσης κοινοτήτων που εξαγάγουν με ακρίβεια την υποκείμενη κοινοτική δομή κοινωνικών δικτύων μεγάλης κλίμακας ανεξάρτητα από το μέγεθος και την πυκνότητά σύνδεσης τους. Αρχικά, ερμηνεύοντας διαφορετικά τον διαισθητικό, κόμβο-προσανατολισμένο ορισμό της κοινότητας με την αντίστοιχη άκμο-προσανατολισμένη επεξήγησή του, πραγματοποιείται η εισαγωγή μεθοδολογιών πρόβλεψης της υποκείμενης κοινοτικής δομής των δικτύων. Ο απώτερος σκοπός αυτών των κατανεμημένων μεθοδολογιών είναι η ανίχνευση της υποκείμενης ιεραρχίας κοινοτήτων μέσω της πρόβλεψης. Αυτό επιτυγχάνεται κατηγοριοποιώντας το σύνολο των ακμών του υπό-επεξεργασία γραφήματος μεγάλης κλίμακας σε ακμές εσωτερικής ή εξωτερικής κοινοτικής διασύνδεσης βασιζόμενοι αποκλειστικά στα τοπολογικά τους χαρακτηριστικά. Έπειτα από ενδελεχή μελέτη και ανάλυση, όπως παρουσιάστηκε στις αντίστοιχες ερευνητικές δημοσιεύσεις [18-20], οι προοπτικές των συγκεκριμένων μεθοδολογιών πρόβλεψης κρίθηκαν φέρελπεις μιας και από την πειραματική διαδικασία αποδείχθηκαν ανώτερες από διάφορες κλασικές μεθόδους ως προς την απόδοση, τη σταθερότητα και την ακρίβεια. Επιπροσθέτως, στο επιστημονικό άρθρο [21] εισάγεται μια κατανεμημένη μεθοδολογία υβριδικής ανίχνευσης κοινοτικής δομής. Συγκεκριμένα, στο άρθρο αυτό περιγράφεται μια τεχνική ανίχνευσης κοινοτήτων που συνδυάζει τα τοπολογικά χαρακτηριστικά ενός δεδομένου κοινωνικού γραφήματος με τις υπάρχουσες πληροφορίες των χρηστών, όπως αυτές ορίζονται στα εκάστοτε προφίλ. Η προτεινόμενη υβριδική προσέγγιση ελέγχθηκε διεξοδικά σε διάφορα πραγματικά κοινωνικά γραφήματα, συγκρίθηκε πειραματικά με διάφορους κλασικούς, διαιρετικούς αλγόριθμους ανίχνευσης κοινοτήτων και αποδείχτηκε εξαιρετικά κλιμακώσιμη και επαρκώς ακριβής. Τέλος, στις δημοσιεύσεις [22,23], αναδεικνύεται η σαφώς ευεργετική εφαρμογή των τεχνικών ανίχνευσης κοινοτικών δομών στο ψηφιακό μάρκετινγκ και στη υπολογιστική γλωσσολογία. Συγκεκριμένα, στην περίπτωση του άρθρου [22], παρουσιάζεται το σύστημα Επικοινωνίας Κοινοτήτων Εξαγωγής με βάση την Προσωπικότητα του Χρήστη Twitter (T-PCCE), το οποίο στοχεύει στον εντοπισμό υπό-δικτύων υψηλής εσωτερικής ροής πληροφορίας από ένα πραγματικό γράφημα του κοινωνικού δικτύου Twitter. Ενώ στην περίπτωση του άρθρου [23], εισάγεται μια νέα, κατανεμημένη μεθοδολογία κειμενικής αποσαφήνισης λέξεων. Συγκεκριμένα, χρήσει κλασσικών τεχνικών ανίχνευσης κοινοτικής δομής και εφαρμογής τους στο σχηματιζόμενο γράφων οντοτήτων της Wikipedia, αναπτύσσεται ένα σύστημα αποσαφήνισης που παρουσίασε εντυπωσιακή ακρίβεια και απόδοση. [1] Santo Fortunato: Community detection in graphs. CoRR abs/0906.0612 (2009) [2] Schaeffer, Satu. (2007). Graph Clustering. Computer Science Review. 1. 27-64. 10.1016/j.cosrev.2007.05.001. [3] Devi, J. & Eswaran, Poovammal. (2016). An Analysis of Overlapping Community Detection Algorithms in Social Networks. Procedia Computer Science. 89. 349-358. 10.1016/j.procs.2016.06.082. [4] Stavros Souravlas, Angelo Sifaleras, M. Tsintogianni, Stefanos Katsavounis: A classification of community detection methods in social networks: a survey. Int.J. Gen. Syst. 50(1): 63-91 (2021), doi: 10.1080/03081079.2020.1863394 [5] Cai, Q., Ma, L., Gong, M. and Tian, D. (2016) A survey on network community detection based on evolutionary computation, Int. J. Bio-Inspired Computation, Vol. 8, No. 2, pp.84–98, DOI: 10.1504/IJBIC.2016.076329. [6] Bisma S. Khan, Muaz A. Niazi: Network Community Detection: A Review and Visual Survey. CoRR abs/1708.00977 (2017) [7] Michele Coscia, Fosca Giannotti, Dino Pedreschi: A Classification for Community Discovery Methods in Complex Networks. CoRR abs/1206.3552 (2012) [8] Papadopoulos, S., Kompatsiaris, Y., Vakali, A. et al. Community detection in Social Media. Data Min Knowl Disc 24, 515–554 (2012). https://doi.org/10.1007/s10618-011-0224-z [9] Javed, M.A., Younis, M.S., Latif, S., Qadir, J., Baig, A., Community detection in networks: A multidisciplinary review, Journal of Network and Computer Applications (2018), doi: 10.1016/j.jnca.2018.02.011. [10] Andrea Lancichinetti, Mikko Kivelä, Jari Saramäki, Santo Fortunato: Characterizing the community structure of complex networks. CoRR abs/1005.4376 (2010) [11] Khediri, Nourhene & Karoui, Wafa. (2017). Community Detection in Social Network with Node Attributes Based on Formal Concept Analysis, 2017 IEEE/ACS 14th International Conference on Computer Systems and Applications, 1346-1353. 10.1109/AICCSA.2017.200. [12] Yuan Li: Community Detection with Node Attributes and its Generalization. CoRR abs/1604.03601 (2016) [13] P. Held, B. Krause and R. Kruse, "Dynamic Clustering in Social Networks Using Louvain and Infomap Method," 2016 Third European Network Intelligence Conference (ENIC), 2016, pp. 61-68, doi: 10.1109/ENIC.2016.017. [14] Newman MEJ. Modularity and community structure in networks. PNAS June 6, 2006 103 (23) 8577-8582; https://doi.org/10.1073/pnas.0601602103 [15] Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre: Fast unfolding of community hierarchies in large networks. CoRR abs/0803.0476 (2008) [16] E.J. Newman, Mark & Girvan, Michelle. (2004). Finding and Evaluating Community Structure in Networks. Physical review. E, Statistical, nonlinear, and soft matter physics. 69. 026113. 10.1103/PhysRevE.69.026113. [17] Peel, L.; Larremore, D.B.; Clauset, A. The ground truth about metadata and community detection in networks. Sci. Adv. 2017, 3, e1602548, doi:10.1126/sciadv.1602548. [18] Makris, C.; Pettas, D.; Pispirigos, G. Distributed Community Prediction for Social Graphs Based on Louvain Algorithm. In IFIP International Conference on Artificial Intelligence Applications and Innovations; Springer: Cham, Switzerland, 2019; pp. 500–511. DOI: 10.1007/978-3-030-19823-7_42 [19] Makris, C.; Pispirigos, G.; Rizos, I.O. A Distributed Bagging Ensemble Methodology for Community Prediction in Social Networks. Information 2020, 11, 199. https://doi.org/10.3390/info11040199 [20] Makris, C.; Pispirigos, G. Stacked Community Prediction: A Distributed Stacking-Based Community Extraction Methodology for Large Scale Social Networks. Big Data Cogn. Comput. 2021, 5, 14. https://doi.org/10.3390/bdcc5010014 [21] Konstantinos Georgiou, Christos Makris, Georgios Pispirigos: A Distributed Hybrid Community Detection Methodology for Social Networks. Algorithms 12(8): 175 (2019). https://doi.org/10.3390/a12080175 [22] Eleanna Kafeza, Andreas Kanavos, Christos Makris, Georgios Pispirigos, Pantelis Vikatos: T-PCCE: Twitter Personality based Communicative Communities Extraction System for Big Data. IEEE Trans. Knowl. Data Eng. 32(8): 1625-1638 (2020). DOI: 10.1109/TKDE.2019.2906197 [23] Makris, C.; Pispirigos, G.; Simos, M.A. Text Semantic Annotation: A Distributed Methodology Based on Community Coherence. Algorithms 2020, 13, 160. https://doi.org/10.3390/a13070160 2021-09-01T06:51:46Z 2021-09-01T06:51:46Z 2021-08-27 http://hdl.handle.net/10889/15160 en application/pdf