Efficient algorithms for big data management

In this doctoral research, I addressed data management problems by developing methods and techniques that, on the one hand, preserve or improve the privacy and anonymity of users and, on the other hand, are efficient in time and storage space for large databases....

Bibliographic Details
Main Author:	Δρίτσας, Ηλίας
Other Authors:	Dritsas, Elias
Language:	English
Published:	2020
Subjects:	Bloom filters; Privacy preserving; K-NN queries; K-anonymity; Spatiotemporal databases; Sentiment analysis; Twitter; Apache Spark; GeoSpark; Φίλτρα bloom; Διατήρηση ιδιωτικότητας; Ερωτήματα κοντινότερων γειτόνων; k ανωνυμία; Χωροχρονικές βάσεις δεδομένων; Ανάλυση συναισθήματος
Online Access:	http://hdl.handle.net/10889/14050

id	nemertes-10889-14050
record_format	dspace
spelling	nemertes-10889-14050 2022-09-05T20:46:41Z Efficient algorithms for big data management Αποδοτικοί αλγόριθμοι διαχείρισης μεγάλου όγκου δεδομένων Δρίτσας, Ηλίας Dritsas, Elias 2020-10-21T11:30:55Z 2020-10-21T11:30:55Z 2020-08-31 http://hdl.handle.net/10889/14050 en application/pdf
institution	UPatras
collection	Nemertes
language	English
topic	Bloom filters; Privacy preserving; K-NN queries; K-anonymity; Spatiotemporal databases; Sentiment analysis; Twitter; Apache Spark; GeoSpark; Φίλτρα bloom; Διατήρηση ιδιωτικότητας; Ερωτήματα κοντινότερων γειτόνων; k ανωνυμία; Χωροχρονικές βάσεις δεδομένων; Ανάλυση συναισθήματος
description	In this doctoral research, I addressed data management problems by developing methods and techniques that, on the one hand, preserve or improve the privacy and anonymity of users and, on the other hand, are efficient in time and storage space for large databases. The research results focus on the following. Query performance on a large database was evaluated with and without the Bloom filter structure. The workload time, memory, and disk usage of the Privacy Preserving Record Linkage (PPRL) problem were evaluated in the Hadoop MapReduce framework. Methods were developed for answering nearest-neighbor queries on spatio-temporal data (trajectories of moving users) while preserving anonymity, with queries applied to clustered or non-clustered data. The k-anonymity method was used, in which the anonymity set that camouflages each moving object of the spatio-temporal database consists of its k nearest neighbors. The robustness of the method was quantified by a probability of 1/k, and the effect of the dimensionality and correlation of the data on the preservation of anonymity and privacy was studied. This method was then improved in terms of storage efficiency for spatio-temporal data by applying nearest-neighbor queries to Hough-transformed nonlinear trajectories of moving objects. The application of secure k-NN queries was evaluated in the GeoSpark environment. Finally, sentiment analysis on Twitter data and tourism demand forecasting were carried out on Apache Spark.
author2	Dritsas, Elias
author_facet	Dritsas, Elias Δρίτσας, Ηλίας
author	Δρίτσας, Ηλίας
author_sort	Δρίτσας, Ηλίας
title	Efficient algorithms for big data management
publishDate	2020
url	http://hdl.handle.net/10889/14050
work_keys_str_mv	AT dritsasēlias efficientalgorithmsforbigdatamanagement AT dritsasēlias apodotikoialgorithmoidiacheirisēsmegalouonkoudedomenōn
_version_	1771297327359721472
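
The k-anonymity idea summarized in the description can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the sample positions, the use of plain 2-D Euclidean distance on a single snapshot, the convention that an object counts as its own nearest neighbor, and the reading of 1/k as the chance of a uniform guess within the anonymity set are all assumptions made for illustration.

```python
import math


def anonymity_set(positions, target_id, k):
    """Return the ids of the k objects closest to the target.

    Assumption: the target itself (distance 0) is counted as one of
    its k nearest neighbors, so the returned set has exactly k members.
    """
    if not 1 <= k <= len(positions):
        raise ValueError("k must be between 1 and the number of objects")
    target = positions[target_id]
    # Sort by Euclidean distance to the target; break ties by id
    # so the result is deterministic.
    ranked = sorted(
        positions,
        key=lambda oid: (math.dist(target, positions[oid]), oid),
    )
    return ranked[:k]


def identification_probability(k):
    """Chance that a uniform guess among k candidates picks the target.

    Assumption: this is how the 1/k figure is interpreted here.
    """
    return 1.0 / k


# Hypothetical snapshot of moving objects: id -> (x, y).
positions = {
    "u1": (0.0, 0.0),
    "u2": (1.0, 0.0),
    "u3": (0.0, 2.0),
    "u4": (5.0, 5.0),
    "u5": (3.0, 0.0),
}

group = anonymity_set(positions, "u1", 3)
print(group, identification_probability(len(group)))
```

A larger k lowers the guessing probability but widens the anonymity set, so objects farther from the target are included.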

Similar Items