Rank aggregation of partial rankings : algorithms and applications

In the last few years, more and more applications on the internet exploit the power that users (or agents) have, in order to rate, grade, vote, review, and rank sets of services, local businesses, software, etc. (or, more generally, alternatives). Of course, the aim of these applications is to prov...

Πλήρης περιγραφή

Λεπτομέρειες βιβλιογραφικής εγγραφής
Κύριος συγγραφέας: Κριμπάς, Γεώργιος
Άλλοι συγγραφείς: Καραγιάννης, Ιωάννης
Μορφή: Thesis
Γλώσσα:English
Έκδοση: 2019
Θέματα:
Διαθέσιμο Online:http://hdl.handle.net/10889/12329
Περιγραφή
Περίληψη:In the last few years, more and more applications on the internet exploit the power that users (or agents) have, in order to rate, grade, vote, review, and rank sets of services, local businesses, software, etc. (or, more generally, alternatives). Of course, the aim of these applications is to provide better information, and services to the general public. Put it simply, users have knowledge upon different aspects, and internet applications want to use this knowledge which needs to be aggregated. Voting and in general Social Choice Theory from Microeconomics equips us with methods and procedures in order to aggregate this information. In this thesis, we study the application of aggregation methods upon rankings in order to recover a ground truth. We make use variations of simple voting rules that are very well-known in Social Choice Theory, which help us aggregate small in size partial or incomplete rankings, into a global one. An immediate application comes from peer grading in Massive Open Online Courses (MOOCs). These platforms are used extensively from students all around the world, and in most cases, the grading is performed by the students that are enrolled in a course themselves. The idea behind peer grading in MOOCs is that every student gets a small in size bundle of exam papers (of other students) to grade with the restriction that no student will get its exam paper to grade. We are particularly interested in grading where students do not use numerical scores, but they provide a ranking of the exam papers in their bundle from the best to the worst. Then voting rules are applied so as to aggregate these rankings in order to come up with a full ranking of all exam papers. The efficiency objective requires this full ranking to be as close as possible to the ground truth. We prove that a very simple voting rule, specifically Borda count, can recover a large fraction of the true ranking. We also define and study a large class of simple aggregation rules, which we call Type-Ordering Aggregation Rules. We present a theoretical framework for assessing the performance of each member of this class. We infer statistical information about the grading behaviour of students and then our framework can serve as an optimization toolkit for selecting the optimal Type-Ordering Aggregation Rule. This requires an exact solution to an instance of the feedback arc set problem which, albeit NP-hard in general, can be solved exactly for the instances that do arise in practice. Our theoretical framework allows us to obtain a series of results. We prove that Borda rule is the optimal Type-Ordering Aggregation Rule when students act as perfect graders, and its performance is always extremely close to optimality. Furthermore, the decision of the optimal aggregation rule strongly depends on the information about the grading behaviour of the students. Finally, we design and analyse algorithms for the aggregation of incomplete rankings that are useful in rating applications. Applications of this kind can be found in recommendation systems and platforms. Here, every user can provide us with an incomplete ranking of the restaurants, hotels, or movies that she has used as a product. Then we design algorithms to find the best scoring rule in order to aggregate the individual rankings. The assumption here is that we have some knowledge about the correct comparisons a priori, which are used as constraints. Given these constraints and the individual rankings, the optimisation problem is to find the best positional scoring rule OptPSR) which is NP-hard. We design an algorithm which is exact and solves OptPSR in time that depends exponentially only on the parameter d, where d is the number of alternatives that each user ranks. Hence, our algorithm runs in polynomial time when d is constant. Then we seek to approximately solve OptPSR, and we prove that a simple combination of t-approval voting rules yields a 1/d-approximate solution. Then, we design a more sophisticated approximation algorithm, which is parameterized by a positive integer k and yields a k/d-approximate solution. Last, we prove that OptPSR is hard to approximate and present an explicit inapproximability bound of 23/24. In addition to theoretical results, the thesis contains simulation and experiments that aim to test the simplicity and the performance of our algorithms in practical scenarios. We use synthetic data that we have generated, and more importantly, we have designed real-world experiments (or field experiments), in which we managed to extract specific information from real users in order to create datasets that we use in extensive simulations to verify or complement our theoretical results.