FREQY, an integrated software for revealing single nucleotide polymorphisms upon ENSEMBL database population comparisons

Bibliographic Details
Main Author: Λουκάτου, Στυλιανή
Other Authors: Πατρινός, Γεώργιος
Format: Thesis
Language: English
Published: 2018
Subjects:
Online Access: http://hdl.handle.net/10889/11767
Description
Summary: Modern science has revealed the importance of single nucleotide polymorphisms (SNPs), directing several research programs towards unraveling their contribution to the functions of living organisms. To this end, Dr. Patrinos' laboratory granted me access to real genomic data from 14 Greek individuals, initially so that I could carry out a statistical analysis of them. This work initiated the development of the software platform described here, Freqy, which enables the user to interact with genomic databases for data mining and statistical analysis of population data. After this statistical analysis, the main goal of the thesis widened to the development of an integrated bioinformatics tool that falls within the field of Comparative Genomics, as its main functionality is the comparison of the genomic features of different populations. Comparative Genomics reveals similarities and differences between organisms or populations. Applying comparative genomics to single nucleotide polymorphisms can yield many candidate SNPs related to a specific phenotype or function. The statistical analysis performed by the software is based on Pearson's chi-square test, which tests for association between categorical variables. Furthermore, the tool can create two types of plot for data visualization. The first is a bar plot of the frequencies of the SNPs observed in the population of interest, and the second is a heat map of the p-values calculated per SNP for the sample data against the populations of interest. The program can interact with both the Ensembl and 1000 Genomes databases, giving access to the most up-to-date data. The user can retrieve a range of data according to the options available.
To create this software, we used the MATLAB programming language and tested its functionality on the real data obtained by Dr. Patrinos' wet-lab techniques. We performed a statistical analysis of these 14 Greek individuals against the TSI and CEU populations. The TSI and CEU SNP datasets were mined from the Ensembl database through the tool and statistically analyzed with the software. Afterwards, both types of plot were created and the outcome was analyzed further in the accompanying case study. Freqy is an innovative tool that can contribute to the analysis of the human genome by indicating SNPs with similar or dissimilar observed frequencies among different populations. The software may guide scientists on which SNPs to focus on, according to their research purpose.
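
As an illustration of the kind of per-SNP comparison the summary describes, the following is a minimal sketch of a Pearson chi-square test on a 2x2 table of allele counts (sample versus reference population). It is not the thesis's code: Freqy is written in MATLAB, while this sketch uses Python, and the 2x2 allele-count layout, the function name `chi_square_2x2`, and all counts are assumptions made for illustration only.

```python
import math


def chi_square_2x2(sample_alt, sample_ref, pop_alt, pop_ref):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    Rows: sample, reference population. Columns: alternate, reference allele.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    table = [[sample_alt, sample_ref], [pop_alt, pop_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one count")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of group and allele.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 14 individuals carry 28 alleles at a biallelic SNP.
chi2, p = chi_square_2x2(sample_alt=10, sample_ref=18, pop_alt=60, pop_ref=140)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A small p-value would suggest that the allele frequency in the sample differs from that of the reference population at that SNP. With only 28 sample alleles, expected counts can be small, in which case an exact test may be more appropriate than the chi-square approximation.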