Abstract: | This thesis analyzes datasets from psoriasis-related skin samples with the aim of discovering biomarkers. Microarray and RNA-seq datasets, together with their associated analysis methods, were used to discover disease biomarkers that could guide diagnosis and prognosis.
Psoriasis is a chronic inflammatory, scaling skin disease characterized by the proliferation and abnormal differentiation of keratinocytes and by the infiltration of TH1 cells, TH17 cells and dendritic cells (DCs), with a spectrum of clinical phenotypes. It primarily affects the skin and nails and occasionally the joints. It occurs when skin cells rise quickly from their origin below the surface of the skin and pile up on the surface before they have a chance to mature. Under healthy conditions this movement takes about a month, but in psoriasis it may take only a few days. Psoriasis affects 2-4% of the general population. Compared with other diseases, the accessibility of skin for tissue biopsy makes it possible to study the cellular and molecular nature of cutaneous diseases such as psoriasis, supporting the development of effective targeted therapies for the disease.
The datasets used in this analysis are derived from skin punch biopsies taken from psoriatic patients (lesional and non-lesional samples) and from healthy controls. The microarray dataset is published in the Gene Expression Omnibus (GEO) under DataSet Record GDS4602. It consists of 180 samples derived from skin punch biopsies taken from 58 psoriatic patients and from 64 healthy individuals. Two biopsies were taken from each patient, one from lesional and one from non-lesional skin. For the RNA-seq analysis, two different datasets were used. Both are publicly available in NCBI’s GEO under accession numbers GSE74697 and GSE54456. The first dataset consists of 52 samples derived from skin punch biopsies taken from 18 psoriasis patients before and after treatment and from the normal skin of 16 healthy individuals. The second dataset consists of 174 samples derived from 92 psoriatic and 82 normal skin punch biopsies.
The overall analysis in this thesis consists of three main parts: microarray data analysis, RNA-seq data analysis, and the integration of all results with the available clinical variables of the dataset in order to extract the final diagnostic biomarkers and the corresponding computational predictive models.
The first step of the analysis was preprocessing: normalization and missing-value imputation were applied to both control and disease-related samples. Statistical tests were then conducted to extract differential expression biomarkers and define a minimal set of statistically significant genes. Next, an alternative biomarker discovery method was applied in which gene co-expression networks were constructed for both the disease and control datasets. After network construction, a network-based biomarker discovery method was applied to locate genes whose role in the network changes to a statistically significant degree. The last step of the analysis was the integration of the various types of biomarkers, which is a key step in understanding the mechanisms that underlie the disease.
The analysis uncovered 35 biomarkers related to psoriasis. A meta-analysis of the final set of biomarkers was also conducted through gene ontology enrichment analysis. Finally, computational diagnostic models for psoriasis were trained and tested using the final biomarkers as input.
Each step of the analysis, its contribution to the experimental procedure, and the range of tools that can be used are described, along with the recommended pipeline to follow in each case.
|