Investigation of macromolecular interaction interfaces using 3D graph analysis


Bibliographic record details
Main author: Κομνηνού, Μαρία-Αγγελική
Other authors: Μεγαλοοικονόμου, Βασίλειος
Format: Thesis
Language: English
Published: 2019
Subjects:
Available online: http://hdl.handle.net/10889/12487
Description
Abstract: Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. Bioinformaticians mostly design new algorithms and software and develop and update databases in order to help solve biological problems. Proteins, organic molecules that play important roles in all aspects of cell structure and function, usually do not act alone but form complexes with other proteins or biomolecules. Interactions between proteins are known as Protein-Protein Interactions (PPIs); they form when physical contacts between proteins cause binding and the activation of specific functions. The computational tools created so far focus on finding binding sites for specifically bound molecules or substrates, while some tools for analyzing the three-dimensional structure of proteins offer comparisons to find structural similarity between proteins. Our methodology proposes a novel tool for studying PPIs that occur at the surface of a protein and are governed by electrostatic forces, hydrophobic and hydrophilic properties, and a plethora of other physicochemical properties. Many structural databases host 3D protein conformational information in the form of Cartesian coordinates. For the purposes of this study the dataset was retrieved from the Protein Data Bank database (http://www.rcsb.org), which is the ‘gold standard’ repository for structural information on proteins. Chapter 1 introduces the theoretical background of proteins and their structure, the methods of protein classification, their binding sites, and the Protein-Protein Interactions that can be present. Chapter 2 gives a short introduction to the open databases that exist for protein data, followed by an extensive presentation of the Protein Data Bank (PDB).
Chapter 3 presents Bioinformatics, its importance, and its use in Drug Design. Chapter 4 presents the methodology followed, step by step, and Chapter 5 includes the results of the tools created, followed by a discussion of them and of possible future work. The tool created in this study outputs four files for each PDB entry:
i) pdb_id-v.pdb, a PDB-structured file containing ATOM records of virtual atoms placed at the midpoint between each pair of atoms on different chains that lie less than 7 Å apart;
ii) pdb_id-a.pdb, a PDB-structured file containing ATOM records of the real atom pairs that interact at a distance of less than 7 Å;
iii) pdb_id.int, which has a record for every pair of atoms that interact at a distance of less than 7 Å. Each record gives the chain, residue name, residue number, atom name, atom number, 3D position (X, Y, Z), and the measured distance between the two interacting atoms. At the end of the file are recorded the percentage of appearance of each residue in the whole protein, the percentage of distinct residues involved in the interface over the total number of residues in the interface, the percentage ratio of each residue involved in the interface over the total number of occurrences of the same residue in the protein, the mean, median, standard deviation, minimum, and maximum of all distances in the interface, and the geometric centroid and center of mass of the protein;
iv) pdb_id.sum, a summary of the pdb_id.int file that holds the percentage of appearance of each residue in the protein, the percentage of distinct residues involved in the interface over the total number of residues in the interface, the mean, median, standard deviation, minimum, and maximum of all distances in the interface, and the geometric centroid and center of mass of the protein.
The output files of this tool can be used to decode 3D structural preferences and statistical trends of dimeric proteins in a manner that will lead to the production of novel drugs.
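
The interface detection described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's tool. The function names (parse_atoms, interface_pairs, distance_summary, geometric_centroid), the brute-force all-pairs search, the fixed-column reading of ATOM records, and the use of the sample standard deviation are assumptions made for illustration. Only the 7 Å cutoff, the different-chain condition, and the midpoint placement of virtual atoms come from the abstract.

```python
import math
import statistics
from itertools import combinations
from typing import NamedTuple

CUTOFF = 7.0  # Å; the distance threshold stated in the abstract


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text):
    """Read ATOM records using the fixed column layout of the PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            atoms.append(Atom(
                serial=int(line[6:11]),
                name=line[12:16].strip(),
                res_name=line[17:20].strip(),
                chain=line[21],
                res_seq=int(line[22:26]),
                x=float(line[30:38]),
                y=float(line[38:46]),
                z=float(line[46:54]),
            ))
    return atoms


def interface_pairs(atoms, cutoff=CUTOFF):
    """Return (atom_a, atom_b, distance, midpoint) for atoms on different
    chains that lie closer than `cutoff`. The midpoint is where a 'virtual
    atom' would be placed. Brute force, O(n^2): an illustrative choice."""
    pairs = []
    for a, b in combinations(atoms, 2):
        if a.chain == b.chain:
            continue
        d = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
        if d < cutoff:
            mid = ((a.x + b.x) / 2, (a.y + b.y) / 2, (a.z + b.z) / 2)
            pairs.append((a, b, d, mid))
    return pairs


def distance_summary(pairs):
    """Mean, median, standard deviation, minimum and maximum of the
    interface distances (sample standard deviation is an assumption)."""
    ds = [d for _, _, d, _ in pairs]
    return {
        "mean": statistics.mean(ds),
        "median": statistics.median(ds),
        "stdev": statistics.stdev(ds) if len(ds) > 1 else 0.0,
        "min": min(ds),
        "max": max(ds),
    }


def geometric_centroid(atoms):
    """Unweighted average position of all atoms."""
    n = len(atoms)
    return (
        sum(a.x for a in atoms) / n,
        sum(a.y for a in atoms) / n,
        sum(a.z for a in atoms) / n,
    )
```

Given the text of a PDB file, parse_atoms followed by interface_pairs yields the atom pairs and midpoints from which files of the kind described above could be written. For large structures a spatial index would be a more practical choice than the all-pairs loop.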