Freitas, Alex Alves (2011) A data mining-based approach for investigating the relationship between DNA repair genes and ageing. Masters thesis, University of Liverpool.
PDF - Accepted Version
Available under License Creative Commons Attribution No Derivatives.
There is a clear motivation for ageing research, since ageing is the greatest risk factor for many diseases, including most types of cancer. Arguably, another strong motivation is that, despite considerable progress in this area over the last two decades, ageing is still to a large extent a poorly understood process, especially in humans. The vast majority of biogerontology research is still based on “wet lab” experiments done with simpler organisms, due to the problems associated with performing ageing-related experiments with humans. In contrast, this thesis proposes a data mining approach, based on classification algorithms, for analysing data about human DNA repair genes and their relationship to ageing. The classification algorithms (more precisely, decision tree induction and Naive Bayes algorithms) were applied to datasets prepared specifically for this research, by adapting and integrating data from several bioinformatics resources, namely: (a) the GenAge database of ageing-related genes; (b) a web site with a comprehensive list of human DNA repair genes; (c) UniProt, a centralized repository of richly annotated data about proteins; (d) the HPRD (Human Protein Reference Database); and (e) the Gene Ontology, a controlled vocabulary for describing gene or protein functions. Some experiments also used a separate dataset including gene expression data. The aim of applying classification algorithms to these datasets was to produce classification models that identify which gene properties are most effective in discriminating ageing-related DNA repair genes from other types of genes. These other genes were mainly non-ageing-related DNA repair genes, but in some experiments they also included genes whose protein products interact with the products of DNA repair genes.
A related goal of this research was to analyse the automatically built classification models from two perspectives, namely: (a) measuring the predictive accuracy (or “generalization ability”) of those models from a data mining perspective; and (b) interpreting the meaning of the main gene properties relevant for classification in those models, in the light of biological knowledge about DNA repair genes and the process of ageing. In summary, the main gene properties found effective in discriminating ageing-related DNA repair genes from other types of genes (mainly non-ageing-related DNA repair genes) in the datasets created in this research are as follows: the protein products of ageing-related DNA repair genes tend to interact with a considerably larger number of proteins; their protein products are much more likely to interact with WRN (a protein whose defect causes Werner’s progeroid syndrome) and XRCC5 (KU80, a key protein in the initiation of DNA double-strand break repair by the error-prone non-homologous end joining DNA repair pathway); they are more likely to be involved in response to chemical stimulus and, to a lesser extent, in response to endogenous stimulus or oxidative stress; and they are more likely to have high expression in T lymphocytes.
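To illustrate the kind of classification described above, the following is a minimal sketch of a Naive Bayes classifier over binary gene properties. It is not the thesis's implementation, which the abstract does not specify. The feature names are hypothetical labels loosely based on the properties listed in the abstract, and the six toy records are invented purely for illustration; they are not data or results from the thesis.

```python
import math
from collections import defaultdict

# Hypothetical binary gene properties (names are assumptions for illustration).
FEATURES = [
    "interacts_with_WRN",
    "interacts_with_XRCC5",
    "response_to_chemical_stimulus",
    "high_expression_T_lymphocytes",
]

# Invented toy records, NOT data from the thesis: (feature vector, class label).
TOY_GENES = [
    ((1, 1, 1, 1), "ageing"),
    ((1, 0, 1, 1), "ageing"),
    ((0, 1, 1, 0), "ageing"),
    ((0, 0, 0, 0), "non-ageing"),
    ((0, 0, 1, 0), "non-ageing"),
    ((0, 0, 0, 1), "non-ageing"),
]


def train(records):
    """Estimate class priors and per-feature P(feature = 1 | class)."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * len(FEATURES))
    for x, y in records:
        class_counts[y] += 1
        for i, value in enumerate(x):
            feature_counts[y][i] += value
    total = len(records)
    model = {}
    for y, n in class_counts.items():
        prior = n / total
        # Laplace smoothing (+1 / +2) avoids zero probabilities.
        probs = [(feature_counts[y][i] + 1) / (n + 2) for i in range(len(FEATURES))]
        model[y] = (prior, probs)
    return model


def predict(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    best_label, best_score = None, -math.inf
    for y, (prior, probs) in model.items():
        score = math.log(prior)
        for value, p in zip(x, probs):
            score += math.log(p if value else 1.0 - p)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


model = train(TOY_GENES)
print(predict(model, (1, 1, 1, 1)))
print(predict(model, (0, 0, 0, 0)))
```

In a model of this kind, the estimated per-class feature probabilities are what one would inspect to see which gene properties discriminate between the classes. Predictive accuracy would normally be estimated on held-out genes, for example by cross-validation, which this sketch omits.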
Item Type: Thesis (Masters)
Uncontrolled Keywords: ageing; DNA repair; data mining; bioinformatics
Subjects: Q Science > QH Natural history > QH301 Biology
Departments, Research Centres and Related Units: Academic Faculties, Institutes and Research Centres > Faculty of Science > Department of Biological Sciences
Deposited On: 22 Aug 2011 16:45
Last Modified: 22 Aug 2011 16:45