Improved supervised prediction of aging-related genes via weighted dynamic network analysis

Contact: Prof. Tijana Milenković


References:

1. Qi Li, Khalique Newaz, and Tijana Milenković (2021). Improved supervised prediction of aging-related genes via weighted dynamic network analysis. BMC Bioinformatics. : equal contribution.


The considered entire context-unspecific PPI network:

We use the largest connected component of each of the HPRD and BioGRID entire human PPI networks. The largest connected component of HPRD can be downloaded from here, and the largest connected component of BioGRID can be downloaded from here. Note that for genes in HPRD-based (sub)networks, we use Entrez IDs; for genes in BioGRID-based (sub)networks, we use gene symbols. If you use our inferred aging-specific subnetworks, please also cite the corresponding paper from the following papers.
1. The original HPRD can be downloaded from Human Protein Reference Database (HPRD). "Prasad, T. et al. (2009). Human Protein Reference Database—2009 update. Nucleic Acids Research, 37(suppl_1), D767–D772."
2. The original BioGRID can be downloaded from A General Repository for Interaction Datasets (BioGRID). "Stark, C. et al. (2006). BioGRID: A General Repository for Interaction Datasets. Nucleic Acids Research. 34:D535-9."
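
For readers who start from the original HPRD or BioGRID data rather than from our files, the following is a minimal sketch of how a largest connected component could be extracted from an edge list. It is not the code used in the paper. The function name `largest_connected_component` and the representation of the network as a list of gene pairs are assumptions made for illustration.

```python
from collections import defaultdict, deque


def largest_connected_component(edges):
    """Return the node set of the largest connected component.

    `edges` is an iterable of (gene_a, gene_b) pairs (hypothetical input
    format; genes may be Entrez IDs or gene symbols).
    """
    # Build an undirected adjacency structure.
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    visited = set()
    best = set()
    for start in adjacency:
        if start in visited:
            continue
        # Breadth-first search collects one component at a time.
        component = {start}
        visited.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best
```

Only edges whose two endpoints both lie in the returned node set belong to the component.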


Seven considered aging-specific subnetworks:

1. The seven considered HPRD-based aging-specific subnetworks (i.e., Induced-Dynamic, Induced-Static, NetWalk-Dynamic, NetWalk-Static, NetWalk-Static*, wNetWalk-Dynamic, wNetWalk-Static*) we use in this paper can be downloaded here.
2. The seven considered BioGRID-based aging-specific subnetworks (i.e., Induced-Dynamic, Induced-Static, NetWalk-Dynamic, NetWalk-Static, NetWalk-Static*, wNetWalk-Dynamic, wNetWalk-Static*) we use in this paper can be downloaded here.


Six considered aging-related ground truth datasets:

1. GenAge genes can be downloaded from https://genomics.senescence.info/genes/. "Tacutu, R. et al. (2017). Human Ageing Genomic Resources: new and updated databases. Nucleic Acids Research, 46(D1), D1083–D1090."
2. GTEx-DAG genes and GTEx-UAG genes can be downloaded from the Supplementary data of "Jia, K. et al. (2018). An analysis of aging-related genes derived from the genotype-tissue expression project (GTEx). Cell Death Discovery, 5(1), 26."
3. 442 BEx2004 genes whose expressions vary with age can be downloaded from the supplementary data of "Lu, T. et al. (2004). Gene regulation and DNA damage in the ageing human brain. Nature, 429(6994), 883."
4. 8,277 BEx2008 genes whose expressions vary with age can be downloaded from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE11882. "Berchtold, N. C. et al. (2008). Gene expression changes in the course of normal brain aging are sexually dimorphic. Proceedings of the National Academy of Sciences, 105(40), 15605–15610."
5. 2,911 ADEx2011 genes whose expressions vary across different stages of Alzheimer's disease can be downloaded from the Supplementary data of "Simpson, J. E. et al. (2011). Microarray analysis of the astrocyte transcriptome in the aging brain: relationship to Alzheimer’s pathology and APOE genotype. Neurobiology of Aging, 32(10), 1795–1807."


Two considered definitions of (non-)aging-related gene labels:

1. For HPRD-based (sub)networks, there are 185 GenAge-based aging-related genes, 347 GTEx-DAG-based aging-related genes, and 1,485 non-aging-related genes. We provide the Entrez IDs for these genes. For BioGRID-based (sub)networks, there are 227 GenAge-based aging-related genes and 6,575 non-aging-related genes. We provide the gene symbols for these genes. All five gene sets can be downloaded here.


The considered (existing and proposed) features in this study:

1. To get DGDV of each node in a network, please refer to DGDV.
2. To get GoT of each node in a network, please refer to GoT-WAVE.
3. To get GDC, ECC, KC, DegC of each node in a network, please refer to node centrality.
4. CentraMV works as follows. For a given centrality-based feature (out of GDC, ECC, KC, and DegC), the mean and the corresponding variation are computed over a given node’s 37 centrality values, one for each of the 37 snapshots of the dynamic subnetwork. The mean is self-explanatory, and the variation of node u is var(u) = sum(centrality(u)_{i+1} − centrality(u)_i)/36, where the sum runs over i = 1, 2, ..., 36. These two quantities are computed for each of the four centrality-based features, and the resulting eight values form the CentraMV node feature.
5. To get SGDV of each node in a network, please refer to Orca.
6. To get UniNet of each node in a network, please see "Kerepesi, C. et al. (2018). Prediction and characterization of human ageing-related proteins by using machine learning. Scientific Reports, 8(1), 4094." for details. Our implementation of UniNet can be downloaded here.
7. To get mBPIs of each node in a network, please see "Freitas, A. A. et al. (2011). A data mining approach for classifying DNA repair genes into ageing-related or non-ageing-related. BMC Genomics, 12(1), 27." for details. Our implementation of mBPIs can be downloaded here.
8. To get the existing weighted dynamic features, our proposed weighted dynamic features, and our proposed weighted static features, please use the code here.
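
The CentraMV description in the list above can be sketched in a few lines of Python. This is an illustrative sketch, not our released implementation. The function names, the dictionary input, and the ordering of the eight output values are assumptions. The variation follows the formula exactly as written above (signed consecutive differences, averaged over the 36 transitions between 37 snapshots).

```python
CENTRALITIES = ("GDC", "ECC", "KC", "DegC")


def mean_and_variation(values):
    """Return (mean, variation) of one node's centrality values over snapshots.

    With 37 snapshots there are 36 consecutive differences, so the
    variation is their sum divided by 36, as in the formula above.
    """
    if len(values) < 2:
        raise ValueError("need at least two snapshots")
    mean = sum(values) / len(values)
    differences = [values[i + 1] - values[i] for i in range(len(values) - 1)]
    variation = sum(differences) / len(differences)
    return mean, variation


def centramv(per_centrality_values):
    """Build the eight-value CentraMV feature for one node.

    `per_centrality_values` maps each centrality name to the node's list of
    per-snapshot values (hypothetical input format). The output order
    (mean, variation for GDC, then ECC, KC, DegC) is an assumption.
    """
    feature = []
    for name in CENTRALITIES:
        mean, variation = mean_and_variation(per_centrality_values[name])
        feature.extend([mean, variation])
    return feature
```

The helper accepts any number of snapshots of at least two, so the same sketch applies to dynamic subnetworks with a different number of snapshots.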


The best feature matrices for each of the HPRD-based and BioGRID-based (sub)networks:

1. The feature matrices of the best feature for each of the HPRD-based and BioGRID-based (sub)networks can be downloaded here. Note that we only include features for aging-related and non-aging-related genes based on the primary GenAge or secondary GTEx-DAG definitions. For each feature file, the first column includes gene IDs or symbols, the second column includes gene labels, and the remaining columns include the corresponding features. We also provide the genes for each of the five folds in cross-validation with respect to the combination of network and gene labels. In each partition folder, we include the genes for training and testing for each of the five folds. Moreover, the Validation subfolder includes the genes of the tuning-training and tuning-testing data.
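
The column layout described above (gene, label, then features) could be read along the following lines. This is a minimal sketch under stated assumptions: the tab delimiter, the absence of a header row, and the function name `read_feature_matrix` are not specified on this page, so please check them against the downloaded files. Labels are kept as strings because their encoding is not stated here.

```python
import csv


def read_feature_matrix(handle, delimiter="\t"):
    """Split a feature file into genes, labels, and feature rows.

    Assumes (not confirmed by this page) a delimited text file with no
    header row: column 1 is the gene ID or symbol, column 2 is the gene
    label, and all remaining columns are numeric features.
    """
    genes, labels, features = [], [], []
    for row in csv.reader(handle, delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        genes.append(row[0])
        labels.append(row[1])
        features.append([float(value) for value in row[2:]])
    return genes, labels, features
```

A file opened with `open(path, newline="")` can be passed directly as `handle`; any other file-like object that yields lines works as well.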


The aging-related gene predictions:

1. The aging-related gene predictions made by each of the eight considered HPRD-based (sub)networks in this study when using GenAge to define our aging- and non-aging-related gene labels can be found here. We have a flag column to mark whether a prediction is currently known to be aging-related or a novel one.
2. The aging-related gene predictions made by each of the eight considered BioGRID-based (sub)networks in this study when using GenAge to define our aging- and non-aging-related gene labels can be found here. We have a flag column to mark whether a prediction is currently known to be aging-related or a novel one.