Identification of hub genes related to the progression of type 1 diabetes by computational analysis

Background Type 1 diabetes (T1D) is a serious threat to childhood life and has fairly complicated pathogenesis. Profound attempts have been made to enlighten the pathogenesis, but the molecular mechanisms of T1D are still not well known. Methods To identify the candidate genes in the progression of T1D, expression profiling by high throughput sequencing dataset GSE123658 was downloaded from Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) were identified, and gene ontology (GO) and pathway enrichment analyses were performed. The protein-protein interaction network (PPI), modules, target gene - miRNA regulatory network and target gene - TF regulatory network analysis were constructed and analyzed using HIPPIE, miRNet, NetworkAnalyst and Cytoscape. Finally, validation of hub genes was conducted by using ROC (Receiver operating characteristic) curve and RT-PCR analysis. A molecular docking study was performed. Results A total of 284 DEGs were identified, consisting of 142 up regulated genes and 142 down regulated genes. The gene ontology (GO) and pathways of the DEGs include cell-cell signaling, vesicle fusion, plasma membrane, signaling receptor activity, lipid binding, signaling by GPCR and innate immune system. Four hub genes were identified and biological process analysis revealed that these genes were mainly enriched in cell-cell signaling, cytokine signaling in immune system, signaling by GPCR and innate immune system. ROC curve and RT-PCR analysis showed that EGFR, GRIN2B, GJA1, CAP2, MIF, POLR2A, PRKACA, GABARAP, TLN1 and PXN might be involved in the advancement of T1D. Molecular docking studies showed high docking score. Conclusions DEGs and hub genes identified in the present investigation help us understand the molecular mechanisms underlying the advancement of T1D, and provide candidate targets for diagnosis and treatment of T1D.


Introduction
Type 1 diabetes (T1D) (insulin-dependent) is a core challenge for endocrine research around the world [1]. Approximately 5 to 10% of the childhood population is affected with T1D worldwide [2]. T1D affects the eyes, kidneys, heart, peripheral and autonomic nervous systems [3]. Pancreatic cells, particularly β-cells, play a key role in the occurrence and progression of T1D [4]. Treatment for T1D includes targeting β-cells and β-cells regeneration [5]. However, T1D is a complex disease and its biology remains poorly understood [6].
It worth a lot of money and time to identify disease related molecular biomarkers by experiment alone. With the wide application of expression profiling by high throughput sequencing data, there were huge genomics data deposited in public databases [20]. The progression of computational tools gives us an alternative method to diagnose novel molecular biomarkers.
In this investigation, we employed the bioinformatics approach to discover the differentially expressed genes between T1D patients and healthy donors. Original expression profiling by high throughput sequencing dataset GSE123658 was downloaded. 39 T1D patients' samples and 43 healthy donors' samples were analyzed in our investigation. Commonly altered DEGs were isolated from integrated data. Additionally, GO/ REACTOME pathway analysis, construction of protein-protein interaction network, modules, target gene -miRNA regulatory network and target gene -TF regulatory network analysis were performed to analyze these data. Four hub genes (EGFR, GRIN2B, GJA1, CAP2, MIF, POLR2A, PRKACA, GABARAP, TLN1 and PXN) were identified. ROC (receiver operating characteristic) curve and RT-PCR analysis were used to verify clinically relevant hub genes. The aim of this investigation was to gain a better understanding of the underlying molecular mechanisms and to discover molecular biomarkers for T1D.

Data resources
Expression profiling by high throughput sequencing dataset GSE123658 was downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) [21]. CPM count normalization performed on the original dataset GSE123658 from GEO databse using package edgeR package [22], voom function [23], and Limma [24] of R software. The data was produced using a GPL18573 Illumina NextSeq 500 (Homo sapiens). The GSE123658 dataset contained data from 82 samples, including 39 T1D patients' samples and 43 healthy donors' samples.

Identification of DEGs
The identification of DEGs between 39 T1D patients' samples and 43 healthy donors' samples was performed using Limma package in R bioconductor. lmFit function in the limma package to construct linear model for individual gene [25]. makeContrasts function in the limma package to compose similarity between T1D and healthy donors groups (log fold-changes) are obtained as contrasts of these fitted linear model. eBayes is a function in limma package which figure out empirical Bayes predicts of DEGs [26]. topTable function in limma package to obtain a table of the most significant Up and down regulated genes from a eBayes model fit. To correct the discovery of statistically important molecular biomarkers and limitations of false-positives, we using the adjusted P-value and Benjamini and Hochberg false discovery rate method [27]. Fold-change (FC) and adjust p-values were used to found DEGs. A |log2FC| > 0.94 for up regulated genes, |log2FC| -0.39 for down regulated genes and Pvalue < 0.05 were used as considered statistically significant. The volcano plot was implemented using ggplot2 package [28], and the heat map was established using gplots package in R language.

Gene Ontology (GO) and pathway enrichment analyses of DEGs
Gene Ontology (GO) (http://www.geneontology.org) analysis is a routine analysis for annotating genes and determining biological component, including biological process (BP), cellular component (CC) and molecular function (MF) [29]. REACTOME (https://reactome.org/) [30] pathway database is applied for classification by correlating gene sets into their respective pathways. The ToppGene (ToppFun) (https://toppgene.cchmc.org/ enrichment.jsp) [31] is a gene functional classification tool that objective to provide a extensive set of functional annotation tools for authors to recognize the biological explanation behind large lists of genes. P < 0.05 was find statistically significant.

PPI network construction and module analysis
The online Human Integrated Protein-Protein Interaction rEference (HIPPIE) (http://cbdm.uni-mainz.de/ hippie/) [32] online database was using to predicted the PPI network information. Analyzing the interactions and functions between DEGs may provide information about the mechanisms of generation and development of disease (PPI score > 0.4). Cytoscape 3.8.0 (http://www. cytoscape.org/) [33] is a bioinformatics platform for constructing and visualizing molecular interaction networks. The Network Analyzer Cytoscape plug-in was used to find hub genes were screened using highest node degree [34], betweenness centrality [35], stress centrality [36] and closeness centrality [37] methods and hub genes were further analyzed for pathway and GO enrichment analysis. The plug-in PEWCC1 (http://apps.cytoscape. org/apps/PEWCC1) [38] of Cytoscape was applied to detect densely connected regions in PPI networks. The PPI networks were constructed using Cytoscape and the most key module in the PPI networks was preferred using PEWCC1. The criteria for selection were set as follows: Max depth = 100, degree cut-off = 2, Node score cut-off = 0.2, PEWCC1 scores >5, and K-score = 2.
Construction of miRNA -target regulatory network miRNet database (https://www.mirnet.ca/) [39] online database was used to predict miRNAs that targeted the DEGs associated with T1D. The DEGs were selected according to the screening criterion of P value <0.05. The results were exported into the Cytoscape software for analysis. The target genes -miRNA regulatory network was constructed through network topology prosperities. The node degree was determined using the Network analysis plugin, and miRNAs with a node degree >12.

Construction of TF -target regulatory network
NetworkAnalyst database (https://www.networkanalyst. ca/) [40] online database was used to predict TFs that targeted the DEGs associated with T1D. The DEGs were selected according to the screening criterion of P value <0.05. The results were exported into the Cytoscape software for analysis. The target genes -TF regulatory network was constructed through network topology prosperities. The node degree was determined using the Network analysis plugin, and TFs with a node degree >12.

Validation of hub genes
Receiver operating characteristic (ROC) curve analysis [41] was implemented to calculate the sensitivity and specificity of the DEGs for T1D diagnosis using the R package by using the generalized linear model (GLM) in machine learning algorithms [42]. An area under the curve (AUC) value was determined and used to label the ROC effect. The RIN-m5F (ATCC CRL-11605) cell line procured from ATCC for T1D. RINm5F cell line was grown in RPMI-1640 medium added with 10% fetal bovine serum and penicillin/streptomycin. Incubate this cell line at 37°C in a 5% CO2 in humidified cell culture incubator. The HITT15 (ATCC CRL1777) cell line procured from ATCC for normal. CRL-1777 cell line was grown in Ham's F12K medium added with 10% fetal Table 1 The sequences of primers for quantitative RT-PCR                      bovine serum,87.5%; dialyzedhorse serum, 2 mM Lglutamine, 1.5 g/L sodium bicarbonate, and penicillin/ streptomycin. Incubate the this cell line at 37°C in a 5% CO2 in humidified cell culture incubator. In RT-PCR analysis, total RNA was isolated from in vitro cultured cells of diabetics and normal using a TRI Reagent (Sigma, USA). cDNA was synthesized using 2.0 μg of total RNA with the FastQuant RT kit (with gDNase; Tiangen Biotech Co., Ltd.). The relative mRNA expression was measured in the QuantStudio 7 Flex real-time PCR system (Thermo Fisher Scientific, Waltham, MA, USA). The relative expression levels were determined by the 2 -ΔΔCt method and normalized to internal control βactin [42]. All RT-PCR reactions were performed in triplicate. The primers used to explore mRNA expression of ten hub genes were shown in Table 1.

Molecular Docking studies
The surflex-docking studies of the designed molecule, standard on over expressed genes of PDB protein were performed using SYBYL-X 2.0 perpetual software. All the designed molecules and standard were outlined using ChemDraw Software, imported and saved in sdf. templet using open babel free software. The protein structures of EGFR (epidermal growth factor receptor) and its co-crystallised protein of PDB code 2XYJ, 2XYX and GRIN2B its co-crystallised protein of PDB code and its co-crystallised protein of PDB code 5EWL, 6E7V were extracted from Protein Data Bank [43][44][45][46]. Together with the TRIPOS force field, GasteigerHuckel (GH) charges were added to all designed molecules and standard for the structure optimization process. Additionally, using MMFF94s and MMFF94 algorithm methods, energy minimization was done. The protein preparation was carried out after incorporation of protein. The co-crystallized ligand and all water molecules were removed from the crystal structure; more hydrogen's were added and the side chain was set. TRIPOS force field was used for the minimization of structure. The compounds' interaction efficiency with the receptor was represented by the Surflex-Dock score in kcal / mol units. The interaction between the protein and the ligand, the best pose was incorporated into the molecular area. The visualization of ligand interaction with receptor is done by using discovery studio visualizer.

Identification of DEGs
With a threshold of P-value <0.05 and absolute value of |log2FC| > 0.94 for up regulated genes, |log2FC| -0.39 for down regulated genes, DEGs were identified from dataset. 284 DEGs were screened from the GSE123658 dataset, which consisted of 142 up regulated genes and 142 down regulated genes and only top ten up regulated genes and down regulated genes are listed in Table 2. Volcano plots were used to visualize differential expression of genes between the T1D group and healthy donors group (Fig.1). A heat map showed expression profiling of DEGs in the analysis result (Fig.2).

Gene Ontology (GO) and pathway enrichment analyses of DEGs
To identify the pathways which had the most significant involvement with the genes identified, up regulated and down regulated genes are listed in Table 3 and Table 4. DEGs were submitted into ToppGene for GO and REACTOME pathway enrichment analysis. GO enrichment analysis revealed that in BP terms, the up regulated genes were mainly enriched in cell-cell signaling and reproductive process, whereas down regulated genes were mainly enriched in vesicle fusion and biological adhesion. In CC terms, up regulated genes were mainly enriched in integral component of plasma membrane and supramolecularfiber, whereas down regulated genes were mainly enriched in whole membrane and cell junction. In MF terms, up regulated genes were mainly enriched in signaling receptor activity and structural molecule activity, whereas down regulated genes were mainly enriched in lipid binding and GTPase binding. REACTOME pathway analysis demonstrated that up regulated genes were significantly enriched in signaling by GPCR and muscle contraction, whereas down regulated genes were mainly enriched in innate immune system and cytokine signaling in immune system.

PPI network construction and module analysis
Interactions between the identified up and down regulated genes were reported by constructing a PPI network. In total, there were 5017 nodes and 8133 edges in the network (Fig.3a). According to node degree, betweenness centrality, stress centrality and closeness centrality levels,     Table 5. Significant modules were subsequently constructed with 12 nodes and 23 edges for up regulated genes (Fig.3b) and 5 nodes and 10 edges for down regulated genes, which gained the highest PEWCC1 score (Fig.3c). Subsequent functional enrichment analysis revealed that the genes in these modules were mainly enriched in cell-cell signaling and cytokine signaling in immune system.

Prediction of key miRNAs
The regulatory relationships between the target genes and their miRNAs were established using Cytoscape, which showed that the single gene was regulated by multiple miRNAs is shown in Fig.4a Table 6. Integrating with the result of REAC TOME pathway analysis, it was indicated that these key target genes -miRNA network was mainly involved in the signaling by GPCR and innate immune system.

Prediction of key TFs
The regulatory relationships between the target genes and their TFs were established using Cytoscape, which showed  Table 6. Integrating with the result of REACTOME pathway analysis, it was indicated that these key target genes -TF network was mainly involved in the signaling by GPCR and innate immune system.

Validation of hub genes
As these 4 genes are remarkably expressed in T1D, we executed a ROC curve analysis to calculate their sensitivity and specificity for the diagnosis of T1D. As shown in Fig. 5 EGFR, GRIN2B, GJA1, CAP2, MIF, POLR2A, PRKACA, GABARAP, TLN1 and PXN achieved an AUC value of >0.982, demonstrating that these genes have high sensitivity and specificity for T1D diagnosis. We further used RT-PCR to detect the mRNA expression of the hub gene. The 10 hub genes contain two up regulated genes (EGFR, GRIN2B, GJA1, CAP2 and MIF) and two down regulated gene (POLR2A, PRKACA, GABARAP, TLN1 and PXN). The RT-PCR data showed that although the trend of expression patterns of these 10 hub genes were consistent, among these up regulated genes, only EGFR, GRIN2B, GJA1, CAP2 and MIF were significantly up regulated in T1D. In addition, the expression of POLR2A, PRKACA, GABARAP, TLN1 and PXN were reduced in T1D (Fig. 6).

Discussion
Pathological complications of T1D remains undetermined, hyperglycemia develops to play a key role [47]. Although T1D is rare compared to type 2 diabetics, it might affect in any part of the body, such as the eyes, kidneys, heart, peripheral and autonomic nervous systems [48]. Considering the poor prognosis of T1D, understanding the specific molecular biomarkers of the disease is important for initial diagnosis and therapy to increase survival rates. In this investigation, we examined the expression profiling by high throughput sequencing dataset GSE123658 including T1D group and healthy donors group to identify the molecular mechanism of T1D and seek some molecular biomarkers. Bioinformatics analysis of these biological factors is applied to explore genes that are favorable to treatment. In total 284 DEGs, 142 up regulated and 142 down regulated genes were identified. Polymorphic gene ARMS2 has an important role in the advancement of T1D [49]. Polymorphic gene AS3MT was reported to be associated with progression of T1D [50]. Wang et al [51] also reported that CRHR2 are abnormally expressed in hypertension patients, but this gene might be induces hypertension in T1D patients. Investigation has indicated that RNF122 decrease expression is associated with hyperactivity disorder [52]; this finding is consistent with our results and indicates that RNF122 might be involved in the development of T1D. A previous investigation found that autophagy regulating TP53INP2 gene expression was associated with development of T1D [53]. Polymorphic gene CRTC2 is well known for its critical role in type 2 diabetes [54], but this polymorphic gene might be responsible for advancement of T1D.
GO and REACTOME pathway enrichment analyses were performed to explore interactions among the DEGs. These DEGs were mainly enriched in cell-cell signaling, integral component of plasma membrane, signaling receptor activity, signaling by GPCR, vesicle fusion, whole membrane, lipid binding and innate immune system. Altered MYH6 gene was known to play a role in congenital heart defects [55], but this gene might be induces T1D in patients with congenital heart defects. Mc-Kenna et al [56] found that AVP (arginine vasopressin) play essential roles in the T1D. Yue et al [57] concluded that high GRIA4 expression was correlated with abdominal aortic aneurysm progression, but this gene might be answerable for advancement of T1D in patients with abdominal aortic aneurysm. Kochetova el at [58] found that GRIN2B is a significant biomarker for type 2 diabetes compared to healthy controls, but it was confirmed for the first time in our study that GRIN2B expression in T1D indicates a good prognosis. Recent investigations demonstrated that DCC (DCC netrin 1 receptor) gene can mediate angiogenesis and plays an important role in diabetic kidney disease [59]. Ruiz de Azua et al [60] indicated that RGS4 was associated progression of type 2 diabetes, but this gene might be essential for T1D progression. A previous investigation demonstrated that GJA1 was associated with the progression of atrial fibrillation [61], but might be important gene signatures of T1D in patients with atrial fibrillation. Faienza et al [62] and Galán et al [63] demonstrated that, GREM1 and EGFR (epidermal growth factor receptor) might be important for predicting treatment response in T1D. TRHR (thyrotropin releasing hormone receptor) is considered a potential biomarker of hypertension [64], but this gene might be associated with development of T1D in patients with hypertension. PTPRT (protein tyrosine phosphatase receptor type T) has been reported to serve a role in obesity associated insulin resistance [65], but this gene might be involved in progression of T1D. Research has demonstrated that FCAMR (Fc fragment of IgA and IgM receptor) gene could contribute to progression of atherosclerosis [66], but this gene might be crucial for progression of T1D in patients with atherosclerosis. Sun et al. [67] revealed that CCL19 was associated with diabetic nephropathy in patients with T1D. DSP  (desmoplakin) has been reported to be expressed in cardiomyopathy [68], but this gene might be linked with progression of T1D in patients with cardiomyopathy. Xu et al. [69] have demonstrated that ITIH4 are related with coronary heart disease, but this gene might be responsible for advancement of T1D in patients with coronary heart disease. Miyashita et al [70] demonstrate that GAB2 plays a role in Alzheimer disease progression, but this gene might be crucial for T1D in patients with Alzheimer disease. McCann et al. [71] showed that IGF2R played an important role in T1D. Previous studies reported that the expression of MYADM (myeloid associated differentiation marker) induced hypertension [72], but this gene might be liable for advancement of T1D in patients with hypertension. Fan et al. [73] and Chan et al. [74] revealed that TPCN2 and APOL1 were associated with type 2 diabetes, but these genes might be responsible for development of T1D. Previous investigation had confirmed that MEFV (MEFV innate immuity regulator, pyrin) play critical roles in ischemic heart disease [75], but this gene might be essential for progression of T1D in patients with ischemic heart disease. Wang et al [76] state that the expression of DTX4 is important event in obesity, but this gene might be linked with progression of T1D in patients with obesity. We analyzed the protein-protein interactions (PPI) and modules of the DEGs involved in T1D. Table 5 summarizes the PPI network hub genes (five up regulated and five down regulated) that were identified in the T1D, which included EGFR, GRIN2B, GJA1, CAP2, MIF, POLR2A, PRKACA, GABARAP, TLN1 and PXN. A recent study showed that protein expression levels of CAP2 were up regulated in cardiomyopathy patients [77], but this gene might be responsible for T1D in  [78] and KIF1A [79] were reported to be associated with T1D. Recent research suggested that PXN (paxillin) is involved in hypertension [80], but this gene might be associated with progression of T1D in patients with obesity. POLR2A, PRKACA, GABARAP, TLN1 and CIITA (class II major histocompatibility complex transactivator) might be considered as a novel biomarkers associated with the development of T1D. Target gene -miRNA regulatory network and target gene -TF regulatory network analysis demonstrated that DEGs interacted directly or indirectly. The more edges associated with genes, miRNAs and TFs indicated the more potential selection for the targets. Table 6 summarizes the target gene -miRNA regulatory network and target gene -TF regulatory network (target genes, miRNAs and TFs) that were identified in the T1D, which included GRIN2B, EGFR, DKK1, GJA1, RGS4, TLN1, IGF2R, POLR2A, ARHGAP1, HIP1, RGS4, EYA1, CCL19, PRL, PRKACA, GAB2, HIP1, PXN , RGL2, hsa-mir-4257, hsa-mir-564, hsa-mir-587, hsa-mir-941, hsa-mir-561-3p, hsa-mir-4300, hsa-mir-5694, hsa-mir-378b, hsa-mir-3918, hsa-mir-6719-3p, FOXD1, GATA2, FOXL1, TP53, JUND, STAT3, TFAP2A, KLF5, PPARG and HINFP. STAT3 had been reported to be involved in the pathogenesis of T1D [81]. Polymorphic gene GATA2 has been reported to be crucial for the progression of coronary artery disease [82], but this gene might be responsible for advancement of T1D in patients with coronary artery disease. PRKACA (protein kinase cAMP-activated catalytic subunit alpha), DSP (desmoplakin), hsa-mir-4257, hsa-mir-564, hsa-mir-4300, hsa-mir-5694, RGS4, FOXD1, EYA1, TFAP2A and GAB2 might be considered as a novel biomarkers associated with the development of T1D.
In addition, we also performed validation of hub genes by ROC analysis and RT-PCR. Results showed that these hub genes differentiated T1D group from healthy donors group, and may be candidates for diagnostic biomarkers and new therapeutic target. Moreover, EGFR, GRIN2B, GJA1, CAP2, MIF, POLR2A, PRKACA, GABARAP, TLN1 and PXN are involved in progression of T1D.
Among all few molecules of TSIO8, TSPZ15, TSPZP24, TBIO30, TBIO35 (Fig.7) obtained excellent binding score of 11.084 with PDB code 4 K41, the values are depicted in Table 7. The molecule TSIO8 (Fig. 8) has highest binding score its interaction with protein 4 K41, and formed hydrogen bond interaction of 4″' -NH 2 group with CYS-217, the hetero atom 1 oxygen and 3″ aromatic hydroxyl group (−OH) formed hydrogen bonding interactions with same amino acid LYS-213. The 3″ hydroxyl group also farmed hydrogen bond interaction with ASP-157 respectively. The sulphur of thiazolidinone ring 1′-S formed two hydrogen bond interactions with different amino acids LEU-16 & GLY-15 and 4′-C ring carbonyl formed interaction with GLY-302 respectively. The molecule also formed pi-pi interactions of electrons of aromatic ring with TYR-306, and pi-sigma interaction with GLU-214 and 2′ C=O of thiazolidine ring formed unfavourable interaction with calcium 401 (Ca) following 2′ C=O formed carbon-hydrogen interaction with GLY-13, All interactions with amino acids and metal are depicted by 3D (Fig.9) and 2D (Fig.10) figures.
In conclusion, the present investigation was designed to identify DEGs that may be involved in the progression of T1D. A total of 284 DEGs and 10 hub genes were identified and might be regarded as diagnostic biomarkers and new therapeutic target for T1D. Together, EGFR, GRIN2B, GJA1, CAP2, MIF, POLR2A, PRKACA, GABARAP, TLN1 and PXN might be effective targets in T1D, but more experimental investigations and clinical trials are needed.