Investigation of candidate genes and mechanisms underlying obesity associated type 2 diabetes mellitus using bioinformatics analysis and screening of small drug molecules

Background Obesity associated type 2 diabetes mellitus is a metabolic disorder ; however, the etiology of obesity associated type 2 diabetes mellitus remains largely unknown. There is an urgent need to further broaden the understanding of the molecular mechanism associated in obesity associated type 2 diabetes mellitus. Methods To screen the differentially expressed genes (DEGs) that might play essential roles in obesity associated type 2 diabetes mellitus, the publicly available expression profiling by high throughput sequencing data (GSE143319) was downloaded and screened for DEGs. Then, Gene Ontology (GO) and REACTOME pathway enrichment analysis were performed. The protein - protein interaction network, miRNA - target genes regulatory network and TF-target gene regulatory network were constructed and analyzed for identification of hub and target genes. The hub genes were validated by receiver operating characteristic (ROC) curve analysis and RT- PCR analysis. Finally, a molecular docking study was performed on over expressed proteins to predict the target small drug molecules. Results A total of 820 DEGs were identified between healthy obese and metabolically unhealthy obese, among 409 up regulated and 411 down regulated genes. The GO enrichment analysis results showed that these DEGs were significantly enriched in ion transmembrane transport, intrinsic component of plasma membrane, transferase activity, transferring phosphorus-containing groups, cell adhesion, integral component of plasma membrane and signaling receptor binding, whereas, the REACTOME pathway enrichment analysis results showed that these DEGs were significantly enriched in integration of energy metabolism and extracellular matrix organization. The hub genes CEBPD, TP73, ESR2, TAB1, MAP 3K5, FN1, UBD, RUNX1, PIK3R2 and TNF, which might play an essential role in obesity associated type 2 diabetes mellitus was further screened. Conclusions The present study could deepen the understanding of the molecular mechanism of obesity associated type 2 diabetes mellitus, which could be useful in developing therapeutic targets for obesity associated type 2 diabetes mellitus.


Introduction
Obesity associated type 2 diabetes is one of the most common metabolic disorder worldwide [1]. Type 2 diabetes mellitus is characterized by insulin deficiency due to pancreatic β-cell inactivation and insulin resistance [2]. Genetic factors, hyperinsulinemia, atherogenic dyslipidemia, glucose intolerance, hypertension, prothrombic state, hyperuricemia and polycystic ovary syndrome are the key risk factors for the occurrence and progression of type 2 diabetes mellitus [3]. Obesity associated type 2 diabetes mellitus affects the vital organs such as heart [4], brain [5], kidney [6] and eye [7]. Etiology and advancement of obesity associated type 2 diabetes mellitus is more complex and still understandable. Therefore, it is essential to understand the precise molecular mechanisms associated in the progression of obesity associated type 2 diabetes mellitus and thus to establish valid diagnostic and therapeutic strategies.
Current evidence has shown that genetic predisposition plays a key role in the advancement of obesity associated type 2 diabetes mellitus [8]. Recently, several genes and pathways have been found to participate in the occurrence and advancement of obesity associated type 2 diabetes mellitus [9], including FGF21 [10], proopiomelanocortin (POMC) [11], PI3K/AKT pathway [12] and JAK/STAT pathway [13]. However, the current knowledge is insufficient to explain and understand how these crucial genes and signaling pathways are associated with advancement of obesity associated type 2 diabetes mellitus. Therefore, there is a great need to find new prognostic and diagnostics biomarkers, and to advance novel techniques to enlighten the molecular mechanisms controlling the progression of obesity associated type 2 diabetes mellitus.
Bioinformatics analysis of expression profiling by high throughput sequencing data has shown great promise to discover potential key genes and signaling pathways with significant roles in metabolic disorder [14], to identify new prognostic and diagnostics biomarkers, and biological processes implicated in obesity associated type 2 diabetes mellitus. In this investigation, using bioinformatics analysis, we aimed to investigate expression profiling by high throughput sequencing data to determine differentially expressed genes (DEGs) and significant pathways in obesity associated type 2 diabetes mellitus.
After searching the Gene Expression Omnibus (GEO) database [15], we selected RNA sequencing dataset GSE143319 for identifying DEGs for obesity associated type 2 diabetes mellitus. This dataset gives more information about obesity associated type 2 diabetes mellitus elevates patient's risk of nonalcoholic steatohepatitis (NASH), cardiovascular disease and cancer. Gene Ontology (GO) and pathway enrichment analysis were performed. A hub and target genes were identified from protein-protein interaction (PPI) network, modules, miRNA-target genes regulatory network and TF-target gene regulatory network. Subsequently, hub genes were validated by using receiver operating characteristic (ROC) curve and RT-PCR analysis. Finally, molecular docking studies performed for prediction of small drug molecules.

RNA sequencing data
The expression profiling by high throughput sequencing dataset GSE143319 deposited by Ding et al [16] into the GEO database were obtained on the GPL20301 platform (Illumina HiSeq 4000 (Homo sapiens)). This dataset is provided for 30 samples, including 15 samples of metabolically healthy obese and 15 samples of a metabolically unhealthy obese.

Identification of DEGs
The limma [17] in R bioconductor package was utilized to screen DEGs between metabolically healthy obese and metabolically unhealthy obese. These DEGs were identified as important genes that might play an important role in the development of obesity associated type 2 diabetes mellitus. The cutoff criterion were |log fold change (FC)| > 0.2587 for up regulated genes, |log fold change (FC)| < -0.2825 for down regulated genes and adjusted P value < 0.05.

GO and pathway enrichment analyses
ToppGene (ToppFun) (https://toppgene.cchmc.org/ enrichment.jsp) [18], which is a useful online database that integrates biologic data and provides a comprehensive set of functional annotation information of genes as well as proteins for users to analyze the functions or signaling pathways. GO (https://geneontology.org/) [19] enrichment analysis (biologic processes [BP], cellular components [CC], and molecular functions [MF]) is a strong bioinformatics tool to analyze and annotate genes. The REACTOME (https://reactome.org/) [20] is a pathway database resource for understanding high-level gene functions and linking genomic information from large scale molecular datasets. To analyze the function of the DEGs, biologic analyses were performed using GO and REACTOME pathway enrichment analysis via ToppGene online database.

PPI network construction and module analysis
IMEX interactome (https://www.imexconsortium.org/) [21] online PPI database was using to identify the hub gene information in PPI network. Analyzing the interactions and functions of DEGs might give information about the controlling the progression of obesity associated type 2 diabetes mellitus. Cytoscape (version 3.8.2) (www.cytoscape.org) is a bioinformatics platform for constructing and visualizing PPI network [22]. Therefore, the topological properties includes node degree [23], betweenness centrality [24], stress centrality [25], closeness centrality [26] are analyzed in using Java plugin Network Analyzer to obtain hub genes in the PPI network. The plug-in PEWCC1 of Cytoscape was applied to detect densely connected regions in PPI network. The significant modules in the PPI network was selected using PEWCC1 (https://apps.cytoscape.org/apps/ PEWCC1) [27]. The criteria for selection were set as follows: Max depth = 100, degree cut-off = 2, node score cut-off = 0.2, PEWCC1 scores >5, and K-score = 2.  Obesity associated type 2 diabetes mellitus relating miR-NAs and experimentally validated target genes were identified from miRNet database (https://www.mirnet. ca/) [28]. Obesity associated type 2 diabetes mellitus relating miRNAs and target genes were identified through target genes -miRNA regulatory network. Then the target genes -miRNA regulatory network was constructed and visualized by using Cytoscape software.

Target gene -TF network regulatory construction and analysis
Obesity associated type 2 diabetes mellitus relating TFs and experimentally validated target genes were identified from TFs database NetworkAnalyst database (https:// www.networkanalyst.ca/) [29]. Obesity associated type 2 diabetes mellitus relating TFs and target genes were identified through target genes -TF regulatory network. Then the target genes -TF regulatory network was constructed and visualized by using Cytoscape software.

Receiver operating characteristic (ROC) curve analysis
The ROC curve was used to calculate classifiers in bioinformatics applications. To further assess the predictive accuracy of the hub genes, ROC analysis was performed to discriminate metabolically healthy obese from metabolically unhealthy obese. ROC curves for hub genes were generated using pROC in R [30] based on the obtained DEGs and their expression profiling by high throughput sequencing dataset. The area under the curve (AUC) was evaluated and used to compare the diagnostic value of hub genes.

Validation of the expression levels of candidate genes by RT-PCR
Quantitative RT-PCR was conducted to validate the expressions of these hub genes in obesity associated type 2 diabetes mellitus. Total RNAs were extracted from Primary Subcutaneous Pre adipocytes; Normal Human cell line (ATCC® PCS-210-010™) and 3T3-L1 cells (ATCC® CL-173) using TRI Reagent® (Sigma, USA) according to instruction, followed by reverse transcription with Reverse transcription cDNA kit (Thermo Fisher Scientific, Waltham, MA, USA) and cDNA amplification through 7 Flex real-time PCR system (Thermo Fisher Scientific, Waltham, MA, USA). The expressions of hub genes were normalized to against beta actin expression. The data were calculated by the 2 −ΔΔCt method [31]. A primer used in the current investigation was listed in Table 1.

Molecular docking studies
The Surflex-Docking docking studies for the designed molecules were performed using module SYBYL-X 2.0 perpetual software. Using ChemDraw Tools, the molecules were sketched and imported and saved into sdf format using open free software from Babel. The cocrystallised protein structures of CEBPD, TP73, ESR2, TAB1 and MAP 3K5 of its PDB code 3L4W, 2XWC, 1U3Q, 5NZZ & 2CLQwas extracted from Protein Data Bank [32][33][34][35][36]. Together with the TRIPOS force field, GasteigerHuckel (GH) charges were added to all designed derivatives for the structure optimization process. Furthermore, energy minimization was carried out using MMFF94s and MMFF94 algorithm process. The processing of protein was accomplished after the incorporation of protein. The co-crystallized ligand and all water molecules were expelled from the crystal structure; more hydrogen was added and the side chain was optimized.                     TRIPOS force field was used to minimize complexity of structure. The interaction efficiency of the compounds with the receptor was expressed in kcal / mol units by the Surflex-Dock score. The best spot between the protein and the ligand was inserted into the molecular region. The visualization of ligand interaction with receptor is done by using discovery studio visualizer.

Gene ontology and pathway enrichment analyses
DEGs were divided into up regulated genes and down regulated genes. GO and REACTOME pathway enrichment analysis were conducted for DEGs. Results of GO categories were presented by functional groups, which include BP, CC, and MF, and are listed in Table 3. In group BP, up regulated genes enriched in regulation of ion transmembrane transport and oxoacid metabolic process, while the down regulated genes enriched in cell adhesion and response to endogenous stimulus. For group CC, up regulated genes enriched in intrinsic component of plasma membrane and mitochondrion, while down regulated genes enriched in integral component of plasma membrane and supra molecular fiber. In addition, GO results of group MF showed that up regulated genes enriched in transferase activity, transferring phosphorus-containing groups and transporter activity and down regulated genes enriched in signaling receptor binding and molecular transducer activity. Several significant enriched pathways were acquired through REAC TOME pathway analysis ( Table 4). The enriched pathways for up regulated genes included integration of energy metabolism and neuronal system, while, down regulated genes enriched in extracellular matrix organization and GPCR ligand binding.

PPI network construction and module analysis
PPI network complex consisted of 3648 nodes and 6305 edges, wherein node and edge represented gene and interaction between genes (Fig.3a). Moreover, CEBPD, TP73, ESR2, TAB1, MAP 3K5, FN1, UBD, RUNX1, PIK3R2 and TNF were identified as hub genes and are listed in Table 5. In addition, module analysis was conducted to detect the highly connected regions of PPI network, and two significant modules were identified ( Fig.3b and Fig.3c). Further GO and pathway enrichment analysis revealed that genes in these modules were mostly implicated in regulation of ion transmembrane transport, oxoacid metabolic process, intrinsic component of plasma membrane, extracellular matrix organization and supra molecular fiber.

Target gene-TF s regulatory network construction and analysis
The target genes -TF regulatory network was constructed, including 333 TFs and 204 target genes. As shown in the integrated target genes -TF regulatory network (Fig. 5 Table 6.

Molecular docking studies
In the current research, the docking simulation was conducted to recognize the active site conformation and major interactions responsible for complex stability with the binding sites receptor. Drug design software Sybyl X 2.1 was used to perform docking experiments on novel molecules containing thiazolidindioneheterocyclic ring. Molecules containing the heterocyclic ring of thiazolidinedione are constructed based on the pioglitazone structure and are most widely used alone or in conjunction with other anti-diabetic drugs. Obesity associated type 2 diabetes mellitus is a chronic disorder that prevents insulin from being used by the body the way it should. It's said that people with obesity associated type 2 diabetes mellitus have insulin resistance, oral hypoglycaemic agents are used either alone or in combination of two or more drugs. Pioglitazone (Glitazones) are commonly used either alone or in combination in obesity associated type 2 diabetes mellitus. The one protein in each over expressed genes in obesity associated type 2 diabetes mellitus are selected for docking studies. The X-RAY crystallographic structure of one protein from each over-expressed genes of CEBPD, TP73, ESR2, TAB1 and MAP 3K5, and their co-crystallized PDB code of 4LY9, 2XWC, 2IOG, 5NZZ and 5UP3 respectively were selected for docking. The examinations of the designed molecules were performed to recognize the potential molecule. The foremost of the designed molecules obtained C-score greater than 6 and are said to be active. A total of 24 designed molecules few molecules have excellent good binding energy (C-score) greater than 8 respectively. Few of the designed molecules obtained good binding scores such as molecule TZP20, TZPS8, TZP22, TZPS10 (Fig.8)        bonding interactions with amino acids with protein code 2IOG are depicted by 3D (Fig.9) and 2D ( Fig.10) figures.
In conclusion, with the integrated bioinformatics analysis for expression profiling by high throughput sequencing in obesity associated type 2 diabetes mellitus, ten hub genes associated with the pathogenesis and prognosis of obesity associated type 2 diabetes, including CEBPD, TP73, ESR2, TAB1, MAP 3K5, FN1, UBD, RUNX1, PIK3R2 and TNF. These hub genes were associated with progression of obesity associated type 2 diabetes mellitus and first five (CEBPD, TP73, ESR2, TAB1 and MAP 3K5) of them might be linked with targeted therapy. These hub genes might be regarded as new diagnostic and prognostic biomarkers for obesity associated type 2 diabetes mellitus. However, further in-depth investigation (in vivo and in vitro experiment) is necessary to elucidate the biological function of these genes in obesity associated type 2 diabetes mellitus.