Exploring the gut microbiota in patients with pre-diabetes and treatment naïve diabetes type 2 - a pilot study

Background Compared to their healthy counterparts, patients with type 2 diabetes (T2D) can exhibit an altered gut microbiota composition, correlated with detrimental outcomes, including reduced insulin sensitivity, dyslipidemia, and increased markers of inflammation. However, a typical T2D microbiota profile is not established. The aim of this pilot study was to explore the gut microbiota and bacteria associated with prediabetes (pre-T2D) patients, and treatment naïve T2D patients, compared to healthy subjects. Methods Fecal samples were collected from patients and healthy subjects (from Norway). The bacterial genomic DNA was extracted, and the microbiota analyzed utilizing the bacterial 16S rRNA gene. To secure a broad coverage of potential T2D associated bacteria, two technologies were used: The GA-map® 131-plex, utilizing 131 DNA probes complementary to pre-selected bacterial targets (covering the 16S regions V3-V9), and the LUMI-Seq™ platform, a full-length 16S sequencing technology (V1-V9). Variations in the gut microbiota between groups were explored using multivariate methods, differential bacterial abundance was estimated, and microbiota signatures discriminating the groups were assessed using classification models. Results In total, 24 pre-T2D patients, 18 T2D patients, and 52 healthy subjects were recruited. From the LUMI-Seq™ analysis, 10 and 9 bacterial taxa were differentially abundant between pre-T2D and healthy, and T2D and healthy, respectively. From the GA-map® 131-plex analysis, 10 bacterial markers were differentially abundant when comparing pre-T2D and healthy. Several of the bacteria were short-chain fatty acid (SCFA) producers or typical opportunistic bacteria. Bacteria with similar function or associated properties also contributed to the separation of pre-T2D and T2D from healthy as found by classification models. However, limited overlap was found for specific bacterial genera and species. 
Conclusions This pilot study revealed that differences in the abundance of SCFA producing bacteria, and an increase in typical opportunistic bacteria, may contribute to the variations in the microbiota separating the pre-T2D and T2D patients from healthy subjects. However, further efforts in investigating the relationship between gut microbiota, diabetes, and associated factors such as BMI, are needed for developing specific diabetes microbiota signatures.


Background
Type 2 diabetes (T2D) is a significant global health challenge, constituting over 90% of diabetes cases worldwide, and with more than 700 million adults estimated to be impacted by 2045 [1]. Often there is an extended pre-diagnostic period, and a large proportion of people with T2D are thought to be undiagnosed. T2D pathophysiology involves gradually rising blood glucose levels (hyperglycemia) due to increasing insulin resistance and/or decreasing beta cell function, and is strongly associated with overweight and obesity and/or central adiposity [1,2]. Prediabetic individuals have blood glucose levels higher than normal but below the threshold for T2D, are often overweight, and have an elevated risk of T2D and cardiovascular disease.
The gut microbiota plays a pivotal role in metabolism, immunomodulation and overall human health, and disruptions to the balance of this community are associated with many diseases, including inflammatory bowel disease (IBD) and irritable bowel syndrome (IBS), and metabolic diseases such as T2D [3]. Abundant evidence in both animal models and humans with T2D points to an altered or dysbiotic gut microbiota composition as compared to that found in healthy individuals [4][5][6][7][8][9]. Individuals with an unbalanced gut microbiota composition may exhibit a plethora of detrimental health outcomes: higher BMI, increased fat mass, reduced insulin sensitivity, dyslipidemia, and an increased inflammatory state [10]. It has been proposed that the intestinal bacteria and the metabolites they produce play a role in inducing a harmful chronic low-grade inflammatory state and subsequent development of insulin resistance [3,9]. However, so far, no typical microbiota profile for T2D has been identified.
Over the last decade, the most common approach for microbiota profiling has been to target the nine hypervariable regions (V1-V9) of the bacterial 16S rRNA gene. This relatively short (∼1,500 bp) gene region provides phylogenetic signatures on different taxonomic levels. The hypervariable regions are surrounded by highly conserved sequences, which are used for primer design.
The aim of this pilot study was to explore the gut microbiota and bacteria associated with pre-T2D patients, and newly diagnosed, treatment naïve T2D patients, compared to healthy subjects. To secure a broad coverage of potential T2D associated bacteria, two technologies were used: the GA-map® technology platform 131-plex (GA-map® 131-plex), utilizing 131 DNA probes complementary to pre-selected bacterial targets (covering the 16S rRNA gene regions V3-V9), and the Long 16S using Unique Molecular Identifiers-Sequencing (LUMI-Seq™) platform, a full-length 16S sequencing technology (V1-V9).

Study population and sample collection
Adult patients diagnosed with either pre-T2D (n = 24) or T2D (n = 18) were recruited, and native fecal samples collected, by three diabetes clinics in Norway (Østerås, Sandnes and Tananger). In addition, healthy adult subjects (n = 52) were recruited, and native fecal samples collected by Oslo Metropolitan University (OsloMet) in Oslo, Norway. The study was approved by the regional Norwegian ethical committee (REC South-East, Norway), and informed consent was obtained from all participants. All samples and information were de-identified before analysis. The samples collected by the clinics were sent non-frozen to Genetic Analysis (GA), Oslo, Norway by mail and frozen upon delivery. The samples collected by OsloMet were frozen upon delivery to the university, before transferal to GA (frozen, on dry ice). All fecal samples were frozen (between -40 °C and -80 °C) within 5 days after collection before further processing.
A case record form was completed for all participants. The T2D patients were all newly diagnosed and treatment-naïve, except for one patient (> 5 months since the last dose of metformin). Criteria for the inclusion of T2D patients included analysis of blood glucose levels, specifically, hemoglobin A1c (HbA1c) ≥ 6.5%. Inclusion criteria for the pre-T2D patients included HbA1c of 6.0-6.4%. Criteria for the inclusion of healthy subjects included no history of diabetes or pre-diabetes, and HbA1c < 6.0%. Exclusion criteria for all groups included recent use of antibiotics (last 4 weeks), and a positive fecal calprotectin (F-cal) test (> 200 mg/kg). See Table 1 for characteristics of the study population. Samples from 86 subjects (16 T2D, 22 pre-T2D and 48 healthy) were analyzed with LUMI-Seq™ and included in the downstream data analysis (4 healthy, 2 pre-T2D and 2 T2D excluded after sequencing due to low taxa counts). Samples from 78 subjects (18 T2D, 22 pre-T2D and 38 healthy) were analyzed with the GA-map® 131-plex, after exclusions (11 healthy subjects excluded due to lack of sufficient number of wells in the plate setup, and additional 5 subjects (2 pre-T2D, 3 healthy) due to F-cal > 200 mg/kg).
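The inclusion and exclusion rules above can be written as a small decision function. The following is only an illustrative sketch: the function and parameter names are hypothetical, values between 6.4% and 6.5% are assigned to pre-T2D by assumption, "last 4 weeks" is read as 28 days inclusive by assumption, and the additional requirement that T2D patients be newly diagnosed and treatment-naïve is not encoded.

```python
from typing import Optional


def assign_group(hba1c_percent: float, diabetes_history: bool = False) -> Optional[str]:
    """Map an HbA1c value (%) to a study group using the thresholds in the text."""
    if hba1c_percent >= 6.5:
        return "T2D"
    if hba1c_percent >= 6.0:
        # Text gives 6.0-6.4%; treating everything below 6.5% as pre-T2D is an assumption.
        return "pre-T2D"
    # Healthy subjects also required no history of diabetes or pre-diabetes.
    return None if diabetes_history else "healthy"


def is_excluded(days_since_antibiotics: Optional[int], fcal_mg_per_kg: float) -> bool:
    """Apply the exclusion criteria shared by all groups."""
    recent_antibiotics = days_since_antibiotics is not None and days_since_antibiotics <= 28
    positive_fcal = fcal_mg_per_kg > 200
    return recent_antibiotics or positive_fcal


print(assign_group(6.2), is_excluded(None, 150.0))
```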

Sample processing and analysis
Total bacterial genomic DNA extraction was performed by GA, using a protocol previously described [11]. Briefly, the extraction was performed by fecal homogenization (using a stirring rod) and mechanical cell lysis (FastPrep-96™, MP Biomedicals), followed by chemical/enzymatic heat lysis and automated DNA extraction using a MagMAX™ Express-96 or KingFisher™ Flex (Thermo Fisher Scientific) in combination with the mag™ maxi reagent kit (LGC Genomics GmbH). After extraction, DNA samples were aliquoted, and aliquots were shipped to BIOASTER in Lyon, France. The DNA samples were analyzed using the LUMI-Seq™ platform (BIOASTER, France) and by the GA-map® 131-plex (Genetic Analysis, Norway) (Fig. 1).

GA-map® 131-plex (GA-map® Technology Platform 131-plex)
The GA-map® 131-plex utilizes a pre-targeted approach, based on DNA probe hybridization to bacterial 16S rRNA gene targets, to identify and characterize bacterial profiles from fecal samples. This research-only panel of DNA probes (bacterial markers) was established to cover major bacterial observations made from the literature relating to the microbiota in healthy, IBS and IBD [11]. Disruptions to the regular community of bacteria have also been associated with other conditions, such as diabetes type 2 [3]. Each bacterial marker was designed to identify a specific bacterial species or group (e.g., phylum, class, genus), based on their 16S rRNA gene sequence [11]. As such, a large number of bacteria are detected at different taxonomic levels. The bacterial markers were intensively tested in silico (target detection, non-target exclusion, cross-labelling, self-labelling, and cross hybridization), and against bacterial DNA from a selection of culturable bacterial species in vitro.
The same laboratory procedures as described for the standardized and CE-marked GA-map® technology platform Dysbiosis Test were followed [11], with a few modifications (Fig. 1). Briefly (after DNA extraction), the 16S rRNA gene hypervariable regions V3-V9 are amplified by the polymerase chain reaction (PCR) using a universal primer pair [11]. The amplified DNA is hybridized to a 131-plex panel of DNA probes, complementary to regions within the amplicon specific for the targeted bacteria. Hybridized probes are labeled with biotin through single nucleotide extension before hybridization of the probe-set and solid-phase (carboxylated magnetic beads), as well as addition of a detection fluorophore. After washing of the samples, the fluorescent signal (probe signal intensity), corresponding to the abundance of target bacteria in the sample, is detected and quantified using a Luminex® 200™ instrument (Luminex Corp., Austin, TX, USA).

Table 1 Study population - main characteristics
The characteristics of the participants in each of the study groups, analyzed and included in the LUMI-Seq™ and GA-map® 131-plex data analysis (the 25 th

LUMI-Seq™ platform (Long 16S using Unique Molecular Identifiers-Sequencing)
Synthetic long-read sequencing is now emerging in the microbiome space as a methodology for generating reliable, high-quality long fragments from Illumina short reads [12][13][14]. In that context, BIOASTER recently developed LUMI-Seq™ to recover thousands of full-length 16S sequences from complex samples [15]. For this study, the standard LUMI-Seq™ workflow was followed (Fig. 1), in which each 16S molecule within each sample was first barcoded using unique molecular identifiers (UMI) so that it could be tracked during the entire workflow. Then, the molecules were amplified to make multiple copies and to increase the signal. The PCR products were then fragmented while keeping the UMI information on all pieces. The fragments were sequenced using the MiSeq™ platform (Illumina, San Diego, California), in 2 × 200 bp. Preprocessing of the raw data was performed to remove low-quality ends of the reads with fastp [16]. Read pairs sharing the same UMI and the same sample barcode were grouped together in silico to make long accurate consensus sequences. Assembly was performed with SPAdes [17]. The reconstruction of full-length 16S sequences was performed using V-Revcomp [18] and V-Xtractor [19]. On average, 4,812 full-length 16S sequences were reconstructed per sample (range: 1,124-10,511). Based on the UMI redundancy, the LUMI-Seq™ error rate was assessed at 0.0047%.
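The in silico grouping step described above, in which reads sharing the same UMI and sample barcode are pooled before assembly, can be illustrated as follows. This is a minimal sketch and not the LUMI-Seq™ implementation: the tuple layout of a read and the example identifiers are hypothetical, and the consensus assembly itself (done with SPAdes in the study) is not reproduced.

```python
from collections import defaultdict


def group_reads_by_umi(reads):
    """Pool read sequences by (sample barcode, UMI).

    `reads` is an iterable of (sample_barcode, umi, sequence) tuples;
    this record layout is an assumption made for illustration.
    """
    groups = defaultdict(list)
    for sample_barcode, umi, sequence in reads:
        groups[(sample_barcode, umi)].append(sequence)
    return dict(groups)


# Hypothetical reads: two fragments of one molecule, plus two other molecules.
reads = [
    ("S1", "ACGT", "read_a"),
    ("S1", "ACGT", "read_b"),
    ("S1", "TTGA", "read_c"),
    ("S2", "ACGT", "read_d"),
]
groups = group_reads_by_umi(reads)
for key, members in groups.items():
    print(key, len(members))
```

Note that the same UMI in two different samples ("S1" and "S2" here) is kept apart, which is why the sample barcode is part of the key.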

Statistical analysis

GA-map® 131-plex data analysis
To account for variable signal levels, the raw signal data (fluorescent intensities) was normalized using a hybridization control, as previously described [11], and background noise was subtracted. Variations in the bacterial profiles between and within the groups (pre-T2D, T2D and healthy) were explored using the non-parametric multivariate methods principal component analysis (PCA) and permutational multivariate analysis of variance (PerMANOVA), using Euclidean and Bray-Curtis methods, respectively. Possible confounding effects on the data due to the clinical variables (e.g., age, BMI, F-cal) were also explored, using the above-mentioned methods.
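The two distance measures named above differ in how they compare a pair of bacterial profiles. The sketch below computes both for two hypothetical three-marker profiles; the numbers are invented for illustration and the functions are not taken from the study's analysis code.

```python
import math


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def euclidean(a, b):
    """Euclidean distance between two profiles of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical non-negative abundance profiles for two samples.
profile_a = [10.0, 0.0, 5.0]
profile_b = [5.0, 5.0, 5.0]
print(bray_curtis(profile_a, profile_b))
print(euclidean(profile_a, profile_b))
```

Bray-Curtis is bounded between 0 and 1 for non-negative data and is driven by relative differences, whereas the Euclidean distance grows with absolute signal level.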
The non-parametric Wilcoxon Rank Sum Test with Benjamini-Hochberg correction (Stats R package [20]) was used to determine significant difference (adjusted p < 0.1) in abundance of the bacterial markers. Microbiota signatures for separation of pre-T2D or T2D and healthy subjects were calculated using the caret (classification and regression training) R-package [20]. To evaluate the robustness and performance of the classification models, a tenfold cross-validation was performed, in which 90% of the cohort was used for model training and 10% for model testing. A parameter, "importance", ranging from 0 (no contribution) to 100 (maximum possible contribution for training of the model), was reported for each bacterial marker forming the basis for the microbiota signatures. The best performing model, multi-step adaptive elastic-net (MSA-ENet) [21], was chosen. A ROC curve was built, presenting the mean value of the area under the curve (AUC).
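The Benjamini-Hochberg correction mentioned above rescales each raw p-value by the number of tests divided by its rank, then enforces monotonicity from the largest p-value downwards. The following sketch reimplements that adjustment in plain Python for illustration only (the study used R); the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four bacterial markers.
raw_p = [0.01, 0.04, 0.03, 0.20]
adjusted_p = benjamini_hochberg(raw_p)
# Markers passing the adjusted p < 0.1 threshold used for the GA-map data.
significant = [i for i, q in enumerate(adjusted_p) if q < 0.1]
print(adjusted_p, significant)
```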

LUMI-Seq™ data analysis
After obtaining full-length 16S sequences, QIIME scripts (version 2019.7.0) were used for collapse of 100% identical sequences [22]. Each unique sequence was then assigned to a taxonomy by mapping to a custom 16S database made by BIOASTER, as well as the widely used SILVA reference database. Because fewer sequences were assigned down to the species level with SILVA (60%) than with the BIOASTER 16S database (76%), the BIOASTER database was chosen for the downstream analyses. Taxa with a total count (summed over all samples) lower than five counts were removed. After normalization of taxa counts, eight samples (four healthy, two pre-T2D and two T2D) showed lower counts and were thus discarded from the analysis to avoid interpretation bias.
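Two of the steps above, collapsing 100% identical sequences and removing taxa whose total count over all samples is below five, are simple counting operations. The sketch below illustrates them in plain Python under assumed data layouts (a list of sequence strings, and a dictionary of per-sample taxon counts); it does not reproduce the QIIME scripts used in the study.

```python
from collections import Counter


def collapse_identical(sequences):
    """Count how often each exact (100% identical) sequence occurs."""
    return Counter(sequences)


def drop_rare_taxa(counts_by_sample, min_total=5):
    """Remove taxa whose count summed over all samples is below min_total."""
    totals = Counter()
    for sample_counts in counts_by_sample.values():
        totals.update(sample_counts)  # adds counts taxon by taxon
    keep = {taxon for taxon, total in totals.items() if total >= min_total}
    return {
        sample: {taxon: n for taxon, n in sample_counts.items() if taxon in keep}
        for sample, sample_counts in counts_by_sample.items()
    }


# Hypothetical inputs.
unique = collapse_identical(["ACGT", "ACGT", "TTTT"])
table = {"s1": {"taxon_A": 3, "taxon_B": 1}, "s2": {"taxon_A": 2, "taxon_B": 2}}
filtered = drop_rare_taxa(table)
print(unique, filtered)
```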
Principal coordinates analysis (PCoA) and PerMANOVA, using the Bray-Curtis method, were used to explore variations in bacterial profiles between and within the groups (pre-T2D, T2D and healthy subjects). The above-mentioned methods were also used to explore any confounding effects on the data due to the clinical variables (e.g., age, sex, BMI). Differential abundance analyses were conducted with the DESeq2 package from Bioconductor [23,24]. Based on a Wald test, taxa with an absolute log-fold change larger than a 0.5 threshold and an adjusted p-value lower than 0.05 were considered as differentially abundant.
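The two-part decision rule applied to the DESeq2 output can be stated compactly. The sketch below only illustrates the thresholding; the taxon names and values are hypothetical, and it assumes the fold change is on the log2 scale that DESeq2 reports.

```python
def is_differentially_abundant(log_fold_change, adjusted_p,
                               lfc_threshold=0.5, alpha=0.05):
    """Apply both criteria from the text: |log-fold change| > 0.5 and adjusted p < 0.05."""
    return abs(log_fold_change) > lfc_threshold and adjusted_p < alpha


# Hypothetical result rows: (taxon, log-fold change, adjusted p-value).
rows = [
    ("taxon_A", 0.8, 0.01),   # passes both criteria
    ("taxon_B", 0.3, 0.01),   # effect size too small
    ("taxon_C", -1.2, 0.20),  # not significant after adjustment
    ("taxon_D", -0.9, 0.04),  # passes, less abundant in the patient group
]
hits = [name for name, lfc, padj in rows if is_differentially_abundant(lfc, padj)]
print(hits)
```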
Microbiota signatures for the separation of pre-T2D or T2D and healthy subjects were calculated using one of the most commonly used classifiers, Random Forest [25], in a fivefold cross-validation. The sample assignment was repeated until the three groups were equally distributed across the five folds, based on Fisher's exact test. As the majority of the taxa and genes were not associated with the groups, a univariate selection (Welch test, p-value < 1% or 10%) was performed to reduce the dimensionality. The cross-validation procedure was repeated 300 times.
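The goal of the fold assignment described above is that each of the five folds contains a similar share of healthy, pre-T2D and T2D samples. The sketch below reaches a comparable balance with a different, simpler technique (stratified round-robin assignment within each group) rather than the repeat-until-balanced procedure with Fisher's exact test used in the study; the function name and seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each sample a fold index so that every group is spread evenly."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for index, label in enumerate(labels):
        by_group[label].append(index)
    folds = [0] * len(labels)
    for indices in by_group.values():
        rng.shuffle(indices)  # randomize order within the group
        for position, index in enumerate(indices):
            folds[index] = position % n_folds  # deal out round-robin
    return folds


# Group sizes of the LUMI-Seq dataset as reported in the text.
labels = ["healthy"] * 48 + ["pre-T2D"] * 22 + ["T2D"] * 16
folds = stratified_folds(labels)
print(sorted(set(folds)))
```

Repeating the procedure with different seeds gives different partitions, analogous to repeating the cross-validation many times.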

Differential abundance analysis and microbiota signatures
As found by the combined results from the GA-map® 131-plex and LUMI-Seq™ data analysis, mainly bacteria from the phylum Bacillota (Firmicutes) were differentially abundant in pre-T2D and T2D compared to the healthy group (Tables 2 and 3), which was also confirmed by classification modelling (Figs. 3A&B and 4A&B). Note that from the GA-map® 131-plex analysis, no significant abundance differences were found when comparing the T2D group and the healthy group.
Conversely, for example, Dorea spp. (class Clostridia), gas- and SCFA-producing bacteria [31], were more abundant in pre-T2D according to the GA-map® 131-plex analysis (Table 2), and, based on classification modeling, Dorea spp. contributed to the separation of pre-T2D and healthy (Fig. 3A). Similarly, as found from the LUMI-Seq™ analysis, the gas- and SCFA-producing Dorea formicigenerans and D. longicatena [31] were more abundant (adjusted p < 0.05) in T2D (Table 3), and Dorea was among the top ten contributors for the separation of T2D and healthy according to classification modeling (Fig. 4B).
Further, based on the LUMI-Seq™ analysis (Table 3), Bacilli, including the lactic acid-producing Streptococcus (Lactobacillales order), many of which are opportunistic [36], as well as Eisenbergiella massiliensis (Clostridia class), a potential SCFA producer that may be associated with obesity [37,38], and Neglecta timonensis (Clostridia class), possibly associated with T2D [39], were more abundant in pre-T2D than in healthy. According to classification modeling, Lactobacillales, which may be both commensal and opportunistic [40], was among the top ten contributors for separation of pre-T2D and healthy (Fig. 4A), and the lactate-producing Lactobacillus [41] for the separation of T2D and healthy (Fig. 4B). From the GA-map® 131-plex classification modeling, Bacilli (including Enterococcus faecalis), a diverse bacterial class containing both commensal and opportunistic bacteria [40], contributed to the separation of pre-T2D and healthy (Fig. 3A), and E. faecalis (Lactobacillales order), as well as Bacillota, a highly diverse and abundant group of gut bacteria [40], to the separation of T2D and healthy (Fig. 3B).
Additionally, the opportunistic pro-inflammatory bacteria Pseudomonadota (Proteobacteria) and Shigella spp./Escherichia spp. (Gammaproteobacteria class) [42] were more abundant in pre-T2D according to the GA-map® 131-plex analysis (Table 2). Based on classification modeling, the Gammaproteobacteria Aeromonas sp., and the Bacillota Clostridium sp., a potential pathogen [40], were among the top ten contributors for separation of pre-T2D and healthy (Fig. 3A), while Pseudomonadota, Campylobacter sp. (Epsilonproteobacteria), typical pathogenic bacteria [43], and the Gammaproteobacteria Haemophilus sp./Mannheimia sp., contributed to the separation of T2D and healthy (Fig. 3B). Pseudomonadota were also found by the LUMI-Seq™ classification to be among the top ten contributors for the separation of pre-T2D and healthy (Fig. 4A).
For the GA-map® 131-plex classification modeling, area under the curve (AUC) values of 0.88 and 0.77 were achieved for the separation of pre-T2D and healthy subjects, and of T2D and healthy subjects, respectively. For the LUMI-Seq™ classification modeling, AUC values of 0.78 and 0.64 were achieved for the separation of pre-T2D and healthy subjects, and of T2D and healthy subjects, respectively.
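To make the AUC values above easier to interpret: the area under the ROC curve equals the probability that a randomly chosen patient receives a higher classifier score than a randomly chosen healthy subject, with ties counted as one half. The sketch below computes it directly from that definition; the scores and labels are hypothetical and unrelated to the study data.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores; label 1 = patient, 0 = healthy.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = roc_auc(scores, labels)
print(auc)
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation.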

Discussion
The primary aim of this pilot study was to explore the gut microbiota and associated bacteria in pre-T2D and treatment naïve T2D patients, compared to healthy subjects. LUMI-Seq™ and the GA-map® 131-plex represent two methods which can be used for bacterial profiling and to identify bacterial biomarkers. The results represent a preliminary discovery of possible diabetes specific bacterial patterns.
Differences in the abundance of SCFA producing bacteria in the phylum Bacillota (Firmicutes) between healthy subjects and pre-T2D and T2D patients were revealed. SCFA producing Bacillota were also among the top ten contributors for the separation of pre-T2D and T2D from healthy using classification models. For example, the SCFA producing bacteria F. prausnitzii and Roseburia were found to be less abundant, and the gas- and SCFA-producing Dorea more abundant, in pre-T2D and/or T2D, and were also among the ten most discriminative bacteria separating the groups from healthy. Additionally, SCFA producing bacteria such as A. rectalis (Eubacterium rectale) and H. biformis (Eubacterium biforme) were indicated by classification models to contribute to this separation.
Typical opportunistic bacteria also contributed to the differentiation between the groups. For instance, bacteria from the Bacilli class, such as the opportunistic Streptococcus, and typical opportunistic, pro-inflammatory bacteria from the phylum Pseudomonadota (Proteobacteria) were increased in pre-T2D, and, as found by classification models, contributed to the separation of pre-T2D and T2D from healthy.
Bacteria and their microbial products can impact the development of T2D by various and connected mechanisms, for instance, by affecting gut permeability, inflammatory regulation, and glucose metabolism (reviewed in [4,8]). Butyrate producing bacteria and the metabolite butyrate are important for promoting anti-inflammatory properties, and maintaining regular gut functions, and may also improve insulin resistance and glucose tolerance [51,52]. In contrast, factors such as bacterial-derived lipopolysaccharides (LPS), e.g., coming from Pseudomonadota, as well as the increased abundance of opportunistic bacteria in itself, can promote inflammation, and may contribute to the induction of a low-grade inflammatory state and insulin resistance [53,54].
A strength of this study is that DNA was extracted by the standardized GA-map® method, and DNA samples split, before analysis by the two analysis platforms, as the choice of extraction method may influence the end results [55,56]. Thus, the use of one extraction method enables easier comparisons of the downstream results. Also, the combination of mechanical and chemical lysis (as utilized by the GA-map® method), has been shown to enhance the extraction of both Gram-negative and -positive bacteria, and to increase bacterial DNA yields [55,56].
Another strength is the inclusion of treatment naïve T2D patients, as it has been shown that the use of the common diabetes drug metformin may affect the gut microbiota [57,58]. Including prediabetic patients and treatment naïve T2D patients may make it more straightforward to understand the connections between the disease development and gut microbiota, by avoiding the effect of treatment or prolonged disease. Further, participants included in this study had not recently used antibiotics, which are also known to influence the gut bacterial composition [59,60].
Even though the same criteria for age and BMI were used for the inclusion of patients and healthy subjects, the healthy subjects were younger and had lower BMI (and included more females). Also, the pre-T2D group had a slightly higher median F-cal, and the LUMI-Seq™ analysis included 5 subjects that should have been excluded due to F-cal > 200 mg/kg. PerMANOVA of the GA-map® 131-plex data showed that BMI had a significant effect on the data, whereas F-cal levels did not. The BMI effect is perhaps not surprising, as diabetes is strongly associated with higher BMI/overweight [1,2].
The GA-map® 131-plex detects 131 DNA probes representing pre-selected 16S rRNA bacterial targets, while the LUMI-Seq™ platform entails full-length 16S rRNA sequencing. Differences in the targeted 16S regions for the GA-map® 131-plex and LUMI-Seq™ (V3-V9 vs. V1-V9, respectively), as well as the selected targets and the pre-determined taxonomic levels of the GA-map® method, may lead to differences in the phylogenetic resolution. This may be one reason for the two methods' limited overlap in genera and species of potential T2D associated bacteria. For instance, Turicibacter sanguinis, found elevated in T2D by LUMI-Seq™, cannot be detected directly by the GA-map® 131-plex, but may be covered by the broad Bacillota marker. Another limitation may be the use of different statistical methods, chosen to fit the dataset in question. For instance, for the GA-map® 131-plex data, bacterial abundances were considered significantly different if the adjusted p-value was < 0.1, as a limit of 0.05 gave limited results.
It is critical for researchers to take into consideration the strengths and limitations of different platforms and choose a system appropriate for their experimental design. The GA-map® platform offers the advantage of a standardized method, utilizing a pre-selected target approach, allowing for a reduced assay turn-around time and less resource-demanding data analysis. At the level of genera, the GA-map® technology exhibits strong correlation to MiSeq amplicon sequencing [11]. Even though LUMI-Seq™ follows protocols similar to standard Illumina sequencing, it is difficult to assess differences due to technical variations since no comparative study has been performed. However, the low error rate and the high number of sequences assigned to the species level in this study illustrate that the LUMI-Seq™ technology constitutes a robust approach for microbiota profiling studies.
This pilot study focused on a limited number of Scandinavian (Norwegian) participants only, and so the results and interpretation should be taken with caution. The recruitment and inclusion of treatment naïve T2D patients is especially challenging because, under standard guidelines, the time between diagnosis and start of treatment is limited. The lower number of pre-T2D and T2D patients may have affected the outcome and can be one explanation for the pre-T2D group seemingly having the most distinct microbiota composition. Differences in diet may be another factor affecting the results, as no detailed description of the diet was recorded. To strengthen the foundation for developing bacterial signatures for type 2 diabetes, future studies should be larger, international, multi-site studies, to account for inter-individual variation in the microbiota. Possible confounding factors that ought to be controlled closely include medication use, diet, and lifestyle [59][60][61][62][63].
Multiple studies have provided compelling evidence of an altered state in gut microbiota composition in pre-T2D and T2D individuals as compared to healthy subjects, with a strong correlation to insulin resistance and β-cell dysfunction, detected even prior to glucose abnormalities in these individuals [46,64-66]. The implication of an altered gut microbiota composition in diabetic and prediabetic patients was also supported by this study.

Conclusions
This pilot study revealed that differences in the abundance of short chain fatty acid (SCFA) producing bacteria, and an increase in typical inflammation-associated or potentially pro-inflammatory or opportunistic bacteria, may contribute to the variations in the microbiota separating the pre-T2D and T2D patients from the healthy subjects. However, further efforts in investigating the relationship between gut microbiota, diabetes, and associated factors such as BMI, are needed for developing specific diabetes microbiota signatures.

Fig. 1 Study workflow. Starting with collection of fecal samples, and genomic DNA extraction at Genetic Analysis (GA), through analysis of the DNA samples on the two different platforms: LUMI-Seq™ (Long 16S using Unique Molecular Identifiers-Sequencing) at BIOASTER and the GA-map® Technology Platform 131-plex at GA

Fig. 2 Principal Component Analysis (PCA) of GA-map® 131-plex data. The PCA score plot illustrates the similarities and variations of the groups healthy (n = 38), pre-T2D (n = 22) and T2D (n = 18), based on scaled and log-transformed normalized signal strength data. 90% confidence ellipses are shown for each of the groups