Prediction model for the onset risk of impaired fasting glucose: a 10-year longitudinal retrospective cohort health check-up study

Background Impaired fasting glucose (IFG) is a prediabetic condition. Because its clinical symptoms are inconspicuous, IFG tends to be ignored by individuals, which can lead to conversion to diabetes mellitus (DM). In this study, we established a prediction model for the onset risk of IFG in the Chongqing health check-up population to provide a reference for prevention in a health check-up cohort. Methods We conducted a retrospective longitudinal cohort study in Chongqing, China from January 2009 to December 2019. Eligible subjects were at least 20 years old and had at least two health check-ups. After applying the inclusion and exclusion criteria, the cohort population was randomly divided into a training set and a test set at a ratio of 7:3. We first selected the predictor variables through the univariate generalized estimation equation (GEE), and then used the training set to establish the IFG risk model based on multivariate GEE. Finally, the sensitivity, specificity, and receiver operating characteristic curves were used to verify the performance of the model. Results A total of 4,926 subjects (2,634 males and 2,292 females) were included in this study, with an average of 3.87 check-up records. There were 442 IFG cases during the follow-up period, including 286 men and 156 women. The incidence density was 26.88/1000 person-years for men and 18.53/1000 person-years for women (P<0.001).
The predictor variables of our prediction model include male (relative risk (RR) =1.422, 95 % confidence interval (CI): 0.923-2.193, P=0.3849), age (RR=1.030, 95 %CI: 1.016-1.044, P<0.0001), waist circumference (RR=1.005, 95 %CI: 0.999-1.012, P=0.0975), systolic blood pressure (RR=1.004, 95 %CI: 0.993-1.016, P=0.4712), diastolic blood pressure (RR=1.023, 95 %CI: 1.005-1.041, P=0.0106), obesity (RR=1.797, 95 %CI: 1.126-2.867, P=0.0140), triglycerides (RR=1.107, 95 %CI: 0.943-1.299, P=0.2127), high-density lipoprotein cholesterol (RR=0.992, 95 %CI: 0.476-2.063, P=0.9818), low-density lipoprotein cholesterol (RR=1.793, 95 %CI: 1.085-2.963, P=0.0228), blood urea (RR=1.142, 95 %CI: 1.022-1.276, P=0.0192), serum uric acid (RR=1.004, 95 %CI: 1.002-1.005, P=0.0003), total cholesterol (RR=0.674, 95 %CI: 0.403-1.128, P=0.1331), and serum creatinine levels (RR=0.960, 95 %CI: 0.945-0.976, P<0.0001). The area under the receiver operating characteristic curve (AUC) in the training set was 0.740 (95 %CI: 0.712-0.768), and the AUC in the test set was 0.751 (95 %CI: 0.714-0.817). Conclusions The prediction model for the onset risk of IFG had good predictive ability in the health check-up cohort.


Background
Diabetes mellitus (DM) has become one of the most vital public health challenges faced by all countries in the 21st century and has become an epidemic in recent decades [1,2]. Studies have shown that the global prevalence of DM was 8.5 % in 2014. It is estimated that the number of affected individuals will increase from 422 to 642 million by 2040 [3][4][5]. In China, despite the high incidence of DM, 50 % of patients are undiagnosed [6]. As early as 1997, the American Diabetes Association introduced the concept of impaired fasting glucose (IFG), which is a prediabetic condition [7][8][9]. IFG can be easily overlooked because of the unapparent clinical symptoms [10,11], which makes it more desirable to have suitable risk assessment models to help individuals assess the risk of IFG. Therefore, the prediction model for the onset risk of IFG appears to be particularly important as an assessment tool. In recent years, both domestic and foreign researchers have developed many risk prediction models of DM, but there is a lack of risk prediction models of IFG for the health check-up cohort [12][13][14].
With the strengthening of health awareness, health check-ups have become an important method of health management [15,16]. Health check-up data accumulate comprehensive health information over many years and are therefore a form of longitudinal data [17]. Given the nature of longitudinal data and the purpose of the analysis, methods dedicated to longitudinal data analysis should be used [18]. Therefore, this study used the Chongqing health check-up longitudinal cohort to establish a prediction model based on the generalized estimation equation (GEE), in order to assess the onset risk of IFG in a regular health check-up cohort and provide a reference for prevention in that cohort.

Study population
This retrospective, longitudinal cohort study began with a review of the health check-up records of 4,926 subjects (2,634 men and 2,292 women) aged 20 to 85 years who had at least two health check-ups at the Medical Examination Centre of the First Affiliated Hospital of Chongqing Medical University from January 2009 to December 2019. The annual health check-up records included anthropometric and laboratory measurements. The inclusion criteria were as follows: (1) no IFG, DM, or related diseases at baseline; (2) at least two health check-up records and complete physical check-up data; and (3) age ≥20 years. The exclusion criteria were as follows: (1) IFG, DM, or related diseases at baseline; (2) use of hypoglycemic drugs; and (3) loss to follow-up or missing key information. The cohort observation time was the period from the first health check-up to the onset of IFG (fasting blood glucose (FBG) of 6.1 to 6.9 mmol/L) or to the last health check-up without IFG. The outcome of this study was the presence of IFG; that is, once a subject had IFG, that subject's subsequent data were not included in the study.
During the check-up period for a given university, the person in charge of the check-up at that university went to the Medical Examination Centre of the First Affiliated Hospital of Chongqing Medical University. Witnessed by that person, the researchers of the medical examination centre verbally informed the participants that their check-up records would be used in future scientific research to analyse the factors influencing chronic diseases in Chongqing. The participants agreed and provided verbal informed consent. All participants were assigned a numeric code that was used throughout the study, and all data were stored in a secure database to maintain anonymity. The selection process for the participants in this study is shown in Fig. 1.

Measurements
The health check-up was performed at the Medical Examination Centre of the First Affiliated Hospital of Chongqing Medical University, which has obtained ISO 15189 standard certification. The health check-up included the following:

Anthropometric measurements
Anthropometric measurements were performed with the subjects wearing light clothes. Waist circumference (WC) was measured with a soft ruler with a minimum scale of 0.1 cm. The measurer stood in front of the subject, wrapped the measuring tape horizontally around the waist at the measuring point, repeated the measurement twice, and recorded the average value [19]. Weight and height were measured in a standing position using calibrated scales, and body mass index (BMI) was calculated as weight (kg) divided by height (m) squared. After the subjects had sat for at least 5 min, a HEM-906 sphygmomanometer (Omron Matsuzak Co., Ltd., Japan) was used to measure the blood pressure of the right upper limb. Systolic blood pressure (SBP) and diastolic blood pressure (DBP) were measured three consecutive times at intervals of 30 s, and the average of the three measurements was taken as the blood pressure value.
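The two derived quantities described above, BMI and the averaged blood pressure, can be sketched as follows. This is only an illustration: the function names and the example subject are hypothetical and are not taken from the study data.

```python
from statistics import mean


def body_mass_index(weight_kg, height_m):
    """BMI = weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def mean_blood_pressure(readings_mmhg):
    """Average of consecutive readings (the protocol above takes three)."""
    if not readings_mmhg:
        raise ValueError("at least one reading is required")
    return mean(readings_mmhg)


# Hypothetical subject, for illustration only.
bmi = body_mass_index(70.0, 1.75)
sbp = mean_blood_pressure([118, 122, 120])
dbp = mean_blood_pressure([78, 80, 82])
```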

Model outcome: IFG definition
According to the Chinese Type II Diabetes Prevention and Control Guidelines (2017 Edition), an FBG between 3.9 and 6.0 mmol/L was considered non-IFG, and an FBG between 6.1 and 6.9 mmol/L was considered IFG [20].
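As a minimal sketch, the outcome definition above can be written as a small classifier. The function name and the label for values outside both ranges are assumptions made for illustration, and the sketch assumes FBG is recorded to one decimal place, so that no value falls between 6.0 and 6.1 mmol/L.

```python
def classify_fbg(fbg_mmol_l):
    """Label a fasting blood glucose value using the ranges quoted above."""
    if 3.9 <= fbg_mmol_l <= 6.0:
        return "non-IFG"
    if 6.1 <= fbg_mmol_l <= 6.9:
        return "IFG"
    # Values below 3.9 or above 6.9 mmol/L are not covered by the definition above.
    return "outside both ranges"


# Hypothetical readings, for illustration only.
labels = [classify_fbg(v) for v in (5.2, 6.1, 6.9, 7.4)]
```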

Statistical analysis
A descriptive analysis of the baseline characteristics was performed. Continuous variables were expressed as mean ± standard deviation (SD) and compared using Student's t-test; categorical variables were compared using the chi-squared test. The incidence density was estimated as the number of new cases divided by the number of person-years of observation, and the trend of IFG incidence density with age was analysed using the Cochran-Armitage trend test.
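The incidence density calculation can be illustrated with a short sketch. The helper names and the follow-up times below are hypothetical; only the definition (new cases per person-years, scaled to 1000 person-years) comes from the text.

```python
def person_years(follow_up_years):
    """Total observation time: the sum of each subject's follow-up in years."""
    return sum(follow_up_years)


def incidence_density(new_cases, total_person_years, per=1000):
    """New cases per `per` person-years of observation."""
    if total_person_years <= 0:
        raise ValueError("person-years must be positive")
    return new_cases / total_person_years * per


# Hypothetical follow-up times for five subjects, one of whom developed IFG.
follow_up = [2.0, 5.5, 1.0, 9.0, 3.5]
density = incidence_density(1, person_years(follow_up))
```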
We randomly selected 70 % of the subjects in the cohort as the training set, and the remaining 30 % as the test set. We first selected the predictor variables through univariate GEE (P<0.20), then used multivariate GEE to establish the prediction model for the onset risk of IFG on the training set, and finally used the test set to verify the performance of the established model through sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC). The GEE was implemented with the GENMOD procedure, and all data were analysed using SAS statistical software (version 9.4; SAS Institute Inc., Cary, North Carolina).
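Because each subject contributes several check-up records, a 7:3 split of subjects keeps all records of one person in the same set. The sketch below shows one way to do this in Python; it is not the authors' SAS code, and the function name, seed, and example records are hypothetical.

```python
import random


def split_subjects(subject_ids, train_fraction=0.7, seed=2019):
    """Randomly assign each unique subject to the training or the test set."""
    unique_ids = sorted(set(subject_ids))  # sorted so the split depends only on the seed
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_train = round(train_fraction * len(unique_ids))
    return set(unique_ids[:n_train]), set(unique_ids[n_train:])


# Hypothetical long-format records: (subject_id, check_up_year).
records = [(1, 2009), (1, 2010), (2, 2011), (2, 2013), (3, 2012),
           (3, 2014), (4, 2010), (4, 2011), (5, 2015), (5, 2016)]
train_ids, test_ids = split_subjects(sid for sid, _ in records)
train_records = [r for r in records if r[0] in train_ids]
test_records = [r for r in records if r[0] in test_ids]
```

Splitting on subject identifiers rather than on individual records avoids placing repeated measurements of the same person on both sides of the split.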

Study cohort
A total of 4,926 subjects with an average age of 44.85 ± 14.80 years were included in the cohort, including 2,634 men and 2,292 women. Subjects in the cohort attended between two and nine check-ups, with an average of 3.87. The follow-up results by sex and age are summarized in Table 1. During the follow-up period, 442 new cases of IFG were diagnosed, including 286 men and 156 women. The total incidence density was 23.21/1000 person-years; the incidence density was 26.88/1000 person-years in men and 18.53/1000 person-years in women, and was significantly higher in men (P<0.0001). The Cochran-Armitage test showed a linear trend between IFG incidence density and age, with the incidence density increasing markedly with age (Z=-12.5907, P<0.0001). Table 2 shows that at baseline, the average age of the IFG group was 53.37 ± 14.26 years and that of the non-IFG group was 43.99 ± 14.59 years; the difference was statistically significant. At cohort entry, the BMI of the IFG group was significantly higher than that of the non-IFG group; the SBP, DBP, TC, TG, LDL-C, blood urea, SCr, and SUA of the IFG group were higher than those of the non-IFG group, while HDL-C was lower. With the exception of height, there were significant differences between the two groups in age, weight, BMI, WC, SBP, DBP, TC, TG, HDL-C, LDL-C, blood urea, SCr, and SUA (P<0.05).

Establishment of the prediction model for the onset risk of IFG
Taking IFG as the dependent variable, the sex, age, BMI, WC, SBP, DBP, TC, TG, HDL-C, LDL-C, blood urea, SCr, and SUA of the subjects were used as independent variables to fit the GEE. As shown in Table 3, we selected all possible predictor variables from the univariate GEE (P<0.20). We then established the prediction model for the onset risk of IFG by entering these candidate predictor variables into the multivariate GEE.

The performance of prediction model for the onset risk of IFG
Figure 2 shows the area under the receiver operating characteristic curve (AUC) obtained from the training and test sets of the prediction model for the onset risk of IFG; the discriminant ability of the prediction model was judged based on the AUC.
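For readers who want to reproduce the performance measures, the following sketch computes sensitivity and specificity at a probability cut-off and the AUC as the proportion of correctly ordered case/non-case pairs. The function names and the six example predictions are hypothetical; the 2.12 % cut-off is the value reported in the Discussion.

```python
def sensitivity_specificity(y_true, y_prob, cutoff):
    """Sensitivity and specificity when probabilities above `cutoff` count as positive."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if y == 1 and p > cutoff)
    fn = sum(1 for y, p in pairs if y == 1 and p <= cutoff)
    tn = sum(1 for y, p in pairs if y == 0 and p <= cutoff)
    fp = sum(1 for y, p in pairs if y == 0 and p > cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_prob):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = IFG) and predicted probabilities.
y_true = [0, 0, 1, 1, 0, 1]
y_prob = [0.010, 0.030, 0.020, 0.050, 0.015, 0.040]
sens, spec = sensitivity_specificity(y_true, y_prob, cutoff=0.0212)
area = auc(y_true, y_prob)
```

The pairwise form of the AUC is quadratic in the number of records, which is acceptable for a sketch; a rank-based implementation would be preferable for a full cohort.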

Discussion
IFG is defined as an FBG level that is higher than normal but not high enough to diagnose DM, and it is a prediabetic condition [7,26]. Survey results revealed that the national rate of IFG in middle-aged and elderly people was 7.3 %, which increased to 9.7 % in a 2018 study [27,28]. Studies have shown that IFG is an obligatory stage in the progression from a healthy state to DM. If IFG is detected and immediate measures are taken, the progression to DM may be delayed or halted; if it is not controlled, it can easily lead to DM [29]. The results of Yeboah et al. [30] indicated that IFG is an independent risk factor for the development of type II DM, which means that interventions to reduce the incidence of IFG will ultimately reduce the incidence of type II DM. Early diagnosis and intervention of IFG are effective in controlling the incidence of DM [31]. Addressing IFG may not only help prevent DM; IFG is also related to the occurrence of cardiovascular disease (CVD). Studies have shown that patients with IFG have a significantly increased CVD risk compared to patients with normal glucose tolerance [32]. One study used a national epidemiological database to explore the relationship between FBG category (normal, normal-high, IFG, and DM) and the risk of future CVD (including coronary heart disease, stroke, heart failure, etc.) in young adults. The results showed that the risk of CVD increased with FBG level, and that the risk of myocardial infarction began to increase significantly in the IFG category [33]. Therefore, early identification of IFG plays an important role in the primary prevention of CVD. In 2010, Volpe et al. [34] proposed that understanding the relationship between abnormalities in blood glucose metabolism (or dysglycaemia) and CVD complications was a key point in the primary prevention of CVD.
Disease risk assessment is a key technique for chronic disease management and is an effective auxiliary diagnostic tool to identify high-risk groups [35]. In general, the development of risk prediction models incorporates general patient characteristics, clinical trial results and other data [36,37]. At present, there is a lack of risk assessment models for IFG in China. Apart from a few studies that focused only on the epidemiological factors of IFG, there was no intuitive risk assessment model to predict IFG. In addition, most of the currently reported studies were cross-sectional. In this study, we conducted a longitudinal retrospective cohort study, including up to 10 years of dynamic health check-up indicators, to predict the risk of IFG. We established a prediction model for the onset risk of IFG based on longitudinal physical check-up data and identified the predictor variables of the model through the GEE. Finally, the performance of the model was verified.
Note to Table 4: female sex and normal BMI were the reference groups (RR = 1.0). According to the parameters listed in Table 4, we can obtain a formula to compute LogitP of IFG.
Fig. 2 The receiver operating characteristic (ROC) curves of the prediction model for the onset risk of IFG. (A) Training set; (B) test set.
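The note to Table 4 refers to a formula for LogitP, but the coefficients are not reproduced in this text. The general form of such a formula, assuming a logit link and writing the Table 4 coefficients as hypothetical symbols $\beta_0, \beta_1, \dots, \beta_k$ for the predictor values $x_1, \dots, x_k$, would be:

```latex
\operatorname{logit}(P) = \ln\frac{P}{1-P} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j ,
\qquad
P = \frac{1}{1 + \exp\!\left(-\Bigl(\beta_0 + \sum_{j=1}^{k} \beta_j x_j\Bigr)\right)}
```

Under this form, the predicted probability $P$ for a subject is the quantity that is compared with the cut-off probability when the model is used for screening.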
In recent years, health check-ups have become an important method of health management. The predictor variables of our risk model were readily available from health check-ups, including age, sex, BMI, WC, DBP, SBP, four blood lipid items (TC, TG, LDL-C, and HDL-C), and three renal function items (SCr, SUA, and blood urea), among others. SCr and TC had RRs below 1 in the multivariate GEE (RR=0.960, 95 % CI: 0.945-0.976; RR=0.674, 95 % CI: 0.403-1.128, respectively), suggesting a protective association with IFG, although only the interval for SCr excluded 1. Yoshida et al. followed 7,905 participants in a community-based longitudinal cohort health examination study and, after adjusting for age, BMI, SBP, and metabolic disease-related variables, concluded that the level of SCr is related to the onset of IFG: the lower the level of SCr, the more likely the development of IFG [25]. The protective role of SCr against IFG may be associated with skeletal muscle mass. Skeletal muscle is the main target organ for the control of blood glucose and produces SCr at a relatively constant rate through creatine and phosphocreatine metabolism [38,39], so SCr is a measure of skeletal muscle mass [40]. When the SCr level is low, skeletal muscle mass is low, implying fewer insulin targets, which may explain why a lower SCr level is associated with IFG [41]. Hyperuricemia was defined as SUA >416 μmol/L in men and >357 μmol/L in women [23,24]. In our established model, hyperuricemia acted as an independent risk factor for predicting IFG, which was similar to the results of many studies but different from the results of Taniguchi et al., who did not find an association between SUA levels and the risk of type II DM in a retrospective cohort study [42][43][44][45]. Studies have shown that SUA interacts with FBG levels.
Hyperinsulinemia elevates SUA levels by reducing the excretion of SUA and increasing the accumulation of SUA products, and an increase in SUA in turn decreases insulin-stimulated glucose uptake [46,47].
We evaluated the performance of the model based on the sensitivity, specificity, and AUC. Sensitivity and specificity represent the ability of the model to identify positive and negative results, respectively. Generally, good quality prediction models have both high sensitivity and high specificity [48]. Our risk prediction model could be used to screen undiagnosed individuals with IFG because of its good sensitivity and specificity. The sensitivity and specificity were 70.5 % and 66.1 %, respectively, in the training set and 71.5 % and 66.8 %, respectively, in the test set. The best cut-off point in the model was 2.12 %; that is, subjects with a predicted probability greater than 2.12 % were classified as being at risk of IFG. The prediction accuracy of the model was evaluated by the magnitude of the AUC; the more accurate the prediction model, the greater the AUC. In general, an AUC greater than 0.7 indicates good predictive ability. For the established model, the AUCs of the training and test sets were 0.740 and 0.751, respectively. The sensitivity, specificity, and AUC showed that the prediction model for the onset risk of IFG was valuable.
The advantages of this study include a large sample size and a long follow-up period. In addition, all qualified participants underwent a complete health check-up. However, the present study had some limitations. First, we were only able to collect data annually; therefore, data collection was not truly continuous. Second, we investigated a health check-up cohort, which may limit the generalisation of our results to other populations. Finally, this was a retrospective cohort study, and the lifestyle of the subjects was not investigated. Because many diseases are closely related to lifestyle, future research should be more comprehensive and should not be limited to health check-up data but should also incorporate lifestyle and other factors.

Conclusions
The predictive ability of the risk model based on longitudinal health check-up data in the training and test sets was reliable, with simple predictor variables and risk forms. This model can help individuals assess the risk of IFG, and provide evidence for the primary prevention and control of DM and CVD.