
The BAriatic surgery SUbstitution and nutrition (BASUN) population: a data-driven exploration of predictors for obesity

Abstract

Background

The development of obesity is most likely due to a combination of biological and environmental factors, some of which might still be unidentified. We used a machine learning technique to examine the relative importance of more than 100 clinical variables as predictors of BMI.

Methods

BASUN is a prospective non-randomized cohort study of 971 individuals who received medical or surgical treatment for obesity in Västra Götaland County, Sweden, between 2015 and 2017, with planned follow-up for 10 years; treatment choice was based on patient preference and clinical criteria, not randomization. This study includes demographic data, BMI, blood tests and questionnaires collected before obesity treatment, covering three main areas: gastrointestinal symptoms and eating habits, physical activity and quality of life, and psychological health. We used random forest with conditional variable importance to study the relative importance of roughly 100 predictors of BMI covering 15 domains, and quantified the predictive value of each individual predictor as well as of each domain.

Results

The participants received medical (n = 382) or surgical treatment for obesity (Roux-en-Y gastric bypass, n = 388; sleeve gastrectomy, n = 201). There were minor differences between these groups before treatment with regard to anthropometrics, laboratory measures and questionnaire results. The 10 individual variables with the strongest predictive value, in order of decreasing strength, were country of birth, marital status, sex, calcium levels, age, levels of TSH and HbA1c, AUDIT score, binge-eating tendencies according to the QEWP-R, and TG levels. The strongest domains predicting BMI were: Socioeconomic status, Demographics, Biomarkers (notably TSH), Lifestyle/habits, Biomarkers for cardiovascular disease and diabetes, and Potential anxiety and depression.

Conclusions

Lifestyle, habits, age, sex and socioeconomic status are some of the strongest predictors of BMI levels. Potential anxiety and/or depression and other characteristics captured using questionnaires have strong predictive value. These results confirm previously suggested associations and call for prospective studies to examine the value of better characterization of patients eligible for obesity treatment, and consequently to evaluate the treatment effects in groups of patients.

Trial registration

March 03, 2015; NCT03152617.


Background

Obesity is a complex but treatable disease with major individual and societal consequences. Although the World Health Organization (WHO) has emphasized the role of society as well as the individual in preventing obesity, the global prevalence of obesity nearly tripled between 1975 and 2016 [1]. The Global Burden of Disease Study has recently established that obesity is a major global health challenge, demanding population-wide but country-specific initiatives to mitigate the burden of a wide range of diseases [2].

As formulated by WHO, the fundamental cause of obesity and overweight is an energy imbalance between calories consumed and calories expended, against a background of environmental and societal factors [1]. Obesity is also linked to numerous medical and socioeconomic conditions, e.g., psychiatric, endocrine and cardiovascular disorders [3, 4], and some of these links may be mediated via behavioral, inflammatory and vascular pathways [5]. It is thus quite possible that different combinations of biological and environmental factors contribute to the development of the disease in different individuals.

Artificial intelligence in the form of machine learning is increasingly used to identify single factors, or combinations of factors, that are important for defining disease or predicting outcomes. Machine learning techniques are well suited to handling large amounts of data, including variables not commonly used to assess risk in clinical practice, and to identifying the smallest number of variables needed for accurate prediction. Machine learning has, for example, been used to explain variations in obesity prevalence between US counties based on demographic, socioeconomic, health care and environmental factors [6].

The BASUN study is an ongoing prospective cohort study that follows close to 1000 individuals accepted for medical or surgical treatment of obesity in clinical practice in Region Västra Götaland, Sweden, for 10 years [7]. An important aim of BASUN is to compare the effects and complications of surgical and medical treatment of obesity, but the overall goal is to improve the care of individuals with obesity and to reduce adverse outcomes of treatment. In this study, we applied machine learning algorithms to the extensive clinical information available for the participants of BASUN, most of which is not collected in studies in the obesity domain. The specific aim of this analysis was to seek out factors strongly linked to severe obesity. Such factors can in turn be hypothesis-generating and can be addressed during the follow-up of BASUN, as well as in prospective trials and clinical practice.

Methods

Study design and participants

The design of the BASUN study and the recruited patient cohort have recently been described [7]. In summary, it is a prospective non-randomized cohort study that originally included 1127 individuals with a BMI of 35 kg/m2 or higher referred for treatment of obesity in clinical practice in Region Västra Götaland, Sweden, between May 2015 and November 2017. Patients were offered medical or surgical treatment of obesity based on their wishes and on whether they met the usual criteria for the different treatment options. The treatment options have been described in detail [7] and included medical treatment with a very low energy diet (VLED) for 12–20 weeks followed by an energy-restricted diet for up to 12 months, or surgical treatment (Roux-en-Y gastric bypass (RYGB) or sleeve gastrectomy (SG)). Apart from regular clinical visits, follow-up according to the study protocol is planned at 2, 5 and 10 years. The Ethical Regional Board of Gothenburg approved the protocol (application 673–14). Informed consent to participate was obtained from all study participants.

Anthropometric and laboratory measurements

Demographic data, measurements of height and weight, and blood tests were collected before the start of treatment. Some blood samples that were important for treatment decisions were analysed immediately, while other samples were stored in a biobank [7].

Questionnaires

We used questionnaires to cover three main areas: gastrointestinal symptoms and eating habits, physical activity and quality of life, and psychological health (previously summarized [7]). Eating habits were investigated with the 21-item Three Factor Eating Questionnaire (TFEQ-R21) [8] and the Questionnaire on Eating and Weight Patterns-Revised (QEWP-R) [9]. To gather information on physical activity and quality of life, the Saltin Grimby physical activity level questionnaire (SGQ) [10], the RAND-36 questionnaire [11] and the EuroQol five-dimensional questionnaire (EQ-5D) [12] were included. Two questionnaires were included to investigate psychological health: the Beck Anxiety Inventory (BAI) [13], which measures the severity of anxiety, and the Patient Health Questionnaire-9 (PHQ-9) [14], a self-reported measure of depression. The Alcohol Use Disorders Identification Test (AUDIT) [15] was used to identify individuals with harmful patterns of alcohol consumption.

Statistical analysis

Data analyses were performed using R (R Foundation for Statistical Computing, version 4.0.3). Continuous variables are reported as mean (SD) and categorical variables as numbers (n) and proportions (%). Standardized mean differences (SMDs) were used to compare group characteristics; the SMD is the difference between the sample means divided by their pooled standard deviation. An SMD of less than 0.1 was considered a negligible difference.
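As a worked illustration of the SMD defined above, the Python sketch below computes it for a continuous variable and for a binary (proportion) variable and applies the 0.1 threshold. It is a minimal sketch, not the study's analysis code (the analyses were run in R). The degrees-of-freedom-weighted pooled SD and the proportion-based formula for binary variables are assumptions, since the text does not say which pooled-SD variant was used (some software instead averages the two variances). The example values are hypothetical.

```python
import math
import statistics


def smd_continuous(a, b):
    """SMD between two groups for a continuous variable.

    Assumption: pooled SD weighted by degrees of freedom,
    sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)).
    """
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def smd_binary(p1, p2):
    """SMD for a binary variable given the proportion in each group.

    Assumption: pooled SD from the average of the two Bernoulli variances.
    """
    pooled_sd = math.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return (p1 - p2) / pooled_sd


def is_negligible(smd, threshold=0.1):
    """True if the absolute SMD is below the threshold used in the text."""
    return abs(smd) < threshold


# Hypothetical BMI values (kg/m2) in two treatment groups.
group_a = [40.0, 42.0, 44.0]
group_b = [39.0, 41.0, 43.0]
d = smd_continuous(group_a, group_b)  # (42 - 41) / pooled SD of 2
print(round(d, 3), is_negligible(d))
```

Because the SMD is scaled by the spread of the data rather than by sample size, it describes the size of a group difference, which is why a fixed cut-off such as 0.1 can be applied regardless of group size.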

The machine learning algorithm random forest has become increasingly common in medical research [16] and was used here to examine the importance of over 100 clinical variables as predictors of body mass index (BMI). With this many input variables, fully parameterized regression models would carry a high risk of overfitting. Random forests with a large number of trees have been shown to predict effectively without overfitting.

The variables were also divided manually into 15 clinically similar domains (Socioeconomic status, Age/Sex, Lifestyle and habits, Metabolic disease, Cardiovascular disease, Potential anxiety/depression, Biomarkers for cardiovascular disease and diabetes, Other biomarkers, Medication for cardiovascular disease or diabetes, Psychiatric disease, Gastrointestinal disease, Endocrine conditions, Musculoskeletal disease, Previous surgery and Other conditions). The predictive values of the different domains were assessed as well as the predictive value of each individual variable. Three thousand trees were used for each binary classification model.
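The text does not state how the predictive value of a domain was computed. One common approach, shown here as a minimal Python sketch under that assumption, is to permute all columns of a domain together with one shared row permutation and measure the resulting increase in prediction error; permuting jointly keeps the correlations within the domain intact while breaking the domain's link to the outcome. The `predict` function, the toy data and the domain-to-column mapping are hypothetical stand-ins for a fitted random forest and the 15 BASUN domains, and this is a plain (marginal) permutation rather than the conditional scheme of Strobl and colleagues [18].

```python
import random


def mse(predict, X, y):
    """Mean squared error of predict(row) against the observed outcome."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def domain_permutation_importance(predict, X, y, domains, n_repeats=10, seed=0):
    """Mean increase in MSE when all columns of a domain are permuted jointly.

    domains maps a domain name to a list of column indices in X.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importance = {}
    for name, cols in domains.items():
        increases = []
        for _ in range(n_repeats):
            order = list(range(len(X)))
            rng.shuffle(order)  # one row permutation shared by the whole domain
            X_perm = [list(row) for row in X]
            for i, src in enumerate(order):
                for c in cols:
                    X_perm[i][c] = X[src][c]
            increases.append(mse(predict, X_perm, y) - baseline)
        importance[name] = sum(increases) / n_repeats
    return importance


# Hypothetical toy data: columns are [age, sex, calcium].
X = [[20 + i % 40, i % 2, 2.2 + (i % 5) * 0.05] for i in range(100)]


def predict(row):
    # Hypothetical fitted model: uses age and sex, ignores calcium.
    return 30 + 0.1 * row[0] + 2 * row[1]


y = [predict(row) for row in X]
print(domain_permutation_importance(
    predict, X, y, {"Age/sex": [0, 1], "Other biomarkers": [2]}))
```

In this toy example the hypothetical "Other biomarkers" domain receives zero importance because the model ignores calcium, while permuting age and sex together raises the error.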

The variable importance (described by van der Laan [17]) was computed using a conditional permutation scheme, which minimizes the effect of correlation between variables and reliably reflects the impact of each variable [18]. In this scheme, each variable is permuted only within groups of observations that have similar values of the variables correlated with it, so that its importance is not inflated by those correlations [18]. Random forest variable importance measures capture not only the effect of each variable on its own but also its effect through multivariable interactions with the other included variables. The values of each variable are randomly permuted (shuffled), which removes the information the variable carries, and the effect of this permutation on prediction accuracy is assessed; the variable importance is thus estimated as the difference in accuracy before and after the variable is permuted. Permuting a variable breaks the association between that variable and the outcome, so for important variables the accuracy of the model decreases. The importance values have no absolute scale; they quantify the change in accuracy caused by permutation and are meaningful relative to one another. The permutation accuracy importance used in this manuscript was developed by Strobl and colleagues [18]. A fully adjusted random forest model with 1500 trees was also used to analyse and visualize the relationship between the 10 strongest predictors and BMI. Missing data were handled using multiple imputation by chained equations (MICE; mice package in R). Supplementary Figure 1 shows a graphic description of the results before and after imputation.
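To make the conditional permutation scheme described above concrete, the Python sketch below permutes a variable only within strata of observations that have similar values of chosen conditioning variables. It is a simplified illustration under stated assumptions, not the implementation of Strobl et al. [18]: there, the conditioning grid is derived from the split points of the fitted trees (available in R, e.g., through `varimp(..., conditional = TRUE)` in the party package), whereas here the strata are hypothetical equal-frequency bins of user-chosen columns, and accuracy is measured as mean squared error. With no conditioning columns, the function reduces to ordinary (marginal) permutation importance.

```python
import random


def mse(predict, X, y):
    """Mean squared error of predict(row) against the observed outcome."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def conditional_permutation_importance(predict, X, y, j, cond_cols,
                                       n_bins=4, n_repeats=10, seed=0):
    """Mean increase in MSE when column j is permuted within strata.

    Strata are combinations of equal-frequency bins of the columns in
    cond_cols (a simplification of the tree-based grid of Strobl et al.).
    With cond_cols == [] there is one stratum: plain permutation importance.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    bin_ids = [quantile_bins([row[c] for row in X], n_bins) for c in cond_cols]
    strata = {}
    for i in range(len(X)):
        strata.setdefault(tuple(b[i] for b in bin_ids), []).append(i)
    increases = []
    for _ in range(n_repeats):
        X_perm = [list(row) for row in X]
        for rows in strata.values():
            values = [X[i][j] for i in rows]
            rng.shuffle(values)  # shuffle column j only inside this stratum
            for i, v in zip(rows, values):
                X_perm[i][j] = v
        increases.append(mse(predict, X_perm, y) - baseline)
    return sum(increases) / n_repeats


# Hypothetical toy data: x1 is an exact copy of x0, and the outcome equals x0.
X = [[float(i), float(i)] for i in range(100)]
y = [float(i) for i in range(100)]


def predict(row):
    # Hypothetical fitted model that uses x0 only.
    return row[0]


marginal = conditional_permutation_importance(predict, X, y, j=0, cond_cols=[])
conditional = conditional_permutation_importance(predict, X, y, j=0,
                                                 cond_cols=[1], n_bins=10)
print(marginal, conditional)
```

Because x1 carries the same information as x0 in this toy example, permuting x0 only within bins of x1 changes it very little, so its conditional importance is much smaller than its marginal importance. This is the intended behaviour: conditional importance asks what a variable contributes beyond its correlated covariates.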

Results

The final study population eligible for follow-up and included in this analysis consisted of 1127 individuals, of whom 971 subsequently started treatment (medical treatment, n = 382; RYGB, n = 388; SG, n = 201; Table 1). The 156 individuals who chose not to proceed to treatment after inclusion are also included in the analyses. Women were in the majority in all treatment groups. There were minor differences in mean BMI and age (SMD > 0.1). The majority of the study population was born in Sweden. In an unadjusted model, the differences in BMI levels between the sexes and between individuals born in Sweden and outside Sweden were more pronounced.

Table 1 Characteristics of the BASUN population at baseline

There were differences with regard to marital status and education as well as nicotine use (SMD > 0.1). With regard to previous diabetes the groups were similar (SMD < 0.1), but there were slight differences in other reported metabolic diseases (hyperlipidemia, hypertension and sleep apnea) (SMD > 0.1), as well as in levels of HbA1c, glucose and low-density lipoprotein. Information on previous psychiatric illness was self-reported through questions on known diagnoses and pharmaceutical treatment, as well as through specific questionnaires. Self-reported previous depression or anxiety and treatment for these disorders differed between the groups, as did the results from the questionnaires focusing on depression (PHQ-9) and anxiety (BAI). Reported use of antipsychotics also differed between the groups. Factors that might influence the choice of bariatric surgery, such as hemoglobin levels, known vitamin and mineral deficiencies, eating habits assessed by the TFEQ and QEWP-R questionnaires, AUDIT scores or previous malignancies, did not differ between the treatment groups (SMD < 0.1), but there were differences in known gastrointestinal, pulmonary and cardiovascular disease.

The relative importance of the 15 clinical domains and an overview of the variables included in each domain are shown in Fig. 1. The distribution of variables within each domain is shown in more detail in Supplementary Table 1. The strongest predictive domains were Socioeconomic status, Age/sex, Other biomarkers (hemoglobin, calcium, TSH, T4, liver transaminases, creatinine), Lifestyle and habits, Biomarkers for cardiovascular disease and diabetes (HbA1c, glucose, TG, HDL, LDL, urinary albumin), Potential anxiety and depression, Metabolic disease, Medication for cardiovascular disease or diabetes, and Other conditions. The six remaining domains had little or no predictive value.

Fig. 1

Predictive value of each clinical domain on BMI as computed using a conditional permutation scheme, and the variables included within each domain. CV: cardiovascular, DM: diabetes mellitus, Hgb: hemoglobin, Ca: calcium, TSH: thyroid stimulating hormone, T4: thyroxine, ASAT: aspartate aminotransferase, ALAT: alanine aminotransferase, TFEQ: three factor eating questionnaire, QEWPR: Questionnaire on eating and weight patterns-revised, AUDIT: Alcohol use disorders identification test, EQ-5D: EuroQol five-dimensional questionnaire, SGQ: Saltin Grimby questionnaire, HbA1c: glycated hemoglobin, BG: blood glucose, TG: triglycerides, HDL: high-density lipoprotein, LDL: low-density lipoprotein, U-Alb: urinary albumin, BAI: Beck anxiety inventory, PHQ-9: Patient health questionnaire-9, IHD: ischemic heart disease, HF: heart failure, VTE: venous thromboembolism, PPI: proton-pump inhibitors, ADHD: attention deficit hyperactivity disorder

The 10 individual variables with the strongest predictive value, in order of decreasing strength, were country of birth, marital status, sex, calcium levels, age, levels of TSH and HbA1c, AUDIT scores, binge eating as reflected by the QEWP-R questionnaire, and levels of TG (Fig. 2). The relationship between each of these 10 variables and BMI is presented graphically in Fig. 3. According to the random forest model, being born in Sweden, male sex, younger age and a self-reported tendency toward binge eating appear to be associated with higher BMI levels. Higher levels of triglycerides and thyroid stimulating hormone, as well as lower levels of calcium and HbA1c, were also predictors of higher BMI. For most of the population there was an inverse relationship between AUDIT scores and BMI. Being married (status 1) was associated with lower BMI levels compared with cohabiting without being married (status 2), being in a relationship without cohabitation (status 3), being single (status 4) or living with parents (status 5).
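Fig. 3 is described only as a visualization from a fully adjusted random forest model. A standard way to draw such curves is partial dependence: for each value v on a grid, set the variable of interest to v for every participant, keep all other variables as observed, and average the model's predictions. The Python sketch below illustrates this under the assumption that partial dependence, or a similar method, underlies Fig. 3. The `predict` function, data and grid are hypothetical, and the decile helper (based on `statistics.quantiles`) only mirrors the 10% markings in the Fig. 3 caption by analogy.

```python
import statistics


def partial_dependence(predict, X, j, grid):
    """Average prediction when column j is set to each grid value for all rows."""
    curve = []
    for v in grid:
        preds = []
        for row in X:
            modified = list(row)
            modified[j] = v  # overwrite the variable of interest only
            preds.append(predict(modified))
        curve.append(sum(preds) / len(preds))
    return curve


def decile_cut_points(values):
    """Nine cut points splitting the observed values into 10% groups."""
    return statistics.quantiles(values, n=10)


def predict(row):
    # Hypothetical fitted model: BMI falls with age (row[0]) and rises with TG (row[1]).
    return 50 - 0.2 * row[0] + 1.5 * row[1]


X = [[30.0, 1.0], [40.0, 1.5], [50.0, 2.0]]  # hypothetical [age, TG] rows
print(partial_dependence(predict, X, j=0, grid=[30.0, 40.0, 50.0]))
```

Because partial dependence averages over the observed values of the other variables, it can be misleading when the variable of interest is strongly correlated with others, which is one reason the importance ranking itself relies on a conditional scheme.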

Fig. 2

Relative importance of individual predictors on BMI as computed with a conditional permutation scheme. CV: cardiovascular, DM: diabetes mellitus, Hgb: hemoglobin, Ca: calcium, TSH: thyroid stimulating hormone, T4: thyroxine, ASAT: aspartate aminotransferase, ALAT: alanine aminotransferase, TFEQ: Three factor eating questionnaire, QEWPR: Questionnaire on eating and weight patterns-revised, AUDIT: Alcohol use disorders identification test, EQ-5D: EuroQol five-dimensional questionnaire, SGQ: Saltin Grimby questionnaire, HbA1c: glycated hemoglobin, BG: blood glucose, TG: triglycerides, HDL: high-density lipoprotein, LDL: low-density lipoprotein, U-Alb: urinary albumin, BAI: Beck anxiety inventory, PHQ-9: Patient health questionnaire-9, IHD: ischemic heart disease, HF: heart failure, VTE: venous thromboembolism, PPI: proton-pump inhibitors, ADHD: attention deficit hyperactivity disorder

Fig. 3

The ten variables with the strongest predictive value for body mass index, as analyzed and visualized by random forest. Where relevant, each mark on the x-axis represents 10% of the population. Marital status: 1) married, 2) cohabitation, 3) relationship without cohabitation, 4) single, 5) living with parents. TSH: thyroid stimulating hormone, AUDIT: Alcohol use disorders identification test, QEWP-R: Questionnaire on eating and weight patterns-revised

Discussion

In this study of the baseline characteristics of the BASUN population, we found that variables associated with socioeconomic status, age, sex, lifestyle and habits are the strongest predictors of BMI levels. Potential depression and anxiety according to questionnaires also have strong predictive value, stronger than self-reported diagnoses or pharmaceutical treatment of these disorders. The predictive value of clinical laboratory measurements such as serum triglycerides and HbA1c, but also of TSH and serum calcium levels, was strong. These results confirm previously suggested associations [19,20,21], but also call for prospective studies to examine the value of better characterization of patients eligible for obesity treatment, and consequently to evaluate the treatment effects in these groups of patients.

A systematic literature review of machine learning (ML) tools for predicting childhood obesity recently concluded that ML algorithms such as decision trees and artificial neural networks can accurately predict childhood obesity [22]. ML algorithms created to predict obesity in young children mainly focus on height and weight at a young age [23], while external factors have been shown to have minor or no influence [24]. Models that focus on predicting obesity in older teenagers have included factors such as eating habits and levels of physical activity [25]. Adult obesity is a complex disease with a multitude of environmental and biological contributing factors. A recent review of machine learning models used to identify risk factors associated with obesity reported BMI, age, nicotine use, blood pressure, blood glucose, lipid profile, adiposity, physical activity, dietary habits and family history as identified risk factors [26]. All of the reviewed studies had obesity or overweight as the outcome, rather than BMI as in the present study. Models for predicting obesity in adults need to include large amounts of diverse data. A previous study used food sales data to predict obesity and showed that the strongest categories for predicting obesity at the country level were baked goods/flours, cheese and carbonated drinks; a reported limitation of that study was that it was unclear whether diet composition was a true cause of obesity or simply a surrogate for sedentary behavior [27]. Predictive decision tree algorithms have also been used to predict metabolic syndrome and to rank behaviors that lead to long-term success after RYGB surgery [28, 29], and data from the Scandinavian Obesity Surgery Registry have been used to compare the ability of different machine learning algorithms to predict severe complications of surgery. Although the algorithms performed well on the training data, none of the methods successfully predicted these outcomes when applied to data outside the training set, illustrating the difficulty of applying results from such analyses to real life [30].

The majority of the participants in our study had similar BMI levels. Some of the variables found to have high predictive value, such as liver transaminases, triglycerides and HbA1c levels, are more likely to be secondary to obesity. Extreme levels of TSH were seen only in a small percentage of participants, with the majority of individuals having levels closer to the normal range. The relationship between untreated hypothyroidism and higher BMI levels is not surprising. As for calcium, higher BMI would be expected to be associated with lower calcium levels because of relative vitamin D deficiency in individuals with obesity, due to sequestration of vitamin D in adipose tissue [31]. Generally, being married has been associated with higher BMI levels, although this has been shown to differ by gender, age and even ethnicity in studies based on data from the United States [32]. Our results indicate that being married is associated with lower BMI levels. The largely Swedish population and limited ethnic diversity in our study might explain this difference from other studies. The value of education as an individual predictor was much lower than that of many other variables, although the Socioeconomic status domain was the strongest predictive domain. This suggests that it is the combination of certain factors that matters.

The high predictive value of questionnaire responses indicates their importance in the evaluation and treatment of individuals with obesity. Scores from the PHQ-9, QEWP-R and AUDIT questionnaires had much higher predictive value than self-reported psychiatric disease or pharmaceutical treatment for anxiety and/or depression. The study population generally reported a low level of physical activity according to the Saltin Grimby questionnaire, with 90–95% of the patients reporting only sedentary or low levels of activity. However, this variable was not among the individual variables with the strongest predictive value. Because TSH and liver transaminases are included in the Other biomarkers domain, the predictive value of this domain might be misleading, as these markers are more likely to be secondary to obesity. The same may apply to TG levels within the Biomarkers for cardiovascular disease and diabetes domain.

The prevalence of obesity differs between men and women to varying degrees in different parts of the world, and the reasons for this are considered to be multifaceted [33]. As in studies on obesity in general, most participants in the present study were women, but we observed extremely high BMI levels in younger men and the average BMI was higher among men. A higher mean preoperative BMI in men has been reported previously [34]. The inverse relationship between age and BMI, together with the fact that most of the included individuals had normal HbA1c levels, might explain the observed association between higher BMI and lower HbA1c levels, as poor glycemic control might not yet have developed in the younger individuals. The fact that the differences in BMI levels between the sexes and by country of birth were more pronounced in an unadjusted model indicates that at least part of the associations of sex and country of birth with BMI is mediated by other factors.

A strength of the present study is the use of a large number of diverse validated variables, including socioeconomic information, biomarkers, psychiatric health, eating habits, alcohol, nicotine, levels of physical activity, previous diseases and pharmaceutical treatment. The study includes close to 1000 patients with planned long-term follow-up. The choice of treatment is based not only on clinical guidelines but also on the patient's preferences, which reflects treatment as it is delivered in clinical practice. The BASUN study includes a heterogeneous population of individuals with obesity and does not focus only on those with established comorbidities. The population can be considered representative and is comparable to participants in other large obesity treatment studies such as the Swedish Obese Subjects (SOS) study [35], OPTITWIN [36], DIETFITS [37] and POUNDS LOST [38]. All of these included a predominantly female population and similar age groups (39–52 years). BMI levels in the SOS study were also between 41 and 42 kg/m2, but slightly lower in the other studies (33–39 kg/m2). Our statistical method, the random forest variable importance measure, captures the impact of each predictor as well as multivariable interactions, and conditional random forest models have the advantage of minimizing the effect of correlations between variables.

The study also has limitations. The individuals included in this study had been referred and accepted for treatment of obesity. The population might therefore differ from the wider population with overweight or obesity, as well as from patients seen in clinical practice in other settings, which may limit external validity. Differences in certain baseline characteristics were expected because treatment was not randomized. The study is largely based on self-reported data, and there is a risk that individuals with the most severe psychiatric disorders will not answer the questionnaires, or that individuals who do not succeed with their treatment might not report back during the follow-up period; this might introduce bias. There might also be an economic aspect to the choice of treatment, as participants in the medical treatment group pay for the VLED products themselves. Information on income or employment status was not included, but the VLED diet is not more costly than a normal diet, and there were only minor differences between the groups with regard to education and marital status, which could reflect economic status to some degree. An inclusion criterion for the study was that participants could understand Swedish, which excluded some individuals born outside Sweden, and we did not collect genetic data or information on the patients' family history.

An important aim of the analysis in the present study was to seek out variables that could be hypothesis-generating and useful in early risk prediction of obesity. Further studies are needed before our results can be applied directly in clinical practice. However, the factors with the strongest predictive value described in this study, such as questionnaire scores, may be of value when choosing between medical and surgical treatment options for obesity. Comparing different types of ML methods on the BASUN data and dividing the population by class of obesity might also be of value. Prospective ML studies that include individuals who are overweight but not yet obese might add valuable information on predictive factors for obesity. Planned analyses of follow-up data from BASUN will be used to identify variables predictive of successful obesity treatment.

Conclusions

Variables associated with lifestyle, habits, age, sex and socioeconomic status are the strongest predictors of BMI levels. Anxiety and depression as self-reported through questionnaires also have strong predictive value, stronger than self-reported diagnoses or pharmaceutical treatment of these disorders. We propose that future studies examine the value of a wider characterization of patients treated for obesity.

Availability of data and materials

Data and materials are not publicly available due to the nature of this prospective study. The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

All methods were carried out in accordance with relevant guidelines and regulations.

Abbreviations

T2D: Type 2 diabetes

BMI: Body mass index

MT: Medical treatment

RYGB: Roux-en-Y gastric bypass

SG: Sleeve gastrectomy

References

  1. WHO. Obesity and overweight. 2020 [updated 03.03.2020]. Available from: https://www.who.int/news-room/fact-sheets/detail/obesity-and-overweight.

  2. Dai H, Alsalhe TA, Chalghaf N, Riccò M, Bragazzi NL, Wu J. The global burden of disease attributable to high body mass index in 195 countries and territories, 1990–2017: an analysis of the global burden of disease study. PLoS Med. 2020;17(7):e1003198. https://doi.org/10.1371/journal.pmed.1003198.

  3. Golden SH, Brown A, Cauley JA, Chin MH, Gary-Webb TL, Kim C, et al. Health disparities in endocrine disorders: biological, clinical, and nonclinical factors—an Endocrine Society scientific statement. J Clin Endocrinol Metab. 2012;97(9):E1579–E639. https://doi.org/10.1210/jc.2012-2043.

  4. Polanka BM, Vrany EA, Patel J, Stewart JC. Depressive disorder subtypes as predictors of incident obesity in US adults: moderation by race/ethnicity. Am J Epidemiol. 2017;185(9):734–42. https://doi.org/10.1093/aje/kwx030.

  5. Bray GA, Heisel WE, Afshin A, Jensen MD, Dietz WH, Long M, et al. The science of obesity management: an Endocrine Society scientific statement. Endocr Rev. 2018;39(2):79–132. https://doi.org/10.1210/er.2017-00253.

  6. Scheinker D, Valencia A, Rodriguez F. Identification of factors associated with variation in US County-level obesity prevalence rates using epidemiologic vs machine learning models. JAMA Netw Open. 2019;2(4):e192884. https://doi.org/10.1001/jamanetworkopen.2019.2884.

  7. Höskuldsdóttir G, Mossberg K, Wallenius V, Al Nimer A, Björkvall W, Lundberg S, et al. Design and baseline data in the BAriatic surgery SUbstitution and Nutrition study (BASUN): a 10-year prospective cohort study. BMC Endocrine Disord. 2020;20(1):1-9.

  8. Cappelleri JC, Bushmakin AG, Gerber RA, Leidy NK, Sexton CC, Lowe MR, et al. Psychometric analysis of the three-factor eating questionnaire-R21: results from a large diverse sample of obese and non-obese participants. Int J Obes. 2009;33(6):611–20. https://doi.org/10.1038/ijo.2009.74.

  9. Borges MB, Morgan CM, Claudino AM, da Silveira DX. Validation of the Portuguese version of the questionnaire on eating and weight patterns-revised (QEWP-R) for the screening of binge eating disorder. Braz J Psychiatry. 2005;27(4):319–22. https://doi.org/10.1590/S1516-44462005000400012.

  10. Grimby G, Borjesson M, Jonsdottir IH, Schnohr P, Thelle DS, Saltin B. The "Saltin-Grimby physical activity level scale" and its application to health research. Scand J Med Sci Sports. 2015;25(Suppl 4):119–25. https://doi.org/10.1111/sms.12611.

  11. Krops LA, Wolthuizen L, Dijkstra PU, Jaarsma EA, Geertzen JHB, Dekker R. Reliability of translation of the RAND 36-item health survey in a post-rehabilitation population. Int J Rehabil Res. 2018;41(2):128–37. https://doi.org/10.1097/MRR.0000000000000265.

  12. Sullivan PW, Ghushchyan VH. EQ-5D scores for diabetes-related comorbidities. Value Health. 2016;19(8):1002–8. https://doi.org/10.1016/j.jval.2016.05.018.

  13. Steer RA, Beck AT. Beck anxiety inventory. In: Wood CP, editor. Evaluating stress: a book of resources. Lanham: Scarecrow Education; 1997. p. 23–40.

  14. Kroenke K, Spitzer RL, Williams JB. The PHQ-9: validity of a brief depression severity measure. J Gen Intern Med. 2001;16(9):606–13. https://doi.org/10.1046/j.1525-1497.2001.016009606.x.

  15. Babor TF, Higgins-Biddle JC, Saunders JB, Monteiro MG. AUDIT: the alcohol use disorders identification test guidelines for use in primary care (second edition). Geneva: World Health Organization; 2001.

  16. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. https://doi.org/10.1023/A:1010933404324.

  17. van der Laan MJ. Statistical inference for variable importance. Int J Biostat. 2006;2(1):1-31.

  18.

    Strobl C, Boulesteix A-L, Kneib T, Augustin T, Zeileis A. Conditional variable importance for random forests. BMC Bioinformatics. 2008;9(1):307. https://doi.org/10.1186/1471-2105-9-307.

  19.

    World Health Organization. Obesity: preventing and managing the global epidemic. WHO Technical Report Series 894. Geneva: World Health Organization; 2000. Available from: https://www.who.int/nutrition/publications/obesity/WHO_TRS_894/en/.

  20.

    Reinehr T. Obesity and thyroid function. Mol Cell Endocrinol. 2010;316(2):165–71. https://doi.org/10.1016/j.mce.2009.06.005.

  21.

    Arruda AP, Hotamisligil GS. Calcium homeostasis and organelle function in the pathogenesis of obesity and diabetes. Cell Metab. 2015;22(3):381–97. https://doi.org/10.1016/j.cmet.2015.06.010.

  22.

    Triantafyllidis A, Polychronidou E, Alexiadis A, Rocha CL, Oliveira DN, da Silva AS, et al. Computerized decision support and machine learning applications for the prevention and treatment of childhood obesity: a systematic review of the literature. Artif Intell Med. 2020;104:101844. https://doi.org/10.1016/j.artmed.2020.101844.

  23.

    Mukhopadhyay S, Carroll A, Downs S, Dugan TM. Machine learning techniques for prediction of early childhood obesity. Appl Clin Inform. 2015;6(3):506–20. https://doi.org/10.4338/ACI-2015-03-RA-0036.

  24.

    Hammond R, Athanasiadou R, Curado S, Aphinyanaphongs Y, Abrams C, Messito MJ, et al. Predicting childhood obesity using electronic health records and publicly available data. PLoS One. 2019;14(4):e0215571. https://doi.org/10.1371/journal.pone.0215571.

  25.

    Zheng Z, Ruggiero K. Using machine learning to predict obesity in high school students. In: Proceedings of the 2017 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); Kansas City, MO, USA; 2017. p. 2132–8.

  26.

    Chatterjee A, Gerdes MW, Martinez SG. Identification of risk factors associated with obesity and overweight—a machine learning overview. Sensors. 2020;20(9):2734. https://doi.org/10.3390/s20092734.

  27.

    Dunstan J, Aguirre M, Bastías M, Nau C, Glass TA, Tobar F. Predicting nationwide obesity from food sales using machine learning. Health Informatics J. 2020;26(1):652–63. https://doi.org/10.1177/1460458219845959.

  28.

    Karimi-Alavijeh F, Jalili S, Sadeghi M. Predicting metabolic syndrome using decision tree and support vector machine methods. ARYA Atheroscler. 2016;12(3):146–52.

  29.

    Robinson AH, Adler S, Stevens HB, Darcy AM, Morton JM, Safer DL. What variables are associated with successful weight loss outcomes for bariatric surgery after 1 year? Surg Obes Relat Dis. 2014;10(4):697–704. https://doi.org/10.1016/j.soard.2014.01.030.

  30.

    Cao Y, Fang X, Ottosson J, Näslund E, Stenberg E. A comparative study of machine learning algorithms in predicting severe complications after bariatric surgery. J Clin Med. 2019;8(5):668. https://doi.org/10.3390/jcm8050668.

  31.

    Carrelli A, Bucovsky M, Horst R, Cremers S, Zhang C, Bessler M, et al. Vitamin D storage in adipose tissue of obese and normal weight women. J Bone Miner Res. 2017;32(2):237–42. https://doi.org/10.1002/jbmr.2979.

  32.

    Sobal J, Hanson KL, Frongillo EA. Gender, ethnicity, marital status, and body weight in the United States. Obesity. 2009;17(12):2223–31. https://doi.org/10.1038/oby.2009.64.

  33.

    Garawi F, Devries K, Thorogood N, Uauy R. Global differences between women and men in the prevalence of obesity: is there an association with gender inequality? Eur J Clin Nutr. 2014;68(10):1101–6. https://doi.org/10.1038/ejcn.2014.86.

  34.

    Perrone F, Bianciardi E, Benavoli D, Tognoni V, Niolu C, Siracusano A, et al. Gender influence on long-term weight loss and comorbidities after laparoscopic sleeve gastrectomy and Roux-en-Y gastric bypass: a prospective study with a 5-year follow-up. Obes Surg. 2016;26(2):276–81. https://doi.org/10.1007/s11695-015-1746-z.

  35.

    Sjöström L, Lindroos A-K, Peltonen M, Torgersson J, Bouchard C, Carlsson B, et al. Lifestyle, diabetes, and cardiovascular risk factors 10 years after bariatric surgery. N Engl J Med. 2004;351(26):2683–93. https://doi.org/10.1056/NEJMoa035622.

  36.

    Ard JD, Lewis KH, Rothberg A, Auriemma A, Coburn SL, Cohen SS, et al. Effectiveness of a total meal replacement program (OPTIFAST program) on weight loss: results from the OPTIWIN study. Obesity (Silver Spring). 2019;27(1):22–9. https://doi.org/10.1002/oby.22303.

  37.

    Gardner CD, Trepanowski JF, Del Gobbo LC, Hauser ME, Rigdon J, Ioannidis JPA, et al. Effect of low-fat vs low-carbohydrate diet on 12-month weight loss in overweight adults and the association with genotype pattern or insulin secretion: the DIETFITS randomized clinical trial. JAMA. 2018;319(7):667–79. https://doi.org/10.1001/jama.2018.0245.

  38.

    Sacks FM, Bray GA, Carey VJ, Smith SR, Ryan DH, Anton SD, et al. Comparison of weight-loss diets with different compositions of fat, protein, and carbohydrates. N Engl J Med. 2009;360(9):859–73. https://doi.org/10.1056/NEJMoa0804748.

Acknowledgements

Karin Mossberg is the principal investigator, and Professor Björn Eliasson and Professor Lars Fändriks are co-investigators.

Funding

The study was financed by Region Västra Götaland and by grants from the Swedish state under the agreement between the Swedish government and the county councils (the ALF agreement, ALFGBG-725291), by the Novo Nordisk Foundation, and by an unrestricted grant from Novo Nordisk. Open Access funding was provided by the University of Gothenburg.

Author information

Contributions

All authors contributed to the conception and design of this manuscript. GH prepared the data for analysis, and AR performed the statistical analyses. All authors contributed to the interpretation of the data. GH and BE drafted the article, and all authors contributed to its critical revision. KM, LF and BE are the guarantors of this work and, as such, had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Gudrún Höskuldsdóttir.

Ethics declarations

Ethics approval and consent to participate

The Ethical Regional Board of Gothenburg approved the protocol (application 673–14). The ethics committee belongs to the Swedish Ethical Review Authority at the Government Offices of Sweden. Informed consent to participate was obtained from all study participants. All methods were carried out in accordance with relevant guidelines and regulations.

A study nurse obtained written and verbal informed consent from the study participants.

Consent for publication

Not applicable.

Competing interests

Professor Eliasson reports personal fees (expert panels, lectures) from Amgen, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Merck Sharp & Dohme, Mundipharma, Navamedic, Novo Nordisk and RLS Global, and grants and personal fees from Sanofi, all outside the submitted work. He was also supported by Konung Gustaf V:s och Drottning Victorias Frimurarestiftelse. The other authors report no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary figure 1.

Missing data patterns before and after imputation with MICE.

Additional file 2: Table S1.

Individual variables included in each clinical domain. *Questionnaires.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Höskuldsdóttir, G., Engström, M., Rawshani, A. et al. The BAriatic surgery SUbstitution and nutrition (BASUN) population: a data-driven exploration of predictors for obesity. BMC Endocr Disord 21, 183 (2021). https://doi.org/10.1186/s12902-021-00849-9

Keywords

  • Obesity
  • Bariatric surgery
  • Diet
  • Prospective study
  • Cohort study