Cross-sectional study of height and weight in the population of Andalusia from age 3 to adulthood

Background and objectives In Andalusia, no study had assessed growth and weight gain in a representative sample of the child and adolescent population. Our objectives were to develop reference standards for weight, height and BMI for the Andalusian pediatric population, from 3 to 18 years of age for both genders, and to identify the final adult height in Andalusia.
Subjects and methods Two samples were collected. The first included individuals from 3 to 18 years of age (3592 girls and 3605 boys). They were stratified according to type of study center, size of population of origin, age (32 categories of 0.5 years) and gender, using cluster sampling. Subjects from >18 to 23 years of age (947 women and 921 men) were sampled in 6 non-university educational centers and several university centers in Granada. Exclusion criteria included children of a non-Spanish mother or father, and individuals with chronic conditions and/or therapies affecting growth. Two trained fellows collected the data from February to December 2004 for the first sample, and from January to May 2005 for the second. Reference curves were adjusted using Cole's LMS method, and the quality of the adjustment was assessed using the tests proposed by Royston. In addition, a sensitivity analysis was applied to the final models obtained.
Results Data for 9065 cases (4539 women and 4526 men) were obtained; 79.39% (n = 7197) in the up to 18 years of age group. In the first sampling only 0.07% (3 girls and 2 boys) refused to participate in the study. In addition, 327 students (4.5%) were absent when sampling was done. We present the mean and standard deviation for height, weight and BMI at 0.5-year intervals, from 3 to 23 years of age, for both genders. After adjustment with the different models, percentiles for height and weight (percentiles 3, 5, 10, 25, 50, 75, 90, 95, and 97) and BMI (percentiles 3, 5, 50, 85, 95, and 97) are presented for both genders.
Conclusion This is the first study in Andalusia with a representative sample from the child-juvenile population to investigate weight, height and BMI in subjects from 3 to 23 years of age. The great variability observed in the values from the sample of 18- to 23-year-old individuals ensures the inclusion of extreme values, although random sampling was not used. There is still a lack of standard reference values for the Andalusian population younger than 3 years of age.


Introduction
Growth is a complex biological process whereby an organism achieves an increase in mass and size, while at the same time it matures morphologically and functionally until it acquires the characteristics of the adult state. It is a process that is genetically determined by the activation of stimulating and inhibiting genes, but modulated by extragenetic factors, so that the rhythm of maturing and the final size are the result of a complex interaction between genes and environment.
Growth is the fundamental physiological process that characterizes childhood. It should be monitored by a paediatrician and the family and considered a health indicator. In a similar way, secular trends in growth show the level of health of the population itself.
References of growth are one of the most valuable and commonly used instruments in the evaluation of the wellbeing of individuals, groups of children and the communities they live in, and for following this process in achieving a series of sanitary and other wider targets related to social equality. This is due to the well-known fact that the improvement in socio-sanitary and nutritional conditions leads to growth acceleration in a determined population.
Auxological anthropometry is a combination of biometric techniques applied to the study and evaluation of growth. The use of this tool brings a series of data about the population (weight, height, perimeters, etc) and creates a model which can be used as a standard for this population, if it is representative of them.
A correct assessment of a growth pattern requires the comparison of the subject's data with standards obtained from a representative sample of the population the subject belongs to. Such standards can be elaborated by using three methods, which differ in the way the sample is chosen and followed: the transversal, the longitudinal and the semi-longitudinal method.
Studies with a transversal design for the creation of weight and height tables carry out a single measurement to determine these parameters in a sample that represents the population it comes from. Longitudinal studies follow the sample through its process of growth and development. When the data obtained from longitudinal and transversal studies are adequately compared, they are found to be virtually interchangeable beyond the pubertal growth spurt. In transversal studies, it is not possible to create growth rate tables or to adequately monitor growth during puberty.
From the moment of their publication in 1965, Tanner and Whitehouse's tables [1], which were later updated [2], have been widely used for the evaluation of weight and height of populations even outside Great Britain. In our country, these tables have long been used in clinical practice for the anthropometric evaluation of our patients. They are still used in hospitals and primary healthcare centres, even though it has been shown that they are now obsolete and no longer represent the present-day population [3]. In the United Kingdom, where these tables are no longer used, there has been widespread confusion about which tables are most appropriate for clinical use. A working group convened at the request of the Royal College of Paediatrics and Child Health assessed the available tables and recommended those known as UK 90 [4] for clinical use. These have since been validated as a tool for monitoring growth in the British population [5].
A broad study which took place in 10 European countries, including Spain, showed that there are still international differences in the growth of the child population [6]. For this reason, the scientific community recommends the use of local height and weight standards in the clinical assessment of patients.
On the other hand, we live in a complex and multiracial society. It is difficult or even impossible to carry out an auxological evaluation of an immigrant child in our country, due to the lack of standards from their country of origin. In these circumstances, graphs from the adopted country are used, given that environmental factors are fundamental when the child receives adequate nutrition and care from birth [7], while a subsequent improvement in environmental conditions optimises the expression of the genetic potential. Another option is the use of the WHO graphs. The Multicentre Growth Reference Study (MGRS) aims to generate growth references for infants and children up to 5 years of age. The sample was made up of 8,500 children from 6 countries (Brazil, Ghana, India, Norway, Oman and the US). The 60-month study demonstrated that these standards can be used to assess the growth of any child regardless of their race, socio-economic level or nutrition, so long as optimum environmental conditions are met [8].
In Spain, no home-grown, widely diffused standards were developed, until the Bilbao study that was carried out from 1982 to 1988 [9]. This study used a mixed, semi-longitudinal design with three groups of 600 children who were followed for 9 years in the population of greater Bilbao with middle to low socio-economic level. Data were published for subjects from 0 to 18 years. The same group published the end of its longitudinal study in 2004 (children born between 1978 and 1980) and the transversal study carried out between 2000 and 2001, with a sample of 6443 subjects between 0 and 18 years old [10].
In the Community of Murcia, another transversal study of the child population aged between 4 and 17 (1,930 children) was undertaken [11] while in Madrid a study was carried out with 1,525 schoolchildren between the ages of 6 and 18 [12].
During the same period (2005), an important longitudinal study was published in Aragon on a final sample (of adult size) of 226 subjects [13]. This study is of great importance because it considers diverse anthropometric measures and intellectual development, with the possibility of calculating growth rate in all ages, including puberty, which enables the assessment of subjects with different pubescent "tempos". To date, no transversal study is available for this population.
In 2004, the Barcelona transversal study was published [14], which included newborns, infants, children and adolescents, as well as young adults for the assessment of adult size. The sample of 3- to 18-year-olds is made up of 5,257 children and adolescents measured in 2002 and 2003. The study concludes with the demonstration of a secular growth of 3.5 cm with respect to a previous Catalan study carried out in 1985 [15] and the recommendation of periodically updating auxological data.
Weight and body mass index (BMI), the quotient weight (kg) / height² (m²), are two widely used parameters for the assessment of nutritional state since they are related to the total amount of body fat. Obesity is becoming a public health problem in our country and statistics warn us that in Andalusia this is going to be a challenge for health professionals dedicated to children's healthcare [16,17]. The EnKid study [18] on the prevalence of obesity in Spain among 2- to 24-year-olds has shown a prevalence of obesity in our country of 13.9%, and of 26.3% if we also consider those who are overweight. Moreover, in this study, Andalusia and the Canaries are the communities with the highest prevalence (29.4% ponderal excess in the case of Andalusia). This study was carried out between 1998 and 2000, with a sample of 3,534 subjects.
Data connecting childhood and adolescent obesity with metabolism disorders in adulthood make it necessary to set up studies to find out about the reality of the situation in Andalusia. Such knowledge should be the starting point for the development of intervention programmes to control this emerging problem.
In Andalusia, there was no representative study of the child and adolescent population. Given the scientific community's recommendation of using local standards to assess growth and weight increase in our population, it was decided to initiate a study with the objective of obtaining height and weight tables that can be used as a reference for our population. This would be the first study of growth in our community that would create height and weight tables representative and usable in daily clinical practice. It would also enable us to find out the final adult height of the Andalusian population.
The objective of the work was to describe the growth in height and weight of the child and youth population of Andalusia from the age of 3 to 18, including both sexes, and to create reference standards for weight, height and BMI for that part of the Andalusian population that would be useful in daily clinical practice and to discover the final adult height of the Andalusian population.

Subjects and methodology
Population studied and sample selected
Target population and considerations about the sample
Once the objectives were clear, it was necessary to make decisions about the sample that would be the object of our study, bearing in mind that a transversal study had been chosen.
In order to achieve the greatest possible representation of the Andalusian population from the age of 3 until they reach final height, two samples of individuals were obtained using different sampling systems to assess their weight and height.
Given that the aim of the first sample was to describe the growth in height and weight of the Andalusian child and youth population from the age of 3 to 18 including both sexes and create standards of reference for weight, height and BMI for the Andalusian population within this age range, this first sample needed to represent the whole Andalusian population from the age of 3 to 18. The ideal method, from a methodological viewpoint, for taking a sample of that population would be through simple random sampling taken from the population census; however, this would not be viable, economically speaking. We could obtain a less expensive sample by going to places that are attended by children of these ages, to weigh and measure them.
These places are, of course, education centres, where the population between the ages of 3 and 16 is obliged to attend. Moreover, apart from the age range, if the schooling rate is high enough, a fairly reasonable degree of representation could be achieved.
Data from the Education, Science, and Sports Ministry show that in 2002-2003 the schooling rate in Andalusia was 96.1% for High school and Professional Education (16-17 years of age) and 17.4% for higher grades of Professional Education (18-19 years of age). Thus, up to the age of 17, the schooling rate in Andalusia is high enough to avoid significant skewness when it comes to taking samples of children and young people from education centres. At the age of 18, the number of students in non-university education is very low, which makes it very difficult to sample this age group using only non-university education institutions.
On the other hand, the data from the 2001 census by the National Statistics Institute (Instituto Nacional de Estadística) about rates of schooling (at any level, including university) indicate that, at that time, the schooling rate for 16-year olds was 81.9%, for 17-year olds 71.8% and for 18-year olds 62.2%. Therefore, population aged 16 and 17 would be sufficiently covered by going to education centres, while for 18-year olds it would be necessary to obtain data from other sources in order to cover such deficiencies.
Despite everything, the problem was not serious given that the age where we might encounter problems is one where the variance is very large. This variance is present in all population strata so that the results obtained would not be far from reality in any case.
It was decided to take a random sample per school and within such sample select classes from each year, from which a sample of pupils in the age range being considered would be taken. This sample (despite, as explained later, the risk being reduced to a minimum) could introduce skewness in the assessment due to the resemblance of the pupils in each class. This skewness could artificially reduce the variance of the parameter studied at a certain age, thus strongly affecting the assessment of extreme percentiles. It will be corrected by using adjustments with random effects models to take this fact into account.
For the second sample, concerning the young population between the ages of 18 and 23, it was not viable to use the same sampling method. For this group, 6 non-university education centres were used, where different professional modules were taught and where data were collected from classes in a similar way to the sample of under 18-year olds. In addition, to increase the size of the sample, different university departments in Granada were selected: The School of Health Sciences, The Medicine Faculty, The Civil Engineering School, The Pharmacy Faculty, The Science Faculty and the Computer Engineering School. In all these centres, students in their first, second and third years were measured.
Given the economic constraints, and although the statistical quality of this second sample was worse because it was not random, this type of sampling was chosen on the assumption that the strong variance of the different measures would ensure a representation of extreme values that would not be reduced even though the sample was not random. On the other hand, if, as was expected, from the age of 18 or 19 the behaviour of the measurements was asymptotic, ages could be pooled into one group where extreme measures are more probable, thus correcting a possible skewness stemming from underestimating the population's variance.

Types of education centres and sample size
According to data from the Andalusian Board of Education for the academic year 2003-2004, education centres were classified as Private Teaching Centres, Infant and Primary Schools and Institutes for Secondary Education. Moreover, centres are not evenly distributed across the region, but rather they are placed according to the size of the population in a given area. Therefore, there were two already identified criteria to stratify: the type of centre and the size of the population. Table 1 shows data about the types of centres, the size of the population and the number of pupils.
The global sample was distributed into 12 strata, in principle, in proportion to the number of pupils in each stratum. However, knowing the distribution of the centres and of the pupils in each stratum, it is always possible to calculate the probability of a unit being selected for sampling and to correct for it using inverse probability weighting, should an imbalance in the sample by strata be considered necessary. As explained below, this was the strategy used for sampling.
Considering our objective, an additional source for stratification would be each of the age groups from 3 to 18, which would give us 16 strata. In fact, for the sake of accuracy, we would consider not 16 strata, but 32, given that we would choose children of an age calculated in years and "half years"; this way strata would reflect the following ages: 3, 3.5, 4, 4.5, 5, 5.5, ..... 17, 17.5, 18, 18.5. Besides, another obvious source of stratification would be the child's sex, due to the fact that the variables measured are different from a very early age in boys and girls.
In terms of the calculation of the sample size we should comment on the normality of the variables involved in the study. For each age, height can be considered to follow a normal distribution, but this is not the case for weight or BMI, calculated based on both of these. Despite this, many authors have managed to normalize these variables using transformations; this way, we consider that if the children's weight for each age group is not a normal variable, once transformed it would become normal, which means that we would not have to re-calculate using nonparametric methods for the size of the sample.
Having discussed the prior considerations, we can concentrate on the calculation of the sample size necessary for the study. Following the methodology set out by Linnet [19] for the calculation of sample size, the key is the ratio between the width of the confidence interval for the extremes of the reference range and the width of the reference range itself. Accordingly, for a reasonable quotient between both widths, such as 20%, we would need 126 children for each age group and sex.
However, these calculations do not take into account that the percentiles in the height and weight curves for age are obtained from the ratio of such variables with age using a regression model. Royston [20] addresses this problem, and using his equations the required sample size is reduced due to the variance reduction that occurs when using regression. In figures, such reduction means that if we want a ratio of 0.20 for each group of age and sex, we would only need 74 children. Based on this fact and bearing in mind that it would be applicable to simple random sampling, the size of our global sample would be at least 2,368 males and 2,368 females.
We had opted for a multistage sample. The first stage included a random selection of the education centre, while the second stage included a random selection of the children within the class of their year in the education centre. This leads to a cluster sampling that requires an increase in the size of the sample. Such increase, which is called the design effect, depends on the size of the sample in each centre as well as the degree of resemblance between children in each centre (intraclass correlation coefficient). The larger the effect of the intraclass coefficient, and the larger the sample group in each centre, the larger the design effect. This design effect shows us a coefficient by which we should multiply the size of the sample calculated as a simple random sample so that when we obtain it in clusters we have the same capacity as a simple random sample.
We would choose our sample for each age group (3, 3.5, 4, 4.5 years...) so that in fact our sample would not have to be too big; it would normally be about 4 or 5 pupils in each age group per school. So, supposing we chose 4 children of each age from all the children of the school and that the intraclass correlation coefficient was 5%, the value of the design effect would be 1.15. Therefore, our total sample would be approximately 1.15 × 4608 ≈ 5300 pupils of whom half would be female and half male. This means around 166 pupils per age group of whom half would be female and half male.
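The arithmetic in this paragraph can be reproduced with a few lines of Python. This is only a sketch: it assumes that the design effect was obtained with the usual formula for equal-sized clusters, 1 + (m - 1) × ICC, which is not stated in the text but does return the quoted value of 1.15 for m = 4 and an intraclass correlation of 0.05. The function and constant names are illustrative.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC (assumed formula)."""
    return 1.0 + (cluster_size - 1) * icc


# Figures quoted in the text
PUPILS_PER_AGE_PER_SCHOOL = 4   # children of each age chosen per school
ICC = 0.05                      # assumed intraclass correlation coefficient
SRS_TOTAL = 4608                # simple-random-sample total used in the text
N_AGE_GROUPS = 32               # half-year groups from 3 to 18.5 years

deff = design_effect(PUPILS_PER_AGE_PER_SCHOOL, ICC)
cluster_total = deff * SRS_TOTAL
per_age_group = cluster_total / N_AGE_GROUPS

print(f"design effect: {deff:.2f}")
print(f"total sample under cluster sampling: {cluster_total:.0f}")
print(f"pupils per age group (both sexes): {per_age_group:.0f}")
```

With a larger number of pupils per school or a larger intraclass correlation, the same function shows how quickly the required sample grows.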
The choice of 0.05 as the intra-class correlation coefficient was made on the assumption that although the children in class would resemble each other, variance increases with age. Therefore, in classes of older children variance between observations is guaranteed, while in the case of younger children an important variance is also guaranteed due to the fact that the age range within a class, although small, is strong enough to mean that the intra-class correlation is not very large.
In the case of the sample of students between the ages of 18 and 23, the size of the sample calculated following the same steps included 2000 people.

Sample distribution in education centres
Taking into account the sizes of the samples obtained, the global sample would be distributed in the 12 strata, in principle, in proportion to the number of pupils in each stratum, as we knew the distribution of centres by strata and their mean size. This allocation can be seen in the column "Proportional sample" in Table 2.
As discussed earlier, knowing the distribution of the population in the centres and that of pupils in each stratum, it is always possible to calculate the probability of a unit being selected for sampling and to correct using inverse probability weighting, if an imbalance in the sample strata is considered necessary.
We made use of this methodological resource with the express intention of under-representing centres in smaller populations, since it would otherwise be more costly in resources and time to reach those centres; this in turn requires a different weighting of results to take the additional imbalance into account. Bearing this in mind, the distribution shown in the column "Definitive Sample" in Table 2 was obtained, and this distribution was used for the study.
The final procedure for sampling that was used as an alternative to simple random sampling based on census, requires two precautionary measures in the analysis phase. In the first place, as has already been mentioned, it is necessary to carry out an inverse probability weighting of the selection of each of the samples in each stratum; in the second place, a random effects model should be used, unless it is proven to be unnecessary, to adjust the sampling by schools, and within them by class and within them by pupils. Such precautions were taken into account as explained in the results section.
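As an illustration of the first precaution, the following sketch shows how inverse probability weighting restores the population balance when one stratum is deliberately under-sampled. The two strata, their population sizes and the heights are hypothetical values chosen for the example, not data from the study, and the random effects adjustment is not shown.

```python
def ipw_weights(population_sizes, samples):
    """Weight of each stratum = 1 / selection probability = N_h / n_h."""
    return {h: population_sizes[h] / len(values) for h, values in samples.items()}


def weighted_mean(samples, weights):
    """Mean of all observations, each weighted by its stratum weight."""
    numerator = sum(weights[h] * sum(values) for h, values in samples.items())
    denominator = sum(weights[h] * len(values) for h, values in samples.items())
    return numerator / denominator


# Hypothetical example: the small-town stratum is under-sampled on purpose
population_sizes = {"large_town": 6000, "small_town": 4000}
heights_cm = {
    "large_town": [120.0, 122.0, 121.0, 119.0],
    "small_town": [116.0],
}

weights = ipw_weights(population_sizes, heights_cm)
unweighted = sum(sum(v) for v in heights_cm.values()) / sum(len(v) for v in heights_cm.values())
weighted = weighted_mean(heights_cm, weights)

print(f"unweighted mean: {unweighted:.1f} cm")
print(f"weighted mean:   {weighted:.1f} cm")
```

The weighted mean equals the average of the stratum means weighted by population share, which is what a proportional sample would estimate.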

Process of sample extracting
A letter was sent to the directors of all centres selected to ask them to transmit a request for permission to carry out the study to the School Board. Anonymity of the data of both individuals and centres was guaranteed.
Three weeks later, the centres were contacted again to set a date and to inform parents in case anyone refused to participate.
Data were collected by two scholarship holders who were specifically trained. On their arrival at each centre, using the lists that included birth dates and had been previously provided by the school, they determined the group of eligible pupils. Eligible pupils were those whose age was within the years or half years corresponding to their class ± 3 months. Of these pupils, four were selected in each class and one more as a possible substitute. If more were missing, another child was chosen, but in no case were more than two substitutes required. Pupils with a non-Spanish parent were not eligible, nor were those with chronic illness or who were receiving a treatment known to affect their growth. For the purposes of the selection, a list of random numbers was used for each centre and for each day, prepared in relation to the different sizes of the eligible subpopulation.
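The selection rule described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the study relied on pre-printed lists of random numbers, which are replaced here by Python's random module, and the exclusion criteria (non-Spanish parent, chronic illness or treatment) are assumed to have been applied before the roster is passed in. The roster and pupil identifiers are hypothetical.

```python
import random
from datetime import date


def age_in_years(birth: date, measured: date) -> float:
    """Decimal age on the day of measurement."""
    return (measured - birth).days / 365.25


def is_eligible(birth: date, measured: date, target_age: float,
                tolerance: float = 0.25) -> bool:
    """Eligible if the age is within +/- 3 months (0.25 years) of the target age."""
    return abs(age_in_years(birth, measured) - target_age) <= tolerance


def select_pupils(roster, measured, target_age, n_main=4, n_substitutes=1, seed=None):
    """Draw the main pupils and the substitutes at random from the eligible ones."""
    rng = random.Random(seed)
    eligible = [pid for pid, birth in roster.items()
                if is_eligible(birth, measured, target_age)]
    drawn = rng.sample(eligible, min(len(eligible), n_main + n_substitutes))
    return drawn[:n_main], drawn[n_main:]


# Hypothetical class roster: pupil id -> birth date
roster = {
    "p1": date(1996, 1, 10), "p2": date(1996, 2, 20), "p3": date(1996, 3, 15),
    "p4": date(1996, 4, 30), "p5": date(1996, 5, 25), "p6": date(1996, 1, 2),
    "p7": date(1995, 9, 1),   # about 8.5 years old: outside the 8.0 +/- 0.25 window
}
main, substitutes = select_pupils(roster, date(2004, 3, 15), target_age=8.0, seed=1)
print("selected:", main, "substitute:", substitutes)
```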
In terms of data collection, incomplete weeks or weeks with exams were eliminated to avoid interrupting schoolwork or causing skewness due to absenteeism.
To obtain the second sample, for the group of 18- to 23-year-olds, sampling in non-university education centres was carried out in a similar manner to that of the sample of 3- to 18-year-olds. In the case of university departments, after having received permission from the corresponding dean, we were able to use some time from the last main subject of the day. The information given to students was the same as in the other cases. In these situations, the proportion of students who refused to participate was high, in many cases over 40%; this figure was estimated visually as the difference between those who were originally in class and those who were measured in the end. Data were collected by the same observers and using the same methodology and instruments as in the other samples. The subjects included were healthy, not suffering from chronic illness or undergoing continuous medical treatment that might affect their growth; they were born in Andalusia, were Caucasian and of Spanish origin.
In any case, this "extended" sample of 18- to 23-year-olds cannot be considered representative of the Andalusian population. At best, we can check whether it differs significantly from the part of the correctly selected sample that overlaps with it and, if it does not, conjecture that the differences are not great.

Personnel
Data collection was carried out by two scholarship holders who received special training. They were trained in techniques for weighing and measuring the sample being studied. Moreover, in all cases, measurements were taken by the same person.
Error in measurement was determined based on the measurements of 5 children repeated three times, as the standard error of the mean measurement of each child, giving a value of 0.2 cm.
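A calculation of this kind can be sketched as below. The readings are hypothetical and are not the study's data, and averaging the per-child standard errors into a single figure is an assumption, since the text does not say how the five values were combined.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error_of_mean(repeats):
    """Standard error of the mean of repeated measurements of one child."""
    return stdev(repeats) / sqrt(len(repeats))


# Hypothetical heights (cm): 5 children, each measured three times
readings = {
    "child_1": [110.2, 110.5, 110.9],
    "child_2": [124.0, 124.4, 123.8],
    "child_3": [131.5, 131.1, 131.8],
    "child_4": [142.3, 142.9, 142.6],
    "child_5": [150.0, 150.6, 150.2],
}

per_child = {child: standard_error_of_mean(r) for child, r in readings.items()}
overall = mean(per_child.values())  # assumed way of summarizing the five values

for child, sem in per_child.items():
    print(f"{child}: SEM = {sem:.2f} cm")
print(f"mean SEM across children: {overall:.2f} cm")
```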
Data were collected from February to December 2004 for the sample of up to the age of 18. For the broader sample, data were collected from January to May 2005, except February when there were no classes.

Parameters studied and technical equipment used
A portable electronic Seca weighing machine, with an accuracy of 100 g and automatic reset to 0, was used for weighing.
A portable Holtain stadiometer (height rod), with an accuracy of 0.1 cm was used for measuring height. Before each measuring session it was adjusted with an unbendable 65 cm rod.
The Body Mass Index was calculated using the following formula:

$$\mathrm{BMI} = \frac{\text{weight (kg)}}{\text{height}^2\ (\text{m}^2)}$$

Measuring techniques
The pupils selected were asked to confirm that they gave their permission to be weighed and measured, and they were informed that if they preferred, measurement and weighing could be done without the presence of other pupils, that they did not need to undress, their data would not be read out and they would remain anonymous. Information was not provided to the next class unless all pupils for the previous class had been measured.
They were asked to take off their shoes and coats, if they were wearing one. A register of what each pupil was wearing was made, and this was grouped into three categories: light, medium and heavy. Garments in these categories were identified size by size in a department store and weighed so that the mean weight of the clothes for each age could be subtracted. A weight of between 300 g and 900 g was subtracted for each pupil. This methodological approach has been used in similar studies in order to reduce the refusal rate. The group of older students, where a high refusal rate is expected, is particularly problematic. In this case the approach was used to reduce refusals due to the need to undress in the presence of peers.
With regard to height, using the portable Holtain stadiometer, the children stood without shoes so that their heels, buttocks and scapulae were in contact with the vertical plane and their heads were positioned in the so-called Frankfurt plane. With their ankles together, their medial malleoli touching, and the soles of their feet firmly placed on the hard horizontal plane, the observer gently pushed their mastoid processes upwards. In this position, they were asked to breathe in deeply and the observer measured their height, pushing down the mobile headpiece to minimize as far as possible any error due to hair thickness.
A data sheet was prepared for writing down the data on paper; these were later entered into an electronic spreadsheet within a maximum of 48-72 hours. Data were periodically reviewed to detect and correct any recording errors.
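The clothing correction and the BMI calculation can be combined as in the following sketch. The clothing weights per category are hypothetical placeholders within the 300 g to 900 g range mentioned above; the study used mean weights that also depended on age, which are not reproduced here.

```python
# Hypothetical clothing weights in kg (the study's values varied with age)
CLOTHING_KG = {"light": 0.3, "medium": 0.6, "heavy": 0.9}


def net_weight(measured_kg: float, clothing: str) -> float:
    """Weight after subtracting the assumed weight of the clothes worn."""
    return measured_kg - CLOTHING_KG[clothing]


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


weight = net_weight(31.1, "medium")   # pupil weighed at 31.1 kg in medium clothing
print(f"net weight: {weight:.1f} kg")
print(f"BMI: {bmi(weight, 135.0):.1f} kg/m^2")
```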

Statistical methods for adjusting Reference Curves
We used Cole's LMS method [21], which models the relationship between percentiles and age using a regression technique and assumes a normal distribution of the transformed variable.
This is a widely used method, especially in Europe, for the creation of reference tables depending on age; moreover, computer software in standard statistical packages is also available.
The method assumes that, in each age group, the anthropometric data can be adjusted to a normal distribution after having been adequately transformed, taking into account the degree of asymmetry (L), central tendency (M) and dispersion (S).

Using the original data, for each moment of time (t) the following quantities are obtained:
• L(t), the value of the parameter λ of the Box-Cox transformation needed to normalize the variable.
• M(t), the median of the original data at instant t.
• S(t), the coefficient of variation of the original data at instant t.
Once obtained for the different values of t within the time frame considered, these quantities are smoothed as functions of age using penalized likelihood.
Using the formula below (which simply undoes the transformation applied to normalize the variable), the α percentile at instant t, for L(t) ≠ 0, is given by:

$$C_{\alpha}(t) = M(t)\,\left[1 + L(t)\,S(t)\,z_{\alpha}\right]^{1/L(t)}$$

where z_α is the value of an N(0, 1) distribution that leaves a probability α to its left.
What Cole's method does is to model the skewness and the kurtosis of the variable (through the transformation that has to be made to convert the original variable into a Normal one), the central position of the variable (through the median) and the variance, and also the kurtosis in an indirect way (through the coefficient of variation of data at instant t). Once these coefficients have been modelled in relation to time, the desired percentiles are obtained.
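Once L, M and S are known for a given age, the percentile and the corresponding z-score follow directly from the LMS back-transformation. The sketch below implements that step only, not the penalized-likelihood smoothing; the L, M and S values used in the example are hypothetical and are not taken from the Andalusian tables.

```python
from math import exp, log
from statistics import NormalDist


def lms_percentile(alpha: float, L: float, M: float, S: float) -> float:
    """Value of the measurement at percentile alpha (0 < alpha < 1)."""
    z = NormalDist().inv_cdf(alpha)      # z_alpha of the N(0, 1) distribution
    if abs(L) < 1e-12:                   # limiting case L = 0 (log transformation)
        return M * exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


def lms_zscore(y: float, L: float, M: float, S: float) -> float:
    """Inverse operation: z-score of an observed measurement y."""
    if abs(L) < 1e-12:
        return log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


# Hypothetical LMS values for BMI at one age
L_t, M_t, S_t = -1.2, 16.0, 0.10
for pct in (3, 50, 85, 97):
    print(f"P{pct}: {lms_percentile(pct / 100, L_t, M_t, S_t):.2f}")
```

The median (percentile 50) always returns M, and applying lms_zscore to a computed percentile recovers the corresponding z value, which is a convenient check when building tables.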
An assessment using the method of penalized likelihood requires specific statistical software. The STATA 8.1 and Splus S 6.0 packages were used.

Measuring the "goodness of fit"
The quality of the fit was assessed with tests proposed by Royston [22] that evaluate whether the model residuals follow a normal distribution based on their mean, symmetry and kurtosis. The tests applied in the assessment of the goodness of fit were as follows (G being the number of groups of residuals):
• Q1 test: if the model fits well, the sum of the squares of the mean model residuals in each group, weighted by the size of the group, follows a χ² distribution with G-1 degrees of freedom.
• Q2 test: if the model fits well, the sum of a function of the variance of the model residuals in each group follows a χ² distribution with G-1 degrees of freedom.
• Q3 test: if the model fits well, the sum of the squares of the test statistics from D'Agostino's normality test for skewness, in each group of model residuals, follows a χ² distribution with G degrees of freedom.
• Q4 test: if the model fits well, the sum of a function of the significance levels (P values) of the Shapiro-Wilk normality test (a combined test for skewness and kurtosis), in each group of model residuals, follows a χ² distribution with 2G degrees of freedom.
Each test responds to a different departure from normality that the model can produce. The first test concerns differences between the mean of the residuals and the mean of the distribution they should follow, namely N(0, 1); the second concerns a residual variance that is either too high or too low, which would suggest distributions that are too flat or too peaked and therefore non-normal; the third characterizes the skewness of the residual distributions, which would indicate one form of non-normality and therefore a lack of fit; lastly, the fourth addresses skewness and kurtosis simultaneously, both of which are signs of non-normality. The last test could become significant more easily in this study, given that Cole's method does not directly model the kurtosis of the underlying distribution.
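The first of these statistics can be sketched as follows. This is a minimal illustration of the Q1 quantity as described in the text (group size times squared mean residual, summed over groups), not a reproduction of Royston's software; the function name `q1_statistic` and the residuals are hypothetical, and the reference χ² distribution is taken to have G − 1 d.f. as stated above.

```python
from statistics import mean


def q1_statistic(groups):
    """Sum over groups of n_g * (mean residual in group g) ** 2."""
    return sum(len(g) * mean(g) ** 2 for g in groups)


# Hypothetical standardized residuals grouped by age (illustration only)
groups = [
    [0.5, 0.5, -0.5, 0.5],   # mean 0.25
    [1.0, -1.0],             # mean 0.0
    [1.0, 0.0, -0.5, 0.5],   # mean 0.25
]
q1 = q1_statistic(groups)
dof = len(groups) - 1  # d.f. of the reference chi-square distribution
```

A large value of `q1` relative to the χ² distribution with `dof` degrees of freedom would point to group means that drift away from zero.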
Royston suggests declaring a lack of fit when any of the tests is significant at the 5% level, in which case the fitted model is rejected outright. Given the large number of tests performed, this can easily happen simply through the accumulation of error across them.
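To illustrate how quickly such errors accumulate, consider a simplified calculation for k tests each carried out at the 5% level. Independence between the tests is an assumption made here only for illustration; tests applied to the same residuals need not be independent.

```latex
P(\text{at least one significant result}) = 1 - (1 - 0.05)^{k},
\qquad
k = 4:\quad 1 - 0.95^{4} \approx 0.185
```

Under that assumption, a well-fitting model would still be rejected by at least one of four tests almost one time in five.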
In our case, these tests were used to determine the number of equivalent degrees of freedom (edf) needed for each model fitted with Cole's LMS method, and we kept those models whose tests yielded P values above 10%. When some tests gave significant results, the original observations were examined for extreme data that might be strongly contaminating the model. The type of analysis applied is described in the next section.

Method for detecting extreme and/or influential data
In the present case, the detection of extreme data must be approached with care. Because the purpose of age-dependent reference curves is precisely to determine extreme values (high or low), eliminating extreme data from the sample could visibly affect the curves by "pruning" the distribution of values that are valuable for determining percentiles. Nevertheless, in this kind of project an analysis of extreme data cannot be omitted, since that would entail a serious risk of working with data which, almost certainly, do not belong to the target population.
There is a classic rule for detecting extreme data, which labels a value x as extreme in the following situations (Q(25) being the 25th percentile of the sample, Q(75) the 75th percentile and IQR the interquartile range):
x < Q(25) − 1.5 × IQR
x > Q(75) + 1.5 × IQR
We used, however, a modification of this rule that makes it much more conservative:
x < Q(25) − 4 × IQR
x > Q(75) + 4 × IQR
With this modification we detected only data that were unusually extreme, and in this way we did not risk "pruning" the distribution of data that could be influencing the tables.
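The two rules can be compared with a short sketch. The function name `extreme_flags` and the data are hypothetical, and the quartiles are computed with the "inclusive" interpolation of Python's standard library, which may differ slightly from the definition used in the study's software.

```python
from statistics import quantiles


def extreme_flags(values, k=4.0):
    """Flag values outside [Q(25) - k*IQR, Q(75) + k*IQR]."""
    q25, _, q75 = quantiles(values, n=4, method="inclusive")
    iqr = q75 - q25
    low, high = q25 - k * iqr, q75 + k * iqr
    return [x < low or x > high for x in values]


# Hypothetical heights in cm for one age group (illustration only)
heights = [100, 101, 102, 103, 104, 105, 106, 107, 108, 120, 150]
classic = extreme_flags(heights, k=1.5)       # classic rule
conservative = extreme_flags(heights, k=4.0)  # modified rule used in the text
```

In this toy sample the classic rule flags both 120 and 150, whereas the 4 × IQR rule flags only 150, which shows why the modified rule removes far fewer observations.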
Data are considered influential if their removal from the sample produces a significant change in the model being fitted, giving rise to a profoundly different one. Traditionally, influential data have been identified with extreme or very extreme data, although this is not always the case. A conservative, but reasonable, option is to look for influential data only among the very extreme data, so as not to affect the areas of the distribution closest to the centre.
In view of these considerations, the process for detecting and, when necessary, eliminating data was as follows:
1) Extreme values were labelled for each age, using the rule explained above.
2) With the data labelled as extreme included, the model was fitted and the tests were evaluated to determine whether they were significant. If they were not, the most extreme data were deemed not influential and the process stopped, retaining all sample data. If a test was significant, the most extreme data were eliminated and the process was repeated until the goodness-of-fit tests were no longer significant. The iterative process was always applied to the most extreme data still in use.
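The two steps above can be sketched as a loop. This is only an outline under stated assumptions: `fit_is_adequate` is a hypothetical callback standing in for fitting the LMS model and checking the Q tests, `extremeness` is a hypothetical ranking function, and removing one value per iteration is an assumption, since the text does not say how many values were dropped at each step.

```python
def prune_influential(values, extreme_indices, fit_is_adequate, extremeness):
    """Drop flagged values, most extreme first, until the fit check passes."""
    kept = list(range(len(values)))
    # Candidates ordered from most to least extreme
    candidates = sorted(extreme_indices,
                        key=lambda i: extremeness(values[i]),
                        reverse=True)
    removed = []
    while candidates and not fit_is_adequate([values[i] for i in kept]):
        worst = candidates.pop(0)
        kept.remove(worst)
        removed.append(worst)
    return [values[i] for i in kept], [values[i] for i in removed]


# Toy stand-ins (illustration only): the "fit" passes once no value reaches 200
values = [100, 105, 110, 150, 250, 300]
kept, removed = prune_influential(
    values,
    extreme_indices=[4, 5],
    fit_is_adequate=lambda sample: max(sample) < 200,
    extremeness=lambda x: abs(x - 107.5),
)
```

If the fit check already passes with all data included, the loop never runs and every observation is retained, which matches the conservative intent described in the text.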
We emphasize that this is a very conservative process, as required by the type of study being carried out; as the results will show, it led us to reject a very small number of observations.

Analysis of the model's sensitivity to missing data from absent subjects
The construction of tables for height and weight assumes that the individuals included in the study do not suffer from chronic illness that could significantly affect their growth and weight gain. A screening was performed to avoid including such boys and girls, although it was probably not very rigorous. This is taken into account and evaluated, as appropriate, in the final assessment.
Pupils who were absent on the day of data collection were replaced by others. Because the number of pupils taken per class was small (between 4 and 6 in the most extreme case), the volume of substituted data was not large, so an assessment of those values would not be very efficient and was not considered very relevant.
On the other hand, if absence from class was due to illness, this could be a source of strong bias in the results. A 2003 study provided by the provincial Education Delegations placed the mean daily rate of absenteeism in state schools in Andalusia at 12.8%. It could be argued that a large part of these absences were due to illness, and that the ill children were more likely to include some with chronic illness that might affect their growth. In that case, the sample would bias the tables towards values that are too "high".
A sensitivity analysis of the final models was carried out using the least favourable scenario. Starting from the daily absenteeism level of 12.8%, it was assumed that 10% of subjects had not been measured and that all of them were, with respect to the variable of interest, below the median. Calculations were made for this theoretical situation and the results were compared with those actually obtained. If the curves did not change appreciably in this scenario, they were considered not to be affected by the bias generated by non-measurement due to illness.
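The logic of this worst-case scenario can be illustrated for the median with a simple non-parametric sketch. It is a stand-in for, not a reproduction of, the study's calculation (which refers to the fitted models): if a fraction m of the population is unmeasured and lies entirely below the percentile of interest p, the population percentile p corresponds to level (p − m)/(1 − m) in the observed sample. The function names and the data are hypothetical.

```python
def observed_level(p, missing_fraction):
    """Observed-sample level matching population level p when the unmeasured
    fraction lies entirely below the p-th percentile."""
    if not 0 <= missing_fraction < p < 1:
        raise ValueError("requires 0 <= missing_fraction < p < 1")
    return (p - missing_fraction) / (1.0 - missing_fraction)


def empirical_quantile(sorted_values, level):
    """Linear-interpolation quantile of an already sorted sample."""
    pos = level * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


# Hypothetical heights in cm, evenly spaced from 120.0 to 165.0 (illustration only)
sample = [120.0 + 0.5 * i for i in range(91)]
observed_median = empirical_quantile(sample, 0.5)
level = observed_level(0.5, 0.10)  # about 0.444
worst_case_median = empirical_quantile(sample, level)
shift = observed_median - worst_case_median
```

The sign of `shift` shows the direction of the effect: ignoring unmeasured subjects who lie below the median makes the observed median too high.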
All these considerations are valid for the sample of 3- to 18-year-olds. The other sample is subject to higher rates of missing measurements and to several possible biases. Such biases could strongly affect two characteristics of the distribution of the variables: the range of values obtained, and a shift towards higher values in some variables and lower values in others. In the first case, if part of the extreme values had been lost, we would have obtained tables that were too "narrow", which could be checked against tables from populations resembling ours. In the second case, we would have biases that offset one another but would be difficult to demonstrate or measure. For this reason, the same criterion of 10% of subjects ill and below the median was applied to this sample. For variables where the bias acted in the opposite direction (weight and therefore BMI), it was not considered, so the estimated figures would in that case be underestimates.

Results
The total number of cases collected was 9,065; 50.07% were female (4,539 cases) and 49.93% were male (4,526 cases). Considering age, the group of up to 18 years of age made up 79.39% of the cases (7,197).
Of the 4,539 females included, 79.14% (3,592 cases) belonged to the group of up to 18-year olds, while of the 4,526 males, 79.65% (3,605 cases) were in the same age group.
The size of the sample of 3-to 18-year olds is larger than the one initially calculated, which could create a beneficial effect on the efficiency of the estimators. This increase in the sample is due to three reasons. In the first place, 5 male and 5 female pupils were measured in a significant number of classes instead of the 4 originally planned. Secondly, the number of units was higher than the mean estimated by the Education Board; and lastly, the total number of pupils used to make the preliminary estimations was from the previous year, while in several schools the number of children was higher than expected.
In the sample of 3- to 18-year-olds, none of the schools refused to participate and neither did any of the parents of the pupils. In this group, 3 girls and 2 boys (0.07%) refused to participate. In 18 cases, children had some kind of prosthesis (plaster cast, corset) that did not affect their height; their weight was recorded without any correction. Only 3 subjects from the sample (0.04%) were receiving endocrinological treatment, and they were not eliminated from the sample.
In the sample of 3- to 18-year-olds, 327 pupils were substituted because they were not present when the sample was taken from their class; they represent 4.5% of this part of the sample. This figure is so low that it does not seem useful for estimating absence, and several reasons may explain it. On the one hand, the weeks chosen contained no events that might reduce pupils' attendance (local holidays, bank holidays, etc.), and Monday, the day with the highest rate of absenteeism, was also excluded because it was the day the personnel used to travel to each centre. On the other hand, since teachers were given a list of the selected pupils beforehand, it is possible that the most 'work-inclined' ones introduced an indirect bias by not "remembering" that they had substituted pupils who were not present. This bias was not detected until the study was well underway, so it could not be corrected; although it is probably small, it obliged us to analyse the robustness of the results against it, as already mentioned.
The final sample of the population aged between 18 and 23 (1,868 cases) is slightly smaller than initially planned. Nevertheless, the difference of 132 people is small and should not have introduced bias in the results beyond that already affecting this part of the sample, which was mentioned in the methodology section.
In the sample of over-18-year-olds, the sampling technique does not allow such an accurate analysis of the general results. Direct estimates indicate that in some cases the non-response rate might have reached 40%, which leads us to suspect some significant bias; in any case, as stated in the statistical methods section, sensitivity analyses are applied to the tables to address this situation.
The distribution of the sample of men and women by ages, grouped by years and half years is shown in Table 3.

Height
Sample data
A summary of the height data in the samples of men and women is shown in Table 4.

Extreme values
Following the method explained in the methodology section, the values considered extreme were categorized. In the women's group, three cases were corrected because they contained automation errors that had not been detected during the first phase of corrections. Another two were kept because the goodness-of-fit tests were not significant when they were included.
In contrast, the two cases detected as extreme in the men's group were eliminated because the goodness-of-fit tests were significant when they were included.

Model adjustment
The adjustment of the model can be considered adequate since none of the tests used (Q1, Q2, Q3, and Q4) yielded significant results. Adjustment was analysed using a random effects model (the classroom was the grouping unit for the 4-6 pupils chosen in it); following the performance of the likelihood ratio test to compare with the fixed

Sensitivity analysis
As explained in the methods section, a sensitivity analysis was performed on the final model using the most unfavourable scenario. To this end, it was assumed that 10% of the individuals who should have been measured could not be measured, and that all of them were below the median. When this hypothetical situation was compared with the real model, the difference in the tables for the estimated median was, in the worst case (around the age of ten), 8 mm in women and 9 mm in men, an amount that seems small. For percentile 3, the most important situation from a clinical point of view, the maximum difference was half a centimetre under the same assumption.

Weight
Table 5 summarizes the weight data for men and women. In women, weight stabilizes by the age of 18. To confirm this, we compared the mean weight across half-year age groups beyond this age (F exp = 1.36 (10; 1049 d.f.), p = 0.1952). In the same comparison in the men's sample, mean weight had not stabilized beyond the age of 18 (F exp = 2.15 (10; 1054 d.f.), p = 0.0188), but it had done so in the age groups above 20 (F exp = 1.58 (6; 655 d.f.), p = 0.1514). For this reason, in both the men's and women's samples the values beyond the age of 20 were grouped together to represent the weight of Andalusian adults of each gender.
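Comparisons of means across half-year age groups of this kind correspond to a one-way analysis of variance. A minimal sketch of the F statistic and its degrees of freedom is given below, assuming a standard one-way layout; the function name `one_way_anova_f` and the weights are hypothetical and are not study data.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way layout."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical weights in kg for three age groups (illustration only)
groups = [[60, 62, 64], [61, 63, 65], [62, 64, 66]]
f_value, df_between, df_within = one_way_anova_f(groups)
```

Under this layout, degrees of freedom of (10; 1049) would correspond to 11 groups and 1,060 subjects in total.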

Extreme values
All values from both samples that were characterized as extreme (5 in women and 6 in men) were eliminated, since the goodness-of-fit tests were significant when they were included and they had a considerable effect on the kurtosis of the distribution.

Model adjustment
The fit is adequate because none of the goodness-of-fit tests (Q1, Q2, Q3 and Q4) yielded a significant result. The fit was also analysed using a random effects model (with the classroom as the grouping unit for the 4-6 pupils selected in it); the likelihood ratio test comparing it with the fixed effects model gave χ² exp = 1.08 (1 d.f.), p = 0.2987 for women and χ² exp = 1.49 (1 d.f.), p = 0.2222 for men. The fixed effects model fit produced the height table for men and women, in decimal years from the age of 3 to 19, with values beyond that age concentrated at 20 (Additional data file 1: Tables C and D). Figures 3 and 4 show percentiles 3, 5, 10, 25, 50, 75, 90, 95 and 97 for weight in women and men.
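The P values of the likelihood ratio test can be checked with a short sketch. For 1 d.f., the upper-tail probability of a χ² variable equals erfc(√(x/2)), because the variable is the square of a standard normal deviate. The function name `chi2_sf_1df` is hypothetical.

```python
from math import erfc, sqrt


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return erfc(sqrt(x / 2.0))


p_women = chi2_sf_1df(1.08)  # reported as 0.2987
p_men = chi2_sf_1df(1.49)    # reported as 0.2222
```

Both values agree with the reported figures to about three decimal places, so neither comparison favours the random effects model at conventional significance levels.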

Sensitivity analysis
In the same way as with height, a sensitivity analysis of the final models was performed using the most unfavourable scenario. To this end, it was assumed that 10% of the individuals who should have been weighed could not be weighed, and that all of them were below the median. When this hypothetical situation was compared with the real model, the differences in women, for both the median and percentile 3, were between 0.1 and 0.8 kg (around the age of 14). In men, differences in the estimated median ranged between 0.1 and 1 kg, while in the case of the percentile 3 which has the

Body Mass Index (BMI)

Sample data
The BMI data in the men's and women's samples are shown in Table 6. Mean BMI values for ages over 18, grouped by half years, were compared, confirming that BMI does not change significantly from that age onwards in either men or women (women: F exp = 1.25 (10; 1049 d.f.), p = 0.2557; men: F exp = 1.30 (10; 1034 d.f.), p = 0.2264). In the case of BMI, the values for ages over 20 were also grouped together in the men's and women's samples to represent the BMI of adult Andalusians of each gender.

Extreme values
For both men and women, all values characterised as extreme were eliminated, since the tests showed that they significantly affected the quality of the fit. Thirteen cases were eliminated in the women's sample (0.29% of the sample) and 8 in the men's sample (0.18% of the sample). With regard to BMI, the evolution of the percentiles below the median differs markedly from that of the percentiles above it in both genders, especially for the most extreme percentiles. In the women's model, percentiles 85, 95 and 97 increase markedly until the age of 14, when they start to decrease. Thus Andalusian girls have a higher probability of being overweight or obese from the age of 8 or 9; this probability peaks at the age of 14 and then diminishes. In men, percentiles 95 and 97 show a similar evolution, although they peak around the age of 16.

Sensitivity analysis
In the sensitivity analysis of the final BMI model, it was also assumed that 10% of the individuals measured were not actually measured, and that the BMI values of all of them were below the median. This way, when we compared this hypothetical situation with the real model, we found that for the estimation of the median and of the percentile 85, both in the sample of men and women, differences ranged between 0.1 and 0.3. These differences are noticeably lower in the extreme percentiles, both above and below the median.

Discussion
This work presents the results of the first population study carried out in Andalusia to determine the weight, height and BMI values of individuals between the ages of 3 and 23. We carried out a cross-sectional study, first, because it yields applicable results in a short period of time and, second, because its cost is significantly lower than that of a longitudinal study. Moreover, if the data obtained did not differ from those of the longitudinal study performed in Aragon, which proved to be the case, the disadvantages of a cross-sectional study could be offset by using the data on growth rate and puberty provided by that study.
Our study is the broadest carried out in the Andalusian population; we worked with a random sample of more than 9,000 subjects, selected by multistage sampling and representative of the young population of Andalusia.
We should emphasize that, despite the sample size, all measurements were taken, using precise instruments, by the same, properly trained examiner. This contributes to increasing the internal validity of our study. Moreover, with the aim of avoiding repetitions and other errors, the data were registered in duplicate on a database which was revised centrally at intervals of less than a week. It is worth mentioning the scarcity of non-participation.
First, the fact that no school (public or private) refused to take part is noteworthy and reflects the level of collaboration of schools on health issues. Second, the fact that no parents refused to allow their children to participate could be because they did not perceive the data collection as offensive or harmful for their children, or it could mean that schools failed to inform them about the process. Nevertheless, no centre has since contacted us to report any complaints from parents. We think that the children's almost anecdotal refusal to be measured is the result of keeping them continuously informed about the process and of the care with which data were collected. Privacy and anonymity during measurements were especially important: although they made the process more time-consuming, they increased pupils' confidence. A small group of pupils (0.6%), most of them female, had some objections to being measured, but in the end they agreed to participate. In the majority of these cases the origin of the problem was that they were overweight. This suggests that we avoided a bias that could have arisen had we been unable to collect such data. Moreover, we are convinced that, had the conditions of privacy and anonymity not been ideal, these subjects would have refused and would also have encouraged others to do the same. We believe that such a high level of collaboration on the part of schools, parents and pupils shows that, if projects are well designed and adequate precautions are taken, education centres can provide samples of sufficient quality to replace simple random samples based on the census, thus saving resources.
It was necessary to substitute 327 of the selected pupils because they were absent from school when data were collected. This figure represents 3.7% of the pupils in the final sample, which is so low that it is not useful as an estimate of absenteeism on sampling days. Several facts contribute to such a low level of absence. As mentioned earlier, data were collected in weeks without foreseeable events (no bank holidays, local holidays, general exams, etc.) and, in general, not on Mondays (the day of the week with the highest rate of school absenteeism), because this was the day used for travelling to the centres. Another reason lies in the dynamics of the data collection process itself, whereby a list of the pupils selected for the sample was given to the teacher in advance so that they could be informed. We discovered that the most work-inclined teachers could be introducing an indirect bias in the sample by not "remembering" that they had substituted pupils who were not present. This bias was not detected until the study was well underway, so it could not be corrected; although it cannot have been large, it led us to analyse the "robustness" of the results to it, as mentioned in the statistical methods section.
In summary, for the sample of 3- to 18-year-olds, we had a larger sample than anticipated relative to the young Andalusian population. The non-participation rate among the selected subjects was very low, as was the level of non-participation due to absence; although the latter could not be calculated precisely, we were able to check that the results are sufficiently robust to this potential bias.

Figure 6. The 3, 5, 50, 85, 95 and 97 percentile curves for men's BMI.
In the sample of the young population between the ages of 18 and 23, despite the lower quality of the estimation of the statistical parameters of the source population, it was assumed that the great variance of the parameters measured in this population group would ensure that extreme values were represented, even though the sample was not random. Moreover, this type of sampling is not unusual in studies similar to this one; Van Buuren [23] describes a similar way of extending the sample beyond the age of 17, although with an extremely high rate of non-participation.
In the sensitivity analysis of the final model, we were able to confirm that the tables obtained are fairly robust against significant fluctuations and biases. Indeed, even assuming that 10% of the subjects had not been measured and that all of them were below the median, percentile 3 would not have differed by more than half a centimetre. Differences of similar magnitude could also have occurred in the other parameters analysed, as shown in the results section.
When we compare our results with those of other contemporary Spanish projects, there is very little difference in percentile 50 for height. Mean height for men in these studies (Bilbao, Enkid, Zaragoza and Barcelona) ranges from 176.3 to 177.7 cm, compared with 176.7 cm in our study. Women's height ranges from 162.1 to 165 cm, compared with 163.7 cm in our study. Nevertheless, in the intermediate age groups during growth there are important differences, which are probably related to growth during puberty ('Proyecto Crece' ['Grow Project'], unpublished data). For weight and BMI, the distribution departs from normality, with the curve skewed towards the high percentiles above the median. This situation is also reflected in the Spanish studies, given the increasing prevalence of weight problems in our population. These data, although they reflect the situation of the population, cannot be considered a reference for our children because they represent an overweight population. Instead, we should use as a reference the patterns that take into account the overweight and obesity cut-offs in adults (25 kg/m² and 30 kg/m²). Cole's study [24] was performed using samples from six countries and almost 100,000 subjects of each sex aged between 0 and 25. The objective was to establish the cut-offs for each age from the percentiles at which the values of 25 kg/m² (overweight) and 30 kg/m² (obesity) were situated at the age of 18. Other studies have compared the use of these cut-offs with percentiles 85 and 97 for each age group [25], finding no difference in the prevalence of overweight, unlike the case of obesity. When we used the cut-off values established by Cole as a diagnostic criterion, ponderal excess (overweight plus obesity) in the women in our study increased from 25.1% at the age of 4 to 40% at the age of 9, falling to 15.7% at the age of 18.
Maximum overweight was 27.4% in 10.5-year-olds, while maximum obesity was 14.5% in 7-year-olds. At 18, the corresponding figures were 13.4% and 2.3%, respectively.
Again using Cole's cut-off values, ponderal excess (overweight plus obesity) in the males in our study increased from 19.3% at the age of 4 to 38.8% at the age of 9, falling to 28.4% at the age of 18. The maximum level of overweight was 26.8% at the age of 12, while that of obesity was 14% at the age of 8. At 18, the corresponding figures were 22.6% and 5.9%, respectively.
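For reference, the adult cut-offs on which Cole's age-specific values are anchored can be applied as in the sketch below. It covers only the adult thresholds quoted in the text (25 and 30 kg/m²); the age- and sex-specific cut-offs for children come from Cole's published tables and are not reproduced here. The function names and the example subject are hypothetical.

```python
def bmi(weight_kg, height_m):
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def adult_category(bmi_value):
    """Classify an adult BMI using the 25 and 30 kg/m^2 cut-offs."""
    if bmi_value >= 30.0:
        return "obesity"
    if bmi_value >= 25.0:
        return "overweight"
    return "no ponderal excess"


# Hypothetical adults, all 1.75 m tall (illustration only)
categories = [adult_category(bmi(w, 1.75)) for w in (70.0, 85.0, 100.0)]
```

Ponderal excess, as used in the text, is the sum of the "overweight" and "obesity" categories.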
Our data agree with a study on the prevalence of obesity in Spain carried out by the Spanish Society for the Study of Obesity (Sociedad Española para el Estudio de la Obesidad, SEEDO), which analysed several studies from all over Spain and quantified ponderal excess at 29.5% in men and 19.1% in women [26]. The difference between sexes appears in all the studies, with women showing lower ponderal excess than men, probably owing to their greater concern for their physical appearance at these ages.
The differences found with respect to other Spanish studies are mostly methodological. This is the broadest and most extensive cross-sectional study, based entirely on a random sampling process, except for the adjustments applied to very small populations. Therefore, in our opinion, it is the most representative one in terms of the population studied (about 8 million inhabitants).
Limitations mostly affect the sample of 18- to 23-year-olds, which was collected in only two cities and included only university population. Although this method is used by other authors and this is the only really accessible population, it should be borne in mind that unpublished data from a study in Cordoba show that the average height of male university students exceeded that of the non-university population by 1.8 cm, and that of women by 0.9 cm, both differences being significant.
After this study, reference standards are still lacking for the Andalusian population under 3 years of age. This stage of life is particularly important given the vulnerability of the health of infants and young children. An adequate assessment of growth in height and weight is therefore an indicator of their health status, and even of the socio-economic development of the communities they live in.