Personnel
Data collection was carried out by two scholarship holders who received specific training in techniques for weighing and measuring the study sample. In all cases, measurements were taken by the same person.
Measurement error was estimated from three repeated measurements of five children, as the standard error of each child's mean measurement, giving a value of 0.2 cm.
Data were collected from February to December 2004 for the sample aged up to 18 years. For the broader sample, data were collected from January to May 2005, except in February, when there were no classes.
Parameters studied and technical equipment used
A portable Seca electronic scale, accurate to 100 g and with automatic reset to zero, was used for weighing.
A portable Holtain stadiometer (height rod), accurate to 0.1 cm, was used for measuring height. Before each measuring session it was calibrated with a rigid 65 cm rod.
The Body Mass Index (BMI) was calculated using the standard formula: BMI = weight (kg) / height (m)².
Measuring techniques
The pupils selected were asked to confirm that they gave their permission to be weighed and measured. They were informed that, if they preferred, measurement and weighing could be done without other pupils present, that they did not need to undress, that their data would not be read aloud, and that they would remain anonymous. The next class was not informed until all pupils from the previous class had been measured.
They were asked to take off their shoes and, if they were wearing one, their coat. A register was made of what each pupil was wearing, grouped into three categories: light, medium and heavy. These categories were identified size by size in a department store and weighed, so that the mean weight of the clothes for each age could be subtracted; between 300 g and 900 g was subtracted for each pupil. This approach has been used in similar studies to reduce the refusal rate, which is expected to be high among older students in particular, since it avoids refusals due to having to undress in the presence of peers.
For height, measured with the portable Holtain stadiometer, the children stood without shoes so that their heels, buttocks and scapulae were in contact with the vertical plane and their heads were positioned in the Frankfurt plane. With their ankles together, inner malleoli touching, and the soles of their feet firmly placed on the hard horizontal plane, the observer applied gentle upward pressure on the mastoid processes. In this position, the children were asked to breathe in deeply and the observer measured their height, pressing down the mobile headboard to minimize any error due to hair thickness.
A sheet was prepared for recording data on paper; these data were then transferred to electronic format, on a spreadsheet, within a maximum of 48–72 hours. Data were periodically reviewed to detect and correct any recording errors.
Statistical methods
Statistical methods for fitting reference curves
We used Cole's LMS method [21], which models the relationship between percentiles and age using a regression technique and assumes a normal distribution of the transformed variable.
This method is widely used, especially in Europe, for constructing age-dependent reference tables; moreover, implementations are available in standard statistical packages.
The method assumes that, in each age group, the anthropometric data can be fitted to a normal distribution after a suitable transformation that takes into account the degree of asymmetry (L), central tendency (M) and dispersion (S).
Using the original data, the following quantities are obtained for each time point t:
• L(t), the value of the Box-Cox transformation parameter λ required to normalize the variable at time t.
• M(t), the median of the original data at time t.
• S(t), the coefficient of variation of the original data at time t.
Once obtained for the different values of t within the time frame considered, these three quantities are smoothed as functions of age using penalized likelihood.
Using the formula below (which simply results from undoing the transformation applied to normalize the variable), the α percentile at time t is calculated as:
Cα(t) = M(t) (1 + L(t) S(t) zα)^(1/L(t))

where zα is the value of the standard normal distribution N(0,1) that leaves probability α to its left.
Cole's method thus models the skewness of the variable (through the transformation needed to convert the original variable into a normal one), its central position (through the median) and its dispersion (through the coefficient of variation of the data at time t); kurtosis is captured only indirectly. Once these coefficients have been modelled as functions of time, the desired percentiles are obtained.
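As an illustration, the percentile formula above can be computed directly once smoothed values of L(t), M(t) and S(t) are available for a given age. The following Python sketch is ours, not part of the original analysis (which used STATA and S-Plus); the example LMS values in the final comment are hypothetical.

```python
import math
from scipy.stats import norm

def lms_percentile(alpha, L, M, S):
    """Percentile C_alpha(t) from the smoothed LMS values at one age.

    alpha   -- percentile expressed as a probability (e.g. 0.97)
    L, M, S -- Box-Cox power, median and coefficient of variation
               at that age, as produced by the LMS fit
    """
    z = norm.ppf(alpha)              # z_alpha of the N(0,1) distribution
    if L == 0:
        return M * math.exp(S * z)   # limiting (log-normal) case
    return M * (1 + L * S * z) ** (1 / L)

# Hypothetical LMS values, for illustration only:
# lms_percentile(0.97, L=-1.2, M=17.0, S=0.11)  -> 97th percentile
```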
Fitting by penalized likelihood requires specific statistical software; the STATA 8.1 and S-Plus 6.0 packages were used.
Measuring the "goodness of fit"
The quality of the fit was assessed with the tests proposed by Royston [22], which evaluate whether the model residuals follow a normal distribution in terms of their mean, skewness and kurtosis. Royston proposes a series of tests, called Q-tests, that characterize certain properties of the model residuals: if a test yields a significant result, the residuals do not conform to the normal random variable and the model does not fit well either. The tests rest on the assumption that, if the model fits well, the residuals should be distributed as N(0,1) regardless of age.
The tests applied in the assessment of the goodness of fit were as follows:
• Q1 test: if the model fits well, the sum of the squared mean residuals in each group, weighted by group size, follows a χ² distribution with G − 1 degrees of freedom (d.f.), where G is the number of groups.
• Q2 test: if the model fits well, the sum of a function of the residual variances in each group follows a χ² distribution with G − 1 d.f.
• Q3 test: if the model fits well, the sum of the squared test statistics from D'Agostino's normality test for skewness, computed in each group of residuals, follows a χ² distribution with G d.f.
• Q4 test: if the model fits well, the sum of a function of the significance levels (P values) of the Shapiro-Wilk normality test, a combined test for skewness and kurtosis, computed in each group of residuals, follows a χ² distribution with 2G d.f.
Each test responds to a different departure from normality that the model may produce. The first concerns differences between the residual means and the mean of the distribution they should follow, namely N(0,1); the second relates to residual variances that are too high or too low, which would indicate distributions that are too peaked or too flat and therefore non-normal; the third characterizes the skewness of the residual distributions, one form of non-normality; and the fourth refers simultaneously to skewness and kurtosis, two hallmarks of non-normality. The latter is the most likely to be significant in this study, given that Cole's method does not directly model the kurtosis of the underlying distribution.
Royston suggests declaring lack of fit when any of the tests is significant at the 5% level, a criterion that rejects the fitted model outright; given the large number of tests performed, this can easily occur simply through the accumulation of type I errors.
In our case, these tests were used to determine the number of equivalent degrees of freedom (edf) needed for each model fit (Cole's LMS method), and we kept those models for which all tests had P values above 10%. When significant results were obtained in some tests, the original observations were examined for extreme data that might be heavily contaminating the model. The type of analysis applied is described in the next section.
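To make the logic of these tests concrete, the Python sketch below implements a Q1-style statistic; it illustrates the idea of the test rather than reproducing Royston's exact implementation, and the function name is our own.

```python
import numpy as np
from scipy.stats import chi2

def q1_style_test(z, group):
    """Q1-style check on model residuals.

    z     -- residual z-scores from the fitted model, expected to be
             N(0,1) at every age if the model fits well
    group -- age-group label for each z-score

    Sums the squared group means weighted by group size and compares
    the result with a chi-square distribution with G - 1 d.f.
    """
    z, group = np.asarray(z), np.asarray(group)
    labels = np.unique(group)
    stat = sum((group == g).sum() * z[group == g].mean() ** 2
               for g in labels)
    dof = len(labels) - 1
    return stat, chi2.sf(stat, dof)   # a small P value signals poor fit
```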
Method for detecting extreme and/or influential data
In the present case, the detection of extreme data must be approached carefully. Since reference curves that depend on age aim precisely to characterize extreme values (high or low), eliminating extreme data from the sample could visibly distort the curves, "pruning" the distribution of values that are valuable for determining percentiles. Nevertheless, in projects of this kind an analysis of extreme data cannot be omitted, since there is a real risk of working with data which, almost certainly, do not belong to the target population.
There is a classic rule for detecting extreme data, which labels a data point x as extreme in either of the following situations (where Q(25) is the 25th percentile of the sample, Q(75) the 75th percentile, and IQR the interquartile range):
x < Q(25) - 1.5 × IQR
x > Q(75) + 1.5 × IQR
We used, however, a modification to this rule, which makes it much more conservative:
x < Q(25) - 4 × IQR
x > Q(75) + 4 × IQR
With this modification we flagged only unusually extreme data, and thus avoided "pruning" the distribution of values that legitimately shape the tables.
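A minimal sketch of the conservative rule could look as follows in Python; the function name and the explicit per-age-group application are our own framing.

```python
import numpy as np

def flag_extreme(x, k=4.0):
    """Flag values outside Q(25) - k*IQR .. Q(75) + k*IQR.

    k = 1.5 gives the classic rule; k = 4.0 is the conservative
    modification used here, which flags only unusually extreme data.
    Intended to be applied separately within each age group.
    """
    x = np.asarray(x)
    q25, q75 = np.percentile(x, [25, 75])
    iqr = q75 - q25
    return (x < q25 - k * iqr) | (x > q75 + k * iqr)
```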
Data are considered influential if, when removed from the sample, they produce a substantial change in the fitted model, giving rise to a profoundly different one. Traditionally, influential data have been identified with extreme or very extreme data, although this is not always the case. A conservative but reasonable option is to look for influential data only among the very extreme values, so as not to compromise the areas of the distribution closest to the centre.
Given the previous considerations, the process for detecting and, when necessary, eliminating data was as follows:
1) Extreme values were labelled for each age, using the rule explained above.
2) With the labelled extreme data included, the model was fitted and the tests were evaluated for significance. If none was significant, the most extreme data were deemed not influential and the process stopped, retaining all sample data. When a test was significant, the most extreme data point was removed and the process repeated until the goodness-of-fit tests were no longer significant. The iterative process always acted on the most extreme of the remaining data.
We emphasize that this is a very conservative process, as required by the type of study we are carrying out, and, as the results will show, it led us to reject only a very small number of observations.
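The iterative process can be summarised as the following Python sketch; fit_lms and q_test_pvalues are hypothetical stand-ins for the LMS fit and Royston's Q-tests, passed in as callables.

```python
def trim_until_fit(data, extremes, fit_lms, q_test_pvalues, level=0.10):
    """Sketch of the iterative trimming process described above.

    data           -- list of observations
    extremes       -- values flagged by the IQR rule, ordered from
                      most to least extreme
    fit_lms        -- callable fitting the LMS model (hypothetical)
    q_test_pvalues -- callable returning the Q-test P values of a
                      fitted model (hypothetical)
    """
    removed = []
    while True:
        model = fit_lms(data)
        # stop when no Q-test is significant at the chosen level
        if all(p > level for p in q_test_pvalues(model)):
            return model, removed
        if not extremes:
            return model, removed       # nothing flagged left to drop
        worst = extremes.pop(0)         # always the most extreme value
        data = [d for d in data if d is not worst]
        removed.append(worst)
```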
Analysis of the model's sensitivity to missing data from absent subjects
The construction of height and weight tables assumes that the individuals included in the study do not suffer from chronic illnesses that could significantly affect their growth or weight gain. A screening was performed to avoid including such boys and girls, although it was probably not very rigorous; this is taken into account and evaluated, as appropriate, in the final assessment.
Pupils who were absent on the day data were collected were replaced by others. As the number of pupils taken per class was small (between 4 and 6 in the most extreme case), the volume of substituted data was not large, so the effect of these substitutions is hard to assess and is not considered very relevant.
On the other hand, if absence from class was due to illness, this could introduce a strong bias in the results. A 2003 study provided by the provincial Education Delegations put the mean daily absenteeism rate in Andalusian state schools at 12.8%. It could be argued that a large proportion of these absences were due to illness, and that children with chronic illnesses affecting their growth are more likely to be found among the ill. In that case, the sample would bias the tables towards values that are too "high".
A sensitivity analysis of the final models was therefore carried out using the least favourable scenario. Based on the daily absenteeism rate of 12.8%, it was assumed that 10% of subjects lay below the median for the variable of interest. Calculations were made for this theoretical situation and the results were compared with those actually obtained. If, in this scenario, the curves did not change appreciably, they could be considered unaffected by the bias generated by measurements missed through illness.
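One simple way to realise this worst-case scenario computationally is sketched below; the text does not specify the exact mechanism, so drawing the extra 10% from below the observed median of each age group is an assumption of ours.

```python
import numpy as np

def worst_case_sample(x, extra_fraction=0.10, seed=0):
    """Augment one age group with the assumed fraction of subjects
    absent through illness, drawn from below the observed median
    (an illustrative choice; the original mechanism is unspecified).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    below = x[x < np.median(x)]
    n_extra = int(round(extra_fraction * len(x)))
    return np.concatenate([x, rng.choice(below, size=n_extra)])

# The curves would then be refitted on the augmented sample and the
# resulting percentiles compared with those actually obtained.
```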
All these considerations are valid for the sample of 3- to 18-year-olds. The other sample is subject to higher rates of missing measurements and to various possible biases. Such biases could strongly affect two characteristics of the distributions of the variables: the range of values obtained, and a shift towards higher values in some variables and lower values in others. In the first case, if some of the extreme values were missing, the resulting tables would be too "narrow", something that can be checked by comparison with tables from similar populations. In the second case, the biases would partly cancel each other out but would be difficult to demonstrate or measure. For this reason, the same criterion of 10% illness below the median was applied to this sample. For variables where the bias acts in the opposite direction (weight, and therefore BMI), it was not modelled, so the estimated figures would in that case be underestimates.