What drives regional differences in Body Mass Index? Evidence from Spain


Introduction
The rapid increase of overweight and obesity around the globe has raised concerns both from a health perspective and from an economic point of view, as excess weight represents a high risk factor for several chronic diseases such as cardiovascular disease (CVD), stroke, hypertension, diabetes, dyslipidaemia and some cancers (Malnick and Knobler, 2006). Overweight generates negative effects on labour market performance (e.g., Cawley, 2004; Morris, 2006; Lindeboom et al., 2010; Kinge, 2016) and, directly and indirectly, increases health care expenditure (e.g., Finkelstein et al., 2009; Tremmel et al., 2017).
Spain is one of the countries that have experienced a marked upward trend in the prevalence of overweight and obesity during the last decades. As a response to this alarming trend, the Spanish Ministry of Health developed a national strategy for the prevention and control of obesity in 2005 called the NAOS strategy. 1 The strategy included informative campaigns, agreements with public and private institutions, voluntary working agreements, educational programs and support for health promotion initiatives (Ballesteros-Arribas et al., 2007). However, due to its nature as a (mainly) school-based initiative, sectors including the built environment, transportation and the food environment received less attention (Franco et al., 2010). This could explain the fact that according to the OECD (2014) adult obesity rates in Spain were higher than the OECD average, while based on the latest data, roughly 18% of adults are obese and 55% are overweight (including obese) in Spain (INE, 2019), doubling the rates shown at the end of the 1990s.
Notwithstanding, strong regional discrepancies in excess body weight exist within the country, with the residents of some regions exhibiting much higher average BMI than others (Gutiérrez-Fisac et al., 1999; Valdes et al., 2014; Raftopoulou, 2017).
Geographical disparities in health outcomes have been observed in other countries as well. For example, Ellis and Fry (2010) consider several health indicators, including life expectancy, childhood obesity, cancer deaths, smoking and alcohol consumption to document the existence of a divide between northern and southern regions of the UK, in favour of the latter. This result is also confirmed by Hacking et al. (2011), showing a northern excess in all-cause mortality that remained substantial and persistent over the four decades from 1965 to 2008 in England, affecting relatively more males than females.
Investigating the existence and magnitude of regional differentials in BMI and analysing the underlying determinants of such health disparities could be especially relevant for public health policy-makers, in their intent to meet the WHO target of halting the rise of obesity to its 2010 level by 2025 (Global Action Plan, WHO 2017). In addition, in contexts where the NHS is decentralized and health competences are primarily the responsibility of the country's regions as in the case of Spain, local decision-makers need to have evidence on health indicators at the regional level. Therefore, the ultimate goal of this work is to produce evidence regarding the drivers of regional disparities in BMI for the Spanish case. Indeed, highlighting the factors that contribute to differences in BMI across regions could be helpful for a better understanding of the mechanisms behind the regional gaps in productivity, wages and overall economic performance that are observed in Spain as well as in other countries (OECD, 2018).
In this paper, we decompose regional differentials in BMI between northern and southern Spanish regions. 2 First, we decompose the observed average gap into the part attributed to differences in observable determinants of BMI (i.e. the endowments) and the part that is due to differences in the return to observable characteristics, using the classical Oaxaca-Blinder (OB) decomposition. Second, since important differences in BMI occur away from the average, we proceed with a distributional analysis by applying the Recentered Influence Function (RIF) regression and the corresponding decomposition (Firpo et al., 2009; Fortin et al., 2011). The RIF regression provides evidence along the unconditional distribution of BMI, which is especially important for the design of effective health and food policies to combat the obesity epidemic. Indeed, policy-makers are interested in targeting policies to individuals who are (unconditionally) either underweight or obese, rather than those who appear in the two tails of the conditional distribution of BMI (i.e. those who are obese or underweight given their characteristics). This way, we are able to observe what happens at every part of the distribution and subsequently draw conclusions for the more interesting tails, where relationships might vary: the upper one (obesity, severe obesity) 3 and the lower one (underweight). The analysis is carried out separately by gender, as the underlying mechanisms that affect BMI and health outcomes in general appear to be different for women and men (OECD, 2010).
The decomposition approach has been widely used to analyse regional wage gaps. Pereira and Galego (2011) investigated regional differences in wages for Portugal using both the OB decomposition and the Juhn, Murphy and Pierce (JMP) decomposition, which allows for the analysis along the outcome's distribution (Juhn et al., 1993). In subsequent papers, the same authors applied the Machado and Mata (2005, MM) decomposition method  and the RIF-regression decomposition to analyse the drivers of regional differentials along the conditional and the unconditional distribution  respectively. For the Spanish case, Motellón et al. (2011) and López-Bazo and Motellón (2012) implemented, respectively, the JMP and the OB approaches to study distributional and average differences in hourly wages across Spanish regions.
More recently, Murillo-Huertas et al. (2020) also analysed the regional wage gap in Spain, but implementing the more powerful decomposition technique based on RIF-regressions that provides evidence along the unconditional distribution of wages and enables a detailed decomposition (besides other desirable properties explained in section 3).
Finally, the RIF-regression decomposition has also been used by Herrera-Idárraga et al. (2016) to investigate the role of informality in explaining regional wage disparities along the unconditional distribution for the case of Colombia.
The application of decomposition techniques to explain differences in BMI is more limited. Indeed, only a few papers have applied decomposition methods to investigate geographical differences in BMI. 4 Specifically, Costa-Font et al. (2009) applied the MM decomposition to gauge the drivers of cross-country disparities in BMI at different quantiles among Mediterranean countries, while Costa-Font et al. (2010) implemented an extension of the OB decomposition to a non-linear model to analyse differences in overweight and obesity between Italy and Spain. More recently, Dutton and McLaren (2016) used the RIF-regression decomposition to examine the importance of individual-level characteristics for explaining geographic variation in BMI distributions for the case of Canada. However, they only focused on aggregated differences, without disentangling the specific contribution of each regressor. With this paper, we are the first to apply the OB and RIF-regression decompositions to analyse average and distributional regional differences in BMI for a European country. Moreover, we also add to the literature by decomposing regional differences in BMI along its unconditional distribution into the contribution of endowments and returns to each observable characteristic separately, through the detailed decomposition. This places our study a step ahead of previous evidence, since our analysis highlights the relevance of specific key factors for the design of interventions targeting overweight and obese individuals (such as age, education or lifestyle and food habits) in accounting for the regional disparities in BMI.
3 The WHO defines obesity as a BMI ≥ 30 kg/m², while severe obesity corresponds to BMI ≥ 40 kg/m².
Nevertheless, the reader must bear in mind that the evidence obtained from decomposition techniques is useful to understand the contribution of endowments and returns of each specific factor in "accounting" for the differences in BMI across regions, but should not be directly interpreted in terms of the underlying causal mechanisms (Fortin et al., 2011). However, the results from decomposition analysis are still relevant to draw policy implications, especially when the analysis highlights the prevalent relevance of some specific factor. This would motivate a deeper analysis of the impact of that factor using the appropriate techniques to identify the corresponding causal relationship.
Our findings indicate that the South to North gap in BMI is mostly driven by women, whereas it is smaller and not statistically significant for men (0.546 points, z-stat 2.96 for females relative to 0.157 points, z-stat 0.99 for males). Around 74% of the cross-regional gap in BMI among women is accounted for by differences in observable characteristics.
More specifically, women residing in the South have lower education and income levels.
The distributional analysis reveals that the South to North gap in BMI for Spanish women tends to increase over its unconditional distribution, with observable factors (especially schooling) making a growing contribution in explaining the differential across the quantiles of BMI. Overall, differences in the endowment of human capital represent the main factor behind the north-south gap in BMI and obesity in Spain.

Data and Descriptive Statistics
This paper draws on data from the 2014 wave of the Spanish version of the European Health Interview Survey (EHIS), which covers the population aged 15 or more and contains several sociodemographic and health-related variables. Moreover, the Spanish data of the EHIS survey are representative at the regional level (NUTS2), which enables examining regional disparities in BMI and their determinants. The original sample contains 22,842 observations. We keep only native Spaniards aged 18-65 at the time of the survey 5 with valid information on the relevant variables. 6 We also discard observations from the Balearic and Canary Islands, as well as the autonomous cities of Ceuta and Melilla. 7 As standard, body mass index or BMI is calculated as weight in kilograms divided by the square of height in meters (kg/m²). These anthropometric measurements are based on self-reported information. Notwithstanding, we will assess the extent to which our benchmark findings are affected by the (potential) bias in BMI due to the habitual misreporting of weight and height in self-reported survey data (e.g., Boström and Diderichsen, 1997; Kuczmarski et al., 2001; Gil and Mora, 2011).
5 The results are unaffected by the inclusion of migrants in the sample and controls for being born abroad, having a foreign nationality and years since migration (quadratic). These results are available upon request. Old-age individuals are disregarded to reduce the bias arising through larger mortality among the more obese as well as the measurement error affecting self-declared weight and height (and hence BMI) which tends to rise with age (Gil and Mora, 2011).
6 The exception here is the variable family income, which is missing for a non-trivial proportion of the sample (roughly 20%). We examine the results obtained using imputation methods in the robustness checks section.
7 Ceuta and Melilla were excluded because these regions represent Spanish enclaves located in northern Africa. We also removed observations from the Balearic and Canary Islands, since individuals residing in these Spanish islands might differ with respect to mainland inhabitants along many (unobservable) dimensions that are likely to be correlated with health outcomes.
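As an illustration of the BMI definition used above, the following minimal Python sketch computes BMI from weight and height and maps it to weight-status classes. The function names are hypothetical. The 30 and 40 kg/m² cut-offs follow the WHO thresholds quoted in this paper; the 18.5 and 25 kg/m² cut-offs are the standard WHO adult thresholds and are assumed here, since the text does not state them.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by squared height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_category(bmi_value: float) -> str:
    """Map a BMI value to a weight-status class (illustrative helper)."""
    if bmi_value < 18.5:   # assumed standard WHO cut-off for underweight
        return "underweight"
    if bmi_value < 25:     # assumed standard WHO cut-off for overweight
        return "normal"
    if bmi_value < 30:     # obesity threshold quoted in the paper
        return "overweight"
    if bmi_value < 40:     # severe obesity threshold quoted in the paper
        return "obese"
    return "severely obese"


# Example with made-up self-reported values (not survey data).
example_bmi = bmi(70.0, 1.75)
example_class = who_category(example_bmi)
```

Because the survey values are self-reported, any systematic under-reporting of weight or over-reporting of height enters this ratio directly, which is why the robustness section revisits the calculation with adjusted measures.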
Mostly following the existing literature on BMI, we divide the conditioning factors into three main groups, namely 1) sociodemographic variables, 2) socioeconomic status (SES), and 3) lifestyle variables (see Appendix: Table A). Specifically, we consider several dummies for age cohorts, the number of children in the household and a dummy for being married for the first group of controls. For the second group we proxy socioeconomic status with years of schooling, net family income in intervals and a dummy variable for being employed. Since both lifestyles and food habits have been identified as key obesity-risk factors in the literature, we also include indicators for sedentary behaviour at work, physical activity during leisure time, daily smoking, alcohol consumption and consumption of meat, fruits, vegetables and legumes as our last group of controls. 8 Figure 1 exhibits average BMI by Autonomous Community (AC) for the pooled sample and by gender, where it is evident that geographical differences in BMI are much more pronounced for women.
Since our aim consists in decomposing the BMI gap between northern and southern Spanish regions, we divided Spain into three groups. The group named "South" consists of the regions or Autonomous Communities of Andalusia, Extremadura and Murcia and the second group, named "North", comprises Asturias, Cantabria, Galicia, Navarra, the Basque Country and Rioja. The remaining continental Spanish regions are considered to form part of the centre of the country and are excluded from our empirical analysis.
Notwithstanding, the results obtained under other groupings of regions will be analysed in the robustness checks section. Table 1 shows the resulting two groups of regions, with the corresponding observations contained in the estimation sample and some basic descriptive statistics for BMI. We report a statistically significant difference of 0.36 units in mean BMI between the South (26.04 kg/m²) and the North (25.68 kg/m²).
[Figure 1 and Table 1 around here]
8 Specifically, we consider the consumption of fruits, vegetables and legumes (meat) of 4 to 6 times per week or higher intakes (less than once per week or never) as high (low) frequency of consumption.
Tables 2 and 3 report the sample means of the BMI indicator and its determinants differentiating by regional group, for women and men respectively. As can be appreciated, there are substantial differences in the endowment of characteristics between the two groups of regions, which are generally statistically significant and more pronounced for women.

Descriptive statistics
Specifically, in Table 2, we document a large and significant difference in mean weight of around 1.61 kg (average height is roughly the same) between women in the South and the North. As a result, the South to North BMI gap amounts to a significant 0.55 kg/m² (0.12 standard deviations). In terms of household composition, a higher proportion of females in the South are married compared to those living in the North.
Interestingly, the data show the existence of a large and significant difference in years of schooling, with females residing in the North having almost 1.5 extra years of schooling (11.99 vs 10.48). Similarly, noticeable differences to the detriment of females living in the South exist regarding income and working status endowments. With respect to lifestyle characteristics, women in the South are less likely to work in a sedentary job compared to their counterparts in the North, are more likely to smoke on a daily basis and drink less alcohol per week. In terms of food habits, women in the South tend to consume less red meat (26% vs 36%) and less fruit. Differences in the consumption of vegetables and legumes among women are not statistically significant between the two groups of regions. Table 3 exhibits the same descriptive statistics for males. Interestingly, we find no significant difference in BMI across the two areas. Differences in endowments between the South and the North are also less marked for males.

Average BMI differentials between groups of regions.
The starting point of our empirical methodology is a simple OLS regression 9 that explains BMI as a function of a vector of control variables ($X_i$, including a constant) divided into the three main groups we mentioned before, namely 1) sociodemographic variables, 2) SES, and 3) lifestyle variables. We estimate the equation separately for Southern and Northern regions, that is,

$$BMI_i^{g} = X_i^{g}\beta^{g} + \varepsilon_i^{g}, \qquad g \in \{S, N\} \tag{1}$$

where the superscripts S and N indicate that the corresponding estimates are allowed to be different for South and North, respectively. Next, with the aim of appreciating the contribution of the covariates to the observed BMI disparities between the groups of regions, we utilize the Oaxaca-Blinder (OB) decomposition (Oaxaca, 1973; Blinder, 1973). This widely used decomposition method disentangles average outcome differentials into the contribution of the (average) endowment of observable characteristics (i.e. the explained or composition component) and the contribution of unexplained factors or structure effect (which is captured by differences in the estimated coefficients). Furthermore, as suggested by Fortin (2008) and Fortin et al. (2011), we estimate the non-discriminatory reference BMI structure from a pooled regression with all the selected regions together, imposing an identification restriction that ensures that the BMI advantage of one group of regions equals the disadvantage suffered by the other group, that is:

$$BMI_i = X_i\beta + \gamma_N N_i + \gamma_S S_i + \varepsilon_i, \qquad \gamma_N + \gamma_S = 0 \tag{2}$$

Equation (2) is estimated using the pooled sample, and contains indicators for belonging to the North or South (N = 1 if North, S = 1 if South). The estimated vector of β coefficients thus represents the non-discriminatory BMI structure that is used in the decomposition. From the estimates of equation (2) we decompose the raw BMI differentials between the groups of regions into different components as follows:

$$\overline{BMI}^{S} - \overline{BMI}^{N} = (\bar{X}^{S} - \bar{X}^{N})\hat{\beta} + \left[\bar{X}^{S}(\hat{\beta}^{S} - \hat{\beta}) + \bar{X}^{N}(\hat{\beta} - \hat{\beta}^{N})\right] \tag{3}$$

The term $(\bar{X}^{S} - \bar{X}^{N})\hat{\beta}$ represents the composition effect (i.e. the share of the average BMI gap due to differences in observable characteristics), whereas the term $\bar{X}^{S}(\hat{\beta}^{S} - \hat{\beta}) + \bar{X}^{N}(\hat{\beta} - \hat{\beta}^{N})$ corresponds to the part of the mean BMI differential that can be attributed to different coefficients or returns to observable characteristics across regions (including the intercept). 10
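The mean decomposition with a pooled reference structure can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the estimation code used in the paper: the function names are hypothetical, the data are synthetic, and the restriction that the two group effects sum to zero is imposed by entering a single regressor coded +1 for the South and -1 for the North.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients via least squares; X must already contain a constant."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def add_const(X):
    """Prepend a column of ones to a covariate matrix."""
    return np.column_stack([np.ones(len(X)), X])


def ob_decompose(X_s, y_s, X_n, y_n):
    """Decompose mean(y_s) - mean(y_n) into explained and unexplained parts.

    X_s, X_n are covariate matrices WITHOUT a constant column.
    """
    Xs, Xn = add_const(X_s), add_const(X_n)
    # Pooled regression with d = +1 (South), -1 (North), which imposes
    # gamma_S + gamma_N = 0 on the two group effects.
    d = np.concatenate([np.ones(len(Xs)), -np.ones(len(Xn))])
    X_pool = np.column_stack([np.vstack([Xs, Xn]), d])
    beta_ref = ols(X_pool, np.concatenate([y_s, y_n]))[:-1]  # drop gamma
    beta_s, beta_n = ols(Xs, y_s), ols(Xn, y_n)
    xbar_s, xbar_n = Xs.mean(axis=0), Xn.mean(axis=0)
    explained_detail = (xbar_s - xbar_n) * beta_ref
    unexplained = xbar_s @ (beta_s - beta_ref) + xbar_n @ (beta_ref - beta_n)
    return {
        "gap": y_s.mean() - y_n.mean(),
        "explained": explained_detail.sum(),
        "unexplained": unexplained,
        "explained_detail": explained_detail[1:],  # one entry per covariate
    }


# Synthetic illustration (NOT the EHIS data): one covariate, years of schooling.
rng = np.random.default_rng(0)
school_s = rng.normal(10.5, 3.0, size=(2000, 1))
school_n = rng.normal(12.0, 3.0, size=(2000, 1))
bmi_s = 30.0 - 0.3 * school_s[:, 0] + rng.normal(0.0, 4.0, 2000)
bmi_n = 30.0 - 0.3 * school_n[:, 0] + rng.normal(0.0, 4.0, 2000)
res = ob_decompose(school_s, bmi_s, school_n, bmi_n)
```

By construction the explained and unexplained components add up to the raw gap, and `explained_detail` gives the covariate-by-covariate contributions that a detailed decomposition reports.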

Distributional BMI differentials
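One limitation of the OB decomposition is that it provides evidence about average BMI differences across the groups of regions, whereas by focusing only on average gaps one may miss important differences at other points of the BMI distribution (especially at the top, corresponding to obesity and severe/morbid obesity categories). Therefore, we investigate distributional BMI differences by means of the Unconditional Quantile Regression approach based on the Recentered Influence Function (RIF) (Firpo et al., 2009). For the τth quantile $q_\tau$ of BMI, the RIF is defined as

$$RIF(BMI; q_\tau) = q_\tau + \frac{\tau - I(BMI \le q_\tau)}{f_{BMI}(q_\tau)} \tag{4}$$

where $f_{BMI}(q_\tau)$ is the unconditional density of BMI evaluated at the τth quantile and I(·) an indicator function. By replacing the unknown elements of equation (4) by their sample estimators it is possible to obtain an estimate of the RIF, which is,

$$\widehat{RIF}(BMI; \hat{q}_\tau) = \hat{q}_\tau + \frac{\tau - I(BMI \le \hat{q}_\tau)}{\hat{f}_{BMI}(\hat{q}_\tau)} \tag{5}$$

where $\hat{f}_{BMI}(\hat{q}_\tau)$ corresponds to a Kernel density estimator of the unconditional density. Because the expected value of the RIF equals the quantile itself, an OLS regression of the estimated RIF on the covariates can be decomposed in the same way as a regression for the mean. In other words, it is possible to examine the contribution of both the endowment of observable characteristics and the returns to these characteristics, in explaining the estimated unconditional BMI gap across groups of regions, applying the decomposition for average outcomes described by equation (3) to the RIF, that is:

$$\hat{q}_\tau^{S} - \hat{q}_\tau^{N} = (\bar{X}^{S} - \bar{X}^{N})\hat{\beta}_\tau + \left[\bar{X}^{S}(\hat{\beta}_\tau^{S} - \hat{\beta}_\tau) + \bar{X}^{N}(\hat{\beta}_\tau - \hat{\beta}_\tau^{N})\right] \tag{6}$$

Here $\hat{\beta}_\tau^{S}$ and $\hat{\beta}_\tau^{N}$ are the coefficients of the RIF regressions estimated separately for the South and the North, and $\hat{\beta}_\tau$ corresponds to the non-discriminatory BMI structure (estimated from a pooled RIF regression) at quantile τ, estimated in a similar fashion as equation (2). Similar to equation (3), the term $(\bar{X}^{S} - \bar{X}^{N})\hat{\beta}_\tau$ represents the composition effect and the term $\bar{X}^{S}(\hat{\beta}_\tau^{S} - \hat{\beta}_\tau) + \bar{X}^{N}(\hat{\beta}_\tau - \hat{\beta}_\tau^{N})$ captures the unexplained component of the BMI differential evaluated at the τ-quantile of the unconditional distribution of BMI. There are several advantages of this method. Its computational cost is minimal and it provides path-independent detailed decompositions of both components.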
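To make the RIF step concrete, the sketch below computes the estimated RIF for a given quantile and regresses it on covariates by OLS. It is a minimal illustration under assumptions the text does not specify: a Gaussian kernel with a rule-of-thumb bandwidth is used for the density estimate, the function names are hypothetical, and the data are synthetic.

```python
import numpy as np


def kernel_density_at(y, point):
    """Gaussian kernel density estimate of y evaluated at a single point.

    The rule-of-thumb bandwidth is an assumption of this sketch.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    iqr = np.subtract(*np.percentile(y, [75, 25]))
    h = 0.9 * min(y.std(ddof=1), iqr / 1.34) * n ** (-0.2)
    u = (point - y) / h
    return np.exp(-0.5 * u ** 2).sum() / (n * h * np.sqrt(2 * np.pi))


def rif_quantile(y, tau):
    """Estimated recentered influence function of the tau-th quantile of y."""
    y = np.asarray(y, dtype=float)
    q = np.quantile(y, tau)          # sample quantile
    f = kernel_density_at(y, q)      # density estimate at the quantile
    return q + (tau - (y <= q)) / f  # one RIF value per observation


# Synthetic illustration (NOT the EHIS data).
rng = np.random.default_rng(0)
schooling = rng.normal(11.0, 3.0, 1000)
bmi = 30.0 - 0.3 * schooling + rng.normal(0.0, 4.0, 1000)

rif = rif_quantile(bmi, 0.9)
X = np.column_stack([np.ones(schooling.size), schooling])
# RIF regression: OLS of the RIF values on the covariates.
beta_tau = np.linalg.lstsq(X, rif, rcond=None)[0]
```

The sample mean of the RIF values is close to the sample quantile, so replacing BMI with its RIF in the mean decomposition yields a decomposition of the quantile gap. In a two-group application the RIF would be computed within each group of regions before running the group-specific and pooled regressions.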

OB decomposition results
Tables 4 and 5 present the aggregated and detailed OB decomposition results respectively, differentiating by gender. The decomposition analysis shown in Table 4 evidences that up to 74% of the South to North gap in mean BMI among women is accounted for by differences in observable characteristics. Moving on to the detailed decomposition (see Table 5), we identify differentials in average years of schooling as by far the single most important contributor in explaining the greater mean BMI level for Southern women. Differences in income are also a relevant factor in the explained part to the detriment of women in the South, but with a more modest contribution. In contrast, healthy (unhealthy) lifestyles such as low consumption of meat (daily smoking), as well as the number of children in the household, are in favour of women living in the South (though their contribution is low). As shown in Table 5, as a whole, the unexplained part or returns to certain characteristics, which accounts for 26% of the total gap for females, is not statistically significant. The OB decomposition analysis suggests that the average BMI differential across regions for males is small (0.157 BMI points) and insignificant. The contribution of explained factors is also insignificant for males, since the more advantaged endowment of SES variables for those residing in the north of the country tends to be offset by their less favourable distribution of sociodemographic and lifestyle variables (relative to men residing in southern regions).
In what follows, we move a step ahead from the simple decomposition of average differentials and by means of RIF-regressions we disentangle the factors behind the North-South gap for males and females over the entire unconditional distribution of BMI.
[Tables 4 and 5 around here]

RIF decomposition results
Figure 2 presents the aggregated RIF decomposition results separately for women and men at the different deciles of the unconditional distribution of BMI. Since we obtained no evidence of significant regional gaps at any point of the BMI distribution for men, we show the tables of the aggregated and the detailed RIF-decomposition results only for women (Table 6 and Table C of the Appendix 11 respectively). As shown in Table 6, BMI differences in women between the two sets of regions appear to be quite stable from Q2 to Q8 (except Q7) since the increasing contribution of the explained part tends to be compensated by the decreasing contribution of unexplained factors. Interestingly, the data also reveal that the explained (unexplained) portion of the gap steadily increases (decreases) over the quantiles, suggesting that tackling the obesity epidemic among overweight women requires focusing on regional disparities in endowments. Note that the contribution of differences in observable characteristics is always statistically significant and reaches its highest values at the 8th and 9th deciles, which correspond to high levels of overweight or pre-obesity among women.
[Figure 2 and Table 6 around here]

Robustness checks
In this section we consider the robustness of the previous findings with respect to three main issues that might affect our estimations, namely 1) the presence of missing values in the family income variable, 2) the potential bias in BMI due to the self-reported nature of the variables height and weight in the EHIS survey and 3) the grouping of regions that we adopted in this work.
Regarding the first issue, so far we have assumed that the relatively high proportion (around 20%) of missing values in net family income is missing at random, and the corresponding observations were excluded. In order to deal with the potential selectivity bias due to the non-randomness of non-reporting in the household income variable, which is reported in intervals, we repeat the estimation including all the observations, plus an additional dummy variable for missing family income. Moreover, we also replace missing values in income categories with predicted values obtained from an ordered probit model based on demographics, SES and other information of the head of the household and spouse (Allison, 2001). 12 As can be appreciated in columns (1) and (2) of Table E in the Appendix, the overall results from the OB decomposition are mostly unaffected by imputing missing values of family income using the two selected techniques. We only observe a small decrease in the contribution of endowments for females.
12 We also ran the OLS regressions for BMI (by groups of regions and gender) after replacing missing values of family income using Multiple Imputation techniques based on either ordinal models or interval regression (given the nature of the family income variable). The results, available upon request, are virtually the same as those shown in Table B of the Appendix, in terms of both the magnitude and significance of the dummies for family income.
Second, we adjust self-declared weight, height and the subsequent computation of BMI to deal with the misreporting of such information by adopting the procedure proposed by Gil and Mora (2011). Also under this alternative scenario, the results from the decomposition of average BMI differentials are virtually unaffected, as shown in column (3) of Table E.
Third, we analyse the results obtained under alternative groupings of Spain's regions. In column (4) of Table E we adopt the Eurostat NUTS-2 classification rather than our ad hoc classification. This implies adding the region of Aragon to the group of northern regions and excluding Extremadura from the group of southern regions. Once again, the overall results from the OB decomposition are unaffected by the alternative grouping of regions. Overall, the sensitivity checks presented in this section point to the robustness of our results.

Conclusion
This paper investigates the conditioning factors behind the North-South BMI divide in Spain. We use decomposition analysis that enables us to disentangle the contribution of each covariate and the corresponding coefficients to this difference. Starting with the OB decomposition, we reveal that the average gap in BMI between the South and North of Spain is mostly driven by differences in characteristics between women residing in the two areas of the country. A large and significant part of this regional average gap in BMI (74%) is due to differences in endowments related to SES (basically years of education), whereas differences in returns to such characteristics play a minor and insignificant role in accounting for the observed BMI differential. Indeed, in view of the epidemic of obesity as a global public health concern, policy-makers are mostly interested in designing effective policies against overweight and obesity. Hence, we proceed with the distributional analysis and the corresponding decomposition, since the findings at the upper tail of the BMI distribution are the ones actually capturing overweight and obesity problems. Interestingly, we find that differences in SES endowments and particularly schooling, to the detriment of women residing in the South, again explain a very significant part of the North-South differential at the top of the BMI distribution.
Moreover, the relative importance of observable socioeconomic factors increases along the quantiles, mostly driven by schooling. This suggests that differences in educational attainment contribute substantially to explaining the higher prevalence of obesity among women in the South. Notice that these findings prove to be quite robust to alternative scenarios dealing with missing information, BMI bias and regional grouping.
Therefore, we show that a significant part of the cross-regional BMI gap can be mitigated by implementing regional policies focused on improving human capital. One alternative could be raising the amount of compulsory schooling, since this exogenous increase in the amount of human capital has been found to be an effective policy to reduce BMI (Brunello et al., 2013). Moreover, the increase in compulsory education might reduce BMI and obesity indirectly, due to its effect on health-related behaviours and habits (Brunello et al., 2016). However, Spanish regional authorities do not have the competences to change the years of compulsory schooling autonomously, since this kind of policy is determined at the state level. Therefore, regional policy-makers should design policies aimed at reducing school dropout and improving education quality, also through the introduction/improvement of health education programs during the first stages of the education process. Moreover, given that the education gradient in obesity seems to be much stronger in women than in men (as in Devaux et al., 2011), efforts aimed at increasing the endowment of schooling in the South would be especially beneficial in mitigating differences in overweight and obesity between the two groups of regions. Such policy interventions would additionally reduce differences in obesity-related diseases and improve health in general, inasmuch as obesity constitutes a key risk factor for many chronic conditions and health complications. However, it must also be stressed that even after equalizing the schooling endowments across the two groups of regions, there would still be a certain differential in BMI that penalizes southern Spanish regions in terms of the prevalence of overweight and obesity problems.
It is worth mentioning that the decomposition framework applied in this paper is not aimed at estimating causal relationships, but rather at disentangling the contribution of observable characteristics and their corresponding coefficients to regional differentials in BMI in an "accounting" sense (Fortin et al., 2011). Therefore, the estimates regarding years of education (as well as other covariates) should not be interpreted as causal evidence, since we are not able to address the potential endogeneity of schooling in the BMI regressions with the available data. However, the results are strong and robust enough not only to motivate additional research, based on methods that enable causal inference, but also to derive policy recommendations.
Indeed, the evidence reported in this paper is in line with the results from existing related research, suggesting that regional inequalities in education are responsible for regional health inequalities (Safaei, 2014; Ergin and Kunst, 2015). For instance, Ballas et al. (2012) report that inequalities in education between regions observed in several EU countries tend to reinforce inequalities in income, wealth, social status and health, contributing to persistent inter-regional disparities. How educational inequalities translate into income, employment and health disparities through a complex set of mechanisms is a research question beyond the aims of this work. Although in this paper we do not provide causal evidence, the results exhibit a very strong conditional correlation between education and BMI, with the endowment of education accounting for a substantial share of the gap in BMI between individuals (especially women) residing in different Spanish regions. Therefore, investigating the causal effect of education in mitigating regional disparities in BMI, overweight and obesity, and consequently in other health-related variables, should be the subject of future research.