Industry location in Mexico. Isidro Soloaga, Irvin Rosas and Nataly Hernández Draft 20-10-2018
Introduction DO THIS AT THE END
Literature review on industry location
Patterns of industry location are important for understanding development processes. Industry location decisions generate at least three types of interactions with long-run implications. The first type of interaction is known in the literature as the Jacobs externality. The idea is that competition in a given local market
Location models
Industry location models can answer two types of questions. First, they help identify the factors that explain a firm’s location in a given territory. Second, they can be used to identify the characteristics that make territories attractive to firms. Two types of empirical models are used in the literature: discrete choice models (DCM) and count data models (CDM). When data are available at the firm level, DCM can be used to model firms’ decisions about where to locate. When only aggregate data are available, CDM are appropriate. DCM can identify factors at the firm level, while CDM can only capture territorial factors. For estimation purposes, the advantage of CDM is that they exploit information from all territories, including those where no location occurs, while DCM exploit only data from territories where location happened. The type of data often conditions the level of detail one can achieve in these studies. In our paper, we use aggregate data at the SUN level (EXPLAIN SUN) and municipality characteristics aggregated to the SUN level. Thus, our analysis focuses on the characteristics of the territory that make location likely to occur.
We can think of a territory-based CDM in terms of Jofre-Monseny et al. (2011), where there is a supply of entrepreneurs wanting to locate their firms within a set of territories according to characteristics that vary across territories. There is also a set of territorial characteristics that determine the demand for products but do not determine the decisions of entrepreneurs. With a CDM we can estimate the equilibrium as a reduced-form equation of the number of established firms as a function of the observed characteristics that determine supply and demand. Given the nature of our data, a first type of econometric model one can use to estimate a CDM is a Poisson model, since a linear model would be inappropriate for count data. Nevertheless, the limitation of the Poisson model is that counts of location decisions often contain many more zeros than a theoretical Poisson distribution would suggest. A possible solution is the use of a negative binomial or Poisson-Gamma model. These kinds of models are used in the literature on FDI and international trade. Arauzo (2005) estimates a Poisson model in which the units of analysis are the municipalities of Catalonia. In the aggregate analysis, the model’s dependent variable is the number of times a firm establishes in a municipality between 1987 and 1996. The author also performs the analysis dividing the data by type of firm, given the data availability at the firm level. Davis and Schluter (2005) use as dependent variable the number of new firms established in each US county between 1991 and 1997. Their procedure to compute the variable is as follows. For each year, one computes the difference between the number of existing firms and the number of existing firms in the previous year. If the number of firms at t is smaller than the number at t-1, the number of new firms is 0.
One repeats this calculation over the entire period of analysis and sums over the total number of periods. These authors estimate a negative binomial model with the characteristics of the territories circa 1991. Similarly, Lambert et al. (2006) study the location of manufacturing firms at the county level. Their dependent variable is the number of new firms in each county between 2000 and 2004. They estimate linear and Poisson models. Testing for overdispersion, they reject the null hypothesis of a Poisson distribution. They therefore estimate a Poisson model in which they rescale the covariance matrix using the sum of squared Pearson residuals (the Pearson chi-square statistic) divided by the degrees of freedom. Standard Poisson CDM are not well suited to data with overdispersion or an excess of zeros. In location studies, overdispersion arises when using data for heterogeneous geographic areas, as is the case of counties and municipalities. Negative binomial models account for overdispersion but not for the excess of zeros (Liviano and Arauzo-Carod, 2013). Other types of models that account for the excess of zeros are zero-inflated models and hurdle models.
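As an illustration of the counting procedure attributed above to Davis and Schluter (2005), the following minimal Python sketch sums the positive year-to-year changes in the stock of firms of one territory. The function name and the sample series are hypothetical and are not taken from any of the cited papers.

```python
def count_new_firms(stock_by_year):
    """Sum the positive year-to-year changes in the stock of firms.

    stock_by_year: number of existing firms in one territory, ordered by year.
    A year in which the stock falls contributes zero new firms.
    """
    total = 0
    for previous, current in zip(stock_by_year, stock_by_year[1:]):
        total += max(current - previous, 0)
    return total


# Hypothetical stock of firms in one county over four consecutive years.
example_stock = [10, 12, 11, 15]
new_firms = count_new_firms(example_stock)  # changes are +2, -1 (coded 0), +4
```

Because decreases are truncated at zero year by year, the result can exceed the net change in the stock over the whole period, which is one reason such counts contain many zeros and a long right tail.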
An empirical model of location
Following Arauzo (2005) and Davis and Schluter (2005), we study the role of territorial characteristics and local policies on the number of newly established firms in each SUN between 2009 and 2014. We characterize the set of territorial characteristics at the SUN level in the vector X. The characteristics X are measured circa 2005, so we can think of them as exogenous for the changes in employment for the 2010-2015 period. The variables included in X are defined in detail in Appendix A1. These characteristics include indicators of human capital quality (maternal mortality and the average school-level score on the ENLACE exam), an index of the quality of public services, population density, the share of indigenous population, average turnout, the distance to the nearest port and to the nearest airport, and indexes of specialization, non-diversity, and competition. The empirical model we estimate is the following:
$$y_{i,2010-2015} = f(X_{i,2005}'\beta) + \varepsilon_i$$
where the dependent variable $y_{i,2010-2015}$ is the number of new jobs between 2010 and 2015 in SUN-manufacturing sector pair $i$. To control for differences in the initial levels of employment, we include in vector X the level of employment and the average remuneration per worker in 2005. We also control for the potential size of the market each SUN faces. For this purpose, we follow the spread and backwash literature by controlling for different characteristics of the nearby SUNs (see Ganning, Baylis, and Lee, 2013, and Berdegué and Soloaga, 2018). A first strategy is to identify, for each SUN, the nearest SUN and classify it by population into one of the following categories: 1) up to 49,999 inhabitants, 2) 50,000 to 249,999 inhabitants, 3) 250,000 to 349,999 inhabitants, 4) 350,000 to 499,999 inhabitants, 5) 500,000 to 999,999 inhabitants, 6) 1,000,000 to 4,999,999 inhabitants, and 7) over 5,000,000 (see footnote 1). We use ArcGIS to calculate the travel time in minutes from each SUN to every other SUN using the publicly available data on roads circa 2005. Then, we control for the travel time to the nearest SUN and the interaction of this travel time with the size category of that SUN. The limitation of this approach is that it completely ignores every SUN other than the nearest one: a much larger SUN located only one kilometer farther away surely has a bigger impact than the nearest SUN. A second strategy to control for potential market effects is to construct a “synthetic SUN”, as follows. For each SUN, we identify the set of other SUNs located within a 300-minute radius, plus the Mexico City SUN. We then construct row-standardized weights proportional to the inverse of the distance to each SUN. Using these weights, we calculate the cost to the synthetic SUN and classify each synthetic SUN according to the population criteria described before. We then include in X the distance to the synthetic SUN and its interaction with a dummy for the synthetic SUN size category.
The purpose of this strategy is to account for the market effects of a larger set of SUNs. A third strategy checks the robustness of the second one by directly controlling for the travel time and population of the three nearest SUNs.
Footnote 1. This classification follows, with slight modifications, the categories used by RIMISP. Under this classification, only one SUN, that of Mexico City, falls in category 7.
A fourth strategy aims to include the characteristics of all the different types of SUNs. We control for the travel time to, and the population of, the nearest SUN of each type. For the Mexico City SUN we only control for the travel time, since its population is the same constant for every other SUN.
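The second strategy described above (the synthetic SUN) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the toy numbers are hypothetical, travel times are assumed to be strictly positive, the Mexico City SUN is assumed not to appear in the list of neighbors, and the population assigned to the synthetic SUN is assumed to be the weighted average of the neighbors' populations, which the text does not spell out.

```python
from bisect import bisect_right

# Lower bounds of size categories 2 to 7 (category 1 is below 50,000).
# Assumption: a population of exactly 5,000,000 is placed in category 7.
SIZE_BOUNDS = [50_000, 250_000, 350_000, 500_000, 1_000_000, 5_000_000]


def size_category(population):
    """Return the size category (1 to 7) for a given population."""
    return bisect_right(SIZE_BOUNDS, population) + 1


def synthetic_sun(neighbors, mexico_city, radius_minutes=300.0):
    """Build the synthetic SUN faced by one SUN.

    neighbors: list of (travel_minutes, population) for the other SUNs.
    mexico_city: (travel_minutes, population), always included.
    Returns (weights, weighted travel time, weighted population).
    """
    selected = [n for n in neighbors if n[0] <= radius_minutes] + [mexico_city]
    inverse = [1.0 / minutes for minutes, _ in selected]
    total = sum(inverse)
    weights = [value / total for value in inverse]  # row-standardized: sum to 1
    cost = sum(w * minutes for w, (minutes, _) in zip(weights, selected))
    population = sum(w * pop for w, (_, pop) in zip(weights, selected))
    return weights, cost, population


# Toy data: the third neighbor lies beyond the 300-minute radius and is dropped.
toy_neighbors = [(100.0, 60_000), (200.0, 300_000), (400.0, 1_000_000)]
toy_mexico_city = (300.0, 20_000_000)
weights, cost, population = synthetic_sun(toy_neighbors, toy_mexico_city)
category = size_category(population)
```

With inverse-distance weights, the weighted travel time equals the harmonic mean of the selected travel times, so nearby SUNs dominate the synthetic SUN.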
Ordered probit for new employment categories
So far, we have defined our dependent variable as the number of “new” jobs. That is, in the case of a decrease in jobs, the number of new jobs is coded as zero in a given SUN-sector. To incorporate in the analysis situations with a decrease in employment (negative new jobs), we use an ordered probit model. In this case, the dependent variable is defined as a categorical variable according to the number of jobs created, using the INEGI (2011) definition of firm size. Category 1 includes SUN-sector pairs where there was a decrease or no change in the number of jobs, category 2 includes SUN-sector pairs in which 1 to 50 jobs were created, category 3 includes the creation of 51 to 100 jobs, and category 4 includes the creation of more than 100 jobs.
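The two codings of the dependent variable described above can be made concrete with a short Python sketch. The function names are hypothetical; the thresholds are the ones stated in the text.

```python
def new_jobs_count(jobs_start, jobs_end):
    """Count-model outcome: new jobs, with decreases coded as zero."""
    return max(jobs_end - jobs_start, 0)


def employment_category(jobs_start, jobs_end):
    """Ordered-probit outcome: category 1 to 4 by number of jobs created."""
    change = jobs_end - jobs_start
    if change <= 0:
        return 1  # decrease or no change
    if change <= 50:
        return 2  # 1 to 50 new jobs
    if change <= 100:
        return 3  # 51 to 100 new jobs
    return 4      # more than 100 new jobs


# A SUN-sector pair that loses jobs is a zero in the count model
# and falls in category 1 in the ordered model.
count_outcome = new_jobs_count(120, 90)
ordered_outcome = employment_category(120, 90)
```

Note that the ordered coding still pools decreases with no change in category 1, so it distinguishes magnitudes of job creation rather than magnitudes of job loss.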
Technological intensity of industries
In our aggregate models, we use data from 18 manufacturing sectors. Nevertheless, it is likely that the factors affecting location across territories have differentiated effects across types of manufacturing. For example, sectors with a higher technological intensity might be attracted by a highly educated labor force, unlike low-intensity manufacturing, whose location may respond to education in a different way. To analyze these possible differential effects, we estimate the negative binomial model interacted with indicator variables that classify sectors according to their technological intensity. Table 1 presents a classification of sectors based on Pereira and Soloaga (2012), where we have split the agri-food and tobacco industry from the rest of low-intensity manufacturing.
[Table 1. Classification of industries by technological intensity]
Very low: Agri-food, beverages and tobacco
Low: Textile; Clothes; Leather and shoes; Wood; Paper; Printing; Furniture and mattresses; Other industries
Medium-low: Coal and petroleum products; Plastic and rubber; Non-metallic minerals; Basic metals and metallic products
Medium: Machinery and equipment; Transportation equipment; Computer equipment; Electric goods; Chemistry
Source: adapted from Pereira and Soloaga (2012)
SUN size
We also expect the effects of SUN characteristics on the amount of employment created to differ by SUN size. We group SUNs into three broad categories using the RIMISP classification described earlier.
Results
In Table 2, panel A, we present descriptive statistics of the variables used in this paper. Recall that our dependent variable is defined as the number of new jobs from 2010 to 2015. Thus, if in a given SUN-sector the number of jobs or plants in 2015 was smaller than in 2010, the number of new jobs in that territory-sector pair is zero. For this reason, the dependent variables in our analysis show a large proportion of zeros, which conditions the types of models we can use. Figure 1 shows the density of the dependent variable; observations with zero or very few new jobs account for more than half of the total number of observations. More descriptive statistics on the main dependent variable are presented in panel B of Table 2.
[Table 2. Descriptive statistics]
[Figure 1. Two panels with kernel density of dependent variable]
Kernel density estimate (horizontal axis: difW; vertical axis: Density). Note: sample restricted to SUN-industry pairs with fewer than 1,000 new jobs.
Model choice
We start by estimating a linear model to detect simple correlations in column 1 of Table 3. Child mortality exhibits a negative correlation with an increasing rate. Population density is also negatively correlated with employment growth, which indicates a tendency toward job creation outside the big cities. The travel time in minutes to the nearest airport is negatively correlated with the generation of employment, as one would expect. The specification also includes controls for the initial level of employment and remunerations and their corresponding quadratic terms. Although informative, these correlations are hard to interpret, and the coefficients are unlikely to be consistently estimated since, as explained before, the dependent variable contains a large fraction of zeros. Given the count nature of the data, a better alternative to the linear model might be a Poisson model. In Table 3 we also include the results of the Poisson version of the same three specifications estimated before.
[Table 3. Linear and Poisson models]
From the descriptive statistics in Table 2, the variance of the dependent variable does not correspond to that of a theoretical Poisson distribution, whose variance equals its mean. Furthermore, in our results in Table 3 for the estimated Poisson models, we also report a goodness-of-fit test. The extreme significance of this test indicates that the Poisson model is inappropriate. An additional piece of evidence against the Poisson model is that almost all the coefficients in the regression are significant at the 99% confidence level, in contrast with the correlations we found in the linear models. For these reasons, a negative binomial model, which accounts for the overdispersion of the data, is preferred. These kinds of models have been used in other location studies, and in studies on trade and FDI. We perform a likelihood ratio test that rejects the null that our data are distributed as Poisson: conditional on the data being Poisson, the likelihood of observing these data is almost zero. Thus, we prefer the negative binomial model for interpreting the results. An extension of the negative binomial model is a generalized negative binomial model that parameterizes dispersion with observed characteristics. In the negative binomial model, dispersion is assumed constant across observations, while in the generalized model dispersion depends on characteristics of the SUN-sector observations. We estimated a zero-inflated generalized negative binomial model using observed characteristics, and the conclusions are very similar to those under the negative binomial. Since the distribution underlying the negative binomial is much simpler and the results are easier to interpret, we prefer the negative binomial model as the main empirical tool in our analysis.
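To make the overdispersion argument above concrete, the following Python sketch computes two simple diagnostics for an intercept-only Poisson model: the Pearson dispersion statistic (the Pearson chi-square divided by the degrees of freedom, which should be close to 1 under a Poisson distribution) and the gap between the observed share of zeros and the share a Poisson distribution with the same mean would predict. This is a simpler check than the goodness-of-fit and likelihood ratio tests reported in the paper, and the counts below are made up for illustration; they are not the paper's data.

```python
import math


def pearson_dispersion(counts, fitted_means, n_params=1):
    """Pearson chi-square statistic divided by the degrees of freedom."""
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    return chi_square / (len(counts) - n_params)


def zero_shares(counts):
    """Observed share of zeros and the Poisson-implied share at the same mean."""
    mean = sum(counts) / len(counts)
    observed = sum(1 for y in counts if y == 0) / len(counts)
    return observed, math.exp(-mean)


# Made-up new-job counts: many zeros and a long right tail.
toy_counts = [0, 0, 0, 0, 0, 0, 1, 2, 40, 157]
toy_mean = sum(toy_counts) / len(toy_counts)

# In an intercept-only Poisson model the fitted mean is the sample mean.
dispersion = pearson_dispersion(toy_counts, [toy_mean] * len(toy_counts))
observed_zero_share, poisson_zero_share = zero_shares(toy_counts)
```

A dispersion statistic far above 1 points toward a negative binomial specification, while an observed share of zeros far above the Poisson-implied share is the pattern that zero-inflated or hurdle models are designed to handle.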
Main results
Our baseline specifications use negative binomial models with the dependent variable being the count of new jobs in the 2010-2015 period. Table 4 presents the main results of our analysis. Columns 1 through 5 present different specifications where we account in diverse ways for the market effects of nearby SUNs. Column 6 presents the results of an ordered probit version of the model in column 1. All models except that in column 4 show robust results.
Results by technological intensity of industries
In columns 1 through 4 of Table 5 we estimate the model in column 1 of Table 4 separately for each industry type, classified by technological intensity as in Table 1. In column 5 we pool the observations in categories 1 and 2 (very low and low technological intensity).
[Table 5. Negative binomial by technological intensity]
Results by SUN size
In Table 6 we present the results of our analysis by SUN size. We used an aggregated version of the RIMISP categories to create three groups of SUNs.
[Table 6. Negative binomial by SUN size]
References
Arauzo-Carod, J.M. (2005). Determinants of industrial location: An application for Catalan municipalities. Papers in Regional Science, 84(1), 105-120.
Berdegué, J.A. and Soloaga, I. (2018). Small and medium cities and development of Mexican rural areas. World Development, 107, 277-288.
Davis, D.E. and Schluter, G.E. (2005). Labor-force heterogeneity as a source of agglomeration economies in an empirical analysis of county-level determinants of food plant entry. Journal of Agricultural and Resource Economics, 480-501.
Ganning, J.P., Baylis, K. and Lee, B. (2013). Spread and backwash effects for nonmetropolitan communities in the US. Journal of Regional Science, 53(3), 464-480.
INEGI (2011). Micro, pequeña, mediana y gran empresa. Estratificación de los establecimientos. Censos económicos.
Jofre-Monseny, J., Marín-López, R. and Viladecans-Marsal, E. (2011). The mechanisms of agglomeration: Evidence from the effect of inter-industry relations on the location of new firms. Journal of Urban Economics, 70(2-3), 61-74.
Lambert, D.M., McNamara, K.T. and Garrett, M.I. (2006). An application of spatial Poisson models to manufacturing investment location analysis. Journal of Agricultural and Applied Economics, 38(1), 105-121.
Liviano, D. and Arauzo-Carod, J.M. (2013). Industrial location and interpretation of zero counts. The Annals of Regional Science, 50(2), 515-534.
Pereira, M. and Soloaga, I. (2012). Determinantes del crecimiento regional por sector de la industria manufacturera en México, 1988-2008. Documento de trabajo V-2012, El Colegio de México.
Figures and Tables [See in Excel]