GEOG 6000 Advanced Geographical Data Analysis

Analyzing Social, Economic and Political Factors that Impact Environmental Sustainability through Principal Component Analysis

Introduction

This year WalletHub, a financial statistics website, produced a metric for determining the most environmentally friendly and sustainable states in the US (Kiernan, 2018). The article used three areas of impact to determine a score for each state’s overall performance, and its ranking among other states, in “green” or environmentally friendly behavior. The three areas were environmental quality, covering factors such as air, water and soil quality; eco-friendly behavior, which accounted for green energy, recycling and water usage; and climate change contributions, measuring output of greenhouse gases. When the results of this analysis were used to create a map of the US with states visibly ranked by level of “greenness” (Figure 1), there was a clear spatial pattern in the states’ relative scores. Areas such as New England, the Pacific West and the Great Lakes region had similar and higher rankings relative to other states. The Deep South, the Grain Belt (Indiana, Illinois, Iowa, Kansas, Nebraska) and the Southwest states had low rankings within a similar range. This geographic distribution suggests that other underlying factors may influence a state’s approach to sustainability.

Figure 1. Map of US created with “Green” ratings from WalletHub


There are many different conditions that influence environmentally sustainable practices. In this study we will examine several social, economic and political indicators and their relationship to the results of the WalletHub study. The main method of analysis will be a type of multivariate analysis known as principal component analysis. Principal component analysis (PCA) has proved beneficial in revealing the variables responsible for geographic phenomena in previous studies (Petrisor et al., 2012). Petrisor et al. detail the results of using PCA in combination with GIS in four case studies in Romania to understand the underlying variables that influence development. In each case they found that the development often could not be explained by a single category of variables (education, economics, demographics, etc.), but rather was best modeled by variables from different categories. PCA has also been used to identify features of communities that are aware and supportive of green energy (Bhowmik et al. 2018). The study found certain commonalities, such as education, income and political involvement, in communities that support the use of green energy. These commonalities were subsequently recommended for development in regions that are not as involved in or supportive of green energy (Bhowmik et al. 2018).

Social and economic conditions have a documented role in environmental sustainability (Lange et al. 2009, Pebley 1998, Rudra et al. 2018, Orenstein et al. 2011, Merkel 1998, Bhowmik et al. 2018). The argument behind this is straightforward: when individuals have a good living standard, education and access to basic necessities, there is time and opportunity to focus on environmental concerns. Among the most impactful social factors are education levels and access to healthcare or federal assistance programs for lower income groups (Bhowmik et al. 2018, Lange et al. 2009). Economic factors with significant bearing on environmental behavior are household income and the political entity’s GDP (Lange et al. 2009, Pebley 1998). Wealth disparity can be a tremendous factor in waste creation and in large jumps in energy consumption. In developing areas, new wealth tends to mimic the consumption habits of Western nations. This can create large amounts of energy and resource consumption in regions without refined practices for reducing pollutants and handling waste responsibly (Lange et al. 2009).

Political conditions have an impact on environmental practices as well. Rudra et al. discuss the negative impact political heterogeneity can have on a political entity’s environmental policy. Without cohesive political leadership or support, enforcement of regulations can be unreliable (Rudra et al. 2018). Most entities contain varied and often divergent political groups, which can make unified legislation difficult (Orenstein et al. 2011).

Methods

The green score is the result of a study conducted by WalletHub, a financial advisement platform. The scores can potentially range from 0 to 100, and a higher score corresponds to a numerically lower rank. In the data, the lower the rank, the better the state did in the scoring matrix (Kiernan, 2018). For the PCA I used the overall score out of 100, so factors that correlate positively with the score indicate improved environmental practices.

Social variables represent the quantity of educational facilities in the state, the percentage of high school graduates, and the percentage of college graduates with a bachelor’s degree. Also in the social category is the percentage of the unemployed labor force (temporarily unemployed) that receives public healthcare coverage. This percentage does not account for the privately insured unemployed, a group approximately double the size of the publicly insured percentage, or for those who have no insurance through either private or public providers. This statistic also does not account for long-term unemployed citizens, who may or may not be insured by a public healthcare program.

Economic variables represent several different metrics. Orenstein et al. discuss the importance of diversified industry to the economic, and consequently environmental, health of a region. To create a statistic that reflects this, an index was created containing the gap between the industry with the largest number of employees and the industry with the lowest number of employees. Average median income and unemployment rates are meant to test the overall prosperity of individuals in each state, while the averaged state GDP illustrates the economic prosperity of the state over the last ten years.
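The exact formula behind the industry index is not given above, so the following is only a minimal sketch of one possible reading: the raw difference between the largest and smallest industry employee counts in a state. It is written in Python purely for illustration (the analysis itself was carried out in R). The function name `industry_gap` and the employment figures are hypothetical and are not taken from the study data.

```python
def industry_gap(employment_by_industry):
    """Return the gap between the largest and smallest industry by employee count.

    Assumption: the index is the raw difference in employee counts; the
    report does not say whether the gap was normalized in any way.
    """
    counts = list(employment_by_industry.values())
    if not counts:
        raise ValueError("need at least one industry")
    return max(counts) - min(counts)


# Hypothetical employee counts for one state (illustrative only, not real data).
example_state = {
    "manufacturing": 120_000,
    "healthcare": 310_000,
    "mining": 15_000,
}

gap = industry_gap(example_state)
print(gap)  # 295000
```

Under this reading a smaller gap suggests employment is spread more evenly across industries. Because raw counts grow with state population, dividing by total employment would be a reasonable refinement, though that is also an assumption.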

 
Figure 2. Variables and descriptions


 

Political homogeneity was created by applying a uniformity percentage to the political offices of the Governor and two Senators from each state. As these are elected officials, the relative homogeneity of the state should be reflected in their parties. The FRASE Index is a weighted score applied to each state and represents the relative economic impact federal legislation has on the state. Much of the output of greenhouse gases and other pollutants derives from industry and the private sector. The FRASE Index represents the ability of the federal government to regulate industry in each region (Quantgov). The final variable is the percentage of the voting-age population who participated in the 2016 general election by at least casting a vote for the highest office on the ballot.
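The uniformity percentage is not defined in detail here. One plausible reading, sketched below in Python for illustration only, is the share of the three offices (Governor and two Senators) held by the most common party. The function name `political_homogeneity` and the party labels are hypothetical.

```python
from collections import Counter


def political_homogeneity(parties):
    """Percentage of offices held by the most common party.

    Assumption: "uniformity" means the share of offices belonging to the
    single most represented party; the report does not give its formula.
    """
    if not parties:
        raise ValueError("need at least one office")
    most_common_count = Counter(parties).most_common(1)[0][1]
    return 100.0 * most_common_count / len(parties)


# Hypothetical states: [governor, senator 1, senator 2]
uniform_state = political_homogeneity(["D", "D", "D"])  # all three offices match
split_state = political_homogeneity(["R", "R", "D"])    # two of three offices match
```

With three offices and two major parties this measure takes only a few distinct values (two of three or three of three matching), which is worth keeping in mind when interpreting its loadings in the PCA.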

The variables used were collected from public data sources (see Figure 2). Each variable was read into a single file to facilitate conversion to a matrix in R. The variables were not originally on the same scale, so once the data was read in, the variables were scaled. For the principal component analysis, the prcomp() function was employed. This function is based on singular value decomposition, an approach that has been documented to provide more accurate results (STHDA).
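The scaling step can be illustrated with a short sketch. It assumes the scaling was a standard z-score (each variable centered on its mean and divided by its sample standard deviation), which is what R's scale() does by default; the report does not state the exact call. The example is in Python for illustration only, and the function name `standardize_columns` and the data values are hypothetical.

```python
from statistics import mean, stdev


def standardize_columns(rows):
    """Center each column on its mean and divide by its sample standard deviation."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]  # sample standard deviation (n - 1)
    if any(s == 0 for s in spreads):
        raise ValueError("a constant column cannot be scaled")
    return [
        [(value - c) / s for value, c, s in zip(row, centers, spreads)]
        for row in rows
    ]


# Hypothetical rows: [median income in dollars, unemployment rate in percent]
raw = [
    [50000.0, 4.0],
    [60000.0, 5.0],
    [70000.0, 6.0],
]

scaled = standardize_columns(raw)
for row in scaled:
    print(row)
```

After scaling, both columns have mean 0 and unit standard deviation, so a variable measured in dollars no longer dominates one measured in percentage points when the principal components are computed.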

Results

Initial output from the PCA indicates that the majority of the variance within the model is explained by the first through the sixth axes. The sixth axis contributes only five percent, which is not a great increase, but the threshold for explained variance in this model was 85% (see Figure 3), so I have chosen to analyze the first six principal components (PCs). Review of the coefficients for each PC indicates some expected correlations. The first axis shows a positive correlation between green scores and higher levels of education, higher voter turnout and higher median income. It appears the first axis is capturing the positive influence of affluence in a developed nation on environmental practices (Pebley, 1998). This axis also conveys a negative correlation between the green score and both the FRASE Index and higher unemployment rates. The negative impact from unemployment is anticipated: rising unemployment is an indicator of other issues in the state and may make environmental policy less of a priority. The FRASE Index seems to indicate that less federal regulation is linked to higher green scores; however, the impact of that variable is not entirely clear.
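The choice of how many components to keep can be expressed as a small calculation. The sketch below, in Python for illustration only, takes component standard deviations of the kind prcomp() reports in its sdev element, converts them to proportions of variance, and returns the smallest number of components whose cumulative proportion reaches a threshold. The function name `components_needed` and the input values are hypothetical; they are not the values from this analysis.

```python
def components_needed(sdev, threshold=0.85):
    """Return (k, cumulative) for the smallest k components reaching the threshold.

    sdev: standard deviations of the principal components, largest first.
    The proportion of variance for each component is sdev**2 / sum(sdev**2).
    """
    variances = [s * s for s in sdev]
    total = sum(variances)
    cumulative = 0.0
    for k, variance in enumerate(variances, start=1):
        cumulative += variance / total
        if cumulative >= threshold:
            return k, cumulative
    # Rounding can leave the sum just under a threshold of 1.0; keep everything.
    return len(variances), cumulative


# Hypothetical component standard deviations (not the values from this study).
example_sdev = [2.0, 1.0, 1.0, 1.0, 1.0]

k, explained = components_needed(example_sdev, threshold=0.85)
print(k, explained)  # 4 0.875
```

In this made-up example the first three components explain 75% of the variance, so a fourth is needed to pass 85%, even though it adds only 12.5% on its own.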

Figure 3. Proportion of model explained by each principal component


Figure 4. Variables plot from PCA


The second axis captures the number of universities and colleges and the state GDP. These variables both have slight negative relationships to the green score, which is an interesting result. The four states with the largest GDPs are California, Texas, Florida and New York. This can be seen plainly in Figure 5, where the four states are outliers on this axis. New York and California both have relatively high green scores, but these are also the four most populous states, and Florida and Texas both ranked relatively low in environmental quality in the WalletHub assessment. The negative correlation for the number of universities in a state seems contrary to the strong positive correlation between green score and education rates.

The third, fourth and fifth axes do not seem to represent any related trend in the variables. A negative correlation between political homogeneity and the variation in industry, for example, does not seem like a meaningful relationship. Other relationships are already expressed in their impact on the green score, so the fact that voter turnout decreases as unemployment rises, while a logical conclusion, is not pertinent to this analysis.

Figure 5. Plot of individual states from PCA


Conclusions

Environmental practices are a result of the natural environment, but also of how people interact with their environment. The social, economic and political variables chosen for this analysis illustrate this complex relationship between the community and the environment. Sustainable practices are not simply a result of legislation and regulation, but of the overall well-being of the population. The correlation between public access to health care, education, income, involvement in politics and a diverse economy communicates what Lange et al. discussed: when a society reaches a certain level of affluence and is no longer struggling to meet daily necessities, there are more potential and resources to devote to creating a sustainable relationship with the environment.

While most of the states in the WalletHub report had scores within a five to ten point range, the lowest scoring and highest scoring states differed from the mean by a substantial number of points. For the states that ranked at the low end, West Virginia and Kentucky, it may be beneficial to understand that there is a correlation between the social and economic well-being of the population and its impact on the environment. Improving that well-being could lead to improvements in environmental conditions.


For more information, read the full report here.

MSGIS Program Skills

  • GIS Analysis

  • Spatial Data and Algorithms

  • GIS Workflow

  • Spatial Analysis

  • Data Models and Structures

  • Communication Skills

  • Basic Programming or Scripting